Re: Performance about api between http api and java api?

2014-10-15 Thread David Pilato
What do you have in updateScript?

Sounds like you were not using a script with the HTTP REST requests.


David
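
For comparison, a minimal sketch of the script-free alternative on the Java side — a doc-based partial update, the direct equivalent of the quoted HTTP _bulk request (a connected Client plus the indexName, indexTypeName and key variables are assumed from the quoted code):

import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;

import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.bulk.BulkResponse;

// setDoc sends a {"doc": {...}} merge, like the HTTP bulk request,
// instead of compiling and running an update script per document.
BulkRequestBuilder bulkRequest = client.prepareBulk();
bulkRequest.add(client.prepareUpdate(indexName, indexTypeName, key)
        .setDoc(jsonBuilder().startObject()
                .field("consume_vol_tot", "0.0")
                .field("last_login_date", "2012-06-27")
                .endObject()));
BulkResponse bulkResponse = bulkRequest.execute().actionGet();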

> On 16 Oct 2014, at 04:28, Charles Chou wrote:
> 
> I want to partially update docs, but I also want good performance, so I tried 
> two ways to test: one via the HTTP REST API, the other via the Java API (version 0.90); 
> java_version:1.6.20 
> es_version:0.90.1 
> 
> HTTP REST: 
> http://127.0.0.1:9200/_bulk
> 
> {"update":{"_index":"mobile","_type":"mobile_property_mid","_id":"15141307986"}}
>  /n 
> {"doc":{"consume_vol_tot":"0.0","last_login_date":"2012-06-27","login_days_3month":"0","log_avg_interval_half_year":"0.0","consume_days_3month":"0"}}
>  /n 
> {"update":{"_index":"mobile","_type":"mobile_property_mid","_id":"15141308091"}}
>  /n 
> {"doc":{"consume_vol_tot":"0.0","last_login_date":"2012-12-26","login_days_3month":"0","log_avg_interval_half_year":"0.0","consume_days_3month":"0"}}
>  /n 
> 
> JAVA API: 
> 
> bulkRequest.add(client.prepareUpdate(indexName, indexTypeName, key)
>     .setUpsertRequest(builder.endObject())
>     .setScript(updateScript.toString())
>     .setFields("_source"));
> BulkResponse bulkResponse = bulkRequest.execute().actionGet(); 
> 
> Here is the test result: 
> http rest: 
> bulk nums:8000,cost time:1306 
> bulk nums:8000,cost time:1348 
> bulk nums:8000,cost time:1320 
> bulk nums:8000,cost time:1277 
> bulk nums:8000,cost time:1214 
> bulk nums:8000,cost time:1336 
> bulk nums:8000,cost time:1338 
> bulk nums:8000,cost time:1399 
> bulk nums:8000,cost time:1231 
> bulk nums:8000,cost time:1280 
> bulk nums:8000,cost time:1482 
> bulk nums:8000,cost time:1248 
> bulk nums:8000,cost time:1394 
> 
> java api: 
> bulkResponse items8000,cost time:5252 
> bulkResponse items8000,cost time:5171 
> bulkResponse items8000,cost time:5077 
> bulkResponse items8000,cost time:5230 
> bulkResponse items8000,cost time:5469 
> bulkResponse items8000,cost time:5898 
> bulkResponse items8000,cost time:5443 
> bulkResponse items8000,cost time:5579 
> bulkResponse items8000,cost time:5026 
> bulkResponse items8000,cost time:5279 
> bulkResponse items8000,cost time:5851 
> bulkResponse items8000,cost time:5708 
> bulkResponse items8000,cost time:5115 
> 
> So why is the Java API so much slower than HTTP? 
> And how can I solve it? I read a lot of docs, but couldn't find the answer.



Re: Custom date format in elasticsearch mapping

2014-10-15 Thread Roopendra Vishwakarma
Resolved:

The corrected date format is:

"postDate": {
"type": "date",
"format": "E MMM d H:m:s z Y"
  },
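
For anyone applying this fix from code, a minimal sketch (assuming an already-connected Java Client and the tweets index and tweet type from the question) of putting the corrected mapping:

// Put the corrected postDate mapping onto the existing tweet type.
String postDateMapping = "{\"tweet\":{\"properties\":{\"postDate\":"
        + "{\"type\":\"date\",\"format\":\"E MMM d H:m:s z Y\"}}}}";
client.admin().indices().preparePutMapping("tweets")
        .setType("tweet")
        .setSource(postDateMapping)
        .execute().actionGet();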



On Tuesday, 14 October 2014 17:53:30 UTC+5:30, Roopendra Vishwakarma wrote:
>
> I am trying to index data with date format *Tue May 14 17:06:01 PDT 2013*. 
> As described in the Elasticsearch date format documentation, I need to use a 
> custom date format. I referred to the Joda-Time DateTimeFormat documentation, 
> and the respective format is *E M d H:m:s z Y*.
>
> I am able to create the mapping, but when I try to index data it gives me 
> an error. 
>
> *Mapping:-*
> 
>
> {
>   "tweet": {
> "properties": {
>   "user": {
> "type": "string",
> "index": "not_analyzed"
>   },
>   "message": {
> "type": "string",
> "null_value": "na"
>   },
>   "postDate": {
> "type": "date",
> "format": "E M d H:m:s z Y"
>   },
>   "priority": {
> "type": "integer"
>   },
>   "rank": {
> "type": "float"
>   }
> }
>   }
> }
>
>
>
> *Index Document:-*
> -
> curl -XPUT 'http://localhost:9200/tweets/tweet/1' -d '{
> "user" : "kimchy",
> "message" : "This is a tweet!",
> "postDate" : "Tue May 14 17:06:01 PDT 2013",
> "priority" : 4,
> "rank" : 12.3
> }'
>
>
>
> *Error:-*
> ---
>
> {"error":"MapperParsingException[failed to parse [postDate]]; 
> nested: MapperParsingException[failed to parse date field [Tue May 14 
> 17:06:01 PDT 2013],
> tried both date format [E M d H:m:s z Y], and timestamp number with 
> locale []];
> nested: IllegalArgumentException[Invalid format: \"Tue May 14 17:06:01 
> PDT 2013\"
> is malformed at \"May 14 17:06:01 PDT 2013\"]; ","status":400}
>
>
>
>
> Any suggestions? Which date format should I use here?
>
>



Re: nested type mapping error

2014-10-15 Thread shekhar chauhan
Is there anyone who can help me with the above problem? Please help.

Thanks in advance,



Re: Many indices.fielddata.breaker errors in logs and cluster slow...

2014-10-15 Thread Robin Clarke
I'm still having this problem... has anybody got an idea what the cause / 
solution might be?

Thank you! :)

On Tuesday, 7 October 2014 14:29:22 UTC+2, Robin Clarke wrote:
>
> I'm getting a lot of these errors in my Elasticsearch logs, and am also 
> experiencing a lot of slowness on the cluster... 
>
> New used memory 7670582710 [7.1gb] from field [machineName.raw] would be 
> larger than configured breaker: 7666532352 [7.1gb], breaking
> ...
> New used memory 7674188379 [7.1gb] from field [@timestamp] would be larger 
> than configured breaker: 7666532352 [7.1gb], breaking
>
> I've looked at the documentation about memory limits, but I don't really 
> understand what is causing this, and more importantly how to avoid it...
>
> My cluster is 10 machines @ 32GB memory and 8 CPU cores each.  I have one 
> ES node on each machine with 12GB memory allocated.  On each machine there 
> is additionally one logstash agent (1GB) and one redis server (2GB).
> I have 10 indexes open with one replica per shard (so each node should 
> only be holding 22 shards (two more for kibana-int)).
>
> I'm using Elasticsearch 1.3.3, Logstash 1.4.2
>
> Thanks for your help!
>
> -Robin-
>
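
One knob that can be adjusted at runtime while investigating errors like these is the fielddata breaker limit — a sketch via the Java API (the 70% value is only an example; bounding indices.fielddata.cache.size in elasticsearch.yml, or moving hot fields to doc_values, is the more durable fix):

import org.elasticsearch.common.settings.ImmutableSettings;

// Raise the fielddata circuit-breaker limit for the cluster (transient,
// i.e. reset on full cluster restart); assumes a connected Client.
client.admin().cluster().prepareUpdateSettings()
        .setTransientSettings(ImmutableSettings.settingsBuilder()
                .put("indices.fielddata.breaker.limit", "70%"))
        .execute().actionGet();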



local gateway recovery

2014-10-15 Thread Ashish Mishra
From 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-gateway-local.html#_dangling_indices:

When a node joins the cluster, any shards/indices stored in its local data/ 
directory which do not already exist in the cluster will be imported into 
the cluster by default.


Does this also include unassigned shards?  
Suppose shard allocation is disabled, and a node gets partitioned from the 
cluster (so its shards become unassigned).  When it rejoins the cluster, 
will it pick up those shards again, or will they remain unassigned?

Does this behavior change for primary vs. replica shards?



Re: Cluster discovery on Amazon EC2 problem - need urgent help

2014-10-15 Thread Zoran Jeremic
Hi Norberto,

Thank you so much for the great advice.

As I'm starting tomorrow with real users, I'll keep for now the 
configuration I have at the moment (all nodes have master=true, data=true). 
I hope it will work.
For zone availability, I had to go with everything in one zone. The main 
reason was the difficulty of connecting ELB-controlled application instances 
to the backend instances (MySQL, MongoDB and Elasticsearch). It's not possible 
to add a port+ELB security group rule to the backend instances if the 
instances are in different zones, so I had to keep everything in one zone. 
The other reason was, as you mentioned, price.

Thanks,
Zoran

On Sunday, 12 October 2014 16:50:34 UTC-7, Norberto Meijome wrote:
>
> Inline below ...
>
> On Sun, Oct 12, 2014 at 5:28 AM, Zoran Jeremic wrote:
>
> Hi Norberto,
>
> Thank you for your advice. This is really helpful, since I have never 
> used Elasticsearch in a cluster before, and have never gone live with a 
> large number of users. My previous experience was with ES on a single node 
> and a very small number of users, so I'm still concerned about how this 
> will work. The main problem is that I don't know how many users to expect, 
> so I should be ready to expand the cluster if necessary.
>
>
> Sure - that's one of the nice things about ES , and AWS - you can keep 
> tuning as you go...
>  
>
>
> So far, I created a cluster of 3 m3.large instances having 3 indexes (5 
> shards and 2 replicas). 
> I couldn't manage to connect it with EC2 auto-discovery. The only option 
> that worked for me is having one node that is referred to by the other 
> nodes as a unicast host. I think it will work as long as I have one node 
> that is always on.
>
>
> build for failure.
>  
>
>
> You were right about having a keys in config. I didn't need it. Can I also 
> remove this from my java application? I guess it could be removed if launch 
> configuration contains IAM instance profile.
>
>
> I don't know why your app needs AWS credentials, so I cannot really answer 
> that - but, in general, if the AWS library you use supports IAM profiles 
> then you should be able to remove hardcoded creds. YMMV.
>  
>
> I also decreased zen discovery timeout to 3s.
>
>  - your master config shows master false... You want the master with 
> master=true and data=false... Obviously you want more than one master (if 
> you don't have too much load, start with all nodes available as data and 
> master, then separate functionality as needed). Don't forget to set 
> discovery.zen.minimum_master_nodes to (master-eligible nodes / 2) + 1 to 
> prevent split-brain scenarios.
> I've set all 3 nodes as master and data, but I'm not sure I understand 
> the advantage of having nodes that are not master nodes. I know those 
> nodes will not be elected master, but what is the idea behind that, and 
> what would I get by setting the master not to hold data? Would it increase 
> performance?
>
>
> TL;DR - scalability, performance: there are certain operations which need 
> to be performed by the master node in a timely manner. If your node is 
> already too busy handling searches, 'master operations' will suffer (and 
> your whole cluster will slow down). 
>
> It is much cheaper to run separate, smaller master (and load balancer ) 
> nodes , separate from your data nodes, than to scale up + out your data 
> nodes to handle all the operations. 
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-node.html
>  
>
>
>
>  It should work pretty well with ec2 auto discovery - unicast is a good 
> starting point but unless you are statically assigning them via cloud 
> formation (or manually?), it may not be worth the trouble (and it stops you 
> from dynamically scaling your cluster)
> How will an ES node behave under Amazon auto-scaling, and could it be used 
> the way I'm using auto-scaling to meet high load? If I have already set 5 
> shards and 2 replicas on the previous 3 nodes, will these shards and 
> replicas be moved to the new nodes, and how long might that take? If that 
> is what happens, I guess it's not a good idea to auto-scale a new ES node 
> during high-intensity ES use and then turn it off later.
>
>
> yeah, that's definitely not something that will always work with 
> autoscaling.
> - You can use autoscaling to ensure the minimum # of nodes is maintained 
> (i.e., automatic rebuild of a killed node).
> - if you know you have, say, 8 hours with 50% more traffic, you can 
> increase the number of nodes and the # of replicas some time before the 
> peak; after the peak, reduce the replica # and remove nodes... Not 
> autoscaling per se, but building from the get-go without hardcoded 
> hostnames will help you do things like this. 
>
> btw, you also want to play with routing awareness, so your replicas are 
> distributed across different AZ.
>
> AND beware of cost of inter-AZ traffic :) ( yes, it conflicts with the 'AZ 
> routing awareness') 
>  
>
> Sorry if these questions are too naive.
>
>
> :) not at all.
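
A sketch of the dedicated master node Norberto describes above, expressed with the Java node API (the standard node.master/node.data flags; the minimum_master_nodes value of 2 assumes three master-eligible nodes):

import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.node.Node;
import org.elasticsearch.node.NodeBuilder;

// Master-eligible, holds no data; with 3 master-eligible nodes,
// minimum_master_nodes = 3/2 + 1 = 2 guards against split brain.
Node masterNode = NodeBuilder.nodeBuilder()
        .settings(ImmutableSettings.settingsBuilder()
                .put("node.master", true)
                .put("node.data", false)
                .put("discovery.zen.minimum_master_nodes", 2))
        .node();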

Doing an AND Query within a span_near, e.g., foo WITHIN 1 (biz AND buz)

2014-10-15 Thread Michael Sander


Hi All,

I posted this stackoverflow question (copied below) and would be grateful 
for any input.

http://stackoverflow.com/questions/26395923/elasticsearch-span-near-with-a-large-slop-within-a-span-near-with-a-small-slop


*Original SO Question:*

How would you build a JSON Elasticsearch query that looks like this (in 
English):

foo WITHIN 1 (biz AND buz)

I expect the query to return documents where both biz and buz exist, and 
also, where the word foo is adjacent to one of those words.
--

*My Original Solution*

One way would be to use span_near. Then for its first clause, use the 
foo term, and for its second clause, use a boolean AND. However, in 
Elasticsearch, you cannot put booleans within spans; you can only put other 
spans within spans. Accordingly, you must simulate the boolean AND with 
another span_near with a large slop.

The solution I tried is:

{'span_near': {'clauses': [{'span_term': {'text': 'foo'}},
                           {'span_near': {'clauses': [{'span_term': {'text': 'biz'}},
                                                      {'span_term': {'text': 'buz'}}],
                                          'in_order': False,
                                          'slop': 100}}],
               'in_order': False,
               'slop': 0}}

Notice that we simulate AND with a slop of 100 (effectively infinite 
for my domain). Unfortunately, the above query *does not work*. Instead, 
the above query returns all documents with the words foo, biz, and buz, and 
where the word foo occurs between the occurrences of biz and buz.
--

*Another Solution, but Onerous*

I think one solution would be to convert the original query into

(biz AND buz) AND ((foo WITHIN 1 biz) OR (foo WITHIN 1 buz))

This seems very difficult to implement as one would need to parse for AND 
keywords 
within a span_near operation and do the necessary conversions. Any other 
ideas?

*Note*: I am using Elasticsearch, but this question would apply equally to 
Lucene using their Java primitives.
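
A sketch of that onerous rewrite in the Elasticsearch Java API (field name and adjacency slop taken from the question; an illustration, not a tested solution):

import static org.elasticsearch.index.query.QueryBuilders.*;

import org.elasticsearch.index.query.QueryBuilder;

// (biz AND buz) AND ((foo WITHIN 1 biz) OR (foo WITHIN 1 buz))
QueryBuilder query = boolQuery()
        .must(termQuery("text", "biz"))
        .must(termQuery("text", "buz"))
        .must(boolQuery()
                .should(spanNearQuery()
                        .clause(spanTermQuery("text", "foo"))
                        .clause(spanTermQuery("text", "biz"))
                        .slop(0).inOrder(false))   // adjacency, as in the original
                .should(spanNearQuery()
                        .clause(spanTermQuery("text", "foo"))
                        .clause(spanTermQuery("text", "buz"))
                        .slop(0).inOrder(false)));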



Re: could not retrieve data randomly

2014-10-15 Thread xzer LR
Now I finally confirmed that it is the flappy-item issue which had been 
mentioned before: https://groups.google.com/forum/#!topic/elasticsearch/a-LELkkLRoE

but it appears in version 1.2.3.

Is there any advice about this issue? I am also interested in whether 
there is an internal mechanism in Elasticsearch to avoid this situation.

I am also frustrated by the lack of answers here. 

On Tuesday, October 14, 2014 12:43:17 PM UTC+9, xzer LR wrote:
>
> additional information:
>
> The replica factor of my index is 1, which means there are only 2 copies 
> in the cluster.
>
> Since I can get the data every second request, I guess that there is one 
> missing copy, but how can I confirm it?
>
> The cluster health API reports green.
>
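
One way to test the missing-copy theory from the Java API (a sketch; index, type and id are placeholders): compare a get pinned to the primary with the default behaviour, which round-robins between primary and replica — the round-robin is what would make a document appear only every other request.

import org.elasticsearch.action.get.GetResponse;

// Pinned to the primary shard copy:
GetResponse fromPrimary = client.prepareGet("myindex", "mytype", "some-id")
        .setPreference("_primary")
        .execute().actionGet();
// Default: alternates between the two copies on repeated calls.
GetResponse anyCopy = client.prepareGet("myindex", "mytype", "some-id")
        .execute().actionGet();
System.out.println("primary: " + fromPrimary.isExists()
        + ", any copy: " + anyCopy.isExists());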



Performance about api between http api and java api?

2014-10-15 Thread Charles Chou
I want to partially update docs, but I also want good performance, so I tried 
two ways to test: one via the HTTP REST API, the other via the Java API (version 0.90); 
java_version:1.6.20 
es_version:0.90.1 

HTTP REST: 
http://127.0.0.1:9200/_bulk

{"update":{"_index":"mobile","_type":"mobile_property_mid","_id":"15141307986"}}
 
/n 
{"doc":{"consume_vol_tot":"0.0","last_login_date":"2012-06-27","login_days_3month":"0","log_avg_interval_half_year":"0.0","consume_days_3month":"0"}}
 
/n 
{"update":{"_index":"mobile","_type":"mobile_property_mid","_id":"15141308091"}}
 
/n 
{"doc":{"consume_vol_tot":"0.0","last_login_date":"2012-12-26","login_days_3month":"0","log_avg_interval_half_year":"0.0","consume_days_3month":"0"}}
 
/n 

JAVA API: 

bulkRequest.add(client.prepareUpdate(indexName, indexTypeName, key)
    .setUpsertRequest(builder.endObject())
    .setScript(updateScript.toString())
    .setFields("_source"));
BulkResponse bulkResponse = bulkRequest.execute().actionGet(); 

Here is the test result: 
http rest: 
bulk nums:8000,cost time:1306 
bulk nums:8000,cost time:1348 
bulk nums:8000,cost time:1320 
bulk nums:8000,cost time:1277 
bulk nums:8000,cost time:1214 
bulk nums:8000,cost time:1336 
bulk nums:8000,cost time:1338 
bulk nums:8000,cost time:1399 
bulk nums:8000,cost time:1231 
bulk nums:8000,cost time:1280 
bulk nums:8000,cost time:1482 
bulk nums:8000,cost time:1248 
bulk nums:8000,cost time:1394 

java api: 
bulkResponse items8000,cost time:5252 
bulkResponse items8000,cost time:5171 
bulkResponse items8000,cost time:5077 
bulkResponse items8000,cost time:5230 
bulkResponse items8000,cost time:5469 
bulkResponse items8000,cost time:5898 
bulkResponse items8000,cost time:5443 
bulkResponse items8000,cost time:5579 
bulkResponse items8000,cost time:5026 
bulkResponse items8000,cost time:5279 
bulkResponse items8000,cost time:5851 
bulkResponse items8000,cost time:5708 
bulkResponse items8000,cost time:5115 

So why is the Java API so much slower than HTTP? 
And how can I solve it? I read a lot of docs, but couldn't find the answer.



Performance about api between http api and java api?

2014-10-15 Thread zouxcs
I want to partially update docs, but I also want good performance, so I tried
two ways to test: one via the HTTP REST API, the other via the Java API (version 0.90);
java_version:1.6.20
es_version:0.90.1

HTTP REST:
http://127.0.0.1:9200/_bulk

{"update":{"_index":"mobile","_type":"mobile_property_mid","_id":"15141307986"}}
/n
{"doc":{"consume_vol_tot":"0.0","last_login_date":"2012-06-27","login_days_3month":"0","log_avg_interval_half_year":"0.0","consume_days_3month":"0"}}
/n
{"update":{"_index":"mobile","_type":"mobile_property_mid","_id":"15141308091"}}
/n
{"doc":{"consume_vol_tot":"0.0","last_login_date":"2012-12-26","login_days_3month":"0","log_avg_interval_half_year":"0.0","consume_days_3month":"0"}}
/n 

JAVA API:

bulkRequest.add(client.prepareUpdate(indexName, indexTypeName, key)
    .setUpsertRequest(builder.endObject())
    .setScript(updateScript.toString())
    .setFields("_source"));
BulkResponse bulkResponse = bulkRequest.execute().actionGet();

Here is the test result:
http rest:
bulk nums:8000,cost time:1306
bulk nums:8000,cost time:1348
bulk nums:8000,cost time:1320
bulk nums:8000,cost time:1277
bulk nums:8000,cost time:1214
bulk nums:8000,cost time:1336
bulk nums:8000,cost time:1338
bulk nums:8000,cost time:1399
bulk nums:8000,cost time:1231
bulk nums:8000,cost time:1280
bulk nums:8000,cost time:1482
bulk nums:8000,cost time:1248
bulk nums:8000,cost time:1394

java api:
bulkResponse items8000,cost time:5252
bulkResponse items8000,cost time:5171
bulkResponse items8000,cost time:5077
bulkResponse items8000,cost time:5230
bulkResponse items8000,cost time:5469
bulkResponse items8000,cost time:5898
bulkResponse items8000,cost time:5443
bulkResponse items8000,cost time:5579
bulkResponse items8000,cost time:5026
bulkResponse items8000,cost time:5279
bulkResponse items8000,cost time:5851
bulkResponse items8000,cost time:5708
bulkResponse items8000,cost time:5115

So why is the Java API so much slower than HTTP?
And how can I solve it? I read a lot of docs, but couldn't find the answer.





Re: Question about completion in ES

2014-10-15 Thread Adrien Grand
Hi Franck,

There are two issues here. First, the completion suggester is not based on
n-grams but on an FST, so you should not use an n-gram tokenizer. Another
thing to know is that the completion suggester is a prefix suggester, not
an infix suggester, meaning that it can only suggest from the beginning of
the input, yet you don't have any input that starts with "Munich". The only
way to make it recommend hotels based in Munich would be to add more inputs
to your documents that start with "Munich".
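
A sketch of what that looks like at index time (via the Java API here; the extra "Munich Mercure Hotel" input is illustrative):

import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;

// Give the document an input that starts with the city, so that a prefix
// such as "mun" can match; assumes a connected Client.
client.prepareIndex("hotels", "hotel", "1")
        .setSource(jsonBuilder().startObject()
                .field("name", "Mercure Hotel Munich")
                .field("city", "Munich")
                .startObject("name_suggest")
                        .field("input", "Mercure Hotel Munich",
                                "Mercure Munich", "Munich Mercure Hotel")
                .endObject()
                .endObject())
        .execute().actionGet();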

On Wed, Oct 15, 2014 at 9:19 PM, Franck B  wrote:

> You're right, it works after changing the analyzer.
>
> Now, to take the issue further, I customized the analyzer:
>
> PUT hotels
>>
>> {
>>
>>"settings": {
>>
>>   "analysis": {
>>
>>  "analyzer": {
>>
>> "labelstreet_analyzer": {
>>
>>"type": "custom",
>>
>>"tokenizer": "nGram_tokenizer",
>>
>>"filter": [
>>
>>   "lowercase",
>>
>>   "elision",
>>
>>   "stopwords"
>>
>>]
>>
>> }},
>>
>> "tokenizer": {
>>
>>"nGram_tokenizer": {
>>
>>   "type": "nGram",
>>
>>   "min_gram": "2",
>>
>>   "max_gram": "4",
>>
>>   "token_chars": [
>>
>>  "letter",
>>
>>  "digit"
>>
>>   ]
>>
>>}
>>
>> },
>>
>> "filter": {
>>
>>"elision": {
>>
>>   "type": "elision",
>>
>>   "article": [
>>
>>  "l",
>>
>>  "m",
>>
>>  "t",
>>
>>  "qu",
>>
>>  "n",
>>
>>  "s",
>>
>>  "j",
>>
>>  "d"
>>
>>   ]
>>
>>},
>>
>>"stopwords": {
>>
>>   "type": "stop",
>>
>>   "stopwords": [
>>
>>  "_french_"
>>
>>   ],
>>
>>   "ignore_case": true
>>
>>}
>>
>> }
>>
>>  }
>>
>>},
>>
>>"mappings": {
>>
>>   "hotel": {
>>
>>  "properties": {
>>
>> "name": {
>>
>>"type": "string",
>>
>>"analyzer": "labelstreet_analyzer"
>>
>> },
>>
>> "city": {
>>
>>"type": "string",
>>
>>"analyzer": "labelstreet_analyzer"
>>
>> },
>>
>> "name_suggest": {
>>
>>"type": "completion",
>>
>>"analyzer": "labelstreet_analyzer"
>>
>> }
>>
>>  }
>>
>>   }
>>
>>}
>>
>> }
>>
>>
>
>
> PUT /hotels/hotel/1
>
> {
>
>"name": "Mercure Hotel Munich",
>
>"city": "Munich",
>
>"name_suggest": {
>
>   "input": [
>
>  "Mercure Hotel Munich",
>
>  "Mercure Munich"
>
>   ]
>
>}
>
> }
>
>
>> PUT hotels/hotel/2
>
> {
>
>"name": "Hotel Monaco",
>
>"city": "Munich",
>
>"name_suggest": {
>
>   "input": [
>
>  "Monaco Munich",
>
>  "Hotel Monaco"
>
>   ]
>
>}
>
> }
>
>
>> PUT /hotels/hotel/3
>
> {
>
>"name": "Courtyard by Marriot Munich City",
>
>"city": "Munich",
>
>"name_suggest": {
>
>   "input": [
>
>  "Courtyard by Marriot Munich City",
>
>  "Marriot Munich City"
>
>   ]
>
>}
>
> }
>
>
> If I check the analyzer, I get the result I want. That is, if I type
> "mun", the tokenizer gives me the right token:
>
> GET /hotels/_analyze?analyzer=labelstreet_analyzer&text=Munich
>>
>
> Yet when I execute this query, the result is null
>
> POST /hotels/_suggest
>>
>> {
>>
>>   "hotels" : {
>>
>> "text" : "mun",
>>
>> "completion" : {
>>
>>   "field" : "name_suggest"
>>
>> }
>>
>>   }
>>
>> }
>>
>>
> Why is the result for "mun" not displayed?
>
> Thanks for your help
>
>
> Franck
> On Tuesday, 14 October 2014 14:21:18 UTC+2, Adrien Grand wrote:
>>
>> Completion fields use the `simple` analyzer by default, which removes
>> numbers. If you try another analyzer that keeps numbers such as the
>> standard analyzer, this should work:
>>
>> PUT /hotels
>> {
>>   "mappings": {
>> "hotel" : {
>>   "properties" : {
>> "name" : { "type" : "string" },
>> "city" : { "type" : "string" },
>> "name_suggest" : {
>>   "type" : "completion",
>>   "analyzer": "standard"
>> }
>>   }
>> }
>>   }
>> }
>>
>> On Tue, Oct 14, 2014 at 2:03 PM, Franck B  wrote:
>>
>>> Mapping is very basic. There are no analyzers.
>>>
>> On Tuesday, 14 October 2014 13:24:42 UTC+2, Adrien Grand wrote:

 You are probably using an analyzer that removes numbers?

 On Mon, Oct 13, 2014 at 6:45 PM, Franck B  wrote:

> Hi all,
>
> I try to put an auto completion system based on ES.
>
> This 

Re: Problems with auto-_timestamp

2014-10-15 Thread Joshua Holbrook
This works for me. I thought I was actually going crazy. XD Thanks a bunch 
Jordan!

--Josh

On Wednesday, October 15, 2014 8:07:31 PM UTC-4, Jordan Sissel wrote:
>
>
>
> On Tuesday, October 14, 2014 5:46:10 PM UTC-7, Joshua Holbrook wrote:
>>
>> Hello,
>>
>> I'm working on an irc bot that indexes in-channel image links, and so far 
>> so good---except I can't seem to get automatic timestamps working! I did my 
>> best to follow the docs at 
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-timestamp-field.html
>>  
>> but I don't see any change in what my search docs look like, even after 
>> deleting the entire index and reconfiguring first.
>>
>> Right now my mapping looks like this:
>>
>> $ curl http://localhost:9200/archivist/_mapping?pretty
>> {
>>   "archivist" : {
>> "mappings" : {
>>   "image" : {
>> "_timestamp" : {
>>   "enabled" : true,
>>   "store" : true
>> },
>> "properties" : {
>>   "channel" : {
>> "type" : "string"
>>   },
>>   "image" : {
>> "type" : "string"
>>   },
>>   "message" : {
>> "type" : "string"
>>   },
>>   "user" : {
>> "type" : "string"
>>   }
>> }
>>   }
>> }
>>   }
>> }
>>
>> and after indexing something (code doing this at 
>> https://github.com/jesusabdullah/archivist/blob/master/src/main/java/com/jesusabdullah/archivist/Indexer.java#L71-L82
>>  
>> first java project ever) my search results look like:
>>
>> $ curl http://localhost:9200/archivist/image/_search?pretty
>> {
>>   "took" : 0,
>>   "timed_out" : false,
>>   "_shards" : {
>> "total" : 5,
>> "successful" : 5,
>> "failed" : 0
>>   },
>>   "hits" : {
>> "total" : 1,
>> "max_score" : 1.0,
>> "hits" : [ {
>>   "_index" : "archivist",
>>   "_type" : "image",
>>   "_id" : "jK8dY6oKTRifbUU4o406pw",
>>   "_score" : 1.0,
>>   
>> "_source":{"channel":"#nodebombrange","user":"jesusabdullah","message":"snoop!
>>  
>> http://i.imgur.com/iktO9TK.gif","image":"http://i.imgur.com/iktO9TK.gif"}
>> } ]
>>   }
>> }
>>
>> Am I doing something wrong?
>>
>
> This was also confusing to me at first. I asked my coworkers who gave me a 
> good answer and also was pointed at a useful answer on the mailing list - 
> so between you and me and some other folks asking questions, you aren't 
> alone in your confusion! :)
>
> The short answer is that you must request _timestamp being returned to you:
>
> # Here, I ask for _source and _timestamp to be shown to me:
> % curl 'localhost:9200/archivist/_search?pretty&fields=_timestamp,_source'
> {
>   "took" : 1,
>   "timed_out" : false,
>   "_shards" : {
> "total" : 5,
> "successful" : 5,
> "failed" : 0
>   },
>   "hits" : {
> "total" : 1,
> "max_score" : 1.0,
> "hits" : [ {
>   "_index" : "archivist",
>   "_type" : "image",
>   "_id" : "jUWknMq1RmusrlU6hP2BGw",
>   "_score" : 1.0,
>   "_source": { "fancy": "pants whoa"},
>   "fields" : {
> "_timestamp" : 1413417808235
>   }
> } ]
>   }
> }
>
> This post has the more detailed answer:
>
> https://groups.google.com/forum/#!msg/elasticsearch/pebxC9ezowg/XCXH-POYvuQJ 
>
> -Jordan
>



Re: Problems with auto-_timestamp

2014-10-15 Thread Jordan Sissel


On Tuesday, October 14, 2014 5:46:10 PM UTC-7, Joshua Holbrook wrote:
>
> Hello,
>
> I'm working on an irc bot that indexes in-channel image links, and so far 
> so good---except I can't seem to get automatic timestamps working! I did my 
> best to follow the docs at 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-timestamp-field.html
>  
> but I don't see any change in what my search docs look like, even after 
> deleting the entire index and reconfiguring first.
>
> Right now my mapping looks like this:
>
> $ curl http://localhost:9200/archivist/_mapping?pretty
> {
>   "archivist" : {
> "mappings" : {
>   "image" : {
> "_timestamp" : {
>   "enabled" : true,
>   "store" : true
> },
> "properties" : {
>   "channel" : {
> "type" : "string"
>   },
>   "image" : {
> "type" : "string"
>   },
>   "message" : {
> "type" : "string"
>   },
>   "user" : {
> "type" : "string"
>   }
> }
>   }
> }
>   }
> }
>
> and after indexing something (code doing this at 
> https://github.com/jesusabdullah/archivist/blob/master/src/main/java/com/jesusabdullah/archivist/Indexer.java#L71-L82
>  
> first java project ever) my search results look like:
>
> $ curl http://localhost:9200/archivist/image/_search?pretty
> {
>   "took" : 0,
>   "timed_out" : false,
>   "_shards" : {
> "total" : 5,
> "successful" : 5,
> "failed" : 0
>   },
>   "hits" : {
> "total" : 1,
> "max_score" : 1.0,
> "hits" : [ {
>   "_index" : "archivist",
>   "_type" : "image",
>   "_id" : "jK8dY6oKTRifbUU4o406pw",
>   "_score" : 1.0,
>   
> "_source":{"channel":"#nodebombrange","user":"jesusabdullah","message":"snoop!
>  
> http://i.imgur.com/iktO9TK.gif","image":"http://i.imgur.com/iktO9TK.gif"}
> } ]
>   }
> }
>
> Am I doing something wrong?
>

This was also confusing to me at first. I asked my coworkers who gave me a 
good answer and also was pointed at a useful answer on the mailing list - 
so between you and me and some other folks asking questions, you aren't 
alone in your confusion! :)

The short answer is that you must request _timestamp being returned to you:

# Here, I ask for _source and _timestamp to be shown to me:
% curl 'localhost:9200/archivist/_search?pretty&fields=_timestamp,_source'
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
  },
  "hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
  "_index" : "archivist",
  "_type" : "image",
  "_id" : "jUWknMq1RmusrlU6hP2BGw",
  "_score" : 1.0,
  "_source": { "fancy": "pants whoa"},
  "fields" : {
"_timestamp" : 1413417808235
  }
} ]
  }
}
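
The Java-API analogue of that fields= parameter is presumably along these lines (a sketch; the index name comes from the question above, and a connected Client is assumed):

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.search.SearchHit;

// Request _timestamp (and _source) explicitly, as with fields= over HTTP.
SearchResponse resp = client.prepareSearch("archivist")
        .addFields("_timestamp", "_source")
        .execute().actionGet();
for (SearchHit hit : resp.getHits()) {
    System.out.println(hit.getId() + " -> " + hit.field("_timestamp").getValue());
}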

This post has the more detailed answer:
https://groups.google.com/forum/#!msg/elasticsearch/pebxC9ezowg/XCXH-POYvuQJ 


-Jordan



suggest next word without suggesting entire field value

2014-10-15 Thread Richard Tier
Hi,

I am a bit stuck with suggestions. I would like to suggest the next word, 
and I would like the suggestion to occur even if the word the user provides 
is in the middle of a field's value (and not just at the start). e.g., 

document containing 
{ body: "nice quick brown fox jumped over" }

quick --> quick brown
brown --> brown fox 
quick bro --> quick brown

However, when I use "completion" suggester I am only offered completion 
options if the search term is at the start of the field value - so "nice" 
in this case.

Moreover I am offered the entire field value, but I would like only the 
next word.

I followed the completion suggester tutorial.

I also experimented with the other suggesters, but they were not suitable:

 - term: suggests a single word; does not suggest the next word.
 - phrase: suggests multiple words, but does not suggest the next word.

This seems like an obvious use case for suggesters, so I think I am missing 
something fundamental here?



Re: Optimal configuration for 2 high powered nodes

2014-10-15 Thread Mark Walkom
I'd go with option 3 and use Puppet to manage it.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com

On 15 October 2014 08:10, Rémi Nonnon  wrote:

> Hi,
>
> I think the 2nd proposition could be the worst. With more than 32GB of
> heap, the JVM will use uncompressed pointers.
> Take a look at this:
> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_limiting_memory_usage.html
>
> Hope this helps
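
One way to confirm whether a given heap size still gets compressed oops — a sketch for a HotSpot JVM, run with the same -Xmx as the ES node:

import java.lang.management.ManagementFactory;

import com.sun.management.HotSpotDiagnosticMXBean;

// Prints the UseCompressedOops VM option; a value of false means the heap
// is big enough that the JVM fell back to uncompressed 64-bit pointers.
HotSpotDiagnosticMXBean hotspot =
        ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
System.out.println(hotspot.getVMOption("UseCompressedOops"));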
>



Re: Help with architecture

2014-10-15 Thread Mark Walkom
There are heaps of good docs:
http://logstash.net/docs/1.4.2/
http://www.logstashbook.com/

If you want separate data then keep separate stacks! You can use the one ES
cluster if you want and then just use different indexes.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com

On 16 October 2014 06:57, Joshua Toepfer  wrote:

> We have moved to a distributed Linux/Apache Tomcat environment, and the
> logs [apache, tomcat, applications, sys, etc] are killing me.  We keep
> talking about centralized logging, but it doesn't seem like an easy task.  I've
> been reading the docs on ELK, and I like what I see.  What I'm still not
> seeing is the overall architecture in a distributed system.  So do I have a
> Logstash process on each of my server nodes?  Does each of those nodes then
> parse and report back to a centralized Elasticsearch engine?  Is there any
> documentation that anyone could point me to, to get a better understanding?
>
> So that is question 1.  The second question is that we virtualized a copy
> of our production environment in our test environment.  How can I keep the
> events separate between our production and test environments?
>
> Any help would be greatly appreciated.
>
> Thanks,
> Josh
>



Help with architecture

2014-10-15 Thread Joshua Toepfer
We have moved to a distributed Linux/Apache Tomcat environment, and the 
logs [apache, tomcat, applications, sys, etc] are killing me.  We keep 
talking about centralized logging, but it doesn't seem like an easy task.  I've 
been reading the docs on ELK, and I like what I see.  What I'm still not 
seeing is the overall architecture in a distributed system.  So do I have a 
Logstash process on each of my server nodes?  Does each of those nodes then 
parse and report back to a centralized Elasticsearch engine?  Is there any 
documentation that anyone could point me to, to get a better understanding?

So that is question 1.  The second question is that we virtualized a copy of 
our production environment in our test environment.  How can I keep the events 
separate between our production and test environments?

Any help would be greatly appreciated.

Thanks,
Josh



Re: Problems with auto-_timestamp

2014-10-15 Thread Joshua Holbrook
Update:

I did some digging and discovered the setTimestamp method 
https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/action/index/IndexRequestBuilder.java#L297-L303
 
so I tried adding that to my code 
https://github.com/jesusabdullah/zoltun/blob/master/src/main/java/com/jesusabdullah/zoltun/Indexer.java#L83-L85
 
but this doesn't appear to have done the trick either.

I suspect my problems are coming from using the Java API, which uses the 
TCP node-to-node communication protocol instead of the typical HTTP/JSON 
one.

--Josh

On Tuesday, October 14, 2014 8:46:10 PM UTC-4, Joshua Holbrook wrote:
>
> Hello,
>
> I'm working on an irc bot that indexes in-channel image links, and so far 
> so good---except I can't seem to get automatic timestamps working! I did my 
> best to follow the docs at 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-timestamp-field.html
>  
> but I don't see any change in what my search docs look like, even after 
> deleting the entire index and reconfiguring first.
>
> Right now my mapping looks like this:
>
> $ curl http://localhost:9200/archivist/_mapping?pretty
> {
>   "archivist" : {
> "mappings" : {
>   "image" : {
> "_timestamp" : {
>   "enabled" : true,
>   "store" : true
> },
> "properties" : {
>   "channel" : {
> "type" : "string"
>   },
>   "image" : {
> "type" : "string"
>   },
>   "message" : {
> "type" : "string"
>   },
>   "user" : {
> "type" : "string"
>   }
> }
>   }
> }
>   }
> }
>
> and after indexing something (code doing this at 
> https://github.com/jesusabdullah/archivist/blob/master/src/main/java/com/jesusabdullah/archivist/Indexer.java#L71-L82
>  
> first java project ever) my search results look like:
>
> $ curl http://localhost:9200/archivist/image/_search?pretty
> {
>   "took" : 0,
>   "timed_out" : false,
>   "_shards" : {
> "total" : 5,
> "successful" : 5,
> "failed" : 0
>   },
>   "hits" : {
> "total" : 1,
> "max_score" : 1.0,
> "hits" : [ {
>   "_index" : "archivist",
>   "_type" : "image",
>   "_id" : "jK8dY6oKTRifbUU4o406pw",
>   "_score" : 1.0,
>   
> "_source":{"channel":"#nodebombrange","user":"jesusabdullah","message":"snoop!
>  
> http://i.imgur.com/iktO9TK.gif","image":"http://i.imgur.com/iktO9TK.gif"}
> } ]
>   }
> }
>
> Am I doing something wrong?
>
> Thanks,
>
> --Josh
>



Re: Question about completion in ES

2014-10-15 Thread Franck B
You're right, it works after changing the analyzer.

Now, to take the issue further, I customized the analyzer:

PUT hotels
>
> {
>
>"settings": {
>
>   "analysis": {
>
>  "analyzer": {
>
> "labelstreet_analyzer": {
>
>"type": "custom",
>
>"tokenizer": "nGram_tokenizer",
>
>"filter": [
>
>   "lowercase",
>
>   "elision",
>
>   "stopwords"
>
>]
>
> }},
>
> "tokenizer": {
>
>"nGram_tokenizer": {
>
>   "type": "nGram",
>
>   "min_gram": "2",
>
>   "max_gram": "4",
>
>   "token_chars": [
>
>  "letter",
>
>  "digit"
>
>   ]
>
>}
>
> },
>
> "filter": {
>
>"elision": {
>
>   "type": "elision",
>
>   "article": [
>
>  "l",
>
>  "m",
>
>  "t",
>
>  "qu",
>
>  "n",
>
>  "s",
>
>  "j",
>
>  "d"
>
>   ]
>
>},
>
>"stopwords": {
>
>   "type": "stop",
>
>   "stopwords": [
>
>  "_french_"
>
>   ],
>
>   "ignore_case": true
>
>}
>
> }
>
>  }
>
>},
>
>"mappings": {
>
>   "hotel": {
>
>  "properties": {
>
> "name": {
>
>"type": "string",
>
>"analyzer": "labelstreet_analyzer"
>
> },
>
> "city": {
>
>"type": "string",
>
>"analyzer": "labelstreet_analyzer"
>
> },
>
> "name_suggest": {
>
>"type": "completion",
>
>"analyzer": "labelstreet_analyzer"
>
> }
>
>  }
>
>   }
>
>}
>
> }
>
>
 

PUT /hotels/hotel/1
{
   "name": "Mercure Hotel Munich",
   "city": "Munich",
   "name_suggest": {
      "input": [
         "Mercure Hotel Munich",
         "Mercure Munich"
      ]
   }
}

PUT hotels/hotel/2
{
   "name": "Hotel Monaco",
   "city": "Munich",
   "name_suggest": {
      "input": [
         "Monaco Munich",
         "Hotel Monaco"
      ]
   }
}

PUT /hotels/hotel/3
{
   "name": "Courtyard by Marriot Munich City",
   "city": "Munich",
   "name_suggest": {
      "input": [
         "Courtyard by Marriot Munich City",
         "Marriot Munich City"
      ]
   }
}


If I check the analyzer, I get the result I want. That is, if I type "mun", 
the tokenizer gives me the right token:

GET /hotels/_analyze?analyzer=labelstreet_analyzer&text=Munich
>

Yet when I execute this query, the result is null 

POST /hotels/_suggest
>
> {
>
>   "hotels" : {
>
> "text" : "mun",
>
> "completion" : {
>
>   "field" : "name_suggest"
>
> }
>
>   }
>
> }
>
>
Why is the result for "mun" not displayed?

Thanks for your help


Franck
On Tuesday, 14 October 2014 14:21:18 UTC+2, Adrien Grand wrote:
>
> Completion fields use the `simple` analyzer by default, which removes 
> numbers. If you try another analyzer that keeps numbers such as the 
> standard analyzer, this should work:
>
> PUT /hotels
> {
>   "mappings": {
> "hotel" : {
>   "properties" : {
> "name" : { "type" : "string" },
> "city" : { "type" : "string" },
> "name_suggest" : {
>   "type" : "completion",
>   "analyzer": "standard"
> }
>   }
> }
>   }
> }
>
> On Tue, Oct 14, 2014 at 2:03 PM, Franck B wrote:
>
>> Mapping is very basic. There are no analyzers.
>>
>> On Tuesday, 14 October 2014 13:24:42 UTC+2, Adrien Grand wrote:
>>>
>>> You are probably using an analyzer that removes numbers?
>>>
>>> On Mon, Oct 13, 2014 at 6:45 PM, Franck B  wrote:
>>>
 Hi all,

 I try to put an auto completion system based on ES.

 This article http://www.elasticsearch.org/blog/you-complete-me/ helped 
 me a lot, but I don't understand one case:



 PUT /hotels
>
> {
>
>   "mappings": {
>
> "hotel" : {
>
>   "properties" : {
>
> "name_suggest" : {
>
>   "type" : "completion"
>
> }
>
>   }
>
> }
>
>   }
>
> }
>
>

 PUT /hotels/hotel/1
>
> {
>
>   "name_suggest" : {
>
> "input" :  [
>
>   "21 Mercure Hotel Munich",
>
>   "24 Mercure Munich"
>
> ]
>
>
>>   }
>
> }
>
>
>> PUT hotels/hotel/2
>
> {
>
>   "name_suggest" : {
>
> "input" :  [
>
>   "Monaco Munich",
>
>   "Hotel Monaco"
>

Re: Hot backup strategy for Elasticsearch

2014-10-15 Thread skm
Thank you!



On Tuesday, October 14, 2014 3:12:45 PM UTC-7, skm wrote:
>
> Hello List,
>
> Going through the current documentation I found that snapshot/restore 
> mechanism is one type of backup strategy that we can use for ES clusters. 
> Any other recommendations?
>
> Using the following
>
> 1.elasticsearch-
> "version" : {
> "number" : "1.3.4",
>
> 2. AWS-cloud-plugin
> 3. curator
>
>
> curator snapshot --repository mys3_repository --all-indices  (weekend)
> curator snapshot --repository mys3_repository --most-recent 1 (every week 
> day)
>
> The above would be run as cron jobs from one of the nodes in the cluster.
>
> Let me know of recommendations for hot backup for elastic search cluster.
>
> Thanks,
> skm
>  
>



Re: Elasticsearch treatment of duplicate Netflow Records

2014-10-15 Thread Alan Robertson
Bump

On Friday, October 10, 2014 10:40:27 AM UTC-4, Alan Robertson wrote:
>
>
>



Re: Exporting sub set of data from main index to a new index

2014-10-15 Thread Alexandre Rafalovitch
Have you looked at Scroll and Scan?
http://www.elasticsearch.org/guide/en/elasticsearch/guide/master/scan-scroll.html

This assumes your _source field has not been disabled.
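
A sketch of the scan/scroll-plus-bulk loop (index names and the term filter are placeholders; a connected Client is assumed):

import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.FilterBuilders;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;

// Open a scan over the filtered subset of the master index.
SearchResponse scroll = client.prepareSearch("master_index")
        .setSearchType(SearchType.SCAN)
        .setScroll(TimeValue.timeValueMinutes(2))
        .setQuery(QueryBuilders.filteredQuery(
                QueryBuilders.matchAllQuery(),
                FilterBuilders.termFilter("status", "active")))
        .setSize(500)   // hits per shard per round-trip
        .execute().actionGet();

while (true) {
    scroll = client.prepareSearchScroll(scroll.getScrollId())
            .setScroll(TimeValue.timeValueMinutes(2))
            .execute().actionGet();
    if (scroll.getHits().getHits().length == 0) {
        break;          // the scan is exhausted
    }
    // Re-index each batch into the new index with the same types and ids.
    BulkRequestBuilder bulk = client.prepareBulk();
    for (SearchHit hit : scroll.getHits()) {
        bulk.add(client.prepareIndex("new_index", hit.getType(), hit.getId())
                .setSource(hit.getSourceAsString()));
    }
    bulk.execute().actionGet();
}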

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 15 October 2014 13:14, Vijay Tiwary  wrote:
> Is there an easy way to export some part of the data (based on some filters)
> from an index, e.g. from a master index to a new index? Apparently it
> looks like I will have to query the data from the master
> index (using some filters) and then bulk-insert those documents into
> the new index. Is there any better and easier way?
>



Elasticsearch process on a node takes 99% of RAM as seen by top, and eventually gets killed by kernel.

2014-10-15 Thread Narendra Yadala
This is the cluster setup I have: 5 x m3.2xlarge instances (a 240G general 
purpose SSD-backed EBS volume for each instance). I have allocated 22 G of 
30 G for Elasticsearch (with the mlockall option set). Initially I had 5 x 
m3.xlarge instances, but they were crashing because of OOM, so I ended up 
doubling the RAM. The cluster has 225 million documents occupying 170 G 
(replicas not taken into account).

Indexing-wise, the cluster receives 500-1000 documents per minute and I do 
not do any bulk processing. Search-wise, there are some batches which 
mostly issue "timestamp in between" queries, and then some multigets. 

What I see is that kopf/elastichq indicate the ES node takes 13-14 GB out of 
the 22 G I allocated to the process, but when I log in to the box, the process 
is taking 98.3% of RAM. 

This is the output of the top command:

  PID USER      PR  NI  VIRT   RES  SHR  S %CPU %MEM    TIME+    COMMAND
 1903 elastics  20   0  53.8g  28g  121m S 19.3 98.3  563:14.74  java

Eventually, somewhere down the line, the GC kicks in and starts its job, but 
the kernel kills the node abruptly, causing the oom-killer to be invoked. This 
is the message in /var/log/messages:
Oct 14 22:06:10 ip-10-213-155-189 kernel: [125758.903747] java invoked 
oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Oct 14 22:06:10 ip-10-213-155-189 kernel: [125758.908324] java cpuset=/ 
mems_allowed=0
Oct 14 22:06:10 ip-10-213-155-189 kernel: [125758.910666] CPU: 7 PID: 3031 
Comm: java Not tainted 3.10.42-52.145.amzn1.x86_64 #1
Oct 14 22:06:10 ip-10-213-155-189 kernel: [125758.915092] Hardware name: 
Xen HVM domU, BIOS 4.2.amazon 05/23/2014
Oct 14 22:06:10 ip-10-213-155-189 kernel: [125758.918807]   
880037565978 8144c2b9 8800375659e8
Oct 14 22:06:10 ip-10-213-155-189 kernel: [125758.923221]  814497df 
88078ffbfb38 00a9 880037565a50
Oct 14 22:06:10 ip-10-213-155-189 kernel: [125758.926556]  8800375659b0 
8807523fdec0  81a4ffe0
Oct 14 22:06:10 ip-10-213-155-189 kernel: [125758.930586] Call Trace:
Oct 14 22:06:10 ip-10-213-155-189 kernel: [125758.931799] 
 [] dump_stack+0x19/0x1b
Oct 14 22:06:10 ip-10-213-155-189 kernel: [125758.933982] 
 [] dump_header+0x7f/0x1c2
Oct 14 22:06:10 ip-10-213-155-189 kernel: [125758.936321] 
 [] oom_kill_process+0x1a9/0x310
Oct 14 22:06:10 ip-10-213-155-189 kernel: [125758.939246] 
 [] ? security_capable_noaudit+0x15/0x20
Oct 14 22:06:10 ip-10-213-155-189 kernel: [125758.942265] 
 [] out_of_memory+0x429/0x460
Oct 14 22:06:10 ip-10-213-155-189 kernel: [125758.944663] 
 [] __alloc_pages_nodemask+0x947/0x9e0
Oct 14 22:06:10 ip-10-213-155-189 kernel: [125758.947900] 
 [] alloc_pages_current+0xa9/0x170
Oct 14 22:06:10 ip-10-213-155-189 kernel: [125758.950776] 
 [] __page_cache_alloc+0x87/0xb0
Oct 14 22:06:10 ip-10-213-155-189 kernel: [125758.953393] 
 [] filemap_fault+0x185/0x430
Oct 14 22:06:10 ip-10-213-155-189 kernel: [125758.955758] 
 [] __do_fault+0x6f/0x4f0
Oct 14 22:06:10 ip-10-213-155-189 kernel: [125758.958038] 
 [] ? __wait_on_bit_lock+0xab/0xc0
Oct 14 22:06:10 ip-10-213-155-189 kernel: [125758.961099] 
 [] handle_pte_fault+0x93/0xa10
Oct 14 22:06:10 ip-10-213-155-189 kernel: [125758.963773] 
 [] ? generic_file_aio_read+0x588/0x700
Oct 14 22:06:10 ip-10-213-155-189 kernel: [125758.966555] 
 [] handle_mm_fault+0x299/0x690
Oct 14 22:06:10 ip-10-213-155-189 kernel: [125758.969017] 
 [] __do_page_fault+0x150/0x4f0
Oct 14 22:06:10 ip-10-213-155-189 kernel: [125758.972042] 
 [] do_page_fault+0xe/0x10
Oct 14 22:06:10 ip-10-213-155-189 kernel: [125758.974363] 
 [] page_fault+0x28/0x30
Oct 14 22:06:10 ip-10-213-155-189 kernel: [125758.976827] Mem-Info:
Oct 14 22:06:10 ip-10-213-155-189 kernel: [125758.977849] Node 0 DMA 
per-cpu:
Oct 14 22:06:10 ip-10-213-155-189 kernel: [125758.980128] CPU0: hi:   
 0, btch:   1 usd:   0
Oct 14 22:06:10 ip-10-213-155-189 kernel: [125758.982184] CPU1: hi:   
 0, btch:   1 usd:   0
Oct 14 22:06:10 ip-10-213-155-189 kernel: [125758.984273] CPU2: hi:   
 0, btch:   1 usd:   0
Oct 14 22:06:10 ip-10-213-155-189 kernel: [125758.986381] CPU3: hi:   
 0, btch:   1 usd:   0
Oct 14 22:06:10 ip-10-213-155-189 kernel: [125758.988446] CPU4: hi:   
 0, btch:   1 usd:   0
Oct 14 22:06:10 ip-10-213-155-189 kernel: [125758.990510] CPU5: hi:   
 0, btch:   1 usd:   0
Oct 14 22:06:10 ip-10-213-155-189 kernel: [125758.992796] CPU6: hi:   
 0, btch:   1 usd:   0
Oct 14 22:06:10 ip-10-213-155-189 kernel: [125758.994810] CPU7: hi:   
 0, btch:   1 usd:   0
Oct 14 22:06:10 ip-10-213-155-189 kernel: [125758.997023] Node 0 DMA32 
per-cpu:
Oct 14 22:06:10 ip-10-213-155-189 kernel: [125758.999044] CPU0: hi: 
 186, btch:  31 usd:   5
Oct 14 22:06:10 ip-10-213-155-189 kernel: [125759.001285] CPU1: hi: 
 186, btch:  31 usd:   0
Oct 14 22:06:10 ip-10-213-155-189 kernel: [125759.003415] C

Exporting sub set of data from main index to a new index

2014-10-15 Thread Vijay Tiwary
Is there an easy way to export some part of the data (based on some filters) 
from an index, e.g. from a master index to a new index? Apparently it 
looks like I will have to query the data from the master index (using some 
filters) and then bulk-insert those documents into the new index. Is there 
any better and easier way? 



Re: Hot backup strategy for Elasticsearch

2014-10-15 Thread David Pilato
Incremental means that it only has to copy differences.
As you said, nothing was indexed, so there was nothing new (or removed) to copy.

David

> On 15 Oct 2014, at 17:07, skm wrote:
> 
> Thank you!  
> 
> I ran the snapshot commands a couple of times and see the same-size snapshots 
> in the S3 bucket. Shouldn't they be smaller if this is incremental? There is 
> no change in the data between these time periods.
> 
> 
> Thanks,
> skm
> 
> 
> 
> 
>> On Tuesday, October 14, 2014 3:12:45 PM UTC-7, skm wrote:
>> Hello List,
>> 
>> Going through the current documentation I found that snapshot/restore 
>> mechanism is one type of backup strategy that we can use for ES clusters. 
>> Any other recommendations?
>> 
>> Using the following
>> 
>> 1.elasticsearch-
>> "version" : {
>> "number" : "1.3.4",
>> 
>> 2. AWS-cloud-plugin
>> 3. curator
>> 
>> 
>> curator snapshot --repository mys3_repository --all-indices  (weekend)
>> curator snapshot --repository mys3_repository --most-recent 1 (every week 
>> day)
>> 
>> The above would be run as cron jobs from one of the nodes in the cluster.
>> 
>> Let me know of recommendations for hot backup for elastic search cluster.
>> 
>> Thanks,
>> skm
>>  
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/3f5f834a-98ed-4f6f-9772-abfdc85eb077%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/B29432AE-60C8-4407-9CC5-880E9AE424BE%40pilato.fr.
For more options, visit https://groups.google.com/d/optout.


Re: Not able to search long values

2014-10-15 Thread Dan Fairs
Hi


> And I tried searching the CUSTOMER_NUMBER and MERCHANT_NUMBER from Head
> plugin as well as from Kibana UI.
> But it's not getting searched. I even tried to explicitly provide 
> *"index": "not_analyzed"* for CUSTOMER_NUMBER and MERCHANT_NUMBER.
> Even that didn't work. Only after changing these fields to String, I was
> able to search on these fields. [Eg: CUSTOMER_NUMBER  1234567899876543210]
> Can someone tell me why the behaviour is like this?
>
>
It's probably Kibana. JavaScript has problems with very large numbers - it
stores them as 64-bit floats, so integers above 2^53 (like your 19-digit
CUSTOMER_NUMBER) lose precision - we encountered a similar problem recently
while messing around with head. Try running your query directly with curl, or
something else that doesn't have JavaScript in the pipeline.

(Interesting aside: we also noticed that the popular 'jq' utility also has
problems with long values...)

Cheers,
Dan

-- 
Dan Fairs  | @danfairs

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAJn_wTHham95PQ1JdEZiqE388cXOqcFQEVxdVCJnU6%3D54mMCwA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Don't Return Similar Documents from a Query

2014-10-15 Thread Adam Toy
Hi all,

Is there a way to remove highly similar, but slightly different documents 
from a query result before it is returned? And if so, can you set the 
threshold for which it judges the similarity?

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/4a16a5c7-09ac-4948-990b-40331abf854b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Filtering special characters in search results

2014-10-15 Thread Yuan LIN
Hello everyone,

We currently use ES to index documents with the mapper-attachments plugin.
Sometimes the files that we index contain special characters, such as 
special symbols (e.g. a cellphone symbol) added in Word.
In our case, the special characters have nothing to do with language; they 
are only graphical symbols in Word.

Finally, the search results will return things like
 \n� 000 123 456 \n


Since the indexed files are encoded in base64 and stored directly in ES without 
any other copy, I don't think we should filter the files before storing them in ES. 
(Otherwise we couldn't retrieve the document exactly as it was before indexing.)

Maybe we should try to clean the search results by eliminating these 
unreadable characters, as in the sketch below.
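
(A minimal client-side sketch in Python, assuming we post-process each 
returned fragment; it keeps whitespace and ordinary letters, and drops 
control characters and the U+FFFD replacement character:)

import unicodedata

def clean_fragment(text):
    # Keep whitespace; drop U+FFFD and control/format characters;
    # keep everything else, including legitimate non-ASCII letters.
    return "".join(
        ch for ch in text
        if ch.isspace()
        or (ch != "\ufffd" and unicodedata.category(ch) not in ("Cc", "Cf"))
    )

print(clean_fragment("\n\ufffd 000 123 456 \n"))  # -> "\n 000 123 456 \n"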

Do you have some ideas please?

Thank you very much.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/cd988a84-2c64-425b-b522-86f74311390e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Hot backup strategy for Elasticsearch

2014-10-15 Thread skm
Thank you!  

I ran the snapshot commands a couple of times and see the same-size snapshots in 
the s3 bucket. Shouldn't it be smaller if this is incremental? There is no 
change in the data between these time periods.


Thanks,
skm




On Tuesday, October 14, 2014 3:12:45 PM UTC-7, skm wrote:
>
> Hello List,
>
> Going through the current documentation I found that snapshot/restore 
> mechanism is one type of backup strategy that we can use for ES clusters. 
> Any other recommendations?
>
> Using the following
>
> 1.elasticsearch-
> "version" : {
> "number" : "1.3.4",
>
> 2. AWS-cloud-plugin
> 3. curator
>
>
> curator snapshot --repository mys3_repository --all-indices  (weekend)
> curator snapshot --repository mys3_repository --most-recent 1 (every week 
> day)
>
> The above would be run as cron jobs from one of the nodes in the cluster.
>
> Let me know of recommendations for hot backup for elastic search cluster.
>
> Thanks,
> skm
>  
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/3f5f834a-98ed-4f6f-9772-abfdc85eb077%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Hot backup strategy for Elasticsearch

2014-10-15 Thread Itamar Syn-Hershko
Incremental. See
http://www.elasticsearch.org/blog/introducing-snapshot-restore/

--

Itamar Syn-Hershko
http://code972.com | @synhershko 
Freelance Developer & Consultant
Author of RavenDB in Action 

On Wed, Oct 15, 2014 at 5:15 PM, skm  wrote:

> Thank you for the response!
>
> Usually, for large amounts of data (TBs), how does the snapshot backup strategy
> work? Do full snapshots every week plus most-recent snapshots on week days work well?
> The most recent one would be redundant if there is no new data in the last 24
> hrs?
>
> Thanks,
> skm
>
>
> On Wednesday, October 15, 2014 12:54:13 AM UTC-7, Itamar Syn-Hershko wrote:
>>
>> No - you should definitely use snapshot and restore, as it's the most
>> stable and efficient backup mechanism there is.
>>
>> --
>>
>> Itamar Syn-Hershko
>> http://code972.com | @synhershko 
>> Freelance Developer & Consultant
>> Author of RavenDB in Action 
>>
>> On Wed, Oct 15, 2014 at 1:12 AM, skm  wrote:
>>
>>> Hello List,
>>>
>>> Going through the current documentation I found that snapshot/restore
>>> mechanism is one type of backup strategy that we can use for ES clusters.
>>> Any other recommendations?
>>>
>>> Using the following
>>>
>>> 1.elasticsearch-
>>> "version" : {
>>> "number" : "1.3.4",
>>>
>>> 2. AWS-cloud-plugin
>>> 3. curator
>>>
>>>
>>> curator snapshot --repository mys3_repository --all-indices  (weekend)
>>> curator snapshot --repository mys3_repository --most-recent 1 (every
>>> week day)
>>>
>>> The above would be run as cron jobs from one of the nodes in the cluster.
>>>
>>> Let me know of recommendations for hot backup for elastic search cluster.
>>>
>>> Thanks,
>>> skm
>>>
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearc...@googlegroups.com.
>>> To view this discussion on the web visit https://groups.google.com/d/
>>> msgid/elasticsearch/fdb9ebae-0352-491c-bca6-dc905cd623ae%
>>> 40googlegroups.com
>>> 
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/e42872cd-7f44-4ada-b1d5-e988edac60e0%40googlegroups.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAHTr4ZtbuP1qpXXDpV-zrkdhuuO6DACq32-LF45z%3DQ9s2QA4Ag%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Hot backup strategy for Elasticsearch

2014-10-15 Thread skm
Thank you for the response!

Usually, for large amounts of data (TBs), how does the snapshot backup strategy 
work? Do full snapshots every week plus most-recent snapshots on week days work well? 
The most recent one would be redundant if there is no new data in the last 24 
hrs?

Thanks,
skm


On Wednesday, October 15, 2014 12:54:13 AM UTC-7, Itamar Syn-Hershko wrote:
>
> No - you should definitely use snapshot and restore, as it's the most 
> stable and efficient backup mechanism there is.
>
> --
>
> Itamar Syn-Hershko
> http://code972.com | @synhershko 
> Freelance Developer & Consultant
> Author of RavenDB in Action 
>
> On Wed, Oct 15, 2014 at 1:12 AM, skm > 
> wrote:
>
>> Hello List,
>>
>> Going through the current documentation I found that snapshot/restore 
>> mechanism is one type of backup strategy that we can use for ES clusters. 
>> Any other recommendations?
>>
>> Using the following
>>
>> 1.elasticsearch-
>> "version" : {
>> "number" : "1.3.4",
>>
>> 2. AWS-cloud-plugin
>> 3. curator
>>
>>
>> curator snapshot --repository mys3_repository --all-indices  (weekend)
>> curator snapshot --repository mys3_repository --most-recent 1 (every week 
>> day)
>>
>> The above would be run as cron jobs from one of the nodes in the cluster.
>>
>> Let me know of recommendations for hot backup for elastic search cluster.
>>
>> Thanks,
>> skm
>>  
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/fdb9ebae-0352-491c-bca6-dc905cd623ae%40googlegroups.com
>>  
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/e42872cd-7f44-4ada-b1d5-e988edac60e0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Make elastic treat "c#" and "c++" differently

2014-10-15 Thread Sri Harsha
Hello

I need elasticsearch to return considerably different results when searching 
for "c#" and "c++".

After going through documentation/stackoverflow and a few trials and errors, I 
implemented a custom analyzer that treats # and + as ALPHANUM. 
Now I am getting better results. But a few of my results still contain 
"c++" WITHOUT any "c#" text when searching with "c#".

*Below is my analyzer:*
"settings": {
"analysis": {
"filter": {
"hash_filter": {
"type": "word_delimiter",
"type_table": [
"# => ALPHANUM",
"+ => ALPHANUM",
". => ALPHANUM"
],
"split_on_numerics":"false"
}
},
"analyzer": {
"hash_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"hash_filter"
],
"char_filter": "html_strip"
}
}
} 

*Below is the dummy mapper:*
"mapping": {
"myIndex": {
"_all": {
"enabled": "true",
"index": "analyzed"
},
"properties": {
"description": {
"type": "string",
"index": "analyzed",
"analyzer": "hash_analyzer"
}
}
}
}
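
(A quick way to verify what the analyzer actually emits is the _analyze API - 
a sketch with the Python client, reusing the myIndex name from the mapping 
above:)

from elasticsearch import Elasticsearch

es = Elasticsearch()
# Run the custom analyzer against sample text and print its tokens.
tokens = es.indices.analyze(index="myIndex",
                            analyzer="hash_analyzer",
                            text="c# c++")
print([t["token"] for t in tokens["tokens"]])  # hopefully ['c#', 'c++']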

Please provide me tips on how to improve so that I can get better results.

Thank you :)
Sri Harsha



-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/e076367b-67ed-4c18-a68e-7ed0a8e54cbf%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: has_child filter and min_children

2014-10-15 Thread Darren McDaniel
I guess if I bang at it hard enough.. I'll find a solution... this 
'Appears' to work... :)

{
  "min_score": 0,
  "query": {
    "function_score": {
      "boost_mode": "multiply",
      "functions": [
        {
          "script_score": {
            "params": {
              "min_value": 0,
              "max_value": 5
            },
            "script": "_score <= max_value && _score >= min_value ? 1 : -1"
          }
        }
      ],
      "query": {
        "filtered": {
          "query": {
            "has_child": {
              "type": "service",
              "score_type": "total",
              "query": {
                "match_all": {}
              }
            }
          },
          "filter": {
            "and": [
              { "terms": { "owk": [23621] } },
              { "terms": { "oowk": [23621] } },
              { "term": { "ismd": true } },
              {
                "and": [
                  { "terms": { "owk": [23621] } },
                  { "terms": { "oowk": [23621] } },
                  { "term": { "ipl": false } },
                  { "terms": { "pwk": [21754] } },
                  { "range": { "my": { "gte": 1980, "lte": 2015 } } },
                  { "term": { "isb": false } }
                ]
              }
            ]
          }
        }
      }
    }
  },
  "from": 0,
  "size": 0,
  "aggs": {
    "unique": {
      "cardinality": {
        "field": "hhk",
        "precision_threshold": 4
      }
    }
  }
}

On Wednesday, October 15, 2014 8:50:24 AM UTC-4, Darren McDaniel wrote:
>
> I've come up with something that will allow me to get the minimum level... 
> Now, do I need to do this at the client level to get the 'max' number... 
> say I want a range between 5 and 10... how do I limit the results to n 
> children? 
>
> {
>   "min_score": 5,
>   "query": {
>     "filtered": {
>       "query": {
>         "has_child": {
>           "type": "service",
>           "score_type": "total",
>           "query": {
>             "match_all": {}
>           }
>         }
>       },
>       "filter": {
>         "and": [
>           { "terms": { "owk": [23621] } },
>           { "terms": { "oowk": [23621] } },
>           { "term": { "ismd": true } },
>           {
>             "and": [
>               { "terms": { "owk": [23621] } },
> 

Q: Deleting documents by _id but without index info?

2014-10-15 Thread Bernt Rostad
Hi, 

This may be a newbie question but I'm struggling to solve a problem for my 
company in which a MySql database is used to feed and maintain a cluster of 
Elasticsearch servers used for front-end searches.

My problem is: given a set of document IDs from MySql, I need to remove 
those documents from all of our Elasticsearch indices. We typically have 
one index per year, and though I know the document ID, which is the same 
in Elasticsearch, I don't know which index each document is stored in.

I started out trying the Perl module Search::Elasticsearch::Bulk which 
offered the delete_ids() and add_action() methods, which both seemed to 
allow me to delete a large number of documents, but require an index 
name. I thought I could do the same trick as with the _search endpoint and 
use a wildcard to indicate "all indices", e.g. index => 'published-*', but 
that failed spectacularly:

InvalidIndexNameException[[pulse-*] Invalid index name [published-*], must 
not contain the following characters [\, /, *, ?, ", <, >, |,  , ,]];  at 
/usr/local/lib/site_perl/Cpan/share/perl/5.14.2/Search/Elasticsearch/Role/Bulk.pm
 
line 188.

So, I couldn't use a wildcard and I still didn't know the exact index for a 
given document ID. I was back to the drawing board again. 

My next attempt was to dynamically decide the index for each document, by 
doing a search before building up the bulk delete. For this I thought I 
could use the _mget endpoint, which seemed similar to _search and thus 
would allow me to query all the document IDs to learn their indices. But 
that didn't work either. Here I've copied the commands I tried running in 
Sense:

GET /published-*/_mget
{
  "docs" : [
{ "_id" : "019001201409294aa579ddb20348cbbf402116c91f6d15_80" },
{ "_id" : "0190012014092947e9fe351dc78763dabc45231301e9f9_80" }
  ]
}

GET /_mget
{
  "docs" : [
{ "_id" : "019001201409294aa579ddb20348cbbf402116c91f6d15_80" },
{ "_id" : "0190012014092947e9fe351dc78763dabc45231301e9f9_80" }
  ]
}

The first only returned errors with each document while the second didn't 
return anything, just issued an "index is missing" error.

However, when calling the _search endpoint I can either use '/published-*' 
or no index info at all and still get a sensible result back, e.g.:

GET /_search
{
  "query": {
"filtered": {
  "query": {
"term" : { "_id" : 
"0190012014092947e9fe351dc78763dabc45231301e9f9_80" }
  }
}
  }
}


This has left me perplexed: Why can I query one document ID from the 
_search endpoint and get back the index information but not from _mget?

This situation seems to force me to loop over each document ID, possibly 
hundreds of thousands per night, calling the _search endpoint for each ID to 
get the index information and then building up the bulk delete (a batched 
version of that loop is sketched below).

Can life really be this difficult?
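
(For what it's worth, a sketch of that loop in Python, batched so that a 
single search resolves many IDs at once via an ids query - the published-* 
pattern is taken from above, everything else is assumed:)

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch()

def delete_by_ids(doc_ids, batch_size=500):
    actions = []
    for start in range(0, len(doc_ids), batch_size):
        batch = doc_ids[start:start + batch_size]
        # One search per batch; "fields": [] returns only the hit
        # metadata (_index, _type, _id) we need for the delete.
        resp = es.search(index="published-*",
                         body={"query": {"ids": {"values": batch}},
                               "fields": []},
                         size=len(batch))
        for hit in resp["hits"]["hits"]:
            actions.append({"_op_type": "delete",
                            "_index": hit["_index"],
                            "_type": hit["_type"],
                            "_id": hit["_id"]})
    bulk(es, actions)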

Are there other mechanisms I can look at that will allow me, for a given 
list of document IDs, to delete the associated documents from unspecified 
Elasticsearch indices?


I'm sorry if this was a trivial question, but I've spent several days 
poring over the Search::Elasticsearch documentation and googling 
Elasticsearch examples without finding another way to get the job done.

Best wishes,
Bernt Rostad
Retriever Norge

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/719fd493-9eaa-4f22-8552-3f1c91245786%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: has_child filter and min_children

2014-10-15 Thread Darren McDaniel
I've come up with something that will allow me to get the minimum level... 
Now, do I need to do this at the client level to get the 'max' number... 
say I want a range between 5 and 10... how do I limit the results to n 
children? 

{
  "min_score": 5,
  "query": {
    "filtered": {
      "query": {
        "has_child": {
          "type": "service",
          "score_type": "total",
          "query": {
            "match_all": {}
          }
        }
      },
      "filter": {
        "and": [
          { "terms": { "owk": [23621] } },
          { "terms": { "oowk": [23621] } },
          { "term": { "ismd": true } },
          {
            "and": [
              { "terms": { "owk": [23621] } },
              { "terms": { "oowk": [23621] } },
              { "term": { "ipl": false } },
              { "terms": { "pwk": [21754] } },
              { "range": { "my": { "gte": 1980, "lte": 2015 } } },
              { "term": { "isb": false } }
            ]
          }
        ]
      }
    }
  },
  "from": 0,
  "size": 0,
  "aggs": {
    "unique": {
      "cardinality": {
        "field": "hhk",
        "precision_threshold": 4
      }
    }
  }
}


 

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/5539c864-c50c-4ded-8077-bcef3d87deeb%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Elasticsearch 0.90.0: UnavailableShardsException

2014-10-15 Thread Prakhar Mishra
Hello everyone,

I am using *Elasticsearch 0.90.0*. When I try to put a JSON doc in 
some index, I get an exception named *UnavailableShardsException*. I 
have no idea why I am getting this exception. I tried:

curl -v -XPUT "http://localhost:9200/index-15-10-14/type-15-10-14/123" -d "{}"

and what I am getting is this:

{"error":"UnavailableShardsException[[index-15-10-14][4] [2] shardIt, [0] 
active : Timeout waiting for [1m], request: index 
{[index-15-10-14][type-15-10-14][123], source[{}]}]","status":503}

I also tried to change no. of replicas of the index, as follows:


curl -v -XPUT "http://localhost:9200/index-15-10-14/_settings?pretty" -d '{
  "index": {
    "number_of_replicas": 0
  }
}'
but no luck. When I view it with the elasticsearch-head plugin, it shows:

(screenshot not preserved in the archive)

Please help!

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/d045e17e-03ed-4cc7-ab77-da394f06afd0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: clarification around unicast host discovery w/ round-robin dns

2014-10-15 Thread Roy Olsen

On Tuesday, 10 December 2013 19:28:25 UTC+1, Kevin S wrote:
>
> I would prefer to automate the scale-out of our cluster and use a 
> homogeneous elasticsearch image.  (we use docker for this and it works well 
> for masterless distributed services)
>

I'm with Kevin on this one. Host discovery via unicast round-robin DNS 
would be useful in the numerous scenarios where multicast is less 
desirable, and I frankly see no reason for ElasticSearch not to support 
this. It is simple, convenient, predictable, widely supported, very 
manageable and well-behaved functionality. 

One could always get the full array of A records and consider it a list of 
nodes. 

I'm not much of a java developer, but I expect this could also be done via 
a plugin through the existing API? 

-- 
Roy Olsen 
Lead DBA, Xait

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/4b1a7142-6e7e-46c7-897c-00bcafc89e7e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: School project - Analysis of Intranet appliaction traffic

2014-10-15 Thread Alexandre Rafalovitch
Your examples seem to be more from the Complex Event Processing
domain: http://en.wikipedia.org/wiki/Complex_event_processing


Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 15 October 2014 04:38, Vojtěch Bašta  wrote:
> Hey!
>
> Every GET/POST request to the university intranet is tracked and saved to a
> table that contains following information:
> - UserId
> - Date
> - IPAddress
> - TargetUrl
> - BrowserInfo (headers)
> - ResponseTime (in milliseconds)
>
> This database is running on another server and I need to replicate the
> data on another server.
>
> I would like to persist the data and then perform some statistical analysis
> and display alerts when something seems wrong. For example:
> - User is usually connecting from IP in CZ but now he logged from China
> - There is 50% more requests from this user compared to an average user.
>
> 1) Is this something that I should be able to achieve with logstash /
> elasticsearch?
> 2) What approach would you suggest to get data from an external oracle database
> to logstash?
> 3) Does Elasticsearch support such queries or does it expose some API so
> it's possible to build an alerting engine on top of it?
>
> Thanks a lot in advance!
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/0ffd37ab-3a72-4bbc-8c25-fd65e0e59384%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEFAe-HoXGENNJG8G3FnEK_vhZQncooB1qQzPvXjgCD9DmaQJQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Postgres->elasticsearch realtime update

2014-10-15 Thread joergpra...@gmail.com
If you have an SQL command that can update your data and you can live with
a delay in the minute range, you will be fine with JDBC river.

If you want to track all deletes/updates/inserts no matter whether there is an
SQL command, or if you want instant updates (aka "push", e.g. with a
trigger), this is not possible with the JDBC river.

Jörg

On Wed, Oct 15, 2014 at 9:16 AM, Jorge von Rudno <
jorge.vonrudno...@googlemail.com> wrote:

> Dear Colleagues,
>
> I have used a "river" in order to load an index in elasticsearch from
> postgresql, and now I want to keep the index synchronized with the database.
> Can I use the river for this purpose? How? Or is there another tool to do
> this?
>
> At the beginning I considered writing a trigger and a stored
> procedure, but if the "river" is designed to do this I would prefer to use it.
>
> kind regards.
>
> Jorge von Rudno
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAFqKu%3DaF86B2nqc9aXyUA77OXjZOXQihfT1x0QbuLpNWXe8HnQ%40mail.gmail.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoGR%3DdxpFhBO8%2B303u%2B6XD3odhXJkbEK%3DVkdoshDo_2E%3DQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: refresh thread consumes CPU resource when changing refresh_interval to -1

2014-10-15 Thread Shinsuke Sugaya
Thank you for fixing it!

-shinsuke

On Wednesday, October 15, 2014 5:52:15 PM UTC+9, Michael McCandless wrote:
>
> OK this will be fixed in the next ES release: 
> https://github.com/elasticsearch/elasticsearch/pull/8087
>
> Thank you for reporting this.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Wed, Oct 15, 2014 at 2:40 AM, Shinsuke Sugaya  > wrote:
>
>> Hi,
>>
>> I encountered a problem in InternalIndexShard#EngineRefresher.
>> The problem is, 1 core consumed 100% CPU resource when changing 
>> refresh_interval to -1.
>>
>> In ES 1.3, I took a thread dump twice as below:
>>
>> $ top -n1 -b -H
>> ...
>>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND 
>>
>>  4074 elastics  20   0  9.9g 5.8g 316m R 99.6 18.5  53:19.94 java
>> ... 
>> $ jstack -F 
>> ...
>> Thread 4074: (state = IN_JAVA)
>>  - org.apache.lucene.index.IndexReader.tryIncRef() @bci=23, line=226 
>> (Compiled frame; information may be imprecise)
>>  - 
>> org.apache.lucene.search.SearcherManager.tryIncRef(org.apache.lucene.search.IndexSearcher)
>>  
>> @bci=4, line=128 (Compiled frame)
>>  - org.apache.lucene.search.SearcherManager.tryIncRef(java.lang.Object) 
>> @bci=5, line=58 (Compiled frame)
>>  - org.apache.lucene.search.ReferenceManager.acquire() @bci=21, line=100 
>> (Compiled frame)
>>  - org.apache.lucene.search.SearcherManager.isSearcherCurrent() @bci=1, 
>> line=142 (Compiled frame)
>>  - org.elasticsearch.index.engine.internal.InternalEngine.refreshNeeded() 
>> @bci=11, line=743 (Compiled frame)
>>  - 
>> org.elasticsearch.index.shard.service.InternalIndexShard$EngineRefresher.run()
>>  
>> @bci=7, line=930 (Compiled frame)
>>  - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 
>> (Compiled frame)
>>  - java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled 
>> frame)
>>  - 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask)
>>  
>> @bci=1, line=180 (Compiled frame)
>>  - 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run() 
>> @bci=30, line=293 (Compiled frame)
>>  - 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
>>  
>> @bci=95, line=1142 (Compiled frame)
>>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617 
>> (Interpreted frame)
>>  - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
>>  ... 
>> $ jstack -F 
>> ...
>>  Thread 4074: (state = IN_JAVA)
>>  - org.elasticsearch.common.unit.TimeValue.millis() @bci=8, line=95 
>> (Compiled frame; information may be imprecise)
>>  - 
>> org.elasticsearch.threadpool.ThreadPool.schedule(org.elasticsearch.common.unit.TimeValue,
>>  
>> java.lang.String, java.lang.Runnable) @bci=30, line=229 (Compiled frame)
>>  - 
>> org.elasticsearch.index.shard.service.InternalIndexShard$EngineRefresher.run()
>>  
>> @bci=59, line=933 (Compiled frame)
>>  - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 
>> (Compiled frame)
>>  - java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled 
>> frame)
>>  - 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask)
>>  
>> @bci=1, line=180 (Compiled frame)
>>  - 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run() 
>> @bci=30, line=293 (Compiled frame)
>>  - 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
>>  
>> @bci=95, line=1142 (Compiled frame)
>>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617 
>> (Interpreted frame)
>>  - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
>> ...
>>
>> I looked into InternalIndexShard$EngineRefresher#run() and
>> InternalIndexShard$ApplyRefreshSettings#onRefreshSettings().
>> If a thread is running in EngineRefresher#run() method when
>> changing refresh_interval to -1, invoking 
>> refreshScheduledFuture.cancel(false) 
>> in ApplyRefreshSettings#onRefreshSettings() does not cancel 
>> EngineRefresher thread. Moreover, the refresh_interval is
>> changed to -1 and EngineRefresher seems to be invoked 
>> with no interval (the thread consumes 100% CPU).
>>
>> I think that a fix is to check if refreshInterval.millis() > 0.
>> For example, the fix is:
>>
>> class EngineRefresher implements Runnable {
>> @Override
>> public void run() {
>> // we check before if a refresh is needed, if not, we 
>> reschedule, otherwise, we fork, refresh, and then reschedule
>> if (!engine().refreshNeeded()) {
>> synchronized (mutex) {
>> if (state != IndexShardState.CLOSED && 
>> refreshInterval.millis() > 0) { // <== HERE
>> refreshScheduledFuture = 
>> threadPool.schedule(refreshInterval, ThreadPool.Name

Re: refresh thread consumes CPU resource when changing refresh_interval to -1

2014-10-15 Thread Michael McCandless
OK this will be fixed in the next ES release:
https://github.com/elasticsearch/elasticsearch/pull/8087

Thank you for reporting this.

Mike McCandless

http://blog.mikemccandless.com

On Wed, Oct 15, 2014 at 2:40 AM, Shinsuke Sugaya 
wrote:

> Hi,
>
> I encountered a problem in InternalIndexShard#EngineRefresher.
> The problem is, 1 core consumed 100% CPU resource when changing
> refresh_interval to -1.
>
> In ES 1.3, I took a thread dump twice as below:
>
> $ top -n1 -b -H
> ...
>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
>
>  4074 elastics  20   0  9.9g 5.8g 316m R 99.6 18.5  53:19.94 java
> ...
> $ jstack -F 
> ...
> Thread 4074: (state = IN_JAVA)
>  - org.apache.lucene.index.IndexReader.tryIncRef() @bci=23, line=226
> (Compiled frame; information may be imprecise)
>  -
> org.apache.lucene.search.SearcherManager.tryIncRef(org.apache.lucene.search.IndexSearcher)
> @bci=4, line=128 (Compiled frame)
>  - org.apache.lucene.search.SearcherManager.tryIncRef(java.lang.Object)
> @bci=5, line=58 (Compiled frame)
>  - org.apache.lucene.search.ReferenceManager.acquire() @bci=21, line=100
> (Compiled frame)
>  - org.apache.lucene.search.SearcherManager.isSearcherCurrent() @bci=1,
> line=142 (Compiled frame)
>  - org.elasticsearch.index.engine.internal.InternalEngine.refreshNeeded()
> @bci=11, line=743 (Compiled frame)
>  -
> org.elasticsearch.index.shard.service.InternalIndexShard$EngineRefresher.run()
> @bci=7, line=930 (Compiled frame)
>  - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511
> (Compiled frame)
>  - java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame)
>  -
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask)
> @bci=1, line=180 (Compiled frame)
>  -
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()
> @bci=30, line=293 (Compiled frame)
>  -
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
> @bci=95, line=1142 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617
> (Interpreted frame)
>  - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
>  ...
> $ jstack -F 
> ...
>  Thread 4074: (state = IN_JAVA)
>  - org.elasticsearch.common.unit.TimeValue.millis() @bci=8, line=95
> (Compiled frame; information may be imprecise)
>  -
> org.elasticsearch.threadpool.ThreadPool.schedule(org.elasticsearch.common.unit.TimeValue,
> java.lang.String, java.lang.Runnable) @bci=30, line=229 (Compiled frame)
>  -
> org.elasticsearch.index.shard.service.InternalIndexShard$EngineRefresher.run()
> @bci=59, line=933 (Compiled frame)
>  - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511
> (Compiled frame)
>  - java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame)
>  -
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask)
> @bci=1, line=180 (Compiled frame)
>  -
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()
> @bci=30, line=293 (Compiled frame)
>  -
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
> @bci=95, line=1142 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617
> (Interpreted frame)
>  - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
> ...
>
> I looked into InternalIndexShard$EngineRefresher#run() and
> InternalIndexShard$ApplyRefreshSettings#onRefreshSettings().
> If a thread is running in EngineRefresher#run() method when
> changing refresh_interval to -1, invoking
> refreshScheduledFuture.cancel(false)
> in ApplyRefreshSettings#onRefreshSettings() does not cancel
> EngineRefresher thread. Moreover, the refresh_interval is
> changed to -1 and EngineRefresher seems to be invoked
> with no interval (the thread consumes 100% CPU).
>
> I think that a fix is to check if refreshInterval.millis() > 0.
> For example, the fix is:
>
> class EngineRefresher implements Runnable {
> @Override
> public void run() {
> // we check before if a refresh is needed, if not, we
> reschedule, otherwise, we fork, refresh, and then reschedule
> if (!engine().refreshNeeded()) {
> synchronized (mutex) {
> if (state != IndexShardState.CLOSED &&
> refreshInterval.millis() > 0) { // <== HERE
> refreshScheduledFuture =
> threadPool.schedule(refreshInterval, ThreadPool.Names.SAME, this);
> }
> }
> return;
> }
>
> Could you check this problem?
>
> Thanks,
>  shinsuke
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails 

nested type mapping error

2014-10-15 Thread shekhar chauhan
Hi All,
  I am facing a problem: how do I write a mapping that makes an object of 
nested type? I *only want the itemResults object to be of nested type.*

my document structure is - 

{
  "_id": "inspectionresults-07d864fd-76ec-427a-a1c6-b70e420ce3d8",
  "_rev": "12-64c44ddcfcd477dfa86cdb9d3c659748",
  "sectionResults": [
    {
      "SectionName": "More Details",
      "itemResults": [
        {
          "InspectionItem": "Launch Month",
          "ItemInfo": ""
        }
      ]
    }
  ]
}

and my query is - 
PUT realtek_release_v6_nestedsearch/_mapping
{
  "realtek_release_v6_nestedsearch":{
"properties": {
  "sectionResults":{
"type": "object",
"properties": {
  "itemResults":{
"type": "nested"
  }
}
  }
}
  }
}

The error is: "nested object under path [sectionResults] is not of nested 
type"

Please send me a mapping that fits my document structure; *I only want the 
itemResults object to be of nested type*.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/7e301db9-8160-475c-90f9-42ae87dcb748%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Query problem with nested fields using DSL with elasticsearch

2014-10-15 Thread Ramy
You should use a nested query with a path for your nested fields!!!
Please read this article: 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-nested-query.html
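
(A minimal sketch of what that looks like, assuming foo.bar is actually 
mapped as nested, and reusing the es/value names from the snippet below:)

result = es.search(
    index="mydb",
    doc_type="collection",
    body={"query":
        {"nested": {
            "path": "foo.bar",
            "query": {"term": {"foo.bar.field": value}}
        }}
    }
)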


Am Mittwoch, 15. Oktober 2014 10:38:24 UTC+2 schrieb José Ramón Palanco:
>
> When I try to query a nested field that exists, I don't get any result 
> (it doesn't return any error):
>
> result = es.search(
>   index="mydb", 
>   doc_type="collection", 
>   body={"query": 
> { 
>   "term" : {
>   "foo.bar.field" : value
>   }
> }
>   }
> )
>
> NOTE: field is inside bar and bar is inside foo:
>
> { 'topfield' : 23, 'foo' : { 'bar' : { 'field' : 69 }, 'otherfield' : 1}}
>
> If I try the same with a field in the top, it works properly:
>
> result = es.search(
>   index="mydb", 
>   doc_type="collection", 
>   body={"query": 
> { 
>   "term" : {
>   "topfield" : value
>   }
> }
>   }
> )
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/6e045b5f-b709-49a1-9422-5c5bb8ffb8ef%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Query problem with nested fields using DSL with elasticsearch

2014-10-15 Thread José Ramón Palanco


When I try to query a nested field that exists, I don't get any result (it 
doesn't return any error):

result = es.search(
  index="mydb", 
  doc_type="collection", 
  body={"query": 
{ 
  "term" : {
  "foo.bar.field" : value
  }
}
  }
)

NOTE: field is inside bar and bar is inside foo:

{ 'topfield' : 23, 'foo' : { 'bar' : { 'field' : 69 }, 'otherfield' : 1}}

If I try the same with a field in the top, it works properly:

result = es.search(
  index="mydb", 
  doc_type="collection", 
  body={"query": 
{ 
  "term" : {
  "topfield" : value
  }
}
  }
)


-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/93d419b4-f409-4c57-a2e0-fe528efd3569%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


School project - Analysis of Intranet appliaction traffic

2014-10-15 Thread Vojtěch Bašta
Hey!

Every GET/POST request to the university intranet is tracked and saved to a 
table that contains the following information:
- UserId
- Date
- IPAddress
- TargetUrl
- BrowserInfo (headers)
- ResponseTime (in milliseconds)

This database is running on another server, and I need to replicate the 
data on another server.

I would like to persist the data and then perform some statistical analysis 
and display alerts when something seems wrong. For example:
- A user usually connects from an IP in CZ but has now logged in from China
- There are 50% more requests from this user compared to an average user.

1) Is this something that I should be able to achieve with logstash / 
elasticsearch?
2) What approach would you suggest to get data from an external oracle 
database to logstash?
3) Does Elasticsearch support such queries, or does it expose some API so 
it's possible to build an alerting engine on top of it?

Thanks a lot in advance!

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/0ffd37ab-3a72-4bbc-8c25-fd65e0e59384%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: stats, extended stats, percentiles for doc_count in aggregations

2014-10-15 Thread Raz Lachyani
Hi Loren,

Maybe it is possible to use a sum. I think you can try to find a field, or 
create one, whose sum is equivalent to the doc_count (e.g. a field that is 
always 1, so summing it counts the documents in the bucket). 
I haven't tried it yet, it's just a thought.

On Friday, May 2, 2014 10:37:08 AM UTC+3, Loren wrote:
>
> Thanks for the quick reply, Adrien. Yep, I can certainly pull all the 
> doc_counts down to the client and calculate it all there. I was just hoping 
> to avoid that.
>
> On Friday, May 2, 2014 12:21:51 AM UTC-7, Adrien Grand wrote:
>>
>> Hi,
>>
>> There is currently no way to do that but I think this could be done on 
>> client side?
>>
>>
>> On Fri, May 2, 2014 at 8:56 AM, Loren  wrote:
>>
>>> Is it possible to get stats, extended stats, or percentiles across the 
>>> doc_counts in each bucket of an aggregation? I see how to use it on an 
>>> existing numeric field value (e.g., height, grade), but I want to see the 
>>> average bucket size, stddev, or other stats on how one doc_count compares 
>>> to doc_counts in the other buckets.
>>>  
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to elasticsearc...@googlegroups.com.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/elasticsearch/8f847621-e92c-4bdf-915f-60bd799071ee%40googlegroups.com
>>>  
>>> 
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>
>>
>> -- 
>> Adrien Grand
>>  
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/efb12592-0c59-466e-83c1-bd92ba43dfa3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Optimal configuration for 2 high powered nodes

2014-10-15 Thread Rémi Nonnon
Hi,

I think the 2nd proposition could be the worst. With a heap of more than 32GB 
the JVM can no longer use compressed object pointers (compressed oops), so 
every reference takes 8 bytes instead of 4; staying just under 32GB per node 
is usually better. 
Take a look at this : 
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_limiting_memory_usage.html

Hope this help

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/aa905691-102d-4a6c-8c2f-a2b50a0164af%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: refresh thread consumes CPU resource when changing refresh_interval to -1

2014-10-15 Thread Michael McCandless
I agree, this looks like a bug.  I'll open an issue ... thank you for
reporting!

Mike McCandless

http://blog.mikemccandless.com

On Wed, Oct 15, 2014 at 2:40 AM, Shinsuke Sugaya 
wrote:

> Hi,
>
> I encountered a problem in InternalIndexShard#EngineRefresher.
> The problem is, 1 core consumed 100% CPU resource when changing
> refresh_interval to -1.
>
> In ES 1.3, I took a thread dump twice as below:
>
> $ top -n1 -b -H
> ...
>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
>
>  4074 elastics  20   0  9.9g 5.8g 316m R 99.6 18.5  53:19.94 java
> ...
> $ jstack -F 
> ...
> Thread 4074: (state = IN_JAVA)
>  - org.apache.lucene.index.IndexReader.tryIncRef() @bci=23, line=226
> (Compiled frame; information may be imprecise)
>  -
> org.apache.lucene.search.SearcherManager.tryIncRef(org.apache.lucene.search.IndexSearcher)
> @bci=4, line=128 (Compiled frame)
>  - org.apache.lucene.search.SearcherManager.tryIncRef(java.lang.Object)
> @bci=5, line=58 (Compiled frame)
>  - org.apache.lucene.search.ReferenceManager.acquire() @bci=21, line=100
> (Compiled frame)
>  - org.apache.lucene.search.SearcherManager.isSearcherCurrent() @bci=1,
> line=142 (Compiled frame)
>  - org.elasticsearch.index.engine.internal.InternalEngine.refreshNeeded()
> @bci=11, line=743 (Compiled frame)
>  -
> org.elasticsearch.index.shard.service.InternalIndexShard$EngineRefresher.run()
> @bci=7, line=930 (Compiled frame)
>  - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511
> (Compiled frame)
>  - java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame)
>  -
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask)
> @bci=1, line=180 (Compiled frame)
>  -
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()
> @bci=30, line=293 (Compiled frame)
>  -
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
> @bci=95, line=1142 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617
> (Interpreted frame)
>  - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
>  ...
> $ jstack -F 
> ...
>  Thread 4074: (state = IN_JAVA)
>  - org.elasticsearch.common.unit.TimeValue.millis() @bci=8, line=95
> (Compiled frame; information may be imprecise)
>  -
> org.elasticsearch.threadpool.ThreadPool.schedule(org.elasticsearch.common.unit.TimeValue,
> java.lang.String, java.lang.Runnable) @bci=30, line=229 (Compiled frame)
>  -
> org.elasticsearch.index.shard.service.InternalIndexShard$EngineRefresher.run()
> @bci=59, line=933 (Compiled frame)
>  - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511
> (Compiled frame)
>  - java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame)
>  -
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask)
> @bci=1, line=180 (Compiled frame)
>  -
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()
> @bci=30, line=293 (Compiled frame)
>  -
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
> @bci=95, line=1142 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617
> (Interpreted frame)
>  - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
> ...
>
> I looked into InternalIndexShard$EngineRefresher#run() and
> InternalIndexShard$ApplyRefreshSettings#onRefreshSettings().
> If a thread is running in EngineRefresher#run() method when
> changing refresh_interval to -1, invoking
> refreshScheduledFuture.cancel(false)
> in ApplyRefreshSettings#onRefreshSettings() does not cancel
> EngineRefresher thread. Moreover, the refresh_interval is
> changed to -1 and EngineRefresher seems to be invoked
> with no interval (the thread consumes 100% CPU).
>
> I think that a fix is to check if refreshInterval.millis() > 0.
> For example, the fix is:
>
> class EngineRefresher implements Runnable {
> @Override
> public void run() {
> // we check before if a refresh is needed, if not, we
> reschedule, otherwise, we fork, refresh, and then reschedule
> if (!engine().refreshNeeded()) {
> synchronized (mutex) {
> if (state != IndexShardState.CLOSED &&
> refreshInterval.millis() > 0) { // <== HERE
> refreshScheduledFuture =
> threadPool.schedule(refreshInterval, ThreadPool.Names.SAME, this);
> }
> }
> return;
> }
>
> Could you check this problem?
>
> Thanks,
>  shinsuke
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...

Re: NotFilter dude

2014-10-15 Thread Itamar Syn-Hershko
See
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-not-filter.html

You should probably switch to a bool filter with should/must_not clauses
instead of an and filter, as in the sketch below.
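
(A rough sketch, keeping a couple of the filters from the query below - 
"must" is the AND part, "must_not" is the NOT part; the es client and 
index name are assumed:)

body = {
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": {
                "bool": {
                    # AND: all of these must match...
                    "must": [
                        {"terms": {"Product.id": ["6"]}},
                        {"terms": {"Version.jp": [True]}}
                    ],
                    # ...NOT: and none of these may match.
                    "must_not": [
                        {"terms": {"Document.last_status": ["4"]}}
                    ]
                }
            }
        }
    }
}
result = es.search(index="myindex", body=body)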

--

Itamar Syn-Hershko
http://code972.com | @synhershko 
Freelance Developer & Consultant
Author of RavenDB in Action 

On Tue, Oct 14, 2014 at 9:26 PM, Waldemar Neto  wrote:

> Hello all!
> I have a criteria with *AND*, *OR* and *NOT* operators, but the *NOT* filter
> takes a single filter. What is the best way to set multiple *NOT*s?
>
> See my query with *AND* below - in my *AND* I need *NOT* :D
>
> {
>   "highlight": {
>     "fields": {
>       "*": {
>         "fragment_size": 150,
>         "number_of_fragments": 1,
>         "pre_tags": [""],
>         "post_tags": [""]
>       }
>     }
>   },
>   "facets": {
>     "documents": {
>       "terms": {
>         "field": "primary_field",
>         "size": 5
>       }
>     }
>   },
>   "fields": [
>     "Document.id",
>     "Document.name",
>     "Document.updated",
>     "DocumentTag.name",
>     "Document.approval_number",
>     "Document.approval_number_us",
>     "Document.approval_number_jp",
>     "Version.status",
>     "Document.rate",
>     "Document.last_status"
>   ],
>   "sort": ["_type"],
>   "size": 10,
>   "query": {
>     "filtered": {
>       "query": {
>         "match_all": {}
>       },
>       "filter": {
>         "and": {
>           "filters": [
>             { "terms": { "Product.id": ["6"] } },
>             { "terms": { "Version.jp": [true] } },
>             { "terms": { "Version.jp": [true] } },
>             { "terms": { "Document.last_status": ["4"] } }
>           ]
>         }
>       }
>     }
>   }
> }
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/d9d34e14-7d3c-41a5-b3ad-a33ccbd79d45%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAHTr4ZvhhpLcLojKEk0%2Bw_sx%3DFeMkOnvSbiK-aWYeH7ByD1WWg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Hot backup strategy for Elasticsearch

2014-10-15 Thread Itamar Syn-Hershko
No - you should definitely use snapshot and restore, as it's the most
stable and efficient backup mechanism there is.

--

Itamar Syn-Hershko
http://code972.com | @synhershko 
Freelance Developer & Consultant
Author of RavenDB in Action 

On Wed, Oct 15, 2014 at 1:12 AM, skm  wrote:

> Hello List,
>
> Going through the current documentation I found that snapshot/restore
> mechanism is one type of backup strategy that we can use for ES clusters.
> Any other recommendations?
>
> Using the following
>
> 1.elasticsearch-
> "version" : {
> "number" : "1.3.4",
>
> 2. AWS-cloud-plugin
> 3. curator
>
>
> curator snapshot --repository mys3_repository --all-indices  (weekend)
> curator snapshot --repository mys3_repository --most-recent 1 (every week
> day)
>
> The above would be run as cron jobs from one of the nodes in the cluster.
>
> Let me know of recommendations for hot backup for elastic search cluster.
>
> Thanks,
> skm
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/fdb9ebae-0352-491c-bca6-dc905cd623ae%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAHTr4ZtpT5Q2C2sswPDJRN0KK3xNWGbxdUFoctqGd%3D%2B1q7cs1Q%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Why elasticsearch results are not consistent?

2014-10-15 Thread Xinuo Chen
I have some documents in elasticsearch.

First I do

{
   "sort":[
  "@timestamp"
   ],
   "query":{
  "in":{
 "action_type":[
"start",
"end"
 ]
  }
   },
   "size":0
}

I get

{
   "took":58,
   "timed_out":false,
   "_shards":{
  "total":5,
  "successful":5,
  "failed":0
   },
   "hits":{
  "total":24435,
  "max_score":0.0,
  "hits":[

  ]
   }
}


--


Then I want to get the latest 100 documents, so I do


{
   "sort":[
  "@timestamp"
   ],
   "query":{
  "in":{
 "action_type":[
"start",
"end"
 ]
  }
   },
   "from":24335,
   "size":100
}

it returns

{
   "took":25,
   "timed_out":false,
   "_shards":{
  "total":5,
  "successful":5,
  "failed":0
   },
   "hits":{
  "total":18327,
  "max_score":null,
  "hits":[

  ]
   }
}

Basically, the 2nd query indicates that the `total` has changed.

If I do the first query again, I still get the 1st result.

Why are the two queries not returning the same result?

By the way, if I use `filter`, it is the same.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/36fab4b9-ad31-459e-a94d-88e9bbca4883%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Postgres->elasticsearch realtime update

2014-10-15 Thread David Pilato
My thoughts on this.

If you can modify the application that writes to your RDBMS, then you should 
do it from there.
If not, it depends on your data, I'd say.

For example, if your application deletes data instead of marking it as 
deleted, you will basically have to reindex everything often.
If your database contains a timestamp for your data, then you can use it to 
index/remove only the changes.

If your entities are not flat, you will have to run multiple SQL calls 
to generate a full JSON object. In that case, you should write your own code as 
a batch or so, along the lines of the sketch below.
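
(A rough sketch of such a timestamp-driven batch in Python - the docs table 
and its columns are invented for illustration, and deletes would still need 
their own handling, as noted above:)

import psycopg2
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch()
conn = psycopg2.connect("dbname=mydb")

def sync_since(last_run):
    cur = conn.cursor()
    # Only rows touched since the last run need to be (re)indexed.
    cur.execute("SELECT id, title, body FROM docs WHERE updated_at > %s",
                (last_run,))
    actions = ({"_index": "mydb",
                "_type": "collection",
                "_id": row[0],
                "_source": {"title": row[1], "body": row[2]}} for row in cur)
    bulk(es, actions)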



> Le 15 oct. 2014 à 09:16, Jorge von Rudno  a 
> écrit :
> 
> Dear Colleagues,
> 
> I have used a "river" in order to load an index in elasticsearch from 
> postgresql, and now I want to keep the index synchronized with the database. 
> Can I use the river for this purpose? How? Or is there another tool to do this?
> 
> At the beginning I have considered to write a trigger and a stored procedure, 
> but If "river" is designed to do this I will prefer use it.
> 
> kind regards.
> 
> Jorge von Rudno



Postgres->elasticsearch realtime update

2014-10-15 Thread Jorge von Rudno
Dear Colleagues,

I have used a "river" to load an index into elasticsearch from
postgresql, and now I want to keep the index synchronized with the database.
Can I use the river for this purpose? If so, how? Or is there another tool to
do this?

At the beginning I considered writing a trigger and a stored
procedure, but if the "river" is designed to do this I would prefer to use it.

Kind regards,

Jorge von Rudno



Re: nested filter query giving error

2014-10-15 Thread shekhar chauhan
Hi Adrien,
  Thanks for your reply, but now I am facing a new problem: how do I write a 
mapping that makes an object field of nested type?

My document structure is:

{
  "_id": "inspectionresults-07d864fd-76ec-427a-a1c6-b70e420ce3d8",
  "_rev": "12-64c44ddcfcd477dfa86cdb9d3c659748",
  "sectionResults": [
    {
      "SectionName": "More Details",
      "itemResults": [
        {
          "InspectionItem": "Launch Month",
          "ItemInfo": ""
        }
      ]
    }
  ]
}

And my mapping request is:
PUT realtek_release_v6_nestedsearch/_mapping
{
  "realtek_release_v6_nestedsearch":{
"properties": {
  "sectionResults":{
"type": "object",
"properties": {
  "itemResults":{
"type": "nested"
  }
}
  }
}
  }
}

The error is: "nested object under path [sectionResults] is not of nested 
type".

Please suggest a mapping that fits my document structure; I only want the 
itemResults object to be of nested type.
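
A sketch of one way to get there, under two assumptions that are not confirmed by the message above: the top-level key of a _mapping body is the document type (so the placeholder my_type below must be replaced with the real type name, rather than the index name used in the request above), and the index can be deleted and recreated, since an existing object field generally cannot be changed to nested in place:

DELETE realtek_release_v6_nestedsearch

PUT realtek_release_v6_nestedsearch
{
  "mappings": {
    "my_type": {
      "properties": {
        "sectionResults": {
          "type": "object",
          "properties": {
            "itemResults": {
              "type": "nested"
            }
          }
        }
      }
    }
  }
}

After reindexing the documents, a nested query or filter against this mapping would use "path": "sectionResults.itemResults".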
