Re: how to retrieve cluster and node stats on data node when disable http (http.enabled: false)

2014-10-22 Thread David Pilato
How do you search in your cluster?
Are you using Java Client?



 Le 22 oct. 2014 à 03:17, Terence Tung tere...@teambanjo.com a écrit :
 
 Hi there,
 
 I followed the recommendation from 
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-node.html
 to create dedicated master, client and data nodes. For my master and data 
 nodes, I disabled http.enabled, so they communicate via transport port 9300. 
 However, previously we were using curl localhost:9200/_cluster/stats and 
 /_nodes/stats to fetch monitoring stats (e.g. heap usage, number of docs, thread 
 counts, etc.). My question is: how can I still fetch these monitoring stats? 
 I searched and read through the elasticsearch docs but couldn't find an answer.
 
 Please help.
 
 Thanks, and I really appreciate any help.



Re: How to check the elasticsearch is available

2014-10-22 Thread David Pilato
If you are doing Java, you can do:

node.client().admin().cluster().prepareHealth().setWaitForYellowStatus().execute().actionGet();
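
If the readiness check has to happen outside the JVM (an init script, for example), the same wait is available over HTTP; a minimal sketch, assuming a node listening on the default port:

curl -s 'localhost:9200/_cluster/health?wait_for_status=yellow&timeout=30s'

The call blocks until the cluster reaches at least yellow status or the timeout expires; in the latter case the response contains "timed_out": true.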



 Le 22 oct. 2014 à 04:33, Weiguo Xia xia...@gmail.com a écrit :
 
 Hi,
   I am new to elasticsearch and have run into a problem. 
 
   I am writing code that needs elasticsearch to be ready for use. In my code, I 
 start elasticsearch first, but it needs some time before it is 
 ready.
 
   How can I tell when elasticsearch is ready?
 
   Thank you.
 WX



Zoning vs using several clusters

2014-10-22 Thread DH
Hi, everyone,

In our cluster, we have several types of rather complex documents and are 
contemplating the possibility of separating them.

As far as I know, we have two possibilities: 

1) Using routing to create zones on our cluster, and indexing each type 
of document to its defined zone. 

   - That would make for easier maintenance, since we would only have one 
   cluster to take care of. 
   - Our web application would only need one client as well.
   - This allows for transversal requests (i.e. requests on all types of 
   documents at once).

   However,

   - If our cluster suffers from the dreaded split-brain (yes, we 
   still have it despite using proper configuration, due to network problems), 
   all our indices and all our indexing processes are impacted; thus, all 
   our data is at risk and our web app is utterly unusable.
   

2) Using a cluster per document type


   - A bit of split-brain resilience: a cluster entering split-brain would 
   only impact its own type of document, thus lowering the overall risk.
   - That would allow us to close parts of our webapp, leaving others open.

   However,

   - No more transversal requests (we are not using them, so not really a 
   con for us).
   - A slightly more complex web app (it needs one client per cluster, and we 
   need to make sure we are using the proper one).
   - Harder ES maintenance (we would need eyes on every one of these clusters).




My understanding of ES leads me to believe that both methods would be 
equally efficient requesting-wise and indexing-wise (though I could be 
wrong).

So, we wonder:

_Are there any other benefits to using routing instead of using several 
clusters?
_Am I right to think that using either method will be the same 
performance-wise?

Any insight/advice on that matter would be a tremendous help.
Thanks.
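
For readers comparing the two options: the routing in option 1 boils down to supplying the same routing value at index time and at search time; a minimal sketch with hypothetical index, type and routing values:

# index documents of one zone with a fixed routing value
curl -XPUT 'localhost:9200/myindex/type_a/1?routing=zone_a' -d '{"field":"value"}'

# search only the shards that hold that zone
curl -XGET 'localhost:9200/myindex/type_a/_search?routing=zone_a' -d '{"query":{"match_all":{}}}'

Cross-zone ("transversal") requests remain possible by simply omitting the routing parameter.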



Re: Children aggregation (1.4.0.Beta1) Round-Robin result

2014-10-22 Thread Vlad Vlaskin
Hi Martijn,

Would you help with another question on this topic?

I read that ES stores parent-child relations on the heap. Could it be that 
this bug prevents some objects from being GC-ed, i.e. that there is a memory 
leak? 
And what happens if there is no more heap left but more parent-child 
relations keep coming in? 

The reason I'm asking is that our cluster (8 r3.xlarge nodes, etc.) went down 
after 2 days of updating parent-child relations. 
Index volume is tiny, but the number of child documents updated is huge. 

Thank you.

Vlad 


On Tuesday, October 21, 2014 4:38:55 PM UTC+2, Martijn v Groningen wrote:

 Hi Vlad,

 I opened: https://github.com/elasticsearch/elasticsearch/pull/8180

 Many thanks for reporting this issue!
 Besides this bug the parent/child model works well, so I recommend keeping 
 it. I don't know exactly when the next 1.4 release will be out, but I 
 expect it within a week or two.

 Martijn 


 On 21 October 2014 16:17, Vlad Vlaskin vl...@admoment.ru javascript: 
 wrote:

 Hi Martijn,

 great news, thank you!

 Would you recommend to keep parent-child data model and wait for a 
 release?  (Do you have a feeling of the date?).

 Thank you

 Vlad



 On Tuesday, October 21, 2014 4:01:47 PM UTC+2, Martijn v Groningen wrote:

 Hi Vlad, 

 I reproduced it. The children agg doesn't take documents marked as 
 deleted into account properly.

 When documents are deleted they are initially marked as deleted before 
 they're removed from the index. This also applies to updates, because that 
 translate into an index + delete. 

 The issue you're experiencing can also happen when not using the bulk 
 api. It may just be a bit less likely to manifest.

 The fix for this bug is small. I'll open a PR soon.

 Martijn

 On 21 October 2014 15:51, Vlad Vlaskin vl...@admoment.ru wrote:

 Hi Martijn,

 A couple of hours ago I tried to submit a bug on the ES GitHub issues and, while 
 writing up the steps to reproduce, realized one more thing.

 *It happens only if you update the same child document within one bulk 
 request.*

 Because I didn't manage to reproduce the arithmetic progression 
 effect with curling my localhost, but it is still reproducible from java 
 code doing bulk-update (script + upsert doc). 
 I understand that bulk-updating the same document is a pretty ugly 
 thing 
 and I was surprised when it worked normally (without exceptions about 
 version conflicts) from java client. 

 If it might be helpful: these are the steps and queries to reproduce it by 
 curling your localhost with parent-child documents.
 Unfortunately I don't know how to write a curl command with bulk updates 
 (see the sketch after the steps below). 


 #Create index "test" with parent-child mappings
 curl -XPUT localhost:9200/test -d '{"mappings":{"root":{"properties":{"country":{"type":"string"}}},"metric":{"_parent":{"type":"root"},"properties":{"count":{"type":"long"}}}}}'

 #Index parent document:
 curl -XPUT localhost:9200/test/root/1 -d '{"country":"de"}'

 #Index child document:
 curl -XPUT 'http://localhost:9200/test/metric/1?parent=1' -d '{"count":1}'

 #Update child document:
 curl -XPOST 'http://localhost:9200/test/metric/1/_update?parent=1' -d '{"script":"ctx._source.count+=ct", "params":{"ct":1}}'

 #Query with benchmark query, it should return 2
 curl -XGET localhost:9200/test/_search -d '{"size":0,"query":{"match_all":{}},"aggs":{"requests":{"sum":{"field":"count"}}}}'

 #Query with child aggregation query, expected 2
 curl -XGET localhost:9200/test/metric/_search -d '{"size":0,"query":{"match_all":{}},"aggs":{"child":{"children":{"type":"metric"},"aggs":{"requests":{"sum":{"field":"count"}}}}}}'
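
 For completeness, a hedged sketch of what the repeated child update could look like through the bulk endpoint (same hypothetical index/type/id as the steps above; each update is an action line followed by a script+upsert line, and the body must end with a newline):

 curl -XPOST 'localhost:9200/test/metric/_bulk' --data-binary '
 {"update":{"_id":"1","_parent":"1"}}
 {"script":"ctx._source.count+=ct","params":{"ct":1},"upsert":{"count":1}}
 {"update":{"_id":"1","_parent":"1"}}
 {"script":"ctx._source.count+=ct","params":{"ct":1},"upsert":{"count":1}}
 '

 Repeating the same _id twice in one request mirrors the situation described above (the same child document updated within a single bulk request).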



 Thank you

 On Tuesday, October 21, 2014 3:33:35 PM UTC+2, Martijn v Groningen 
 wrote:

 Hi Vlad,

 What you're describing shouldn't happen. The child docs should get 
 detached. I think this is a bug.
 Let me verify and get back to you.

 Martijn

 On 21 October 2014 13:26, Vlad Vlaskin vl...@admoment.ru wrote:

 After some experiments I believe I found the cause of the discrepancy 
 problem:

 *ElasticSearch does not detach the old versions of a child object from the 
 children aggregation after the child has been updated, and still uses them in the 
 child aggregation.*

 E.g. I have my child updated 4 times with a script (within a batch 
 update), and it has 4 versions:
 { "count": 1}, { "count": 2}, { "count": 3}, { "count": 4}

 Querying the child document (after a refresh) shows the proper version: 
 {"count": 4}

 But the child aggregation {"sum":{"field":"count"}} shows 10, because:

 1 + 2 + 3 + 4 = 10

 It works quite consistently (e.g. for 5 updates you get 15). 

 It explains the behavior here.





 On Tuesday, October 21, 2014 3:18:47 AM UTC+2, Vlad Vlaskin wrote:

 Dear ES group,
 we've been using ES in production for a while and eagerly test all 
 newly released features, such as cardinality and others.

 We are trying out data modeling with parent-child relations (ES version 
 1.4.0.Beta1, 8 nodes, EC2 r3.xlarge, SSD, lots of RAM, etc.)
 With data model of: 
 *Parent*
 {
   key: value  
 }

 and a timeline with children, holding metrics:

 *Child* (type metrics)
 {
  day: 2014-10-20,
   count: 10
 }

 We update metric documents and 

Recurring Heap Problems

2014-10-22 Thread Vincent Bernardi
Hello ES group,
I have had recurring heap problems ("java.lang.OutOfMemoryError: Java heap 
space") on my 2-node ES cluster (16GB RAM/node, 8GB allocated to ES) over the 
last month, and I really don't know how to tackle them.
It started at a time when I was doing aggregations on a "milliseconds 
since EPOCH" field, and I was given to understand that it was probably the 
cause of my problems, since it created a very large number of buckets before 
aggregating them. So I stopped doing aggregations on this field (I did not 
delete it, though).
Recently I was told that my index had too few shards relative to its size 
(2 primary shards, 1 replica each, 100-150M docs). So I decided to try 
reindexing into a new index with more shards (I am using es-reindex.rb, 
which itself uses the bulk API). But now I am seeing OutOfMemoryError 
during reindexing. Needless to say, once an OutOfMemoryError happens, my 
cluster never seems to recover until I reboot each node.
It should be noted that I use ES almost exclusively with search_type=count, 
since I am only trying to do analytics on website data.
I am not sure how to proceed from this point: I don't know the right tool 
to pinpoint my memory problems, and there doesn't seem to be a way to ask ES 
for heap usage by index/query/task type.
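
For what it's worth, per-node heap and fielddata usage are visible through the stats APIs; a minimal sketch, assuming HTTP access to any one node:

# Per-node JVM heap usage
curl -s 'localhost:9200/_nodes/stats/jvm?pretty'

# Per-node fielddata memory, a common culprit for aggregation-driven heap pressure
curl -s 'localhost:9200/_cat/fielddata?v'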
I’d be very grateful for any advice you can offer.
Thanks in advance,
Vincent Bernardi



Re: Scoring of queries on nested documents

2014-10-22 Thread barry
After some investigation, it turns out that the nested docs get counted 
individually along with the root doc. 

On Tuesday, October 21, 2014 4:55:56 PM UTC+1, ba...@intalex.com wrote:

 Thanks for the help Mark. 
 When calculating relevance can I assume that TF is the number of times 
 that the term appears in the collapsed nested field? I.e. all of the city 
 names get merged into one field, or is it handled a different way? Is the 
 Field Length Norm calculated in the same way?

 Barry

 On Tuesday, October 21, 2014 3:48:15 PM UTC+1, Mark Harwood wrote:

 The score_mode setting determines how the scores of the various child 
 docs are attributed to the parent doc which is the final scored element.
 See 
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-nested-query.html#query-dsl-nested-query

 You can, for example, choose to take the average, max or sum of the scores of all the 
 child documents that match your nested query and reward the parent doc with 
 that value.
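
 A minimal sketch of where score_mode sits in a nested query (index name and search term are hypothetical; the mapping is the one shown in the question below):

 curl -XGET 'localhost:9200/people/person/_search' -d '{
   "query": {
     "nested": {
       "path": "city",
       "score_mode": "avg",
       "query": {
         "match": { "city.name": "london" }
       }
     }
   }
 }'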



 On Tuesday, October 21, 2014 9:56:51 AM UTC+1, ba...@intalex.com wrote:

 Hello,
 I am having a problem understanding how scoring of nested documents 
 works. I have found other people with similar questions which have remained 
 unanswered:


 http://stackoverflow.com/questions/25619632/elasticsearch-how-is-the-score-for-nested-queries-computed


 http://stackoverflow.com/questions/26263562/elasticsearch-boost-score-with-nested-query

 The relevant section of my current mapping (with nested parts) is:
 "mappings": {
   "person": {
     "properties": {
       "city": {
         "type": "nested",
         "properties": {
           "visityear": {
             "type": "integer"
           },
           "name": {
             "type": "string"
           }
         }
       }
     }
   }
 }

 If I have three people who have visited different numbers of cities and 
 I search for a common city they have all visited I get different score 
 values. The person who visited the greatest number of cities is ranked 
 first, with the person who visited only one city getting a score of 1 
 (currently ranked lowest). The output of the explanation is that hthe score 
 is based on 'child doc range from 0 to x'. My question is how do TF, IDF 
 and Field Norm work for nested documents when the score is being 
 calculated? 

 Many thanks,
 Barry





Re: copy index

2014-10-22 Thread eunever32
Jorg,

Thanks for the quick turnaround on putting in the fix.

What I found when I tested is that it works for "test" -> "testcopy".

But when I try "myindex" -> "myindexcopy", it doesn't work.

I noticed in the logs, when I was trying "myindex", that it was looking for 
an index called "test", which was a bit odd.

So I copied my "myindex" to an index literally named "test", and only then 
did it work.
So the only index that can be copied is "test"; 
the target index can be anything.

Logs:

[2014-10-22 12:05:07,649][INFO ][KnapsackPushAction   ] start of push: 
{mode:push,started:2014-10-22T11:05:07.648Z,node_name:Pathway}
[2014-10-22 12:05:07,649][INFO ][KnapsackService  ] update cluster 
settings: plugin.knapsack.export.state - 
[{mode:push,started:2014-10-22T11:05:07.648Z,node_name:Pathway}]
[2014-10-22 12:05:07,650][INFO ][KnapsackPushAction   ] 
map={myindex=myindexcopy}
[2014-10-22 12:05:07,650][INFO ][KnapsackPushAction   ] getting 
settings for indices [test, myindex]
[2014-10-22 12:05:07,651][INFO ][KnapsackPushAction   ] found indices: 
[test, myindex]
[2014-10-22 12:05:07,652][INFO ][KnapsackPushAction   ] getting 
mappings for index test and types []
[2014-10-22 12:05:07,652][INFO ][KnapsackPushAction   ] found mappings: 
[test]
[2014-10-22 12:05:07,653][INFO ][KnapsackPushAction   ] adding mapping: 
test
[2014-10-22 12:05:07,653][INFO ][KnapsackPushAction   ] creating index: 
test
[2014-10-22 12:05:07,672][INFO ][KnapsackPushAction   ] count=2 
status=OK

I guess you can put in a quick fix?

I would also have to ask: is anyone else using this?

And what are most people doing? Are there any plans by ES to create a 
product for this, or does the snapshot feature suffice for most people?

Again, I would just repeat my requirements: I want to change the mapping 
types for an existing index. Therefore I create my new index and copy the 
old index data into the new one.

Thanks in advance.

On Monday, October 20, 2014 8:42:48 PM UTC+1, Jörg Prante wrote:

 I admit there is something overcautious in the knapsack release to prevent 
 overwriting existing data. I will add a fix that will allow writing into an 
 empty index.

 https://github.com/jprante/elasticsearch-knapsack/issues/57

 Jörg

 On Mon, Oct 20, 2014 at 6:47 PM, eune...@gmail.com javascript: wrote:

 By the way
 Es version 1.3.4
 Knapsack version built with 1.3.4


 Regards.







Re: copy index

2014-10-22 Thread joergpra...@gmail.com
Yes, I can put up a fix - that looks weird.

Most users have either a constant mapping that can be extended dynamically, or
one that does not change for existing fields.

If fields have to change for future documents, you can also change the mapping
by using the alias technique:

- old index with old fields (no change)

- new index created with changed fields

- assigning an index alias to both indices

- search on index alias

No copy required.
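
A minimal sketch of the alias switch (index and alias names are hypothetical):

curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions": [
    { "add": { "index": "old_index", "alias": "myalias" } },
    { "add": { "index": "new_index", "alias": "myalias" } }
  ]
}'

Searches against "myalias" then cover both indices. Note that indexing against an alias that points to more than one index is rejected, so new documents would be written directly to "new_index".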

Jörg




On Wed, Oct 22, 2014 at 1:27 PM, euneve...@gmail.com wrote:

 Jorg,

 Thanks for the quick turnaround on putting in the fix.

 What I found when I tested is that it works for test, testcopy

 But when I try with myindex, myindexcopy doesn't work

 I noticed in the logs when I was trying myindex that it was looking for
 an index test which was a bit odd

 So I copied my myindex to an index named literally  test and only then
 it worked
 So the only index that can be copied is test
 The target index can be anything.

 Logs:

 [2014-10-22 12:05:07,649][INFO ][KnapsackPushAction   ] start of push:
 {mode:push,started:2014-10-22T11:05:07.648Z,node_name:Pathway}
 [2014-10-22 12:05:07,649][INFO ][KnapsackService  ] update cluster
 settings: plugin.knapsack.export.state -
 [{mode:push,started:2014-10-22T11:05:07.648Z,node_name:Pathway}]
 [2014-10-22 12:05:07,650][INFO ][KnapsackPushAction   ]
 map={myindex=myindexcopy}
 [2014-10-22 12:05:07,650][INFO ][KnapsackPushAction   ] getting
 settings for indices [test, myindex]
 [2014-10-22 12:05:07,651][INFO ][KnapsackPushAction   ] found indices:
 [test, myindex]
 [2014-10-22 12:05:07,652][INFO ][KnapsackPushAction   ] getting
 mappings for index test and types []
 [2014-10-22 12:05:07,652][INFO ][KnapsackPushAction   ] found
 mappings: [test]
 [2014-10-22 12:05:07,653][INFO ][KnapsackPushAction   ] adding
 mapping: test
 [2014-10-22 12:05:07,653][INFO ][KnapsackPushAction   ] creating
 index: test
 [2014-10-22 12:05:07,672][INFO ][KnapsackPushAction   ] count=2
 status=OK

 I guess you can put in a quick fix?

 I would have to ask if anyone is using this?

 And what are most people doing? Are there any plans by ES to create a
 product or does the snapshot feature suffice for most people?

 Again I just would repeat my requirements: I want to change the mapping
 types for an existing index. Therefore I create my new index and copy the
 old index data into the new.

 Thanks in advance.

 On Monday, October 20, 2014 8:42:48 PM UTC+1, Jörg Prante wrote:

 I admit there is something overcautious in the knapsack release to
 prevent overwriting existing data. I will add a fix that will allow writing
 into an empty index.

 https://github.com/jprante/elasticsearch-knapsack/issues/57

 Jörg

 On Mon, Oct 20, 2014 at 6:47 PM, eune...@gmail.com wrote:

 By the way
 Es version 1.3.4
 Knapsack version built with 1.3.4


 Regards.







Re: copy index

2014-10-22 Thread joergpra...@gmail.com
I think you have to set up the curl command like this:

curl -XPOST
'localhost:9200/yourindex/_push?map=\{"yourindex":"yournewindex"\}'

to push the index "yourindex" to another one. Note the endpoint.

What does your curl command look like?

Jörg

On Wed, Oct 22, 2014 at 1:27 PM, euneve...@gmail.com wrote:

 Jorg,

 Thanks for the quick turnaround on putting in the fix.

 What I found when I tested is that it works for test, testcopy

 But when I try with myindex, myindexcopy doesn't work

 I noticed in the logs when I was trying myindex that it was looking for
 an index test which was a bit odd

 So I copied my myindex to an index named literally  test and only then
 it worked
 So the only index that can be copied is test
 The target index can be anything.

 Logs:

 [2014-10-22 12:05:07,649][INFO ][KnapsackPushAction   ] start of push:
 {mode:push,started:2014-10-22T11:05:07.648Z,node_name:Pathway}
 [2014-10-22 12:05:07,649][INFO ][KnapsackService  ] update cluster
 settings: plugin.knapsack.export.state -
 [{mode:push,started:2014-10-22T11:05:07.648Z,node_name:Pathway}]
 [2014-10-22 12:05:07,650][INFO ][KnapsackPushAction   ]
 map={myindex=myindexcopy}
 [2014-10-22 12:05:07,650][INFO ][KnapsackPushAction   ] getting
 settings for indices [test, myindex]
 [2014-10-22 12:05:07,651][INFO ][KnapsackPushAction   ] found indices:
 [test, myindex]
 [2014-10-22 12:05:07,652][INFO ][KnapsackPushAction   ] getting
 mappings for index test and types []
 [2014-10-22 12:05:07,652][INFO ][KnapsackPushAction   ] found
 mappings: [test]
 [2014-10-22 12:05:07,653][INFO ][KnapsackPushAction   ] adding
 mapping: test
 [2014-10-22 12:05:07,653][INFO ][KnapsackPushAction   ] creating
 index: test
 [2014-10-22 12:05:07,672][INFO ][KnapsackPushAction   ] count=2
 status=OK

 I guess you can put in a quick fix?

 I would have to ask if anyone is using this?

 And what are most people doing? Are there any plans by ES to create a
 product or does the snapshot feature suffice for most people?

 Again I just would repeat my requirements: I want to change the mapping
 types for an existing index. Therefore I create my new index and copy the
 old index data into the new.

 Thanks in advance.

 On Monday, October 20, 2014 8:42:48 PM UTC+1, Jörg Prante wrote:

 I admit there is something overcautious in the knapsack release to
 prevent overwriting existing data. I will add a fix that will allow writing
 into an empty index.

 https://github.com/jprante/elasticsearch-knapsack/issues/57

 Jörg

 On Mon, Oct 20, 2014 at 6:47 PM, eune...@gmail.com wrote:

 By the way
 Es version 1.3.4
 Knapsack version built with 1.3.4


 Regards.







Re: copy index

2014-10-22 Thread eunever32
Hey Jorg,

Correct. Whew!

If I run just

curl -XPOST 'localhost:9200/_push?map=\{"myindex":"myindexcopy"\}'

it works fine.

By the way: is there any way to make this work in Sense? E.g.

POST /_push?map=\{"myindex":"myindexcopy"\}

POST /_push
{
  "map": {
    "myindex": "myindexcopy"
  }
}

The second one will submit in Sense but results in an empty map={}.

And is there any plan to put a GUI around it?

Aside: I still see these errors in the ES logs

[2014-10-22 13:46:25,736][INFO ][client.transport ] [Astronomer] 
failed to get local cluster state for [#transport#-2][HDQWK037][inet[/10.193
org.elasticsearch.transport.RemoteTransportException: [Abigail 
Brand][inet[/10.193.5.155:9301]][cluster/state]
Caused by: org.elasticsearch.transport.RemoteTransportException: [Abigail 
Brand][inet[/10.193.5.155:9301]][cluster/state]
Caused by: java.lang.IndexOutOfBoundsException: Readable byte limit 
exceeded: 48
at 
org.elasticsearch.common.netty.buffer.AbstractChannelBuffer.readByte(AbstractChannelBuffer.java:236)
at 
org.elasticsearch.transport.netty.ChannelBufferStreamInput.readByte(ChannelBufferStreamInput.java:132)
at 
org.elasticsearch.common.io.stream.StreamInput.readVInt(StreamInput.java:141)
at 
org.elasticsearch.common.io.stream.StreamInput.readString(StreamInput.java:272)
at 
org.elasticsearch.common.io.stream.HandlesStreamInput.readString(HandlesStreamInput.java:61)
at 
org.elasticsearch.common.io.stream.StreamInput.readStringArray(StreamInput.java:362)
at 
org.elasticsearch.action.admin.cluster.state.ClusterStateRequest.readFrom(ClusterStateRequest.java:132)
at 
org.elasticsearch.transport.netty.MessageChannelHandler.handleRequest(MessageChannelHandler.java:209)
at 
org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:109)
at 
org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at 
org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at 
org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at 
org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at 
org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
at 
org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
at 
org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at 
org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at 
org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at 
org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at 
org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
at 
org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at 
org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at 
org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at 
org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at 
org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at 
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at 
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
at 
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at 
org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at 
org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at 
org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

On Wednesday, October 22, 2014 1:27:59 PM UTC+1, Jörg Prante wrote:

 I think you have to set up such a curl command like this

 curl -XPOST 
 'localhost:9200/yourindex/_push?map=\{yourindex:yournewindex\}'

 to push the index yourindex to another one. Note the endpoint. 

 How does your curl look like?

 Jörg

 On Wed, Oct 22, 2014 at 1:27 PM, 

System integrations

2014-10-22 Thread Roel van den Berg
 

Hi everyone,

I am looking into the possibility of retrieving data from development tools and 
displaying some correlations in the data.

To do this I first need to index the following systems:

   1.  Jira
   2.  Git
   3. Sonar
   4. Jenkins

I have found connections for Jira and GIT. These connections are in the 
form of rivers.
https://github.com/searchisko/elasticsearch-river-jira
https://github.com/obazoud/elasticsearch-river-git 

Does anyone know of existing integrations for Sonar or Jenkins?

Thanks,

Roel



Re: copy index

2014-10-22 Thread joergpra...@gmail.com
I cannot use the HTTP request body because it is reserved for a search
request, like in the _search endpoint. So you can push a part of the index
to a new index (the search hits).

The message "failed to get local cluster state for" is on INFO level, so I
think it is not an error.

A GUI is a long-term project in another context, good for the whole
community. I am unsure how to develop a replacement for the Sense plugin.
Maybe a Firefox plugin will arrive some time. I don't know.

Jörg

On Wed, Oct 22, 2014 at 3:21 PM, euneve...@gmail.com wrote:

 Hey Jorg,

 Correct. Whew!

 If I run just curl -XPOST 'localhost:9200/_push?map=\{myindex:
 myindexcopy\}'

 it works fine.

 By the way : is there any way to make this work in sense eg
 POST /_push?map=\{myindex:myindexcopy\}
 POST /_push
 {
   map: {
 myindex:myindexcopy
   }
 }

 The second one will submit in sense but results in empty map={}

 And is there any plan to put a gui around it?

 Aside: I still see these errors in the ES logs

 [2014-10-22 13:46:25,736][INFO ][client.transport ] [Astronomer]
 failed to get local cluster state for [#transport#-2][HDQWK037][inet[/10.193
 org.elasticsearch.transport.RemoteTransportException: [Abigail
 Brand][inet[/10.193.5.155:9301]][cluster/state]
 Caused by: org.elasticsearch.transport.RemoteTransportException: [Abigail
 Brand][inet[/10.193.5.155:9301]][cluster/state]

 Caused by: java.lang.IndexOutOfBoundsException: Readable byte limit
 exceeded: 48
 at
 org.elasticsearch.common.netty.buffer.AbstractChannelBuffer.readByte(AbstractChannelBuffer.java:236)
 at
 org.elasticsearch.transport.netty.ChannelBufferStreamInput.readByte(ChannelBufferStreamInput.java:132)
 at
 org.elasticsearch.common.io.stream.StreamInput.readVInt(StreamInput.java:141)
 at
 org.elasticsearch.common.io.stream.StreamInput.readString(StreamInput.java:272)
 at
 org.elasticsearch.common.io.stream.HandlesStreamInput.readString(HandlesStreamInput.java:61)
 at
 org.elasticsearch.common.io.stream.StreamInput.readStringArray(StreamInput.java:362)
 at
 org.elasticsearch.action.admin.cluster.state.ClusterStateRequest.readFrom(ClusterStateRequest.java:132)
 at
 org.elasticsearch.transport.netty.MessageChannelHandler.handleRequest(MessageChannelHandler.java:209)
 at
 org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:109)
 at
 org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
 at
 org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
 at
 org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
 at
 org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
 at
 org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
 at
 org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
 at
 org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
 at
 org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
 at
 org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
 at
 org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
 at
 org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
 at
 org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
 at
 org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
 at
 org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
 at
 org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
 at
 org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
 at
 org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
 at
 org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
 at
 org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
 at
 org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
 at
 org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
 at
 

how to use es analyzer for compound words?

2014-10-22 Thread sebninse
Here is an example of an index with some documents containing Dutch 
compound words:
https://gist.github.com/herrvonb/0a247aa7dfd0d155b418


"plaatstaal" is a Dutch compound word. So after the custom analyzer "dutch" 
has been assigned to the field "test", I expected a search for "plaat" to 
return at least one hit.
This is what I get:

search for "plaat":
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

Any idea why this is not working?
Thanks for your help



Shard Configuration via Puppet Module

2014-10-22 Thread José Andrés
Hi,

Does the Puppet module allow configuration of the shards for an index? I 
have Logstash sending data to Elasticsearch and the default of 5 shards is 
set; can I change this via Puppet?

Or, can I set the shards and replicas in the Logstash conf file?
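
If it helps, one option that works regardless of how the nodes are managed is an index template matching the Logstash index pattern; a minimal sketch (the template name and shard counts below are arbitrary examples):

curl -XPUT 'localhost:9200/_template/logstash_shards' -d '{
  "template": "logstash-*",
  "order": 1,
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}'

It only affects indices created after the template is in place.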

Thank you,
-Jose Andres



Re: Reading Epoch As @timestamp

2014-10-22 Thread ES USER
Antonio's example works.  My problem was a syntax issue as the Logstash 
docs do not really have examples.  I was not able to figure out the 
formatting.



On Wednesday, October 22, 2014 1:08:33 AM UTC-4, vineeth mohan wrote:

 Hello Antonio , 

 I am aware of this. 
 The example you have quoted should actually work.
 Why do you feel that it's not working?

 Thanks
   Vineeth

 On Tue, Oct 21, 2014 at 7:38 PM, Antonio Augusto Santos mkh...@gmail.com 
 javascript: wrote:

 If you are using logstash to push your events do ES you need something 
 like this:

   date {
     match => [ "field_with_the_epoch", "UNIX" ]
   }


 Read more about it here:  http://logstash.net/docs/1.4.2/filters/date

 On Tuesday, October 21, 2014 8:43:08 AM UTC-3, ES USER wrote:

 For the life of me my Google searching has not revealed any solution to 
 this at least none that work for me.  I have log data with an Epoch 
 timestamp in it and would like to use the date filter in Logstash to 
 overwrite @timestamp with the appropriate converted timestamp derived from 
 that epoch.  Any insight on this would be much appreciated.







Is it possible to access a variable inside the include area?

2014-10-22 Thread Ramy
Hi everyone,

How can I use a variable inside the include area? Is it possible? How?
I'm using a groovy script in my query, and it looks like this:

GET /my_index/my_type/_search
{
  "_source": false,
  "query": {
    "bool": {
      ...
    }
  },
  "aggs": {
    "group": {
      "nested": {
        "path": "my_path"
      },
      "aggs": {
        "path_id": {
          "terms": {
            "field": "my_path.id",
            "lang": "groovy",
            "script": "def myVar = (_value.split('.').findIndexOf { it == '3238175' }+1); myVar + '/' + _value;",
            "include": "myVar/.*",
            "size": 0
          }
        }
      }
    }
  }
}

The variable myVar works inside "script". Is it possible to use it 
inside "include"?
Thanks



Re: how to use es analyzer for compound words?

2014-10-22 Thread joergpra...@gmail.com
For decompounding, you need a more sophisticated algorithm, like in my
plugin

https://github.com/jprante/elasticsearch-analysis-decompound

which provides decompounding for German words.

Jörg
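
As an aside, Elasticsearch also ships a dictionary-based compound token filter; a minimal sketch with a tiny hard-coded word list (index and analyzer names are hypothetical):

curl -XPUT 'localhost:9200/decomp_test' -d '{
  "settings": {
    "analysis": {
      "filter": {
        "dutch_decomp": {
          "type": "dictionary_decompounder",
          "word_list": ["plaat", "staal"]
        }
      },
      "analyzer": {
        "dutch_compound": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "dutch_decomp"]
        }
      }
    }
  }
}'

# "plaatstaal" should then also emit the sub-tokens "plaat" and "staal"
curl 'localhost:9200/decomp_test/_analyze?analyzer=dutch_compound&text=plaatstaal'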

On Wed, Oct 22, 2014 at 4:09 PM, sebninse sebni...@gmail.com wrote:

 here is an example for an index with some documents containing dutch
 compound words.
 https://gist.github.com/herrvonb/0a247aa7dfd0d155b418


 plaatstaal is a dutch compound word. So after the custom analyzer
 dutch has been assigned to the field test I expected a search for plaat
 would return at least one hit.
 That's what i get:

 search for plaat
 {
   took : 2,
   timed_out : false,
   _shards : {
 total : 1,
 successful : 1,
 failed : 0
   },
   hits : {
 total : 0,
 max_score : null,
 hits : [ ]
   }
 }

 Any idea why this is not working?
 Thanks for your help





[hadoop][pig] Using ES UDF to connect over HTTPS through Apache to ES

2014-10-22 Thread Aidan Higgins
Hi,
I am trying to configure a system to use both Basic Authentication and 
HTTPS to store data into Elasticsearch.

My system is configured with a Pig script running through Hadoop to connect 
to Apache (configured as a proxy) to forward the request to ElasticSearch. 
Using simple HTTP and Basic Authentication works correctly. However, when I 
try to force my ES UDF to use HTTPS, I get errors in my Apache logs and my 
job fails.

The relevant snippet of my Pig script is below:

REGISTER /bigdata/cloudera/ES_HadoopJar/elasticsearch-hadoop-2.0.2/dist/elasticsearch-hadoop-2.0.2.jar

DEFINE EsStorage org.elasticsearch.hadoop.pig.EsStorage(
'es.nodes=https://127.0.0.1:28443',
'es.net.proxy.http.host=https://127.0.0.1',
'es.net.proxy.http.port=28443',
'es.net.proxy.http.user=myuser',
'es.net.proxy.http.pass=mypass',
'es.http.retries=10');

data = LOAD... ...

STORE data INTO 'my_data_index/data' USING EsStorage;


The error output to the Apache log is as follows:

SSL Library Error: error:1407609B:SSL 
routines:SSL23_GET_CLIENT_HELLO:https proxy request -- speaking HTTP to 
HTTPS port!?

The error/stacktrace from my Map job is as follows:
*Error: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: 
Connection error (check network and/or proxy settings)- all nodes failed;*
*tried [[https://127.0.0.1:28443]] at*
*org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:123) 
at *
*org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:300) at *
*org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:284) at *
*org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:288) at *
*org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:117) at *
*org.elasticsearch.hadoop.rest.RestClient.discoverNodes(RestClient.java:99) 
at *
*org.elasticsearch.hadoop.rest.InitializationUtils.discoverNodesIfNeeded(InitializationUtils.java:59)
 
at *
*org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.init(EsOutputFormat.java:180)
 
at *
*org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.write(EsOutputFormat.java:157)
 
at *
*org.elasticsearch.hadoop.pig.EsStorage.putNext(EsStorage.java:196) at *
*org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
 
at *
*org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
 
at *
*org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:635)
 
at *
*org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
 
at *
*org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
 
at *
*org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
 
at *
*org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:284)
 
at *
*org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
 
at *
*org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
 
at *
*org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) at *
*org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) at *
*org.apache.hadoop.mapred.MapTask.run(MapTask.java:340) at *
*org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at *
*java.security.AccessController.doPrivileged(Native Method) at *
*javax.security.auth.Subject.doAs(Subject.java:415) at *
*org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
 
at *
*org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)*


So my question is, is this possible (i.e. can it work)? And if so, where am 
I going wrong?

Thanks in advance for any help.

Aidan



Re: Is it possible to access a variable inside the include area?

2014-10-22 Thread Ramy
I have solved it with a RegEx:

GET /my_index/my_type/_search
{
  "_source": false,
  "query": {
    "bool": {
      ...
    }
  },
  "aggs": {
    "group": {
      "nested": {
        "path": "my_path"
      },
      "aggs": {
        "path_id": {
          "terms": {
            "field": "my_path.id",
            "lang": "groovy",
            "script": "def myVar = (_value.split('.').findIndexOf { it == '3238175' }+1); myVar + '/' + _value;",
            "include": "[1-4]/.*",
            "size": 0
          }
        }
      }
    }
  }
}




Question about Elasticsearch and Spark

2014-10-22 Thread Ramdev Wudali
Hi:
   I have a very simple application that queries an ES instance and returns 
the count of documents found by the query.  I am using the Spark interface 
as I intend to run ML algorithms on the result set. With that said, here are the 
problems I face:

1. If I set up the Configuration (to use with newAPIHadoopRDD) or JobConf 
(to use with hadoopRDD) using a remote ES instance, like so:


This is using the newAPIHadoopRDD interface:

  val sparkConf = new 
    SparkConf().setMaster("local[2]").setAppName("TestESSpark")
  sparkConf.set("spark.serializer", classOf[KryoSerializer].getName)
  val sc = new SparkContext(sparkConf)

  val conf = new Configuration   // change to new JobConf for the old API
  conf.set("es.nodes", "remote.server:port")
  conf.set("es.resources", "index/type")
  conf.set("es.query", "{\"query\":{\"match_all\":{}}}")
  val esRDD = 
    sc.newAPIHadoopRDD(conf, classOf[EsInputFormat[Text,MapWritable]], classOf[Text], classOf[MapWritable])
  // change to hadoopRDD for the old API
  val docCount = esRDD.count
  println(docCount)


The application just hangs at the println (basically while executing the 
search, or so I think).


2. If I use "localhost" instead of "remote.server:port" for es.nodes, the 
application throws an exception:
Exception in thread main 
org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection 
error (check network and/or proxy settings)- all nodes failed; tried 
[[localhost:9200]] 
at 
org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:123)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:303)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:287)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:291)
at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:118)
at 
org.elasticsearch.hadoop.rest.RestClient.discoverNodes(RestClient.java:100)
at 
org.elasticsearch.hadoop.rest.InitializationUtils.discoverNodesIfNeeded(InitializationUtils.java:57)
at 
org.elasticsearch.hadoop.rest.RestService.findPartitions(RestService.java:220)
at 
org.elasticsearch.hadoop.mr.EsInputFormat.getSplits(EsInputFormat.java:406)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:179)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1135)
at org.apache.spark.rdd.RDD.count(RDD.scala:904)
at 
trgr.rd.newsplus.pairgen.ElasticSparkTest1$.main(ElasticSparkTest1.scala:59)
at trgr.rd.newsplus.pairgen.ElasticSparkTest1.main(ElasticSparkTest1.scala)


I am using version 2.1.0.Beta2 of the elasticsearch-hadoop library, 
running against a local ES 1.3.2 instance / a remote ES 1.0.0 instance.

Any insight as to what I might be missing or doing wrong?

Thanks

Ramdev



Re: how to retrieve cluster and node stats on data node when disable http (http.enabled: false)

2014-10-22 Thread Terence Tung
The search is still using the HTTP API. We have an ELB and 3 dedicated client 
nodes behind it, so all search requests go through that ELB via HTTP. Monitoring 
stats on the client nodes is not a problem; the problem is that I cannot 
curl localhost:9200 on the dedicated master and data nodes.
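
Note that the cluster stats and node stats APIs return data for every node in the cluster, including the nodes with http.enabled: false, as long as the request is sent to a node that still exposes HTTP; a minimal sketch, assuming one of the client nodes is reachable on port 9200 (the host name is hypothetical):

curl 'client-node:9200/_cluster/stats?pretty'
curl 'client-node:9200/_nodes/stats?pretty'

The heap usage, doc counts and thread pool numbers for the master and data nodes appear in the per-node sections of the _nodes/stats response.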


On Tuesday, October 21, 2014 11:13:37 PM UTC-7, David Pilato wrote:

 How do you search in your cluster?
 Are you using Java Client?



 Le 22 oct. 2014 à 03:17, Terence Tung ter...@teambanjo.com javascript: 
 a écrit :

 hi there,

 i followed the recommendation from 
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-node.html
  
 to create dedicated master, client and data nodes. for my master and data 
 nodes, i disabled http.enabled so they will communicate via transport 9300. 
 however, previously we were using curl localhost:9200/_cluster/stats and 
 /_node/stats to fetch monitoring stats(e.g. heap usage, num of docs, thread 
 counts, and etc). my question is how can i fetch these monitoring stats 
 anymore? i searched and read thru elasticsearch doc but couldn't find it.

 please help.

 thanks and really appreciate for any help.






Re: allow_explicit_index and _bulk

2014-10-22 Thread Niccolò Becchi
This issue looks to be fixed on 
https://github.com/elasticsearch/elasticsearch/issues/4668

However, on elasticsearch-1.3.4, running the example with 
rest.action.multi.allow_explicit_index: false:
```
POST /foo/bar/_bulk
{ "index": {} }
{ "_id" : 1, "baz": "foobar" }
```
I am getting the exception:
```
{
   "took": 1,
   "errors": true,
   "items": [
      {
         "create": {
            "_index": "foo",
            "_type": "bar",
            "_id": "oX0Xp8dzRbySZiKX8QI0zw",
            "status": 400,
            "error": "MapperParsingException[failed to parse [_id]]; 
nested: MapperParsingException[Provided id [oX0Xp8dzRbySZiKX8QI0zw] does 
not match the content one [1]]; "
         }
      }
   ]
}
```
Am I doing something wrong, or has something changed?

Il giorno giovedì 9 gennaio 2014 15:38:46 UTC, Gabe Gorelick-Feldman ha 
scritto:

 Opened an issue: 
 https://github.com/elasticsearch/elasticsearch/issues/4668

 On Thursday, January 9, 2014 3:39:39 AM UTC-5, Alexander Reelsen wrote:

 Hey,

 after having a very quick look, it looks like a bug (or wrong 
 documentation, need to check further). Can you create a github issue?

 Thanks!


 --Alex


 On Wed, Jan 8, 2014 at 11:08 PM, Gabe Gorelick-Feldman 
 gabego...@gmail.com wrote:

 The documentation on URL-based access control 
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/url-access-control.html
  implies 
 that _bulk still works if you set rest.action.multi.allow_explicit_index: 
 false, as long as you specify the index in the URL. However, I can't 
 get it to work.

 POST /foo/bar/_bulk
 { index: {} }
 { _id: 1234, baz: foobar }

 returns 

 explicit index in bulk is not allowed

 Should this work?







Limit bug or Limit misunderstanding?

2014-10-22 Thread Jeff Gandt
I have a query that I want to return only one document. Basically, I want 
to do an existence check on a document with a given term filter.

I am executing:

POST profiles/profile/_search
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "limit": {
                "value": 1
              }
            },
            {
              "term": {
                "profile_id": "salinger-23145"
              }
            }
          ]
        }
      }
    }
  }
}

The profiles/profile mapping has tens of millions of documents in it, two 
of which match the given terms query (when the limit is removed entirely).

When I execute the query, I get zero results back. However, If I change the 
limit value to two (2) then one (1) result is returned. If I change the 
limit value to three (3) then two (2) results are returned. It's almost 
like there is an off by one error in limit.

So am I:

1) Writing the query wrong?
I tried placing the limit outside of the must, bool, and filter clauses. 
That caused errors in each case. But I may have just done something silly.

2) Misunderstanding limit?
My understanding of limit is that it returns no more than x documents per 
shard. Given that I have five shards and at least two documents matching 
the query, I should be returning between one and five documents. However, 
looking at the limit documentation 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-limit-filter.html
 I 
suspect that I may be misunderstanding how limit works. The wording to 
execute on leads me to believe that it may only be selecting ONE document 
against which the term filter is run. Thus, if the one document that it 
tests doesn't match, it returns zero results. However, the limit 2 
returning one document leads me to believe that my original understanding 
is correct.

3) Staring at an elasticsearch limit bug?
Unfortunately I have been unable to reproduce the error after creating test 
indexes and mappings. The limit behaves exactly as I expect in every other 
case.

4) Doing something else that is equally silly?

Any help or suggestions are appreciated. Can I provide any clarifications?

Thanks,

.jpg



Re: Limit bug or Limit misunderstanding?

2014-10-22 Thread joergpra...@gmail.com
limit is not a limit for response size. It sets a shard-level limit which is
quite low level, so that the resources per shard of ES are not put under as much
pressure. Whether the sum of the per-shard limits matches the total length of
the response is not guaranteed.

The limit parameter for the response is the size parameter. Can you try

POST profiles/profile/_search
{
  "size": 1,
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "profile_id": "salinger-23145"
        }
      }
    }
  }
}

and see if this works better?

If you want to perform a true existence check of a doc, you should use the
doc _id and a head request, something like

HEAD profiles/profile/id

which is faster than a search.

Jörg




Re: Question about Elasticsearch and Spark

2014-10-22 Thread Ramdev Wudali
An Update on this :

The exception mentioned in Item 2 of my original post was due to the ES 
instance being down (and for some reason I failed to realise that).
That said, I am still having trouble with Item 1. The following questions came up:

1. Is there a correlation between the number of shards/replicas on the ES 
instance and the number of shard-splits that are created for the query 
request? And
2. If the ES instance has a single shard and a fairly large number of 
documents, would the performance be slower?
3. Are there any network latency issues? (I am able to query the instance 
using the sense/head plugins, and the response time is not bad: it is 
approximately 28ms.)


the reason for question 1. is because of the following : 

 6738 [main] INFO  org.elasticsearch.hadoop.mr.EsInputFormat - Created [2] 
shard-splits
6780 [main] INFO  org.apache.spark.SparkContext - Starting job: count at 
ElasticSparkTest1.scala:59
6801 [sparkDriver-akka.actor.default-dispatcher-5] INFO 
 org.apache.spark.scheduler.DAGScheduler - Got job 0 (count at 
ElasticSparkTest1.scala:59) with 2 output partitions (allowLocal=false)
6802 [sparkDriver-akka.actor.default-dispatcher-5] INFO 
 org.apache.spark.scheduler.DAGScheduler - Final stage: Stage 0(count at 
ElasticSparkTest1.scala:59)
6802 [sparkDriver-akka.actor.default-dispatcher-5] INFO 
 org.apache.spark.scheduler.DAGScheduler - Parents of final stage: List()
6808 [sparkDriver-akka.actor.default-dispatcher-5] INFO 
 org.apache.spark.scheduler.DAGScheduler - Missing parents: List()
6818 [sparkDriver-akka.actor.default-dispatcher-5] INFO 
 org.apache.spark.scheduler.DAGScheduler - Submitting Stage 0 
(NewHadoopRDD[0] at newAPIHadoopRDD at ElasticSparkTest1.scala:57), which 
has no missing parents
6853 [sparkDriver-akka.actor.default-dispatcher-5] INFO 
 org.apache.spark.storage.MemoryStore - ensureFreeSpace(1568) called with 
curMem=34372, maxMem=503344005
6854 [sparkDriver-akka.actor.default-dispatcher-5] INFO 
 org.apache.spark.storage.MemoryStore - Block broadcast_1 stored as values 
in memory (estimated size 1568.0 B, free 480.0 MB)
6870 [sparkDriver-akka.actor.default-dispatcher-5] INFO 
 org.apache.spark.scheduler.DAGScheduler - Submitting 2 missing tasks from 
Stage 0 (NewHadoopRDD[0] at newAPIHadoopRDD at ElasticSparkTest1.scala:57)
6872 [sparkDriver-akka.actor.default-dispatcher-5] INFO 
 org.apache.spark.scheduler.TaskSchedulerImpl - Adding task set 0.0 with 2 
tasks
6912 [sparkDriver-akka.actor.default-dispatcher-2] INFO 
 org.apache.spark.scheduler.TaskSetManager - Starting task 0.0 in stage 0.0 
(TID 0, localhost, ANY, 18521 bytes)
6917 [sparkDriver-akka.actor.default-dispatcher-2] INFO 
 org.apache.spark.scheduler.TaskSetManager - Starting task 1.0 in stage 0.0 
(TID 1, localhost, ANY, 18521 bytes)
6923 [Executor task launch worker-0] INFO 
 org.apache.spark.executor.Executor - Running task 0.0 in stage 0.0 (TID 0)
6923 [Executor task launch worker-1] INFO 
 org.apache.spark.executor.Executor - Running task 1.0 in stage 0.0 (TID 1)
6958 [Executor task launch worker-0] INFO 
 org.apache.spark.rdd.NewHadoopRDD - Input split: ShardInputSplit 
[node=[ZIbTPE4FSxigrYkomftWQw/Strobe|192.189.224.80:9600],shard=1]
6958 [Executor task launch worker-1] INFO 
 org.apache.spark.rdd.NewHadoopRDD - Input split: ShardInputSplit 
[node=[ZIbTPE4FSxigrYkomftWQw/Strobe|192.189.224.80:9600],shard=0]
6998 [Executor task launch worker-0] WARN 
 org.elasticsearch.hadoop.mr.EsInputFormat - Cannot determine task id...
6998 [Executor task launch worker-1] WARN 
 org.elasticsearch.hadoop.mr.EsInputFormat - Cannot determine task id...

I noticed only two shard-splits being created. 

On the other hand when I run the application on localhost with default 
settings, this is what I get  :
4960 [main] INFO  org.elasticsearch.hadoop.mr.EsInputFormat - Created [5] 
shard-splits
5002 [main] INFO  org.apache.spark.SparkContext - Starting job: count at 
ElasticSparkTest1.scala:59
5022 [sparkDriver-akka.actor.default-dispatcher-5] INFO 
 org.apache.spark.scheduler.DAGScheduler - Got job 0 (count at 
ElasticSparkTest1.scala:59) with 5 output partitions (allowLocal=false)
5023 [sparkDriver-akka.actor.default-dispatcher-5] INFO 
 org.apache.spark.scheduler.DAGScheduler - Final stage: Stage 0(count at 
ElasticSparkTest1.scala:59)
5023 [sparkDriver-akka.actor.default-dispatcher-5] INFO 
 org.apache.spark.scheduler.DAGScheduler - Parents of final stage: List()
5030 [sparkDriver-akka.actor.default-dispatcher-5] INFO 
 org.apache.spark.scheduler.DAGScheduler - Missing parents: List()
5040 [sparkDriver-akka.actor.default-dispatcher-5] INFO 
 org.apache.spark.scheduler.DAGScheduler - Submitting Stage 0 
(NewHadoopRDD[0] at newAPIHadoopRDD at ElasticSparkTest1.scala:57), which 
has no missing parents
5075 [sparkDriver-akka.actor.default-dispatcher-5] INFO 
 org.apache.spark.storage.MemoryStore - ensureFreeSpace(1568) called with 
curMem=34340, maxMem=511377408
5076 
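
For question 1, a hedged note: as far as I understand es-hadoop, EsInputFormat creates 
one input split per shard of the target index, so "Created [2] shard-splits" versus 
"Created [5] shard-splits" would simply mirror the shard counts of the two indices 
rather than anything Spark-specific. A quick way to confirm the shard layout (the index 
name below is a placeholder):

curl 'localhost:9200/_cat/shards/myindex?v'
curl 'localhost:9200/myindex/_settings?pretty'

If the remote index really has only 2 primary shards, fewer splits and therefore fewer 
parallel tasks would also go some way towards answering question 2.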

CorruptIndexException when trying to replicate one shard of a new index

2014-10-22 Thread Nate Folkert
Created and populated a new index on a 1.3.1 cluster.  Primary shards work 
fine.  Updated the index to create several replicas, and three of the four 
shards replicated, but one shard fails to replicate on any node with the 
following error (abbreviated some of the hashes for readability):

[2014-10-22 20:31:54,549][WARN ][index.engine.internal] [NODENAME] 
 [INDEXNAME][2] failed engine [corrupted preexisting index]

 [2014-10-22 20:31:54,549][WARN ][indices.cluster  ] [NODENAME] 
 [INDEXNAME][2] failed to start shard

 org.apache.lucene.index.CorruptIndexException: [INDEXNAME][2] Corrupted 
 index [CORRUPTED] caused by: CorruptIndexException[codec footer mismatch: 
 actual footer=1161826848 vs expected footer=-1071082520 (resource: 
 MMapIndexInput(path=DATAPATH/INDEXNAME/2/index/_7cp.fdt))]

 at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:343)

 at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:328)

 at 
 org.elasticsearch.indices.cluster.IndicesClusterStateService.applyInitializingShard(IndicesClusterStateService.java:723)

 at 
 org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewOrUpdatedShards(IndicesClusterStateService.java:576)

 at 
 org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:183)

 at 
 org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:444)

 at 
 org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:153)

 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

 at java.lang.Thread.run(Thread.java:745)

 [2014-10-22 20:31:54,549][WARN ][cluster.action.shard ] [NODENAME] 
 [INDEXNAME][2] sending failed shard for [INDEXNAME][2], node[NODEID], [R], 
 s[INITIALIZING], indexUUID [INDEXID], reason [Failed to start shard, 
 message [CorruptIndexException[[INDEXNAME][2] Corrupted index [CORRUPTED] 
 caused by: CorruptIndexException[codec footer mismatch: actual 
 footer=1161826848 vs expected footer=-1071082520 (resource: 
 MMapIndexInput(path=DATAPATH/INDEXNAME/2/index/_7cp.fdt))

 [2014-10-22 20:31:54,550][WARN ][cluster.action.shard ] [NODENAME] 
 [INDEXNAME][2] sending failed shard for [INDEXNAME][2], node[NODEID], [R], 
 s[INITIALIZING], indexUUID [INDEXID], reason [engine failure, message 
 [corrupted preexisting index][CorruptIndexException[[INDEXNAME][2] 
 Corrupted index [CORRUPTED] caused by: CorruptIndexException[codec footer 
 mismatch: actual footer=1161826848 vs expected footer=-1071082520 
 (resource: MMapIndexInput(path=DATAPATH/INDEXNAME/2/index/_7cp.fdt))


The index is stuck now in a state where the shards try to replicate on one 
set of nodes, hit this failure, and then switch to try to replicate on a 
different set of nodes.  Have been looking around to see if anyone's 
encountered a similar issue but haven't found anything useful yet.  Anybody 
know if this is recoverable or if I should just scrap it and try building a 
new one?

- Nate




Re: Limit bug or Limit misunderstanding?

2014-10-22 Thread Jeff Gandt
I realize limit is not a limit for response size. I'm actually ok with 
getting more than one result. I'm actually not relying on limit for a size.

I often use size in conjunction with limit. I'll do this when I really 
don't care how many items I get back, as long as it is within a range. But 
I implement the limit to help decrease the load on the shards.

That said, I need to understand what expectations I can have around limit. 
Is it completely non-deterministic? Or can I have reasonable expectations 
about it?

I will propose an example and describe my expectations:

Node setup:
1 index
1 mapping
5 shards
1,000,000 documents sharded across the 5 shards
1000 matching documents sharded across the 5 shards
let's assume normal distribution of the matching documents: 200 documents 
per shard. I realize this is not realistic to get an exact distribution 
like this.

If I place a limit of 5 on the query, I expect 25 documents back. That is, 
I get 5 documents from each node. I expect this because I have at least 5 
matching documents per shard. In fact, I have many more than 5 matching 
documents per shard. But I expect the limit to return five documents from 
each shard.

Now I realize there are lots of real world circumstance that would cause 
the query to return fewer than 25 documents. Let's ignore those for the 
time being and remain under the assumption that the distribution is even.

Now, if I place a limit of 1 on the query, I expect 5 documents back.

Are these two expectations correct?

Now let's assume a worst case scenario: all of the matching documents are 
on one shard. A limit of 5 should still return 5 documents. A limit of 1 
should return 1 document.

If these expectations are true, then my original scenario is valid and a 
limit of 1 should still return 1 document.

So are these expectations valid? Or is limit completely non-deterministic?

Size does work, but if I can improve performance with a limit, I would like 
to do so. It is possible that I have tens of thousands of matching 
documents, and limit could be an excellent short-circuit. Basically I want 
the shard to stop searching as soon as it has found one document.

Also, I don't have the document _id so I cannot make the HEAD call.

Do these clarifications help?
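
If the goal is only "does at least one document with this profile_id exist", a 
count-type search avoids both limit and fetching any hits, and it does not need the 
_id. A minimal sketch, reusing the profile_id from above (whether this is fast enough 
on tens of millions of docs is an assumption to verify):

POST profiles/profile/_search?search_type=count
{
  "query": {
    "constant_score": {
      "filter": {
        "term": { "profile_id": "salinger-23145" }
      }
    }
  }
}

A hits.total greater than zero then answers the existence question.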



Re: [hadoop][pig] Using ES UDF to connect over HTTPS through Apache to ES

2014-10-22 Thread Costin Leau
That's because currently, es-hadoop does not support SSL (and thus HTTPS). There are plans to make this happen in 2.1 
but we are not there yet.

In the meantime I suggest trying to use either an HTTP proxy or an 
HTTP-to-HTTPS proxy.
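
For what it's worth, a sketch of how the DEFINE in the quoted Pig script below might 
look when going through a plain-HTTP proxy instead; the port is a placeholder, and the 
proxy host is given here as a bare host name rather than a URL:

DEFINE EsStorage org.elasticsearch.hadoop.pig.EsStorage(
    'es.nodes=127.0.0.1:28080',
    'es.net.proxy.http.host=127.0.0.1',
    'es.net.proxy.http.port=28080',
    'es.net.proxy.http.user=myuser',
    'es.net.proxy.http.pass=mypass',
    'es.http.retries=10');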

Cheers,

On 10/22/14 7:11 PM, Aidan Higgins wrote:

Hi,
I am trying to configure a system to use both Basic Authentication and HTTPS to 
Store data to ElasticSearch.

My system is configured with a Pig script running through Hadoop to connect to 
Apache (configured as a proxy) to forward
the request to ElasticSearch. Using simple HTTP and Basic Authentication works 
correctly. However, when I try to force
my ES UDF to use HTTPS, I get errors in my Apache logs and my job fails.

The relevant snippet of my Pig script is below:
REGISTER /bigdata/cloudera/ES_HadoopJar/elasticsearch-hadoop-2.0.2/dist/elasticsearch-hadoop-2.0.2.jar

DEFINE EsStorage org.elasticsearch.hadoop.pig.EsStorage(
    'es.nodes=https://127.0.0.1:28443',
    'es.net.proxy.http.host=https://127.0.0.1',
    'es.net.proxy.http.port=28443',
    'es.net.proxy.http.user=myuser',
    'es.net.proxy.http.pass=mypass',
    'es.http.retries=10');

data = LOAD... ...

STORE data INTO 'my_data_index/data' USING EsStorage;


The error output to the Apache log is as follows:
SSL Library Error: error:1407609B:SSL routines:SSL23_GET_CLIENT_HELLO:https proxy request -- speaking HTTP to HTTPS port!?

The error/stacktrace from my Map job is as follows:
Error: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[https://127.0.0.1:28443]]
    at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:123)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:300)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:284)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:288)
    at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:117)
    at org.elasticsearch.hadoop.rest.RestClient.discoverNodes(RestClient.java:99)
    at org.elasticsearch.hadoop.rest.InitializationUtils.discoverNodesIfNeeded(InitializationUtils.java:59)
    at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.init(EsOutputFormat.java:180)
    at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.write(EsOutputFormat.java:157)
    at org.elasticsearch.hadoop.pig.EsStorage.putNext(EsStorage.java:196)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:635)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:284)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)


So my question is, is this possible (i.e. can it work)? And if so, where am I 
going wrong?

Thanks in advance for any help.

Aidan



--
Costin


How to collect docs in order in collect() method of a custom aggregator

2014-10-22 Thread Mouzer
I am developing a custom aggregator using ES 1.3.4. It extends from 
NumericMetricsAggregator.MultiValue class. Its code structure closely 
resembles that of the Stats aggregator. For my requirements, I need the doc 
Ids to be received in ascending order in the overridden collect() method. 
For most queries, I do get the doc Ids in ascending order. Interestingly 
for bool should queries with multiple clauses, I get doc Ids in descending 
order. How can I fix this? Is this a bug?



Re: CorruptIndexException when trying to replicate one shard of a new index

2014-10-22 Thread Robert Muir
Can you try the workaround mentioned here:
http://www.elasticsearch.org/blog/elasticsearch-1-3-2-released/

and see if it works? If the compression issue is the problem, you can
re-enable compression, just upgrade to at least 1.3.2 which has the
fix.
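
For reference, a sketch of the workaround from that blog post as I read it: recovery 
compression can be toggled dynamically at the cluster level, so something along these 
lines should let the shard replicate while staying on 1.3.1 (the setting name is 
assumed to be the one discussed in the post):

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": { "indices.recovery.compress": false }
}'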




stop phrase removal or remove phrases from search query

2014-10-22 Thread Srinivasan Ramaswamy
Does elasticsearch handle stop phrase removal? I would like to remove some 
phrases (only if the words appear in that order) from the search query. 
Currently I am trying to do this only on the search side. I tried it as 
follows, but it didn't work:

curl -XPUT 'localhost:9200/designs_v1/_settings' -d '
{
    "analysis": {
        "filter": {
            "shingle_omit_unigrams": {
                "type": "shingle",
                "max_shingle_size": 3,
                "output_unigrams": false
            },
            "my_stop": {
                "type": "stop",
                "stopwords": ["walt disney", "magic kingdom", "disney", "kingdom"]
            }
        },
        "analyzer": {
            "shingle": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "my_stop", "kstem", "shingle_omit_unigrams"]
            }
        }
    }
}
'

Does anyone know whether this feature is supported in elasticsearch ?

Thanks
Srini
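
One thing that may be worth trying (a sketch only, not verified against this index): 
the stop filter compares whole tokens, so for it to ever see "walt disney" as a single 
token it has to run after the shingle filter rather than before it. Also, analysis 
settings are not dynamically updatable on an open index, which could be why the PUT 
above did not take effect. Something along these lines:

curl -XPOST 'localhost:9200/designs_v1/_close'
curl -XPUT 'localhost:9200/designs_v1/_settings' -d '
{
  "analysis": {
    "filter": {
      "shingle_omit_unigrams": {
        "type": "shingle",
        "max_shingle_size": 3,
        "output_unigrams": false
      },
      "my_stop": {
        "type": "stop",
        "stopwords": ["walt disney", "magic kingdom", "disney", "kingdom"]
      }
    },
    "analyzer": {
      "shingle": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": ["lowercase", "kstem", "shingle_omit_unigrams", "my_stop"]
      }
    }
  }
}'
curl -XPOST 'localhost:9200/designs_v1/_open'

Note that with output_unigrams set to false only the two- and three-word shingles 
reach my_stop, so the single-word entries would not remove anything here.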



Re: Limit bug or Limit misunderstanding?

2014-10-22 Thread joergpra...@gmail.com
I am not sure why you are after limit. It is not a size parameter and
it does not work as you expect. There is no guarantee, with 5 shards and
limit = 5, that you can always obtain 25 docs.

For filters, Elasticsearch has added some Lucene extensions regarding the
iteration of doc sets. One extension is the LimitFilter. Lucene enumerates docs
by doc ID within each index reader context; the IDs carry no application-level
order, but within a segment they are non-decreasing. There can be many segments
on a shard, and each segment carries such a doc ID sequence. On a shard,
Elasticsearch iterates through the matching docs of a filter when applying
a LimitFilter, and this iteration can be cut short by setting a limit. The
price to pay is that some of the docs matched by the filter may be dropped.
Most users do not want that; it is a very advanced setting. It is not
non-deterministic, it is just very low level.

Jörg



Re: CorruptIndexException when trying to replicate one shard of a new index

2014-10-22 Thread Nate Folkert
After disabling compression, I was able to successfully replicate that 
shard, so looks like we're hitting that bug.  I guess we'll have to upgrade!

Thanks!
- Nate



Re: Wildcards in exact phrase in query_string search

2014-10-22 Thread Eric
Dara,

Realizing that this is an old post, but I am having the same issue.

Was there a suggested solution that got you through?

Eric





--
View this message in context: 
http://elasticsearch-users.115913.n3.nabble.com/Wildcards-in-exact-phrase-in-query-string-search-tp4020826p4065258.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.



Wildcard in an exact phrase query_string search with escaped quotes

2014-10-22 Thread Eric Sloan
Updating a post from 2012.

I have a requirement to allow a wildcard within an exact phrase 
query_string.  

POST _search
{
  "query": {
    "query_string": {
      "query": "\"coors brew*\"",
      "analyze_wildcard": true
    }
  }
}


I get the following empty result set:

{
   "took": 94,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 0,
      "max_score": null,
      "hits": []
   }
}


My expectation is to get variations of the exact match (below) looking 
through all fields in our document.

   - Coors Brewing
   - Coors Brewery
   - Coors Brews
   - etc 
   - etc
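
One hedged alternative: query_string does not expand wildcards inside a quoted phrase, 
but match_phrase_prefix treats the last term of the phrase as a prefix, which would 
cover the Brewing/Brewery/Brews variations. The sketch below runs against _all on the 
assumption that the default _all field is enabled:

POST _search
{
  "query": {
    "match_phrase_prefix": {
      "_all": "coors brew"
    }
  }
}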



Custom in memory map/reduce using ES data

2014-10-22 Thread Hajime Takase
Hi,

I have about a billion records on 20 nodes and would like to run a custom
map/reduce or aggregation (word count, sentiment analysis, etc.) immediately
after the ES result set is determined.

I came up with using the plugin system to build a custom aggregation like this:
https://github.com/algolia/elasticsearch-cardinality-plugin/tree/1.0.X/src/main/java/org/alg/elasticsearch/search/aggregations/cardinality

but I want to update the jar quite often, which would eventually require ES to
be reloaded. I looked at the scripted metric aggregation
http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.4/search-aggregations-metrics-scripted-metric-aggregation.html

but was not sure about the memory usage or customization, so I decided to run
Hazelcast or Spark on the same node or JVM and use their map/reduce
framework. I use the filter phase to push the ES data across, like this:
https://github.com/medcl/elasticsearch-filter-redis/blob/master/src/main/java/org/elasticsearch/index/query/RedisFilterParser.java#L121

but it just takes quite a long time to put the data into that in-memory
middleware...

Is there any best practice for putting ES data into in-memory middleware, just to
re-use the same data efficiently in a subsequent program?
I don't think I can use the ES query result set (on each shard), which seems
to be held in memory, from my program; am I right?

Thanks,

Haji
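
Not a complete answer, but the usual building block for exporting a full result set 
out of ES into something like Spark or Hazelcast is scan-and-scroll, which streams the 
matching documents in batches without the cost of deep paging. A minimal sketch; the 
index, type, batch size and query are placeholders:

curl 'localhost:9200/myindex/mytype/_search?search_type=scan&scroll=1m&size=500' -d '
{ "query": { "match_all": {} } }'
# the response carries a _scroll_id; keep posting it back until no more hits are returned
curl 'localhost:9200/_search/scroll?scroll=1m' -d '<scroll_id from the previous response>'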



Re: CorruptIndexException when trying to replicate one shard of a new index

2014-10-22 Thread Robert Muir
Thanks for closing the loop.


ElasticSearch deployment architecture with tribe nodes

2014-10-22 Thread Connie Yang
Hi,

I want to set up an ELK Stack infrastructure that streams the logs from two 
data centers and makes the combined logs viewable through a single Kibana 
console.  Each data center has a local ElasticSearch cluster.  So, I'm 
considering using tribe nodes to bring the data together.

The questions are:

   - Because I want to set up the tribe nodes with HA and DR in mind, I'm 
   considering putting two tribe nodes in each data center.  Do you see any 
   problem with this setup?  Any special config I need to be aware of besides 
   what has already been mentioned in the tribe node blog post?
   - The tribe node documentation mentions that multicast is enabled by 
   default.  Will there be any problem if unicast is used?
   - Thinking outside the box a bit more, besides tribe nodes, are 
   there any recommended ES deployment architectures that satisfy my highly 
   available, single view of the data from two different data centers?

Thanks,
Connie
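
On the multicast question, a sketch of what a tribe node's elasticsearch.yml might 
look like with unicast discovery per remote cluster; the cluster names, hosts and 
tribe keys are placeholders, and the assumption is that discovery settings can be set 
per tribe client:

tribe:
  dc1:
    cluster.name: logs-dc1
    discovery.zen.ping.multicast.enabled: false
    discovery.zen.ping.unicast.hosts: ["dc1-es01:9300", "dc1-es02:9300"]
  dc2:
    cluster.name: logs-dc2
    discovery.zen.ping.multicast.enabled: false
    discovery.zen.ping.unicast.hosts: ["dc2-es01:9300", "dc2-es02:9300"]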
