adding a new node: how to prime the data

2014-11-20 Thread Yves Dorfsman
We upgrade our clusters by adding new nodes, increasing the number of 
replicas on the indices, letting the new node catch up, then excluding the 
old node and reducing the number of replicas on the indices.

One cluster has a large index for which this operation takes hours. We 
tried to copy data from an existing node, but it copies everything 
regardless (I suspect it has no way to know what is new and what isn't?). 
We do plan to split that index into smaller shards, but in the meantime we 
are wondering if there is a better way of doing this.
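
For reference, the usual way to drain data off a node before removing it is the allocation-exclusion setting, which makes ES move that node's shards elsewhere instead of copying blindly. A minimal sketch, assuming the old node's IP is 10.0.0.1 (hypothetical):

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient" : {
    "cluster.routing.allocation.exclude._ip" : "10.0.0.1"
  }
}'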

Thanks.

---
http://yves.zioup.com
gpg: 4096R/32B0F416 



Re: Changing Analyzer behavior for hyphens - suggestions?

2014-11-20 Thread horst knete
Hi,

Thanks for the response and this awesome plugin bundle (especially useful 
for me as a German).

Unfortunately the hyphen analyzer plugin didn't do the job the way I 
wanted it to.

The hyphen analyzer does something similar to the whitespace analyzer: it 
just doesn't split on hyphens and instead treats them as ALPHANUM 
characters (at least that is what I think right now).

So the term "this-is-a-test" gets tokenized into "this-is-a-test", which is 
nice behaviour, but in order to allow full-text search on this field it 
should get tokenized into "this-is-a-test", "this", "is", "a" and "test", 
as I wrote before.

I think abusing the word_delimiter token filter could do the job, 
because there is a preserve_original option.

Unfortunately, if you adjust the filter like this:

PUT /logstash-2014.11.20
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "wordtest" : {
                    "type" : "custom",
                    "tokenizer" : "whitespace",
                    "filter" : [
                        "lowercase",
                        "word"
                    ]
                }
            },
            "filter" : {
                "word" : {
                    "type" : "word_delimiter",
                    "generate_word_parts": false,
                    "generate_number_parts": false,
                    "catenate_words": false,
                    "catenate_numbers": false,
                    "catenate_all": false,
                    "split_on_case_change": false,
                    "preserve_original": true,
                    "split_on_numerics": false,
                    "stem_english_possessive": true
                }
            }
        }
    }
}

and run an analyze test:

curl -XGET 'localhost:9200/logstash-2014.11.20/_analyze?filters=word' -d 
'this-is-a-test'

the response is this:
{"tokens":[{"token":"this","start_offset":0,"end_offset":4,"type":"<ALPHANUM>","position":1},{"token":"is","start_offset":5,"end_offset":7,"type":"<ALPHANUM>","position":2},{"token":"a","start_offset":8,"end_offset":9,"type":"<ALPHANUM>","position":3},{"token":"test","start_offset":10,"end_offset":14,"type":"<ALPHANUM>","position":4}]}

which shows it tokenized everything except the original term, which makes 
me wonder whether the preserve_original setting is working at all?

Any idea on this?
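
One thing that may be worth checking (an untested hunch): _analyze?filters=word runs the filter over the default standard tokenizer, which already splits "this-is-a-test" on the hyphens before word_delimiter ever sees it, so there is no original compound token left to preserve. Testing against the full custom analyzer may behave differently:

curl -XGET 'localhost:9200/logstash-2014.11.20/_analyze?analyzer=wordtest&pretty' -d 'this-is-a-test'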

On Wednesday, 19 November 2014 18:26:09 UTC+1, Jörg Prante wrote:

 You search for a hyphen-aware tokenizer, like this?

 https://gist.github.com/jprante/cd120eac542ba6eec965

 It is in my plugin bundle

 https://github.com/jprante/elasticsearch-plugin-bundle

 Jörg

 On Wed, Nov 19, 2014 at 5:46 PM, horst knete badun...@hotmail.de wrote:

 Hey guys,

 after working with the ELK stack for a while now, we still have a very 
 annoying problem regarding the behavior of the standard analyzer: it 
 splits terms into tokens using hyphens or dots as delimiters.

 e.g. logsource:firewall-physical-management gets split into "firewall", 
 "physical" and "management". On the one hand that's cool, because if you 
 search for logsource:firewall you get all the events with "firewall" as a 
 token in the logsource field. 

 The downside of this behaviour shows up if you do e.g. a top-10 query on 
 a field in Kibana: each token is counted as a whole term and ranked by 
 its count: 
 top 10: 
 1. firewall : 10
 2. physical : 10
 3. management: 10

 instead of top 10:
 1. firewall-physical-management: 10

 In the standard Logstash mapping this is solved by using a .raw field 
 that is not_analyzed, but the downside is that you get two fields instead 
 of one (even if it is a multi_field), and the usability for Kibana users 
 is not that great.

 So what we need is that logsource:firewall-physical-management gets 
 tokenized into "firewall-physical-management", "firewall", "physical" and 
 "management".

 I tried this using the word_delimiter token filter with the following 
 mapping:

  "analysis" : {
      "analyzer" : {
          "my_analyzer" : {
              "type" : "custom",
              "tokenizer" : "whitespace",
              "filter" : [ "lowercase", "asciifolding", "my_worddelimiter" ]
          }
      },
      "filter" : {
          "my_worddelimiter" : {
              "type" : "word_delimiter",
              "generate_word_parts": false,
              "generate_number_parts": false,
              "catenate_words": false,
              "catenate_numbers": false,
              "catenate_all": false,
              "split_on_case_change": false,
              "preserve_original": true,
              "split_on_numerics": false,
              "stem_english_possessive": true
          }
      }
  }

 But this 

Re: problem with heap space overusage

2014-11-20 Thread tetlika
anyone?

On Wednesday, 19 November 2014 13:32:37 UTC+1, Serg Fillipenko wrote:

 We have contact profiles (20+ fields, containing nested documents) indexed 
 and their social profiles(10+ fields) indexed as child documents of contact 
 profile.
 We run complex bool match queries, delete by query, delete children by 
 query, faceting queries on contact profiles.
 index rate 14.31op/s
 remove by query rate 13.41op/s (such a high value is caused by the fact 
 that we delete all child docs before indexing the parent, and then index 
 the children again)
 search rate 2.53op/s
 remove by ids 0.15op/s

 We started to face this trouble under ES 1.2, right after we started to 
 index and delete child documents (no search requests yet). On ES 1.4 we 
 have the same issue.


 What sort of data is it, what sort of queries are you running and how 
 often are they run?

 On 19 November 2014 17:52, tetlika tet...@gmail.com wrote:

 hi,

 we have 6 servers and 14 shards in cluster, the index size 26GB, we have 
 1 replica so total size is 52GB, and ES v1.4.0, java version 1.7.0_65

 we use servers with RAM of 14GB (m3.xlarge), and heap is set to 7GB

 around a week ago we started facing the following issue:

 random servers in the cluster hit the heap size limit 
 (java.lang.OutOfMemoryError: Java heap space in the log) about once every 
 day or two, and the cluster fails - it becomes red or yellow

 we tried adding more servers to the cluster - even 8 - but then it's only 
 a matter of time before we hit the problem again, so it looks like no 
 matter how many servers are in the cluster, it will still hit the limit 
 after some time

 before we started facing the problem we were running smoothly with 3 
 servers
 we also set indices.fielddata.cache.size: 40% but it didn't help

 also, there are possible workarounds to decrease heap usage:

 1) reboot some server - then heap drops under 70% and for some time the 
 cluster is ok

 or

 2) decrease the number of replicas to 0, and then back to 1

 but I don't like to use those workarounds

 how can it run out of heap when the whole index fits into RAM?

 thanks much for possible help
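
For reference, two read-only checks that may show whether fielddata is what is filling the heap (a sketch for ES 1.x):

# fielddata memory per node and field
curl 'localhost:9200/_cat/fielddata?v'
# full per-node index stats, including the fielddata section
curl 'localhost:9200/_nodes/stats/indices?pretty'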







Double entries in Kibana?

2014-11-20 Thread Siddharth Trikha
I am using Logstash 1.4.1 and Elasticsearch 1.1.1. My setup is showing an 
issue:

For every new line (log) added to the log file I am getting two entries in 
Kibana, i.e. every log entry shows up twice in Kibana. However, when I check 
my logstash console, the log line shows up only once.

Any idea??



Re: Double entries in Kibana?

2014-11-20 Thread Siddharth Trikha
My elasticsearch console:

[2014-11-20 14:14:42,229][INFO ][cluster.metadata ] [Brothers 
Grimm] [logstash-2014.11.20] creating index, cause [auto(bulk api)], shards 
[5]/[1], mappings [_default_]
[2014-11-20 14:14:42,672][INFO ][cluster.metadata ] [Brothers 
Grimm] [logstash-2014.11.20] update_mapping [logs] (dynamic)



Issue with highlighting and analyzed tokens

2014-11-20 Thread felix
Hi,

I am experiencing an unexpected result with highlighting when using an 
_analyzer path in the mapping and custom analyzers. The highlighting 
returns no result for some query terms, even though the term matches and 
the document is returned. For other query terms it works fine. Somehow it 
seems that for querying and highlighting a different analyzer is used.

See the following commands to reproduce the issue:

https://gist.github.com/fxh/3246df167e4d72b0372f

I am using ES v1.3.4

Thanks for any hint.

Felix



Best way to check a document has been indexed

2014-11-20 Thread asanchez
Hello,

I'm developing a piece of code that inserts a document into an 
Elasticsearch server. The code uses libcurl to set up an HTTP request and 
capture the response.
So, in order to check whether a document has been properly indexed, what is 
the official or proper way to do it?

This is an example of a correctly indexed document response:

HTTP/1.1 201 Created
Content-Type: application/json; charset=UTF-8
Content-Length: 92

{"_index":"someindex","_type":"sometype","_id":"fsAx6qXcQGCSrY1DWvQACw","_version":1,"created":true}

Should my program check that the first header line contains "201 Created"? 
What should happen if a 3xx redirection occurs - should I consider it as 
properly indexed too?

Or should I instead ignore the header and just check that the last 
part of the body string equals '"created":true}'?

Thank you!



How is the idf calculated for an alias that maps to multiple indexes?

2014-11-20 Thread Dan Tuffery
If I have mapped an alias to more than one index and I execute a search 
using the alias name, will the idf be calculated for each individual index 
or will the idf calculation take into consideration all of the indexes that 
are mapped to the alias?
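
For reference: with the default query_then_fetch, term statistics (and hence idf) are computed per shard, so each index contributes its own idf. dfs_query_then_fetch first gathers term statistics from every shard the request targets, i.e. all the indices behind the alias. A sketch (alias and field names hypothetical):

curl 'localhost:9200/my_alias/_search?search_type=dfs_query_then_fetch&pretty' -d '{
  "query" : { "match" : { "title" : "example" } }
}'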



Search template in bulk

2014-11-20 Thread Viranch Mehta
Hey,

I was wondering if there is a way to execute search template queries in 
bulk.

For example, I have a couple of search templates registered in the .scripts 
index. I want to run a bulk search using these templates, with a different 
set of parameters for each search in the bulk.

An example query could be:

cat requests
{}
{ "template": { "id": "template1" }, "params": { "title": "burger" } }
{}
{ "template": { "id": "template2" }, "params": { "title": "pizza" } }

curl -XGET 'localhost:9200/blogs/post/_msearch/template' --data-binary @requests; echo

Is a similar thing possible/planned?

Cheers,
Viranch



MLT query delivering strange results

2014-11-20 Thread Daniel Kummer
I have been trying to figure out how exactly the more_like_this query 
behaves. The docs say: "Under the hood, more_like_this simply creates 
multiple should clauses in a bool query of interesting terms extracted from 
some provided text." But I found several examples that I could not explain. 
This one illustrates it:

I am using elasticsearch-1.4.0. I am creating an index like this (no 
mapping defined before):
curl -XPUT 'localhost:9200/twitter/tweet/1' -d '{"user" : "user1", "message" : "aaa"}'
curl -XPUT 'localhost:9200/twitter/tweet/2' -d '{"user" : "user1", "message" : "aaa bbb"}'
curl -XPUT 'localhost:9200/twitter/tweet/3' -d '{"user" : "user1", "message" : "bbb aaa"}'
curl -XPUT 'localhost:9200/twitter/tweet/4' -d '{"user" : "user2", "message" : "bbb"}'
curl -XPUT 'localhost:9200/twitter/tweet/5' -d '{"user" : "user2", "message" : "aaa bbb"}'
curl -XPUT 'localhost:9200/twitter/tweet/6' -d '{"user" : "user2", "message" : "bbb aaa"}'

Then I query it:
curl -XGET 'http://localhost:9200/twitter/tweet/_search?pretty=true&size=10' -d '{
  "query": {
    "more_like_this_field": {
      "message": {
        "like_text": "aaa bbb",
        "percent_terms_to_match": 1,
        "min_term_freq": 1,
        "max_query_terms": 3,
        "min_doc_freq": 1
      }
    }
  }
}'
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : 14.4000225,
    "hits" : [ {
      "_index" : "twitter",
      "_type" : "tweet",
      "_id" : "4",
      "_score" : 14.4000225,
      "_source" : {"user" : "user2", "message" : "bbb"}
    }, {
      "_index" : "twitter",
      "_type" : "tweet",
      "_id" : "2",
      "_score" : 12.729599,
      "_source" : {"user" : "user1", "message" : "aaa bbb"}
    }, {
      "_index" : "twitter",
      "_type" : "tweet",
      "_id" : "5",
      "_score" : 12.72813,
      "_source" : {"user" : "user2", "message" : "aaa bbb"}
    }, {
      "_index" : "twitter",
      "_type" : "tweet",
      "_id" : "3",
      "_score" : 12.728111,
      "_source" : {"user" : "user1", "message" : "bbb aaa"}
    }, {
      "_index" : "twitter",
      "_type" : "tweet",
      "_id" : "6",
      "_score" : 12.5501995,
      "_source" : {"user" : "user2", "message" : "bbb aaa"}
    } ]
  }
}

So text 1 ("aaa") is missing. I get the same result if I use "like_text": 
"bbb aaa" in the above query. However, if I use "like_text": "aaa" I get 
what I would expect: all texts except "bbb" are returned.

What kind of should-query is generated by more_like_this in the above 
example? I would have expected:
curl -XGET 'http://localhost:9200/twitter/tweet/_search?pretty=true&size=10' -d '{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "message": "aaa"
          }
        },
        {
          "match": {
            "message": "bbb"
          }
        }
      ],
      "minimum_should_match": 2
    }
  }
}'
but this obviously returns neither "aaa" nor "bbb".


Why does the above more_like_this query return "bbb" but not "aaa"?



Custom Aggregation / Access to documents

2014-11-20 Thread AndyP

When implementing a custom aggregation: can I access the result documents 
in my aggregator, so that I can skip result documents based on their 
properties?

To make it clearer, let me explain.

I have an index "products" that contains product documents. A product 
contains a nested collection of "variant" documents.

The requirement is to have a query that returns variant documents - a 
kind of nested aggregation.

To complicate things: not all variants should be returned. Some dynamic 
filtering has to be applied, and this filtering depends on properties of 
the nested variant documents. I need to peek at all variants contained in 
a product in order to determine whether a variant should be included in the 
result or not. 

I am thinking that I could accomplish this by writing a plugin which 
contains the custom aggregation, if my initial question can be answered 
with "yes".

Thanks for your suggestions and insights.

A.





Re: What is the best practice for periodic snapshotting with aws-cloud+s3

2014-11-20 Thread João Costa
Hello,

Sorry for hijacking this thread, but I'm currently also pondering the best 
way to perform periodic snapshots in AWS.

My main concern is that we are using blue-green deployment with ephemeral 
storage on EC2, so if for some reason there is a problem with the cluster, 
we might lose a lot of data; therefore I would rather take frequent snapshots 
(for this reason, we are still using the deprecated S3 gateway).

The thing is, you claim that "having too many snapshots is problematic" and 
that one should prune old snapshots. Since snapshots are incremental, 
will this imply data loss?
Also, is the problem related to the number of snapshots or to the size of 
the data? Is there any way to merge old snapshots into one? Would this solve 
the problem?

Finally, if I create a cronjob to take automatic snapshots, can I run into 
problems if two instances attempt to create a snapshot with the same name 
at the same time?
Also, what's the best way to take a snapshot on shutdown? Should I put a 
script in init.d/rc.0 to run on shutdown, before elasticsearch shuts down? 
I've seen cases where EC2 instances have not-so-graceful shutdowns, 
so it would be wonderful if there were a better way to do this at the 
cluster level (i.e. if node A notices that node B is not responding, it 
automatically takes a snapshot).

Sorry if some of these questions don't make much sense; I'm still quite new 
to elasticsearch and have not completely understood the new snapshot feature.
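
For reference on the data-loss question: deleting a snapshot only removes files that no other snapshot still references, so pruning old snapshots does not make the remaining ones unrestorable. A sketch of the lifecycle with the cloud-aws plugin (repository and bucket names hypothetical):

# register the repository
curl -XPUT 'localhost:9200/_snapshot/my_s3_repo' -d '{
  "type" : "s3",
  "settings" : { "bucket" : "my-es-snapshots", "region" : "us-east-1" }
}'
# take a snapshot
curl -XPUT 'localhost:9200/_snapshot/my_s3_repo/snapshot-2014.11.20?wait_for_completion=true'
# prune an old one
curl -XDELETE 'localhost:9200/_snapshot/my_s3_repo/snapshot-2014.09.30'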

On Friday, 14 November 2014 08:19:42 UTC, Sally Ahn wrote:

 Yes, I am now seeing the snapshots complete in about 2 minutes after 
 switching to a new, empty bucket.
 I'm not sure why the initial request to snapshot to the empty repo was 
 hanging because the snapshot did in fact complete in about 2 minutes, 
 according to the S3 timestamp.
 Time to automate deletion of old snapshots. :)
 Thanks for the response!

 On Thursday, November 13, 2014 9:35:20 PM UTC-8, Igor Motov wrote:

  Having too many snapshots is problematic. Each snapshot is done in an 
  incremental manner, so in order to figure out what has changed and what is 
  available, all snapshots in the repository need to be scanned, which takes 
  longer as the number of snapshots grows. I would recommend pruning old 
  snapshots as time goes by, or starting snapshots into a new bucket/directory 
  if you really need to maintain 2-hour resolution for 2-month-old 
  snapshots. The get command can sometimes hang because it's throttled by the 
  on-going snapshot. 


 On Wednesday, November 12, 2014 9:02:33 PM UTC-10, Sally Ahn wrote:

 I am also interested in this topic.
 We were snapshotting our cluster of two nodes every 2 hours (invoked via 
 a cron job) to an S3 repository (we were running ES 1.2.2 with 
 cloud-aws-plugin version 2.2.0, then we upgraded to ES 1.4.0 with 
 cloud-aws-plugin 2.4.0 but are still seeing issues described below).
 I've been seeing an increase in the time it takes to complete a snapshot 
 with each subsequent snapshot. 
 I see a thread 
 https://groups.google.com/forum/?fromgroups#!searchin/elasticsearch/snapshot/elasticsearch/bCKenCVFf2o/TFK-Es0wxSwJ
  where 
 someone else was seeing the same thing, but that thread seems to have died.
 In my case, snapshots have gone from taking ~5 minutes to taking about 
 an hour, even between snapshots where data does not seem to have changed. 

 For example, you can see below a list of the snapshots stored in my S3 
 repo. Each snapshot is named with a timestamp of when my cron job invoked 
 the snapshot process. The S3 timestamp on the left shows the completion 
 time of that snapshot, and it's clear that it's steadily increasing:

  2014-09-30 10:05    686   s3://bucketname/snapshot-2014.09.30-10:00:01
  2014-09-30 12:05    686   s3://bucketname/snapshot-2014.09.30-12:00:01
  2014-09-30 14:05    736   s3://bucketname/snapshot-2014.09.30-14:00:01
  2014-09-30 16:05    736   s3://bucketname/snapshot-2014.09.30-16:00:01
  ...
  2014-11-08 00:52   1488   s3://bucketname/snapshot-2014.11.08-00:00:01
  2014-11-08 02:54   1488   s3://bucketname/snapshot-2014.11.08-02:00:01
  ...
  2014-11-08 14:54   1488   s3://bucketname/snapshot-2014.11.08-14:00:01
  2014-11-08 16:53   1488   s3://bucketname/snapshot-2014.11.08-16:00:01
  ...
  2014-11-11 07:00   1638   s3://bucketname/snapshot-2014.11.11-06:00:01
  2014-11-11 08:58   1638   s3://bucketname/snapshot-2014.11.11-08:00:01
  2014-11-11 10:58   1638   s3://bucketname/snapshot-2014.11.11-10:00:01
  2014-11-11 12:59   1638   s3://bucketname/snapshot-2014.11.11-12:00:01
  2014-11-11 15:00   1638   s3://bucketname/snapshot-2014.11.11-14:00:01
  2014-11-11 17:00   1638   s3://bucketname/snapshot-2014.11.11-16:00:01

 I suspected that this gradual increase was related to the accumulation 
 of old snapshots after I tested the following:
 1. I created a brand new cluster with the same hardware specs in the 
 same datacenter and 

Re: upgrading from 0.90.7 to 1.4. Gotchas?

2014-11-20 Thread Jason Wee
I would be interested too, we are using the same 0.90.7 version.

Jason

On Thu, Nov 20, 2014 at 2:03 PM, Yves Dorfsman y...@zioup.com wrote:

 Are there any precautions to take before upgrading from 0.9 to 1.4?

 Different data types?
 Different API calls?
 etc...

 And, what is the best way to upgrade? Can we just add a node at the newer
 version and let it pull the data?

 Thanks.

 http://yves.zioup.com
 gpg: 4096R/32B0F416





Re: Best way to check a document has been indexed

2014-11-20 Thread vineeth mohan
Hi ,

Just check whether it's 200 (indexed) or 201 (created).
The HTTP status code alone should be sufficient.

Thanks
Vineeth
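
A quick command-line sketch of that check (the same status is available via CURLINFO_RESPONSE_CODE in libcurl; index, type and body are made up):

curl -s -o /dev/null -w '%{http_code}\n' -XPUT 'localhost:9200/someindex/sometype/1' -d '{"field":"value"}'
# prints 201 on first creation, 200 on an update of an existing id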

On Thu, Nov 20, 2014 at 4:01 PM, asanchez asanchez1...@gmail.com wrote:

 Hello,

 I'm developing a piece of code that inserts a document into an
 Elasticsearch server. The code uses libcurl to set up an HTTP request and
 capture the response.
 So, in order to check whether a document has been properly indexed, what is
 the official or proper way to do it?

 This is an example of a correctly indexed document response:

 HTTP/1.1 201 Created
 Content-Type: application/json; charset=UTF-8
 Content-Length: 92


 {"_index":"someindex","_type":"sometype","_id":"fsAx6qXcQGCSrY1DWvQACw","_version":1,"created":true}

 Should my program check that the first header line contains "201 Created"?
 What should happen if a 3xx redirection occurs - should I consider it as
 properly indexed too?

 Or should I instead ignore the header and just check that the last
 part of the body string equals '"created":true}'?

 Thank you!





Re: Custom Aggregation / Access to documents

2014-11-20 Thread Colin Goodheart-Smithe
Hi,

I think you should be able to achieve the functionality you need without 
writing a custom aggregation. If you use a filter aggregation wrapped in a 
nested aggregation, you should be able to filter the child (variant) 
documents before they are returned. Then, if you want to return the top X 
variants, you can use the top_hits aggregation as a sub-aggregation of the 
filter aggregation.
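
Roughly along these lines (a sketch; the index, nested path and the stock filter are made up, and top_hits requires ES 1.3+):

curl -XGET 'localhost:9200/products/_search?pretty' -d '{
  "size" : 0,
  "aggs" : {
    "all_variants" : {
      "nested" : { "path" : "variants" },
      "aggs" : {
        "matching_variants" : {
          "filter" : { "range" : { "variants.stock" : { "gt" : 0 } } },
          "aggs" : {
            "top_variants" : { "top_hits" : { "size" : 5 } }
          }
        }
      }
    }
  }
}'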

Hope this helps,

Colin

 

On Thursday, 20 November 2014 11:51:40 UTC, AndyP wrote:


 When implementing a custom aggregation: can I access the result documents 
 in my aggregator, so that I can skip result documents based on their 
 properties?

 To make it clearer, let me explain.

 I have an index "products" that contains product documents. A product 
 contains a nested collection of "variant" documents.

 The requirement is to have a query that returns variant documents - a 
 kind of nested aggregation.

 To complicate things: not all variants should be returned. Some dynamic 
 filtering has to be applied, and this filtering depends on properties of 
 the nested variant documents. I need to peek at all variants contained in 
 a product in order to determine whether a variant should be included in the 
 result or not. 

 I am thinking that I could accomplish this by writing a plugin which 
 contains the custom aggregation, if my initial question can be answered 
 with "yes".

 Thanks for your suggestions and insights.

 A.







Re: how to migrate lucene index into elasticsearch

2014-11-20 Thread Gaurav gupta
Thanks Jorg for the guidance and I have am trying the suggested approach #1
and I have further question on it.

As you mentioned - *- a custom written tool could traverse the segments
and extract field information and build a rudimentary mapping (without
analyzer, without info about _all and _source and all Elasticsearch
add-ons).*

We already have a Lucene Index metadata (i.e. field names, type, analyzer
etc.) available as an xml, so I can create the mapping without traversing
the segments. Should I create segment file segments.gen using the mapping
file and using some dummy values and then put all the other old lucene
index files ( except segments.gen ) from existing lucene index files
(e.g. - segments_2,_0.cfe,_0.cfs,_0.si,_1.cfe,_1.cfs etc.)

Sample mapping XML file:
<Mapping>
    <indexField>
        <analyzed>true</analyzed>
        <fieldanalyzer>Standard</fieldanalyzer>
        <indexFieldName>AddressLine1</indexFieldName>
        <name>AddressLine1</name>
        <stored>true</stored>
        <type>string</type>
    </indexField>
    <indexField>
        <analyzed>true</analyzed>
        <fieldanalyzer>Standard</fieldanalyzer>
        <indexFieldName>Building_Name</indexFieldName>
        <name>Building_Name</name>
        <stored>true</stored>
        <type>string</type>
    </indexField>
    <indexField>
        <analyzed>true</analyzed>
        <fieldanalyzer>Keyword</fieldanalyzer>
        <indexFieldName>GNAF_PID</indexFieldName>
        <name>GNAF_PID</name>
        <stored>true</stored>
        <type>string</type>
    </indexField>
    ...
</Mapping>

Thanks

On Thu, Nov 13, 2014 at 11:59 PM, joergpra...@gmail.com wrote:

 It is almost impossible to use just binary-only Lucene index for
 migration, because Elasticsearch needs additional info which is not
 available in Lucene. The only method is to reindex data over the
 Elasticsearch API.

 There is a bumpy road but I don't know if one ever tried that:

 - a custom written tool could traverse the segments and extract field
 information and build a rudimentary mapping (without analyzer, without info
 about _all and _source and all Elasticsearch add-ons)

 - another tool could try to reconstruct docs (like the tool Luke) and
 write them to a file in bulk format. Not having the source of the docs
 means it must be possible to retrieve the original input from the Lucene
 index (which is almost never the case)

 - the result could be re-indexed using the Elasticsearch API (assuming all
 analyzers and tokenizers are in place) but a lot of work would have to be
 done

 The preferred way is to rewrite the code that uses the Lucene API to use
 the Elasticsearch API and re-run the indexing process.

 Jörg

 On Thu, Nov 13, 2014 at 7:11 PM, Gaurav gupta gupta.gaurav0...@gmail.com
 wrote:

 Hi All,

 I have an embedded Search Engine in our product which is based on Lucene
 4.8.1 and now I would like to migrate it to latest ElasticSearch 1.4 for
 better distributed support (sharding and replication, mainly). Could you
 guide me how one should migrate the existing indexes created by Lucene to
 ES.

 I have referred to the mail thread - migrate lucene index into
 elasticsearch
 https://groups.google.com/forum/#!searchin/elasticsearch/migrating/elasticsearch/xCE7124eAL8/ZFluLXqO_IcJ.
 And based on the discussion in it, it appears to me that it's not an easy
 job, or perhaps not feasible at all. I am wondering if there is some plugin
 (river), tool or workaround available to migrate the existing indexes
 created by Lucene to ES.

 I googled that an ES plugin is available for Solr-to-ES migration:
 http://blog.trifork.com/2013/01/29/migrating-apache-solr-to-elasticsearch/ .
 Do we have something similar for Lucene-to-ES migration?

 Thanks
 Gaurav






Re: Issue with highlighting and analyzed tokens

2014-11-20 Thread Nikolas Everett
I remember there was a GitHub issue about path-specified analyzers and
highlighting, but I can't find it.  Reading it may be your best bet.

On Thu, Nov 20, 2014 at 5:14 AM, fe...@squirro.com wrote:

 Hi,

 I am experiencing an unexpected result with highlighting when using an
 _analyzer path in the mapping and custom analyzers. The highlighting
 returns no result for some query terms, even though the term matches and
 the document is returned. For other query terms it works fine. Somehow it
 seems that for querying and highlighting a different analyzer is used.

 See the following commands to reproduce the issue:

 https://gist.github.com/fxh/3246df167e4d72b0372f

 I am using ES v1.3.4

 Thanks for any hint.

 Felix





Is Elasticsearch also supported on AIX and HP Itanium 11.31

2014-11-20 Thread Gaurav gupta
Is Elasticsearch also supported on AIX and HP-UX 11.31 on Itanium? I didn't
find this information in the release notes or installation instructions.

Thanks
Gaurav



Re: analyzing wildcard queries ...

2014-11-20 Thread mkamm78
hi jörg

just wanted to tell you that I will/cannot fork/commit my improvement on 
wildcard analysis, because I'm no longer 100% convinced that it is really 
an improvement, resp. that it can be used in general...
 
after rethinking, I must admit that I was probably too focused on my 
concrete issues with email addresses using the standard analyzer, 
e.g. marco.kamm@brain.net analyzed into the tokens [marco.kamm] [brain.net].
the original idea behind using the standard analyzer was that users would 
find something when searching for brain.net or marco.kamm without having to 
use any wildcards!
(the old lucene standard analyzer also split on '.' characters, so even 
marco or brain could be found)
 
somehow I thought it would also make sense to search for e.g. 
marco.*@brain.net or marco.kamm@*.net

my first improvement approach was based on the existing code, but instead 
of concatenating all the analyzed sub-string parts into a single wildcard 
query, I tried to build a boolean query containing the individual analyzed 
parts as either prefix or wildcard queries ... 
e.g.
marco.*@brain.net -- marco* AND *brain.net
marco.kamm@*.net -- marco.kamm* AND *net
the first query can be a pure prefix query (when not preceded by a single 
wildcard char) and the last one could be a postfix query; 
everything in between was surrounded by '*'...'*'

another (optimized) approach is based on the following technique: 
generate a random letter sequence that is not present in the search term, 
replace the wildcards with this sequence and feed it to the analyzer. 
this way, if the analyzer produces more than one token out of a single 
wildcard input, you can be sure that the original input would also be split 
into more terms and you need to use more than one query object ...
 
after analyzing, process the resulting tokens one by one and combine them 
into a boolean AND query. for each token, undo the wildcard replacement 
and check the occurrences of wildcard characters: if a token contains no 
wildcards at all, use a term query; if the token only contains a wildcard 
char at the end, use a prefix query; 
else use a wildcard query ...
 
e.g.
marco.*@brain.net -- marco.{randomLetterSequence}@brain.net -- 
[marco.{randomLetterSequence}] [brain.net] -- marco.* AND brain.net
marco.kamm@*.net -- marco.kamm@{randomLetterSequence}.net -- 
[marco.kamm] [{randomLetterSequence}.net] -- marco.kamm AND *.net
 
these approaches could work for my cases (at least they produce some 
results where the original code didn't find anything, although the results 
may be inaccurate; but that lies in the nature of AND combinations, e.g. 
marco.*@brain.net transformed into marco.* AND brain.net could also 
find brain@marco.org etc.) 

but I think for most of the cases (where the queried field uses an analyzer 
that doesn't split terms into several tokens, e.g. the keyword analyzer 
etc.) the existing code already makes the best effort that can be made in a 
generic way (without knowing what the analyzer does with certain 
characters)

maybe you can use something out of my 2nd approach: testing the analyzer's 
behaviour by replacing the wildcards with something that doesn't get eaten 
up, to see if the input is split or not
(I think a sequence of plain ASCII letters could be a way, but I'm not sure 
this would serve as a general solution, e.g. for Japanese analyzers etc.; 
to me a sequence of ASCII letters seems like a kind of lowest common 
denominator)
 
for the moment we're trying to live with the current best-effort approach, 
maybe analyzing some fields twice, once with a standard analyzer or similar 
and additionally with a keyword analyzer, and directing pure wildcard 
queries to the keyword field. or maybe we're going to split email addresses 
into separate username and domain fields etc. 
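
the analyze-twice idea could look roughly like this in a 1.x mapping (a sketch; index, type and sub-field names are made up), with pure wildcard queries directed at contact_email.raw:

curl -XPUT 'localhost:9200/myindex/_mapping/mytype' -d '{
  "properties" : {
    "contact_email" : {
      "type" : "string",
      "analyzer" : "standard",
      "fields" : {
        "raw" : { "type" : "string", "index" : "not_analyzed" }
      }
    }
  }
}'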

thank you anyway for your time

cheers marco
 
 

On Wednesday, 19 November 2014 09:56:43 UTC+1, mka...@gmail.com wrote:

  hi

 i have text/email addresses indexed with the standard analyzer. 

 e.g.

 marco.k...@brain.net that results in two tokens being in the index:

 [marco.kamm] and [brain.net]

 i want to search using query_string query and wildcards like:

 {
   "fields" : ["contact_email"],
   "query" : {
     "query_string" : {
       "query" : "(contact_email:(marco.*@brain.net))",
       "default_operator" : "and",
       "analyze_wildcard" : true
     }
   }
 }

 from my past working experience with lucene I know that wildcard queries 
 are kind of problematic, because they're not analyzed by default.
 (to work around this behaviour I wrote a custom parser that prepares the 
 query string depending on the specific field analyzer before 
 passing it to the lucene query parser)

 at first, when I noticed the analyze_wildcard parameter/option, I thought 
 great/cool! I no longer need my custom magic parser ;-), elasticsearch 
 provides built-in support for my problems ... 

 when testing the analyze_wildcard behaviour with pure prefix queries 
 like marco.kamm@brain.* it worked like a charm! resp. did the same 
 thing I 

Re: Does nested query with operator honor the operator or does it always display some default behavior

2014-11-20 Thread Ramdev Wudali
Hi Ivan: 
I tried using the _explain API (endpoint) to get an explanation; it 
returned this:
{
   "_index": "news",
   "_type": "swift",
   "_id": "_explain",
   "_version": 5,
   "created": false
}

I tried adding "explain": true as part of my query, which resulted in this:
"_explanation": {
   "value": 10.384945,
   "description": "Score based on child doc range from 75103316 
to 75103366"
}

That said, if you think the syntax is not familiar, how do you suggest the 
query be created? (Of course, I could split the query into a boolean 
query with two MUST nested conditions, which does result in the documents I 
am looking for.) 
However, if I have a list of more than 2 values to be searched for, the 
query becomes unseemly. The Java API does seem to allow a list of values to 
be passed in; here is a code snippet of how I am using the Java API:

qb = QueryBuilders.nestedQuery(fieldName,
        QueryBuilders.boolQuery()
            .must(QueryBuilders.matchQuery(fieldName + ".v", values)
                .operator(MatchQueryBuilder.Operator.AND))
            .must(QueryBuilders.rangeQuery(fieldName + ".s")
                .gte(0.6)));

where values is a List of values.

Please let me know if I am using the API incorrectly. 


Thanks

Ramdev



On Wednesday, 19 November 2014 14:13:41 UTC-6, Ivan Brusic wrote:

 As mentioned before, that syntax seems strange to me. I have never seen an 
 array used with a match query. I wonder what the resulting Lucene query is. 
 I think that analyzed/non-analyzed just might be a red herring. What does 
 the explanation output say?

 -- 
 Ivan

  On Wed, Nov 19, 2014 at 10:24 AM, Ramdev Wudali agas...@gmail.com wrote:

  The fields (I am searching against) are analyzed by the default 
  analyzer. 
  The query, as I noted in my question, was generated using the Java 
  API, so the array syntax is the API's interpretation.  That 
  said, I ran a few more experiments: if the field is not analyzed (unlike 
  my original case), the query works and returns the right 
  documents (meaning both values exist in the returned 
  documents).  But if the fields are analyzed, the operator is not honored.

  So now my question is: why would not_analyzed fields cause the operator 
  to be honored?
  And does the operator field within a nested query depend on whether the 
  field in the nested document is actually analyzed or not?

 Ramdev



 On Tuesday, 18 November 2014 14:45:53 UTC-6, Ivan Brusic wrote:

  I have never seen the array syntax with the match query, so I am not 
  sure what the behavior should be. Since your search terms are not analyzed 
  in your example, a terms query with a minimum match of 100% should work. If 
  not, perhaps create a single search term from your existing terms?

 -- 
 Ivan
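
Ivan's terms-query suggestion as a concrete sketch (the terms are lowercased here on the assumption that the field went through the default analyzer; index and type are taken from the _explain output above):

curl -XGET 'localhost:9200/news/swift/_search?pretty' -d '{
  "query" : {
    "nested" : {
      "path" : "NESTED_FIELD",
      "query" : {
        "terms" : {
          "NESTED_FIELD.v" : [ "aapl.oq", "googl.oq" ],
          "minimum_should_match" : "100%"
        }
      }
    }
  }
}'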

 On Tue, Nov 18, 2014 at 10:23 AM, Ramdev Wudali agas...@gmail.com 
 wrote:

 Hi :
I have the following query :
  {
    "query": {
      "bool": {
        "must": {
          "nested": {
            "query": {
              "bool": {
                "must": [
                  {
                    "match": {
                      "NESTED_FIELD.v": {
                        "query": [ "AAPL.OQ", "GOOGL.OQ" ],
                        "operator": "and"
                      }
                    }
                  },
                  {
                    "range": {
                      "NESTED_FIELD.s": {
                        "from": 0.6,
                        "to": null,
                        "include_lower": true,
                        "include_upper": true
                      }
                    }
                  }
                ]
              }
            },
            "path": "NESTED_FIELD"
          }
        }
      }
    },
    "filter": {
      "bool": {
        "must": [
          {
            "range": {
              "DOC_DATE.v": {
                "from": "2014-08-19T20:00:00.000-04:00",
                "to": "2014-10-18T23:59:59.999Z",
                "include_lower": true,
                "include_upper": true
              }
            }
          }
        ]
      }
    }
  }

  The behavior I expect is the following:

   the documents that are returned should contain both values 
  for NESTED_FIELD.v (AAPL.OQ and GOOGL.OQ) and satisfy the corresponding 
  NESTED_FIELD.s range condition as well. 

  The behavior I see:
   the documents returned contain either one of the values (AAPL.OQ 
  (OR) GOOGL.OQ (OR) both).  

  I want documents that have both values only. So the "operator": 
  "and" (and its variant "operator": "AND") does not seem to have any 
  effect. Any pointers/suggestions regarding this are much 

Re: If I use EC2 Discovery Plugin do I necessarily give internet access to my instances?

2014-11-20 Thread wellszhane
I had the same problem yesterday. What I did was allocate an Elastic IP and 
associate it with the EC2 instance. In the security group you need to open 
both the private IP and the Elastic IP. Try it.

On Wednesday, November 19, 2014 8:01:48 AM UTC-5, David Vasquez wrote:

 Hi everyone!

 I'm trying to configure tight security rules for my elasticsearch cluster, 
 meaning that the network access rules must be exactly what is needed. Now 
 I've found that the EC2 discovery plugin makes a call to AWS (
 ec2.us-east-1.amazonaws.com:443), and for that I would need to give 
 internet access to my elasticsearch instances.

 That is a big drawback for my security configuration, because I 
 cannot tie the call to a fixed IP, nor to a fixed port, and hence my 
 access rules would be wide open.

 Can you please tell me how do you manage this security issue on AWS?

 Thank you very much!




Deleted indices keep coming back w/ 1.4.0

2014-11-20 Thread David Smith
Hi,

Since we upgraded to 1.4.0, deleted indices in our time-series index set 
keep coming back right after deletion. Whenever we drop an expired index 
(usually as midnight rolls over), it gets deleted and removed from the alias 
it was under, but about half the time it comes back as an empty index.

As you can see from the Marvel screenshot below (read from the bottom to 
the top):

https://lh4.googleusercontent.com/-iXhabN33WIw/VG4GDf3Q8QI/AC4/jLF_dGBpGIg/s1600/Screen%2BShot%2B2014-11-20%2Bat%2B10.07.58%2BAM.png

Just wanted to make sure you guys are aware of this bug.

D.



ES seems to be aliasing the byte type to the short type

2014-11-20 Thread Damien Montigny
Hi everyone,

I was experimenting with mappings for index size optimization purposes and 
I have an issue; it seems like a bug to me, and I cannot find any 
documentation about it.

When I declare a field of type byte, ES seems to be treating it as a 
short. For proof, see the error message of the last curl below; it 
mentions the short type even though I declared a byte:
(MapperParsingException[failed to parse [some_data]]; nested: 
JsonParseException[Numeric value (32768) out of range of Java short)

Everything has been tested on a freshly untarred ES.

# Create the index
curl -XPUT 'http://localhost:9200/some_index?pretty' -d '
{
    "mappings": {
        "some_type": {
            "dynamic": "strict",
            "properties": {
                "some_data": {
                    "type": "byte"
                }
            }
        }
    }
}
'

# Insert a doc with a value just out of the range of the byte type: success, weird
curl -XPUT "http://localhost:9200/some_index/some_type/1?pretty" -d '
{
    "some_data": 256
}
'

# Insert a doc with the max value for the short type: success, still weird
curl -XPUT "http://localhost:9200/some_index/some_type/1?pretty" -d '
{
    "some_data": 32767
}
'

# Insert a doc with a value just out of the range of the short type: failure, OK I get it, ES sees it as a short...
curl -XPUT "http://localhost:9200/some_index/some_type/1?pretty" -d '
{
    "some_data": 32768
}
'
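
One sanity check worth running (a diagnostic sketch): ask ES what mapping it actually stored for the field:

curl -XGET 'http://localhost:9200/some_index/_mapping?pretty'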

java -version outputs:
java version "1.7.0_07"
Java(TM) SE Runtime Environment (build 1.7.0_07-b10)
Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)

lsb_release -a outputs:
Distributor ID:  Ubuntu
Description:     Ubuntu 12.04.5 LTS
Release:         12.04
Codename:        precise

uname -r outputs:
3.1.10-1.9-ec2

ES info: ES 1.4.0

Thanks in advance for the help.

Damien



Re: upgrading from 0.90.7 to 1.4. Gotchas?

2014-11-20 Thread David Smith
I can't remember what 0.90.x was like, as that was long ago for us, but we 
recently upgraded from 1.1.0 to 1.4.0. 

Look 
at 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/breaking-changes.html

additionally pay attention to:

   - scripting: 
      - replacement of mvel with groovy, and dynamic scripting disabled by 
      default. We elected to install the mvel plugin manually, change our 
      scripts to identify that they are mvel (lang=mvel), and make some minor 
      adjustments for compatibility (such as using _score instead of 
      doc.score in scripts). We will do the upgrade from mvel to groovy 
      separately to take care of the security concerns with mvel.
   - lots of percolator changes in 1.x 
   - multi_field changes in 1.0.0
   - disk space allocation decider configuration format changed somewhere 
   in 1.x (if you're configuring that)
   - enable CORS if you're using HEAD (see 
   https://github.com/mobz/elasticsearch-head/issues/170)

In general, I would go through the release notes 
at http://www.elasticsearch.org/downloads/ and look under breaking changes 
for every version since your last version.
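
On the how-to-upgrade part of the question: for rolling restarts it is common to disable shard allocation around each node bounce (a sketch using the 1.x cluster settings API; whether a mixed 0.90/1.4 cluster can run at all is worth verifying against the upgrade docs first):

# before stopping a node
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient" : { "cluster.routing.allocation.enable" : "none" }
}'
# ...upgrade and restart the node, then re-enable...
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient" : { "cluster.routing.allocation.enable" : "all" }
}'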


On Thursday, November 20, 2014 7:47:04 AM UTC-5, Jason Wee wrote:

 I would be interested too, we are using the same 0.90.7 version.

 Jason

 On Thu, Nov 20, 2014 at 2:03 PM, Yves Dorfsman yv...@zioup.com wrote:

 Are there any precautions to take before upgrading from 0.9 to 1.4?

 Different data types?
 Different API calls?
 etc...

 And, what is the best way to upgrade? Can we just add a node at the newer 
 version and let it pull the data?

 Thanks.

 http://yves.zioup.com
 gpg: 4096R/32B0F416 







Re: upgrading from 0.90.7 to 1.4. Gotchas?

2014-11-20 Thread 熊贻青
The most surprising part of my upgrade from 0.90 to 1.0.1 was the drop in
indexing performance. So, yes, I'm also interested to know of any gotchas.
On 20 November 2014 at 20:47, Jason Wee peich...@gmail.com wrote:

 I would be interested too, we are using the same 0.90.7 version.

 Jason

 On Thu, Nov 20, 2014 at 2:03 PM, Yves Dorfsman y...@zioup.com wrote:

 Are there any precautions to take before upgrading from 0.9 to 1.4?

 Different data types?
 Different API calls?
 etc...

 And, what is the best way to upgrade? Can we just add a node at the newer
 version and let it pull the data?

 Thanks.

 http://yves.zioup.com
 gpg: 4096R/32B0F416







Re: upgrading from 0.90.7 to 1.4. Gotchas?

2014-11-20 Thread David Smith
Also, I forgot to mention... if you have native scripts, they will 
mysteriously throw an UnsupportedOperationException whenever invoked. It 
looks like they made a mistake in 1.4.0 (now reverted on master) that 
requires you to override setScorer in native scripts. It's OK, I just 
wish they had documented it in the breaking changes. 

On Thursday, November 20, 2014 1:03:17 AM UTC-5, Yves Dorfsman wrote:

 Are there any precautions to take before upgrading from 0.9 to 1.4?

 Different data types?
 Different API calls?
 etc...

 And, what is the best way to upgrade? Can we just add a node at the newer 
 version and let it pull the data?

 Thanks.

 http://yves.zioup.com
 gpg: 4096R/32B0F416 




ElasticSearch 1.3.4 - Duplicate data sometimes

2014-11-20 Thread John D. Ament
Hi,

I was wondering how I might be able to troubleshoot issues with duplicate 
data coming back from queries.

In my query, I perform an aggregate query, something like this:

final SearchResponse searchResponse = 
client()
    .prepareSearch(indexName)
    .setTypes(OBJ_TYPE)
    .setFetchSource(true)
    .setExplain(true)
    .addSort("dateCreated.value", SortOrder.DESC)
    .addSort("recId", SortOrder.DESC)
    .setSize(1000)
    .addAggregation(
        AggregationBuilders.filter("filter1").filter(filterBuilder).subAggregation(rangeBuilder))
    .execute().actionGet();

The values returned from this query include duplicate object IDs, but only 
sometimes.  I've looked at our elasticsearch config files and don't see any 
way this could happen.  The filters only reduce based on some attributes; I 
can't think of any reason this could occur.

John

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/11b5f5a9-95da-40c7-aad3-00b5e79d74ad%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: ES backups without using snapshots?

2014-11-20 Thread Ivan Brusic
I have never used plugins, but there is also Jorg's tool:
https://github.com/jprante/elasticsearch-knapsack
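
For reference, the flush-and-optimize step quoted below could look roughly
like this with the 1.x Java API (a sketch, assuming an existing Client named
client; the index name is illustrative):

// flush pending operations to disk, then merge segments before copying
// the data directory
client.admin().indices().prepareFlush("logs-2014.11.19").execute().actionGet();
client.admin().indices().prepareOptimize("logs-2014.11.19")
    .setMaxNumSegments(1) // merge down to one segment per shard
    .execute().actionGet();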

-- 
Ivan

On Wed, Nov 19, 2014 at 11:27 PM, Mathew D mathew.degerh...@gmail.com
wrote:

 Hi Ivan,

 Thanks for the quick response.  We've got 5 shards per index, so with 2
 replicas each node should in theory have a full set of data.  I was hoping
 that taking the node out of service by stopping it would avoid disruption
 as a result of pausing indexing, but I couldn't find any documentation to
 confirm if such an operation would leave the data files in a consistent
 state that could reliably be used for restore.

 Evan's suggestion of elasticdump looks like the closest to what I'm after,
 although unfortunately I don't have node.js/npm installed (and in an
 enterprise environment it could be tricky to get them installed).

 NB I hear your concerns re cluster design.  Incorporating the remote node
 was chosen to minimise data loss following a data centre failure; however,
 because of the risk of split brain, the node actually functions more as a
 warm DR than any sort of HA...

 Regards,
 Mat



 On Thursday, November 20, 2014 2:32:14 PM UTC+13, Ivan Brusic wrote:

 How many shards for each index? I am assuming that each node does not
 have all the data.

 If you can stop indexing, you can just rsync the data to a local
 directory. Make sure you execute a flush and preferably an optimize in
 order to merge the segments on disk. The tricky part is the manual combining
 you referred to.

 BTW, 3 nodes/2 data centers? Sounds like a recipe for trouble. :)

 Cheers,

 Ivan

 On Wed, Nov 19, 2014 at 7:41 PM, Mathew D mathew.d...@gmail.com wrote:

 Hi there,

 Any suggestions as to how I can create full ES backups without using
 snapshot functionality?

 The reason I can't use snapshots is because they require a shared
 directory mounted on all nodes, but my 3-node cluster spans two data
 centres and I am not able to NFS mount over the WAN.  I'm also not
 permitted to backup to AWS/S3.

 As I have 2 replicas of each index, I'm leaning towards the idea of
 stopping one node and backing up that node's data directory but wondered if
 anyone could suggest a more elegant way.  For example, could I snapshot to
 a local directory on each node, then manually combine the contents into a
 single cohesive backup?

 Regards,
 Mat





min_doc_count in nested aggregations

2014-11-20 Thread kazoompa
Hi,

I have several aggregations, each of which has its own inner 
aggregations. It seems that 'min_doc_count' does not apply when the 
containing aggregation bucket is itself empty. I presumed that because both 
levels of aggregation use 'min_doc_count' there would be buckets for the 
inner agg as well.

Can somebody enlighten me on why ES cannot do this? Some technical 
insight would be appreciated.

Thanks.


Here is a snippet of my query:
...
"aggregations": {
    "totalCount": {
        "global": {}
    },
    "categories-missing": {
        "terms": {
            "field": "categories.missing",
            "size": 0,
            "min_doc_count": 0,
            "order": {
                "_term": "asc"
            }
        }
    },
    "datasetId": {
        "terms": {
            "field": "datasetId",
            "size": 0,
            "min_doc_count": 0,
            "order": {
                "_term": "asc"
            }
        },
        "aggregations": {
            "attributes-Default": {
                "terms": {
                    "field": "attributes.Default",
                    "size": 0,
                    "min_doc_count": 0,
                    "order": {
                        "_term": "asc"
                    }
                }
            },
            "attributes-Administrative_information": {
                "terms": {
                    "field": "attributes.Administrative_information",
                    "size": 0,
                    "min_doc_count": 0,
                    "order": {
                        "_term": "asc"
                    }
                }
            },
...




Re: If I use EC2 Discovery Plugin do I necessarily give internet access to my instances?

2014-11-20 Thread Norberto Meijome
Yes, but this might not be an option if your instance is in a private
subnet. It also means handling all your IPs like this (though in theory
you don't need internal IPs; a security group id/name would do as well) -
there are limits to how many rules you can add to a security group.

At the same time, adding an EIP would complicate the OP's apparent security
requirements...
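
For what it's worth, pinning discovery to a security group rather than to
IPs might look like this in elasticsearch.yml (a sketch of the cloud-aws
plugin settings as I understand them; the group name is illustrative):

discovery.type: ec2
# only instances that are members of this security group are discovered
discovery.ec2.groups: es-cluster-sg
# publish private addresses inside the VPC
discovery.ec2.host_type: private_ip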
On 20/11/2014 12:04 pm, wellszh...@xteros.com wrote:

 I had the same problem yesterday. What I did was create an elastic IP and
 associate it with the EC2 instance. In the security group you need to open
 both the private IP and the elastic IP.  Try it.

 On Wednesday, November 19, 2014 8:01:48 AM UTC-5, David Vasquez wrote:

 Hi everyone!

 I'm trying to configure tight security rules to my elasticsearch cluster
 meaning that the network access rules must be exactly what is needed. Now
 I've found that the EC2 Discovery plugin does a call to AWS (
 ec2.us-east-1.amazonaws.com:443) and for that I would need to give
 internet access to my elasticsearch instances.

 That said, it means a big drawback for my security configuration because
 I cannot tie the call to a fixed IP, neither to a fixed port and hence my
 access rules would be wide open.

 Can you please tell me how do you manage this security issue on AWS?

 Thank you very much!



Re: Changing Analyzer behavior for hyphens - suggestions?

2014-11-20 Thread joergpra...@gmail.com
The whitespace tokenizer has the problem that punctuation is not ignored. I
find the word_delimiter filter does not work at all with whitespace, only
with the keyword tokenizer, and then only with massive pattern matching, which
is complex and expensive :(

Therefore I took the classic tokenizer and generalized the hyphen rules in
the grammar. The tokenizer "hyphen" and the filter "hyphen" are two separate
routines. The tokenizer "hyphen" keeps hyphenated words together and handles
punctuation correctly. The filter "hyphen" adds combinations of the original
form.

The main point is to add combinations of dehyphenated forms so they can be
searched.

Single words are only taken into account when the word is positioned at the
edge.

For example, the phrase "der-die-das" should be indexed in the following
forms:

"der-die-das", "derdiedas", "das", "derdie", "derdie-das", "die-das", "der"
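
For reference, wiring the two together could look like this (a sketch,
assuming the bundle registers both the tokenizer and the filter under the
name "hyphen"; the analyzer name is illustrative):

{
    "index": {
        "analysis": {
            "analyzer": {
                "hyphen_analyzer": {
                    "type": "custom",
                    "tokenizer": "hyphen",
                    "filter": ["lowercase", "hyphen"]
                }
            }
        }
    }
}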

Jörg

On Thu, Nov 20, 2014 at 9:29 AM, horst knete baduncl...@hotmail.de wrote:


 So the term "this-is-a-test" gets tokenized into "this-is-a-test", which is
 nice behaviour, but in order to make a full-text search on this field it
 should get tokenized into "this-is-a-test", "this", "is", "a" and "test" as
 I wrote before.





Re: Getting file text content from mapper?

2014-11-20 Thread Raymond Giorgi
Also, this is the first line of what's posted along the river

{ "index": { "_index": "resumes", "_type": "resume", "_id": "2158912" }}

Things can get truncated when they're as big as a Base64 encoded file :)


On Wednesday, November 19, 2014 6:01:29 PM UTC-5, Raymond Giorgi wrote:

 Hey all,

 I'm hoping someone can help me out with something I'm having an issue with.

 The short: I'm trying to extract plaintext from the attachment-mapper.

 The long: I'm posting the contents of a file Base64 encoded to RabbitMQ 
 which is feeding an ElasticSearch river plugin. Querying against the field 
 works fine, but it only seems to store the Base64 encoding of the file 
 instead of the plaintext. I'd like to extract the contents as plaintext and 
 have that be returnable (i.e. query for the text of a docx). I'm feeding it 
 from a PHP front end, so there are places in the app where I'd like to rely 
 on Elasticsearch's built in Tika processor.

 Thanks!




Marvel / ES query document count major discrepancy

2014-11-20 Thread Mike Seid
Howdy,

I have been hitting my ES cluster pretty hard recently and I think it is 
holding up great. In the last few days, I have noticed a major discrepancy 
between the document count that Marvel shows and that of a _count query 
against the actual ES cluster. Marvel is reporting about 43.9M documents while 
the ES query shows 8.7M. Where would this discrepancy come from? I would 
suspect it is a monitoring error on Marvel's part, but I'm not sure. Any ideas?

Marvel Screenshot:
https://www.dropbox.com/s/1y39wui96fpjc14/Screenshot%202014-11-20%2009.57.42.png?dl=0

ES Query:
http://x/pa-2014-11-19/_count
{
    "count": 8781919,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    }
}
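
One thing worth checking, assuming Marvel reports a cluster-wide, Lucene-level
document count (which spans all indices and can include nested documents),
while the _count above is scoped to a single index: compare the two at the
same scope, e.g.

http://x/_count                   (all indices)
http://x/pa-2014-11-19/_count     (one index, as above)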


Thanks,

Mike



Native script unable to get values, perhaps because it's a child doc? ES v1.1.1

2014-11-20 Thread Jonathan Foy
Hello

I have a native script that I'm using to score/sort queries, and it is not 
working properly for one of my three types.

All three types have the same nested field, and I'm using the script to 
check values and score/sort by an externally defined order.  However, for 
one of the three types the values pulled from the doc fields are always 
zero/null (using docFieldLongs("fieldName").getValue() or 
docFieldStrings("stringValue").getValue()).  I can check for the fields to 
be present using doc().containsKey(), and it seems to see them, but it 
never actually sees any values.  I've pulled a few records manually and 
verified that the data looks good.

The only thing I can think of that's different is that this one type is a 
child of one of the other two, but I'm querying it completely independently 
of the parent in this case.  Does this sound familiar to anyone by any 
chance?



Re: Uncertain field types when extracting fields from getSource() (java api)

2014-11-20 Thread Simon Brandhof
A workaround is to cast the value to Number and then call 
Number#longValue().
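
In context, that looks something like this (a sketch; the hit variable and
field name are illustrative):

// getSource() deserializes JSON numbers as Integer or Long depending on
// their size, so go through Number rather than assuming a concrete type.
Map<String, Object> source = hit.getSource(); // hit is a SearchHit
long count = ((Number) source.get("count")).longValue();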



Re: Deleted indices keep coming back w/ 1.4.0

2014-11-20 Thread Mark Walkom
That's unlikely to be a bug; the only time ES will recreate an index is if
it finds dangling data.

Are your indexes created automatically? How is your data sent to ES? Is it
possible that some data reaches ES more slowly than the rest, and the time
difference causes this to happen?

On 21 November 2014 02:19, David Smith davidksmit...@gmail.com wrote:

 Hi,

 Since we upgraded to 1.4.0, deleted indices in our time-series index set
 keep coming back right after deletion. So whenever we drop an expired index
 (usually as midnight rolls), it gets deleted and removed from the alias it
 was under. But about half the time it comes back as an empty index.

 As you can see from the marvel screenshot below (read from the bottom to
 the top).


 https://lh4.googleusercontent.com/-iXhabN33WIw/VG4GDf3Q8QI/AC4/jLF_dGBpGIg/s1600/Screen%2BShot%2B2014-11-20%2Bat%2B10.07.58%2BAM.png

 Just wanted to make sure you guys are aware of this bug.

 D.



Re: Is Elasticsearch also supported on AIX and HP Itanium 11.31

2014-11-20 Thread Mark Walkom
Depends what you mean by supported.

I have seen comments from people running it on AIX, but I don't think it is
officially supported.

On 21 November 2014 00:37, Gaurav gupta gupta.gaurav0...@gmail.com wrote:

 Is Elasticsearch also supported on AIX and HP Itanium 11.31. I didn't find
 this information in release notes or installation instructions.

 Thanks
 Gaurav



[ANN] it’s {on}: announcing our first user conference – elastic{on}15

2014-11-20 Thread Mark Walkom
http://www.elasticsearch.org/blog/its-on-announcing-our-first-user-conference-elasticon15

 Shay Banon November 20, 2014

 It’s been a little over two years since we formed a company around
 Elasticsearch, and the engagement with our community, users, and customers
 has taken on a life of its own. There are now 90 meetup groups
 http://elasticsearch.meetup.com/ around the globe, hundreds of
 conferences featuring our products, and a growing list of events where our
 own developers engage audiences in the Elasticsearch story. It’s clear we
 hit a nerve.

 Over and over, we kept hearing one question: “When will Elasticsearch get
 a conference of its own?” We listened, and I am happy to announce the first
 Elasticsearch conference, Elastic{ON}15 http://www.elasticon.com/, is
 happening March 9 through 11, 2015 in San Francisco, California.

 The conference details are unfolding as we speak, but there are a few
 things we already have planned that I want to share with you.

 First, Elastic{ON}15 will be centered around Elasticsearch
 http://www.elasticsearch.com/products/elasticsearch/ and the ecosystem
 of products surrounding it, including Apache Lucene, Kibana
 http://www.elasticsearch.com/products/kibana/, Logstash
 http://www.elasticsearch.com/products/logstash/, the various client
 libraries
 http://www.elasticsearch.org/guide/en/elasticsearch/client/community/current/clients.html
 , Elasticsearch for Apache Hadoop
 http://www.elasticsearch.com/products/hadoop/, Marvel
 http://www.elasticsearch.com/products/marvel/, and Shield
 http://www.elasticsearch.com/products/shield/.

 Part of what makes Elasticsearch tick is the close communication we have
 with our users. To that extent, we’re doing a few things to make sure the
 conference is run the same way.

 What does this mean for you? It means that *all* the developers at our
 company (that’s right, every single one of them) will be attending the
 conference — and they want to hear from you. Elastic{ON}15 will feature a
 dedicated track that gives you a unique opportunity to talk with our
 engineers about all the work they currently do and plan to do. Afterwards,
 we’re coordinating an Elasticsearch dev all hands meeting where we’ll
 discuss your feedback and apply it to future products and events.

 The second aspect of the conference is hearing you, the user, speak about
 how you use our platform. I am lucky enough to be able to travel the world
 and talk to users and customers frequently, and am continuously amazed by
 how our products are being put to use. We plan to create a platform for our
 users, customers, and contributors in the community to talk about their use
 cases and successes. Elastic{ON}15 will be a great way to meet and talk
 with other users in your space and share knowledge. Please, if you’re
 interested, don’t hesitate to submit to speak
 http://www.elasticon.com/apex/Elastic_ON_Speak at the conference.

 We will also have a hands-on track with our developers, who will go
 through some high-level overviews and technical deep dives of our various
 products. Or you can drop by our “Agents of Elasticsearch” station to ask
 any questions that are on your mind.

 Bottom line: Elastic{ON}15 is all about you! And obviously, we plan to
 have a lot of fun while we’re together.

 I am super excited about the conference, and I hope you are as well. I
 would love to personally welcome each and every one of you to join us; it’s
 going to be great. (And make sure to sign up
 http://www.elasticon.com/apex/Elastic_ON_Signup to save your spot – my
 events team keeps reminding me that registration will fill up fast!)


(Sent on behalf of the Elasticsearch Team)



query timing out

2014-11-20 Thread Warner Onstine
Hi all, hoping to get some help with this. I am trying to retrieve the
latest tweet by a person. I'm using the javascript library. Using the
elastic.js library to help build a query. Here is the query generated:
{
    "query": {
        "match": {
            "talent_id": { "query": "546e50b989fe347230c4" }
        }
    },
    "sort": [ { "post_date": { "order": "desc" } } ],
    "size": 1
}

When I run this query through curl it works just fine, but when I run it
through the elasticsearch JS lib it times out regularly (not always but a
lot). The curl one comes back almost immediately which I would expect.

Any thoughts on why the JS lib times out? Or a better way to write my query
above to get what I want?

Thank you in advance!

-warner



Re: Native script unable to get values, perhaps because it's a child doc? ES v1.1.1

2014-11-20 Thread Shiwen Cheng
Hi, did you index the field you want to use in the native script?

Shiwen


On Thursday, 20 November 2014 11:38:30 UTC-8, Jonathan Foy wrote:

 Hello

 I have a native script that I'm using to score/sort queries and it is not 
 working properly for one of my three types.

 All three types have the same nested field, and I'm using the script to 
 check values and score/sort by an externally defined order.  However, for 
 one of the three types the values pulled from the doc fields are always 
 zero/null (using docFieldLongs("fieldName").getValue() or 
 docFieldStrings("stringValue").getValue()).  I can check for the fields to 
 be present using doc().containsKey(), and it seems to see them, but it 
 never actually sees any values.  I've pulled a few records manually and 
 verified that the data looks good.

 The only thing I can think of that's different is that this one type is a 
 child of one of the other two, but I'm querying it completely independently 
 of the parent in this case.  Does this sound familiar to anyone by any 
 chance?




Odd behavior of bulk loading speed - good riddle?

2014-11-20 Thread Christopher Ambler
So this has me perplexed.

I have a bulk data loading job that creates an upsert statement and batches 
500 of them in a bulk operation using the _bulk interface.

I send the bulk insert via HTTP (on 9200) and wait for the response before 
sending the next one, which I do immediately.

I do not hit any thread pool limits.

I have replicas set to zero and refresh interval set to -1 to make the 
loading as lightweight as possible.

Timing these, they start out pretty fast and run at about 2,000 documents per 
second (four or so HTTP round trips per second).

This lasts for a few minutes and then it starts to slow. Within an hour, 
it's running about 1200 per second. In another hour, it's down to about 600 
per second. Then it seems to flatten-out about 400 per second until the job 
is done, some 8 million documents later.

So my question is - why the slowdown? It's very consistent, seems 
reasonably linear, and happens 100% of the time.

Any clues?




Re: Odd behavior of bulk loading speed - good riddle?

2014-11-20 Thread Christopher Ambler
The statement, if that helps (this is a line of PHP, hence the $ variables):

"{\"script\" : \"ctx._source.auctionid=$auctionID; 
ctx._source.auctiontype=$auctionType; 
ctx._source.auctionstatus=$auctionStatus; 
ctx._source.auctionprice=$auctionPrice; 
ctx._source.auctionendtime='$auctionEndTime'; 
ctx._source.auctionadult=$adultListingFlag;\", \"upsert\": { \"auctionid\": 
$auctionID, \"auctiontype\": $auctionType, \"auctionstatus\": 
$auctionStatus, \"auctionprice\": $auctionPrice, \"auctionendtime\": 
\"$auctionEndTime\", \"auctionadult\": $adultListingFlag, \"domaintype\": 
\"auction\", \"fqdn\": \"$fqdn\", \"sld\": \"$sld\", \"tld\": \"$tld\", 
\"vendorid\": 6, \"price\": 0, \"commissionrate\": 0, \"isfasttransfer\": 
false, \"isadult\": $aFlag, \"istaboo\": $tFlag, \"sldlen\": $sldlen, 
\"numhyphens\": $numhyphens, \"numdigits\": $numdigits, \"tokens\": " . 
(($tokens == null) ? '' : json_encode($tokens)) . "}}"

Creates a document if it doesn't exist, updates it if it does.
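
Rendered out, each of the 500 actions in the bulk body is a pair of lines
along these lines (a sketch with illustrative values; the index and type
names are mine, not from the post):

{ "update": { "_index": "domains", "_type": "auction", "_id": "example.com" } }
{ "script": "ctx._source.auctionprice=100", "upsert": { "auctionid": 1, "auctionprice": 100 } }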



Increased query count after moving to nested documents

2014-11-20 Thread Ivan Brusic
We have always indexed nested documents, but never fully used them since
issue 3022 is still outstanding. Finally made the move to actually
filtering documents at the nested level.

Tracking metrics with graphite/grafana, I noticed immediately that the
active/current query count is much higher, although the actual volume of
queries has not changed. The overall query count is normal. Does using join
queries increase the number of queries reported?

Cheers,

Ivan



Re: Native script unable to get values, perhaps because it's a child doc? ES v1.1.1

2014-11-20 Thread Jonathan Foy
Yep, and I can search on it in other queries.

After testing most of the afternoon, I finally seem to have gotten it to 
work by pulling the field using the full name, including the nested path:

Long value = docFieldLongs("nestedPath.propertyName").getValue();

This seems to work in all three places, including the two types that also 
worked without the nested path, which is good.  Afterwards I came across this 
SO post 
http://stackoverflow.com/questions/21289149/trouble-with-has-parent-query-containing-scripted-function-score?rq=1 
which sounds like a similar problem, though I'm not in a 
has_parent/has_child query (though I AM in a child type).  Sounds like it 
may be a bug.


On Thursday, November 20, 2014 4:00:56 PM UTC-5, Shiwen Cheng wrote:

 Hi, did you index the field you want to use in the native script?

 Shiwen


 On Thursday, 20 November 2014 11:38:30 UTC-8, Jonathan Foy wrote:

 Hello

 I have a native script that I'm using to score/sort queries and it is not 
 working properly for one of my three types.

 All three types have the same nested field, and I'm using the script to 
 check values and score/sort by an externally defined order.  However, for 
 one of the three types the values pulled from the doc fields are always 
  zero/null (using docFieldLongs("fieldName").getValue() or 
  docFieldStrings("stringValue").getValue()).  I can check for the fields to 
 be present using doc().containsKey(), and it seems to see them, but it 
 never actually sees any values.  I've pulled a few records manually and 
 verified that the data looks good.

  The only thing I can think of that's different is that this one type is a 
 child of one of the other two, but I'm querying it completely independently 
 of the parent in this case.  Does this sound familiar to anyone by any 
 chance?





Re: Documentation for internals and architecture of Elasticsearch

2014-11-20 Thread joergpra...@gmail.com
Look at the videos from Berlin Buzzwords 2011 and 2012

http://www.elasticsearch.org/videos/page/3/

They are a great intro

Jörg

On Thu, Nov 20, 2014 at 6:13 AM, Rahul Khengare rahulk1...@gmail.com
wrote:

 Hi All,

 When we provide documents or data objects to Elasticsearch using the REST
 APIs, Elasticsearch stores the data locally or on another node in the ES cluster.

 I want to understand how Elasticsearch stores the data internally.
 Is there any documentation available on the architecture and storage
 mechanism?

 Thanks in advance.


 Regards,
 Rahul Khengare



Re: Getting file text content from mapper?

2014-11-20 Thread David Pilato
So that’s the expected behavior.
The mapper attachments plugin only indexes the content; it never modifies the 
_source document.

If you want to see the extracted text, you need to store the field and 
explicitly ask for it at query time using the fields option.

Have a look here: 
https://github.com/elasticsearch/elasticsearch-mapper-attachments#highlighting-attachments
 
https://github.com/elasticsearch/elasticsearch-mapper-attachments#highlighting-attachments
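
A sketch of what that looks like, going by the README linked above (the type
and field names are illustrative):

{
    "resume": {
        "properties": {
            "file": {
                "type": "attachment",
                "fields": {
                    "file": { "store": "yes" }
                }
            }
        }
    }
}

and then ask for the stored text explicitly at query time, e.g.
{ "query": { "match_all": {} }, "fields": ["file"] }.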


-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet https://twitter.com/dadoonet | @elasticsearchfr 
https://twitter.com/elasticsearchfr | @scrutmydocs 
https://twitter.com/scrutmydocs



 Le 20 nov. 2014 à 20:14, Raymond Giorgi raymondgio...@gmail.com a écrit :
 
 Also, this is the first line of what's posted along the river
 
 { "index": { "_index": "resumes", "_type": "resume", "_id": "2158912" }}
 
 Things can get truncated when they're as big as a Base64 encoded file :)
 
 
 On Wednesday, November 19, 2014 6:01:29 PM UTC-5, Raymond Giorgi wrote:
 Hey all,
 
 I'm hoping someone can help me out with something I'm having an issue with.
 
 The short: I'm trying to extract plaintext from the attachment-mapper.
 
 The long: I'm posting the contents of a file Base64 encoded to RabbitMQ which 
 is feeding an ElasticSearch river plugin. Querying against the field works 
 fine, but it only seems to store the Base64 encoding of the file instead of 
 the plaintext. I'd like to extract the contents as plaintext and have that be 
 returnable (i.e. query for the text of a docx). I'm feeding it from a PHP 
 front end, so there are places in the app where I'd like to rely on 
 Elasticsearch's built in Tika processor.
 
 Thanks!
 


elasticsearch JAVA version and JDK version?

2014-11-20 Thread Thong Bui
Hi all,

I am new to the elasticsearch Java API and have some questions:
1) What is the minimum JDK to be used with elasticsearch Java API version 
1.4.0?
2) Is there a version of the elasticsearch Java API that works with JDK 1.6.0?

Thank you!

Thong



Re: elasticsearch JAVA version and JDK version?

2014-11-20 Thread Mark Walkom
1.4.X requires 1.7u55 or 1.8u20:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup.html#jvm-version

You'd have to dig back through the older versions of the docs to find what
is supported with Java 1.6.0, but I know 0.90.X was.

On 21 November 2014 11:17, Thong Bui t...@rhapsody.com wrote:

 Hi all,

 I am new to elasticsearch java API and have some questions?
 1) What is the minimum JDK to be used with elasticsearch java API version
 1.4.0?
 2) Is there a version of elasticsearch java API that works with JDK 1.6.0?

 Thank you!

 Thong



Bool and And filter, which is faster?

2014-11-20 Thread Fei Xie

In this 
article 
http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/, 
it says that bool is faster than and/or filters. But at that time it was 
elasticsearch 0.9.
Is this still true?

Thanks!



RE: 1.4.0 data node can't join existing 1.3.4 cluster

2014-11-20 Thread Christian Hedegaard
FYI, I have found a solution that works (at least for me).

I’ve got a small cluster for testing, only 4 v1.3.5 nodes. What I’ve done is 
bring up 4 new v1.4.0 nodes as data-only machines. In the yaml I added a line 
to point the nodes via unicast explicitly to the current master:
discovery.zen.ping.unicast.hosts: [10.210.9.224:9300]

When I restarted elasticsearch with that setting, with cloud-aws installed and 
configured on version 2.4.0, the new nodes found the cluster and properly 
joined it.

I will now start nuking the old v1.3.5 nodes to migrate the data off of them. 
Before the final 1.3.5 node is nuked, I will change the config on one of the 
v1.4.0 nodes to allow it as master and restart it.

I’m not sure if the master stuff is needed or not, but I was very afraid of a 
split-brain problem. I have another 4-node testing cluster that I will be able 
to try this upgrade again with in a more controlled manner.

I’m NOT looking forward to upgrading our current production cluster this way 
(15 data-only nodes, 3 master-only nodes).

So it would appear that the problem is somewhere in the unicast discovery code. 
The question is: who’s to blame, Elasticsearch or the cloud-aws plugin?



From: Boaz Leskes [mailto:b.les...@gmail.com]
Sent: Wednesday, November 19, 2014 2:27 PM
To: elasticsearch@googlegroups.com
Cc: Christian Hedegaard
Subject: Re: 1.4.0 data node can't join existing 1.3.4 cluster

Hi Christian,

I'm not sure what thread you refer to exactly, but this shouldn't happen. Can 
you describe the problem you have some more? Anything in the nodes? (both the 
1.4 node and the master)

Cheers,
Boaz

On Wednesday, November 19, 2014 2:39:57 AM UTC+1, Christian Hedegaard wrote:
I found this thread while trying to research the same issue and it looks like 
there is currently no resolution. We like to keep up on our elasticsearch 
upgrades as often as possible and do rolling upgrades to keep our clusters up. 
When testing I’m having the same issue, I cannot add a 1.4.0 box to the 
existing 1.3.4 cluster.

Is there a fix for this anticipated?



Why ES node starts recovering all the data from other nodes after reboot?

2014-11-20 Thread Konstantin Erman
I work on an experimental cluster of ES nodes running on Windows Server 
machines. Once in a while we need to reboot machines. The initial 
state: the cluster is green and well balanced. One machine is gracefully taken 
offline and, after the necessary service is performed, it comes back online. 
All the hardware and file system content is intact. As soon as the ES service 
starts on that machine, it assumes that there is no usable data locally and 
recovers as much data as it deems necessary for balancing from other nodes. 

This behavior puzzles me, because most of the data shards stored on that 
machine's file system can be reused as they are. The cluster stores logs, so all 
indices except those for the current day never ever change until they get 
deleted. Can't the ES node detect that it has perfect copies of some (actually 
most) of the shards and, instead of copying them over, just mark them as up 
to date? 

I suspect I don't know about some step that enables this behavior, and I'm 
looking to enable it. Any advice? 

Thank you!
Konstantin



Re: Why ES node starts recovering all the data from other nodes after reboot?

2014-11-20 Thread Mark Walkom
You should disable allocation before you reboot, that will save a lot of
shard shuffling -
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-upgrade.html#rolling-upgrades
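
Concretely, the setting from that page can be toggled around the reboot (a
sketch; 1.x exposes this as cluster.routing.allocation.enable):

curl -XPUT localhost:9200/_cluster/settings -d '{
  "transient": { "cluster.routing.allocation.enable": "none" }
}'
# ... reboot the node and wait for it to rejoin, then ...
curl -XPUT localhost:9200/_cluster/settings -d '{
  "transient": { "cluster.routing.allocation.enable": "all" }
}'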

On 21 November 2014 13:48, Konstantin Erman kon...@gmail.com wrote:

 I work on an experimental cluster of ES nodes running on Windows Server
 machines. Once in a while we have a need to reboot machines. The initial
 state - cluster is green and well balanced. One machine is gracefully taken
 offline and then after necessary service is performed it comes back online.
 All the hardware and file system content is intact. As soon as ES service
 starts on that machine, it assumes that there is no usable data locally and
 recovers as much data as it deems necessary for balancing from other nodes.

 This behavior puzzles me, because most of the data shards stored on that
 machine file system can be reused as they are. Cluster stores logs, so all
 indices except those for the current day never ever change until they get
 deleted. Can't ES node detect that it has perfect copies of some (actually
 most) of the shards and instead of copying them over just mark them as up
 to date?

 I suspect I don't know about some step to enable this behavior and I'm
 looking to enable it. Any advice?

 Thank you!
 Konstantin



Root type mapping not empty after parsing

2014-11-20 Thread samatha kankipati
Hi 

I am trying to upgrade from ES 0.90.2 to 1.4.0


I am using the Java API to set the index settings; this was working fine 
with 0.90.2:

client.admin().indices().prepareCreate(indexName).setSettings(_).execute().actionGet()


here are my settings:
{
    "index": {
        "analysis": {
            "analyzer": {
                "keyword_lowercase": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": "lowercase"
                },
                "standard_lowercase": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": "lowercase"
                }
            }
        }
    }
}


I am now getting a "Root type mapping not empty after parsing" error.
Can you please suggest any solutions?
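
One thing that may be worth trying (an assumption on my part: this error
usually means the JSON was interpreted as a mapping rather than as settings):
load the JSON explicitly as settings so it cannot be taken for a mapping,
e.g.

// a sketch; org.elasticsearch.common.settings.ImmutableSettings'
// loadFromSource parses the string as settings only
// (the analyzer JSON is trimmed to one analyzer for brevity)
String settings = "{\"analysis\":{\"analyzer\":{\"keyword_lowercase\":" +
    "{\"type\":\"custom\",\"tokenizer\":\"keyword\",\"filter\":\"lowercase\"}}}}";
client.admin().indices().prepareCreate(indexName)
    .setSettings(ImmutableSettings.settingsBuilder().loadFromSource(settings))
    .execute().actionGet();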




Re: Why ES node starts recovering all the data from other nodes after reboot?

2014-11-20 Thread Yves Dorfsman

If you do disable allocation before you reboot a node, and a client writes 
to a shard that had a replica on that node, does the entire replica get 
copied when the node comes up, or does it just get updated?

On Thursday, 20 November 2014 19:52:26 UTC-7, Mark Walkom wrote:

 You should disable allocation before you reboot, that will save a lot of 
 shard shuffling - 
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-upgrade.html#rolling-upgrades

 On 21 November 2014 13:48, Konstantin Erman kon...@gmail.com 
 javascript: wrote:

 I work on an experimental cluster of ES nodes running on Windows Server 
 machines. Once in a while we have a need to reboot machines. The initial 
 state - cluster is green and well balanced. One machine is gracefully taken 
 offline and then after necessary service is performed it comes back online. 
 All the hardware and file system content is intact. As soon as ES service 
 starts on that machine, it assumes that there is no usable data locally and 
 recovers as much data as it deems necessary for balancing from other nodes. 

 This behavior puzzles me, because most of the data shards stored on that 
 machine file system can be reused as they are. Cluster stores logs, so all 
 indices except those for the current day never ever change until they get 
 deleted. Can't ES node detect that it has perfect copies of some (actually 
 most) of the shards and instead of copying them over just mark them as up 
 to date? 

 I suspect I don't know about some step to enable this behavior and I'm 
 looking to enable it. Any advice? 

 Thank you!
 Konstantin





Re: priming data for a new node

2014-11-20 Thread Yves Dorfsman

So if a shard has been updated since the data copy, will it copy the entire 
shard, or just update it?

On Wednesday, 19 November 2014 23:34:01 UTC-7, Mark Walkom wrote:

 It doesn't copy everything, only what it needs to balance the shards.

 On 20 November 2014 17:20, Yves Dorfsman yv...@zioup.com javascript: 
 wrote:

 When adding a new node to a cluster, is there a way to prevent it from 
 having
 to copy all the data from the other nodes?

 We tried to copy the data on disk from an existing node (one that had all 
 the
 data for the given indices), but it still copied everything. Is there a 
 way to
 make it update what is new only?

 Thanks.

 --
 http://yves.zioup.com
 gpg: 4096R/32B0F416






Re: Why ES node starts recovering all the data from other nodes after reboot?

2014-11-20 Thread Mark Walkom
It will enter recovery where it syncs at the segment level from the current
primary, then the translog gets shipped over and (re)played, which brings
it all up to date.

On 21 November 2014 14:51, Yves Dorfsman y...@zioup.com wrote:


 If you do disable allocation before you reboot a node and a client writes
 to a shard that had a replica on that node, does the entire replica gets
 copied when the node come up? Or does it get just updated?

 On Thursday, 20 November 2014 19:52:26 UTC-7, Mark Walkom wrote:

 You should disable allocation before you reboot, that will save a lot of
 shard shuffling - http://www.elasticsearch.org/guide/en/elasticsearch/
 reference/current/setup-upgrade.html#rolling-upgrades

 On 21 November 2014 13:48, Konstantin Erman kon...@gmail.com wrote:

 I work on an experimental cluster of ES nodes running on Windows Server
 machines. Once in a while we have a need to reboot machines. The initial
 state - cluster is green and well balanced. One machine is gracefully taken
 offline and then after necessary service is performed it comes back online.
 All the hardware and file system content is intact. As soon as ES service
 starts on that machine, it assumes that there is no usable data locally and
 recovers as much data as it deems necessary for balancing from other nodes.

 This behavior puzzles me, because most of the data shards stored on that
 machine file system can be reused as they are. Cluster stores logs, so all
 indices except those for the current day never ever change until they get
 deleted. Can't ES node detect that it has perfect copies of some (actually
 most) of the shards and instead of copying them over just mark them as up
 to date?

 I suspect I don't know about some step to enable this behavior and I'm
 looking to enable it. Any advice?

 Thank you!
 Konstantin



Re: Why ES node starts recovering all the data from other nodes after reboot?

2014-11-20 Thread Nikolas Everett
The thing is that this is a disk-level operation. It pretty much rsyncs the
files from the current primary shard to the node when it comes back online.
This would be OK if the replica shards matched the primary, but that is only
normally the case if the shard was moved to the node after it was mostly
complete and you've had only a few writes since. Normally shards don't match
each other, because the way the index is maintained is nondeterministic.

The translog replay is only used as a catch-up after the rsync-like step.

This is something that is being worked on. It's certainly my biggest
complaint about elasticsearch, but I'm confident that it'll get better.

Nik
On Nov 20, 2014 11:11 PM, Mark Walkom markwal...@gmail.com wrote:

 It will enter recovery where it syncs at the segment level from the
 current primary, then the translog gets shipped over and (re)played, which
 brings it all up to date.

 On 21 November 2014 14:51, Yves Dorfsman y...@zioup.com wrote:


 If you do disable allocation before you reboot a node and a client writes
 to a shard that had a replica on that node, does the entire replica gets
 copied when the node come up? Or does it get just updated?

 On Thursday, 20 November 2014 19:52:26 UTC-7, Mark Walkom wrote:

 You should disable allocation before you reboot, that will save a lot of
 shard shuffling - http://www.elasticsearch.org/guide/en/elasticsearch/
 reference/current/setup-upgrade.html#rolling-upgrades

 On 21 November 2014 13:48, Konstantin Erman kon...@gmail.com wrote:

 I work on an experimental cluster of ES nodes running on Windows Server
 machines. Once in a while we have a need to reboot machines. The initial
 state - cluster is green and well balanced. One machine is gracefully taken
 offline and then after necessary service is performed it comes back online.
 All the hardware and file system content is intact. As soon as ES service
 starts on that machine, it assumes that there is no usable data locally and
 recovers as much data as it deems necessary for balancing from other nodes.

 This behavior puzzles me, because most of the data shards stored on
 that machine file system can be reused as they are. Cluster stores logs, so
 all indices except those for the current day never ever change until they
 get deleted. Can't ES node detect that it has perfect copies of some
 (actually most) of the shards and instead of copying them over just mark
 them as up to date?

 I suspect I don't know about some step to enable this behavior and I'm
 looking to enable it. Any advice?

 Thank you!
 Konstantin



understanding terms syntax

2014-11-20 Thread GX
Hi All

I'm running into the following scenario (elasticsearch 1.0):
the query

"query": {
    "term": {
        "ac": "3A822F3B-3ECF-4463-98F86DF6DE28EC5C"
    }
}

yields no results

but this works

"query": {
    "query_string": {
        "default_field": "ac",
        "query": "3A822F3B-3ECF-4463-98F86DF6DE28EC5C"
    }
}

The problem is that when I combine it with a must_not or not filter I still 
get the same results.

What is the correct syntax I need?

Thanks

GX
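
A likely cause, assuming ac is an analyzed string field: a term query is not
analyzed, so the raw mixed-case value never matches the lowercased tokens in
the index, while query_string is analyzed and therefore does. A match query
behaves like the latter, and works the same inside a must_not (alternatively,
map ac as not_analyzed and keep using term):

"query": {
    "match": {
        "ac": "3A822F3B-3ECF-4463-98F86DF6DE28EC5C"
    }
}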
