When searching for 'Boss' with fuzziness, 'Bose' gets a higher score than 'Boss'. How come?

2015-01-15 Thread Eylon Steiner
Any ideas?


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/52e09e54-90b6-4014-8454-34e3db5756e5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Just initialize shards when problems but no rebalance

2015-01-15 Thread Matías Waisgold
Great, thank you. We are creating another cluster with more disk space to
avoid these situations.
By any chance, do you have the link to the issue?

2015-01-15 13:26 GMT-03:00 Kimbro Staken ksta...@kstaken.com:

 I've experienced what you're describing. I called it a shard relocation
 storm and it's really tough to get under control. I opened a ticket on the
 issue and a fix was supposedly included in 1.4.2. What version are you
 running?

 If you want to truly manually manage this situation you could set
 cluster.routing.allocation.disk.threshold_enabled to false but that will
 likely cause other issues. I ended up just setting
 cluster.routing.allocation.disk.watermark.high to a really low value and
 actively managed shard allocations to prevent nodes from getting anywhere
 near that value. This is tricky as the way ES allocates shards it can
 easily run nodes out of disk if you're regularly creating new indices and
 those grow rapidly.

 Kimbro
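
 As a rough illustration of the settings mentioned above, the sketch below
 builds a request body for the cluster update settings API
 (PUT /_cluster/settings). The helper name and the "50%" value are
 assumptions made for illustration, not values from this thread, and the
 code only prints JSON; it does not contact a cluster.

```python
import json


def disk_watermark_settings(high_watermark, threshold_enabled=True):
    """Build a body for PUT /_cluster/settings (hypothetical helper).

    Only setting names that appear in the thread are used. The values
    are whatever the caller passes in, not recommendations.
    """
    return {
        "transient": {
            "cluster.routing.allocation.disk.threshold_enabled": threshold_enabled,
            "cluster.routing.allocation.disk.watermark.high": high_watermark,
        }
    }


if __name__ == "__main__":
    # "50%" is an arbitrary example value.
    print(json.dumps(disk_watermark_settings("50%"), indent=2, sort_keys=True))
```

 Using "transient" means the values would not survive a full cluster
 restart, which seems the safer choice while experimenting.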

 On Thu, Jan 15, 2015 at 6:14 AM, Matías Waisgold mwaisg...@gmail.com
 wrote:

 Yes, I've seen that, but the problem is that when the threshold is reached
 it removes all shards from the server instead of removing just one and
 rebalancing. When that happens, the cluster starts moving shards around
 everywhere and never stops.

 Another problem we are having is that on disk we see data from shards that
 are not assigned to that node, so it can't allocate anything in this dirty
 state.

 2015-01-15 0:09 GMT-03:00 Mark Walkom markwal...@gmail.com:

 You could do this, but it's a lot of manual overhead to have to deal
 with.
 However ES does have some disk space awareness during allocation, take a
 look at
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/index-modules-allocation.html#disk

 On 15 January 2015 at 10:57, Matías Waisgold mwaisg...@gmail.com
 wrote:

 Hi, is there any setting in ES that makes it automatically assign shards
 that are unassigned but never rebalance the cluster? I've found several
 issues when rebalancing and prefer to do it manually.
 If I set cluster.routing.allocation.enable to none, nothing happens.
 If I set it to all, it starts rebalancing.

 Is it OK to combine cluster.routing.allocation.allow_rebalance set to
 none with cluster.routing.allocation.enable set to all?

 The issue arises mainly because we are running low on disk. When that
 happens, Elasticsearch removes all shards from an instance, ignores
 cluster.routing.allocation.cluster_concurrent_rebalance, and starts
 moving shards around the entire cluster, filling the storage on other
 instances so that it never stops balancing.

 Kind regards


Help creating a near real time streaming plugin to perform replication between clusters

2015-01-15 Thread Todd Nine
Hey all,
  I would like to create a plugin, and I need a hand.  Below are the 
requirements I have.


   - Our documents are immutable.  They are only ever created or deleted, 
   updates do not apply.
   - We want mirrors of our ES cluster in multiple AWS regions.  This way 
   if the WAN between regions is severed for any reason, we do not suffer an 
   outage, just a delay in consistency.
   - As documents are added or removed they are rolled up then shipped in 
   batch to the other AWS regions.  This can be as fast as a few milliseconds, 
   or as slow as minutes, and will be user configurable.  Note that a full 
   backup+load is too slow, this is more of a near realtime operation.
   - This will sync the following operations:
      - Index creation/deletion
      - Alias creation/deletion
      - Document creation/deletion
   

What I'm thinking architecturally.


   - The plugin is installed on each node in our cluster in all regions
   - The plugin will only gather changes for the primary shards on the 
   local node 
   - After the timeout elapses, the plugin will ship the changelog to the 
   other AWS regions, where the plugin will receive it and process it


Are there any APIs I can look at that are a good starting point for 
developing this?  I'd like to do a simple prototype with two single-node 
clusters reasonably soon.  I found several plugin tutorials, but I'm more 
concerned with what part of the ES API I can call to receive events, if any.

Thanks,
Todd 



Re: Filter index to last 24h (REST)

2015-01-15 Thread David Pilato
A range filter on a date field with something like from now/d-1 to now/d+1 
might work I think.
If you don’t have a date field (could be a _timestamp field if you activated 
it), I’m afraid you can’t do that.
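
A minimal sketch of such a range filter, assuming a date field named 
created_at (a hypothetical name) and the 1.x-era "filtered" query; later 
Elasticsearch versions express the same thing with a bool query and a filter 
clause. It uses a rolling "now-24h" window rather than day rounding, and only 
builds the request body that would be sent to the count or search endpoint.

```python
import json


def last_24h_body(date_field):
    """Return a query body matching documents whose date_field falls in
    the last 24 hours. The field name is supplied by the caller."""
    return {
        "query": {
            "filtered": {
                "filter": {
                    # "now-24h" is Elasticsearch date math: current time minus 24 hours.
                    "range": {date_field: {"gte": "now-24h"}}
                }
            }
        }
    }


if __name__ == "__main__":
    # created_at is an assumed field name, not one taken from the thread.
    print(json.dumps(last_24h_body("created_at"), indent=2))
```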

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet https://twitter.com/dadoonet | @elasticsearchfr 
https://twitter.com/elasticsearchfr | @scrutmydocs 
https://twitter.com/scrutmydocs



 On 15 Jan 2015 at 18:15, Matthew acernu...@gmail.com wrote:
 
 Hi all,
 
 Is there any way to only load the last 24 hours of indices? I am trying to 
 apply a query to only show the number of documents created over the last 24 
 hours (over the REST API), but I have not had too much luck.
 
 Thanks!
 


Filter index to last 24h (REST)

2015-01-15 Thread Matthew
Hi all,

Is there any way to only load the last 24 hours of indices? I am trying to 
apply a query to only show the number of documents created over the last 24 
hours (over the REST API), but I have not had too much luck.

Thanks!



Re: Aggregation - Blank and date aggregation

2015-01-15 Thread Adrien Grand
Then it means that you want to use a date_histogram aggregation with
interval=day. See
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-datehistogram-aggregation.html
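
As a sketch of what that could look like for the createdDateTime field 
mentioned below: the aggregation name docs_per_day is an assumption, and 
"size": 0 is only there to suppress the hits so that just the buckets come 
back. The code builds the search body and does not talk to a cluster.

```python
import json


def per_day_histogram(date_field, agg_name="docs_per_day"):
    """Build a search body that buckets documents by calendar day."""
    return {
        "size": 0,  # return aggregation buckets only, no hits
        "aggs": {
            agg_name: {
                "date_histogram": {"field": date_field, "interval": "day"}
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(per_day_histogram("createdDateTime"), indent=2))
```

Each bucket in the response would then carry a day and a doc_count, which is 
the "different dates and their counts" asked about in the quoted message.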

On Thu, Jan 15, 2015 at 4:43 PM, buddarapu nagaraju budda08n...@gmail.com
wrote:

 Hey Adrien, thank you. I have one more question, about aggregating on dates.

 We store a date-time in a field called createdDateTime, but I need
 aggregates only on the date part of it.

 Any ideas? Or is there sample code that could help us?

 Regards
 Nagaraju
 908 517 6981

 On Wed, Jan 14, 2015 at 6:10 AM, Adrien Grand 
 adrien.gr...@elasticsearch.com wrote:



 On Wed, Jan 14, 2015 at 10:37 AM, buddarapu nagaraju 
 budda08n...@gmail.com wrote:

 Does a terms aggregation count blank field values?


 Yes, an empty value "" counts as a term. Note that you need the field to
 be not analyzed for it to work (or to use an analyzer that emits empty
 strings). Otherwise the standard analyzer would analyze "" as an empty
 list of tokens, so a field value of "" would not actually count...


 Is a terms aggregation enough for doing date aggregation, or are there
 specific aggregations for that? All I need from the date aggregation is
 the distinct dates and their counts.


 A terms aggregation is enough, but a date_histogram aggregation is
 generally more useful on dates as there are lots of unique values and it's
 often more useful to group them based on the year, month or day.

 --
 Adrien Grand





-- 
Adrien Grand



Re: How to find all docs where field_a === val1 and field_b === val2?

2015-01-15 Thread David Reagan
Thanks! I was thinking a bool query was something specific to fields with 
boolean values. Which is why I didn't understand the bool query example in 
the docs. Your posts helped me get what I wanted. :)
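
For anyone with the same confusion: "bool" here means a boolean combination 
of clauses (must, should, must_not), not a boolean field type. A small sketch 
that builds such a query from a mapping of field names to required values; 
the helper name is made up for illustration.

```python
import json


def all_must_match(conditions):
    """Build a bool query in which every field/value pair must match.

    conditions is a dict such as {"field_a": "val1", "field_b": "val2"}.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {field: value}}
                    for field, value in conditions.items()
                ]
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(all_must_match({"field_a": "val1", "field_b": "val2"}), indent=2))
```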

On Wednesday, January 14, 2015 at 3:34:05 PM UTC-8, Brian wrote:

 By the way, David, the full query follows:

 {
   "from" : 0,
   "size" : 20,
   "timeout" : 6,
   "query" : {
     "bool" : {
       "must" : [ {
         "match" : {
           "field_a" : {
             "query" : "val1",
             "type" : "boolean"
           }
         }
       }, {
         "match" : {
           "field_b" : {
             "query" : "val2",
             "type" : "boolean"
           }
         }
       } ]
     }
   },
   "version" : true,
   "explain" : false,
   "fields" : [ "_ttl", "_source" ]
 }

 Also note that since the _ttl field is being requested (always), then the 
 _source must also be asked for explicitly. If you don't ask for any fields, 
 _source is returned by default. But if you ask for one or more fields 
 explicitly, then you must also ask for _source or it won't be returned.

 Brian

 On Wednesday, January 14, 2015 at 6:31:29 PM UTC-5, Brian wrote:

 David,

 This is what I use. I hope it helps.

 {
   "bool" : {
     "must" : [ {
       "match" : {
         "field_a" : {
           "query" : "val1",
           "type" : "boolean"
         }
       }
     }, {
       "match" : {
         "field_b" : {
           "query" : "val2",
           "type" : "boolean"
         }
       }
     } ]
   }
 }

 Brian





How can I add additional parameters in aggregation?

2015-01-15 Thread Eylon Steiner
I have documents with id, name, and title fields.
I am aggregating by name, but how can I also get the name and title in 
the results?



Re: Is ElasticSearch truly scalable for analytics?

2015-01-15 Thread Mark Harwood
 Regarding the accuracy of top-k lists

This is perhaps an over-simplification -  we deal with far more complex 
scenarios than a simple, single top-K list - we have whole aggregation 
trees with multiple layers of aggs: geo, time, nested, parent/child, 
percentiles, cardinalities etc etc which can embed multiple top K terms 
aggs, or be contained by one. Today all aggs work in one pass over local 
data to produce a merge-able summary output - if you introduce the idea of 
pausing all of this local computation mid-stream and then resuming it once 
you've centrally determined what top K is across a cluster and for 
various points in the agg tree then coordinating all of these updates gets 
impossibly complex.

I acknowledge it is a highly specialised use-case which not very many 
people run into, but it is a case I'm currently working on.

To be fair multi-level merging is a capability which might also apply to 
analytics in federated architectures where proxy servers might act as the 
front to nodes in remote clusters.

I was thinking to reduce the complete set of buckets locally

I'm unclear on your approach to the reduce:
1) Take the summary outputs of multiple agg pipelines computed in parallel 
and merge them in the same way coordinating nodes do or
2) Take the raw inputs (doc streams) from all shards held on a node and 
feed them through a single aggregation pipeline to get one combined output

The problems being 1) loses accuracy and 2) loses any parallelism because 
agg pipelines are single threaded and must process doc streams serially.
Because you claimed accuracy would be better I guess you mean option 2?
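
To make the trade-off concrete, here is a toy model of the two options. It 
is not how Elasticsearch implements aggregations; it only contrasts merging 
per-shard top-k summaries (option 1) with counting over all the raw data 
(option 2). The shard contents are invented for the example.

```python
from collections import Counter


def merged_top_k(shards, k):
    """Option 1: each shard reports only its local top k, then the
    summaries are summed. Terms outside a shard's local top k are lost."""
    merged = Counter()
    for shard in shards:
        merged.update(dict(Counter(shard).most_common(k)))
    return merged.most_common(k)


def exact_top_k(shards, k):
    """Option 2: sum the full term counts from every shard first, then
    take the top k of the combined counts."""
    total = Counter()
    for shard in shards:
        total.update(shard)
    return total.most_common(k)


if __name__ == "__main__":
    # Term "b" is second on both shards but first overall.
    shards = [{"a": 10, "b": 9}, {"c": 11, "b": 9}]
    print("merged summaries:", merged_top_k(shards, 1))
    print("exact:", exact_top_k(shards, 1))
```

In this example the merged summaries never see "b" at all, because it is 
not the top term on either shard, while the exact count ranks it first.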





Re: Questions about scaling elasticsearch with regard to the number of documents indexed per second

2015-01-15 Thread Chinch Pokli
Awesome! Great to know that. So as a conclusion the steps will be:
1) Stream tweets from twitter
2) Use the bulk API to make batches of 1000 (or more) tweets
3) Once the batch size is reached, spawn a new thread which will index the 
data into ES, meanwhile my original thread will continue streaming tweets

Do these steps sound alright to you or did I miss something?
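
A rough sketch of these steps, with one deliberate change: instead of 
spawning a new thread per batch, it uses a single long-lived worker fed 
through a bounded queue, so a slow cluster slows the reader down rather than 
piling up threads. The send callable, the index name "tweets" and the type 
name "tweet" are assumptions; in real code send would POST the body to the 
bulk endpoint.

```python
import json
import queue
import threading


def bulk_body(docs, index, doc_type):
    """Serialize docs as the newline-delimited body used by the bulk API:
    one action line followed by one source line per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


def batches(stream, size):
    """Group an iterable into lists of at most `size` items."""
    batch = []
    for item in stream:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch


def run(stream, send, batch_size=1000):
    """Read the stream on the calling thread; send batches on a worker."""
    pending = queue.Queue(maxsize=4)  # bounded: put() blocks when the worker lags

    def worker():
        while True:
            batch = pending.get()
            if batch is None:  # sentinel: no more batches
                break
            send(bulk_body(batch, "tweets", "tweet"))

    thread = threading.Thread(target=worker)
    thread.start()
    for batch in batches(stream, batch_size):
        pending.put(batch)
    pending.put(None)
    thread.join()


if __name__ == "__main__":
    fake_tweets = ({"id": i, "text": "hello"} for i in range(5))
    run(fake_tweets, send=print, batch_size=2)
```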

On Thursday, January 15, 2015 at 7:58:19 PM UTC+5:30, David Pilato wrote:

 I can index on my laptop 1-12000 docs per second. SSD drives of course.

 --
 David ;-)
 Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

 On 15 Jan 2015 at 13:43, Chinch Pokli cpo...@gmail.com wrote:

 No, the whole point was: will Elasticsearch be able to index, say, 
 10,000 documents per second? If yes, I can simply hook up my Twitter code 
 to ES. If not, I would need to think of how to make that happen.
 Typically I've seen ES index only around 30 docs per second, which is 
 pretty low.

 I am hoping Redis/ Kafka/ Logstash/ etc. might help elasticsearch to get 
 some breathing room and enable it to index up to 10K docs per second.

 On Thursday, January 15, 2015 at 10:47:31 AM UTC+5:30, David Pilato wrote:

 You have a Twitter input so you can extract content from Twitter and send 
 to elasticsearch. No need to have Redis here. 

 --
 David ;-)
 Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

 On 15 Jan 2015 at 00:02, Chinch Pokli cpo...@gmail.com wrote:

 Thanks. I'll have a look at the raw option.
 Regarding Logstash, I don't fully understand its utility. It says that 
 it can take messages from a Redis server. But if I have to set up Redis, I 
 could simply use the Redis river to index into Elasticsearch. Is there any 
 additional benefit that Logstash would give me?

 On Thursday, January 15, 2015 at 4:06:12 AM UTC+5:30, David Pilato wrote:

 You should look at raw option or better look at Logstash.

 My 2 cents.

 David

 Le 14 janv. 2015 à 23:29, Chinch Pokli cpo...@gmail.com a écrit :

 Hi,

 I am using elasticsearch to index twitter stream. Until recently I was 
 using the official river which was working great but realized that it 
 throwing out much of the data (e.g. it is not storing number of followers 
 etc. data).

 Is there a way to make the river to store all the data? If not, I am 
 fine with writing a streaming code which will stream and index. But have a 
 concern. How many documents can elasticsearch index per second? I might 
 eventually need to index almost 10,000 documents (each document = 2 KB) per 
 second (current requirement is of 100 documents per second). Is this even 
 feasible? If yes, do I need to make any special modifications?

 Thanks-in-advance!!



Re: Is ElasticSearch truly scalable for analytics?

2015-01-15 Thread AlexR
I would also be very interested in node-level reduction of shard results, not 
for scalability but for precision. I would like to have an option for a 
node to do complete aggregations on its shards so the results are exact rather 
than approximate. There are many use cases where the corpus of data is small 
enough to fit on one powerful node and exactness is a MUST. With 48-core servers 
and SSD drives such a node can process a good deal of data and produce exact 
results, which is a must for traditional datamart-like apps. Having this option 
would allow this class of apps to be built. And in a multinode setup it would 
provide better precision too.



Re: Questions about scaling elasticsearch with regard to the number of documents indexed per second

2015-01-15 Thread David Pilato
Sounds good.
If you are using Java, you could also look at the river code.
Note that you should use the BulkProcessor class, which is super handy.

BTW I said 1/s but not for tweets. I have fewer fields (20) than Twitter 
(100).
With more fields, I guess it would take more time. Though with better machines, 
it could work. I'd say that you need to test on the production cluster.

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

 On 15 Jan 2015 at 15:40, Chinch Pokli cpo...@gmail.com wrote:
 
 Awesome! Great to know that. So as a conclusion the steps will be:
 1) Stream tweets from twitter
 2) Use the bulk API to make batches of 1000 (or more) tweets
 3) Once the batch size is reached, spawn a new thread which will index the 
 data into ES, meanwhile my original thread will continue streaming tweets
 
 Do these steps sound alright to you or did I miss something?
 
 On Thursday, January 15, 2015 at 7:58:19 PM UTC+5:30, David Pilato wrote:
 I can index on my laptop 1-12000 docs per second. SSD drives of course.
 
 --
 David ;-)
 Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
 
 On 15 Jan 2015 at 13:43, Chinch Pokli cpo...@gmail.com wrote:
 
 No, the whole point was: will Elasticsearch be able to index, say, 
 10,000 documents per second? If yes, I can simply hook up my Twitter code 
 to ES. If not, I would need to think of how to make that happen.
 Typically I've seen ES index only around 30 docs per second, which is 
 pretty low.
 
 I am hoping Redis/ Kafka/ Logstash/ etc. might help elasticsearch to get 
 some breathing room and enable it to index up to 10K docs per second.
 
 On Thursday, January 15, 2015 at 10:47:31 AM UTC+5:30, David Pilato wrote:
 You have a Twitter input so you can extract content from Twitter and send 
 to elasticsearch. No need to have Redis here. 
 
 --
 David ;-)
 Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
 
 On 15 Jan 2015 at 00:02, Chinch Pokli cpo...@gmail.com wrote:
 
 Thanks. I'll have a look at the raw option.
 Regarding Logstash, I don't fully understand its utility. It says that 
 it can take messages from a Redis server. But if I have to set up Redis, 
 I could simply use the Redis river to index into Elasticsearch. Is there 
 any additional benefit that Logstash would give me?
 
 On Thursday, January 15, 2015 at 4:06:12 AM UTC+5:30, David Pilato wrote:
 You should look at raw option or better look at Logstash.
 
 My 2 cents.
 
 David
 
 Le 14 janv. 2015 à 23:29, Chinch Pokli cpo...@gmail.com a écrit :
 
 Hi,
 
 I am using elasticsearch to index twitter stream. Until recently I was 
 using the official river which was working great but realized that it 
 throwing out much of the data (e.g. it is not storing number of 
 followers etc. data).
 
 Is there a way to make the river to store all the data? If not, I am 
 fine with writing a streaming code which will stream and index. But 
 have a concern. How many documents can elasticsearch index per second? 
 I might eventually need to index almost 10,000 documents (each document 
 = 2 KB) per second (current requirement is of 100 documents per 
 second). Is this even feasible? If yes, do I need to make any special 
 modifications?
 
 Thanks-in-advance!!


Re: need help for search

2015-01-15 Thread David Pilato
No worries about your English.
Sorry, I missed your gist.

Based on your examples, it sounds like you are French. Are you aware of the 
French mailing list? 
https://groups.google.com/forum/?hl=fr&fromgroups#!forum/elasticsearch-fr

It would help a lot if you could simplify, with some sample data and small 
queries, what you are trying to do and what does not work.
So remove all analyzers, as I guess they are not really your concern at this 
stage.
Try with only two or three fields.


-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet https://twitter.com/dadoonet | @elasticsearchfr 
https://twitter.com/elasticsearchfr | @scrutmydocs 
https://twitter.com/scrutmydocs



 On 15 Jan 2015 at 17:13, Thibaut Owczarz thib...@1001pharmacies.com wrote:
 
 Hi,
 
 With the structure I sent in my gist,
 my question is just this:
 
 I have a single search field, and nothing tells me what kind of value is 
 typed into it, so I need one request like this:
 {
   "query": {
     "bool": {
       "must": [],
       "must_not": [],
       "should": [
         { "term": { "sku": "$datasearch" } },
         { "term": { "internal_code": "$datasearch" } },
         { "match": { "firstname": "$datasearch" } },
         { "match": { "lastname": "$datasearch" } },
         { "match": { "address": "$datasearch" } },
         { "match": { "city": "$datasearch" } },
         { "match": { "localized_description": "$datasearch" } },
         { "match": { "localized_keywords": "$datasearch" } },
         { "match": { "service.localized_label": "$datasearch" } },
         { "match": { "medias.localized_label": "$datasearch" } },
         { "match": { "services.localized_label": "$datasearch" } }
       ]
     }
   }
 }
 
 Example:
 - if $datasearch is a sku, I get exactly the one user with this sku
 - if $datasearch is a firstname, I get the list of users who have this 
 firstname
 - if $datasearch is a keyword, I get the list of users who have this keyword
 
 - I use term for sku and internal_code because partial values must not 
 match (if my sku = 1234, typing 123 should return no result)
 
 - And finally, in my data I have these users: 
 [1 - charles martin, who has localized_keywords=moto, licorne, cheval, 
 course] 
 [2 - henry martin, who has localized_keywords=pétanque, chevaux, basket, 
 parieur]
 With my request I want to get both users if $datasearch = cheval.
 
 I hope this is understandable; my English may be bad.
 
 Thanks
 

Re: need help for search

2015-01-15 Thread David Pilato
Could you reproduce this with a full test case so we can understand exactly what 
you are doing?
Maybe simplify your test.

See elasticsearch.org/help


--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

 On 15 Jan 2015, at 16:01, Thibaut Owczarz thib...@1001pharmacies.com wrote:
 
 i'm ok, but my data search no say if is sku or code_internal or other field.
 
 if i do that, it's ok
 {
   query: {
 bool: {
   must: [
 {
   term: {
 sku: 01b3ae496c0142f993cf131c607fe003
   }
 }
   ],
   must_not: [],
   should: [
   {
 term: {
internal_code: 01b3ae496c0142f993cf131c607fe003
 }
  },
 
 {
   match: {
 firstname: 01b3ae496c0142f993cf131c607fe003
   }
 },
 {
   match: {
 lastname: 01b3ae496c0142f993cf131c607fe003
   }
 },
 {
   match: {
 address: 01b3ae496c0142f993cf131c607fe003
   }
 },
 {
   match: {
 city: 01b3ae496c0142f993cf131c607fe003
   }
 },
 {
   match: {
 localized_description: 01b3ae496c0142f993cf131c607fe003
   }
 },
 {
   match: {
 localized_keywords: 01b3ae496c0142f993cf131c607fe003
   }
 },
 {
   match: {
 service.localized_label: 01b3ae496c0142f993cf131c607fe003
   }
 },
 {
   match: {
 medias.localized_label: 01b3ae496c0142f993cf131c607fe003
   }
 },
 {
   match: {
 services.localized_label: 01b3ae496c0142f993cf131c607fe003
   }
 }
   ]
 }
   }
 }
 
 but if now i search with internal_code
 {
   query: {
 bool: {
   must: [
 {
   term: {
 sku: 3401598272746
   }
 }
   ],
   must_not: [],
   should: [
   {
 term: {
internal_code: 3401598272746
 }
  },
 
 {
   match: {
 firstname: 3401598272746
   }
 },
 {
   match: {
 lastname: 3401598272746
   }
 },
 {
   match: {
 address: 3401598272746
   }
 },
 {
   match: {
 city: 3401598272746
   }
 },
 {
   match: {
 localized_description: 3401598272746
   }
 },
 {
   match: {
 localized_keywords: 3401598272746
   }
 },
 {
   match: {
 service.localized_label: 3401598272746
   }
 },
 {
   match: {
 medias.localized_label: 3401598272746
   }
 },
 {
   match: {
 services.localized_label: 3401598272746
   }
 }
   ]
 }
   }
 }
 my request is bad
 
 

Re: Aggregation - Blank and date aggregation

2015-01-15 Thread buddarapu nagaraju
Hey Adrien, thank you. I have one more question, about aggregating on dates.

We store the date and time in a field called createdDateTime, but I need to
aggregate on only the date part of it.

Any ideas? Or sample code that could help us?

Regards
Nagaraju
908 517 6981

On Wed, Jan 14, 2015 at 6:10 AM, Adrien Grand 
adrien.gr...@elasticsearch.com wrote:



 On Wed, Jan 14, 2015 at 10:37 AM, buddarapu nagaraju 
 budda08n...@gmail.com wrote:

 Does the terms aggregation count blank field values?


 Yes, an empty value "" counts as a term. Note that you need the field to
 be not analyzed for it to work (or to use an analyzer that emits empty
 strings). Otherwise the standard analyzer would analyze "" as an empty
 list of tokens, so a field value of "" would not actually count...
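
A minimal sketch of the point above, assuming an Elasticsearch 1.x string field. The type name `doc` and the field name `category` are hypothetical; the helpers only build the request bodies and do not talk to a cluster.

```python
import json


def not_analyzed_mapping(doc_type, field):
    """Build a 1.x-style mapping in which `field` is indexed as one unanalyzed term."""
    return {
        doc_type: {
            "properties": {
                # "not_analyzed" keeps the raw value, so "" stays a term of its own
                field: {"type": "string", "index": "not_analyzed"}
            }
        }
    }


def terms_agg(field):
    """Build a search body that returns only the per-term counts for `field`."""
    return {"size": 0, "aggs": {"by_value": {"terms": {"field": field}}}}


# "doc" and "category" are placeholder names for illustration only.
print(json.dumps(not_analyzed_mapping("doc", "category"), indent=2))
print(json.dumps(terms_agg("category"), indent=2))
```

With such a mapping the empty string should show up as its own bucket in the terms aggregation; with an analyzed field it would produce no token at all.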


 Is a terms aggregation enough for aggregating on dates, or are there
 specific aggregations for that? All I need from the date aggregation is the
 distinct dates and their counts.


 A terms aggregation is enough, but a date_histogram aggregation is
 generally more useful on dates as there are lots of unique values and it's
 often more useful to group them based on the year, month or day.
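
For the follow-up question about counting only the date part of createdDateTime, a date_histogram with a one-day interval is one way to do it. The sketch below is an assumption-laden illustration: the aggregation name `per_day` is made up, `interval` is the 1.x parameter name (later releases use `calendar_interval`), and buckets are computed in UTC unless a time zone is set. The second helper just reproduces the bucketing locally so the expected shape of the result is clear.

```python
from collections import Counter
from datetime import datetime


def daily_histogram(field="createdDateTime"):
    """Build a search body that counts documents per calendar day of `field`."""
    return {
        "size": 0,  # only the buckets are needed, not the hits
        "aggs": {
            "per_day": {
                "date_histogram": {
                    "field": field,
                    "interval": "day",       # 1.x parameter name
                    "format": "yyyy-MM-dd",  # bucket keys rendered as dates only
                }
            }
        },
    }


def count_by_day(timestamps):
    """Local illustration: group ISO date-times by their date part and count them."""
    days = (
        datetime.strptime(t, "%Y-%m-%dT%H:%M:%S").date().isoformat()
        for t in timestamps
    )
    return Counter(days)


sample = ["2015-01-14T10:00:00", "2015-01-14T23:59:59", "2015-01-15T00:00:01"]
print(count_by_day(sample))
```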

 --
 Adrien Grand



-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAFtuXXKp0JycJfNvLxPGN_5YL7P-X%3DGDzvmYJQ9NFN7Q%2BaJjQw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: When searching for 'Boss' with fuzziness, get higher score for 'Bose' than 'Boss'. ???? How Comes !?!?

2015-01-15 Thread Adrien Grand
This is because the score takes two factors into account: the document
frequency and the edit distance. Quite likely in your case, even though
Boss is closer than Bose, Bose has a much lower document frequency which
helped it eventually get a better score. I guess we should have another
rewrite method that would not take freqs into account (or somehow merge
them) to avoid that issue.
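
To make the frequency effect concrete, here is a small sketch. It assumes Lucene's classic TF-IDF similarity (the default in Elasticsearch 1.x), whose inverse document frequency is 1 + ln(numDocs / (docFreq + 1)); the document counts are invented. The second helper shows a common workaround, not a fix: pairing the fuzzy match with an exact match so that exact hits earn extra score. The field name `name` and the boost value are assumptions.

```python
import math


def classic_idf(doc_freq, num_docs):
    """Inverse document frequency as computed by Lucene's classic similarity."""
    return 1.0 + math.log(num_docs / (doc_freq + 1.0))


def fuzzy_with_exact_boost(field, text, exact_boost=2.0):
    """Build a bool query where an exact match adds score on top of the fuzzy match."""
    return {
        "query": {
            "bool": {
                "should": [
                    # exact (non-fuzzy) match, boosted so it outranks near misses
                    {"match": {field: {"query": text, "boost": exact_boost}}},
                    # fuzzy match keeps the tolerant behaviour for typos
                    {"match": {field: {"query": text, "fuzziness": "AUTO"}}},
                ]
            }
        }
    }


# Hypothetical counts: "bose" is rare, "boss" is common in a 1000-document index.
print(classic_idf(doc_freq=5, num_docs=1000))
print(classic_idf(doc_freq=500, num_docs=1000))
print(fuzzy_with_exact_boost("name", "Boss"))
```

The rarer term gets the larger IDF, which is how a more distant fuzzy expansion can still outscore the exact term. The workaround does not remove that effect; it only adds score to exact matches.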

On Thu, Jan 15, 2015 at 4:06 PM, Eylon Steiner eylon.stei...@gmail.com
wrote:

 Any ideas?






-- 
Adrien Grand



elasticsearch-hadoop for spark, index documents from a RDD in different index by day: myindex-2014-01-01 for example

2015-01-15 Thread Julien Naour
Hi,

I work on a complex workflow using Spark (Parsing, Cleaning, Machine 
Learning).
At the end of the workflow I want to send aggregated results to 
elasticsearch so my portal could query data.
There will be two types of processing: streaming and the possibility to 
relaunch workflow on all available data.

Right now I use elasticsearch-hadoop, and in particular its Spark support, to 
send documents to Elasticsearch with the saveJsonToEs(myindex, mytype) 
method.
The goal is to have one index per day, using the template that we 
built.
AFAIK elasticsearch-hadoop cannot use a field of a document to route it 
to the proper index.

What is the proper way to implement this? 
Should I add a special step using Spark and the bulk API, so that each executor 
sends documents to the proper index based on a field of each line?
Is there something that I missed in elasticsearch-hadoop?
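
The per-executor idea can be sketched as follows. This is plain Python rather than Spark code, and it is only an illustration under assumptions: the field name `timestamp`, its format, and the helper names are hypothetical. It shows how each document's date can pick the target index in a bulk request body, using the bulk format of one action line followed by one source line. The elasticsearch-hadoop documentation also describes field patterns in the write resource (something like myindex-{date}/mytype); whether the version in use supports that is worth checking before building a custom step.

```python
import json
from datetime import datetime


def index_for(doc, prefix="myindex", date_field="timestamp"):
    """Derive the daily index name, e.g. myindex-2014-01-01, from the document."""
    day = datetime.strptime(doc[date_field], "%Y-%m-%dT%H:%M:%S")
    return "{}-{}".format(prefix, day.strftime("%Y-%m-%d"))


def bulk_lines(docs, doc_type="mytype"):
    """Build a bulk body: one action line plus one source line per document."""
    lines = []
    for doc in docs:
        action = {"index": {"_index": index_for(doc), "_type": doc_type}}
        lines.append(json.dumps(action, sort_keys=True))
        lines.append(json.dumps(doc, sort_keys=True))
    # the bulk API expects the body to end with a newline
    return "\n".join(lines) + "\n"


docs = [
    {"timestamp": "2014-01-01T12:00:00", "value": 1},
    {"timestamp": "2014-01-02T08:30:00", "value": 2},
]
print(bulk_lines(docs))
```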

Julien



Re: need help for search

2015-01-15 Thread Thibaut Owczarz
Hi,

Using the structure I sent in my gist,
my question is just this:

I have a single search field, and nothing tells me what kind of value is typed into it.
So I need one request like this:
{
  "query": {
    "bool": {
      "must": [],
      "must_not": [],
      "should": [
        { "term":  { "sku": "$datasearch" } },
        { "term":  { "internal_code": "$datasearch" } },
        { "match": { "firstname": "$datasearch" } },
        { "match": { "lastname": "$datasearch" } },
        { "match": { "address": "$datasearch" } },
        { "match": { "city": "$datasearch" } },
        { "match": { "localized_description": "$datasearch" } },
        { "match": { "localized_keywords": "$datasearch" } },
        { "match": { "service.localized_label": "$datasearch" } },
        { "match": { "medias.localized_label": "$datasearch" } },
        { "match": { "services.localized_label": "$datasearch" } }
      ]
    }
  }
}

Example:
- if $datasearch is a sku, I directly get the one user with this sku
- if $datasearch is a first name, I get the list of users who have this 
first name
- if $datasearch is a keyword, I get the list of users who have this keyword

- I use term for sku and internal_code because they must not match partially 
(if my sku is 1234, no result should be found if I type 123).

- And finally, my data contains these users: 
[1 - charles martin, who has localized_keywords=moto, licorne, cheval, 
course ] 
[2 - henry martin, who has localized_keywords=pétanque, chevaux, basket, 
parieur]
With my request I want to get both of these users if $datasearch = cheval.
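
A minimal sketch of building that single request from one input value, assuming the field names from the query above. The helper name `build_search` is hypothetical. Setting minimum_should_match to 1 states explicitly that at least one clause has to match. Note that finding "chevaux" when searching for "cheval" depends on the field being analyzed with a French stemming analyzer, which is an assumption about the mapping and not something the query can do on its own.

```python
import json

# exact-match fields: no partial or analyzed matching wanted
EXACT_FIELDS = ["sku", "internal_code"]

# full-text fields: analyzed at search time by the match query
TEXT_FIELDS = [
    "firstname",
    "lastname",
    "address",
    "city",
    "localized_description",
    "localized_keywords",
    "service.localized_label",
    "medias.localized_label",
    "services.localized_label",
]


def build_search(datasearch):
    """Build one bool query that tries the same input against every field."""
    should = [{"term": {field: datasearch}} for field in EXACT_FIELDS]
    should += [{"match": {field: datasearch}} for field in TEXT_FIELDS]
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


print(json.dumps(build_search("cheval"), indent=2))
```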

I hope I have made myself understood; my English may be bad.


thanks


On Thursday, 15 January 2015 16:17:08 UTC+1, David Pilato wrote:

 Could you reproduce this with a full test case so we understand exactly 
 What you are doing?
 May be simplify your test.

 See elasticsearch.org/help


 --
 David ;-)
 Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


Re: elasticsearch-hadoop for spark, index documents from a RDD in different index by day: myindex-2014-01-01 for example

2015-01-15 Thread Julien Naour
My previous idea doesn't seem to work: documents cannot be sent directly to 
_bulk, only to an index/type resource.

On Thursday, January 15, 2015 at 4:17:57 PM UTC+1, Julien Naour wrote:

 Hi,

 I work on a complex workflow using Spark (Parsing, Cleaning, Machine 
 Learning).
 At the end of the workflow I want to send aggregated results to 
 elasticsearch so my portal could query data.
 There will be two types of processing: streaming and the possibility to 
 relaunch workflow on all available data.

 Right now I use elasticsearch-hadoop and particularly the spark part to 
 send document to elasticsearch with the saveJsonToEs(myindex, mytype) 
 method.
 The target is to have an index by day using the proper template that we 
 build.
 AFAIK you could not add consideration of a feature in a document to send 
 it to the proper index in elasticsearch-hadoop.

 What is the proper way to implement this feature? 
 Have a special step useing spark and bulk so that each executor send 
 documents to the proper index considering the feature of each line?
 Is there something that I missed in elasticsearch-hadoop?

 Julien




Re: Just initialize shards when problems but no rebalance

2015-01-15 Thread Kimbro Staken
I've experienced what you're describing. I called it a shard relocation
storm and it's really tough to get under control. I opened a ticket on the
issue and a fix was supposedly included in 1.4.2. What version are you
running?

If you want to truly manually manage this situation you could set
cluster.routing.allocation.disk.threshold_enabled to false but that will
likely cause other issues. I ended up just setting
cluster.routing.allocation.disk.watermark.high to a really low value and
actively managed shard allocations to prevent nodes from getting anywhere
near that value. This is tricky as the way ES allocates shards it can
easily run nodes out of disk if you're regularly creating new indices and
those grow rapidly.
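
The two settings mentioned above can be changed at runtime through the cluster settings API (PUT /_cluster/settings). The sketch below only builds the request body; the "70%" watermark is a placeholder and not a recommendation, and the helper name is hypothetical.

```python
import json


def disk_allocation_settings(high_watermark="70%", threshold_enabled=True):
    """Build a transient cluster-settings body for the disk-based allocation decider."""
    return {
        "transient": {
            # False switches the disk decider off entirely (likely to cause other issues)
            "cluster.routing.allocation.disk.threshold_enabled": threshold_enabled,
            # above this disk usage, ES starts relocating shards away from the node
            "cluster.routing.allocation.disk.watermark.high": high_watermark,
        }
    }


# Body to send with: PUT /_cluster/settings
print(json.dumps(disk_allocation_settings(), indent=2))
```

Transient settings are lost on a full cluster restart; using "persistent" as the top-level key instead keeps them.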

Kimbro

On Thu, Jan 15, 2015 at 6:14 AM, Matías Waisgold mwaisg...@gmail.com
wrote:

 Yes, I've seen that but the problem is that when the threshold is reached
 it removes all shards from the server instead of just removing 1 and
 balance. And when that happens the cluster starts to move shards over
 everywhere and it never stops.

 Another problem we are having is that in the file storage we see data from
 shards that are not assigned to itself so it can´t allocate anything in
 this dirty state.

 2015-01-15 0:09 GMT-03:00 Mark Walkom markwal...@gmail.com:

 You could do this, but it's a lot of manual overhead to have to deal with.
 However ES does have some disk space awareness during allocation, take a
 look at
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/index-modules-allocation.html#disk

 On 15 January 2015 at 10:57, Matías Waisgold mwaisg...@gmail.com wrote:

 Hi, is there any setting that makes ES automatically assign shards that
 are unassigned but never, ever rebalance the cluster?
 I've found several issues when rebalancing and prefer to do it manually.
 If I set cluster.routing.allocation.enable to none, nothing happens.
 If I set it to all, it starts rebalancing.

 Is it OK to combine cluster.routing.allocation.allow_rebalance set to none
 with cluster.routing.allocation.enable set to all?

 The issue is mainly that we are running low on disk. When that happens,
 Elasticsearch removes all shards from an instance, ignores
 cluster.routing.allocation.cluster_concurrent_rebalance, and starts
 moving shards like crazy around the entire cluster, filling the storage
 on other instances so that it never stops balancing.

 Kind regards





Re: need help for search

2015-01-15 Thread David Pilato
I guess it's most likely because you put all your filters in a should clause 
instead of a must clause?

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

 On 15 Jan 2015, at 15:36, Thibaut Owczarz thib...@1001pharmacies.com wrote:
 
 I found my first error: there is no need for the user. prefix, because I am already searching in the user type.
 But why, when I search for a specific sku, do I not get only one result?
 
 
 curl -XPOST 'http://localhost:9200/test_fr/user/_search' -d '{
 query : {
 bool: {
 must: [ ],
 must_not: [ ],
 should: [
 {
 term: {
 sku: 01b3ae496c0142f993cf131c607fe003
 }
 },
 {
 term: {
internal_code: 01b3ae496c0142f993cf131c607fe003
 }
 },
 {
 match: {
firstname: 01b3ae496c0142f993cf131c607fe003
 }
 },
 {
 match: {
lastname: 01b3ae496c0142f993cf131c607fe003
 }
 },
 {
 match: {
address: 01b3ae496c0142f993cf131c607fe003
 }
 },
 {
 match: {
city: 01b3ae496c0142f993cf131c607fe003
 }
 },   
 {
 match: {
localized_description: 
 01b3ae496c0142f993cf131c607fe003
 }
 },   
 {
 match: {
localized_keywords: 
 01b3ae496c0142f993cf131c607fe003
 }
 },
 {
 match: {
service.localized_label: 
 01b3ae496c0142f993cf131c607fe003
 }
 },
 {
 match: {
medias.localized_label: 
 01b3ae496c0142f993cf131c607fe003
 }
 },
 {
 match: {
services.localized_label: 
 01b3ae496c0142f993cf131c607fe003
 }
 }
 ]
 }
 }
 }';
 
 It returns all my users.
 
 Thanks
 
 On Thursday, 15 January 2015 14:58:16 UTC+1, Thibaut Owczarz wrote:
 
 Hello,
 I am starting to learn Elasticsearch, and I have trouble understanding how to 
 search. Could anyone help me? 
 
 My gist with my whole structure and my data is here:
 https://gist.github.com/thibaut1001/7a3000c3ff371be3a52d
 
 My problem is only in part 4:
 searching several fields for one value, like this
 ## We need to search henry in field selected
 curl -XPOST 'http://localhost:9200/test_fr/user/_search' -d '{
 query : {
 bool: {
 must: [ ],
 must_not: [ ],
 should: [
 {
 term: {
 user.sku: henry
 }
 },
 {
 term: {
user.internal_code: henry
 }
 },
 {
 term: {
user.firstname: henry
 }
 },
 {
 term: {
user.lastname: henry
 }
 },
 {
 term: {
user.address: henry
 }
 },
 {
 term: {
user.city: henry
 }
 },
 {
 term: {
user.localized_description: henry
 }
 },
 {
 term: {
user.localized_keywords: henry
 }
 },
 {
 term: {
user.service.localized_label: henry
 }
 },
 {
 term: {
user.medias.localized_label: henry
 }
 },
 {
 term: {
user.services.localized_label: henry
 }
 }
 ]
 }
 }
 }';
 ## Returns no results. Why?
 
 I have many questions.
 Could you help me, please?
 Thanks
 

Re: elasticsearch-hadoop for spark, index documents from a RDD in different index by day: myindex-2014-01-01 for example

2015-01-15 Thread Julien Naour
I think I have a solution:
Build the JSON documents so that I can send them directly to _bulk:
saveJsonToEs(_bulk)

I'm not sure whether it will be efficient or even work; I'll try.

On Thursday, January 15, 2015 at 4:17:57 PM UTC+1, Julien Naour wrote:

 Hi,

 I work on a complex workflow using Spark (Parsing, Cleaning, Machine 
 Learning).
 At the end of the workflow I want to send aggregated results to 
 elasticsearch so my portal could query data.
 There will be two types of processing: streaming and the possibility to 
 relaunch workflow on all available data.

 Right now I use elasticsearch-hadoop and particularly the spark part to 
 send document to elasticsearch with the saveJsonToEs(myindex, mytype) 
 method.
 The target is to have an index by day using the proper template that we 
 build.
 AFAIK you could not add consideration of a feature in a document to send 
 it to the proper index in elasticsearch-hadoop.

 What is the proper way to implement this feature? 
 Have a special step useing spark and bulk so that each executor send 
 documents to the proper index considering the feature of each line?
 Is there something that I missed in elasticsearch-hadoop?

 Julien




Re: need help for search

2015-01-15 Thread Thibaut Owczarz
I agree, but my search input does not say whether it is a sku, an internal_code, or some other field.

If I do this, it works:
{
  query: {
bool: {
  must: [
{
  term: {
sku: 01b3ae496c0142f993cf131c607fe003
  }
}
  ],
  must_not: [],
  should: [
  {

term: {
   internal_code: 01b3ae496c0142f993cf131c607fe003
}
 },

{
  match: {
firstname: 01b3ae496c0142f993cf131c607fe003
  }
},
{
  match: {
lastname: 01b3ae496c0142f993cf131c607fe003
  }
},
{
  match: {
address: 01b3ae496c0142f993cf131c607fe003
  }
},
{
  match: {
city: 01b3ae496c0142f993cf131c607fe003
  }
},
{
  match: {
localized_description: 01b3ae496c0142f993cf131c607fe003
  }
},
{
  match: {
localized_keywords: 01b3ae496c0142f993cf131c607fe003
  }
},
{
  match: {
service.localized_label: 01b3ae496c0142f993cf131c607fe003
  }
},
{
  match: {
medias.localized_label: 01b3ae496c0142f993cf131c607fe003
  }
},
{
  match: {
services.localized_label: 01b3ae496c0142f993cf131c607fe003
  }
}
  ]
}
  }
}

But if I now search with an internal_code:
{
  query: {
bool: {
  must: [
{
  term: {
sku: 3401598272746
  }
}
  ],
  must_not: [],
  should: [
  {

term: {
   internal_code: 3401598272746

}
 },

{
  match: {
firstname: 3401598272746
  }
},
{
  match: {
lastname: 3401598272746
  }
},
{
  match: {
address: 3401598272746
  }
},
{
  match: {
city: 3401598272746
  }
},
{
  match: {
localized_description: 3401598272746
  }
},
{
  match: {
localized_keywords: 3401598272746
  }
},
{
  match: {
service.localized_label: 3401598272746
  }
},
{
  match: {
medias.localized_label: 3401598272746
  }
},
{
  match: {
services.localized_label: 3401598272746
  }
}
  ]
}
  }
}
this request does not work.
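
The difference between the two requests can be illustrated with a deliberately simplified model of bool query matching: every must clause has to match, and should clauses are only required (at least one of them) when there is no must clause. The sketch is plain Python, not Elasticsearch code; the sample user combining both values is hypothetical, and scoring as well as the empty-query case are ignored.

```python
def bool_matches(doc, must, should):
    """Simplified bool semantics: all must clauses match; with no must clause,
    at least one should clause has to match."""
    def hit(clause):
        field, value = clause
        return doc.get(field) == value

    if must:
        # should clauses would only influence the score here, not the match
        return all(hit(clause) for clause in must)
    return any(hit(clause) for clause in should)


# Hypothetical user carrying both identifiers from the two requests above.
user = {
    "sku": "01b3ae496c0142f993cf131c607fe003",
    "internal_code": "3401598272746",
}
code = "3401598272746"

# sku in must: an internal_code typed into the search box can never match
print(bool_matches(user, must=[("sku", code)], should=[("internal_code", code)]))  # False

# every field in should: matching any one field is enough
print(bool_matches(user, must=[], should=[("sku", code), ("internal_code", code)]))  # True
```

So with sku in must, only inputs that are a sku can ever return a result, whatever the should clauses contain.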


On Thursday, 15 January 2015 15:49:56 UTC+1, David Pilato wrote:

 I guess it's most likely because you added all your filters in should 
 clause instead of must?

 --
 David ;-)
 Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


Re: need help for search

2015-01-15 Thread Thibaut Owczarz
Thanks for pointing me to the elasticsearch-fr mailing list.

Tomorrow I will prepare a small, simple data set,
and I will post the request that I want to run and the result I need.

Thanks


On Thursday, 15 January 2015 17:31:28 UTC+1, David Pilato wrote:

 No worries for your english.
 Sorry. I missed your gist.

 Based on your examples, it sounds like you are french. Are you aware of 
 the french mailing list? 
 https://groups.google.com/forum/?hl=frfromgroups#!forum/elasticsearch-fr

 It would help a lot if you could show, with some sample data and small 
 queries, what you are trying to do and what does not work.
 Remove all the analyzers for now, as I guess they are not really your concern at 
 this stage.
 Try with only two or three fields.


 -- 
 *David Pilato* | *Technical Advocate* | *Elasticsearch.com 
 http://Elasticsearch.com*
 @dadoonet https://twitter.com/dadoonet | @elasticsearchfr 
 https://twitter.com/elasticsearchfr | @scrutmydocs 
 https://twitter.com/scrutmydocs


  

Re: Just initialize shards when problems but no rebalance

2015-01-15 Thread Kimbro Staken
So is this still happening with 1.4.2?

Here's the ticket. Looks like the fix was supposed to be in 1.4.1

https://github.com/elasticsearch/elasticsearch/issues/8538

On Thu, Jan 15, 2015 at 10:55 AM, Matías Waisgold mwaisg...@gmail.com
wrote:

 Great, thank you. We are creating another cluster with more disk space to
 avoid this situations.
 By any chance do you have the link to the issue?

 2015-01-15 13:26 GMT-03:00 Kimbro Staken ksta...@kstaken.com:

 I've experienced what you're describing. I called it a shard relocation
 storm and it's really tough to get under control. I opened a ticket on the
 issue and a fix was supposedly included in 1.4.2. What version are you
 running?

 If you want to truly manually manage this situation you could set
 cluster.routing.allocation.disk.threshold_enabled to false but that will
 likely cause other issues. I ended up just setting
 cluster.routing.allocation.disk.watermark.high to a really low value and
 actively managed shard allocations to prevent nodes from getting anywhere
 near that value. This is tricky as the way ES allocates shards it can
 easily run nodes out of disk if you're regularly creating new indices and
 those grow rapidly.

 Kimbro

 On Thu, Jan 15, 2015 at 6:14 AM, Matías Waisgold mwaisg...@gmail.com
 wrote:

 Yes, I've seen that, but the problem is that when the threshold is
 reached it removes all shards from the server instead of removing just one
 and rebalancing. When that happens the cluster starts to move shards
 everywhere and it never stops.

 Another problem we are having is that in the file storage we see data
 from shards that are not assigned to that node, so it can't allocate anything
 in this dirty state.

 2015-01-15 0:09 GMT-03:00 Mark Walkom markwal...@gmail.com:

 You could do this, but it's a lot of manual overhead to have to deal
 with.
 However ES does have some disk space awareness during allocation, take
 a look at
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/index-modules-allocation.html#disk

 On 15 January 2015 at 10:57, Matías Waisgold mwaisg...@gmail.com
 wrote:

 Hi, is there any setting I can give ES so that it automatically
 assigns unassigned shards but never rebalances the cluster?
 I've found several issues when rebalancing and prefer to do it
 manually.
 If I set cluster.routing.allocation.enable to none, nothing happens.
 If I set it to all, it starts rebalancing.

 Is it OK to combine cluster.routing.allocation.allow_rebalance set to
 none with cluster.routing.allocation.enable set to all?

 The issue is mainly that we are running low on disk, and when that
 happens Elasticsearch removes all shards from an instance, ignores
 cluster.routing.allocation.cluster_concurrent_rebalance, and
 starts moving shards like crazy around the entire cluster, filling the
 storage on other instances so that it never stops balancing.

 Kind regards








Re: real time match analysis

2015-01-15 Thread Ed Kim
I was able to identify which field matched via explain, but couldn't see 
any information on which token filter was the reason for the match. I've 
tried specifying the analyzer name that the field uses as well as not 
specifying. If the explain is supposed to provide this data, I will give it 
another go and set up a test index with simpler analyzer setups.

Also, in order to do this, I will need to run the explain separately from the 
search itself. My ultimate goal is to be able to do this within 
milliseconds (less than 10). Is this feasible with explain?
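
One way to read the explain output programmatically is sketched below in Python. It assumes each analysis chain is indexed into its own sub-field of the same document (the names title.stemmed and title.synonyms are hypothetical), so that the field name inside a "weight(field:term in N)" description identifies the analyzer that matched. The sample explanation is hand-written for illustration, not output from a real cluster, and the sketch says nothing about whether explain fits a 10 ms budget.

```python
import re
from typing import Any, Dict, Iterator, Set, Tuple

# Matches the "weight(field:term in N)" part of a Lucene explanation description.
WEIGHT_RE = re.compile(r"weight\(([^:()\s]+):(.+?) in \d+\)")


def matched_terms(explanation: Dict[str, Any]) -> Iterator[Tuple[str, str]]:
    """Walk an explanation tree depth-first and yield (field, term) pairs."""
    found = WEIGHT_RE.search(explanation.get("description", ""))
    if found:
        yield found.group(1), found.group(2)
    for child in explanation.get("details") or []:
        yield from matched_terms(child)


def matched_fields(explanation: Dict[str, Any]) -> Set[str]:
    """Return the set of field names that contributed to the match."""
    return {field for field, _ in matched_terms(explanation)}


# Hypothetical explanation tree with one sub-field per analyzer.
sample = {
    "value": 0.6,
    "description": "sum of:",
    "details": [
        {
            "value": 0.4,
            "description": "weight(title.stemmed:run in 0) [PerFieldSimilarity], result of:",
            "details": [],
        },
        {
            "value": 0.2,
            "description": "weight(title.synonyms:jog in 0) [PerFieldSimilarity], result of:",
            "details": [],
        },
    ],
}
```

Calling matched_fields(sample) reports which sub-fields matched; with one sub-field per analyzer this stands in for "which analyzer matched" without needing child documents.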

On Wednesday, January 14, 2015 at 12:51:15 PM UTC-8, Nikolas Everett wrote:

 What about explain?

 On Wed, Jan 14, 2015 at 3:24 PM, Ed Kim edk...@gmail.com
 wrote:

 Just a friendly bump to see if anyone has any feedback. :)


 On Saturday, January 10, 2015 at 10:38:34 PM UTC-8, Ed Kim wrote:

 Hello all, I was wondering if anyone could offer some feedback on 
 whether there is a way to determine how a document matched in real time. I 
 currently use custom analyzers at index time to allow a broad array of 
 matches for a given text field. I try to match based on phrases, synonyms, 
 substrings, stemming, etc of a given phrase, and I would like to be able to 
 figure out at search time, which analyzer was attributed to causing the 
 match. 

 Currently, I've gotten around this by creating child documents where the 
 fields are fanned out to their respective analyzer types. So I have a child 
 document where the field only applies stemming, another that uses only 
 synonyms, etc. However, due to the growing number of fields that require 
 analysis and the growth of my data set, I'd much prefer if I had fewer 
 documents (and less complex ones too). I was hoping there would be a way to tag 
 tokens at the analysis phase that could be used at the search phase to 
 quickly determine my match level, but I was not able to find anything like 
 this.

 Having said that, has anyone else ever tried to figure this out, or have 
 any thoughts on how to leverage ES at a lower level to determine the match? 







Re: Seeking a Director of Data Engineering in Austin TX

2015-01-15 Thread Mark Walkom
Hi Traci,
This is a community based technical list. We'd greatly appreciate it if you
didn't post job ads.

On 16 January 2015 at 03:38, Traci Martin traci@gmail.com wrote:

 Hello All!

 I am a recruiter in Austin, TX trying to fill a Director of Data
 Engineering position for my client, also in Austin. They are ELK stack evangelists
 and would prefer someone with at least some knowledge of Lucene or Hadoop. This is
 really a great company to work for and probably the nicest client I have
 had the pleasure of working with.

 It is a permanent position offering great benefits, a laid back
 atmosphere, and very competitive salary with options. If you are interested
 please feel free to contact me.

 *There will be no re-lo provided and no sponsorship at this time.

 Traci Martin
 512-640-3656
 tmar...@intersysconsulting.com

  *Director of Data Engineering*

 *Who we are: *
 *Intersys Consulting* is a leading Business Intelligence, Data
 Management, and Application Development professional services organization
 focused on providing solutions with real business value.  We provide a
 customer-focused approach to building authentic partnerships with our
 clients with objective counsel from concept to deployment for a consistent
 voice through the dynamic IT environment.

 *What we look for: *
 *Intersys Consulting *is focused on finding and cultivating talent across
 the IT space.  We have over 100 developers, project managers, business
 analysts, and data management professionals, most with over ten years of
 experience in their respective fields.  In new hires we look for
 authenticity; be proud of who you are and what you bring to the table, as
 well as those candidates who consistently deliver the highest quality
 product and have a deep desire to improve not just themselves, but the
 organization as a whole.

 *The Position:*
 Intersys Consulting is seeking a Director of Data Engineering  to work at
 our client site in Austin, Texas.

 *Primary Responsibilities:*

- Build and optimize each component of our data pipeline
- Work with our data scientists to provide data in the optimal format
- Work with our DevOps team to ensure the data infrastructure is
reliable and scalable
- Integrate with our data partners to enrich our first-party data with
third-party sources
- Stay on top of cutting edge technologies to constantly improve and
streamline our data systems

 *Qualifications:*

- Experience with high performance, high traffic web systems
- Experience with monitoring systems: New Relic, ELK stack, etc.
- Experience with either Hadoop or Elasticsearch/Lucene and a desire
and willingness to learn the other





Re: ElasticSearch: access document nested value in groovy script

2015-01-15 Thread Anil Kumar
I found this.

I had to use _source.medals to access the nested documents, which are stored 
on disk and not in memory.
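
For readers with the same problem, here is a rough Python sketch of a scripted metric aggregation body that reads the array through _source.medals, as described above. The aggregation name medal_count and the choice to sum the count values are illustrative assumptions, and the embedded Groovy scripts follow the ES 1.4 scripted_metric conventions (_agg, _aggs) but have not been run against a live cluster.

```python
import json
from typing import Any, Dict


def medals_total_aggregation() -> Dict[str, Any]:
    """Build a search body that sums medals[].count across all documents."""
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "aggs": {
            "medal_count": {  # hypothetical aggregation name
                "scripted_metric": {
                    # Per-shard state starts at zero.
                    "init_script": "_agg['total'] = 0",
                    # _source keeps the array structure that doc[...] does not expose.
                    "map_script": "for (m in _source.medals) { _agg.total += m.count }",
                    # Hand the per-shard total back to the coordinating node.
                    "combine_script": "return _agg.total",
                    # Add up the per-shard totals.
                    "reduce_script": "sum = 0; for (t in _aggs) { sum += t }; return sum",
                }
            }
        },
    }


request_body = json.dumps(medals_total_aggregation(), indent=2)
```

Reading _source means loading and parsing the stored document for every hit, so this is likely to be slower than doc-value access on large result sets.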

Thanks

On Wednesday, January 14, 2015 at 10:55:15 AM UTC-8, Anil Kumar wrote:

 I have a document stored in ElasticSearch as below. _source:

  {
  firstname: John,
  lastname: Smith,
  medals:[
{
  bucket: 100, 
  count: 1
},
{
  bucket: 150,
  count: 2
}
  ]
   }

 I can access the string type value inside a document using doc.firstname for 
 scripted metric aggregation 
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-scripted-metric-aggregation.html
 .

 But I am not able to get the field value using doc.medals[0].bucket.

 Can you please help me out and let me know how to access the values inside 
 nested fields?




Re: Using icu_collation plugin in Unit Tests

2015-01-15 Thread Kumar S
Thanks David!

Sorry, I am new to the ES world. Where would I download the 
JAR file from, and what class should I be using for icu_collation?

Thank you very much,
Kumar Subramanian,

On Thursday, January 15, 2015 at 12:52:12 PM UTC-8, David Pilato wrote:

 You most likely just need to add it as a dependency. Which is easy if you 
 are using maven.

 David

 On 15 Jan 2015, at 21:03, Kumar S krsku...@gmail.com
 wrote:

 Hi,
 I am new to ES. I am using NodeBuilder in my unit test to run a local 
 instance of ES. I would like to use the icu_collation plugin. How can I 
 install and run the plugin from within this local instance? Is there an API 
 that I should use? If not, what are the different ways I can do this?

 Thank you very much,
 Kumar Subramanian.






Excluding Terms Using a Minus Sign

2015-01-15 Thread Cindy Conway
Is there a way to exclude a term if the user precedes it with a minus sign, 
the way Google does? For example, if I want to search for the word Louvre, 
but I don't want the museum in France, I can search for 
*louvre -museum* as my search terms. Does ES support this? I am not finding 
anything like that in the documentation.

Thanks All!
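
As an illustration of the behaviour being asked about, the Python sketch below turns user input such as "louvre -museum" into a bool query with must and must_not clauses. The function name and the field name title are hypothetical. The query_string query also understands a leading minus sign in its own syntax, so this client-side split is only one possible approach.

```python
from typing import Any, Dict, List


def minus_to_bool_query(user_input: str, field: str) -> Dict[str, Any]:
    """Translate 'word -excluded' style input into a bool query body."""
    must: List[Dict[str, Any]] = []
    must_not: List[Dict[str, Any]] = []
    for token in user_input.split():
        if token.startswith("-") and len(token) > 1:
            # A leading minus marks a term that must not appear.
            must_not.append({"match": {field: token[1:]}})
        else:
            must.append({"match": {field: token}})
    return {"query": {"bool": {"must": must, "must_not": must_not}}}


example = minus_to_bool_query("louvre -museum", "title")
```

The sketch treats a lone "-" as an ordinary token and does not handle quoted phrases.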



Using icu_collation plugin in Unit Tests

2015-01-15 Thread Kumar S
Hi,
I am new to ES. I am using NodeBuilder in my unit test to run a local 
instance of ES. I would like to use the icu_collation plugin. How can I 
install and run the plugin from within this local instance? Is there an API 
that I should use? If not, what are the different ways I can do this?

Thank you very much,
Kumar Subramanian.



Re: Just initialize shards when problems but no rebalance

2015-01-15 Thread Matías Waisgold
I'm on 1.4.1 and still seeing the same behavior.
There should be a better approach than removing all shards at the same time
and then trying to move a few.
We are going to apply the same solution you mentioned: add more disk.
Thanks for your help.
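
For reference, the two disk-allocation settings discussed in the quoted reply below can be written as a body for PUT /_cluster/settings. This is a minimal Python sketch: the helper name and the "70%" watermark are hypothetical, and nothing here is a recommended value.

```python
import json
from typing import Any, Dict


def disk_allocation_settings(
    threshold_enabled: bool, high_watermark: str, transient: bool = True
) -> Dict[str, Any]:
    """Build a cluster settings body for the disk-based allocation decider."""
    scope = "transient" if transient else "persistent"
    return {
        scope: {
            # Setting this to False turns off the disk threshold checks entirely.
            "cluster.routing.allocation.disk.threshold_enabled": threshold_enabled,
            # Disk usage above which ES starts relocating shards away from a node.
            "cluster.routing.allocation.disk.watermark.high": high_watermark,
        }
    }


# Hypothetical example: keep the decider on but lower the high watermark.
settings_body = disk_allocation_settings(True, "70%")
settings_json = json.dumps(settings_body, indent=2)
```

Transient settings are lost on a full cluster restart; passing transient=False puts the same keys under persistent instead.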

2015-01-15 16:09 GMT-03:00 Kimbro Staken ksta...@kstaken.com:

 So is this still happening with 1.4.2?

 Here's the ticket. Looks like the fix was supposed to be in 1.4.1

 https://github.com/elasticsearch/elasticsearch/issues/8538

 On Thu, Jan 15, 2015 at 10:55 AM, Matías Waisgold mwaisg...@gmail.com
 wrote:

 Great, thank you. We are creating another cluster with more disk space to
 avoid these situations.
 By any chance do you have the link to the issue?

 2015-01-15 13:26 GMT-03:00 Kimbro Staken ksta...@kstaken.com:

 I've experienced what you're describing. I called it a shard relocation
 storm and it's really tough to get under control. I opened a ticket on the
 issue and a fix was supposedly included in 1.4.2. What version are you
 running?

 If you want to truly manually manage this situation you could set
 cluster.routing.allocation.disk.threshold_enabled to false but that will
 likely cause other issues. I ended up just setting
 cluster.routing.allocation.disk.watermark.high to a really low value and
 actively managed shard allocations to prevent nodes from getting anywhere
 near that value. This is tricky as the way ES allocates shards it can
 easily run nodes out of disk if you're regularly creating new indices and
 those grow rapidly.

 Kimbro

 On Thu, Jan 15, 2015 at 6:14 AM, Matías Waisgold mwaisg...@gmail.com
 wrote:

 Yes, I've seen that, but the problem is that when the threshold is
 reached it removes all shards from the server instead of removing just one
 and rebalancing. When that happens the cluster starts to move shards
 everywhere and it never stops.

 Another problem we are having is that in the file storage we see data
 from shards that are not assigned to that node, so it can't allocate anything
 in this dirty state.

 2015-01-15 0:09 GMT-03:00 Mark Walkom markwal...@gmail.com:

 You could do this, but it's a lot of manual overhead to have to deal
 with.
 However ES does have some disk space awareness during allocation, take
 a look at
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/index-modules-allocation.html#disk

 On 15 January 2015 at 10:57, Matías Waisgold mwaisg...@gmail.com
 wrote:

 Hi, is there any setting I can give ES so that it automatically
 assigns unassigned shards but never rebalances the cluster?
 I've found several issues when rebalancing and prefer to do it
 manually.
 If I set cluster.routing.allocation.enable to none, nothing happens.
 If I set it to all, it starts rebalancing.

 Is it OK to combine cluster.routing.allocation.allow_rebalance set to
 none with cluster.routing.allocation.enable set to all?

 The issue is mainly that we are running low on disk, and when that
 happens Elasticsearch removes all shards from an instance, ignores
 cluster.routing.allocation.cluster_concurrent_rebalance, and
 starts moving shards like crazy around the entire cluster, filling the
 storage on other instances so that it never stops balancing.

 Kind regards






filtering/querying on script field

2015-01-15 Thread samatha kankipati


Is it possible to filter or query on script_fields?
If so, can you provide an example?



Re: Using icu_collation plugin in Unit Tests

2015-01-15 Thread David Pilato
You most likely just need to add it as a dependency. Which is easy if you are 
using maven.

David

 On 15 Jan 2015, at 21:03, Kumar S krskumar...@gmail.com wrote:
 
 Hi,
 I am new to ES. I am using NodeBuilder in my unit test to run a local 
 instance of ES. I would like to use the icu_collation plugin. How can I 
 install and run the plugin from within this local instance? Is there an API that 
 I should use? If not, what are the different ways I can do this?
 
 Thank you very much,
 Kumar Subramanian.



Re: Help creating a near real time streaming plugin to perform replication between clusters

2015-01-15 Thread joergpra...@gmail.com
While it seems quite easy to attach listeners to an ES node to capture
operations in translog-style and push out index/delete operations on shard
level somehow, there will be more to consider for a reliable solution.

The Couchbase developers have added a data replication protocol to their
product which is meant for transporting changes over long distances with
latency for in-memory processing.

To learn about the most important features, see

https://github.com/couchbaselabs/dcp-documentation

and

http://docs.couchbase.com/admin/admin/Concepts/dcp.html

I think bringing such a concept of an inter cluster protocol into ES could
be a good starting point, to sketch the complete path for such an ambitious
project beforehand.

Most challenging could be dealing with back pressure when receiving
nodes/clusters are becoming slow. For a solution to this, reactive Java /
reactive streams look like a viable possibility.

See also

https://github.com/ReactiveX/RxJava/wiki/Backpressure

http://www.ratpack.io/manual/current/streams.html
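
To make the back pressure point concrete, here is a small Python sketch of a bounded change buffer: the producer is told when the buffer is full instead of queueing without limit, and the shipper drains fixed-size batches. The class and method names are hypothetical, and this is plain in-memory code, not reactive streams and not an ES plugin API.

```python
from collections import deque
from typing import Deque, List, Tuple

Change = Tuple[str, str]  # (operation, document id), e.g. ("index", "doc-1")


class ChangeBuffer:
    """Bounded buffer of index/delete operations awaiting shipment."""

    def __init__(self, capacity: int, batch_size: int) -> None:
        if capacity < 1 or batch_size < 1:
            raise ValueError("capacity and batch_size must be positive")
        self._capacity = capacity
        self._batch_size = batch_size
        self._pending: Deque[Change] = deque()

    def offer(self, operation: str, doc_id: str) -> bool:
        """Try to enqueue a change; False signals back pressure to the caller."""
        if len(self._pending) >= self._capacity:
            return False
        self._pending.append((operation, doc_id))
        return True

    def next_batch(self) -> List[Change]:
        """Remove and return up to batch_size changes in arrival order."""
        count = min(self._batch_size, len(self._pending))
        return [self._pending.popleft() for _ in range(count)]
```

What the caller does when offer returns False (block, drop, or spill to disk) is the real design decision, and it depends on how much delay in consistency the receiving cluster may tolerate.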

I'm in favor of Ratpack since it comes with Java 8, Groovy, Google Guava,
and Netty, which has a resemblance to ES.

In ES, for inter cluster communication, there is not much coded afaik,
except snapshot/restore. Maybe snapshot/restore can provide everything you
want, with incremental mode. Lucene will offer numbered segment files for
faster incremental snapshot/restore.

Just my 2¢

Jörg



On Thu, Jan 15, 2015 at 7:00 PM, Todd Nine tn...@apigee.com wrote:

 Hey all,
   I would like to create a plugin, and I need a hand.  Below are the
 requirements I have.


- Our documents are immutable.  They are only ever created or deleted,
updates do not apply.
- We want mirrors of our ES cluster in multiple AWS regions.  This way
if the WAN between regions is severed for any reason, we do not suffer an
outage, just a delay in consistency.
- As documents are added or removed they are rolled up then shipped in
batch to the other AWS Regions.  This can be as fast as a few milliseconds,
or as slow as minutes, and will be user configurable.  Note that a full
backup+load is too slow, this is more of a near realtime operation.
- This will sync the following operations.
   - Index creation/deletion
   - Alias creation/deletion
   - Document creation/deletion


 What I'm thinking architecturally.


- The plugin is installed on each node in our cluster in all regions
- The plugin will only gather changes for the primary shards on the
local node
- After the timeout elapses, the plugin will ship the changelog to the
other AWS regions, where the plugin will receive it and process it


 Are there any APIs I can look at that are a good starting point for
 developing this?  I'd like to do a simple prototype with two single-node clusters
 reasonably soon.  I found several plugin tutorials, but I'm more concerned
 with what part of the ES API I can call to receive events, if any.

 Thanks,
 Todd





Kibana and nested documents -- include_in_parent

2015-01-15 Thread Phil
Hello,

I am new to ElasticSearch and I have a very specific question. We have 
implemented our ElasticSearch cluster with a nested document structure. 
Each document is made of one ID, a key element and one field including 
several nested records that are inserted by the script api and the bulk 
update function.

My question is: is it possible to view nested documents in Kibana without 
using *include_in_parent*? From preliminary testing it seems to 
use more disk space when include_in_parent is in the mappings. 
When include_in_parent is not in the mappings, the documents are not 
viewable within Kibana 4.0.0.

Also, is there a function or way to display which documents have the most 
nested records, by using the size of the nested records in the 
document? I would like to have a pie chart, that could display them using 
the size of their nested attribute.

Thank you in advance.




Re: How to remove a cluster setting?

2015-01-15 Thread Mark Walkom
This is a known issue, see
https://github.com/elasticsearch/elasticsearch/issues/6732

On 15 January 2015 at 22:01, Gary Gao garygaow...@gmail.com wrote:

 Why didn't this work on my ES?

 GET /_cluster/settings
 {
persistent: {
   discovery: {
  zen: {
 minimum_master_nodes: 2
  }
   }
},
transient: {
   indices: {
  recovery: {
 translog_size: 1024kb,
 concurrent_streams: 3,
 translog_ops: 2000,
 max_bytes_per_sec: 400mb,
 file_chunk_size: 1024kb
  }
   }
}
 }

 PUT _cluster/settings
 {
   transient: {
 indices.recovery.translog_size:
   }
 }

 response:
 {
acknowledged: true,
persistent: {},
transient: {}
 }

 When I do GET again, this setting still exists.


 On Tuesday, July 22, 2014 at 8:50:10 AM UTC+8, Jeffrey Zhou wrote:

 I applied the following setting to my Elasticsearch cluster in order to
 decommission some old nodes in the cluster. After removing these old nodes,
 I now need to re-enable the cluster to allocate shards on those '10.0.6.*'
 nodes. Does anyone know how to remove this setting?

 PUT /_cluster/settings
 {
transient: {
   cluster.routing.allocation.exclude._ip: 10.0.6.*
}
 }

 Thanks in advance for any help!





Best practices for index, search and updates

2015-01-15 Thread bvnr
I am new to Elasticsearch.

Can somebody give me some ideas about the best practices one should follow to 
get good performance for indexing, search, and updates?



Re: filtering/querying on script field

2015-01-15 Thread Masaru Hasegawa
Hi Samatha,

I don’t think so, because a script field is computed from the fields of each hit document, 
that is, from the results of the query/filter.
You can use a script filter instead: 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-script-filter.html#query-dsl-script-filter
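
A script filter request of the kind linked above might look like the following Python sketch, which builds an ES 1.x filtered query around a script filter. The field names price and quantity, the threshold parameter, and the helper name are all hypothetical; the expression is the sort of thing that would otherwise live in script_fields.

```python
import json
from typing import Any, Dict


def script_filter_query(script: str, params: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap a script in a filtered query so it restricts which documents match."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "script": {
                        "script": script,
                        # Passing values as params keeps the script text constant.
                        "params": params,
                    }
                },
            }
        }
    }


# Hypothetical example: keep documents whose computed total exceeds a threshold.
body = script_filter_query(
    "doc['price'].value * doc['quantity'].value > threshold",
    {"threshold": 100},
)
body_json = json.dumps(body)
```

If the computed value is also needed in the response, the same expression can be repeated under script_fields in the same request.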


Masaru

On January 16, 2015 at 04:40:49, samatha kankipati 
(samatha.kankip...@gmail.com) wrote:
  
  
 Is it possible to filter or query on script_fields?
 If so, can you provide an example?
  
  



Re: Slow Commands with 1.2.4 to 1.4.2 Upgrade

2015-01-15 Thread pskieu
Just added 2 more nodes with the same specs, and I am still seeing the same 
slowness. These commands no longer return anything because they take too 
long.


On Tuesday, December 30, 2014 at 3:54:34 PM UTC-8, Mark Walkom wrote:

 How slow?
 Is the load on your system high?

 On 31 December 2014 at 05:04, psk...@gmail.com wrote:

 I have about 50 GB of data (1 mil docs) in a single node--8 cores with 32 
 GB (24 GB heap). I just upgraded from 1.2.4 to 1.4.2, and I noticed that a 
 few commands take a long time to return, and marvel doesn't work as well as 
 it used to.

 Some of the commands that are slow for me are _cat/indices and _nodes.



questions regarding elasticsearch-spark

2015-01-15 Thread Seungjin Lee
Hi all,

I'm quite familiar with Elasticsearch but new to Spark and
elasticsearch-spark.

My idea at the moment is that using Spark together with Elasticsearch
might improve search performance when the time interval is
fixed.

My question is: does Hadoop need to be set up first to use elasticsearch-spark?
Does it depend on Hadoop in any way?

Sincerely,



Re: questions regarding elasticsearch-spark

2015-01-15 Thread Ravi Kiran
Hi Lee,

No, Hadoop isn't required. You can use the Spark standalone mode (
https://spark.apache.org/docs/1.2.0/spark-standalone.html) when using
Elasticsearch with Spark.

Regards
Ravi

On Thu, Jan 15, 2015 at 10:15 PM, Seungjin Lee sweetest0...@gmail.com
wrote:

 Hi all,

 I'm quite familiar with ElasticSearch but new to spark, and
 elasticsearch-spark.

 My idea at this moment is that by using spark together with elasticsearch,
 it might be able to increase search performance when the time interval is
 fixed.

 question is, is hadoop need to be set up first to use elasticsearch-spark?
 does it depend on hadoop by any means?

 Sincerely,



Changing the Axis (X - Y) Label | Naming Legend

2015-01-15 Thread Ravi Prakash
Hi,

1. Is there any way to change the labels of the X and Y axes?
2. In Kibana 3 it was possible to name the legends. Is there any way to do this 
in Kibana 4?



Perl client: Cannot combine params and body?

2015-01-15 Thread Andrew Walker
I have a remote node that I am attempting to connect to that requires an 
api key as a URL parameter in addition to the body in order to get it to 
work.

The code is as follows:

#!/usr/bin/perl
use v5.14;
use warnings;
use Search::Elasticsearch;
use Data::Dumper;

my $API_KEY = 'API_KEY';

my $ES = Search::Elasticsearch->new(
    cxn_pool => 'Static::NoPing',
    nodes    => [{
        scheme => 'https',
        host   => 'service.host.com',
        port   => 443,
        path   => '/api/es/a_path',
    }],
    #send_get_body_as => 'POST',
    trace_to => 'Stdout',
    log_to   => 'Stdout',
);

my $res = $ES->search(
    params => {
        api_key => $API_KEY,
    },
    body => {
        query => {
            bool => {
                must => {
                    query_string => {
                        default_field    => "_all",
                        query            => "thisisasitethatdoesntexist.com",
                        default_operator => "AND"
                    }
                }
            }
        }
    }
);

print Dumper($res);


The generated curl is:

# Request to: https://service.host.com:443/api/es/a_path
curl -XGET 'http://localhost:9200/_search?api_key=API_KEY&pretty=1' -d '
{
   "query" : {
      "bool" : {
         "must" : {
            "query_string" : {
               "query" : "thisisasitethatdoesntexist.com",
               "default_field" : "_all",
               "default_operator" : "AND"
            }
         }
      }
   }
}
'

When I replace localhost and the path with the proper host and path and run 
the curl command directly from the command line, I get zero hits back, 
which is what I expect.  If I run the above perl, however, I get many 
millions of results back, which is exactly the same as what I get when I 
remove the body from the curl query (-d ''). So it seems that the 
combination of params and body causes body to get eaten?  I looked at the 
code, but I couldn't find where this might be happening.  Any help?



Re: Sorting on nested object collections

2015-01-15 Thread Russ Cam
I've run the query with the smallest possible subset and the query is 
returning the results in the expected order so it appears to be correct. 

My biggest question is whether the second sort condition knows to 
run on the *first* projected valuation, the one that had the max date from the first 
sort condition.



Re: Complex search

2015-01-15 Thread Russ Cam
Take a look at highlighting 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-highlighting.html
for highlighting the relevant parts of matches, and at multifield search 
queries 
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/multi-match-query.html#_boosting_individual_fields
with boosting on individual fields.
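
To make the two pointers above a bit more concrete, here is a small Python
sketch that builds a request body combining a multi_match query (with the
title field boosted) and highlighting. The field names title and description
come from the question below; the boost factor of 2 is an arbitrary
assumption. The sketch does not address matching car inside autocar, which
would depend on how the fields are analyzed.

```python
import json


def boosted_search(text, title_boost=2):
    """Search title and description, weighting title matches more heavily."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                # "title^2" means a match in title counts twice as much.
                "fields": ["title^%d" % title_boost, "description"],
            }
        },
        # Ask for highlighted fragments from both searched fields.
        "highlight": {
            "fields": {"title": {}, "description": {}}
        },
    }


request_body = boosted_search("car")
print(json.dumps(request_body, indent=2))
```

The body would be sent to the _search endpoint of the index in question.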

On Friday, 16 January 2015 08:19:08 UTC+11, Serge Schumacher wrote:

 Hi,
 I'm looking to create a search behaviour like Amazon does.

 I have an index with 3 Fields  : Title, Description and Category.

 I want to search the title and description fields for the word *car* 
 and I would like to get scored results like this:

 car      -- score : 1   in category vehicles
 autocar  -- score : 0,5 in category vehicles, where the part car 
 should be highlighted, ex : auto*car*
 carradio -- score : 0,5 in category vehicles, where the part car should be 
 highlighted, ex : *car*radio

 Also, if the word is found in the title field, the score should be 
 higher than if the word is only found in the description field.

 Can anybody help me with this, or at least point me 
 in the right direction?

 Thanks,
 Serge




How can we achieve an equivalent of this SQL query in Elasticsearch?

2015-01-15 Thread Lokesh Gupta
What would be the equivalent of the following query in the Elasticsearch world?

select myDate, col1, col2 from myTable
where myDate = (select max(myDate) from myTable)



Re: How can we achieve an equivalent of this SQL query in Elasticsearch?

2015-01-15 Thread David Pilato
I think you need to run two queries for now. One is an aggregation (max). The 
other one uses the result of this aggregation to search for documents.
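
The two-step approach could look roughly like the Python sketch below. It
only builds the two request bodies and reads the max value out of a simulated
response; no cluster is contacted. The field names myDate, col1 and col2 come
from the SQL in the question, the aggregation name max_date is made up, and
the ES 1.x style filtered/term query is an assumption about how the second
step might be written.

```python
def max_date_request(field="myDate"):
    """Step 1: a max aggregation only, no hits returned."""
    return {"size": 0, "aggs": {"max_date": {"max": {"field": field}}}}


def extract_max(response):
    """Read the aggregated max out of the step 1 response."""
    return response["aggregations"]["max_date"]["value"]


def docs_at_max_request(max_value, field="myDate",
                        columns=("myDate", "col1", "col2")):
    """Step 2: fetch documents whose field equals the max from step 1."""
    return {
        "query": {"filtered": {"filter": {"term": {field: max_value}}}},
        "_source": list(columns),
    }


# Simulated step 1 response, for illustration only.
fake_response = {"aggregations": {"max_date": {"value": 1421280000000.0}}}
second_body = docs_at_max_request(extract_max(fake_response))
print(second_body)
```

Because these are two separate requests, a document indexed between them
could change the max, so the result is not atomic the way the SQL subquery is.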

My 2 cents

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet https://twitter.com/dadoonet | @elasticsearchfr 
https://twitter.com/elasticsearchfr | @scrutmydocs 
https://twitter.com/scrutmydocs



 Le 15 janv. 2015 à 09:13, Lokesh Gupta lgup...@gmail.com a écrit :
 
 What will be equivalent of the following query in the Elasticsearch world..
 
 select myDate, col1, col2 from myTable
 where myDate = (select max(myDate) from myTable)
 


Re: How to remove a cluster setting?

2015-01-15 Thread Gary Gao
Why didn't this work on my ES?

GET /_cluster/settings
{
   "persistent": {
      "discovery": {
         "zen": {
            "minimum_master_nodes": 2
         }
      }
   },
   "transient": {
      "indices": {
         "recovery": {
            "translog_size": "1024kb",
            "concurrent_streams": 3,
            "translog_ops": 2000,
            "max_bytes_per_sec": "400mb",
            "file_chunk_size": "1024kb"
         }
      }
   }
}

PUT _cluster/settings
{
  "transient": {
    "indices.recovery.translog_size": ""
  }
}

response:
{
   "acknowledged": true,
   "persistent": {},
   "transient": {}
}

When I do GET again, this setting still exists.


On Tuesday, July 22, 2014 at 8:50:10 AM UTC+8, Jeffrey Zhou wrote:

 I made the following setting on my Elasticsearch cluster in order to 
 decommission some old nodes in the cluster. Now that these old nodes have 
 been removed, I need to re-enable shard allocation on those '10.0.6.*' 
 nodes. Does anyone know how to remove this setting? 

 PUT /_cluster/settings 
 { 
   "transient": { 
     "cluster.routing.allocation.exclude._ip": "10.0.6.*" 
   } 
 } 

 Thanks in advance for any help! 




Re: How can I sort results by _id?

2015-01-15 Thread Adrien Grand
Making it index:not_analyzed should work, what is the issue with the
results?

Note that loading the _id in fielddata is typically very costly since the
_id field is typically unique per document.

On Thu, Jan 15, 2015 at 10:35 AM, Jason Zhang moc...@gmail.com wrote:

 I use a query dsl like:

 {
   "filter": {
     "exists": { "field": "info" }
   },
   "sort": { "_id": "desc" }
 }

 And the _id here is an integer like '123'.

 But the result is like:

 {
   "took": 50,
   ...
   "hits": {
     ...
     "hits": [
       {
         ...
         "sort": [ null ]
       }]
   }
 }

 Also, I've tried to add "_id": { "index": "not_analyzed" } in the
 _mapping.
 This time the sort section returns values. But I find the results are
 still partly unordered.

 Can I sort results by _id? How?

 Thank you.





-- 
Adrien Grand



Re: how to combine aggregations

2015-01-15 Thread Adrien Grand
I believe you could run a terms aggregation on the city field, and under
this terms aggregation put two sum aggregations, one for clicks and one for
displays. Finally, you could derive the click rate from the sums of
clicks and displays on the client side. If you are just starting to play with
aggregations, I would recommend reading this blog post by Zachary Tong:
http://www.elasticsearch.org/blog/intro-to-aggregations/
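
A minimal Python sketch of this idea follows: one function builds the terms
aggregation with the two sum sub-aggregations, another derives the rates on
the client side from a response. The field names city, clicks and displays
are taken from the question, the aggregation names are made up, and the
response used here is simulated. Note that the overall rate computed this
way only covers the buckets actually returned by the terms aggregation.

```python
def click_rate_request(group_field="city"):
    """Terms aggregation with two sum sub-aggregations per bucket."""
    return {
        "size": 0,
        "aggs": {
            "by_group": {
                "terms": {"field": group_field},
                "aggs": {
                    "clicks": {"sum": {"field": "clicks"}},
                    "displays": {"sum": {"field": "displays"}},
                },
            }
        },
    }


def click_rates(response):
    """Return ({group: rate}, overall_rate) computed from the bucket sums."""
    rates = {}
    total_clicks = 0.0
    total_displays = 0.0
    for bucket in response["aggregations"]["by_group"]["buckets"]:
        clicks = bucket["clicks"]["value"]
        displays = bucket["displays"]["value"]
        total_clicks += clicks
        total_displays += displays
        # Avoid dividing by zero for groups without any display.
        rates[bucket["key"]] = clicks / displays if displays else None
    overall = total_clicks / total_displays if total_displays else None
    return rates, overall


# Simulated response, for illustration only.
fake = {"aggregations": {"by_group": {"buckets": [
    {"key": "paris", "clicks": {"value": 5.0}, "displays": {"value": 100.0}},
    {"key": "lyon", "clicks": {"value": 1.0}, "displays": {"value": 50.0}},
]}}}
rates, overall = click_rates(fake)
print(rates, overall)
```

Grouping by country instead would just mean passing a different field name,
assuming such a field exists in the documents.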

On Wed, Jan 14, 2015 at 10:43 PM, Yan Georget y...@ogury.co wrote:

 Hello,

 Let's imagine I am logging displays and clicks, say by cities.
 I can aggregate those by countries and I can also compute grand totals.

 Now I would like to compute click rates (clicks/displays) by cities,
 countries and I would also like to get a global click rate.
 How can I do this?

 It seems that I could use a scripted metric (I have not tried yet) but I
 would also like to expose these rates in Kibana.

 Is it possible?

 Thanks in advance,
 Yan Georget





-- 
Adrien Grand



Out of memory on start with 38GB index

2015-01-15 Thread Thomas Cataldo
Hi,

I am doing all my tests on a 38GB production index copy, with ES 1.4.2. I 
tried several memory settings and virtual machine sizes, but ES fails to 
start on a linux system with 48GB memory and 32GB for ES heap.

Searching for similar issues, I 
encountered https://github.com/elasticsearch/elasticsearch/issues/8394 
which is still open and looks fairly similar to my problem.


The debug output at startup looks like this:

[2015-01-14 12:00:48,710][DEBUG][indices.cluster  ] [Saint Elmo] 
[mailspool][1] creating shard

[2015-01-14 12:00:48,710][DEBUG][index.service] [Saint Elmo] 
[mailspool] creating shard_id [1]

[2015-01-14 12:00:48,791][DEBUG][index.deletionpolicy ] [Saint Elmo] 
[mailspool][1] Using [keep_only_last] deletion policy

[2015-01-14 12:00:48,793][DEBUG][index.merge.policy   ] [Saint Elmo] 
[mailspool][1] using [tiered] merge mergePolicy with 
expunge_deletes_allowed[10.0], floor_segment[2mb], max_merge_at_once[10], 
max_merge_at_once_explicit[30], max_merged_segment[5gb], 
segments_per_tier[10.0], reclaim_deletes_weight[2.0]

[2015-01-14 12:00:48,794][DEBUG][index.merge.scheduler] [Saint Elmo] 
[mailspool][1] using [concurrent] merge scheduler with max_thread_count[2], 
max_merge_count[4]

[2015-01-14 12:00:48,797][DEBUG][index.shard.service  ] [Saint Elmo] 
[mailspool][1] state: [CREATED]

[2015-01-14 12:00:48,797][DEBUG][index.translog   ] [Saint Elmo] 
[mailspool][1] interval [5s], flush_threshold_ops [2147483647], 
flush_threshold_size [200mb], flush_threshold_period [30m]

[2015-01-14 12:00:48,801][DEBUG][index.shard.service  ] [Saint Elmo] 
[mailspool][1] state: [CREATED]-[RECOVERING], reason [from gateway]

[2015-01-14 12:00:48,801][DEBUG][index.gateway] [Saint Elmo] 
[mailspool][1] starting recovery from local ...

[2015-01-14 12:00:48,805][DEBUG][river.cluster] [Saint Elmo] 
processing [reroute_rivers_node_changed]: execute

[2015-01-14 12:00:48,805][DEBUG][river.cluster] [Saint Elmo] 
processing [reroute_rivers_node_changed]: no change in cluster_state

[2015-01-14 12:00:48,814][INFO ][gateway  ] [Saint Elmo] 
recovered [1] indices into cluster_state

[2015-01-14 12:00:48,814][DEBUG][cluster.service  ] [Saint Elmo] 
processing [local-gateway-elected-state]: done applying updated 
cluster_state (version: 2)

[2015-01-14 12:00:48,840][DEBUG][index.engine.internal] [Saint Elmo] 
[mailspool][1] starting engine

[2015-01-14 12:00:58,406][DEBUG][cluster.service  ] [Saint Elmo] 
processing [routing-table-updater]: execute

[2015-01-14 12:00:58,407][DEBUG][gateway.local] [Saint Elmo] 
[mailspool][4]: throttling allocation [[mailspool][4], node[null], [P], 
s[UNASSIGNED]] to [[[Saint 
Elmo][gOgAuHo4SXyfyuPpws0Usw][es][inet[/172.16.45.250:9300 on primary 
allocation

[2015-01-14 12:00:58,407][DEBUG][gateway.local] [Saint Elmo] 
[mailspool][2]: throttling allocation [[mailspool][2], node[null], [P], 
s[UNASSIGNED]] to [[[Saint 
Elmo][gOgAuHo4SXyfyuPpws0Usw][es][inet[/172.16.45.250:9300 on primary 
allocation

[2015-01-14 12:00:58,407][DEBUG][gateway.local] [Saint Elmo] 
[mailspool][3]: throttling allocation [[mailspool][3], node[null], [P], 
s[UNASSIGNED]] to [[[Saint 
Elmo][gOgAuHo4SXyfyuPpws0Usw][es][inet[/172.16.45.250:9300 on primary 
allocation

[2015-01-14 12:00:58,408][DEBUG][gateway.local] [Saint Elmo] 
[mailspool][0]: throttling allocation [[mailspool][0], node[null], [P], 
s[UNASSIGNED]] to [[[Saint 
Elmo][gOgAuHo4SXyfyuPpws0Usw][es][inet[/172.16.45.250:9300 on primary 
allocation

[2015-01-14 12:00:58,408][DEBUG][cluster.service  ] [Saint Elmo] 
processing [routing-table-updater]: no change in cluster_state

[2015-01-14 12:01:31,619][WARN ][index.engine.internal] [Saint Elmo] 
[mailspool][1] failed engine [refresh failed]

java.lang.OutOfMemoryError: Java heap space

at org.apache.lucene.util.FixedBitSet.init(FixedBitSet.java:187)

at 
org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet(MultiTermQueryWrapperFilter.java:104)

at 
org.elasticsearch.index.cache.filter.weighted.WeightedFilterCache$FilterCacheFilterWrapper.getDocIdSet(WeightedFilterCache.java:177)

at 
org.elasticsearch.common.lucene.search.OrFilter.getDocIdSet(OrFilter.java:55)

at 
org.elasticsearch.common.lucene.search.ApplyAcceptedDocsFilter.getDocIdSet(ApplyAcceptedDocsFilter.java:46)

at org.apache.lucene.search.FilteredQuery$1.scorer(FilteredQuery.java:130)

at 
org.apache.lucene.search.FilteredQuery$RandomAccessFilterStrategy.filteredScorer(FilteredQuery.java:542)

at org.apache.lucene.search.FilteredQuery$1.scorer(FilteredQuery.java:136)

at 
org.apache.lucene.search.QueryWrapperFilter$1.iterator(QueryWrapperFilter.java:59)

at 
org.apache.lucene.index.BufferedUpdatesStream.applyQueryDeletes(BufferedUpdatesStream.java:554)

at 

Grandchild is not getting fetched by parent id

2015-01-15 Thread Iv Igi
I am experiencing an issue while trying to retrieve a grandchild record by 
its parent ID (child-grandchild relationship).
The number of hits in the result is always zero.
The same request works fine for the parent-child relationship.

My records are getting organized kinda like this:

Account --(one to one)-- User --(one to one)-- Address

My execution environment is:
 - Fedora 21 CE
 - openjdk 1.8.0_25
 - ES 1.4.2

Here is a script that is showing the problem

# index creation
curl -XPUT "localhost:9200/the_index/" -d "{
    \"mappings\": {
        \"account\" : {},
        \"user\" : {
            \"_parent\" : {
                \"type\" : \"account\"
            }
        },
        \"address\" : {
            \"_parent\" : {
                \"type\" : \"user\"
            }
        }
    }
}";

# mrsmith account creation
curl -XPUT "localhost:9200/the_index/account/mrsmith" -d "{
    \"foo\" : \"foo\"
}";

# john user creation
curl -XPUT "localhost:9200/the_index/user/john?parent=mrsmith" -d "{
    \"bar\" : \"bar\"
}";

# smithshouse address creation
curl -XPUT "localhost:9200/the_index/address/smithshouse?parent=john" -d "{
    \"baz\" : \"baz\"
}";

# Here I am trying to retrieve a record. Getting zero hits.
curl -XGET "localhost:9200/the_index/address/_search?pretty" -d "{
    \"query\" : { \"bool\" : { \"must\" : { \"term\" : { \"_parent\" : \"john\" } } } }
}";

# Another approach with has_parent query type. Still getting zero hits.
curl -XGET "localhost:9200/the_index/address/_search?pretty" -d "{
    \"query\" : {
        \"has_parent\" : {
            \"parent_type\" : \"user\",
            \"query\" : {
                \"term\" : {
                    \"_id\" : \"john\"
                }
            }
        }
    }
}";

# OK, let's try a routed search. Nope.
curl -XGET "localhost:9200/the_index/address/_search?routing=john&pretty" -d "{
    \"query\" : { \"bool\" : { \"must\" : { \"term\" : { \"_parent\" : \"john\" } } } }
}";

# Routed has_parent query. Same.
curl -XGET "localhost:9200/the_index/address/_search?routing=john&pretty" -d "{
    \"query\" : {
        \"has_parent\" : {
            \"parent_type\" : \"user\",
            \"query\" : {
                \"term\" : {
                    \"_id\" : \"john\"
                }
            }
        }
    }
}";

# Retrieving a record by itself. Going just fine.
curl -XGET "localhost:9200/the_index/address/smithshouse?parent=john";

# Querying for user record with the same query. Got a hit.
curl -XGET "localhost:9200/the_index/user/_search?pretty" -d "{
    \"query\" : { \"bool\" : { \"must\" : { \"term\" : { \"_parent\" : \"mrsmith\" } } } }
}";



The output:

{"acknowledged":true}
{"_index":"the_index","_type":"account","_id":"mrsmith","_version":1,"created":true}
{"_index":"the_index","_type":"user","_id":"john","_version":1,"created":true}
{"_index":"the_index","_type":"address","_id":"smithshouse","_version":1,"created":true}
{
  "took" : 54,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}
{
  "took" : 221,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}
{
  "took" : 35,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}
{
  "took" : 481,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}
{"_index":"the_index","_type":"address","_id":"smithshouse","_version":1,"found":true,"_source":{
    "baz" : "baz"
}}
{
  "took" : 65,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "the_index",
      "_type" : "user",
      "_id" : "john",
      "_score" : 1.0,
      "_source":{
    "bar" : "bar"
}
    } ]
  }
}

You can see from the results that ES hit the required shard, but no records 
were fetched.
I am probably doing something wrong; if so, please correct me.



Re: Logstash output to Elastic search is not working

2015-01-15 Thread Marc
What do you mean by 
"Can't see anything from the following command output:
#curl http://localhost:9200/_search?pretty"
from your first post?

On Wednesday, January 14, 2015 at 3:27:57 AM UTC+1, zal...@gmail.com wrote:

 Hi Marc,

 I didn't find any .sincedb file on the file system. The problem persists. 



 On Tuesday, January 13, 2015 at 8:39:57 PM UTC+8, Marc wrote:

 It all looks ok to me, since one can see that the logstash process is 
 added as a node.
 However, you should try to remove the .sincedb files in your home 
 directory.
 If sincedb files exist and you are trying to analyze identical log files 
 it will know that it already read in the info and wait for new log entries 
 in the file... ergo nothing will happen


 On Tuesday, January 13, 2015 at 10:05:10 AM UTC+1, zal...@gmail.com 
 wrote:

 Hi all,

 I started experimenting with ELK today, unfortunately without success. 
 Everything installed properly and is running without any error. When I start 
 Logstash with the following command, output to STDOUT is fine. But nothing 
 shows up in Elasticsearch:

 #./logstash agent -e 'input { stdin {} } output { elasticsearch { host 
 => localhost } stdout { codec => rubydebug } }'

 What should I do?

 Elastic search's console output is:

 [2015-01-13 15:55:48,072][INFO ][node ] [Apollo] 
 started
 [2015-01-13 15:55:51,392][INFO ][gateway  ] [Apollo] 
 recovered [1] indices into cluster_state
 [2015-01-13 15:55:51,422][INFO ][cluster.service  ] [Apollo] 
 added 
 {[logstash-0.0.0.0-21484-2010][O0emX_s0SmauCfqAC_YaTA][inet[/172.16.4.88:9302]]{client=true,
  
 data=false},[logstash-suricata-3299-4010][cKVoEM8zT8KPVIAelpMSsg][suricata][inet[/172.16.4.88:9301]]{client=true,
  
 data=false},}, reason: zen-disco-receive(join from 
 node[[logstash-suricata-3299-4010][cKVoEM8zT8KPVIAelpMSsg][suricata][inet[/172.16.4.88:9301]]{client=true,
  
 data=false}])
 [2015-01-13 15:57:44,028][INFO ][cluster.service  ] [Apollo] 
 removed 
 {[logstash-0.0.0.0-21484-2010][O0emX_s0SmauCfqAC_YaTA][inet[/172.16.4.88:9302]]{client=true,
  
 data=false},}, reason: 
 zen-disco-node_failed([logstash-0.0.0.0-21484-2010][O0emX_s0SmauCfqAC_YaTA][inet[/172.16.4.88:9302]]{client=true,
  
 data=false}), reason transport disconnected
 [2015-01-13 16:01:29,656][INFO ][cluster.service  ] [Apollo] 
 added 
 {[logstash-0.0.0.0-22435-2010][LdUiD4llTY6S7eiN8Z97ag][inet[/172.16.4.88:9302]]{client=true,
  
 data=false},}, reason: zen-disco-receive(join from 
 node[[logstash-0.0.0.0-22435-2010][LdUiD4llTY6S7eiN8Z97ag][inet[/172.16.4.88:9302]]{client=true,
  
 data=false}])
 [2015-01-13 16:21:07,373][INFO ][cluster.service  ] [Apollo] 
 removed 
 {[logstash-0.0.0.0-22435-2010][LdUiD4llTY6S7eiN8Z97ag][inet[/172.16.4.88:9302]]{client=true,
  
 data=false},}, reason: 
 zen-disco-node_failed([logstash-0.0.0.0-22435-2010][LdUiD4llTY6S7eiN8Z97ag][inet[/172.16.4.88:9302]]{client=true,
  
 data=false}), reason transport disconnected
 [2015-01-13 16:25:07,143][INFO ][cluster.service  ] [Apollo] 
 added 
 {[logstash-0.0.0.0-24108-2010][k2ToeYbPRtW_LH4PLBcL-A][inet[/172.16.4.88:9302]]{client=true,
  
 data=false},}, reason: zen-disco-receive(join from 
 node[[logstash-0.0.0.0-24108-2010][k2ToeYbPRtW_LH4PLBcL-A][inet[/172.16.4.88:9302]]{client=true,
  
 data=false}])

 Can't see anything from the following command output:
 #curl http://localhost:9200/_search?pretty

 Please help me on this. 






Re: How can I sort results by _id?

2015-01-15 Thread Adrien Grand
This is because the _id is a string field, so comparison is based on the
lexicographical order, not numeric.
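
A tiny Python illustration of the difference, using made-up ids: sorting the
same values as strings and as numbers gives different orders, and a string
such as "199989" falls between "1" and "2" when compared character by
character. This only mimics the comparison rule; it says nothing about how
Elasticsearch stores the _id internally.

```python
ids = ["9", "10", "123", "199989", "2"]

# String comparison goes character by character, so "10" sorts before "9".
as_strings = sorted(ids)

# Converting to int first gives the order a human would expect.
as_numbers = sorted(ids, key=int)

print(as_strings)
print(as_numbers)

# As strings, "199989" lies between "1" and "2".
in_string_range = "1" < "199989" < "2"
print(in_string_range)
```

If numeric ordering is needed, one option is to index the number in a
separate numeric field and sort on that field instead of _id.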

On Thu, Jan 15, 2015 at 11:04 AM, Jason Zhang moc...@gmail.com wrote:

 What confuses me is that the 'sorted' results are still partly unordered.

 Also, if I query:

 {  "range": {
 "_id": {
   "gt": 1,
   "lt": 1}}}

 the results contain _id: 199989.

 On Thursday, January 15, 2015 at 5:48:48 PM UTC+8, Adrien Grand wrote:

 Making it index:not_analyzed should work, what is the issue with the
 results?

 Note that loading the _id in fielddata is typically very costly since the
 _id field is typically unique per document.

 On Thu, Jan 15, 2015 at 10:35 AM, Jason Zhang moc...@gmail.com wrote:

 I use a query dsl like:

 {
   filter: {
 exists: { field: info }
   },
   sort: { _id: desc }
 }

 And the _id here is an integer like '123'.

 But the result is like:

 {
   took: 50,
   ...
   hits: {
 ...
 hits: [
   {
 ...
 sort: [ null ]
   }]
   }
 }

 Also, I've tried to add _id: { index: not_analyzerd } in the
 _mapping.
 This time the sort section returns values. But I find the results are
 still partly unordered.

 Can I sort results by _id? How?

 Thank you.





 --
 Adrien Grand





-- 
Adrien Grand

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAL6Z4j6x_GN9HuZzYtgB_T69hu0y_QVUCzqxxOKciEvKubgkUw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: How can I sort results by _id?

2015-01-15 Thread Itamar Syn-Hershko
No, an ID has to be a string

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Author of RavenDB in Action http://manning.com/synhershko/

On Thu, Jan 15, 2015 at 12:12 PM, Jason Zhang moc...@gmail.com wrote:

 Can I specify its type as integer in _mapping? Because the _id I use is
 rewritten.

 On Thursday, January 15, 2015 at 6:07:22 PM UTC+8, Adrien Grand wrote:

 This is because the _id is a string field, so comparison is based on the
 lexicographical order, not numeric.

 On Thu, Jan 15, 2015 at 11:04 AM, Jason Zhang moc...@gmail.com wrote:

 What I'm confused is the 'sorted' results are still partly unordered.

 Also, if I query:

 {  range: {
 _id: {
   gt: 1,
   lt: 1}}}

 the results contain _id: 199989.

 On Thursday, January 15, 2015 at 5:48:48 PM UTC+8, Adrien Grand wrote:

 Making it index:not_analyzed should work, what is the issue with the
 results?

 Note that loading the _id in fielddata is typically very costly since
 the _id field is typically unique per document.

 On Thu, Jan 15, 2015 at 10:35 AM, Jason Zhang moc...@gmail.com wrote:

 I use a query dsl like:

 {
   filter: {
 exists: { field: info }
   },
   sort: { _id: desc }
 }

 And the _id here is an integer like '123'.

 But the result is like:

 {
   took: 50,
   ...
   hits: {
 ...
 hits: [
   {
 ...
 sort: [ null ]
   }]
   }
 }

 Also, I've tried to add _id: { index: not_analyzerd } in the
 _mapping.
 This time the sort section returns values. But I find the results
 are still partly unordered.

 Can I sort results by _id? How?

 Thank you.





 --
 Adrien Grand





 --
 Adrien Grand



-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAHTr4ZvWQtGKE6JDd6%3D%2BXRJENrAyLPkTE3%2BBRpFsEJ%2BS09bTpg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


How can I sort results by _id?

2015-01-15 Thread Jason Zhang
I use a query dsl like:

{
  "filter": {
    "exists": { "field": "info" }
  },
  "sort": { "_id": "desc" }
}

And the _id here is an integer like '123'.

But the result is like:

{
  "took": 50,
  ...
  "hits": {
    ...
    "hits": [
      {
        ...
        "sort": [ null ]
      }]
  }
}

I've also tried adding "_id": { "index": "not_analyzed" } to the _mapping.
This time the sort section returns values, but the results are
still partly unordered.

Can I sort results by _id? How?

Thank you.

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/4ea45f18-847a-4b58-b78e-ddcd9ee1e9f9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: How can I sort results by _id?

2015-01-15 Thread Jason Zhang
What confuses me is that the 'sorted' results are still partly unordered.

Also, if I query:

{  range: {
_id: {
  gt: 1,
  lt: 1}}}

the results contain _id: 199989.

On Thursday, January 15, 2015 at 5:48:48 PM UTC+8, Adrien Grand wrote:

 Making it index:not_analyzed should work, what is the issue with the 
 results?

 Note that loading the _id in fielddata is typically very costly since the 
 _id field is typically unique per document.

 On Thu, Jan 15, 2015 at 10:35 AM, Jason Zhang moc...@gmail.com wrote:

 I use a query dsl like:

 {
   filter: {
 exists: { field: info }
   },
   sort: { _id: desc }
 }

 And the _id here is an integer like '123'.

 But the result is like:

 { 
   took: 50,
   ...
   hits: {
 ...
 hits: [
   {
 ...
 sort: [ null ]
   }]
   }
 }

 Also, I've tried to add _id: { index: not_analyzerd } in the 
 _mapping.
 This time the sort section returns values. But I find the results are 
 still partly unordered.

 Can I sort results by _id? How?

 Thank you.





 -- 
 Adrien Grand
  

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/b7f625dd-8afd-4603-afc8-1fd6d5b601d1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: How can I sort results by _id?

2015-01-15 Thread Jason Zhang
Can I specify its type as integer in _mapping? Because the _id I use is 
rewritten.

On Thursday, January 15, 2015 at 6:07:22 PM UTC+8, Adrien Grand wrote:

 This is because the _id is a string field, so comparison is based on the 
 lexicographical order, not numeric.

 On Thu, Jan 15, 2015 at 11:04 AM, Jason Zhang moc...@gmail.com wrote:

 What I'm confused is the 'sorted' results are still partly unordered.

 Also, if I query:

 {  range: {
 _id: {
   gt: 1,
   lt: 1}}}

 the results contain _id: 199989.

 On Thursday, January 15, 2015 at 5:48:48 PM UTC+8, Adrien Grand wrote:

 Making it index:not_analyzed should work, what is the issue with the 
 results?

 Note that loading the _id in fielddata is typically very costly since 
 the _id field is typically unique per document.

 On Thu, Jan 15, 2015 at 10:35 AM, Jason Zhang moc...@gmail.com wrote:

 I use a query dsl like:

 {
   filter: {
 exists: { field: info }
   },
   sort: { _id: desc }
 }

 And the _id here is an integer like '123'.

 But the result is like:

 { 
   took: 50,
   ...
   hits: {
 ...
 hits: [
   {
 ...
 sort: [ null ]
   }]
   }
 }

 Also, I've tried to add _id: { index: not_analyzerd } in the 
 _mapping.
 This time the sort section returns values. But I find the results are 
 still partly unordered.

 Can I sort results by _id? How?

 Thank you.





 -- 
 Adrien Grand
  




 -- 
 Adrien Grand
  

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/2475cb1a-5631-4b06-8507-28c4d81f9d4d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Questions about scaling elasticsearch with regard to the number of documents indexed per second

2015-01-15 Thread Chinch Pokli
No. The whole point of my question was whether Elasticsearch can index, say,
10,000 documents per second. If it can, I can simply hook my Twitter code up
to ES. If not, I need to work out how to make that happen.
So far I have typically seen ES index only around 30 docs per second, which is
pretty low.

I am hoping that Redis, Kafka, Logstash, etc. might give Elasticsearch some
breathing room and enable it to index up to 10K docs per second.
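
For scale, here is a rough Python sketch of the arithmetic behind that target, plus one common way to cut per-request overhead: grouping documents into newline-delimited bodies for the _bulk endpoint. The index name, type name, document fields and batch size are all assumptions made up for the example, and nothing here says whether a particular cluster can actually sustain 10K docs per second; that depends on hardware and mappings.

```python
import json


def bulk_body(docs, index="tweets", doc_type="tweet"):
    """Serialize docs as action/source line pairs for a _bulk request.

    The index and type names are hypothetical placeholders.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Target from the thread: 10,000 docs per second at about 2 KB each.
docs_per_second = 10_000
doc_size_kb = 2
megabytes_per_second = docs_per_second * doc_size_kb / 1024
print(round(megabytes_per_second, 1), "MB/s of raw JSON")

# Made-up documents, batched 1,000 at a time (batch size is an assumption).
tweets = [{"id": i, "followers": i * 10} for i in range(2500)]
batches = [bulk_body(batch) for batch in chunks(tweets, 1000)]
print(len(batches), "bulk requests instead of", len(tweets), "single ones")
```

Each string in `batches` would be sent as the body of one POST to /_bulk, so 2,500 documents cost three round trips rather than 2,500.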

On Thursday, January 15, 2015 at 10:47:31 AM UTC+5:30, David Pilato wrote:

 You have a Twitter input so you can extract content from Twitter and send 
 to elasticsearch. No need to have Redis here. 

 --
 David ;-)
 Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

 On 15 Jan 2015, at 00:02, Chinch Pokli cpo...@gmail.com wrote:

 Thanks. I'll have a look at the raw option.
 Regarding Logstash, I don't fully understand its utility. It says that it
 can take messages from a Redis server. But if I have to set up Redis, I
 could simply use the Redis river to index into Elasticsearch. Is there any
 additional benefit that Logstash would give me?

 On Thursday, January 15, 2015 at 4:06:12 AM UTC+5:30, David Pilato wrote:

 You should look at raw option or better look at Logstash.

 My 2 cents.

 David

 On 14 Jan 2015, at 23:29, Chinch Pokli cpo...@gmail.com wrote:

 Hi,

 I am using Elasticsearch to index the Twitter stream. Until recently I was
 using the official river, which was working great, but I realized that it is
 throwing out much of the data (e.g. it is not storing the number of
 followers and similar fields).

 Is there a way to make the river to store all the data? If not, I am fine 
 with writing a streaming code which will stream and index. But have a 
 concern. How many documents can elasticsearch index per second? I might 
 eventually need to index almost 10,000 documents (each document = 2 KB) per 
 second (current requirement is of 100 documents per second). Is this even 
 feasible? If yes, do I need to make any special modifications?

 Thanks-in-advance!!





-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/a5c75aed-e290-4152-9f8d-160510f3ecfa%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Just initialize shards when problems but no rebalance

2015-01-15 Thread Matías Waisgold
Yes, I've seen that, but the problem is that when the threshold is reached
it removes all shards from the server instead of removing just one and
rebalancing. When that happens the cluster starts moving shards around
everywhere and never stops.

Another problem we are having is that in the file storage we see data from
shards that are not assigned to that node, so it can't allocate anything in
this dirty state.

2015-01-15 0:09 GMT-03:00 Mark Walkom markwal...@gmail.com:

 You could do this, but it's a lot of manual overhead to have to deal with.
 However ES does have some disk space awareness during allocation, take a
 look at
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/index-modules-allocation.html#disk

 On 15 January 2015 at 10:57, Matías Waisgold mwaisg...@gmail.com wrote:

 Hi, is there any setting I can give ES so that it automatically assigns
 unassigned shards but never rebalances the cluster?
 I've found several issues when rebalancing and prefer to do it manually.
 If I set cluster.routing.allocation.enable to none nothing happens.
 If I set it to all then it starts rebalancing.

 Is it OK to combine cluster.routing.allocation.allow_rebalance set to none
 with cluster.routing.allocation.enable set to all?

 The issue is mainly that we are running low on disk. When that
 happens Elasticsearch removes all shards from an instance, ignores
 cluster.routing.allocation.cluster_concurrent_rebalance and
 starts moving shards like crazy around the entire cluster, filling the
 storage on other instances along the way, so it never stops balancing.

 Kind regards





-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAMaTqYqFmk8t7couOmYEyPYNZPKepT8nKVrCM6fvSPW0CUjMwA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: How can we achieve an equivalent of this SQL a query in Elasticsearch?

2015-01-15 Thread Lokesh Gupta
Thanks.. Any other creative solutions?

On Thursday, January 15, 2015 at 1:54:10 PM UTC+5:30, David Pilato wrote:

 I think you need to run two queries for now. One is an aggregation (max). 
 The other one use the result of this aggregation to search for documents.

 My 2 cents

 -- 
 *David Pilato* | *Technical Advocate* | *Elasticsearch.com 
 http://Elasticsearch.com*
 @dadoonet https://twitter.com/dadoonet | @elasticsearchfr 
 https://twitter.com/elasticsearchfr | @scrutmydocs 
 https://twitter.com/scrutmydocs


  
 On 15 Jan 2015, at 09:13, Lokesh Gupta lgu...@gmail.com wrote:

 What will be equivalent of the following query in the Elasticsearch world..

 select myDate, col1, col2 from myTable
 where myDate = (select max(myDate) from myTable)





-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/906c817f-3ca4-4a7b-a0cc-a316076ae332%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Deploy Elasticsearch in live

2015-01-15 Thread BizEcho Jmr
Hi all,

I use Elasticsearch locally on my PC as the search engine for a content website
developed with the Django framework.

I would like your opinion on choosing a hosting offer for production, ideally a
scalable one.

I have looked at the offers from DigitalOcean, Amazon EC2 and OVH (OVH VPC,
RunAbove ...).
Amazon EC2 is free for the first year, but I do not know whether that offer is
suitable for my application.
The entry-level DigitalOcean offer is $5/month, but it has only 512 MB of memory.
I also just received an email saying that it is now possible to deploy
Elasticsearch on Google Compute Engine.

What would be the impact on this production configuration if I also planned to
use Logstash and Kibana?

Thank you in advance for your advice on production hosting.

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/554ee6b4-7652-4ccc-9d17-27c117a26cf9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Http Cors Setting

2015-01-15 Thread Raffaele Garofalo
In my case I faced the same issue because my web tier is hosted on a
different domain.
My configuration is working quite well: I can see the pre-flight (OPTIONS)
call returning 200 and the subsequent POST or GET succeeding.

I have used the following configuration:

http.cors.enabled: true
http.cors.allow-origin: my regex for my domains
http.cors.allow-methods: OPTIONS, HEAD, GET, POST, PUT, DELETE
http.cors.allow-credentials: true
http.cors.allow-headers: X-Requested-With, Content-Type, Content-Length, accept, authorization

You can use the Chrome developer tools (F12) to check which headers your
application sends in the pre-flight request, and add them to the
http.cors.allow-headers parameter.
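
That comparison can also be done mechanically. The Python sketch below takes the value of the Access-Control-Request-Headers header copied from the browser's pre-flight request and the http.cors.allow-headers value from the config, and lists what is missing. The function name and the x-custom-token header are invented for the example; it does not talk to Elasticsearch at all.

```python
def missing_allow_headers(requested, allowed):
    """Return requested header names that are absent from the allowed list.

    Both arguments are comma-separated strings; header names are compared
    case-insensitively.
    """
    def parse(value):
        return [name.strip().lower() for name in value.split(",") if name.strip()]

    allowed_names = set(parse(allowed))
    return [name for name in parse(requested) if name not in allowed_names]


# Value of http.cors.allow-headers from the configuration above.
allowed = "X-Requested-With, Content-Type, Content-Length, accept, authorization"

# Hypothetical Access-Control-Request-Headers value seen in the browser.
requested = "content-type, authorization, x-custom-token"

missing = missing_allow_headers(requested, allowed)
print(missing)
```

Anything the function reports would need to be appended to http.cors.allow-headers before the browser accepts the pre-flight response.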

On Tuesday, November 11, 2014 at 1:21:05 PM UTC+1, Reza Samee wrote:

 Hello to all!

 Note: I'm new to ELK :)

 I'm using elasticsearch 1.4.0 and I'm trying to enable the http.cors feature
 in elasticsearch. When I set http.cors.enabled: true and
 http.cors.allow-origin: * in the config file and then restart, the
 http.cors feature is still not enabled and I can't use Kibana any more.
 What's wrong with my config file?

 elasticsearch.conf:

 http.cors.enabled: true
 http.cors.allow-origin: *



-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/a6aa7e2b-5809-4d42-8dc5-3fdfc7dd8547%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: How can we achieve an equivalent of this SQL a query in Elasticsearch?

2015-01-15 Thread Mark Harwood
Sorted query?

GET /myIndex/_search
{
  "query": { "match_all": {} },
  "fields": ["myDate", "col1"],
  "sort": [
    {
      "myDate": {
        "order": "desc"
      }
    }
  ]
}


On Thursday, January 15, 2015 at 1:05:22 PM UTC, Lokesh Gupta wrote:

 Thanks.. Any other creative solutions?

 On Thursday, January 15, 2015 at 1:54:10 PM UTC+5:30, David Pilato wrote:

 I think you need to run two queries for now. One is an aggregation (max). 
 The other one use the result of this aggregation to search for documents.

 My 2 cents

 -- 
 *David Pilato* | *Technical Advocate* | *Elasticsearch.com 
 http://Elasticsearch.com*
 @dadoonet https://twitter.com/dadoonet | @elasticsearchfr 
 https://twitter.com/elasticsearchfr | @scrutmydocs 
 https://twitter.com/scrutmydocs


  
 On 15 Jan 2015, at 09:13, Lokesh Gupta lgu...@gmail.com wrote:

 What will be equivalent of the following query in the Elasticsearch 
 world..

 select myDate, col1, col2 from myTable
 where myDate = (select max(myDate) from myTable)





-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/8dffc8cf-8dee-4584-8fac-119482ea0831%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: How can we achieve an equivalent of this SQL a query in Elasticsearch?

2015-01-15 Thread Lokesh Gupta
Thanks for the suggestion. Sorted query would work if I am okay with 
getting data for dates other than the max(date). But in the use case I have 
I need to restrict the results to be only for max(date).

Is there a way to chain the output of a query as an input to another query?
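
For reference, the two-query approach David Pilato describes in the quoted message below could be chained on the client side roughly as in this Python sketch. It only builds the request bodies; sending them is left out. It assumes a 1.x cluster where the filtered query is available, uses the field names from the SQL example, and the response shown is a made-up stand-in for what the max aggregation would return.

```python
def max_date_request():
    """Step 1: ask only for the maximum myDate and return no hits."""
    return {
        "size": 0,
        "aggs": {"max_date": {"max": {"field": "myDate"}}},
    }


def latest_docs_request(agg_response):
    """Step 2: restrict the search to documents whose myDate equals the max."""
    # The max aggregation reports a number; for a date field this is assumed
    # to be epoch milliseconds, so it is converted to an int here.
    max_value = int(agg_response["aggregations"]["max_date"]["value"])
    return {
        "query": {
            "filtered": {
                "filter": {
                    "range": {"myDate": {"gte": max_value, "lte": max_value}}
                }
            }
        },
        "fields": ["myDate", "col1", "col2"],
    }


# Hypothetical response to the first request.
fake_response = {"aggregations": {"max_date": {"value": 1421280000000.0}}}

first = max_date_request()
second = latest_docs_request(fake_response)
print(second["query"]["filtered"]["filter"]["range"]["myDate"])
```

The chaining happens in the application: the value read from the first response is substituted into the second request.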

On Thursday, January 15, 2015 at 7:10:51 PM UTC+5:30, Mark Harwood wrote:

 Sorted query?

 GET /myIndex/_search
 {
 query:{match_all: {}}, 
 fields:[myDate,col1],
 sort: [
{
   myDate: {
  order: desc
   }
}
 ]
 }


 On Thursday, January 15, 2015 at 1:05:22 PM UTC, Lokesh Gupta wrote:

 Thanks.. Any other creative solutions?

 On Thursday, January 15, 2015 at 1:54:10 PM UTC+5:30, David Pilato wrote:

 I think you need to run two queries for now. One is an aggregation 
 (max). The other one use the result of this aggregation to search for 
 documents.

 My 2 cents

 -- 
 *David Pilato* | *Technical Advocate* | *Elasticsearch.com 
 http://Elasticsearch.com*
 @dadoonet https://twitter.com/dadoonet | @elasticsearchfr 
 https://twitter.com/elasticsearchfr | @scrutmydocs 
 https://twitter.com/scrutmydocs


  
 On 15 Jan 2015, at 09:13, Lokesh Gupta lgu...@gmail.com wrote:

 What will be equivalent of the following query in the Elasticsearch 
 world..

 select myDate, col1, col2 from myTable
 where myDate = (select max(myDate) from myTable)





-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/8973324f-32fc-4b90-b549-df014808d729%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Suggestion for an ElasticSearch plugin that forwards documents at indexing time

2015-01-15 Thread Stefano Ruggiero
Hi all,

I would like to know if anyone has played around with an Elasticsearch plugin
that can forward documents at indexing time to an external system. I don't
want to do it through Logstash, only at the moment a doc is indexed.
My goal is to use that plugin as an example for my own custom one. I would like
a plugin that receives a copy of each document that is indexed, so we
can manipulate it in real time and then send it to an external database or
interface.

thanks for all the suggestions

regards

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/144e4f63-d686-4bb2-aded-cc9a77c28971%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


need help for search

2015-01-15 Thread Thibaut Owczarz
Hello,
I am starting to learn Elasticsearch, and I have trouble understanding how to
search. Could anyone help me?

The gist with my whole structure and my data is here:
https://gist.github.com/thibaut1001/7a3000c3ff371be3a52d

My problem is just in part 4:
searching multiple fields for a value, like this

## We need to search for "henry" in the selected fields
curl -XPOST 'http://localhost:9200/test_fr/user/_search' -d '{
  "query": {
    "bool": {
      "must": [],
      "must_not": [],
      "should": [
        { "term": { "user.sku": "henry" } },
        { "term": { "user.internal_code": "henry" } },
        { "term": { "user.firstname": "henry" } },
        { "term": { "user.lastname": "henry" } },
        { "term": { "user.address": "henry" } },
        { "term": { "user.city": "henry" } },
        { "term": { "user.localized_description": "henry" } },
        { "term": { "user.localized_keywords": "henry" } },
        { "term": { "user.service.localized_label": "henry" } },
        { "term": { "user.medias.localized_label": "henry" } },
        { "term": { "user.services.localized_label": "henry" } }
      ]
    }
  }
}';

## Returns no results. Why?

I have many questions.
Could you help me, please?
Thanks

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/c32551bd-cd04-4227-b783-40ca556928f7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Excluding Terms Using a Minus Sign

2015-01-15 Thread David Pilato
Yes simple query string query supports this.
See 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html#query-dsl-simple-query-string-query

David
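
A minimal sketch of such a request body, built as a Python dict. The field names are hypothetical, and setting default_operator to "and" is an assumption: as far as I understand, with the default "or" a negated term is merely optional rather than excluded.

```python
def minus_sign_search(user_input, fields):
    """Build a simple_query_string search body from raw user input.

    A leading minus on a term, as in "louvre -museum", negates that term.
    """
    return {
        "query": {
            "simple_query_string": {
                "query": user_input,
                "fields": fields,
                # Assumption: require every clause so "-museum" excludes.
                "default_operator": "and",
            }
        }
    }


# "title" and "description" are made-up field names for the example.
body = minus_sign_search("louvre -museum", ["title", "description"])
print(body)
```

The user's text is passed through unchanged, so no client-side parsing of the minus sign is needed.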

 On 15 Jan 2015, at 20:37, Cindy Conway cindyanncon...@gmail.com wrote:
 
 Is there a way to exclude a term if the user precedes it with a minus sign,
 the way Google does? For example, if I want to search for the word louvre but
 I don't want the museum in France, I can search for
 louvre -museum as my search terms. Does ES support this? I am not finding
 anything like that in the documentation.
 
 Thanks All!

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/483A7A3D-24E0-4292-B156-55DD3874AEA5%40pilato.fr.
For more options, visit https://groups.google.com/d/optout.


Complex search

2015-01-15 Thread Serge Schumacher
Hi,
I'm looking to create search behaviour like Amazon's.

I have an index with 3 fields: Title, Description and Category.

I want to search the Title and Description fields for the word *car* 
and get scored results like this:

car      -- score: 1 in category vehicles
autocar  -- score: 0.5 in category vehicles, with the part "car" highlighted, 
e.g. auto*car*
carradio -- score: 0.5 in category vehicles, with the part "car" highlighted, 
e.g. *car*radio

If the word is found in the Title field, the score should be higher than if 
it were found only in the Description field.

Can anybody help me with this, or at least point me in the right direction?

Thanks,
Serge
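
One possible starting point for the behaviour described above, sketched in Python: a multi_match query that boosts the title field over the description field, plus a highlight section that wraps matched text in asterisks. The field names follow the message; the boost value of 2 is an arbitrary assumption. Matching "car" inside "autocar" or "carradio" would additionally depend on how the fields are analysed (for example with n-grams), which this sketch does not cover.

```python
import json

def boosted_search(text, title_boost=2):
    """Search title and description, weighting title matches more heavily."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                # "title^2" multiplies the score contribution of title matches.
                "fields": ["title^%d" % title_boost, "description"],
            }
        },
        "highlight": {
            # Wrap highlighted fragments in asterisks, as in auto*car*.
            "pre_tags": ["*"],
            "post_tags": ["*"],
            "fields": {"title": {}, "description": {}},
        },
    }

body = boosted_search("car")
print(json.dumps(body, indent=2))
```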



Re: Is ElasticSearch truly scalable for analytics?

2015-01-15 Thread Nils Dijk
Adding a 'node reduce phase' to aggregations is something I'm very 
interested in, and also investigating for the project I'm currently working 
on.

If you introduce an extra reduction phase (for multiple shards on the same 
node) you introduce further potential for inaccuracies in the final 
results.

This is true if you only reduce the top-k items per shard, but I was 
thinking to reduce the complete set of buckets locally. This takes a bit 
more cpu, and memory, but my guess is that this is negligible compared to 
the work already being done by the aggregation framework. If you reduce the 
buckets on the node before sending it to the coordinator it will actually 
increase the accuracy for aggregations!
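
The accuracy argument above can be illustrated with a small single-process simulation in Python. It is only a sketch: shards are plain dictionaries of term counts, the function names are made up for this example, and nothing here reflects how Elasticsearch implements its reduce phase. It compares sending a top-k list per shard with first merging the complete bucket sets of all shards on a node and then sending one top-k list per node.

```python
from collections import Counter

def top_k(counts, k):
    """Top-k (term, count) pairs, ties broken by term name."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

def per_shard_strategy(nodes, k):
    """Every shard sends its own top-k list to the coordinator."""
    merged = Counter()
    for shards in nodes:
        for shard in shards:
            merged.update(dict(top_k(shard, k)))
    return top_k(merged, k)

def node_reduce_strategy(nodes, k):
    """Each node merges the complete buckets of its shards, then sends top-k."""
    merged = Counter()
    for shards in nodes:
        node_total = Counter()
        for shard in shards:
            node_total.update(shard)  # complete reduce, nothing truncated
        merged.update(dict(top_k(node_total, k)))
    return top_k(merged, k)

# Two nodes: the first holds two shards, the second holds one.
nodes = [
    [{"x": 5, "z": 4}, {"y": 5, "z": 4}],
    [{"x": 1, "z": 2}],
]
per_shard = per_shard_strategy(nodes, 1)
node_reduced = node_reduce_strategy(nodes, 1)
print(per_shard, node_reduced)
```

In this data set the true top term is z with a total of 10. The per-shard strategy misses it because z is never the top term on any single shard of the first node, while the node-level reduce recovers it and sends fewer responses to the coordinator.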

how many of these sorts of use cases generate sufficiently large trees of 
results where a node-level merging would be beneficial

It is primarily beneficial for bigger installations with lots of shards per 
machine. Say 40 machines with ~100 shards per machine. In the current 
strategy where every node is sending 100 results there is a lot of 
bandwidth used on the coordinating node, since it receives 4000 responses, 
while it could do with 40 responses (1 per machine).

I acknowledge it is a highly specialised use-case which not very many 
people run into, but it is a case I'm currently working on.

How hard would it to be to implement such a feature?

I have been looking into this, and it is not trivial. It needs to be 
implemented in or around the SearchService, which is where I found the 
different search strategies (e.g. DFS) implemented. Unlike the rest of 
Elasticsearch, it does not seem to consist of modules that implement the 
different search strategies.

Regarding the accuracy of top-k lists: I think the above, both the 'node 
reduce phase' and making the search strategy pluggable, will be the 
groundwork for implementing TJA or TPUT strategies, as 
discussed in an old issue[1] about the accuracy of facets.

The order of steps to take before reaching the ultimate goal would be:
1) Make search strategies (eg. query then fetch, dfs query then fetch) more 
modularized.
2) Make a search strategy with a 'node reduce phase' for the aggregations. 
Start with a complete reduce on the node. If that takes too much memory/time 
you can use TJA or TPUT locally on the node to get a reliable top-k list.
3a) Make a search strategy that executes TJA on the cluster coordinated by 
the coordinating node
3b) Make a separate strategy that executes TPUT on the cluster coordinated 
by the coordinating node

I would say that 3a and 3b are 'easy' if doing a complete reduce in step 2 
does not consume too many resources.

Adding strategies for both TJA and TPUT gives the user ultimate control: 
TPUT is not suited for reliably sorting on sums where the field might 
contain a negative value, but it has better latency than TJA.
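
For readers unfamiliar with TPUT, the following Python sketch simulates its three phases in a single process, assuming all counts are non-negative (which is why negative values are a problem for it). It reflects my understanding of the published algorithm rather than any Elasticsearch code; the function names and the sample data are made up for illustration.

```python
import heapq
from collections import Counter

def kth_largest(values, k):
    """k-th largest value, or 0 when fewer than k values exist."""
    ordered = sorted(values, reverse=True)
    return ordered[k - 1] if len(ordered) >= k else 0

def tput_top_k(nodes, k):
    """Exact top-k totals over nodes (dicts of term -> non-negative count)."""
    m = len(nodes)
    partial = Counter()                 # lower bounds on the true totals
    reported = [set() for _ in nodes]   # terms each node has already sent

    # Phase 1: every node sends its local top-k.
    for i, node in enumerate(nodes):
        for term, count in heapq.nlargest(k, node.items(), key=lambda kv: kv[1]):
            partial[term] += count
            reported[i].add(term)
    threshold = kth_largest(partial.values(), k) / m

    # Phase 2: every node sends all remaining terms at or above the threshold.
    for i, node in enumerate(nodes):
        for term, count in node.items():
            if count >= threshold and term not in reported[i]:
                partial[term] += count
                reported[i].add(term)
    tau2 = kth_largest(partial.values(), k)

    # Prune: an unreported count on a node is below the threshold, so
    # known + missing * threshold is an upper bound on the true total.
    candidates = []
    for term, known in partial.items():
        missing = sum(1 for seen in reported if term not in seen)
        if known + missing * threshold >= tau2:
            candidates.append(term)

    # Phase 3: fetch exact totals for the surviving candidates only.
    exact = {t: sum(node.get(t, 0) for node in nodes) for t in candidates}
    return sorted(exact.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

nodes = [
    {"a": 10, "b": 8, "c": 1, "d": 5},
    {"a": 1, "b": 2, "c": 9, "d": 5},
    {"a": 2, "b": 1, "c": 8, "d": 5},
]
result = tput_top_k(nodes, 2)
print(result)
```

In this sample, term d is in nobody's local top-2 on the first node, so a plain merge of local top-2 lists undercounts it; the second phase picks it up because its count there exceeds the threshold.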

I would love to get an opinion from Adrien concerning the feasibility of 
such an approach.

-- Nils

[1] https://github.com/elasticsearch/elasticsearch/issues/1305

On Wednesday, January 14, 2015 at 7:47:07 PM UTC+1, Elliott Bradshaw wrote:

 How hard would it to be to implement such a feature?  Even if there are 
 only a handful of use cases, it could prove very helpful in these.  
 Particularly since very large trees are the ones that will struggle the 
 most with bandwidth issues.


 On Wednesday, January 14, 2015 at 1:36:53 PM UTC-5, Mark Harwood wrote:

 Understood, but what about cases where size is set to unlimited?  
 Inaccuracies are not a concern in that case, correct?


 Correct. But if we only consider the scenarios where the key sets are 
 complete and accuracy is not put at risk by merging (i.e. there is no top 
 N type filtering in play), how many of these sorts of use cases generate 
 sufficiently large trees of results where a node-level merging would be 
 beneficial? 
  


 On Wednesday, January 14, 2015 at 1:09:48 PM UTC-5, Mark Harwood wrote:

 If you introduce an extra reduction phase (for multiple shards on the 
 same node) you introduce further potential for inaccuracies in the final 
 results.
 Consider the role of 'size' and 'shard_size' in the terms aggregation 
 [1] and the effects they have on accuracy. You'd arguably need a 
 'node_size' setting to also control the size of this new intermediate 
 collection. All stages that reduce the volumes of data processed can 
 introduce an approximation with the potential for inaccuracies upstream 
 when merging.


 [1] 
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_shard_size
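
 To make the two settings mentioned above concrete, here is a small Python sketch that builds a terms aggregation body with both size and shard_size. The aggregation name "top_terms", the field name "tag" and the numbers are hypothetical; the proposed node_size setting does not exist and is deliberately left out.

```python
import json

def terms_agg(field, size, shard_size):
    """Terms aggregation: each shard returns shard_size buckets, the final result keeps size."""
    return {
        "aggs": {
            "top_terms": {  # hypothetical aggregation name
                "terms": {
                    "field": field,
                    "size": size,
                    # Asking each shard for more buckets than the final size
                    # reduces, but does not eliminate, merge inaccuracies.
                    "shard_size": shard_size,
                }
            }
        }
    }

agg_body = terms_agg("tag", 10, 100)
print(json.dumps(agg_body, indent=2))
```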

 On Wednesday, January 14, 2015 at 5:44:47 PM UTC, Elliott Bradshaw 
 wrote:

 Adrien,

 I get the feeling that you're a pretty heavy contributor to the 
 aggregation module.  In your experience, would a shard per cpu core 
 strategy be an effective performance solution in a pure aggregation use 
 case?If this could proportionally 

Re: Questions about scaling elasticsearch with regard to the number of documents indexed per second

2015-01-15 Thread David Pilato
I can index on my laptop 1-12000 docs per second. SSD drives of course.
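
Rates of this order generally rely on sending documents in batches through the bulk API rather than one request per document. The Python sketch below builds a bulk request body (one action line followed by one source line per document) and works out the raw payload rate for the figures mentioned in this thread. The index name "tweets" and type name "status" are hypothetical, and the sample documents are invented.

```python
import json

def bulk_body(docs, index, doc_type):
    """Newline-delimited body for the _bulk endpoint; must end with a newline."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

def payload_mb_per_second(docs_per_second, doc_size_kb):
    """Raw document payload in MB/s, ignoring bulk action lines and HTTP overhead."""
    return docs_per_second * doc_size_kb / 1024.0

docs = [{"text": "hello", "followers": 10}, {"text": "world", "followers": 20}]
body = bulk_body(docs, "tweets", "status")
rate = payload_mb_per_second(10000, 2)
print(body)
print(rate)
```

At 10,000 documents of 2 KB each, the raw payload is just under 20 MB per second before any replication or indexing overhead.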

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

 On 15 Jan 2015, at 13:43, Chinch Pokli cpo...@gmail.com wrote:
 
 No, the whole point was: will Elasticsearch be able to index, say, 
 10,000 documents per second? If yes, I can simply hook up my Twitter code to 
 ES. If not, I need to think about how to make that happen.
 Typically I've seen ES index only around 30 docs per second, which is pretty 
 low.
 
 I am hoping Redis, Kafka, Logstash, etc. might give Elasticsearch some 
 breathing room and enable it to index up to 10K docs per second.
 
 On Thursday, January 15, 2015 at 10:47:31 AM UTC+5:30, David Pilato wrote:
 You have a Twitter input so you can extract content from Twitter and send to 
 elasticsearch. No need to have Redis here. 
 
 --
 David ;-)
 Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
 
 On 15 Jan 2015, at 00:02, Chinch Pokli cpo...@gmail.com wrote:
 
 Thanks. I'll have a look at the raw option.
 Regarding Logstash, I don't fully understand its utility. It says that it 
 can take messages from a Redis server. But if I have to set up Redis, I 
 could simply use the Redis river to index into Elasticsearch. Is there any 
 additional benefit that Logstash would give me?
 
 On Thursday, January 15, 2015 at 4:06:12 AM UTC+5:30, David Pilato wrote:
 You should look at raw option or better look at Logstash.
 
 My 2 cents.
 
 David
 
 On 14 Jan 2015, at 23:29, Chinch Pokli cpo...@gmail.com wrote:
 
 Hi,
 
 I am using Elasticsearch to index a Twitter stream. Until recently I was 
 using the official river, which was working great, but I realized that it is 
 throwing out much of the data (e.g. it is not storing the number of 
 followers).
 
 Is there a way to make the river store all the data? If not, I am fine 
 with writing streaming code that will stream and index. But I have a 
 concern: how many documents can Elasticsearch index per second? I might 
 eventually need to index almost 10,000 documents (each document = 2 KB) 
 per second (the current requirement is 100 documents per second). Is this 
 even feasible? If yes, do I need to make any special modifications?
 
 Thanks in advance!


Re: need help for search

2015-01-15 Thread Thibaut Owczarz
I found my first error: there is no need for the user. prefix, because I am 
already searching in the user type.
But why, when I search for a specific sku, does it not return only one result?


curl -XPOST 'http://localhost:9200/test_fr/user/_search' -d '{
  "query": {
    "bool": {
      "must": [],
      "must_not": [],
      "should": [
        { "term": { "sku": "01b3ae496c0142f993cf131c607fe003" } },
        { "term": { "internal_code": "01b3ae496c0142f993cf131c607fe003" } },
        { "match": { "firstname": "01b3ae496c0142f993cf131c607fe003" } },
        { "match": { "lastname": "01b3ae496c0142f993cf131c607fe003" } },
        { "match": { "address": "01b3ae496c0142f993cf131c607fe003" } },
        { "match": { "city": "01b3ae496c0142f993cf131c607fe003" } },
        { "match": { "localized_description": "01b3ae496c0142f993cf131c607fe003" } },
        { "match": { "localized_keywords": "01b3ae496c0142f993cf131c607fe003" } },
        { "match": { "service.localized_label": "01b3ae496c0142f993cf131c607fe003" } },
        { "match": { "medias.localized_label": "01b3ae496c0142f993cf131c607fe003" } },
        { "match": { "services.localized_label": "01b3ae496c0142f993cf131c607fe003" } }
      ]
    }
  }
}';

It returns all my users.
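
A request of this shape is easier to keep consistent when it is generated rather than typed by hand. The Python sketch below builds the same kind of bool query from two field lists, using term for fields treated as exact values and match for analysed text. It is not a diagnosis of the result above; the split between the two lists follows the curl request, and making minimum_should_match explicit is an assumption added for clarity.

```python
import json

def any_field_query(text, exact_fields, analyzed_fields):
    """bool query that matches when at least one field clause matches."""
    should = [{"term": {field: text}} for field in exact_fields]
    should += [{"match": {field: text}} for field in analyzed_fields]
    return {
        "query": {
            "bool": {
                "should": should,
                # Assumption: state explicitly that one clause must match.
                "minimum_should_match": 1,
            }
        }
    }

query = any_field_query(
    "01b3ae496c0142f993cf131c607fe003",
    ["sku", "internal_code"],
    ["firstname", "lastname", "address", "city"],
)
print(json.dumps(query, indent=2))
```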

Thanks

On Thursday, January 15, 2015 at 14:58:16 UTC+1, Thibaut Owczarz wrote:

 Hello,
 I have started learning Elasticsearch, and I have trouble understanding how 
 to search. Could anyone help me? 

 My gist with my whole structure and my data is here:
 https://gist.github.com/thibaut1001/7a3000c3ff371be3a52d

 My problem is in part 4: searching several fields for a value, like this:


 ## We need to search for henry in the selected fields
 curl -XPOST 'http://localhost:9200/test_fr/user/_search' -d '{
   "query": {
     "bool": {
       "must": [],
       "must_not": [],
       "should": [
         { "term": { "user.sku": "henry" } },
         { "term": { "user.internal_code": "henry" } },
         { "term": { "user.firstname": "henry" } },
         { "term": { "user.lastname": "henry" } },
         { "term": { "user.address": "henry" } },
         { "term": { "user.city": "henry" } },
         { "term": { "user.localized_description": "henry" } },
         { "term": { "user.localized_keywords": "henry" } },
         { "term": { "user.service.localized_label": "henry" } },
         { "term": { "user.medias.localized_label": "henry" } },
         { "term": { "user.services.localized_label": "henry" } }
       ]
     }
   }
 }';

 ## Returns no results. Why?

 I have many questions.
 Could you help me, please?
 Thanks




Seeking a Director of Data Engineering in Austin TX

2015-01-15 Thread Traci Martin
Hello All!

I am a recruiter in Austin, TX, trying to fill a Director of Data 
Engineering position for my client, also in Austin. They are ELK stack 
evangelists and would prefer someone with at least some knowledge of Lucene 
or Hadoop. This is really a great company to work for and probably the 
nicest client I have had the pleasure of working with. 

It is a permanent position offering great benefits, a laid back atmosphere, 
and very competitive salary with options. If you are interested please feel 
free to contact me. 

*There will be no re-lo provided and no sponsorship at this time.

Traci Martin 
512-640-3656
tmar...@intersysconsulting.com

 *Director **Data Engineering*
 
*Who we are: *
*Intersys Consulting* is a leading Business Intelligence, Data Management, 
and Application Development professional services organization focused on 
providing solutions with real business value.  We provide a 
customer-focused approach to building authentic partnerships with our 
clients with objective counsel from concept to deployment for a consistent 
voice through the dynamic IT environment.

*What we look for: *
*Intersys Consulting *is focused on finding and cultivating talent across 
the IT space.  We have over 100 developers, project managers, business 
analysts, and data management professionals, most with over ten years of 
experience in their respective fields.  In new hires we look for 
authenticity; be proud of who you are and what you bring to the table, as 
well as those candidates who consistently deliver the highest quality 
product and have a deep desire to improve not just themselves, but the 
organization as a whole. 
 
*The Position:*
Intersys Consulting is seeking a Director of Data Engineering  to work at 
our client site in Austin, Texas.
 
*Primary Responsibilities:*
   
   - Build and optimize each component of our data pipeline
   - Work with our data scientists to provide data in the optimal format
   - Work with our DevOps team to ensure the data infrastructure is 
   reliable and scalable
   - Integrate with our data partners to enrich our first-party data with 
   third-party sources
   - Stay on top of cutting edge technologies to constantly improve and 
   streamline our data systems

*Qualifications:*
   
   - Experience with high performance, high traffic web systems
   - Experience with monitoring systems: New Relic, ELK stack, etc.
   - Experience with either Hadoop or Elasticsearch/Lucene and a desire and 
   willingness to learn the other



Need help for Custom Score On Array Fields

2015-01-15 Thread Sang Dang
Hi All,

Currently I have an array field, and I need to calculate a score based on the 
number of terms matched by a terms filter.
For example:

Here is my mapping:

{
  "tweet" : {
    "properties" : {
      "tags" : {"type" : "string", "index_name" : "tag"}
    }
  }
}

My data will be indexed like this:

{
  "tweet" : {
    "tags" : ["USA", "VN", "GM"]
  }
}

So if I query with a terms filter for USA and GM, my score will be 2/3 
(that is, the number of matched terms divided by the length of the tags 
array). (Actually the score will be calculated with a more complex formula, 
but I just want to focus on the problem.)
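
The arithmetic being asked for can be written down in a few lines of plain Python. This is only an illustration of the formula (matched terms divided by the number of tags on the document), not an Elasticsearch script; the function name is made up for this example.

```python
def tag_match_score(query_tags, doc_tags):
    """Fraction of a document's tags that appear in the query's tag list."""
    if not doc_tags:
        return 0.0  # avoid dividing by zero for documents without tags
    matched = set(query_tags) & set(doc_tags)
    return len(matched) / len(doc_tags)

score = tag_match_score(["USA", "GM"], ["USA", "VN", "GM"])
print(score)
```

For the example document, two of the three tags match, so the score is 2/3.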

Thanks in advance. 





Re: Grandchild is not getting fetched by parent id

2015-01-15 Thread Masaru Hasegawa
Hi Iv,

You’d need to specify both parent and routing when you index grand children.
See 
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/grandparents.html
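
As a sketch of what that looks like for the records in this thread, the Python below builds the index URL for a grandchild document with both parameters set: parent is the direct parent's ID and routing is the grandparent's ID, so that all three generations land on the same shard. The helper name and the base URL are assumptions for illustration.

```python
from urllib.parse import urlencode

def grandchild_index_url(base, index, doc_type, doc_id, parent_id, grandparent_id):
    """URL for indexing a grandchild: parent = direct parent, routing = grandparent."""
    params = urlencode({"parent": parent_id, "routing": grandparent_id})
    return "%s/%s/%s/%s?%s" % (base, index, doc_type, doc_id, params)

url = grandchild_index_url(
    "http://localhost:9200", "the_index", "address", "smithshouse",
    parent_id="john", grandparent_id="mrsmith",
)
print(url)
```

The same routing value would then also be used when searching or fetching the address document.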


Masaru

On January 15, 2015 at 20:44:43, Iv Igi (sayon...@gmail.com) wrote:
 I am experiencing an issue while trying to retrieve a grandchild record by
 its parent ID. (child-grandchild relationship)
 The amount of hits in result is always zero.
 Also the same request is working fine for parent-child relationship.
  
 My records are getting organized kinda like this:
  
 Account --(one to one)-- User --(one to one)-- Address
  
 My execution environment is:
 - Fedora 21 CE
 - openjdk 1.8.0_25
 - ES 1.4.2
  
 Here is a script that is showing the problem
  
 # index creation
 curl -XPUT localhost:9200/the_index/ -d "{
   \"mappings\": {
     \"account\" : {},
     \"user\" : {
       \"_parent\" : {
         \"type\" : \"account\"
       }
     },
     \"address\" : {
       \"_parent\" : {
         \"type\" : \"user\"
       }
     }
   }
 }";
 
 # mrsmith account creation
 curl -XPUT localhost:9200/the_index/account/mrsmith -d "{
   \"foo\" : \"foo\"
 }";
 
 # john user creation
 curl -XPUT localhost:9200/the_index/user/john?parent=mrsmith -d "{
   \"bar\" : \"bar\"
 }";
 
 # smithshouse address creation
 curl -XPUT localhost:9200/the_index/address/smithshouse?parent=john -d "{
   \"baz\" : \"baz\"
 }";
 
 # Here I am trying to retrieve a record. Getting zero hits.
 curl -XGET localhost:9200/the_index/address/_search?pretty -d "{
   \"query\" : { \"bool\" : { \"must\" : { \"term\" : { \"_parent\" : \"john\" } } } }
 }";
 
 # Another approach with has_parent query type. Still getting zero hits.
 curl -XGET localhost:9200/the_index/address/_search?pretty -d "{
   \"query\" : {
     \"has_parent\" : {
       \"parent_type\" : \"user\",
       \"query\" : {
         \"term\" : {
           \"_id\" : \"john\"
         }
       }
     }
   }
 }";
 
 # OK, let's try a routed search. Nope
 curl -XGET 'localhost:9200/the_index/address/_search?routing=john&pretty' -d "{
   \"query\" : { \"bool\" : { \"must\" : { \"term\" : { \"_parent\" : \"john\" } } } }
 }";
 
 # Routed has_parent query. Same
 curl -XGET 'localhost:9200/the_index/address/_search?routing=john&pretty' -d "{
   \"query\" : {
     \"has_parent\" : {
       \"parent_type\" : \"user\",
       \"query\" : {
         \"term\" : {
           \"_id\" : \"john\"
         }
       }
     }
   }
 }";
 
 # Retrieving a record by itself. Going just fine.
 curl -XGET localhost:9200/the_index/address/smithshouse?parent=john;
 
 # Querying for user record with the same query. Got a hit.
 curl -XGET localhost:9200/the_index/user/_search?pretty -d "{
   \"query\" : { \"bool\" : { \"must\" : { \"term\" : { \"_parent\" : \"mrsmith\" } } } }
 }";
  
  
  
 The output:
  
 {acknowledged:true}
 {_index:the_index,_type:account,_id:mrsmith,_version:1,created:true}{_index:the_index,_type:user,_id:john,_version:1,created:true}{_index:the_index,_type:address,_id:smithshouse,_version:1,created:true}
   
 {
 took : 54,
 timed_out : false,
 _shards : {
 total : 5,
 successful : 5,
 failed : 0
 },
 hits : {
 total : 0,
 max_score : null,
 hits : [ ]
 }
 }
 {
 took : 221,
 timed_out : false,
 _shards : {
 total : 5,
 successful : 5,
 failed : 0
 },
 hits : {
 total : 0,
 max_score : null,
 hits : [ ]
 }
 }
 {
 took : 35,
 timed_out : false,
 _shards : {
 total : 1,
 successful : 1,
 failed : 0
 },
 hits : {
 total : 0,
 max_score : null,
 hits : [ ]
 }
 }
 {
 took : 481,
 timed_out : false,
 _shards : {
 total : 1,
 successful : 1,
 failed : 0
 },
 hits : {
 total : 0,
 max_score : null,
 hits : [ ]
 }
 }
 {_index:the_index,_type:address,_id:smithshouse,_version:1,found:true,_source:{
   
 baz : baz
 }}
 {
 took : 65,
 timed_out : false,
 _shards : {
 total : 5,
 successful : 5,
 failed : 0
 },
 hits : {
 total : 1,
 max_score : 1.0,
 hits : [ {
 _index : the_index,
 _type : user,
 _id : john,
 _score : 1.0,
 _source:{
 bar : bar
 }
 } ]
 }
 }
  
 You can see in the results that ES hit the required shard, but no records
 were fetched.
 I am probably doing it the wrong way; if so, please correct me.
  