Re: What tools to use for indexing ES in production?
Well, the sales messages are coming through Kafka. We need to extract some info from the database. We can do anything, really; I'm just not sure what common practice is here. There seem to be so many options. What kinds of questions am I not asking here?

On Monday, February 16, 2015 at 2:30:45 AM UTC-8, Mark Walkom wrote:
This depends on how the app is made and what options you have to extract data from it.

On 16 February 2015 at 20:28, Kevin Liu ke...@ticketfly.com wrote:
We want to index our sales records as they come in from our apps. We are using a Quartz job right now, which is really slow and not really real-time. We will be implementing a message bus soon for firing sales events. The process is: read from a message queue, grab some extra data from MySQL, do some ETL to construct the document, and index it into ES. I've been reading about Apache Spark, the ES river, and Logstash. My question is: what kinds of tools are right for the job here? Is Apache Spark overkill? Is DIY a better option here? What are you using? Please advise and point me to the right things to read.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearc...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/949b0c40-e8b1-4d40-bced-68e5c005c713%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
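The pipeline described above (read from a message queue, enrich from MySQL, ETL into a document, index into ES) is simple enough to sketch. The function and field names below are illustrative assumptions, not the poster's actual schema; in the real loop the consumer would read from Kafka and the finished documents would go to ES via the bulk API.

```python
# Sketch of the enrichment/ETL step of the pipeline, under assumed
# names: merge a raw sales event with extra columns fetched from MySQL.

def build_document(sale_event, extra_row):
    """Build the ES document from a raw event plus enrichment data."""
    doc = dict(sale_event)   # start from the raw message-queue event
    doc.update(extra_row)    # fold in the fields looked up in MySQL
    return doc

event = {"ticket_id": 42, "amount_cents": 2500}
extra = {"venue_name": "Example Hall"}
doc = build_document(event, extra)
# doc now carries both the event fields and the enrichment fields.
```

Keeping this step a pure function makes it easy to test independently of Kafka, MySQL, and ES, which matters when deciding between DIY and a framework like Spark.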
Re: Is it safe to change node names in an existing ElasticSearch cluster
Yes.

David

On 19 Feb 2015 at 08:56, Jan-Erik Westlund je.westl...@gmail.com wrote:
Correct, in that case it will not be a rolling upgrade ;-) The service will be down for a few minutes. Can I then change all the node names, and then start the services on all the nodes with the new names, without messing things up?

2015-02-19 7:58 GMT+01:00 David Pilato da...@pilato.fr:
You should define this in that case: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-gateway.html#recover-after
But it's not a rolling upgrade anymore, right? Your service will be down for some seconds/minutes, I guess.

On 19 Feb 2015 at 07:52, Jan-Erik Westlund je.westl...@gmail.com wrote:
I understand that, but is it safe to change all the node names and restart all the nodes at the same time?

On 19 Feb 2015 at 07:47, David Pilato da...@pilato.fr wrote:
You can safely change the elasticsearch.yml file while elasticsearch is running. This file is only loaded when elasticsearch starts.

On 19 Feb 2015 at 07:33, Jan-Erik Westlund je.westl...@gmail.com wrote:
Hi again! Thanks for the rolling restart info, that was really helpful. But since the elasticsearch.yml file is managed by Puppet, all the node names will change at pretty much the same time! So in my case it would be best to shut down the ES daemon on all nodes first, apply the Puppet changes, and then start the ES cluster again... Is it safe to do so? //Jan-Erik

On Wednesday, 18 February 2015 at 16:44:35 UTC+1, David Pilato wrote:
Have a look at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-shutdown.html#_rolling_restart_of_nodes_full_cluster_restart
-- David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs

On 18 Feb 2015 at 16:37, Jan-Erik Westlund je.we...@gmail.com wrote:
Thanks David! All my recovery throttling settings are at their defaults in the elasticsearch.yml file.
How do I disable allocation in a running production environment? Do I need to disable allocation first, restart each node/daemon, and rename the nodes afterwards? Or maybe it would be better to take the ES cluster (all 3 nodes) down during a maintenance window, change all the names, and then restart the ES cluster nodes again? //Jan-Erik

On Wednesday, 18 February 2015 at 16:18:42 UTC+1, David Pilato wrote:
Yes. It's safe. You can do it one node at a time. If you already have data around and don't want your shards moving during this, you should disable allocation.
-- David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs

On 18 Feb 2015 at 16:14, Jan-Erik Westlund je.we...@gmail.com wrote:
Hi! Is it safe to change the node names of my 3 nodes in an existing elasticsearch 1.4.0 cluster? The reason is to get rid of the random names like Elizabeth "Betsy" Braddock, Franz Kafka, etc... Is it just a matter of setting node.name: server name in elasticsearch.yml and then restarting the daemon? Do I do it one node at a time, or do I need to take down the cluster, change all the node names, and then bring up the cluster again? //Jan-Erik
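The "disable allocation" step David recommends goes through the cluster settings API. A minimal sketch of the request bodies, assuming the ES 1.x `cluster.routing.allocation.enable` setting (the `recover-after` settings from the linked gateway page are, by contrast, node-level elasticsearch.yml settings, not API calls):

```python
import json

# Bodies to PUT to /_cluster/settings around the rename/restart, e.g.
#   curl -XPUT localhost:9200/_cluster/settings -d "$body"
# "transient" means the flag clears on a full cluster restart.
disable_allocation = {"transient": {"cluster.routing.allocation.enable": "none"}}

# Re-enable once the renamed nodes have rejoined the cluster.
enable_allocation = {"transient": {"cluster.routing.allocation.enable": "all"}}

print(json.dumps(disable_allocation))
print(json.dumps(enable_allocation))
```

Disabling allocation before the restart keeps shards from being shuffled around while nodes come and go, which is exactly the concern raised in this thread.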
Re: Aggregations failing on fields with custom analyzer..
I don't know without a concrete example. I'd say that if your mapping has a number type and you send "123", it could work.
-- David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs

On 19 Feb 2015 at 09:30, Anil Karaka anilkar...@gmail.com wrote:
It was my mistake: the field I was trying to aggregate on was mapped as double; I assumed it was a string after seeing some sample documents with strings. Why didn't ES throw an error when I indexed docs with strings instead of doubles?

On Thursday, February 19, 2015 at 1:35:08 PM UTC+5:30, David Pilato wrote:
Did you apply your analyzer to your mapping?

On 19 Feb 2015 at 08:53, Anil Karaka anilk...@gmail.com wrote:
http://stackoverflow.com/questions/28601082/terms-aggregation-failing-on-string-fields-with-a-custom-analyzer-in-elasticsear
Posted on Stack Overflow as well.

On Thursday, February 19, 2015 at 1:01:40 PM UTC+5:30, Anil Karaka wrote:
I want a custom analyzer that behaves exactly like not_analyzed, except that fields are case-insensitive. I have my analyzer as below (keyword tokenizer plus lowercase filter, same as not_analyzed but case-insensitive):

"index": { "analysis": { "analyzer": { "case_insensitive_keyword_analyzer": { "tokenizer": "keyword", "filter": "lowercase" } } } }

But when I try a terms aggregation over a field with strings analyzed as above, I get this error:

{ "error": "ClassCastException[org.elasticsearch.search.aggregations.bucket.terms.DoubleTerms$Bucket cannot be cast to org.elasticsearch.search.aggregations.bucket.terms.StringTerms$Bucket]", "status": 500 }

Are there additional settings that I have to update in my custom analyzer for my terms aggregation to work?
The better question is: I want a custom analyzer that does everything like not_analyzed but is case-insensitive. How do I achieve that?
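Putting the pieces from this thread together: the analyzer settings above with their quoting restored, plus the step David asks about, i.e. actually applying the analyzer in the mapping. The `doc` type and `os` field names below are placeholders; this is a sketch of the index-creation body, not the poster's exact template.

```python
import json

# Index body combining the thread's analyzer (keyword tokenizer +
# lowercase filter: not_analyzed-like but case-insensitive) with a
# mapping that applies it to a string field.
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "case_insensitive_keyword_analyzer": {
                    "tokenizer": "keyword",
                    "filter": ["lowercase"],
                }
            }
        }
    },
    "mappings": {
        "doc": {  # placeholder type name
            "properties": {
                "os": {  # placeholder field name
                    "type": "string",
                    "analyzer": "case_insensitive_keyword_analyzer",
                }
            }
        }
    },
}
print(json.dumps(index_body, indent=2))
```

With this in place, "Windows" and "windows" index to the same single term, so a terms aggregation on the field buckets them together.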
Re: Aggregations failing on fields with custom analyzer..
It was my mistake: the field I was trying to aggregate on was mapped as double; I assumed it was a string after seeing some sample documents with strings. Why didn't ES throw an error when I indexed docs with strings instead of doubles?

On Thursday, February 19, 2015 at 1:35:08 PM UTC+5:30, David Pilato wrote:
Did you apply your analyzer to your mapping?

On 19 Feb 2015 at 08:53, Anil Karaka anilk...@gmail.com wrote:
http://stackoverflow.com/questions/28601082/terms-aggregation-failing-on-string-fields-with-a-custom-analyzer-in-elasticsear
Posted on Stack Overflow as well.

On Thursday, February 19, 2015 at 1:01:40 PM UTC+5:30, Anil Karaka wrote:
I want a custom analyzer that behaves exactly like not_analyzed, except that fields are case-insensitive. I have my analyzer as below (keyword tokenizer plus lowercase filter, same as not_analyzed but case-insensitive):

"index": { "analysis": { "analyzer": { "case_insensitive_keyword_analyzer": { "tokenizer": "keyword", "filter": "lowercase" } } } }

But when I try a terms aggregation over a field with strings analyzed as above, I get this error:

{ "error": "ClassCastException[org.elasticsearch.search.aggregations.bucket.terms.DoubleTerms$Bucket cannot be cast to org.elasticsearch.search.aggregations.bucket.terms.StringTerms$Bucket]", "status": 500 }

Are there additional settings that I have to update in my custom analyzer for my terms aggregation to work? The better question is: I want a custom analyzer that does everything like not_analyzed but is case-insensitive. How do I achieve that?
Re: ElasticSearch search performance question
Good stuff. You're seeing the benefits of not caching lots of single-use BitSets. Now if you swap queries in for your filters, you'll also see the benefits of not allocating multi-megabyte BitSets to hold what is typically a single set bit per query you run.

On Thursday, February 19, 2015 at 6:23:08 AM UTC, Jay Danielian wrote:
Just to update the thread: I added code to disable the cache on all the term filters we were using, and it made a huge performance improvement. Now we are able to service the queries with an average response time under two seconds, which is excellent (we are bundling several searches using _msearch, so 2 seconds total response is good). The search requests/sec metric still peaks at around 600/sec, but our CPU only spikes to about 65% now, so I think we can add more search threads to our config, as we are no longer maxing out CPU. I also see a bit of disk read activity now, which, against our non-RAID EBS drive, means we may be able to squeeze out more if we switch the disk setup. It seems like having these filters add cache items was wasting CPU on cache evictions and cache lookups (cache misses, really) for each query, which really only shows up when trying to push some load through. Thanks for everyone's suggestions!! J

On Friday, February 13, 2015 at 11:55:52 AM UTC-5, Jay Danielian wrote:
Thanks to all for these great suggestions. I haven't had a chance to change the syntax yet, as that is a risky thing for me to quickly change against our production setup. My plan is to try it this weekend (so I can properly test that the new syntax returns the same results). However, is there a way to turn filter caching off globally, via config or elsewhere? Thanks! J

On Friday, February 13, 2015 at 11:25:20 AM UTC-5, Mark Harwood wrote:
So I can see in the hot threads dump the initialization requests for those FixedBitSets I was talking about.
Looking at the number of docs in your index, I estimate each term to be allocating 140MB of memory in total for all these bitsets across all shards, given the 1bn docs in your index. Remember that you are probably setting only a single bit in each of these large structures. Another stat (if I read it correctly) shows 5m evictions of these cached filters, given their low reusability. It's fair to say you have some cache churn going on :) Did you try my earlier suggestion of queries, not filters?

On Friday, February 13, 2015 at 2:29:42 PM UTC, Jay Danielian wrote:
As requested, here is a dump of the hot threads output. Thanks! J

On Thursday, February 12, 2015 at 6:45:23 PM UTC-5, Nikolas Everett wrote:
You might want to try hitting hot threads while putting your load on it and seeing what you see. Or posting it. Nik

On Thu, Feb 12, 2015 at 4:44 PM, Jay Danielian jay.da...@circleback.com wrote:
Mark, thanks for the initial reply. Yes, your assumption about these things being very specific, and thus not likely to see any reuse from caching, is correct. I have attached some screenshots from the BigDesk plugin, which show a decent snapshot of what the server looked like while my tests were running. You can see the spikes in CPU that essentially covered the duration of the JMeter tests. At a high level, the only thing that seems to be really stressed on the server is CPU. But that makes me think there is something in my setup, query syntax, or perhaps the cache eviction rate, etc. that is causing it to spike so high. I also have concerns about the non-RAID 0 EBS volumes, as I know that having one large volume does not maximize throughput; however, just looking at the stats, it doesn't seem like IO is really a bottleneck. Here is a sample query structure: https://gist.github.com/jaydanielian/c2be885987f344031cfc Also, this is one query; in reality we use _msearch to pipeline several of these queries in one batch.
The queries also include custom routing (a route key) to make sure we only hit one shard. Thanks! J

On Thursday, February 12, 2015 at 4:22:29 PM UTC-5, Mark Walkom wrote:
It'd help if you could gist/pastebin/etc a query example. Also, your current ES and Java need updating; there are known issues with Java 1.7u55, and you will always see performance boosts running the latest version of ES. That aside, what is your current resource utilisation like? Are you seeing lots of cache evictions, high heap use, high CPU, IO delays?

On 13 February 2015 at 07:32, Jay Danielian jay.da...@circleback.com wrote:
I know this is difficult to answer; the real answer is always "It Depends" :) But I am going to go ahead and hope I get some feedback here. We are mainly using ES to issue terms searches against fields that are non-analyzed. We are using ES like a key-value store, where once the match is found we parse the _source JSON and return our model. We
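The fix that resolved this thread (disabling the cache on single-use term filters) can be expressed per filter in the ES 1.x query DSL with the `_cache` flag. The field and value below are placeholders, not the actual query from Jay's gist:

```python
# ES 1.x search body with a term filter whose caching is disabled.
# Mark's further suggestion is to use a term *query* instead of a
# filter, which avoids allocating a bitset at all.
search_body = {
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": {
                "term": {
                    "email": "user@example.com",  # placeholder field/value
                    "_cache": False,              # skip the filter cache
                }
            }
        }
    }
}
```

For highly selective, rarely repeated lookups like these, caching buys nothing and costs CPU in evictions and lookups, which matches the behaviour Jay measured under load.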
Re: Aggregations failing on fields with custom analyzer..
If you can provide a full working example as I did, we can try it and see what is wrong.
-- David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs

On 19 Feb 2015 at 10:01, Anil Karaka anilkar...@gmail.com wrote:
I'm getting this error as well using your PUT requests. It feels like I'm doing something wrong, but I don't know what exactly. I'm using this index template: https://gist.github.com/syllogismos/c2dde4f097fea149e1a0 I didn't specify a particular mapping for my index but reindexed from a previous index, and ended up with the mapping and documents that look like the above. Am I missing an obvious mistake? So lost right now.

On Thursday, February 19, 2015 at 2:23:10 PM UTC+5:30, David Pilato wrote:
I think you are doing something wrong.

DELETE index
PUT index
{ "mappings": { "doc": { "properties": { "foo": { "type": "double" } } } } }
PUT index/doc/1
{ "foo": "bar" }

gives:

{ "error": "MapperParsingException[failed to parse [foo]]; nested: NumberFormatException[For input string: \"bar\"];", "status": 400 }

-- David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs

On 19 Feb 2015 at 09:39, Anil Karaka anilk...@gmail.com wrote:

"_source": { "Sort": "", "gt": "2015-02-18T15:07:10", "uid": "54867dc55b482b04da7f23d8", "usId": "54867dc55b482b04da7f23d7", "ut": "2015-02-18T20:37:10", "act": "productlisting", "st": "2015-02-18T15:07:46", "Filter": "", "av": "3.0.0.0", "ViewType": "SmallSingleList", "os": "Windows", "categoryid": "home-kitchen-curtains-blinds" }

"properties": { "uid": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" }, "ViewType": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" }, "usId": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" }, "os": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" }, "Sort": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" }, "Filter": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" }, "categoryid": { "type": "double" }, "gt": { "format": "dateOptionalTime", "type": "date" }, "ut": { "format": "dateOptionalTime", "type": "date" }, "st": { "format": "dateOptionalTime", "type": "date" }, "act": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" }, "av": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" } }

A sample document and the index mappings are above.

On Thursday, February 19, 2015 at 2:03:11 PM UTC+5:30, David Pilato wrote:
I don't know without a concrete example. I'd say that if your mapping has a number type and you send "123", it could work.
-- David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs

On 19 Feb 2015 at 09:30, Anil Karaka anilk...@gmail.com wrote:
It was my mistake: the field I was trying to aggregate on was mapped as double; I assumed it was a string after seeing some sample documents with strings. Why didn't ES throw an error when I indexed docs with strings instead of doubles?

On Thursday, February 19, 2015 at 1:35:08 PM UTC+5:30, David Pilato wrote:
Did you apply your analyzer to your mapping?

On 19 Feb 2015 at 08:53, Anil Karaka anilk...@gmail.com wrote:
http://stackoverflow.com/questions/28601082/terms-aggregation-failing-on-string-fields-with-a-custom-analyzer-in-elasticsear
Posted on Stack Overflow as well.

On Thursday, February 19, 2015 at 1:01:40 PM UTC+5:30, Anil Karaka wrote:
I want a custom analyzer that behaves exactly like not_analyzed, except that fields are case-insensitive. I have my analyzer as below (keyword tokenizer plus lowercase filter, same as not_analyzed but case-insensitive):

"index": { "analysis": { "analyzer": { "case_insensitive_keyword_analyzer": { "tokenizer": "keyword", "filter": "lowercase" } } } }

But when I try a terms aggregation over a field with strings analyzed as above, I get this error:

{ "error": "ClassCastException[org.elasticsearch.search.aggregations.bucket.terms.DoubleTerms$Bucket cannot be cast to org.elasticsearch.search.aggregations.bucket.terms.StringTerms$Bucket]", "status": 500 }
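The root cause in this thread was a field whose mapped type (double) didn't match what the documents appeared to contain. A quick sanity check before aggregating is to fetch the mapping (GET index/_mapping) and look up the field's declared type; below, a small helper does that lookup against an excerpt of the properties shown in the thread:

```python
def declared_type(properties, field):
    """Return the type a field is mapped as, or None if unmapped."""
    return properties.get(field, {}).get("type")

# Excerpt of the "properties" block posted in this thread.
properties = {
    "os": {"analyzer": "case_insensitive_keyword_analyzer", "type": "string"},
    "categoryid": {"type": "double"},
}

declared_type(properties, "categoryid")
# -> "double": a terms aggregation on this field builds numeric
# (DoubleTerms) buckets, hence the ClassCastException when the data
# is actually strings.
```

Checking the live mapping rather than eyeballing sample documents would have surfaced the mismatch immediately.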
Re: Is it safe to change node names in an existing ElasticSearch cluster
Ok, thanks again.

2015-02-19 9:06 GMT+01:00 David Pilato da...@pilato.fr:
Yes.

David

On 19 Feb 2015 at 08:56, Jan-Erik Westlund je.westl...@gmail.com wrote:
Correct, in that case it will not be a rolling upgrade ;-) The service will be down for a few minutes. Can I then change all the node names, and then start the services on all the nodes with the new names, without messing things up?

2015-02-19 7:58 GMT+01:00 David Pilato da...@pilato.fr:
You should define this in that case: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-gateway.html#recover-after
But it's not a rolling upgrade anymore, right? Your service will be down for some seconds/minutes, I guess.

On 19 Feb 2015 at 07:52, Jan-Erik Westlund je.westl...@gmail.com wrote:
I understand that, but is it safe to change all the node names and restart all the nodes at the same time?

On 19 Feb 2015 at 07:47, David Pilato da...@pilato.fr wrote:
You can safely change the elasticsearch.yml file while elasticsearch is running. This file is only loaded when elasticsearch starts.

On 19 Feb 2015 at 07:33, Jan-Erik Westlund je.westl...@gmail.com wrote:
Hi again! Thanks for the rolling restart info, that was really helpful. But since the elasticsearch.yml file is managed by Puppet, all the node names will change at pretty much the same time! So in my case it would be best to shut down the ES daemon on all nodes first, apply the Puppet changes, and then start the ES cluster again... Is it safe to do so? //Jan-Erik

On Wednesday, 18 February 2015 at 16:44:35 UTC+1, David Pilato wrote:
Have a look at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-shutdown.html#_rolling_restart_of_nodes_full_cluster_restart
-- David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs

On 18 Feb 2015 at 16:37, Jan-Erik Westlund je.we...@gmail.com wrote:
Thanks David! All my recovery throttling settings are at their defaults in the elasticsearch.yml file. How do I disable allocation in a running production environment? Do I need to disable allocation first, restart each node/daemon, and rename the nodes afterwards? Or maybe it would be better to take the ES cluster (all 3 nodes) down during a maintenance window, change all the names, and then restart the ES cluster nodes again? //Jan-Erik

On Wednesday, 18 February 2015 at 16:18:42 UTC+1, David Pilato wrote:
Yes. It's safe. You can do it one node at a time. If you already have data around and don't want your shards moving during this, you should disable allocation.
-- David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs

On 18 Feb 2015 at 16:14, Jan-Erik Westlund je.we...@gmail.com wrote:
Hi! Is it safe to change the node names of my 3 nodes in an existing elasticsearch 1.4.0 cluster? The reason is to get rid of the random names like Elizabeth "Betsy" Braddock, Franz Kafka, etc... Is it just a matter of setting node.name: server name in elasticsearch.yml and then restarting the daemon? Do I do it one node at a time, or do I need to take down the cluster, change all the node names, and then bring up the cluster again? //Jan-Erik
Re: Aggregations failing on fields with custom analyzer..
"_source": { "Sort": "", "gt": "2015-02-18T15:07:10", "uid": "54867dc55b482b04da7f23d8", "usId": "54867dc55b482b04da7f23d7", "ut": "2015-02-18T20:37:10", "act": "productlisting", "st": "2015-02-18T15:07:46", "Filter": "", "av": "3.0.0.0", "ViewType": "SmallSingleList", "os": "Windows", "categoryid": "home-kitchen-curtains-blinds" }

"properties": { "uid": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" }, "ViewType": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" }, "usId": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" }, "os": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" }, "Sort": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" }, "Filter": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" }, "categoryid": { "type": "double" }, "gt": { "format": "dateOptionalTime", "type": "date" }, "ut": { "format": "dateOptionalTime", "type": "date" }, "st": { "format": "dateOptionalTime", "type": "date" }, "act": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" }, "av": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" } }

A sample document and the index mappings are above.

On Thursday, February 19, 2015 at 2:03:11 PM UTC+5:30, David Pilato wrote:
I don't know without a concrete example. I'd say that if your mapping has a number type and you send "123", it could work.
-- David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs

On 19 Feb 2015 at 09:30, Anil Karaka anilk...@gmail.com wrote:
It was my mistake: the field I was trying to aggregate on was mapped as double; I assumed it was a string after seeing some sample documents with strings. Why didn't ES throw an error when I indexed docs with strings instead of doubles?

On Thursday, February 19, 2015 at 1:35:08 PM UTC+5:30, David Pilato wrote:
Did you apply your analyzer to your mapping?

On 19 Feb 2015 at 08:53, Anil Karaka anilk...@gmail.com wrote:
http://stackoverflow.com/questions/28601082/terms-aggregation-failing-on-string-fields-with-a-custom-analyzer-in-elasticsear
Posted on Stack Overflow as well.

On Thursday, February 19, 2015 at 1:01:40 PM UTC+5:30, Anil Karaka wrote:
I want a custom analyzer that behaves exactly like not_analyzed, except that fields are case-insensitive. I have my analyzer as below (keyword tokenizer plus lowercase filter, same as not_analyzed but case-insensitive):

"index": { "analysis": { "analyzer": { "case_insensitive_keyword_analyzer": { "tokenizer": "keyword", "filter": "lowercase" } } } }

But when I try a terms aggregation over a field with strings analyzed as above, I get this error:

{ "error": "ClassCastException[org.elasticsearch.search.aggregations.bucket.terms.DoubleTerms$Bucket cannot be cast to org.elasticsearch.search.aggregations.bucket.terms.StringTerms$Bucket]", "status": 500 }

Are there additional settings that I have to update in my custom analyzer for my terms aggregation to work? The better question is: I want a custom analyzer that does everything like not_analyzed but is case-insensitive. How do I achieve that?
Re: Aggregations failing on fields with custom analyzer..
I'm getting this error as well using your PUT requests. It feels like I'm doing something wrong, but I don't know what exactly. I'm using this index template: https://gist.github.com/syllogismos/c2dde4f097fea149e1a0 I didn't specify a particular mapping for my index but reindexed from a previous index, and ended up with that mapping and documents that look like the above. Am I missing an obvious mistake? So lost right now.

On Thursday, February 19, 2015 at 2:23:10 PM UTC+5:30, David Pilato wrote: I think you are doing something wrong.

DELETE index

PUT index
{
  "mappings": {
    "doc": {
      "properties": {
        "foo": { "type": "double" }
      }
    }
  }
}

PUT index/doc/1
{
  "foo": "bar"
}

gives:

{ "error": "MapperParsingException[failed to parse [foo]]; nested: NumberFormatException[For input string: \"bar\"];", "status": 400 }

-- David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet https://twitter.com/dadoonet | @elasticsearchfr https://twitter.com/elasticsearchfr | @scrutmydocs https://twitter.com/scrutmydocs

On 19 Feb 2015 at 09:39, Anil Karaka anilk...@gmail.com wrote:

"_source": {
  "Sort": "", "gt": "2015-02-18T15:07:10", "uid": "54867dc55b482b04da7f23d8", "usId": "54867dc55b482b04da7f23d7",
  "ut": "2015-02-18T20:37:10", "act": "productlisting", "st": "2015-02-18T15:07:46", "Filter": "", "av": "3.0.0.0",
  "ViewType": "SmallSingleList", "os": "Windows", "categoryid": "home-kitchen-curtains-blinds"
}

"properties": {
  "uid": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" },
  "ViewType": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" },
  "usId": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" },
  "os": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" },
  "Sort": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" },
  "Filter": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" },
  "categoryid": { "type": "double" },
  "gt": { "format": "dateOptionalTime", "type": "date" },
  "ut": { "format": "dateOptionalTime", "type": "date" },
  "st": { "format": "dateOptionalTime", "type": "date" },
  "act": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" },
  "av": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" }
}

A sample document and the index mappings above.

On Thursday, February 19, 2015 at 2:03:11 PM UTC+5:30, David Pilato wrote: I don't know without a concrete example. I'd say that if your mapping has a number type and you send "123", it could work.

-- David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs

On 19 Feb 2015 at 09:30, Anil Karaka anilk...@gmail.com wrote: It was my mistake: the field I was trying to aggregate on was mapped as double; I assumed it was a string after seeing some sample documents containing strings. Why didn't ES throw an error when I indexed docs with strings instead of doubles?
On Thursday, February 19, 2015 at 1:35:08 PM UTC+5:30, David Pilato wrote: Did you apply your analyzer to your mapping? David
Re: Aggregations failing on fields with custom analyzer..
I understand what you are saying. I was able to recreate the same error you showed: I was not able to insert a string into your index whose mapping is double, but I am able to insert a string into my older index whose mapping is double. Very weird. But I don't know how you could recreate my case. I'm using this index template, https://gist.github.com/syllogismos/c2dde4f097fea149e1a0, and then reindexed from an older index; it took the mapping as double, yet has strings in the documents indexed later. Thanks for your help.

On Thursday, February 19, 2015 at 2:34:14 PM UTC+5:30, David Pilato wrote: If you can provide a full working example as I did, we can try it and see what is wrong. -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet https://twitter.com/dadoonet | @elasticsearchfr https://twitter.com/elasticsearchfr | @scrutmydocs https://twitter.com/scrutmydocs

On 19 Feb 2015 at 10:01, Anil Karaka anilk...@gmail.com wrote: I'm getting this error as well using your PUT requests. It feels like I'm doing something wrong, but I don't know what exactly. I'm using this index template: https://gist.github.com/syllogismos/c2dde4f097fea149e1a0 I didn't specify a particular mapping for my index but reindexed from a previous index, and ended up with that mapping and documents that look like the above. Am I missing an obvious mistake? So lost right now.

On Thursday, February 19, 2015 at 2:23:10 PM UTC+5:30, David Pilato wrote: I think you are doing something wrong.
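For reference, the root cause in this thread turned out to be a field mapped as double that received string data after a reindex; the analyzer itself was fine. A minimal sketch of applying a case-insensitive keyword analyzer correctly, per the ES 1.x create-index API (index, type, and field names are illustrative):

```
PUT /myindex
{
  "settings": {
    "analysis": {
      "analyzer": {
        "case_insensitive_keyword_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "os": {
          "type": "string",
          "analyzer": "case_insensitive_keyword_analyzer"
        }
      }
    }
  }
}
```

A terms aggregation on os then buckets values case-insensitively: the keyword tokenizer keeps each value as a single token and the lowercase filter normalizes it.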
Re: ES OOMing and not triggering cache circuit breakers, using LocalManualCache
After some experimentation, I believe _cluster/stats shows the total field data across the whole cluster. I managed to push my test cluster to 198MiB of field data cache usage. As a result, based on Zachary's feedback, I've set the following values in my elasticsearch.yml:

indices.fielddata.cache.size: 15gb
indices.fielddata.cache.expire: 7d

On Thursday, 12 February 2015 15:15:32 UTC, Wilfred Hughes wrote: Oh, is field data per-node or total across the cluster? I grabbed a test cluster with two data nodes, and I deliberately set fielddata really low:

indices.fielddata.cache.size: 100mb

However, after a few queries, I'm seeing more than 100MiB in use:

$ curl "http://localhost:9200/_cluster/stats?human&pretty"
...
"fielddata": { "memory_size": "119.7mb", "memory_size_in_bytes": 125543995, "evictions": 0 },

Is this expected?

On Wednesday, 11 February 2015 18:57:28 UTC, Zachary Tong wrote: LocalManualCache is a component of Guava's LRU cache https://code.google.com/p/guava-libraries/source/browse/guava-gwt/src-super/com/google/common/cache/super/com/google/common/cache/CacheBuilder.java, which is used by Elasticsearch for both the filter and field data caches. Based on your node stats, I'd agree it is the field data usage that is causing your OOMs. The CircuitBreaker helps prevent OOM, but it works on a per-request basis. It's possible for individual requests to pass the CB because they use small subsets of fields, but over time the set of fields loaded into field data continues to grow and you'll OOM anyway. I would prefer to set a field data limit rather than an expiration. A hard limit prevents OOM because you don't allow the cache to grow any more. An expiration does not guarantee that, since you could get a burst of activity that still fills up the heap and OOMs before the expiration can work.
-Z

On Wednesday, February 11, 2015 at 12:50:45 PM UTC-5, Wilfred Hughes wrote: After examining some other nodes that were using a lot of their heap, I think this is actually the field data cache:

$ curl "http://localhost:9200/_cluster/stats?human&pretty"
...
"fielddata": { "memory_size": "21.3gb", "memory_size_in_bytes": 22888612852, "evictions": 0 },
"filter_cache": { "memory_size": "6.1gb", "memory_size_in_bytes": 6650700423, "evictions": 12214551 },

Since this is storing logstash data, I'm going to add the following lines to my elasticsearch.yml and see if I observe a difference once deployed to production:

# Don't hold field data caches for more than a day, since data is
# grouped by day and we quickly lose interest in historical data.
indices.fielddata.cache.expire: 1d

On Wednesday, 11 February 2015 16:29:22 UTC, Wilfred Hughes wrote: Hi all, I have an ES 1.2.4 cluster which is occasionally running out of heap. I have ES_HEAP_SIZE=31G, and according to the heap dump generated, my biggest memory users were:

org.elasticsearch.common.cache.LocalCache$LocalManualCache 55%
org.elasticsearch.indices.cache.filter.IndicesFilterCache 11%

and nothing else used more than 1%. It's not clear to me what this cache is. I can't find any references to ManualCache in the elasticsearch source code, and the docs http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/index-modules-fielddata.html suggest to me that the circuit breakers should stop requests or reduce cache usage rather than OOMing.
At the moment my cache filled up, the node was actually trying to index some data:

[2015-02-11 08:14:29,775][WARN ][index.translog ] [data-node-2] [logstash-2015.02.11][0] failed to flush shard on translog threshold
org.elasticsearch.index.engine.FlushFailedEngineException: [logstash-2015.02.11][0] Flush failed
    at org.elasticsearch.index.engine.internal.InternalEngine.flush(InternalEngine.java:805)
    at org.elasticsearch.index.shard.service.InternalIndexShard.flush(InternalIndexShard.java:604)
    at org.elasticsearch.index.translog.TranslogService$TranslogBasedFlush$1.run(TranslogService.java:202)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
    at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:4416)
    at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2989)
    at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3096)
    at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3063)
    at org.elasticsearch.index.engine.internal.InternalEngine.flush(InternalEngine.java:797)
    ... 5 more
[2015-02-11 08:14:29,812][DEBUG][action.bulk
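Zachary's suggestion of a hard cap (rather than, or in addition to, an expiry) translates to elasticsearch.yml settings like the following. This is a sketch; the values are illustrative, and the breaker setting name shown is the pre-1.4 one used by the 1.2.4 cluster in this thread (it was renamed to indices.breaker.fielddata.limit in 1.4), so verify against the exact version in use:

```yaml
# elasticsearch.yml - field data safety valves (sketch, values illustrative)
indices.fielddata.cache.size: 15gb      # hard cap: evict entries before the heap fills
indices.fielddata.cache.expire: 7d      # optional; the cap is the real OOM protection
indices.fielddata.breaker.limit: 60%    # per-request circuit breaker (1.x setting name)
```

The cap turns unbounded growth into LRU evictions, which show up in the "evictions" counter of the fielddata stats above.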
elasticsearch-http-basic with ES 1.4.2
Has anyone had any luck using http-basic with ES 1.4.2? I just want to put some basic security on my ES instance from outside the cluster, and this appears to be the easiest way, with just whitelisting my other nodes. When I install and configure it, requests show as going to the http-basic plugin, but it always accepts the username/password from localhost even if I put the wrong info in there. It also never prompts for a username/password from other IPs connecting to it. Locally it shows this:

[root@elasticsearch1 http-basic]# curl -v --user bob:wrongpassword localhost:9200
* About to connect() to localhost port 9200 (#0)
*   Trying 127.0.0.1... connected
* Connected to localhost (127.0.0.1) port 9200 (#0)
* Server auth using Basic with user 'bob'
GET / HTTP/1.1
Authorization: Basic Ym9iOnBhc3N3b3JkMTIzNTU1
User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2
Host: localhost:9200
Accept: */*
HTTP/1.1 200 OK
Content-Type: text/plain; charset=UTF-8
Content-Length: 9
* Connection #0 to host localhost left intact
* Closing connection #0

From external sources it shows this in the logs:

[2015-02-19 14:56:29,816][INFO ][com.asquera.elasticsearch.plugins.http.HttpBasicServer] [elasticsearch1] Authorization:null, Host:192.168.1.4:9200, Path:/, :null, Request-IP:192.168.1.4, Client-IP:null, X-Client-IPnull
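The localhost behavior described is consistent with an IP whitelist that bypasses authentication for whitelisted addresses. A sketch of elasticsearch.yml settings for the Asquera elasticsearch-http-basic plugin; the setting names here are assumptions taken from memory of the plugin's README and must be checked against the README of the installed plugin version:

```yaml
# elasticsearch.yml - elasticsearch-http-basic sketch (setting names are
# assumptions; verify against the plugin README for your version)
http.basic.enabled: true
http.basic.user: "bob"
http.basic.password: "secret"
# Whitelisted hosts skip auth entirely - a default localhost entry here
# would explain why wrong credentials are accepted from 127.0.0.1.
http.basic.ipwhitelist: ["localhost"]
```

If external requests never get a 401 challenge, it is also worth confirming the plugin actually loaded on the node answering those requests, not just on localhost's node.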
Re: elastic search on t2.micro (Amazon WS)
Depends on your dataset and use. I don't go below 2GB heaps when I'm just testing things.

On 20 February 2015 at 05:52, Seung Chan Lim djs...@gmail.com wrote: What's the minimum RAM requirement? slim

On Wednesday, February 18, 2015 at 5:18:58 PM UTC-5, Mark Walkom wrote: Your only real option here is to get a machine with more RAM. Try spinning up a VM locally, on your desktop/laptop.

On 19 February 2015 at 00:52, Seung Chan Lim djs...@gmail.com wrote: I'm trying to see if I can get Elasticsearch (ES) 1.3.8 working with Couchbase (CB) 3.0.2 on a t2.micro (Amazon WS). A t2.micro has 1 GB of RAM, which isn't a lot, but I'm only doing test development on this with not a lot of documents (1000). I just installed ES, followed the CB instructions to install the plugin, and set up XDCR to get replication going from CB to ES. I also configured ES with 0 replicas and 1 shard (hoping this would help minimize RAM usage). But I'm still seeing behavior where ES locks up the server, making it unresponsive, then eventually complains of lack of memory. Is there something else I can do to get this working on a t2.micro? I'm a complete newbie to ES, and any help would be great, thank you. slim
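On a 1 GB t2.micro the JVM heap has to be capped well below total RAM, or the node will swap or be killed under memory pressure. A minimal sketch; the value is illustrative, and where the variable is set depends on how ES is launched (e.g. /etc/sysconfig/elasticsearch on CentOS, or the shell that starts bin/elasticsearch):

```shell
# Cap the ES heap at roughly half of the t2.micro's 1 GB of RAM,
# leaving the remainder for Lucene's filesystem cache and the OS.
export ES_HEAP_SIZE=512m
```

Even capped, 1 GB total is below the 2 GB heaps Mark mentions for testing, so some memory pressure is to be expected.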
[Spark] Unable to index JSON from HDFS using SchemaRDD.saveToES()
This is my first real attempt at Spark/Scala, so be gentle. I have a file called test.json on HDFS that I'm trying to read and index using Spark. I'm able to read the file via SQLContext.jsonFile(), but when I try to use SchemaRDD.saveToEs() I get an "invalid JSON fragment received" error. I'm thinking that the saveToEs() function isn't actually formatting the output as JSON and instead is just sending the value field of the RDD. What am I doing wrong?

Spark 1.2.0, elasticsearch-hadoop 2.1.0.BUILD-20150217

test.json:
{"key":"value"}

spark-shell:
import org.apache.spark.SparkContext._
import org.elasticsearch.spark._
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext._
val input = sqlContext.jsonFile("hdfs://nameservice1/user/mshirley/test.json")
input.saveToEs("mshirley_spark_test/test")

error:
<snip>
org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: Found unrecoverable error [Bad Request(400) - Invalid JSON fragment received[["value"]][MapperParsingException[failed to parse]; nested: ElasticsearchParseException[Failed to derive xcontent from (offset=13, length=9): [123, 34, 105, 110, 100, 101, 120, 34, 58, 123, 125, 125, 10, 91, 34, 118, 97, 108, 117, 101, 34, 93, 10]]; ]]; Bailing out..
<snip>

input:
res2: org.apache.spark.sql.SchemaRDD = SchemaRDD[6] at RDD at SchemaRDD.scala:108
== Query Plan ==
== Physical Plan ==
PhysicalRDD [key#0], MappedRDD[5] at map at JsonRDD.scala:47

input.printSchema():
root
 |-- key: string (nullable = true)
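The symptom (only ["value"] reaching ES) matches the plain-RDD saveToEs from org.elasticsearch.spark._ treating each Row as a bare collection of values, with the field names lost. A hedged sketch of the usual fix, assuming es-hadoop 2.1's Spark SQL support: import the SQL-aware package so the SchemaRDD overload of saveToEs is picked up and rows are serialized together with their schema:

```scala
import org.apache.spark.sql.SQLContext
// NOTE: org.elasticsearch.spark.sql._ (not org.elasticsearch.spark._)
// brings in the SchemaRDD-aware saveToEs.
import org.elasticsearch.spark.sql._

val sqlContext = new SQLContext(sc)
val input = sqlContext.jsonFile("hdfs://nameservice1/user/mshirley/test.json")
// Rows are now written as named-field documents, e.g. {"key": "value"}
input.saveToEs("mshirley_spark_test/test")
```

This is a sketch against a running Spark shell with an ES cluster configured; it cannot be run standalone.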
Elasticsearch index name question
I use a monitoring framework designed as a solution to monitor heterogeneous networks and systems in terms of services (platforms, applications for TELCO systems). This framework collects the required data synchronously from several devices, stores it in a MongoDB database, and then transfers all stored collections from MongoDB to Elasticsearch via the river-mongodb plugin. We can have a huge amount of data stored in a single Elasticsearch index; for example, about 5.2 million documents can be collected in a single MongoDB collection in only 8 hours of monitoring, so the number of documents in a single index grows rapidly. At present I have installed, on a CentOS 6.5 server, an Elasticsearch cluster configuration with one node and five indices, but only one index for all synchronous data. My problem is to be able to create different indices in Elasticsearch across which I can split the synchronous data, and so I would like to know if it is possible to create an index name with a timestamp appended to it, the way Logstash uses the timestamp from an event to derive the related Elasticsearch index name. Any ideas, suggestions, help?
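Index names in Elasticsearch are just strings chosen by the client at write time, so time-based indices only require deriving the name from each document's timestamp before indexing. A minimal sketch; the prefix and daily granularity are illustrative, mirroring Logstash's default logstash-YYYY.MM.DD convention:

```python
from datetime import datetime

def index_name_for(event_time, prefix="monitoring"):
    """Derive a daily index name from an event timestamp,
    in the style of Logstash's default logstash-%Y.%m.%d indices."""
    return "%s-%s" % (prefix, event_time.strftime("%Y.%m.%d"))

# Each document is routed to the index for its own day:
print(index_name_for(datetime(2015, 2, 19, 14, 30)))  # monitoring-2015.02.19
```

The operational win is that old data can then be dropped by deleting whole indices instead of deleting millions of individual documents from one big index.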
Re: elastic search cluster behind azure public load balancer
OK, thanks Mark. So you are saying I should set node.master: true as well as node.data: true for all 3 nodes in the cluster. And will this hold true for a 4-node or even larger cluster? As of now I am making data redundant on all 3 nodes, but please advise on performance. The Azure load balancer's chances of failure are much lower than a VM's. The redundancy is required from a backup point of view. So I will have a 4-node cluster with data replicated on all 4 nodes: 3 nodes in the same region (say East US) behind the load balancer, and the 4th node in a different region (say West US) but NOT behind the load balancer, just holding all the data. In case the East US data center has a problem, I can redirect all traffic to the West US data center, where the single node will always have all up-to-date data, or I can even take backups from the West US data center. Thanks, Subodh

On Thursday, February 19, 2015 at 7:10:52 AM UTC+5:30, Mark Walkom wrote: Yes, the master can serve requests. You don't really want 2 masters and 1 data node though; make all 3 master+data to start with. And sure, the client can be a SPOF, but then isn't a single load balancer a SPOF as well? So the question remains where you are happy dealing with these points, because at some point you cannot make *everything* redundant without being excessive.

On 18 February 2015 at 21:36, Subodh Patil subod...@gmail.com wrote: I am trying to set up an ES cluster behind an Azure load balancer / cloud service. The cluster is 3 nodes with no specific data/client/master node settings. By default 2 nodes are elected as master and 1 as a data node. As requests (create/update/search) from the application come to the Azure load balancer on port 9200, load-balanced across all 3 VMs, a request can go to any VM. Will a master node be able to serve the requests? Many articles say that you don't need a load balancer for an ES cluster, just use a client node, but then that becomes a single point of failure, as an Azure VM can go down at any point in time.

So load balancing is required mainly for high availability from an infrastructure point of view. Please suggest a cluster setup and which nodes (data or client) to put behind the load balancer. ES version 1.4.1 on Windows Server 2012 R2 VMs.
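Mark's "make all 3 master+data" advice maps onto elasticsearch.yml settings like the following. A sketch for the three East US nodes, with the ES 1.x quorum setting added because three master-eligible nodes need a quorum of two to avoid split brain:

```yaml
# elasticsearch.yml on each of the three East US nodes (sketch)
node.master: true
node.data: true
# quorum of master-eligible nodes: (3 / 2) + 1 = 2
discovery.zen.minimum_master_nodes: 2
```

Note that if the fourth (West US) node joins the same cluster as master-eligible, the quorum calculation changes to (4 / 2) + 1 = 3, so its role needs to be decided explicitly.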
Script based transform during index
Hi group, first, apologies if this is not the right way to ask the question below, but this is my first time. I have some documents with source IP and destination IP addresses. I want to enhance these documents with geo info via a script transform when they arrive. So I created a script in Python, and I resolve geo info for every destination IP and store it in _source (from what I understand). The template for the index is below. Everything works fine; however, I have two issues:

1. The field (which does not exist beforehand and which I create, namely location) is not shown in a search unless explicitly asked for.
2. Kibana 3 does not show this field, or shows it as empty. The location field is there if I explicitly ask for it.

Can you please let me know how I can have these fields, added prior to indexing, available as normal fields? Thanks in advance!

P.S. Inside the Python script I update the below:

ctx['location'] = ip2geo(dest_ip)
ctx['_source']['location'] = ip2geo(dest_ip)

POST /geotest/gdoc/_search
{
  "query": { "match_all": {} },
  "fields": [
    "src_ip",
    "dst_ip",
    "location"  -- This is the new field which I add via ctx['_source']['location'] = ip2geo(dest_ip)
  ]
}

My template:

PUT /_template/geo
{
  "template": "geo*",
  "mappings": {
    "gdoc": {
      "transform": {
        "lang": "python",
        "script": "python_ip2geo"
      },
      "_source": { "enabled": true },
      "properties": {
        "src_ip": { "type": "ip", "index": "not_analyzed" },
        "dst_ip": { "type": "ip", "index": "not_analyzed" },
        "location": {
          "type": "geo_point",
          "index": "analyzed",  -- does not need to be analyzed really
          "store": true,
          "doc_values": true,
          "null_value":
        }
      }
    }
  }
}

Can someone explain a bit more about how transform fields are stored and how they can be indexed? Thanks in advance.
Date Histogram Bucket Count ?
Is it possible to apply a filter to date histogram buckets? For example, return only buckets that are below a certain value. I was looking to do it with scripts, but I don't know how to access buckets in a script, so if anyone knows anything about that, it would be very, very helpful.
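In the ES 1.x line there is no aggregation that filters buckets by their own doc_count (a bucket-selector style step came later), so the practical route is filtering the response client-side. A minimal sketch in Python; the aggregation name per_day and the response fragment are illustrative:

```python
def buckets_below(agg_response, max_count, agg_name="per_day"):
    """Keep only date-histogram buckets whose doc_count is below max_count."""
    buckets = agg_response["aggregations"][agg_name]["buckets"]
    return [b for b in buckets if b["doc_count"] < max_count]

# Hypothetical ES response fragment:
resp = {"aggregations": {"per_day": {"buckets": [
    {"key_as_string": "2015-02-18", "doc_count": 42},
    {"key_as_string": "2015-02-19", "doc_count": 7},
]}}}
print(buckets_below(resp, 10))  # [{'key_as_string': '2015-02-19', 'doc_count': 7}]
```

The full histogram still travels over the wire; the filter only trims what the application sees.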
Re: Elasticsearch performance tuning
Hi Mark Walkom, I have given the logstash conf file below.

Logstash conf:

input {
  file {
  }
}
filter {
  mutate { gsub => [ "message", "\n", "" ] }
  mutate { gsub => [ "message", "\t", "" ] }
  multiline {
    pattern => "^ "
    what => "previous"
  }
  grok {
    match => [ "message", "%{TIME:log_time}\|%{WORD:Message_type}\|%{GREEDYDATA:Component}\|%{NUMBER:line_number}\| %{GREEDYDATA:log_message}" ]
    match => [ "path", "%{GREEDYDATA}/%{GREEDYDATA:loccode}/%{GREEDYDATA:_machine}\:%{DATE:logdate}.log" ]
    break_on_match => false
  }
  # To check whether location is S or L
  if [loccode] == "S" or [loccode] == "L" {
    ruby {
      code => "temp = event['_machine'].split('_')
               if !temp.nil? || !temp.empty?
                 event['_machine'] = temp[0]
               end"
    }
  }
  mutate {
    add_field => [ "event_timestamp", "%{@timestamp}" ]
    replace => [ "log_time", "%{logdate} %{log_time}" ]
    lowercase => [ "loccode" ]
    # Remove the 'logdate' field since we don't need it anymore.
    remove => [ "logdate" ]
  }
  # to get all site details (site name, city and co-ordinates)
  sitelocator {
    sitename => "loccode"
    datafile => "vendor/sitelocator/SiteDetails.csv"
  }
  date {
    locale => "en"
    match => [ "log_time", "yyyy-MM-dd HH:mm:ss", "MM-dd-yyyy HH:mm:ss.SSS", "ISO8601" ]
  }
}
output {
  elasticsearch {
  }
}

I have checked step by step to find the bottleneck filter. The date filter below took the most time. Can you guide me on how to tune it to be faster?

date {
  locale => "en"
  match => [ "log_time", "yyyy-MM-dd HH:mm:ss", "MM-dd-yyyy HH:mm:ss.SSS", "ISO8601" ]
}

http://serverfault.com/questions/669534/elasticsearch-performance-tuning#comment818613_669558

Thanks, Devaraj
Elasticsearch Script merge the results of two aggregations
I use aggregations on Elasticsearch version 1.3.8. I have used script aggregations for a while; today they stopped working. Please help, I can't find any solution.

This is the mapping:

"mappings": {
  "product": {
    "properties": {
      "brandId": { "type": "integer" },
      "brandIsActive": { "type": "boolean" },
      "brandLink": { "type": "string", "index": "not_analyzed" },
      "brandName": { "type": "string", "index": "not_analyzed" }
    }
  }
}

This is my query:

POST alias-test/product/_search
{
  "query": { "match_all": {} },
  "aggs": {
    "Brand": {
      "terms": {
        "script": "doc['brandName'].value",
        "size": 0
      }
    }
  }
}

This is the error:

{ "error": "SearchPhaseExecutionException[Failed to execute phase [query_fetch], all shards failed; shardFailures {[g][mizu-20150219142655][0]: RemoteTransportException[[Mammomax][inet[/172.31.37.148:9300]][search/phase/query+fetch]]; nested: SearchParseException[[mizu-20150219142655][0]: query[ConstantScore(*:*)],from[-1],size[-1]: Parse Failure [Failed to parse source [{\"query\":{\"match_all\":{}},\"aggs\":{\"Brand\":{\"terms\":{\"script\":\"doc['brandName'].value\"]]]; nested: ExpressionScriptCompilationException[Field [brandName] used in expression must be numeric]; }]", "status": 400 }

The other query:

POST test/product/_search
{
  "query": { "match_all": {} },
  "aggs": {
    "Brand": {
      "terms": {
        "script": "doc['brandName'].value+'|'+doc['brandLink'].value",
        "size": 0
      }
    }
  }
}
How to index data from multiple database queries into a single Elasticsearch index
Hi,

I want to index data from multiple database queries into Elasticsearch. I have done the same in Solr using the DataImportHandler and DeltaImportHandler. Is there any way to achieve this in Elasticsearch, and to index the output of all these queries at once? From what I have observed, Elasticsearch cannot index multiple queries directly.

Thanks,
Guru Pai.
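Without a DataImportHandler equivalent, one common approach is to merge the rows from the separate queries client-side and push the merged documents through the _bulk API. A sketch under assumed data shapes (the products/prices rows and the "catalog" index name are invented for illustration; in practice the rows would come from a DB-API cursor):

```python
import json

# Hypothetical results of two different database queries.
products = [{"id": 1, "name": "kettle"}]
prices = {1: 9.99}  # product id -> price, from a second query

def to_bulk(rows, prices, index="catalog", doc_type="product"):
    """Merge the two query results into one document per product and emit
    the newline-delimited action/source pairs expected by POST /_bulk."""
    lines = []
    for row in rows:
        doc = dict(row, price=prices.get(row["id"]))
        lines.append(json.dumps(
            {"index": {"_index": index, "_type": doc_type, "_id": row["id"]}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # _bulk bodies must end with a newline
```

The returned string is a valid _bulk body, so all query outputs can be indexed in one request.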
[ES-1.4.0] Snapshot Queue
Hi All,

Is there a way to queue snapshot invocations, e.g. snapshot_1 -> snapshot_2 -> ... -> snapshot_N?

Thanks,
Yarden
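As far as I know there is no built-in snapshot queue in 1.4.x, so one option is to serialize snapshots client-side: poll until the current snapshot finishes, then start the next. A sketch where create and is_in_progress are hypothetical stand-ins for PUT /_snapshot/&lt;repo&gt;/&lt;name&gt; and GET /_snapshot/&lt;repo&gt;/_status:

```python
import time

def run_snapshots_serially(names, create, is_in_progress, poll_interval=1.0):
    """Take snapshots one after another: wait until no snapshot is running,
    then create the next one. `create(name)` would issue the snapshot
    request; `is_in_progress()` would check the repository's _status."""
    for name in names:
        while is_in_progress():
            time.sleep(poll_interval)
        create(name)
```

With a real HTTP client plugged in, calling run_snapshots_serially(["snapshot_1", ..., "snapshot_N"], ...) gives the queued behaviour described above.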
Re: Elasticsearch Script merge the results of two aggregations
The error for the second query:

{ "error": "SearchPhaseExecutionException[Failed to execute phase [query_fetch], all shards failed; shardFailures {[g][test][0]: RemoteTransportException[[Mammomax][inet[/192.168.1.8:9300]][search/phase/query+fetch]]; nested: SearchParseException[[test][0]: query[ConstantScore(*:*)],from[-1],size[-1]: Parse Failure [Failed to parse source [{\"query\":{\"match_all\":{}},\"aggs\":{\"Brand\":{\"terms\":{\"script\":\"doc['brandName'].value+'|'+doc['brandLink'].value\",\"size\":0]]]; nested: ExpressionScriptCompilationException[Failed to parse expression: doc['brandName'].value+'|'+doc['brandLink'].value]; nested: ParseException[unexpected character ''' at position (23).]; nested: NoViableAltException; }", "status": 400 }

--
Best Regards,
ALİ BALCI
I want to implement the QueryElevationComponent feature in Elasticsearch
Hi All,

I want to implement the equivalent of Solr's QueryElevationComponent in Elasticsearch. Can anyone suggest how I can go about this?

Thanks,
Guru Pai.
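Elasticsearch has no direct QueryElevationComponent, but one possible approximation (an assumption on my part, not an official equivalent) is to boost hand-picked document ids above the organic results with a bool query. The elevated ids and the "title"/"laptop" match clause below are invented for illustration:

```python
# Curated ids to pin near the top for this particular search.
elevated_ids = ["doc-42", "doc-7"]

# The should clause adds a large boost to the curated ids while the must
# clause still returns the normal organic matches.
body = {
    "query": {
        "bool": {
            "must": [{"match": {"title": "laptop"}}],
            "should": [{"ids": {"values": elevated_ids, "boost": 100}}],
        }
    }
}
```

Unlike Solr's component this does not guarantee an exact position, it only raises the score, so very strong organic matches could still outrank an elevated document.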
Re: problem with using uax_url_email
Hi, for people having the same problem as me, here is an answer I received from Pablo in the PT group:

"About your problem, I believe this is a constraint of Apache Tika [1], which is used by the mapper-attachments plugin. I believe that a search for Tika PDF limitations, or a question on their list, will help you more than we can. Anyway, maybe you want to ask on the Elasticsearch main list [2], which is bigger than ours and has the Elasticsearch engineers. I am sorry for not being able to help you that much. Cheers, Pablo

[1] http://tika.apache.org/
[2] elasti...@googlegroups.com"

On Wednesday, 18 February 2015 at 15:37:33 UTC+1, Marria wrote:

Hi everybody, I want to extract URLs from my PDF files. I use the mapper-attachments plugin to index the PDFs. In order to run regex queries and extract all the URLs present in a PDF file, I used the uax_url_email tokenizer:

curl -X PUT localhost:9200/test -d '{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "default": {
            "type": "custom",
            "tokenizer": "uax_url_email",
            "filter": ["standard", "lowercase", "stop"]
          }
        }
      }
    }
  }
}'

and the mapping:

curl -X PUT localhost:9200/test/attachment/_mapping -d '{
  "attachment": {
    "properties": {
      "file": {
        "type": "attachment",
        "fields": {
          "title": { "store": "yes" },
          "file": { "term_vector": "with_positions_offsets", "store": "yes" }
        }
      }
    }
  }
}'

I indexed some PDF files. The problem: for one file I get this (while the URLs in that file start with http://):
https://lh3.googleusercontent.com/-6uzhp-v0qFs/VOSfMU95byI/AUc/H4c6xvb54kg/s1600/Capture%2Bd’écran%2B2015-02-18%2Bà%2B15.17.19.png
For another file, I got this (it drops the http:// prefix):
https://lh3.googleusercontent.com/-1rYIYWJJEbU/VOSfweFpgbI/AUk/bWzfst_uZUE/s1600/Capture%2Bd’écran%2B2015-02-18%2Bà%2B15.19.43.png
Worse, the URLs are not recognized completely; look at this:
https://lh3.googleusercontent.com/-vsKUj5I9MiA/VOSgtyS3yWI/AUw/64lgO4gYSdI/s1600/Capture%2Bd’écran%2B2015-02-18%2Bà%2B15.22.32.png
Is it caused by the two-column layout of the PDF file?
https://lh4.googleusercontent.com/-c7n5-oMygRM/VOShm4hwnWI/AU4/CQNjTTctMnY/s1600/Capture%2Bd’écran%2B2015-02-18%2Bà%2B15.26.46.png
So, what did I do wrong? How can I fix this and use regexp queries successfully to extract all the URLs? Thank you
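As a client-side cross-check (an assumption of mine, not part of the plugin), you can regex-extract the URLs from the text Tika returns and compare them with what the uax_url_email tokenizer produced. Rejoining hyphenated line breaks first matters for two-column PDFs, where a URL is often split across lines:

```python
import re

# Simple URL pattern; good enough for a sanity check, not a full RFC parser.
URL_RE = re.compile(r"https?://[^\s]+")

def extract_urls(text):
    """Rejoin words hyphenated across line breaks (common in two-column PDF
    extraction), then pull out anything that looks like a URL."""
    joined = text.replace("-\n", "").replace("\n", " ")
    return URL_RE.findall(joined)
```

If the URLs recovered this way are complete while the indexed tokens are truncated, the damage happened before the tokenizer, i.e. in the Tika text extraction, which matches Pablo's suggestion above.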
Re: Aggregations failing on fields with custom analyzer..
Did you apply your analyzer to your mapping?

David

On 19 February 2015 at 08:53, Anil Karaka <anilkar...@gmail.com> wrote:

Posted on Stack Overflow as well: http://stackoverflow.com/questions/28601082/terms-aggregation-failing-on-string-fields-with-a-custom-analyzer-in-elasticsear

On Thursday, February 19, 2015 at 1:01:40 PM UTC+5:30, Anil Karaka wrote:

I wanted a custom analyzer that behaves exactly like not_analyzed, except that fields are case-insensitive. My analyzer is below:

"index": {
  "analysis": {
    "analyzer": {
      // keyword tokenizer plus lowercase filter: same as not_analyzed but case-insensitive
      "case_insensitive_keyword_analyzer": {
        "tokenizer": "keyword",
        "filter": "lowercase"
      }
    }
  }
}

But when I try a terms aggregation over a field with strings analyzed as above, I get this error:

{ "error": "ClassCastException[org.elasticsearch.search.aggregations.bucket.terms.DoubleTerms$Bucket cannot be cast to org.elasticsearch.search.aggregations.bucket.terms.StringTerms$Bucket]", "status": 500 }

Are there additional settings I have to add to my custom analyzer for the terms aggregation to work? The better question: I want a custom analyzer that does everything like not_analyzed but case-insensitively. How do I achieve that?
Re: Aggregations failing on fields with custom analyzer..
I think you are doing something wrong.

DELETE index
PUT index
{
  "mappings": {
    "doc": {
      "properties": {
        "foo": { "type": "double" }
      }
    }
  }
}
PUT index/doc/1
{ "foo": "bar" }

gives:

{ "error": "MapperParsingException[failed to parse [foo]]; nested: NumberFormatException[For input string: \"bar\"];", "status": 400 }

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet https://twitter.com/dadoonet | @elasticsearchfr https://twitter.com/elasticsearchfr | @scrutmydocs https://twitter.com/scrutmydocs

On 19 February 2015 at 09:39, Anil Karaka <anilkar...@gmail.com> wrote:

A sample document and the index mappings:

"_source": {
  "Sort": "",
  "gt": "2015-02-18T15:07:10",
  "uid": "54867dc55b482b04da7f23d8",
  "usId": "54867dc55b482b04da7f23d7",
  "ut": "2015-02-18T20:37:10",
  "act": "productlisting",
  "st": "2015-02-18T15:07:46",
  "Filter": "",
  "av": "3.0.0.0",
  "ViewType": "SmallSingleList",
  "os": "Windows",
  "categoryid": "home-kitchen-curtains-blinds"
}

"properties": {
  "uid": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" },
  "ViewType": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" },
  "usId": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" },
  "os": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" },
  "Sort": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" },
  "Filter": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" },
  "categoryid": { "type": "double" },
  "gt": { "format": "dateOptionalTime", "type": "date" },
  "ut": { "format": "dateOptionalTime", "type": "date" },
  "st": { "format": "dateOptionalTime", "type": "date" },
  "act": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" },
  "av": { "analyzer": "case_insensitive_keyword_analyzer", "type": "string" }
}

On Thursday, February 19, 2015 at 2:03:11 PM UTC+5:30, David Pilato wrote:

I don't know without a concrete example. I'd say that if your mapping has a number type and you send "123", it could work.

On 19 February 2015 at 09:30, Anil Karaka wrote:

It was my mistake: the field I was trying to aggregate on was mapped as double. I had assumed it was a string after seeing some sample documents with strings. Why didn't ES throw an error when I indexed docs with strings instead of doubles?
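For reference, a sketch of the settings/mapping pairing this thread converges on: keyword tokenizer plus lowercase filter, applied to a string field so it behaves like not_analyzed but case-insensitively. The index layout and the "os" field are illustrative, not taken verbatim from the cluster in the thread:

```python
# Settings and mappings for a case-insensitive, untokenized string field.
# PUT this body when creating the index (analyzers cannot be added to an
# existing field without reindexing).
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "case_insensitive_keyword_analyzer": {
                    "tokenizer": "keyword",   # whole value as one token
                    "filter": ["lowercase"],  # ...lowercased
                }
            }
        }
    },
    "mappings": {
        "doc": {
            "properties": {
                "os": {
                    "type": "string",
                    "analyzer": "case_insensitive_keyword_analyzer",
                }
            }
        }
    },
}
```

The crucial point from the thread still applies: the analyzer must actually be referenced in the field's mapping, and the field must really be a string. A field mapped as double will produce DoubleTerms buckets no matter what analyzer is configured.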
disappearing log records: what can I do?
Hello group,

I have the following stack to manage the logging of an in-company application: fluentd (td-agent 2.1.3), Elasticsearch (1.4.4), Kibana (3.1.2). At first glance this seems to work OK, but from time to time the record counts reported in Kibana don't match the line counts of the logfiles. Diving into this, it appears that when very large logfiles are put in the fluentd log directory, not all records show up in Elasticsearch. Nothing appears in the logging of either fluentd or Elasticsearch, so at first glance everything seems fine.

I started by looking at fluentd and managed to get extra information, which seems to indicate that all of the log lines are processed. Comparing wc -l of the logfile with the contents of ES makes the difference visible:

ES: 645551
wc -l: 647506 (groot.log)

Looking at the thread pool statistics with the REST API, ES reports 60 bulk rejections.

Right now I have a very simple configuration:

cluster.name: cwc-dev
index.number_of_replicas: 0
index.indexing.slowlog.threshold.index.warn: 10s
index.indexing.slowlog.threshold.index.info: 5s
index.indexing.slowlog.threshold.index.debug: 2s
path.data: /data/elasticsearch

I hope you can support me in tackling this, since I am quite new to ES and don't yet know which ways are available to get extra information on this.

thanks in advance,
Ruud
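The 60 bulk rejections line up with the missing records: when the bulk thread pool queue is full, ES rejects items, and they are lost unless the client resends them. A hedged sketch of retrying only the rejected items with exponential backoff (send stands in for the real HTTP bulk call and is assumed to return the rejected items, an empty list meaning success):

```python
import time

def bulk_with_retry(send, payload, retries=5, backoff=1.0):
    """Resend a bulk payload while the thread pool rejects items.
    `send(payload)` performs the bulk request and returns the list of
    rejected items (ES reports these per item in the bulk response)."""
    for attempt in range(retries):
        rejected = send(payload)
        if not rejected:
            return True
        payload = rejected                 # retry only what was rejected
        time.sleep(backoff * (2 ** attempt))  # back off to let ES drain
    return False
```

Whether fluentd's elasticsearch output does this kind of retry depends on the plugin version and its buffer/retry settings; checking that, or throttling the ingest rate, would be the first things to try.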
Using 'elapsed' plugin with GrayLog2
I am working with a team creating a proof-of-concept monitoring and analytics system using a combination of Logstash, MongoDB, Elasticsearch and Graylog2. I was wondering if anyone has experience using the elapsed plugin with this setup and can tell me what issues, if any, they encountered?
waited for 30s and no initial state was set by the discovery
I know other people have posted this, but I've tried everything the other threads suggested. We keep getting this at startup of our Java node client:

INFO  [2015-02-19 17:57:45,206] org.elasticsearch.node: [localhost] version[1.4.2], pid[11103], build[927caff/2014-12-16T14:11:12Z]
INFO  [2015-02-19 17:57:45,207] org.elasticsearch.node: [localhost] initializing ...
INFO  [2015-02-19 17:57:45,217] org.elasticsearch.plugins: [localhost] loaded [cloud-aws], sites []
INFO  [2015-02-19 17:57:47,625] org.elasticsearch.node: [localhost] initialized
INFO  [2015-02-19 17:57:47,625] org.elasticsearch.node: [localhost] starting ...
INFO  [2015-02-19 17:57:47,716] org.elasticsearch.transport: [localhost] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.99.157.0:9300]}
INFO  [2015-02-19 17:57:49,747] org.elasticsearch.discovery: [localhost] elasticsearch-dev/EqKAxZm9SCutQGd-0_SonA
WARN  [2015-02-19 17:58:19,749] org.elasticsearch.discovery: [localhost] waited for 30s and no initial state was set by the discovery
INFO  [2015-02-19 17:58:19,761] org.elasticsearch.http: [localhost] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/10.99.157.0:9200]}
INFO  [2015-02-19 17:58:19,761] org.elasticsearch.node: [localhost] started

This is the elasticsearch.yml on our master (and only data node right now):

plugin.mandatory: cloud-aws
cloud:
  aws:
    region: us-west-2
    access_key: ACCESS_KEY
    secret_key: SECRET_KEY
discovery:
  type: ec2
  ec2:
    groups: DevAll

And this is our Java node client:

Settings settings = ImmutableSettings.settingsBuilder()
    .put("node.name", nodeName)
    .put("cloud.aws.access_key", awsAccessKey)
    .put("cloud.aws.secret_key", awsSecretKey)
    .put("cloud.node.auto_attributes", true)
    .put("discovery.type", "ec2")
    .build();
this.node = nodeBuilder()
    .clusterName(clusterName)
    .settings(settings)
    .client(true)
    .node();
this.client = node.client();

Any help would be greatly appreciated!
Re: Cluster hanging on node failure
I posted here too: http://stackoverflow.com/questions/28601885/cluster-hanging-on-node-failure. Would love to get some help with this.

Best,
Max

On Wednesday, 18 February 2015 at 20:30:46 UTC+1, Max Charas wrote:

Hello all of you bright people,

We're currently running a smallish 300 GB cluster in production on 5 nodes with around 30 million docs. Everything works flawlessly except when a node really goes down (I mean network or hardware failure, kill -9). When we lose a node, the cluster becomes more or less completely unresponsive for a few minutes, both for indexing and querying. This is, of course, less than ideal, as we have load 24/7.

I would really appreciate some help understanding best-practice settings for a robust cluster. Our first goal is for the cluster not to become unresponsive when a node crashes. After reading everything I could find on the web, I still can't tell whether ES is designed to be unresponsive for ping_retries * ping_timeout seconds, or whether the cluster should continue to serve query requests during that time. Could anyone help me shed light on this?

Secondly, in the event of an even worse failure where the cluster goes into red state, would it be possible to allow the cluster to still serve read/query requests?

I would be ever so grateful to anyone willing to help me understand how this works, or what we would need to change to make our ES installation more robust. I've included our config here:

cluster.name: clustername
node.name: nodename
path.data: /index
node.master: true
node.data: true
discovery.zen.minimum_master_nodes: 3
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.multicast.ping.enabled: false
discovery.zen.ping.unicast.enabled: true
discovery.zen.ping.unicast.hosts: ["host1", "host2", "host3"]
bootstrap.mlockall: true
index.number_of_shards: 10
action.disable_delete_all_indices: true
marvel.agent.exporter.es.hosts: ["marvel:9200"]
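One sanity check on the config above: discovery.zen.minimum_master_nodes should be a quorum of master-eligible nodes, floor(n/2) + 1, which for the 5 master-eligible nodes in this cluster is indeed 3:

```python
def minimum_master_nodes(master_eligible):
    """Quorum of master-eligible nodes: floor(n/2) + 1. Setting
    discovery.zen.minimum_master_nodes below this risks split-brain;
    setting it above it makes the cluster refuse to elect a master
    after losing too few nodes."""
    return master_eligible // 2 + 1
```

With 5 master-eligible nodes, the cluster can lose up to 2 nodes and still elect a master, so the multi-minute hang is more likely about fault detection timeouts and shard recovery than about the quorum setting itself.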
Re: waited for 30s and no initial state was set by the discovery
Additional information: we can telnet into our ES server from our application server on port 9300.
Re: ClassNotFoundException: org.elasticsearch.discovery.ec2.Ec2DiscoveryModule
Thanks, David. I eventually found the pom in the GitHub repo. Thanks for adding the documentation!

On Wednesday, February 18, 2015 at 10:39:19 PM UTC-8, David Pilato wrote:

Yes. This should be added to the doc: https://github.com/elasticsearch/elasticsearch-cloud-aws/issues/176

You need to add this dependency if you are using a NodeClient:

<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-cloud-aws</artifactId>
  <version>2.4.1</version>
</dependency>

HTH,
David

On 19 February 2015 at 01:15, Diana Tuck <dtu...@gmail.com> wrote:

New to ES. I'm trying to use the elasticsearch-cloud-aws plugin, but when starting my Java client node I'm getting a ClassNotFoundException on org.elasticsearch.discovery.ec2.Ec2DiscoveryModule. Do I need to install this plugin on Java client nodes, and if so, how does one do that? Or rather, is there a Maven dependency that can be referenced to load these required classes?

For reference, the elasticsearch.yml is:

plugin.mandatory: cloud-aws
cloud:
  aws:
    access_key: **
    secret_key: *
discovery:
  type: ec2

and my Java client code is:

Settings settings = ImmutableSettings.settingsBuilder()
    .put("node.name", nodeName)
    .put("cloud.aws.access_key", awsAccessKey)
    .put("cloud.aws.secret_key", awsSecretKey)
    .put("cloud.node.auto_attributes", true)
    .put("discovery.type", "ec2")
    .build();
this.node = nodeBuilder()
    .clusterName(clusterName)
    .settings(settings)
    .client(true)
    .node();
this.client = node.client();
Re: elastic search on t2.micro (Amazon WS)
What's the minimum RAM requirement?

slim

On Wednesday, February 18, 2015 at 5:18:58 PM UTC-5, Mark Walkom wrote:

Your only real option here is to get a machine with more RAM. Try spinning up a VM locally, on your desktop/laptop.

On 19 February 2015 at 00:52, Seung Chan Lim wrote:

I'm trying to see if I can get Elasticsearch (ES) 1.3.8 working with Couchbase (CB) 3.0.2 on a t2.micro (Amazon AWS). A t2.micro has 1 GB of RAM, which isn't a lot, but I'm only doing test development on this with not a lot of documents (1000). I just installed ES, followed the CB instructions to install the plugin, and set up XDCR to replicate from CB to ES. I also configured ES with 0 replicas and 1 shard (hoping this would help minimize RAM usage). But I'm still seeing ES lock up the server, making it unresponsive and eventually complaining of lack of memory. Is there something else I can do to get this working on a t2.micro? I'm a complete newbie to ES, and any help would be great, thank you.

slim
Multi Level nested search by using NEST API
I am having trouble making a multi-level nested query with the NEST API. Here is my mapping:

{
  "log": {
    "mappings": {
      "LogEvent": {
        "properties": {
          "@timestamp": { "type": "date", "store": true, "format": "yyyy-MM-dd'T'HH:mm:ss" },
          "records": {
            "type": "nested",
            "properties": {
              "eventtype": { "type": "string", "store": true },
              "detail": { "type": "string", "store": true },
              "others": {
                "type": "nested",
                "properties": {
                  "ScrubbedContent": { "type": "string", "store": true },
                  "RawContent": { "type": "string", "store": true }
                }
              }
            }
          }
        }
      }
    }
  }
}

And here is a query that works:

{
  "from": 0,
  "size": 1,
  "query": {
    "filtered": {
      "filter": {
        "and": {
          "filters": [
            { "range": { "@timestamp": { "gte": "2015-02-12T02:37:32", "lte": "2015-02-19T02:37:32" } } },
            { "nested": { "filter": { "terms": { "records.eventtype": ["myeventtype"] } }, "path": "records" } }
          ]
        }
      }
    }
  }
}

But if I change "path": "records" to "path": "records.others", no results are returned. I am pretty sure I should get results. Any thoughts why?
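Each nested level generally needs its own nested clause, so a filter on the inner records.others type usually has to be wrapped in a nested filter on the outer records path rather than replacing it. A sketch of that shape (the ScrubbedContent value is invented, and this is unverified against the exact mapping):

```python
# Query body with one nested clause per nesting level: the outer clause
# scopes to "records", the inner one to "records.others".
body = {
    "query": {
        "filtered": {
            "filter": {
                "nested": {
                    "path": "records",
                    "filter": {
                        "nested": {
                            "path": "records.others",
                            "filter": {
                                "term": {"records.others.ScrubbedContent": "foo"}
                            },
                        }
                    },
                }
            }
        }
    }
}
```

In NEST this corresponds to nesting one Nested(...) filter descriptor inside another, mirroring the JSON above.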
Re: formula or guidelines to calculate/estimate the index size
There is nothing official. Just create an index, put in one of your documents, and extrapolate.

On 20 February 2015 at 04:26, Gaurav gupta <gupta.gaurav0...@gmail.com> wrote:

Could anyone help me find a formula or guidelines to calculate/estimate the size of the index created by Elasticsearch? I found that there is a formula (an Excel sheet) for Lucene.

Thanks!
Gaurav
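The extrapolation Mark describes is simple arithmetic. A sketch (linear scaling is an assumption here; larger indices often compress somewhat better, so treat the result as a rough upper bound):

```python
def estimate_index_size(sample_bytes, sample_docs, total_docs):
    """Extrapolate index size linearly from a sample index: index a
    representative batch of documents, read the store size from
    GET /_stats, then scale to the expected document count."""
    return sample_bytes / sample_docs * total_docs
```

For a better estimate, index a few thousand representative documents rather than one, so analyzer and field-data overheads are averaged in.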
how to avoid MapperParsingException?
Hi, I'm indexing documents with the following structure:

{ "name": "peter", "email": "p...@p.com", "location": "MIA" }
{ "name": "mary", "email": "m...@m.com", "device": "ipad" }
{ "name": "mary", "email": "m...@m.com", "metadata": { ... } }

As you can see, I only know the types of the name and email fields. The location, device, metadata, or any other field is dynamic. So, in order to avoid a MapperParsingException, I want to persist all of a document's fields but mark ONLY name and email as searchable. Can I do that using mappings?
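One way to get this behavior (a sketch; the type name "user" is made up): set dynamic to false on the type, so unmapped fields are kept in _source but never indexed, while the explicitly mapped fields stay searchable:

```json
{
  "mappings": {
    "user": {
      "dynamic": false,
      "properties": {
        "name":  { "type": "string" },
        "email": { "type": "string" }
      }
    }
  }
}
```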
Re: elastic search on t2.micro (Amazon WS)
What's your heap size (the JVM setting) set to? It needs to be at most half the machine's RAM, so set it to about 500 MB. J

On Wednesday, February 18, 2015 at 3:52:22 PM UTC+2, Seung Chan Lim wrote: I'm trying to see if I can get Elasticsearch (ES) 1.3.8 working with Couchbase (CB) 3.0.2 on a t2.micro (Amazon WS). A t2.micro has 1 GB of RAM, which isn't a lot, but I'm only doing test development on this, with not a lot of documents (~1000). I just installed ES, followed the CB instructions to install the plugin, and set up XDCR to get replication going from CB to ES. I also configured ES with 0 replicas and 1 shard (hoping this would help minimize RAM usage). But I'm still seeing behavior from ES where it locks up the server, making it unresponsive, then eventually complaining of lack of memory. Is there something else I can do to get this working on a t2.micro? I'm a complete newbie to ES, and any help would be great. Thank you, slim
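On a 1 GB box that means something like the following before starting the node (a sketch; the path depends on how ES was installed):

```shell
# cap the JVM heap at roughly half of the t2.micro's 1 GB of RAM
export ES_HEAP_SIZE=512m
./bin/elasticsearch
```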
Re: Elasticsearch index name question
This is possible, but not automatically within ES. Logstash knows it needs to switch to a new index at UTC midnight; you need to find a way to get the river, or some other code, to do the same.

On 19 February 2015 at 21:59, Silvana Vezzoli silvana.vezz...@gmail.com wrote: I use a monitoring framework designed as a solution to monitor heterogeneous networks and systems in terms of services (platforms and applications for TELCO systems). This framework collects the required data synchronously from several devices, stores it in a MongoDB database, and then transfers all stored collections from MongoDB to Elasticsearch via the river-mongodb plugin. We can have a huge amount of data stored in a single Elasticsearch index; for example, about 5.2 million documents can be collected in a single MongoDB collection in only 8 hours of monitoring, so the number of documents in a single index grows rapidly. At present I have installed, on a CentOS 6.5 server, an Elasticsearch cluster configuration with one node and five indices, but only one index for all synchronous data. My problem is being able to create different indices in Elasticsearch across which I can spread the synchronous data, so I would like to know whether it is possible to create an index name with a timestamp appended to it, the way Logstash uses the timestamp from an event to derive the related Elasticsearch index name. Any ideas, suggestions, or help?
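Whatever replaces the river could compute the target index per event the way Logstash does with its logstash-%{+YYYY.MM.dd} convention. A minimal sketch (the prefix and helper name are made up):

```python
from datetime import datetime

def daily_index_name(prefix, when):
    """Derive a Logstash-style daily index name, e.g. metrics-2015.02.20."""
    return "%s-%s" % (prefix, when.strftime("%Y.%m.%d"))

# events collected on the same (UTC) day land in the same index
print(daily_index_name("metrics", datetime(2015, 2, 20, 3, 15)))
```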
Move Data from One index to Another Cluster Index
How can one move data from one index on a server to another index on another server? I know reindexing moves data into another index in the same cluster. How can it be done in Python? Ap
Re: Elasticsearch performance tuning
Don't change cache and buffer sizes unless you know what is happening; the defaults are going to be fine. How much heap did you give ES? I'm not sure you can do much about the date filter, though; maybe someone else has pointers.

On 19 February 2015 at 21:12, Deva Raj devarajcse...@gmail.com wrote: Hi Mark Walkom, I have given my logstash conf file below.

Logstash conf:

input {
  file { }
}
filter {
  mutate { gsub => ["message", "\n", ""] }
  mutate { gsub => ["message", "\t", ""] }
  multiline { pattern => "^ " what => "previous" }
  grok {
    match => [ "message", "%{TIME:log_time}\|%{WORD:Message_type}\|%{GREEDYDATA:Component}\|%{NUMBER:line_number}\| %{GREEDYDATA:log_message}" ]
    match => [ "path", "%{GREEDYDATA}/%{GREEDYDATA:loccode}/%{GREEDYDATA:_machine}\:%{DATE:logdate}.log" ]
    break_on_match => false
  }
  # To check whether location is S or L
  if [loccode] == "S" or [loccode] == "L" {
    ruby {
      code => "temp = event['_machine'].split('_')
               if !temp.nil? || !temp.empty?
                 event['_machine'] = temp[0]
               end"
    }
  }
  mutate {
    add_field => [ "event_timestamp", "%{@timestamp}" ]
    replace => [ "log_time", "%{logdate} %{log_time}" ]
    lowercase => [ "loccode" ]
    # Remove the 'logdate' field since we don't need it anymore.
    remove => [ "logdate" ]
  }
  # to get all site details (site name, city and co-ordinates)
  sitelocator { sitename => "loccode" datafile => "vendor/sitelocator/SiteDetails.csv" }
  date {
    locale => "en"
    match => [ "log_time", "yyyy-MM-dd HH:mm:ss", "MM-dd-yyyy HH:mm:ss.SSS", "ISO8601" ]
  }
}
output {
  elasticsearch { }
}

I have checked the filters step by step to find the bottleneck; the date filter below took the most time. Can you guide me on how I can tune it to be faster?

date {
  locale => "en"
  match => [ "log_time", "yyyy-MM-dd HH:mm:ss", "MM-dd-yyyy HH:mm:ss.SSS", "ISO8601" ]
}

http://serverfault.com/questions/669534/elasticsearch-performance-tuning#comment818613_669558

Thanks, Devaraj
Re: Help with 4 node cluster
You can have 4 master-eligible nodes if you want; just set discovery.zen.minimum_master_nodes to 3 to ensure a quorum. But ideally, as Christian mentioned, it's best to have an odd number of master-eligible nodes.

On 19 February 2015 at 16:51, christian.dahlqv...@elasticsearch.com wrote: Hi, you always want an odd number of master-eligible nodes (often 3), so I would therefore recommend setting three of the four nodes to be master-eligible and leaving the fourth as a pure data node. This will prevent the cluster from being partitioned into two halves with an equal number of master-eligible nodes on each side of the partition. Best regards, Christian

On Wednesday, February 18, 2015 at 11:39:54 AM UTC, sysads wrote: Hi, I need help setting up 4 Elasticsearch servers. I have installed and configured ES on all 4 nodes, but I am lost as to what the configuration in elasticsearch.yml should be if I want: all 4 nodes to be both master and data nodes; and node A to act as a primary shard while node B acts as its replica, with node C as a primary shard and node D as its replica. Thanks
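Christian's 3-plus-1 layout might look like this in elasticsearch.yml (a sketch; note that with three master-eligible nodes the quorum is two, not three):

```yaml
# on the three master-eligible nodes
node.master: true
node.data: true
discovery.zen.minimum_master_nodes: 2   # quorum of 3 master-eligible nodes

# on the fourth, data-only node:
# node.master: false
# node.data: true
```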
Disk awareness on indexing
Hi all, a few of our nodes have disks filled up to 90%. Is there any option to avoid indexing to those disks instead of relocating shards? Each shard contains roughly 600 GB of data, so relocating is costly, as ours is a very busy cluster. Thanks, Prasath Rajan
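If the nodes run a version with the disk allocation decider (available in recent 1.x releases), the disk watermarks control this. A sketch for elasticsearch.yml (values are illustrative): the low watermark stops new shards being allocated to a nearly full node, and relocation is only triggered above the high watermark, so raising the high watermark defers the expensive moves.

```yaml
cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.low: "85%"    # no new shards allocated to a node above this
cluster.routing.allocation.disk.watermark.high: "95%"   # shards relocated away only above this
```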
indexing- pls help
Hi, I am a beginner here. I just installed Elasticsearch on Windows and added the Sense extension for Google Chrome. I need to index some files that are present in a directory on my local system. I did not install any extension other than Sense. Please tell me what more I should install, and how I can add and index these files. Thanks in advance, Aiswaryalakshmi K
Re: Combining Multiple Queries with 'OR' or 'AND'
Hi, a bool query with should clauses should work:

{
  "query": {
    "bool": {
      "should": [
        { "filtered": { "query": { ... }, "filter": { ... } } },  // Query 1
        { "filtered": { "query": { ... }, "filter": { ... } } }   // Query 2
      ]
    }
  }
}

Masaru

On Fri, Feb 20, 2015 at 8:29 AM, Debashish Paul shima...@gmail.com wrote: Hi,

Question: I am trying to combine two user search queries with AND and OR operations. I am aware of combining queries where we merge filters, but I want to merge entire queries, like { {BIG Elastic Query1} AND {BIG Elastic Query2} }.

Details: for instance, say a user searches for "batman" in the movies type with filters of "Christian" and "Bale", and another query is "Dark Knight" in the tvshows type with a filter of "Christopher Nolan". I want to combine both queries so I can look for both Batman movies and Dark Knight TV shows, but not Dark Knight movies or Batman TV shows. In other words, for the given queries I just want to run Query1 OR Query2 in Elasticsearch.

Query 1:

{
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "Batman",
          "default_operator": "AND",
          "fields": [ "Movies._all" ]
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "query": {
                "filtered": {
                  "filter": {
                    "and": [
                      { "term": { "cast.firstName": "Christian" } },
                      { "term": { "cast.lastName": "Bale" } }
                    ]
                  }
                }
              }
            }
          ]
        }
      }
    }
  }
}

Query 2:

{
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "Dark Knight",
          "default_operator": "AND",
          "fields": [ "tvshows._all" ]
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "query": {
                "filtered": {
                  "filter": {
                    "and": [
                      { "term": { "director.firstName": "Christopher" } },
                      { "term": { "director.lastName": "Nolan" } }
                    ]
                  }
                }
              }
            }
          ]
        }
      }
    }
  }
}
Re: indexing- pls help
You could take a look at the FSRiver plugin; it might help you get started. https://github.com/dadoonet/fsriver David

Le 20 févr. 2015 à 06:42, aiswarya lakshmi ais.k...@gmail.com a écrit : Hi, I am a beginner here. I just installed Elasticsearch on Windows and added the Sense extension for Google Chrome. I need to index some files that are present in a directory on my local system. Please tell me what more I should install, and how I can add and index these files. Thanks in advance, Aiswaryalakshmi K
Re: Elasticsearch performance tuning
I listed the instances and their heap sizes below:

Medium instance: 3.75 GB RAM, 1 core, 4 GB SSD storage, 64-bit. Java heap size: 2 GB
R3 Large: 15.25 GB RAM, 2 cores, 32 GB SSD storage. Java heap size: 7 GB
R3 High-Memory Extra Large (r3.xlarge): 30.5 GB RAM, 4 cores. Java heap size: 15 GB

Thanks, Devaraj

On Friday, February 20, 2015 at 4:15:12 AM UTC+5:30, Mark Walkom wrote: Don't change cache and buffer sizes unless you know what is happening; the defaults are going to be fine. How much heap did you give ES? I'm not sure you can do much about the date filter, though; maybe someone else has pointers.
Re: Elasticsearch Script merge the results of two aggregations
Hi, it looks like you are using Lucene expressions [1]; see the link for their limitations: today they only support numeric values. Since the terms agg doesn't have a lang property, you probably have script.default_lang set to "expression" in elasticsearch.yml? FYI, if you put "lang": "groovy" (and if the configuration allows running dynamic Groovy scripts), your query should work. But make sure you read the release notes [2] before turning on dynamic Groovy scripting (you can use Groovy scripts without turning on dynamic scripting [3]). Masaru

[1] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-scripting.html#_lucene_expressions_scripts
[2] http://www.elasticsearch.org/blog/elasticsearch-1-4-3-and-1-3-8-released/
[3] http://www.elasticsearch.org/blog/running-groovy-scripts-without-dynamic-scripting/

On February 19, 2015 at 22:21:41, ali balci (balci.a...@gmail.com) wrote: The second error:

{ "error": "SearchPhaseExecutionException[Failed to execute phase [query_fetch], all shards failed; shardFailures {[g][test][0]: RemoteTransportException[[Mammomax][inet[/192.168.1.8:9300]][search/phase/query+fetch]]; nested: SearchParseException[[test][0]: Parse Failure [Failed to parse source]]; nested: ExpressionScriptCompilationException[Failed to parse expression: doc['brandName'].value+'|'+doc['brandLink'].value]; nested: ParseException[unexpected character ''' at position (23)]; nested: NoViableAltException; }]", "status": 400 }

2015-02-19 14:49 GMT+02:00, ali balci: I use aggregations on Elasticsearch version 1.3.8. I have been using aggregation scripts for a while; today they didn't work.
Please help, I can't find any solution. This is the mapping:

"mappings": {
  "product": {
    "properties": {
      "brandId": { "type": "integer" },
      "brandIsActive": { "type": "boolean" },
      "brandLink": { "type": "string", "index": "not_analyzed" },
      "brandName": { "type": "string", "index": "not_analyzed" }
    }
  }
}

This is my query:

POST alias-test/product/_search
{
  "query": { "match_all": {} },
  "aggs": {
    "Brand": {
      "terms": { "script": "doc['brandName'].value", "size": 0 }
    }
  }
}

This is the error:

{ "error": "SearchPhaseExecutionException[Failed to execute phase [query_fetch], all shards failed; shardFailures {[g][mizu-20150219142655][0]: RemoteTransportException[[Mammomax][inet[/172.31.37.148:9300]][search/phase/query+fetch]]; nested: SearchParseException[[mizu-20150219142655][0]: Parse Failure [Failed to parse source]]; nested: ExpressionScriptCompilationException[Field [brandName] used in expression must be numeric]; }]", "status": 400 }

The other query:

POST test/product/_search
{
  "query": { "match_all": {} },
  "aggs": {
    "Brand": {
      "terms": { "script": "doc['brandName'].value+'|'+doc['brandLink'].value", "size": 0 }
    }
  }
}
-- Best regards, ALİ BALCI
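Masaru's fix, spelled out against the failing concatenation aggregation (a sketch, assuming the configuration permits Groovy scripting):

```json
{
  "query": { "match_all": {} },
  "aggs": {
    "Brand": {
      "terms": {
        "script": "doc['brandName'].value + '|' + doc['brandLink'].value",
        "lang": "groovy",
        "size": 0
      }
    }
  }
}
```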
Re: Move Data from One index to Another Cluster Index
The Python client has a reindex helper that can do just that: supply a client instance for the source and destination clusters. http://elasticsearch-py.readthedocs.org/en/latest/helpers.html#elasticsearch.helpers.reindex Hope this helps, Honza

On Thu, Feb 19, 2015 at 11:44 PM, Amay Patil amaypati...@gmail.com wrote: How can one move data from one index on a server to another index on another server? I know reindexing moves data into another index in the same cluster. How can it be done in Python? Ap
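A sketch of the cross-cluster case with that helper (host and index names are made up, and this needs both clusters reachable, so it is untested here):

```python
from elasticsearch import Elasticsearch
from elasticsearch.helpers import reindex

# one client per cluster
source = Elasticsearch(["http://source-host:9200"])
target = Elasticsearch(["http://target-host:9200"])

# scan every document out of "old-index" on the source cluster and
# bulk-index it into "new-index" on the target cluster
reindex(source, "old-index", "new-index", target_client=target)
```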
Re: What tools to use for indexing ES in production?
Kafka is fine; you could also use Logstash to read this data and send it to ES: http://www.elasticsearch.org/guide/en/logstash/current/plugins-inputs-kafka.html. NB: this is in the 1.5 beta release, so use it with caution. Otherwise, take a look at the other Logstash inputs and see if there is something suitable you can leverage; there are also a number of official clients if you want to roll your own. There isn't one official method, but those last two are pretty common.

On 19 February 2015 at 20:07, Kevin Liu ke...@ticketfly.com wrote: Well, the sales messages are coming through Kafka. We need to extract some info from the database. We can do anything, really; I'm just not sure what the common practice is here. There seem to be so many options. What kind of questions am I not asking here?
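A sketch of the Logstash 1.5 Kafka-input route mentioned above (the topic and Zookeeper address are made up, and the parameter names are from the beta-era plugin, so double-check them against its docs):

```
input {
  kafka {
    zk_connect => "localhost:2181"
    topic_id   => "sales-events"
  }
}
output {
  elasticsearch { }
}
```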
Re: I want to implement QueryElevationComponent feature in ELasticSearch
I have implemented query elevation as a function-score-based conditional boosting plugin; see https://github.com/jprante/elasticsearch-functionscore-conditionalboost Jörg

On Thu, Feb 19, 2015 at 2:40 PM, Gurunath pai pai.gurun...@gmail.com wrote: Hi all, I want to implement the QueryElevationComponent feature in Elasticsearch. Can anyone suggest how I can go ahead with this feature in Elasticsearch? Thanks, Guru Pai.
formula or guidelines to calculate/estimate the index size
Could anyone assist me in finding a formula or guidelines to calculate/estimate the size of an index created by Elasticsearch? I found that there is a formula (an Excel sheet) for Lucene. Thanks! Gaurav
Re: [Hadoop][Spark] Exclude metadata fields from _source
Thanks for the response Costin! As you mentioned, option 1, i.e es.mapping.exclude, is more appropriate when working with JSON. Since it doesn't seem to work, I've followed your advice and raised a new issue (https://github.com/elasticsearch/elasticsearch-hadoop/issues/381) including a small test application to reproduce. I'd be happy to hear what you think of it. Thanks again, Itai On Wednesday, February 18, 2015 at 7:42:36 PM UTC+2, Costin Leau wrote: Hi Itay, Sorry I missed your email. I'm not clear from your post how your documents look like - can you post a gist somewhere with your JSON input that you are sending to Elasticsearch? Typically the metadata appear in the _source if they are declared that way. You should be able to go around this by using: 1. es.mapping.exclude - if it doesn't seem to be working 2. in case of Spark, by specifying the metadata through the `saveWithMeta` methods which allows it to stay decoupled from the object itself. Since you are using JSON likely 1 is your best shot. If it doesn't work for you can you please raise an issue with a quick/small sample to be able to reproduce it? Thanks, On Wed, Feb 18, 2015 at 10:27 AM, Itai Yaffe it...@exelate.com javascript: wrote: Hey, Has anyone experienced with such an issue? Perhaps Costin can help here? Thanks! On Thursday, February 12, 2015 at 8:27:14 AM UTC+2, Itai Yaffe wrote: Hey, I've recently started using Elasticsearch for Spark (Scala application). I've added elasticsearch-spark_2.10 version 2.1.0.BUILD-SNAPSHOT to my Spark application pom file, and used org.apache.spark.rdd.RDD[String].saveJsonToEs() to send documents to Elasticsearch. When the documents are loaded to Elasticsearch, my metadata fields (e.g id, index, etc.) are being loaded as part of the _source field. Is there a way to exclude them from the _source? 
I've tried using the new es.mapping.exclude configuration property (added in this commit https://github.com/elasticsearch/elasticsearch-hadoop/commit/aae4f0460a23bac9567ea2ad335c74245a1ba069 - that's why I needed to take the latest build rather than using version 2.1.0.Beta3), but it doesn't seem to have any effect (although I'm not sure it's even possible to exclude fields I'm using for mapping, e.g. es.mapping.id). A code snippet (I'm using a single-node Elasticsearch cluster for testing purposes and running the Spark app from my desktop):

val conf = new SparkConf()...
conf.set("es.index.auto.create", "false")
conf.set("es.nodes.discovery", "false")
conf.set("es.nodes", "XXX:9200")
conf.set("es.update.script", "XXX")
conf.set("es.update.script.params", "param1:events")
conf.set("es.update.retry.on.conflict", "2")
conf.set("es.write.operation", "upsert")
conf.set("es.input.json", "true")
val documentsRdd = ...
documentsRdd.saveJsonToEs("test/user", scala.collection.Map("es.mapping.id" -> "_id", "es.mapping.exclude" -> "_id"))

The JSON looks like this:

{ "_id": "", "_type": "user", "_index": "test", "params": { "events": [ { ... } ] } }

Thanks!
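While the connector issue is open, the same effect can be approximated client-side by stripping the metadata keys from each JSON document before handing the RDD to saveJsonToEs (and supplying the id via es.mapping.id as above). A sketch in plain Python just to show the transformation itself — the field names come from the JSON above; this is not elasticsearch-hadoop code:

```python
import json

# ES metadata keys that should not end up in _source
METADATA_FIELDS = {"_id", "_index", "_type"}

def strip_metadata(doc_json: str) -> str:
    """Return the document JSON with ES metadata keys removed."""
    doc = json.loads(doc_json)
    return json.dumps({k: v for k, v in doc.items() if k not in METADATA_FIELDS})

print(strip_metadata('{"_id": "42", "_type": "user", "_index": "test", "params": {"events": []}}'))
```

In the Spark job this would be a `documentsRdd.map(...)` applying the equivalent transformation before the save call.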
Re: Elasticsearch cluster behind Azure public load balancer
Do not run ES across regions; it's a bad idea. Use snapshot-and-restore or some other method to replicate data. On 19 February 2015 at 20:51, Subodh Patil subodh00...@gmail.com wrote: Ok, thanks Mark. So you are saying I should set node.master: true as well as node.data: true for all 3 nodes in the cluster. And will this hold true for a 4-node or even larger cluster? As of now I am making data redundant on all 3 nodes, but please advise on performance... The Azure load balancer's chances of failure are much lower than a VM's. The redundancy is required from a backup point of view. So I will have a 4-node cluster with data replicated on all 4 nodes: 3 nodes in the same region (say East US) behind the load balancer, and the 4th node in a different region (say West US), NOT behind the load balancer, just holding all the data. In case the East US data center has a problem, I can redirect all traffic to the West US data center, where the single node will always have all up-to-date data, or I can even take backups from the West US data center. Thanks, Subodh On Thursday, February 19, 2015 at 7:10:52 AM UTC+5:30, Mark Walkom wrote: Yes, the master can serve requests. You don't really want 2 masters and 1 data node though; make all 3 master+data to start with. And sure, the client can be a SPOF, but then isn't a single load balancer a SPOF as well? So the question remains: where are you happy dealing with these points? Because at some point you cannot make *everything* redundant without being excessive. On 18 February 2015 at 21:36, Subodh Patil subod...@gmail.com wrote: I am trying to set up an ES cluster behind an Azure load balancer/cloud service. The cluster is 3 nodes with no specific data/client/master node settings. By default, 2 nodes are elected as master and 1 as data node. As the requests (create/update/search) from the application come to the Azure load balancer on port 9200, which is balanced across all 3 VMs, a request can go to any VM. Will a master node be able to serve the requests?
Many articles say that you don't need a load balancer for an ES cluster, just a client node, but then that becomes a single point of failure, as an Azure VM can go down at any point in time. So load balancing is required mainly for high availability from an infrastructure point of view. Please suggest a cluster setup and which nodes (data or client) to put behind the load balancer. ES version 1.4.1 on Windows Server 2012 R2 VMs.
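For reference, the three-node master+data layout Mark describes would look like this in each node's elasticsearch.yml. The minimum_master_nodes quorum setting is an assumption on my part — it is not mentioned in the thread, but it is the usual companion to an all-master-eligible layout to avoid split-brain:

```yaml
# identical on all three nodes (ES 1.x)
node.master: true
node.data: true
# quorum of master-eligible nodes: (3 / 2) + 1 = 2
discovery.zen.minimum_master_nodes: 2
```

With a 4th node added in another region (which the advice above recommends against), the quorum would need recomputing; snapshot-and-restore to the remote region avoids that problem entirely.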
Elasticsearch Broken?
Hello, I recently started attempting to use Elasticsearch to store some data. In the learning process, I was doing ok until I attempted to use the bulk API. I used something along the lines of:

conn = ES(url, timeout, bulksize)
for each (tuple):
    data = something(tuple)
    conn.index(data, index name, count, bulk=true)

which I imagined would add a large number of items at localhost:9200/(index name)/(type name)/(count), but it ended up adding a large amount of garbage at localhost:9200/(index name). Not sure how to proceed, but knowing that I needed a reset, I went ahead and did a DELETE on localhost:9200/*, which deleted everything as expected. I then attempted to begin again by adding an index with a mapping:

curl -XPUT http://localhost:9200/(index name) -d mapping.json

mapping.json:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "ngram_analyzer": { "tokenizer": "ngram_tokenizer" }
      },
      "tokenizer": {
        "ngram_tokenizer": {
          "type": "nGram",
          "min_gram": 2,
          "max_gram": 3,
          "token_chars": [ "letter", "digit", "symbol", "punctuation", "whitespace" ]
        }
      }
    }
  },
  "mappings": {
    "access": {
      "properties": {
        "date": { "type": "date", "format": "yyyy-MM-dd", "analyzer": "ngram_analyzer" },
        "time": { "type": "date", "format": "HH:mm:ss", "analyzer": "ngram_analyzer" },
        "protocol": { "type": "string", "analyzer": "ngram_analyzer" },
        "source ip": { "type": "string", "analyzer": "ngram_analyzer" },
        "source port": { "type": "integer" },
        "country": { "type": "string", "analyzer": "ngram_analyzer" },
        "organization": { "type": "string", "analyzer": "ngram_analyzer" },
        "dest ip": { "type": "string", "analyzer": "ngram_analyzer" },
        "dest port": { "type": "integer" }
      }
    }
  }
}

This is where the troubles began. While the above command created an index at (index name), it refused to populate the index with my settings.
{
  "darknet" : {
    "aliases" : { },
    "mappings" : { },
    "settings" : {
      "index" : {
        "creation_date" : "1424425712525",
        "mapping" : { "json" : "" },
        "uuid" : "cgTEPkqnQJKejLPyHqVNYA",
        "number_of_replicas" : "1",
        "number_of_shards" : "5",
        "version" : { "created" : "1040399" }
      }
    },
    "warmers" : { }
  }
}

I attempted to update the index with curl -XPUT http://localhost:9200/(index name)/_setting -d mapping.json after closing the index, but that also left the index blank. I deleted the index, created another index, etc. etc., but to no avail. In the end, I managed to make a change to the index, though not a good one: I made a call to the update-mapping API, which changed the index into a horrifying monstrosity with repeating tags, like so:

{
  "darknet" : {
    "aliases" : { },
    "mappings" : {
      "1" : {
        "properties" : {
          "mappings" : {
            "properties" : {
              "access" : {
                "properties" : {
                  "properties" : {
                    "properties" : {
                      "country" : { "properties" : { "analyzer" : { "type" : "string" }, "type" : { "type" : "string" } } },
                      "date" : { "properties" : { "analyzer" : { "type" : "string" }, "format" : { "type" : "string" }, "type" : { "type" : "string" } } },
                      "dest ip" : { "properties" : { "analyzer" : { "type" : "string" }, "type" : { "type" : "string" } } },
                      "dest port" : { "properties" : { "type" : { "type" : "string" } } },
                      "organization" : { "properties" : { "analyzer" : { "type" : "string"
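One detail worth checking in the commands above (an observation, not a confirmed diagnosis — though the returned settings do contain a literal mapping.json entry): without a leading @, curl sends the string mapping.json itself as the request body rather than the file's contents. The file-reading form is:

```
# '@' tells curl's -d option to read the request body from the named file
curl -XPUT 'http://localhost:9200/darknet' -d @mapping.json
```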
Incorrect results from geo_shape filter
Hey Everyone, I have a nested document that includes a spatial component that I'm using in a spatial query via the geo_shape filter. I've noticed that the filter consistently returns a result that includes a geometry that clearly falls outside the spatial filter. I've tested the spatial intersection in PostGIS, and that correctly returns no result for the same query. Can anyone else verify this? We have lots of other MultiPolygon data that seems to work just fine. I'm using the REST API against ES v1.4.1.

*Here's the document I'm testing against:*

{
  "dates": [],
  "geometries": [{
    "value": {
      "type": "MultiPolygon",
      "coordinates": [
        [ [ [-118.32608, 34.07035], [-118.32657, 34.07035], [-118.32657, 34.07054], [-118.32608, 34.07054], [-118.32608, 34.07035] ] ],
        [ [ [-118.32608, 34.07021], [-118.32608, 34.07004], [-118.32657, 34.07004], [-118.32657, 34.07021], [-118.32608, 34.07021] ] ]
      ]
    },
    "label": "MULTIPOLYGON (((-118.32608 34.07035,-118.32657 34.07035,-118.32657 34.07054,-118.32608 34.07054,-118.32608 34.07035)),((-118.32608 34.07021,-118.32608 34.07004,-118.32657 34.07004,-118.32657 34.07021,-118.32608 34.07021)))",
    "child_entities": [],
    "entitytypeid": "SPATIAL_COORDINATES_GEOMETRY.E47",
    "parentid": "80446382-0e96-4db1-8a37-95aca166b785",
    "entityid": "81b06621-2989-4825-98f3-88ee2ac937b6",
    "property": "P87",
    "businesstablename": "geometries"
  }],
  "child_entities": [{
    "value": "Windsor Square District Non-Contributor",
    "label": "Windsor Square District Non-Contributor",
    "child_entities": [],
    "entitytypeid": "NAME.E41",
    "parentid": "5fbe787f-8fc3-4986-b77a-38828699b68c",
    "entityid": "62a82362-b7c3-4ad2-ba58-29bab7dc5dbb",
    "property": "P1",
    "businesstablename": "strings"
  }, {
    "value": "",
    "label": "",
    "child_entities": [],
    "entitytypeid": "PLACE.E53",
    "parentid": "5fbe787f-8fc3-4986-b77a-38828699b68c",
    "entityid": "80446382-0e96-4db1-8a37-95aca166b785",
    "property": "P53",
    "businesstablename": ""
  }],
  "label": "",
  "date_groups": [],
  "primaryname": "Windsor Square District Non-Contributor",
  "value": "",
  "entitytypeid": "HERITAGE_RESOURCE.E18",
  "domains": [{
    "conceptid": "a5675b84-fed4-4839-9afa-434be64c3899",
    "child_entities": [],
    "label": "Primary",
    "value": "b171b37b-4c78-4b51-91a9-503541511be4",
    "entitytypeid": "NAME_TYPE.E55",
    "parentid": "62a82362-b7c3-4ad2-ba58-29bab7dc5dbb",
    "entityid": "4bd86b01-a047-4e7a-953a-00d2243e39db",
    "property": "P2",
    "businesstablename": "domains"
  }],
  "entityid": "5fbe787f-8fc3-4986-b77a-38828699b68c",
  "property": "",
  "businesstablename": ""
}

*Here's the DSL:*

{
  "query": {
    "filtered": {
      "filter": {
        "and": [
          { "bool": {
              "should": [],
              "must_not": [],
              "must": [
                { "nested": {
                    "path": "geometries",
                    "query": {
                      "geo_shape": {
                        "geometries.value": {
                          "shape": { "type": "Point", "coordinates": [ -118.34194465228431, 34.06402964781424 ] }
                        }
                      }
                    }
                } }
              ]
          } }
        ]
      },
      "query": { "match_all": {} }
    }
  },
  "from": 0,
  "size": 5
}

*Here's the mapping for the index:*

"properties": {
  "businesstablename": { "index": "not_analyzed", "type": "string" },
  "child_entities": {
    "type": "nested",
    "properties": {
      "businesstablename": { "index": "not_analyzed", "type": "string" },
      "entitytypeid": { "index": "not_analyzed", "type": "string" },
      "property": { "index": "not_analyzed", "type": "string" },
      "entityid": { "index": "not_analyzed", "type": "string" },
      "label": { "index": "not_analyzed", "type": "string" },
      "value": {
        "type": "string",
        "fields": {
          "raw": { "index": "not_analyzed", "type": "string" },
          "folded": { "analyzer": "folding", "type": "string" }
        }
      },
      "parentid": { "index": "not_analyzed", "type": "string" }
    }
  },
  "entitytypeid": { "index": "not_analyzed", "type": "string" },
  "domains": {
    "type": "nested",
    "properties": {
      "businesstablename": { "index": "not_analyzed", "type": "string" },
      "entitytypeid": { "index": "not_analyzed", "type": "string" },
      "property": { "index": "not_analyzed", "type": "string"
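The geo_shape part of the mapping is cut off above, but one common cause of exactly this symptom (an assumption, not a confirmed diagnosis): geo_shape fields are indexed on a spatial grid, and with a coarse precision/tree_levels setting the whole enclosing grid cell matches, so points well outside the polygon still return hits. Declaring a finer precision on the field, along these lines:

```json
"geometries": {
  "type": "nested",
  "properties": {
    "value": { "type": "geo_shape", "tree": "quadtree", "precision": "10m" }
  }
}
```

costs index size and indexing time but tightens the match; the index has to be rebuilt for the change to take effect, since precision is fixed at index time.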
Re: Incorrect results from geo_shape filter
FYI, you can also try a search using these coordinates (which are clearly incorrect) in your filter, and it still returns a result: "coordinates": [ -119, 35 ]. On Thursday, February 19, 2015 at 3:04:29 PM UTC-8, Alexei Peters wrote: [quoted text hidden]
Java node client failing to send join request to master
Anyone have any idea where to start with this one? We're running our singular master/data node in a Docker container, trying to connect via a Java node client from different boxes. Please help!!

INFO [2015-02-20 02:55:11,411] org.elasticsearch.discovery.ec2: [localhost] failed to send join request to master [[esNode][jU4Y42dtQfCPJn1-5jFfYw][esHost][inet[/255.255.255.255]]{master=true}], reason [RemoteTransportException[[esNode][inet[/255.255.255.255:9300]][internal:discovery/zen/join]]; nested: NotSerializableTransportException[[org.elasticsearch.transport.ConnectTransportException] [localhost][inet[/255.255.255.255:9300]] connect_timeout[30s]; connection timed out: /255.255.255.255:9300; ]; ]

elasticsearch.yml:

cloud:
  aws:
    access_key: awsAccessKey
    secret_key: awsSecretKey
    region: us-west-2
discovery:
  type: ec2
  ec2:
    groups: elasticsearch
    availability_zones: us-west-2a
    tag:
      Elasticsearch: tag
network.public_host: 255.255.255.255
network.publish_host: 255.255.255.255
discovery.zen.ping.multicast.enabled: false

Java node client:

Settings settings = ImmutableSettings.settingsBuilder()
    .put("node.name", nodeName)
    .put("cloud.aws.access_key", awsAccessKey)
    .put("cloud.aws.secret_key", awsSecretKey)
    .put("cloud.aws.region", "us-west-2")
    .put("cloud.node.auto_attributes", true)
    .put("discovery.type", "ec2")
    .put("discovery.ec2.groups", "elasticsearch")
    .put("discovery.ec2.availability_zones", "us-west-2a")
    .put("discovery.ec2.tag.Elasticsearch", "devvpc")
    .put("discovery.zen.ping.multicast.enabled", false)
    .build();
this.node = nodeBuilder()
    .clusterName(clusterName)
    .settings(settings)
    .client(true)
    .node();
this.client = node.client();
Combining Multiple Queries with 'OR' or 'AND'
Hi,

Question: I am trying to combine 2 user search queries using AND and OR as operations. I am aware of combining queries where we can merge filters, but I want to merge entire queries, like { {BIG Elastic Query1} AND {BIG Elastic Query2} }.

Details: For instance, say a user performs a search for "batman" in the movies type with filters of "Christian" and "Bale", and another query for "Dark Knight" in the tvshows type with a filter of "Christopher Nolan". I want to combine both queries so I can look for both batman movies and Dark Knight tvshows, but not Dark Knight movies or batman tvshows. For example, for the given queries I just want to run Query1 OR Query2 in Elasticsearch.

Query 1:

{
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "Batman",
          "default_operator": "AND",
          "fields": [ "Movies._all" ]
        }
      },
      "filter": {
        "bool": {
          "must": [
            { "query": {
                "filtered": {
                  "filter": {
                    "and": [
                      { "term": { "cast.firstName": "Christian" } },
                      { "term": { "cast.lastName": "Bale" } }
                    ]
                  }
                }
            } }
          ]
        }
      }
    }
  }
}

Query 2:

{
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "Dark Knight",
          "default_operator": "AND",
          "fields": [ "tvshows._all" ]
        }
      },
      "filter": {
        "bool": {
          "must": [
            { "query": {
                "filtered": {
                  "filter": {
                    "and": [
                      { "term": { "director.firstName": "Christopher" } },
                      { "term": { "director.lastName": "Nolan" } }
                    ]
                  }
                }
            } }
          ]
        }
      }
    }
  }
}
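A common way to get the {Query1} OR {Query2} semantics is to wrap each complete query clause in a top-level bool query — should for OR, must for AND. A sketch only; the elided bodies are the two full filtered queries from above, moved inside unchanged:

```json
{
  "query": {
    "bool": {
      "should": [
        { "filtered": { ...Query1 body... } },
        { "filtered": { ...Query2 body... } }
      ],
      "minimum_should_match": 1
    }
  }
}
```

Swapping should/minimum_should_match for a must array gives the AND combination instead.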
Search multiple indices in Kibana 4?
Hi all, Trying out Kibana 4's new release today, and I was wondering if this is still possible. In Kibana 3, you could simply comma-delimit all the index patterns you wanted to expose to your searches, but that doesn't seem possible in Kibana 4. I have indexes named like: company-customer-YYYYMMDD. I'd like to be able to search across company and customer if possible. It appears that the Configure an Index Pattern page allows for a single pattern, not multiples separated by commas. If I wanted to search these two indexes: companyA-customerA-YYYYMMDD and companyA-customerB-YYYYMMDD, but not companyA-customerC-YYYYMMDD — is that possible? Thanks! Chris