Term Filter by document

2014-03-14 Thread Allon
When I read the case for using a terms filter fed from a field of a document, it 
made perfect sense: it would be completely unwieldy to fetch over the wire, and 
then submit back over the wire, what could potentially be tens or hundreds of 
thousands of values.  But when I read that the solution can only point to the 
values stored within a single document, it didn't make sense to me how you could 
create or manage a document of that size to begin with.

Effectively, if you want to follow the tweets of your followers, all of 
your followers need to be stored on the same user document.  Eventually 
your list of followers could grow quite large, but to add or remove followers 
you need to fetch and update a document with potentially hundreds of 
thousands of followers over the wire anyway, right?  So I don't understand 
the value.

Could someone explain it to me because I would really like to filter 
documents that relate to a large set of changing values.  It seems like 
this filter could only be used against types that don't change frequently.

Thanks,

-Allon
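
[Editor's note] For reference, a sketch of the terms filter with document lookup that the question refers to, as documented for ES 1.x; the index, type, id, and field names here are illustrative, not from the thread:

```json
{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "terms": {
          "author_id": {
            "index": "users",
            "type": "user",
            "id": "2",
            "path": "followers"
          }
        }
      }
    }
  }
}
```

The filter fetches the `followers` array from document `2` in the `users` index server-side and uses those values as the terms, which is what avoids shipping the value list over the wire.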

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/a24fafe3-cb3e-43ef-af86-f0e06d308f9a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: ES 1.0.1 snapshot creation fails

2014-03-14 Thread David Pilato
The user who is running elasticsearch needs to have full control of your 
/data/backup_es

Is that the case?

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


On 14 March 2014 at 06:48, Yatish Teddla yat...@buzzinga.io wrote:

Hi Everyone,

For the backup directory I used a directory mounted from another machine using 
sshfs, and changed that directory's owner and group to elasticsearch.

When I try to create a snapshot, I get the error below.

{"error":"SnapshotCreationException[[backup:snapshot_6] failed to create 
snapshot]; nested: FileNotFoundException[/data/backup_es/snapshot-snapshot_6 
(Permission denied)]; ","status":500}

Any idea why I am getting this error, or how I can use the mounted directory as 
backup space?
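
[Editor's note] For context, an fs snapshot repository like the one named in the error is registered roughly like this in ES 1.x (the repository name and path match the error above; treat this as a sketch):

```json
PUT /_snapshot/backup
{
  "type": "fs",
  "settings": {
    "location": "/data/backup_es"
  }
}
```

Snapshot files are then written by the process's elasticsearch user, so that user needs write access to the location on every node, which is the point pursued in the replies below.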



Re: ES 1.0.1 snapshot creation fails

2014-03-14 Thread Yatish Teddla
Hi David,

I have given full access permissions to that directory (777 permissions),
but I am still getting that error.




Re: Adding new nodes to cluster with unicast without restarting

2014-03-14 Thread Hari Prasad
Hi
Can you please explain the following.
I first started one node with only its own IP in the unicast host list and 
multicast discovery set to false. The cluster started with one node.
Then I started another node with node one as its unicast host value. This 
node also joined the cluster.

This is in sync with what you said. But if this is the case, then what is the 
use of setting multicast to false and setting the unicast host list?
Can't any node join the cluster just like that, by pointing its unicast setting 
at any node in the cluster? How can this be limited?
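
[Editor's note] For reference, the settings under discussion look roughly like this in elasticsearch.yml (host addresses are illustrative):

```yaml
# Don't discover nodes via multicast.
discovery.zen.ping.multicast.enabled: false
# Hosts this node pings at startup to find the cluster.
discovery.zen.ping.unicast.hosts: ["10.0.0.1:9300", "10.0.0.2:9300"]
```

Note that this list only controls which hosts a *starting* node pings; it is not an allow-list of who may join, which is exactly the concern raised here.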

On Thursday, 13 March 2014 19:46:08 UTC+5:30, Hari Prasad wrote:

 Ok Thank you :)

 On Thursday, 13 March 2014 19:35:52 UTC+5:30, David Pilato wrote:

 yes

 -- 
 David Pilato | Technical Advocate | Elasticsearch.com
 @dadoonet | @elasticsearchfr


 On 13 March 2014 at 15:00:10, Hari Prasad (iamha...@gmail.com) wrote:

 Is this the case even if discovery.zen.ping.multicast.enabled is 
 false?

 On Thursday, 13 March 2014 19:17:19 UTC+5:30, David Pilato wrote: 

 Yes. Just launch the new node and set its unicast values to other 
 running nodes.
 It will connect to the cluster, and the cluster will add it as a new 
 node.

 You don't have to modify the existing nodes' settings, although you should 
 update them so they are current in case of a restart.

 --
 David ;-)
 Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

  
 On 13 March 2014 at 14:38, Hari Prasad iamha...@gmail.com wrote:

 Hi,
 I have an Elasticsearch cluster and use unicast to discover 
 nodes. Can I add nodes to the list dynamically without restarting the cluster? 
 I tried to do this with prepareUpdateSettings, but I got "ignoring 
 transient setting [discovery.zen.ping.unicast.hosts], not dynamically 
 updateable".
 Is there any other way to do this without restarting the cluster?

 I am not going for multicast because I don't want rogue nodes to join my 
 cluster. I can go for it if I can, in any way, limit which nodes join the 
 cluster, other than by the cluster name.
 Are there any ways to do this?

 Thanks 
 Hari





Re: ES 1.0.1 snapshot creation fails

2014-03-14 Thread David Pilato
Can you run mkdir /data/backup_es/snapshot-snapshot_6 from the ES machine as 
the elasticsearch user?

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs





Re: Adding new nodes to cluster with unicast without restarting

2014-03-14 Thread David Pilato
If the cluster name is different, the new node won't join the cluster.
Also, if you have a security concern, you should restrict network access to 
your nodes at the transport layer (port 9300). 
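
[Editor's note] One way to restrict the transport port, sketched with iptables (the subnet is illustrative and this is not a complete firewall policy):

```shell
# Accept transport traffic (9300-9400) only from the cluster's subnet, drop the rest.
iptables -A INPUT -p tcp --dport 9300:9400 -s 10.0.0.0/24 -j ACCEPT
iptables -A INPUT -p tcp --dport 9300:9400 -j DROP
```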



--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs





Re: Adding new nodes to cluster with unicast without restarting

2014-03-14 Thread Hari Prasad

Is the cluster name the only way to restrict nodes from joining the 
cluster, whether unicast or multicast?

I do have a security concern, and my question is whether there is a way to 
address it from Elasticsearch itself rather than at the network layer.



Re: Adding new nodes to cluster with unicast without restarting

2014-03-14 Thread David Pilato
AFAIK no, but I might be wrong.

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs




JVM Memory Usage

2014-03-14 Thread Umutcan

Hi,

I have a question about memory usage.

My cluster has one combined master/data node and three data nodes, each with a 
6 GB heap (roughly half of each machine's RAM). I have 250 shards including 
replicas, and 600 GB of data in total. When I start the cluster, I can use it 
for a week without any problem. After a week, my cluster begins to fail due to 
low memory (below 10% free). When I restart all the nodes, everything is fine 
again and free memory goes back up to 40%. Then it fails again one week after 
the restart.


I think some data remains in memory for a long time even if it is not used. 
Is there any configuration to optimize this? Do I need to flush indices or 
clear caches periodically?
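
[Editor's note] If the memory is being held by the field data cache, which is unbounded by default in ES 1.x, one commonly discussed bound in elasticsearch.yml is sketched below; the value is illustrative, not a tuning recommendation:

```yaml
# Evict least-recently-used field data once the cache exceeds this share of the heap.
indices.fielddata.cache.size: 40%
```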


Thanks,
Umutcan



Re: Mapping Attachment plugin Installation/Debian

2014-03-14 Thread Alexander Reelsen
Hey,

the plugin command is in '/usr/share/elasticsearch/bin' - you can find out
by executing 'dpkg -L elasticsearch'
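
[Editor's note] Assuming the standard mapper-attachments plugin (the plugin name and version below are assumptions; check the plugin's README for the release matching your ES version), the install on a Debian package layout would look roughly like:

```shell
cd /usr/share/elasticsearch
# Plugin coordinates are an assumption; verify against the mapper-attachments README.
sudo bin/plugin --install elasticsearch/elasticsearch-mapper-attachments/2.0.0
```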


--Alex


On Thu, Mar 13, 2014 at 4:58 PM, ZenMaster80 sabdall...@gmail.com wrote:

 I am having trouble finding how to install the above plugin. I installed
 Elasticsearch with Debian.
 Typically, on my local Linux machine I did /bin/plugin , but I am not
 sure where bin/plugin goes with the Debian installation?

 Thanks



aggregations: How to get only the entries with latest status 'nok' and nothing else

2014-03-14 Thread Sven Beauprez



Suppose I have the following mapping for documents:
_timestamp: ES timestamp enabled
mod_id: string (a unique ID for a module, not the same as the _id field from ES)
status_code: integer (similar to HTTP status codes, where 200 is ok and 
everything else is nok)

With the following aggregation, I get for every module (bucket) an aggregation 
of its status codes, with the most recently submitted status code on top:

   "aggs": {
      "by_module": {
         "terms": { "field": "mod_id" },
         "aggs": {
            "by_status": {
               "terms": {
                  "field": "status_code",
                  "order": { "max_time": "desc" }
               },
               "aggs": {
                  "max_time": {
                     "max": { "field": "_timestamp" }
                  }
               }
            }
         }
      }
   }

   
result:
   "aggregations": {
      "by_module": {
         "buckets": [
            {
               "key": "ModuleUniqueID12",
               "doc_count": 4,
               "by_status": {
                  "buckets": [
                     {
                        "key": 503,
                        "doc_count": 2,
                        "max_time": { "value": 1394750966731 }
                     },
                     {
                        "key": 200,
                        "doc_count": 2,
                        "max_time": { "value": 1394745749862 }
                     }
                  ]
               }
            },
            {
               "key": "ModuleUniqueID1",
               "doc_count": 2,
               "by_status": {
                  "buckets": [
                     {
                        "key": 200,
                        "doc_count": 2,
                        "max_time": { "value": 1394729958485 }
                     }
                  ]
               }
            },
            ... // and so on
         ]
      }
   }


What I want now is only the documents where the latest (this is the hard 
part) entry for a module contains a status_code that is not ok; i.e. from 
the above result set I would only get the document with mod_id 
"ModuleUniqueID12", because the latest entry added to ES has a status_code 
of 503.

Can this be filtered in combination with the 'max_time' aggregation metric, 
for example? Any other ways? How would I use the 'max_time' metric in a script?

thnx!

Sven
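
[Editor's note] I am not aware of a way in ES of this era to filter buckets on a sibling metric such as 'max_time', so one workaround is to post-process the response client side. A sketch in Python, assuming the response has the shape shown above (the function and variable names are my own):

```python
def modules_with_latest_nok(agg_response):
    """Return the mod_ids whose most recent status is not 200.

    Assumes by_status buckets are ordered by max_time descending, as in
    the aggregation above, so bucket 0 always holds the latest status.
    """
    failing = []
    for module in agg_response["aggregations"]["by_module"]["buckets"]:
        status_buckets = module["by_status"]["buckets"]
        if status_buckets and status_buckets[0]["key"] != 200:
            failing.append(module["key"])
    return failing


# Sample shaped like the result shown in this thread.
sample = {
    "aggregations": {"by_module": {"buckets": [
        {"key": "ModuleUniqueID12", "doc_count": 4, "by_status": {"buckets": [
            {"key": 503, "doc_count": 2, "max_time": {"value": 1394750966731}},
            {"key": 200, "doc_count": 2, "max_time": {"value": 1394745749862}},
        ]}},
        {"key": "ModuleUniqueID1", "doc_count": 2, "by_status": {"buckets": [
            {"key": 200, "doc_count": 2, "max_time": {"value": 1394729958485}},
        ]}},
    ]}}
}

print(modules_with_latest_nok(sample))  # → ['ModuleUniqueID12']
```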




OutOfMemoryError OOM while indexing Documents

2014-03-14 Thread Alexander Ott
Hi,

we always run into an OutOfMemoryError while indexing documents, or shortly 
afterwards.
We only have one instance of elasticsearch version 1.0.1 (no cluster).

Index information:
size: 203G (203G)
docs: 237.354.313 (237.354.313)

Our JVM settings as following:

/usr/lib/jvm/java-7-oracle/bin/java -Xms16g -Xmx16g -Xss256k -Djava.awt.
headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:
CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+
HeapDumpOnOutOfMemoryError -Delasticsearch -Des.pidfile=/var/run/
elasticsearch.pid -Des.path.home=/usr/share/elasticsearch -cp :/usr/share/
elasticsearch/lib/elasticsearch-1.0.1.jar:/usr/share/elasticsearch/lib/*:/usr/share/elasticsearch/lib/sigar/*
 
-Des.default.config=/etc/elasticsearch/elasticsearch.yml 
-Des.default.path.home=/usr/share/elasticsearch 
-Des.default.path.logs=/var/log/elasticsearch 
-Des.default.path.data=/var/lib/elasticsearch 
-Des.default.path.work=/tmp/elasticsearch 
-Des.default.path.conf=/etc/elasticsearch 
org.elasticsearch.bootstrap.Elasticsearch


OutOfMemoryError:
[2014-03-12 01:27:27,964][INFO ][monitor.jvm  ] [Stiletto] 
[gc][old][32451][309] duration [5.1s], collections [1]/[5.9s], total 
[5.1s]/[3.1m], memory [15.8gb]->[15.7gb]/[15.9gb], all_pools {[young] 
[665.6mb]->[583.7mb]/[665.6mb]}{[survivor] [32.9mb]->[0b]/[83.1mb]}{[old] 
[15.1gb]->[15.1gb]/[15.1gb]}
[2014-03-12 01:28:23,822][INFO ][monitor.jvm  ] [Stiletto] 
[gc][old][32466][322] duration [5s], collections [1]/[5.9s], total 
[5s]/[3.8m], memory [15.8gb]->[15.8gb]/[15.9gb], all_pools {[young] 
[652.5mb]->[663.8mb]/[665.6mb]}{[survivor] [0b]->[0b]/[83.1mb]}{[old] 
[15.1gb]->[15.1gb]/[15.1gb]}
[2014-03-12 01:33:29,814][WARN ][index.merge.scheduler] [Stiletto] 
[myIndex][0] failed to merge
java.lang.OutOfMemoryError: Java heap space
at 
org.apache.lucene.util.fst.BytesStore.writeByte(BytesStore.java:83)
at org.apache.lucene.util.fst.FST.init(FST.java:282)
at org.apache.lucene.util.fst.Builder.init(Builder.java:163)
at 
org.apache.lucene.codecs.BlockTreeTermsWriter$PendingBlock.compileIndex(BlockTreeTermsWriter.java:420)
at 
org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter.writeBlocks(BlockTreeTermsWriter.java:569)
at 
org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter$FindBlocks.freeze(BlockTreeTermsWriter.java:544)
at org.apache.lucene.util.fst.Builder.freezeTail(Builder.java:214)
at org.apache.lucene.util.fst.Builder.add(Builder.java:394)
at 
org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter.finishTerm(BlockTreeTermsWriter.java:1000)
at 
org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:166)
at 
org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:72)
at 
org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:383)
at 
org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:106)
at 
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4071)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3668)
at 
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
at 
org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:107)
at 
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)

We also increased the heap to 32g, but with the same result:
[2014-03-12 22:39:53,817][INFO ][monitor.jvm  ] [Charcoal] 
[gc][old][32895][86] duration [6.9s], collections [1]/[7.3s], total 
[6.9s]/[19.6s], memory [20.5gb]->[12.7gb]/[31.9gb], all_pools {[young] 
[654.9mb]->[1.9mb]/[665.6mb]}{[survivor] [83.1mb]->[0b]/[83.1mb]}{[old] 
[19.8gb]->[12.7gb]/[31.1gb]}
[2014-03-12 23:11:07,015][INFO ][monitor.jvm  ] [Charcoal] 
[gc][old][34750][166] duration [8s], collections [1]/[8.6s], total 
[8s]/[29.1s], memory [30.9gb]->[30.9gb]/[31.9gb], all_pools {[young] 
[660.6mb]->[1mb]/[665.6mb]}{[survivor] [83.1mb]->[0b]/[83.1mb]}{[old] 
[30.2gb]->[30.9gb]/[31.1gb]}
[2014-03-12 23:12:18,117][INFO ][monitor.jvm  ] [Charcoal] 
[gc][old][34812][182] duration [7.1s], collections [1]/[8.1s], total 
[7.1s]/[36.6s], memory [31.5gb]->[31.5gb]/[31.9gb], all_pools {[young] 
[655.6mb]->[410.3mb]/[665.6mb]}{[survivor] [0b]->[0b]/[83.1mb]}{[old] 
[30.9gb]->[31.1gb]/[31.1gb]}
[2014-03-12 23:12:56,294][INFO ][monitor.jvm  ] [Charcoal] 
[gc][old][34844][193] duration [7.1s], collections [1]/[7.1s], total 
[7.1s]/[43.9s], memory [31.9gb]->[31.9gb]/[31.9gb], all_pools {[young] 
[665.6mb]->[665.2mb]/[665.6mb]}{[survivor] 
[81.9mb]->[82.8mb]/[83.1mb]}{[old] [31.1gb]->[31.1gb]/[31.1gb]}
[2014-03-12 23:13:11,836][WARN ][index.merge.scheduler] [Charcoal] 
[myIndex][3] failed to merge
java.lang.OutOfMemoryError: Java heap space
at 

Re: (ES 0.90.1) Cannot connect to elasticsearch cluster after a node is removed

2014-03-14 Thread Hui
Sorry All.

I've verified that the slowness is related to the reason 
"failed to ping, tried [3] times, each with maximum [30s] timeout".

Thanks.


On Friday, March 14, 2014 12:02:36 PM UTC+8, Hui wrote:

 Hi All,

 After testing in another cluster, I found that the cluster can be 
 connected but it was very slow.

 At this moment, every normal request (~50ms) takes 41732ms to 85984ms 
 while the cluster is in Yellow health and there are no unassigned shards.

 It becomes 50ms again after the problem node re-joins.

 There is no exception log in the master node. 

 Thanks




Re: aggregations: How to get a only the entries with latest status 'nok' and nothing else

2014-03-14 Thread Sven Beauprez

It might be related to github.com/elasticsearch/elasticsearch/issues/4404, 
but it seems that is not yet implemented. So a solution that works 
with the 1.0 version would be welcome.

regards,

Sven


On Friday, March 14, 2014 9:50:57 AM UTC+1, Sven Beauprez wrote:










Re: Low priority queries or query throttling?

2014-03-14 Thread Clinton Gormley
Adding to what Zach said, I'd also be interested in looking at what causes
these queries to be so slow. Potentially their performance could be greatly
improved.

clint


On 14 March 2014 01:29, Zachary Tong zacharyjt...@gmail.com wrote:

 What's the nature of the queries?  There may be some optimizations that
 can be made.

 How much memory is on the box total?

 I would not recommend G1 GC.  It is promising but we still see bug reports
 where G1 just straight up crashes.  For now, the official ES recommendation
 is still CMS.  FWIW, G1 will use more CPU than CMS by definition, because
 of the way G1 operates (e.g. shorter pauses at the cost of more CPU).  That
 could partially explain your increased load.

 There is currently no way to give a priority to queries, although I agree
 that would be very nice.  There are some tricks you can do to control where
 queries go using search preferences (see
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-preference.html).
  For example, you could send all slow queries to a single node, and send
 all other queries to the rest of your cluster.  That would effectively
 bottleneck the slow queries, assuming the slow node has all the shards
 required to execute the query.

 Similarly, you can use Allocation Awareness and Forced Zones to control
 which indices end up on which shards, etc.
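To make the preference trick above concrete, here is a minimal Python sketch (not from the thread) that routes slow queries to one designated node via the search 'preference' parameter. The node id and index name are made-up placeholders:

```python
import json

# Hypothetical node id, e.g. obtained from GET /_nodes; "_only_node:<id>"
# pins a search to that node, so slow queries cannot load the whole cluster.
SLOW_NODE_ID = "xyz"

def build_search_request(index, body, slow=False):
    """Return (path, params, payload) for a search request, sending
    slow queries to the designated node via the 'preference' parameter."""
    params = {}
    if slow:
        params["preference"] = "_only_node:" + SLOW_NODE_ID
    return "/%s/_search" % index, params, json.dumps(body)

path, params, payload = build_search_request(
    "myindex", {"query": {"match_all": {}}}, slow=True)
print(path, params)
```

With shard allocation configured so the designated node holds a copy of every shard, the slow queries are effectively bottlenecked on that one machine.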




 On Thursday, March 13, 2014 5:15:58 PM UTC-4, Peter Wright wrote:

 Hi,
 I am currently having trouble with fairly slow and intensive queries
 causing excessive load on my elasticsearch cluster and I would like to
 know people's opinions on ways to mitigate or prevent that excessive load.

 We attempt about 50 of these slow queries per second, and they take an
 average of 300ms, which adds up to more than we can process, causing
 excessive load and sometimes causing elasticsearch to become
 non-responsive.

 The slow queries are all low priority, and we have other, high priority
 queries running on the index. Slow queries could take 10 seconds to process
 for all we care, and we'd rather have them fail than cause excessive load
 on the cluster.

 Is there a way to give these queries a lower priority and to force them
 to use no more than a certain percentage of the cluster's resources? Or is
 it possible to refuse certain types of queries if elasticsearch is under
 excessive load?

 I am also curious if people have thoughts on what could improve the
 throughput of these queries based on the information given below. I can
 give more details about the structure of the queries themselves if
 necessary.



 The cluster is made of two machines (both have 16 CPU cores and let
 elasticsearch use 15G of memory) running ES 0.90.12 with the G1 garbage
 collector. I began experiencing higher load with this setup than before
 when I upgraded to 0.90.12 from 0.90.0 beta (which was using the CMS
 collector with the default settings), however, several other changes were
 made at the same time and it isn't yet fully clear whether either the
 change of version or the change of GC is responsible, or whether it was
 simply a coincidence. Some thoughts on that would be appreciated.

 The index has 5 shards and one replica (both machines each have a
 version of all shards), is a couple gigs in size and contains a couple
 million documents.

 Thanks!





Upload/index document to Elastic Search

2014-03-14 Thread Sandeep
How can I upload/index PDF/HTML/XML format documents to Elastic Search.



--
View this message in context: 
http://elasticsearch-users.115913.n3.nabble.com/Upload-index-document-to-Elastic-Search-tp4051542.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.



[ES - 1.0.0] Trouble doing a split match with edgeNgram

2014-03-14 Thread عمر
Hi,

I am trying to query for "fit pro" against my index, where there is a 
document with "fitpro" in one of its fields. I decided to take the following 
approach:

- Created a reverse edgeNgram analyzer
- Used that analyzer only as the index_analyzer
- Boosted on match prefix and match (edgeNgram field)

I am getting the expected result only a couple of hits down the list, and 
playing with the weights is not changing much :/

Any help would be much appreciated.


Index mapping is -


'settings': {
    'number_of_shards': 1,
    'number_of_replicas': 0,
    'analysis': {
        'filter': {
            'filter_edgeNgram': {
                'type': 'edgeNGram',
                'min_gram': 3,
                'max_gram': 6,
                'token_chars': ['letter', 'digit']
            }
        },
        'analyzer': {
            'analyzer_ngram_rev': {
                'type': 'custom',
                'tokenizer': 'standard',
                'filter': ['standard', 'lowercase', 'asciifolding', 'reverse',
                           'filter_edgeNgram', 'reverse']
            },
            'analyzer_stemmed': {
                'tokenizer': 'standard',
                'filter': ['standard', 'lowercase', 'asciifolding', 'kstem']
            }
        }
    }
},
ignore=400



Query is -



{
  "from" : 0,
  "size" : 60,
  "query" : {
    "function_score" : {
      "query" : {
        "filtered" : {
          "query" : {
            "bool" : {
              "should" : [ {
                "query_string" : {
                  "query" : "fit~1 pro~1 ",
                  "minimum_should_match" : "60%"
                }
              }, {
                "match" : {
                  "name" : {
                    "query" : "fit",
                    "type" : "phrase_prefix"
                  }
                }
              }, {
                "match" : {
                  "name_ngram_fwd" : {
                    "query" : "fit pro",
                    "type" : "boolean"
                  }
                }
              }, {
                "match" : {
                  "name_stemmed" : {
                    "query" : "fit pro",
                    "type" : "boolean"
                  }
                }
              } ]
            }
          },
          "filter" : {
            "geo_distance" : {
              "location" : [ 41.880001068115234, -87.62000274658203 ],
              "distance" : "16km"
            }
          }
        }
      },
      "functions" : [ {
        "filter" : {
          "geo_distance" : {
            "location" : [ 41.880001068115234, -87.62000274658203 ],
            "distance" : "16km"
          }
        },
        "boost_factor" : 2.0
      }, {
        "filter" : {
          "query" : {
            "query_string" : {
              "query" : "fit~1 pro~1 ",
              "minimum_should_match" : "60%"
            }
          }
        },
        "boost_factor" : 1.0
      }, {
        "filter" : {
          "query" : {
            "match" : {
              "name" : {
                "query" : "fit",
                "type" : "phrase_prefix"
              }
            }
          }
        },
        "boost_factor" : 4.0
      }, {
        "filter" : {
          "query" : {
            "match" : {
              "name_ngram_rev" : {
                "query" : "fit pro"
              }
            }
          }
        },
        "boost_factor" : 6.0
      }, {
        "filter" : {
          "query" : {
            "query_string" : {
              "query" : "fit pro"
            }
          }
        },
        "boost_factor" : 2.0
      } ],
      "score_mode" : "multiply"
    }
  },
  "fields" : "_source",
  "script_fields" : {
    "distance" : {
      "script" : "doc['location'].distanceInKm(41.880001,-87.620003)"
    }
  }
}




Elasticsearch memory usage

2014-03-14 Thread codebird
Hello,

I have been using elasticsearch on an Ubuntu server for a year now, and
everything was going great. I had an index of 150,000,000 entries of domain
names, running small queries on it, just filtering by one term: no sorting, no
wildcards, nothing. Now we have moved servers; I now have a CentOS 6 server with
32GB of RAM running elasticsearch, but now we have 2 indices of about 150
million entries each, with 32 shards, still running the same queries on them;
nothing changed in the queries. But since we went online with the new
server, I have to restart elasticsearch every 2 hours before the OOM killer
kills it.

What's happening is that elasticsearch uses memory until about 50%, then it
gradually goes back down to about 30%, then starts to go up again gradually
and never goes back down.

I have tried all the solutions I found on the net; I am a developer, not a
server admin.

I have these setting in my service wrapper configuration

set.default.ES_HOME=/home/elasticsearch
set.default.ES_HEAP_SIZE=8192
set.default.MAX_OPEN_FILES=65535
set.default.MAX_LOCKED_MEMORY=10240
set.default.CONF_DIR=/home/elasticsearch/conf
set.default.WORK_DIR=/home/elasticsearch/tmp
set.default.DIRECT_SIZE=4g

# Java Additional Parameters
wrapper.java.additional.1=-Delasticsearch-service
wrapper.java.additional.2=-Des.path.home=%ES_HOME%
wrapper.java.additional.3=-Xss256k
wrapper.java.additional.4=-XX:+UseParNewGC
wrapper.java.additional.5=-XX:+UseConcMarkSweepGC
wrapper.java.additional.6=-XX:CMSInitiatingOccupancyFraction=75
wrapper.java.additional.7=-XX:+UseCMSInitiatingOccupancyOnly
wrapper.java.additional.8=-XX:+HeapDumpOnOutOfMemoryError
wrapper.java.additional.9=-Djava.awt.headless=true
wrapper.java.additional.10=-XX:MinHeapFreeRatio=40
wrapper.java.additional.11=-XX:MaxHeapFreeRatio=70
wrapper.java.additional.12=-XX:CMSInitiatingOccupancyFraction=75
wrapper.java.additional.13=-XX:+UseCMSInitiatingOccupancyOnly
wrapper.java.additional.15=-XX:MaxDirectMemorySize=4g
# Initial Java Heap Size (in MB)
wrapper.java.initmemory=%ES_HEAP_SIZE%

And these in elasticsearch.yml
ES_MIN_MEM: 5g
ES_MAX_MEM: 5g
#index.store.type=mmapfs
index.cache.field.type: soft
index.cache.field.max_size: 1
index.cache.field.expire: 10m
index.term_index_interval: 256
index.term_index_divisor: 5

java version:
java version 1.7.0_51
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)

Elasticsearch version
 "version" : {
    "number" : "1.0.0",
    "build_hash" : "a46900e9c72c0a623d71b54016357d5f94c8ea32",
    "build_timestamp" : "2014-02-12T16:18:34Z",
    "build_snapshot" : false,
    "lucene_version" : "4.6"
  }

Using elastica PHP


I have tried playing with values up and down to try to make it work, but
nothing is changing.  

Please any help would be highly appreciated. 



--
View this message in context: 
http://elasticsearch-users.115913.n3.nabble.com/Elasticsearch-memory-usage-tp4051793.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.



Re: Upload/index document to Elastic Search

2014-03-14 Thread David Pilato
If you are a Java developer, you could use Apache Tika.
This is what I'm doing here: 
https://github.com/dadoonet/fsriver/blob/master/src/main/java/fr/pilato/elasticsearch/river/fs/river/FsRiver.java#L689
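For non-Java stacks, the extract-then-index idea can be sketched without Tika. The following Python sketch reduces HTML to plain text with the standard library; the document shape is made up, and a real setup would then POST the result to Elasticsearch:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the text content of an HTML document, ignoring tags."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

    def text(self):
        return " ".join(c.strip() for c in self.chunks if c.strip())

def html_to_doc(html):
    # Hypothetical document shape; the plain text would then be indexed,
    # e.g. PUT /docs/doc/1 {"content": ...}
    p = TextExtractor()
    p.feed(html)
    return {"content": p.text()}

print(html_to_doc("<html><body><h1>Hello</h1><p>world</p></body></html>"))
# → {'content': 'Hello world'}
```

PDF would still need a dedicated extractor (Tika, or the mapper-attachments plugin, which wraps Tika server-side).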


-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 14 March 2014 at 11:35:26, Sandeep (sandeep.test...@gmail.com) wrote:

How can I upload/index PDF/HTML/XML format documents to Elastic Search.  



--  
View this message in context: 
http://elasticsearch-users.115913.n3.nabble.com/Upload-index-document-to-Elastic-Search-tp4051542.html
  
Sent from the ElasticSearch Users mailing list archive at Nabble.com.  




Re: Elasticsearch memory usage

2014-03-14 Thread Mark Walkom
Your config is a bit of a mess unfortunately; you're setting different
values for Xms and Xmx in a few places, which makes it hard to know what is
applied.
Can you run "ps -ef | grep elasticsearch" and post the output? That will
clarify the memory settings used, at least.

That aside you really want to set 50% of your system ram to ES, if the host
is dedicated to ES.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 13 March 2014 22:50, codebird mallah.hic...@gmail.com wrote:





max_score is not coming for query with filter search

2014-03-14 Thread Subhadip Bagui
Hi,

I'm running the query below to fetch data by date, but in the result the 
score is coming back as null. How do I get the score here?

curl -XPOST 10.203.251.142:9200/aricloud/_search -d
'{
    "query": {
        "filtered": {
            "query": {
                "match": {
                    "CLOUD_TYPE": "AWS-EC2"
                }
            },
            "filter": {
                "range": {
                    "NODE_CREATE_TIME": {
                        "to": "2014-03-14 18:43:55",
                        "from": "2014-03-14 16:22:32"
                    }
                }
            }
        }
    },
    "sort": { "NODE_ID": "desc" },
    "from": 0,
    "size": 3
}'
==
{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 3,
        "successful": 3,
        "failed": 0
    },
    "hits": {
        "total": 5,
        "max_score": null,
        "hits": [
            {
                "_index": "aricloud",
                "_type": "nodes",
                "_id": "4",
                "_score": null,
                "_source": {
                    "NODE_ID": 12334,
                    "CLOUD_TYPE": "AWS-EC2",
                    "NODE_GROUP_NAME": "DATABASE",
                    "NODE_CPU": "5GHZ",
                    "NODE_HOSTNAME": "virtualnode.aricent.com",
                    "NODE_NAME": "aws-node4",
                    "NODE_PRIVATE_IP_ADDRESS": "10.123.124.126",
                    "NODE_PUBLIC_IP_ADDRESS": "125.31.108.72",
                    "NODE_INSTANCE_ID": "asw126",
                    "NODE_STATUS": "STOPPED",
                    "NODE_CATEGORY_ID": 14,
                    "NODE_CREATE_TIME": "2014-03-14 16:35:35"
                },
                "sort": [
                    12334
                ]
            }
        ]
    }
}



elasticsearch java interaction

2014-03-14 Thread Venu Krishna
Hi,
   I am Y. Venu and I am totally new to elasticsearch. I am trying 
to communicate between Java and elasticsearch, and I have gone through the 
elasticsearch Java APIs.

First I came across the Maven repository.
I created a pom.xml in my Eclipse project, and in the dependency tag I 
just placed the code that I found in the Maven repository,

i.e.


<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>${es.version}</version>
</dependency>

After that I created one class with a main method, and I copied in the 
code that I found in the client API of elasticsearch, i.e. 
TransportClient.

main()
{
    Client client = new TransportClient()
            .addTransportAddress(new InetSocketTransportAddress("host1", 9200))
            .addTransportAddress(new InetSocketTransportAddress("host2", 9200));

    // on shutdown

    client.close();

    Settings settings = ImmutableSettings.settingsBuilder()
            .put("client.transport.sniff", true).build();
    TransportClient client1 = new TransportClient(settings);
}

After running this Java application, I am getting errors like this:



In Main Method
Mar 14, 2014 6:05:24 PM org.elasticsearch.node
INFO: [Mister Machine] {elasticsearch/0.16.1}[11016]: initializing ...
Mar 14, 2014 6:05:24 PM org.elasticsearch.plugins
INFO: [Mister Machine] loaded []
org.elasticsearch.common.inject.internal.ComputationException: 
org.elasticsearch.common.inject.internal.ComputationException: 
java.lang.NoClassDefFoundError: Lorg/apache/lucene/store/Lock;
at 
org.elasticsearch.common.inject.internal.MapMaker$StrategyImpl.compute(MapMaker.java:553)
at 
org.elasticsearch.common.inject.internal.MapMaker$StrategyImpl.compute(MapMaker.java:419)
at 
org.elasticsearch.common.inject.internal.CustomConcurrentHashMap$ComputingImpl.get(CustomConcurrentHashMap.java:2041)
at 
org.elasticsearch.common.inject.internal.FailableCache.get(FailableCache.java:46)
at 
org.elasticsearch.common.inject.ConstructorInjectorStore.get(ConstructorInjectorStore.java:52)
at 
org.elasticsearch.common.inject.ConstructorBindingImpl.initialize(ConstructorBindingImpl.java:57)
at 
org.elasticsearch.common.inject.InjectorImpl.initializeBinding(InjectorImpl.java:377)
at 
org.elasticsearch.common.inject.BindingProcessor$1$1.run(BindingProcessor.java:169)
at 
org.elasticsearch.common.inject.BindingProcessor.initializeBindings(BindingProcessor.java:224)
at 
org.elasticsearch.common.inject.InjectorBuilder.initializeStatically(InjectorBuilder.java:120)
at 
org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:105)
at org.elasticsearch.common.inject.Guice.createInjector(Guice.java:92)
at org.elasticsearch.common.inject.Guice.createInjector(Guice.java:69)
at 
org.elasticsearch.common.inject.ModulesBuilder.createInjector(ModulesBuilder.java:58)
at 
org.elasticsearch.node.internal.InternalNode.init(InternalNode.java:146)
at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:159)
at org.elasticsearch.node.NodeBuilder.node(NodeBuilder.java:166)
at ES_Client.main(ES_Client.java:64)
Caused by: org.elasticsearch.common.inject.internal.ComputationException: 
java.lang.NoClassDefFoundError: Lorg/apache/lucene/store/Lock;
at 
org.elasticsearch.common.inject.internal.MapMaker$StrategyImpl.compute(MapMaker.java:553)
at 
org.elasticsearch.common.inject.internal.MapMaker$StrategyImpl.compute(MapMaker.java:419)
at 
org.elasticsearch.common.inject.internal.CustomConcurrentHashMap$ComputingImpl.get(CustomConcurrentHashMap.java:2041)
at 
org.elasticsearch.common.inject.internal.FailableCache.get(FailableCache.java:46)
at 
org.elasticsearch.common.inject.MembersInjectorStore.get(MembersInjectorStore.java:66)
at 
org.elasticsearch.common.inject.ConstructorInjectorStore.createConstructor(ConstructorInjectorStore.java:69)
at 
org.elasticsearch.common.inject.ConstructorInjectorStore.access$000(ConstructorInjectorStore.java:31)
at 
org.elasticsearch.common.inject.ConstructorInjectorStore$1.create(ConstructorInjectorStore.java:39)
at 
org.elasticsearch.common.inject.ConstructorInjectorStore$1.create(ConstructorInjectorStore.java:35)
at 
org.elasticsearch.common.inject.internal.FailableCache$1.apply(FailableCache.java:35)
at 
org.elasticsearch.common.inject.internal.MapMaker$StrategyImpl.compute(MapMaker.java:549)
... 17 more
Caused by: java.lang.NoClassDefFoundError: Lorg/apache/lucene/store/Lock;
at java.lang.Class.getDeclaredFields0(Native Method)
at java.lang.Class.privateGetDeclaredFields(Unknown Source)
at java.lang.Class.getDeclaredFields(Unknown Source)
at 

Re: max_score is not coming for query with filter search

2014-03-14 Thread Clinton Gormley
My first question is: why do you want the score? The score is used only for
sorting, and you're sorting on NODE_ID.

If you really want it (and there is a cost to computing the score) then you
can set track_scores to true.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-sort.html#_track_scores
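
For illustration, a request-body sketch of that option (the sort field NODE_ID is taken from the thread; the match_all query is a placeholder):

```json
{
  "query": { "match_all": {} },
  "sort": { "NODE_ID": "desc" },
  "track_scores": true
}
```

With track_scores enabled, each hit carries its _score even though the results are ordered by NODE_ID.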

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAPt3XKTHAbt0WP9NN_gJDjDb5%2BvViBs_ZVn-Zm2KknoX_dUyCQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: elasticsearch java interaction

2014-03-14 Thread David Pilato
What is the value of ${es.version}

Did you set it? Try with 1.0.1
Also transport layer uses 9300 ports and not 9200 ports (REST layer)
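
For reference, the dependency with the version pinned explicitly would look like this (a sketch using the 1.0.1 version suggested above):

```xml
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>1.0.1</version>
</dependency>
```

and the client should then connect to the transport port, e.g. new InetSocketTransportAddress("host1", 9300).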

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


On 14 March 2014 at 13:47, Venu Krishna yvgk2...@gmail.com wrote:

Hi,
   I am Y.Venu. I am totally new to elasticsearch, and now I am trying to 
communicate with elasticsearch from Java. I have gone through the 
elasticsearch Java APIs.

First I came across the Maven repository. I created a pom.xml in my Eclipse 
project, and in the dependency tag I placed the code I found in the Maven 
repository, i.e.:
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>${es.version}</version>
</dependency>

After that I created one class with a main method, and I copied in the code 
that I found in the client API of elasticsearch, i.e. TransportClient:

main()
{
    Client client = new TransportClient()
            .addTransportAddress(new InetSocketTransportAddress("host1", 9200))
            .addTransportAddress(new InetSocketTransportAddress("host2", 9200));

    // on shutdown
    client.close();

    Settings settings = ImmutableSettings.settingsBuilder()
            .put("client.transport.sniff", true).build();
    TransportClient client1 = new TransportClient(settings);
}

After running this Java application, I am getting errors like this:



In Main Method
Mar 14, 2014 6:05:24 PM org.elasticsearch.node
INFO: [Mister Machine] {elasticsearch/0.16.1}[11016]: initializing ...
Mar 14, 2014 6:05:24 PM org.elasticsearch.plugins
INFO: [Mister Machine] loaded []
org.elasticsearch.common.inject.internal.ComputationException: org.elasticsearch.common.inject.internal.ComputationException: java.lang.NoClassDefFoundError: Lorg/apache/lucene/store/Lock;
    at org.elasticsearch.common.inject.internal.MapMaker$StrategyImpl.compute(MapMaker.java:553)
    at org.elasticsearch.common.inject.internal.MapMaker$StrategyImpl.compute(MapMaker.java:419)
    at org.elasticsearch.common.inject.internal.CustomConcurrentHashMap$ComputingImpl.get(CustomConcurrentHashMap.java:2041)
    at org.elasticsearch.common.inject.internal.FailableCache.get(FailableCache.java:46)
    at org.elasticsearch.common.inject.ConstructorInjectorStore.get(ConstructorInjectorStore.java:52)
    at org.elasticsearch.common.inject.ConstructorBindingImpl.initialize(ConstructorBindingImpl.java:57)
    at org.elasticsearch.common.inject.InjectorImpl.initializeBinding(InjectorImpl.java:377)
    at org.elasticsearch.common.inject.BindingProcessor$1$1.run(BindingProcessor.java:169)
    at org.elasticsearch.common.inject.BindingProcessor.initializeBindings(BindingProcessor.java:224)
    at org.elasticsearch.common.inject.InjectorBuilder.initializeStatically(InjectorBuilder.java:120)
    at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:105)
    at org.elasticsearch.common.inject.Guice.createInjector(Guice.java:92)
    at org.elasticsearch.common.inject.Guice.createInjector(Guice.java:69)
    at org.elasticsearch.common.inject.ModulesBuilder.createInjector(ModulesBuilder.java:58)
    at org.elasticsearch.node.internal.InternalNode.<init>(InternalNode.java:146)
    at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:159)
    at org.elasticsearch.node.NodeBuilder.node(NodeBuilder.java:166)
    at ES_Client.main(ES_Client.java:64)
Caused by: org.elasticsearch.common.inject.internal.ComputationException: java.lang.NoClassDefFoundError: Lorg/apache/lucene/store/Lock;
    at org.elasticsearch.common.inject.internal.MapMaker$StrategyImpl.compute(MapMaker.java:553)
    at org.elasticsearch.common.inject.internal.MapMaker$StrategyImpl.compute(MapMaker.java:419)
    at org.elasticsearch.common.inject.internal.CustomConcurrentHashMap$ComputingImpl.get(CustomConcurrentHashMap.java:2041)
    at org.elasticsearch.common.inject.internal.FailableCache.get(FailableCache.java:46)
    at org.elasticsearch.common.inject.MembersInjectorStore.get(MembersInjectorStore.java:66)
    at org.elasticsearch.common.inject.ConstructorInjectorStore.createConstructor(ConstructorInjectorStore.java:69)
    at org.elasticsearch.common.inject.ConstructorInjectorStore.access$000(ConstructorInjectorStore.java:31)
    at org.elasticsearch.common.inject.ConstructorInjectorStore$1.create(ConstructorInjectorStore.java:39)
    at org.elasticsearch.common.inject.ConstructorInjectorStore$1.create(ConstructorInjectorStore.java:35)
    at org.elasticsearch.common.inject.internal.FailableCache$1.apply(FailableCache.java:35)
    at org.elasticsearch.common.inject.internal.MapMaker$StrategyImpl.compute(MapMaker.java:549)
    ... 17 more
Caused by: 

elasticsearch init script for centos or rhel ?

2014-03-14 Thread Dominic Nicholas
Hi - can someone please point me to an /etc/init.d script for elasticsearch 
1.0.1 for CentOS or RHEL ?

Thanks

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/25064596-595d-4227-be37-d20f267edc5b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: elasticsearch init script for centos or rhel ?

2014-03-14 Thread David Pilato
Maybe this?
https://github.com/elasticsearch/elasticsearch/blob/master/src/deb/init.d/elasticsearch
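
If you installed from the tarball rather than the RPM/DEB package (the packages ship this script already), a rough install sketch (paths and commands are assumptions, not from the thread):

```shell
# Copy the init script into place and register it (CentOS/RHEL use chkconfig)
sudo cp elasticsearch /etc/init.d/elasticsearch
sudo chmod +x /etc/init.d/elasticsearch
sudo chkconfig --add elasticsearch
sudo service elasticsearch start
```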

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


On 14 March 2014 at 14:19, Dominic Nicholas dominic.s.nicho...@gmail.com 
wrote:

Hi - can someone please point me to an /etc/init.d script for elasticsearch 
1.0.1 for CentOS or RHEL ?

Thanks

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/DAD71921-A32A-4DE7-BF40-9C9FAD3267AF%40pilato.fr.
For more options, visit https://groups.google.com/d/optout.


Re: OutOfMemoryError OOM while indexing Documents

2014-03-14 Thread Zachary Tong
Are you running searches at the same time, or only indexing?  Are you bulk 
indexing?  How big (in physical kb/mb) are your bulk requests?

Can you attach the output of these APIs (preferably during memory buildup 
but before the OOM):

   - curl -XGET 'localhost:9200/_nodes/'
   - curl -XGET 'localhost:9200/_nodes/stats'

I would recommend downgrading your JVM to Java 1.7.0_u25.  There are known 
SIGSEGV bugs in the most recent versions of the JVM which have not been 
fixed yet.  It should be unrelated to your problem, but best to rule the 
JVM out.

I would not touch any of those configs.  In general, when debugging 
problems it is best to restore as many of the configs to their default 
settings as possible.






On Friday, March 14, 2014 5:46:12 AM UTC-4, Alexander Ott wrote:

 Hi,

 we always run in an OutOfMemoryError while indexing documents or shortly 
 afterwards.
 We only have one instance of elasticsearch version 1.0.1 (no cluster)

 Index informations:
 size: 203G (203G)
 docs: 237.354.313 (237.354.313)

 Our JVM settings are as follows:

 /usr/lib/jvm/java-7-oracle/bin/java -Xms16g -Xmx16g -Xss256k 
 -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC 
 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly 
 -XX:+HeapDumpOnOutOfMemoryError -Delasticsearch 
 -Des.pidfile=/var/run/elasticsearch.pid -Des.path.home=/usr/share/elasticsearch 
 -cp :/usr/share/elasticsearch/lib/elasticsearch-1.0.1.jar:/usr/share/elasticsearch/lib/*:/usr/share/elasticsearch/lib/sigar/* 
 -Des.default.config=/etc/elasticsearch/elasticsearch.yml 
 -Des.default.path.home=/usr/share/elasticsearch 
 -Des.default.path.logs=/var/log/elasticsearch 
 -Des.default.path.data=/var/lib/elasticsearch 
 -Des.default.path.work=/tmp/elasticsearch 
 -Des.default.path.conf=/etc/elasticsearch 
 org.elasticsearch.bootstrap.Elasticsearch


 OutOfMemoryError:
 [2014-03-12 01:27:27,964][INFO ][monitor.jvm  ] [Stiletto] [gc][old][32451][309] duration [5.1s], collections [1]/[5.9s], total [5.1s]/[3.1m], memory [15.8gb]->[15.7gb]/[15.9gb], all_pools {[young] [665.6mb]->[583.7mb]/[665.6mb]}{[survivor] [32.9mb]->[0b]/[83.1mb]}{[old] [15.1gb]->[15.1gb]/[15.1gb]}
 [2014-03-12 01:28:23,822][INFO ][monitor.jvm  ] [Stiletto] [gc][old][32466][322] duration [5s], collections [1]/[5.9s], total [5s]/[3.8m], memory [15.8gb]->[15.8gb]/[15.9gb], all_pools {[young] [652.5mb]->[663.8mb]/[665.6mb]}{[survivor] [0b]->[0b]/[83.1mb]}{[old] [15.1gb]->[15.1gb]/[15.1gb]}
 [2014-03-12 01:33:29,814][WARN ][index.merge.scheduler] [Stiletto] [myIndex][0] failed to merge
 java.lang.OutOfMemoryError: Java heap space
     at org.apache.lucene.util.fst.BytesStore.writeByte(BytesStore.java:83)
     at org.apache.lucene.util.fst.FST.<init>(FST.java:282)
     at org.apache.lucene.util.fst.Builder.<init>(Builder.java:163)
     at org.apache.lucene.codecs.BlockTreeTermsWriter$PendingBlock.compileIndex(BlockTreeTermsWriter.java:420)
     at org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter.writeBlocks(BlockTreeTermsWriter.java:569)
     at org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter$FindBlocks.freeze(BlockTreeTermsWriter.java:544)
     at org.apache.lucene.util.fst.Builder.freezeTail(Builder.java:214)
     at org.apache.lucene.util.fst.Builder.add(Builder.java:394)
     at org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter.finishTerm(BlockTreeTermsWriter.java:1000)
     at org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:166)
     at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:72)
     at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:383)
     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:106)
     at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4071)
     at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3668)
     at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
     at org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:107)
     at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)

 We also increased the heap to 32g, but with the same result:
 [2014-03-12 22:39:53,817][INFO ][monitor.jvm  ] [Charcoal] [gc][old][32895][86] duration [6.9s], collections [1]/[7.3s], total [6.9s]/[19.6s], memory [20.5gb]->[12.7gb]/[31.9gb], all_pools {[young] [654.9mb]->[1.9mb]/[665.6mb]}{[survivor] [83.1mb]->[0b]/[83.1mb]}{[old] [19.8gb]->[12.7gb]/[31.1gb]}
 [2014-03-12 23:11:07,015][INFO ][monitor.jvm  ] [Charcoal] [gc][old][34750][166] duration [8s], collections [1]/[8.6s], total [8s]/[29.1s], memory [30.9gb]->[30.9gb]/[31.9gb], all_pools {[young] [660.6mb]->[1mb]/[665.6mb]}{[survivor] [83.1mb]->[0b]/[83.1mb]}{[old] 
 

bool query with filter giving error

2014-03-14 Thread Subhadip Bagui
Hi,

I'm trying to run the below bool query with filter range to fetch all the 
node data with CLOUD_TYPE=AWS-EC2 and NODE_STATUS=ACTIVE.
But I'm getting SearchPhaseExecutionException from elasticsearch. Please 
let me know the correct way to do this.

  
curl -XPOST "http://10.203.251.142:9200/aricloud/_search" -d
'{
    "filtered": {
        "query": {
            "bool": {
                "must": [
                    {
                        "term": {
                            "CLOUD_TYPE": "AWS-EC2"
                        }
                    },
                    {
                        "term": {
                            "NODE_STATUS": "ACTIVE"
                        }
                    }
                ]
            }
        },
        "filter": {
            "range": {
                "NODE_CREATE_TIME": {
                    "to": "2014-03-14 18:43:55",
                    "from": "2014-03-14 16:22:32"
                }
            }
        }
    },
    "sort": {
        "NODE_ID": "desc"
    },
    "from": 0,
    "size": 3
}'

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/4e53acd6-d68a-43fe-8340-72ae695e0060%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Timeouts on Node Stats API?

2014-03-14 Thread Xiao Yu


 Can you do a hot_threads while this is happening?


Just for good measure I also checked hot threads for blocking and waiting, 
nothing interesting there either.

::: [es1.global.search.sat.wordpress.com][7fiNB_thTk-GRDKe4yQITA][inet[/76.74.248.134:9300]]{dc=sat, parity=1, master=false}

0.0% (0s out of 500ms) wait usage by thread 'Finalizer'
 10/10 snapshots sharing following 4 elements
   java.lang.Object.wait(Native Method)
   java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
   java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
   java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:189)

0.0% (0s out of 500ms) wait usage by thread 'Signal Dispatcher'
 unique snapshot
 unique snapshot
 unique snapshot
 unique snapshot
 unique snapshot
 unique snapshot
 unique snapshot
 unique snapshot
 unique snapshot
 unique snapshot

0.0% (0s out of 500ms) wait usage by thread 'elasticsearch[es1.global.search.sat.wordpress.com][[timer]]'
 10/10 snapshots sharing following 2 elements
   java.lang.Thread.sleep(Native Method)
   org.elasticsearch.threadpool.ThreadPool$EstimatedTimeThread.run(ThreadPool.java:511)
 
::: [es1.global.search.sat.wordpress.com][7fiNB_thTk-GRDKe4yQITA][inet[/76.74.248.134:9300]]{dc=sat, parity=1, master=false}

0.0% (0s out of 500ms) block usage by thread 'Finalizer'
 10/10 snapshots sharing following 4 elements
   java.lang.Object.wait(Native Method)
   java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
   java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
   java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:189)

0.0% (0s out of 500ms) block usage by thread 'Signal Dispatcher'
 unique snapshot
 unique snapshot
 unique snapshot
 unique snapshot
 unique snapshot
 unique snapshot
 unique snapshot
 unique snapshot
 unique snapshot
 unique snapshot

0.0% (0s out of 500ms) block usage by thread 'elasticsearch[es1.global.search.sat.wordpress.com][[timer]]'
 10/10 snapshots sharing following 2 elements
   java.lang.Thread.sleep(Native Method)
   org.elasticsearch.threadpool.ThreadPool$EstimatedTimeThread.run(ThreadPool.java:511)
 
While all this is happening indexing operations start to see curl timeouts 
in our application logs.

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/7539ba57-f25c-4439-aac7-ea91712982e1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Basic Question on splitting data sources between 2 or more ES systems

2014-03-14 Thread michael . obrien
Forgive me but when you say feeders do you mean the LS actually processing 
the log? Can you run multiple LS's on the same log without having them trip 
over each other or end up with missing data read by the other LS first?

On Wednesday, March 12, 2014 3:12:04 PM UTC, Binh Ly wrote:

 Yes it could - although test it to see if it is acceptable to you. If it 
 becomes a problem, then you can always run multiple LS feeders one per ES 
 cluster and then just separate the config outputs individually.


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/40b5a5f5-42b6-4fa1-baa4-5d27639b4563%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[hadoop] Performance in using Text vs. MapWritable

2014-03-14 Thread Brian Stempin
Hi,
I'm currently using the elasticsearch-hadoop component to load data into my 
ES cluster.  Currently, the ESOutputFormat will accept a MapWritable, 
Writable, or a Text that is already in JSON format.  My question:  Is there 
a performance advantage to using one over the other?

Thanks,
Brian

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/20302cc7-799f-4723-89db-3b050123d2bd%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: bool query with filter giving error

2014-03-14 Thread Clinton Gormley
You need to pass the search request a query, so just change the above to:

GET /_search
{ "query": { "filtered": ... }, "from": 0, "size": 3, ... }
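
Applied to the query from the original message, the full request body would look like this (a sketch; the field names and values are taken from the thread):

```json
{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "must": [
            { "term": { "CLOUD_TYPE": "AWS-EC2" } },
            { "term": { "NODE_STATUS": "ACTIVE" } }
          ]
        }
      },
      "filter": {
        "range": {
          "NODE_CREATE_TIME": {
            "from": "2014-03-14 16:22:32",
            "to": "2014-03-14 18:43:55"
          }
        }
      }
    }
  },
  "sort": { "NODE_ID": "desc" },
  "from": 0,
  "size": 3
}
```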




On 14 March 2014 14:55, Subhadip Bagui i.ba...@gmail.com wrote:

 Hi,

 I'm trying to run the below bool query with filter range to fetch all the
 node data with CLOUD_TYPE=AWS-EC2 and NODE_STATUS=ACTIVE.
 But I'm getting SearchPhaseExecutionException from elasticsearch. Please
 let me know the correct way to do this.


 curl -XPOST "http://10.203.251.142:9200/aricloud/_search" -d
 '{
     "filtered": {
         "query": {
             "bool": {
                 "must": [
                     {
                         "term": {
                             "CLOUD_TYPE": "AWS-EC2"
                         }
                     },
                     {
                         "term": {
                             "NODE_STATUS": "ACTIVE"
                         }
                     }
                 ]
             }
         },
         "filter": {
             "range": {
                 "NODE_CREATE_TIME": {
                     "to": "2014-03-14 18:43:55",
                     "from": "2014-03-14 16:22:32"
                 }
             }
         }
     },
     "sort": {
         "NODE_ID": "desc"
     },
     "from": 0,
     "size": 3
 }'



-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAPt3XKSTtR1ppgD_VopKPwXGZX3j3B_aQUMtHpeJxgA%3DX0qP3A%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Timeouts on Node Stats API?

2014-03-14 Thread Clinton Gormley
Anything in the logs or slow logs?  You're sure slow GCs aren't impacting
performance?


On 14 March 2014 15:17, Xiao Yu m...@xyu.io wrote:


 Can you do a hot_threads while this is happening?


 Just for good measure I also checked hot threads for blocking and waiting,
 nothing interesting there either.

 ::: [es1.global.search.sat.wordpress.com
 ][7fiNB_thTk-GRDKe4yQITA][inet[/76.74.248.134:9300]]{dc=sat, parity=1,
 master=false}

 0.0% (0s out of 500ms) wait usage by thread 'Finalizer'
  10/10 snapshots sharing following 4 elements
java.lang.Object.wait(Native Method)
java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:189)

 0.0% (0s out of 500ms) wait usage by thread 'Signal Dispatcher'
  unique snapshot
  unique snapshot
  unique snapshot
  unique snapshot
  unique snapshot
  unique snapshot
  unique snapshot
  unique snapshot
  unique snapshot
  unique snapshot

 0.0% (0s out of 500ms) wait usage by thread 'elasticsearch[
 es1.global.search.sat.wordpress.com][[timer]]'
  10/10 snapshots sharing following 2 elements
java.lang.Thread.sleep(Native Method)

  
 org.elasticsearch.threadpool.ThreadPool$EstimatedTimeThread.run(ThreadPool.java:511)

 ::: [es1.global.search.sat.wordpress.com
 ][7fiNB_thTk-GRDKe4yQITA][inet[/76.74.248.134:9300]]{dc=sat, parity=1,
 master=false}

 0.0% (0s out of 500ms) block usage by thread 'Finalizer'
  10/10 snapshots sharing following 4 elements
java.lang.Object.wait(Native Method)
java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:189)

 0.0% (0s out of 500ms) block usage by thread 'Signal Dispatcher'
  unique snapshot
  unique snapshot
  unique snapshot
  unique snapshot
  unique snapshot
  unique snapshot
  unique snapshot
  unique snapshot
  unique snapshot
  unique snapshot

 0.0% (0s out of 500ms) block usage by thread 'elasticsearch[
 es1.global.search.sat.wordpress.com][[timer]]'
  10/10 snapshots sharing following 2 elements
java.lang.Thread.sleep(Native Method)

  
 org.elasticsearch.threadpool.ThreadPool$EstimatedTimeThread.run(ThreadPool.java:511)

 While all this is happening indexing operations start to see curl timeouts
 in our application logs.



-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAPt3XKS_m013zVwBhQ9ru2SeAAEiNCpWO7ruoC94WeovxxYAJg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Bulk Processor

2014-03-14 Thread ZenMaster80
David,

Sorry, I didn't quite follow: does it do the flushing automatically, or am I 
supposed to tell it?

On Wednesday, March 12, 2014 4:05:49 PM UTC-4, David Pilato wrote:

 It also flushes docs after a given time, let's say every 5 seconds.
 BTW there is a small issue which basically flushes the bulk every n-1 docs 
 instead of n.

 Fix is on the way.

 --
 David ;-)
 Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


 On 12 March 2014 at 20:51, ZenMaster80 sabda...@gmail.com 
 wrote:


 I don't quite understand what the bulk processor is doing; I would 
 like someone to explain how it is supposed to work, to make sure I designed 
 this correctly.
 I specify the number of actions as 1000.
 My feeder keeps pushing documents to it. It's more like a loop iterating 
 over document folders, and I push each document to the bulk. I expected the 
 bulk to queue things until it reaches 1000 docs, then process the bulk?

 Yet this is how it logs; this comes from the callback functions of the 
 bulk processor.


 Bulk Called: ID= 1, Actions=33, MB=5.46250
 Bulk Called: ID= 2, Actions=29, MB=5.51660
 Bulk Succeeded: ID= 1, took= 921 ms
 Bulk Called: ID= 3, Actions=12, MB=5.691812
 Bulk Succeeded: ID= 2, took= 1526 ms

 .



 Bulk Called: ID= 23, Actions=8, MB=5.45294
 Bulk Succeeded: ID= 23, took= 751 ms
 Bulk Called: ID= 24, Actions=19, MB=5.383918
 Bulk Succeeded: ID= 24, took= 331 ms
 Bulk Called: ID= 25, Actions=22, MB=5.347542
 Bulk Succeeded: ID= 25, took= 694 ms
 Bulk Called: ID= 26, Actions=58, MB=5.249195
 Bulk Succeeded: ID= 26, took= 583 ms
 Bulk Called: ID= 27, Actions=89, MB=5.244396
 Bulk Succeeded: ID= 27, took= 588 ms.


 Bulk Called: ID= 47, Actions=17, MB=5.245771 ...


 Bulk Succeeded: ID= 47, took= 431 ms

 Finished Processing the whole thing






-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/9cb96ece-d30d-49a2-bcb4-bb09098094fc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Bulk Processor

2014-03-14 Thread David Pilato
It does it automatically.

You just have to call .close() properly when you stop your application.
It will process the pending requests before actually exiting.
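
A minimal sketch of the pattern (listener method names follow the 1.x BulkProcessor builder API; the client setup and json payload are assumed):

```java
BulkProcessor bulkProcessor = BulkProcessor.builder(client,
        new BulkProcessor.Listener() {
            @Override
            public void beforeBulk(long executionId, BulkRequest request) {
                // called just before an accumulated batch is sent
            }
            @Override
            public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
                // called when a batch succeeds
            }
            @Override
            public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
                // called when a batch fails
            }
        })
        .setBulkActions(1000)                            // flush every 1000 actions...
        .setFlushInterval(TimeValue.timeValueSeconds(5)) // ...or every 5 seconds, whichever comes first
        .build();

// feed documents; flushing happens automatically
bulkProcessor.add(new IndexRequest("index", "type").source(json));

// on shutdown: sends any pending requests, then stops the flush timer
bulkProcessor.close();
```

The flush interval explains the log output above: batches are sent when either the action count or the timer fires, so batch sizes well under 1000 are expected.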



-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 14 March 2014 at 15:50:48, ZenMaster80 (sabdall...@gmail.com) wrote:

David,

Sorry, I didn't quite follow: does it do the flushing automatically, or am I 
supposed to tell it?

On Wednesday, March 12, 2014 4:05:49 PM UTC-4, David Pilato wrote:
It also flushes docs after a given time, let's say every 5 seconds.
BTW there is a small issue which basically flushes the bulk every n-1 docs 
instead of n.

Fix is on the way.

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


On 12 March 2014 at 20:51, ZenMaster80 sabda...@gmail.com wrote:


I don't quite understand what the bulk processor is doing; I would like 
someone to explain how it is supposed to work, to make sure I designed this 
correctly.
I specify the number of actions as 1000.
My feeder keeps pushing documents to it. It's more like a loop iterating over 
document folders, and I push each document to the bulk. I expected the bulk 
to queue things until it reaches 1000 docs, then process the bulk?

Yet this is how it logs; this comes from the callback functions of the bulk 
processor.


Bulk Called: ID= 1, Actions=33, MB=5.46250
Bulk Called: ID= 2, Actions=29, MB=5.51660
Bulk Succeeded: ID= 1, took= 921 ms
Bulk Called: ID= 3, Actions=12, MB=5.691812
Bulk Succeeded: ID= 2, took= 1526 ms

.



Bulk Called: ID= 23, Actions=8, MB=5.45294
Bulk Succeeded: ID= 23, took= 751 ms
Bulk Called: ID= 24, Actions=19, MB=5.383918
Bulk Succeeded: ID= 24, took= 331 ms
Bulk Called: ID= 25, Actions=22, MB=5.347542
Bulk Succeeded: ID= 25, took= 694 ms
Bulk Called: ID= 26, Actions=58, MB=5.249195
Bulk Succeeded: ID= 26, took= 583 ms
Bulk Called: ID= 27, Actions=89, MB=5.244396
Bulk Succeeded: ID= 27, took= 588 ms.


Bulk Called: ID= 47, Actions=17, MB=5.245771 ...


Bulk Succeeded: ID= 47, took= 431 ms

Finished Processing the whole thing





-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/etPan.532319bc.1190cde7.1ccf%40MacBook-Air-de-David.local.
For more options, visit https://groups.google.com/d/optout.


Re: Timeouts on Node Stats API?

2014-03-14 Thread Xiao Yu


 Anything in the logs or slow logs?  You're sure slow GCs aren't impacting 
 performance?


There's nothing in the logs on the node other than slow queries before or 
during the problematic period. The slow query logs show the same types of 
queries that we know to be slow (MLT queries with function rescores). As a 
percentage of queries executed, there is no bump in the number of slow 
queries before or during the problematic period. (Yes, during the 
problematic period the broken node actually executes and returns query 
requests; it's not clear to me if it's simply routing queries to other 
nodes or if its shards are actually executing queries as well.)

In addition, before the problem occurs there is no increase in ES threads, 
heap or non-heap memory use, and the number of GC cycles remained consistent 
at about once every 2 seconds on the node. There are no long GC cycles and 
the node never drops from the cluster. During the problematic period the 
cluster reports that it's in a green state; however, all of our logging 
indicates that no indexing operations complete cluster-wide (we should be 
seeing 100-500/sec under normal load).

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/da65980b-d027-4ec2-9576-1f51a3dc6037%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Problem with configuring index template via file

2014-03-14 Thread Sergey Zemlyanoy
So is there enough data in my custom template to expect that a new index 
will absorb its settings? Should I point out the template's name in a 
dedicated field?

On Thursday, March 13, 2014 9:55:13 PM UTC+2, Binh Ly wrote:

 You won't see your template in the list API, but if you create a new index 
 named logstash-whatever, it should take effect properly unless it is 
 overridden by another template with a higher order.
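
For illustration, a minimal file-based template sketch (contents hypothetical; the wrapper name and file location follow the 1.x convention of dropping a JSON file into config/templates/):

```json
{
  "logstash_template": {
    "template": "logstash-*",
    "order": 0,
    "settings": { "number_of_replicas": 1 }
  }
}
```

Any index whose name matches the template pattern at creation time picks up these settings; no extra field naming the template is needed.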




Re: [hadoop] Performance in using Text vs. MapWritable

2014-03-14 Thread Costin Leau

Hey,

There is, but in the big picture it doesn't make much difference. If the data is already in JSON format then es-hadoop can 
stream the data directly without having to do any conversion. With data as a (MapWritable, Writable), the map has to be 
converted into JSON; note that this process is quite efficient and uses the same amount of memory no matter the number 
of documents/maps.

Considering Hadoop's batch nature, I would not worry about choosing one over the 
other, but rather focus on ease of use.

If the data is in JSON or you want ultimate control over what data is sent to Elasticsearch, then JSON is the way to go; 
the data is streamed as-is.
If you don't use JSON and have data in various formats readable through Hadoop, then pick the (MapWritable, Writable) route; 
it gives you maximum interoperability and you don't have to worry about transforming data into an intermediate format.
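To make the trade-off concrete, the two input routes can be sketched like this (a Python illustration of the principle only, not es-hadoop's actual code path; the function names are made up):

```python
import json

def payload_from_json_text(doc_text):
    # Pre-serialized JSON (the Text case): streamed as-is, no conversion.
    return doc_text

def payload_from_map(doc_map):
    # Map-like records (the MapWritable case): one conversion to JSON per
    # record; memory use is bounded by the size of a single record, not
    # by the total number of documents.
    return json.dumps(doc_map)

text_payload = payload_from_json_text('{"user": "brian", "bytes": 123}')
map_payload = payload_from_map({"user": "brian", "bytes": 123})
# Both routes put an equivalent document on the wire.
assert json.loads(text_payload) == json.loads(map_payload)
```

Either way the per-record cost is small; the choice is mainly about which form your data is already in.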


Hope this helps,

On 3/14/2014 4:46 PM, Brian Stempin wrote:

Hi,
I'm currently using the elasticsearch-hadoop component to load data into my ES 
cluster.  Currently, the ESOutputFormat
will accept a MapWritable, Writable or a Text that is already in JSON format. 
 My question:  Is there a performance
advantage to using one over the other?

Thanks,
Brian



--
Costin



Re: ES-Hive

2014-03-14 Thread Costin Leau

Without any extra information I'm afraid I can only guess what might be the 
issue.
Make sure you have the latest Elasticsearch 0.90 or 1.x available on port 9200 
with the HTTP/REST port open.
Also make sure that Hive actually runs on the same machine - not just the client but also the server (meaning Hadoop 
itself).


You indicate that if you change the network configuration you get an error 
regarding the version - this suggests that:

1. Hive is actually running on a different machine than ES - hence the network 
error
2. After pointing Hive to the actual ES machine, you get an error since you're 
using an old Elasticsearch version (0.20)

Cheers,

On 3/14/2014 12:19 AM, P lva wrote:

I have a simple query
insert into table eslogs select * from eslogs_ext;
Hive and elasticsearch are running on the same host.

To execute the script I'm following the directions from the link.
http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/hive.html

There are two elasticsearch nodes, and they can recognize each other (as 
indicated by start up process) , but why would
hive not be able to pick them up ? Can you explain what could have gone wrong ?


On Thursday, March 13, 2014 4:28:14 PM UTC-5, Costin Leau wrote:

What does your Hive script look like? Can you confirm the ip/address of 
your Hive and Elasticsearch ? How are you
executing the script?
The error indicates an error in your network configuration.

Cheers,

P.S. Feel free to post a gist or whatever it's convenient.

On 3/13/2014 10:38 PM, P lva wrote:
 Hi, I have few weblogs in a hive table that I'd like to visualize in 
kibana.
 ES is on the same node as hive server.

 Followed directions from this 
pagehttp://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/hive.html

http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/hive.html

 I can create a table  using esstorage handler, but when I tried to ingest 
data into this table I got

 Error: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing
 row {***first row of my table**}
 at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:175)
  at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
  at 
org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
Error while processing row {*** first row of
 my table**}
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.io.IOException: Out of nodes and retries; caught exception
  at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:652)
  at 
org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
  at 
org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
  at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:88)
  at 
org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
  at 
org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
  at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91)
  at 
org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
  at 
org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
  at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534)
  ... 9 more
 Caused by: java.io.IOException: Out of nodes and retries; caught exception
  at 
org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:81)
  at 
org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:221)
  at 
org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:205)
  at 
org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:209)
  at 
org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:103)
  at 
org.elasticsearch.hadoop.rest.RestClient.discoverNodes(RestClient.java:85)
  at 
org.elasticsearch.hadoop.rest.InitializationUtils.discoverNodesIfNeeded(InitializationUtils.java:60)
  at 
org.elasticsearch.hadoop.mr.EsOutputFormat$ESRecordWriter.init(EsOutputFormat.java:165)
  at 

Re: fielddata breaker question

2014-03-14 Thread Dunaeth
Our nodes are running with these trace settings at the moment; is there a 
preferred way to provide those logs to you?

On Friday, March 14, 2014 03:43:49 UTC+1, Lee Hinman wrote:

 On 3/13/14, 1:37 AM, Dunaeth wrote: 
  I tried to clear all caches to see if it could help but the fielddata 
  breaker estimated size is still skyrocketing... If it's not a cache 
  issue and it's linked to our data inserts, I can only think of the 
  insert process or percolation queries. Any idea? 
  

 Hi Dunaeth, 

 Can you add the lines: 

 logger: 
   indices.fielddata.breaker: TRACE 
   index.fielddata: TRACE 
   common.breaker: TRACE 

 in logging.yml for your elasticsearch configuration and restart the 
 cluster? This will log information about the breaker estimation and 
 adjustment. If you can run some queries and attach the logs it would be 
 helpful in tracking down what's going on. 

 ;; Lee 




Re: Problem with configuring index template via file

2014-03-14 Thread Sergey Zemlyanoy
And I just tried to follow your advice:
placed custom template in /etc/logstash/conf.d/templates/logstash2.json

{
  "order" : 2,
  "template" : "logstash-*",
  "settings" : {
    "index.number_of_replicas" : 4
  },
  "mappings" : { }
}

then created a new index 
curl -XPUT localhost:9200/logstash-111

and checked the result
# curl -XGET localhost:9200/logstash-111/_settings?pretty
{
  "logstash-111" : {
    "settings" : {
      "index" : {
        "uuid" : "l7HDeFfDS1arZr74ouwtyQ",
        "analysis" : {
          "analyzer" : {
            "default" : {
              "type" : "standard",
              "stopwords" : "_none_"
            }
          }
        },
        "number_of_replicas" : "1",
        "number_of_shards" : "5",
        "refresh_interval" : "5s",
        "version" : {
          "created" : "151"
        }
      }
    }
  }
}


And it looks like the custom setting didn't override the default (the number 
of replicas is still 1)


On Friday, March 14, 2014 5:18:05 PM UTC+2, Sergey Zemlyanoy wrote:

 So is there enough data in my custom template to expect that a new 
 index will absorb its settings? Should I specify the template's name in a 
 dedicated field?

 On Thursday, March 13, 2014 9:55:13 PM UTC+2, Binh Ly wrote:

 You won't see your template in the list API, but if you create a new 
 index named logstash-whatever, it should take effect properly unless it 
 is overriden by another template with a higher order.





Re: Correction in the variable name for ES Hive Documentation in es.json parameter

2014-03-14 Thread Costin Leau

Thanks for reporting and sorry for the mistake. I've fixed this in master and 
corrected the existing docs.

Cheers!

On 3/14/2014 12:15 AM, Deepak Subhramanian wrote:

Hi,
I was struggling to load json document to ES from Hive. Later realised that 
there was a mistake in documentation.

CREATE EXTERNAL TABLE json(data STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = '...',
              'es.json.input' = 'yes');

The correct parameter name is 'es.input.json' instead of 'es.json.input', and 
the value is 'true' instead of 'yes':

CREATE EXTERNAL TABLE json(data STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = '...',
              'es.input.json' = 'true');



--
Costin



query edition problem kibana

2014-03-14 Thread Phil gib
hello,
my context: Kibana + ES 0.90.11 + ES Head plugin

I have a problem with the Kibana query editor:
I have indexed my logs in ES and displayed them in Kibana using basic queries. 
Perfect!

I need more complex queries on a max_bitrate field (integer).

For example, I need this query: 300 < max_bitrate < 1000

Using the ES Head plugin it works and I see the result, with a raw JSON query like 
this:
{"query":{"bool":{"must":[{
"range":{"rglog3.max_bitrate":{"gt":300,"lt":1000}}}...

But then how do I put it in the query editor of Kibana to get the query to 
work?
I did not succeed.

Any clue or pointer to docs would be great, because I cannot find any.
Thanks,

philippe



Re: [hadoop] Performance in using Text vs. MapWritable

2014-03-14 Thread Brian Stempin
It does, thanks.

Brian


On Fri, Mar 14, 2014 at 11:29 AM, Costin Leau costin.l...@gmail.com wrote:

 Hey,

 There is but in the big picture it doesn't make any difference. If the
 data is already in JSON format then es-hadoop can stream the data directly
 without having to do any conversion. With a data (MapWritable,Writable)
 the map has to be converted into JSON - note that this process is quite
 efficient and uses the same amount of memory no matter the number of
 documents/maps.
 Consider Hadoop batch nature I would not worry about choosing one over the
 other but rather focus on ease of use.

 If the data is in JSON or you want ultimate control over what data is sent
 to Elasticsearch, then JSON is the way to go - the data is streamed as is.
 If you don't use JSON and have data in various formats readable through
 Hadoop, then pick the MapWritable,Writable - it gives you maximum
 interoperability and you don't have to worry about transforming data into
 an intermediate format.

 Hope this helps,


 On 3/14/2014 4:46 PM, Brian Stempin wrote:

 Hi,
 I'm currently using the elasticsearch-hadoop component to load data into
 my ES cluster.  Currently, the ESOutputFormat
  will accept a MapWritable, Writable or a Text that is already in JSON
 format.  My question:  Is there a performance
 advantage to using one over the other?

 Thanks,
 Brian



 --
 Costin






WARN while updating index settings

2014-03-14 Thread Tomasz Romanczuk
Hi,
I'm using a percolate index to store some queries. After the node starts 
(using the Java API), I try to update the index settings:


client.admin().indices().prepareClose(INDEX_NAME).execute().actionGet();
UpdateSettingsRequestBuilder builder = 
client.admin().indices().prepareUpdateSettings();
builder.setIndices(INDEX_NAME);
builder.setSettings(createSettings());
builder.execute().actionGet();

client.admin().indices().prepareOpen(INDEX_NAME).execute().actionGet();

Everything works fine and the changes are applied, but in my log I can see a warning:

2014-03-14 16:55:40,896 WARN  [org.elasticsearch.index.indexing] 
[alerts_node] [_percolator][0] post listener 
[org.elasticsearch.index.percolator.PercolatorService$RealTimePercolat
orOperationListener@dd0099] failed
org.elasticsearch.ElasticSearchException: failed to parse query [299]
at 
org.elasticsearch.index.percolator.PercolatorExecutor.parseQuery(PercolatorExecutor.java:361)
at 
org.elasticsearch.index.percolator.PercolatorExecutor.addQuery(PercolatorExecutor.java:332)
at 
org.elasticsearch.index.percolator.PercolatorService$RealTimePercolatorOperationListener.postIndexUnderLock(PercolatorService.java:295)
at 
org.elasticsearch.index.indexing.ShardIndexingService.postIndexUnderLock(ShardIndexingService.java:140)
at 
org.elasticsearch.index.engine.robin.RobinEngine.innerIndex(RobinEngine.java:594)
at 
org.elasticsearch.index.engine.robin.RobinEngine.index(RobinEngine.java:492)
at 
org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryOperation(InternalIndexShard.java:703)
at 
org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:224)
at 
org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:174)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.lucene.store.AlreadyClosedException: this Analyzer is 
closed
at 
org.apache.lucene.analysis.Analyzer$ReuseStrategy.getStoredValue(Analyzer.java:368)
at 
org.apache.lucene.analysis.Analyzer$GlobalReuseStrategy.getReusableComponents(Analyzer.java:410)
at 
org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:173)
at 
org.elasticsearch.index.search.MatchQuery.parse(MatchQuery.java:203)
at 
org.elasticsearch.index.query.MatchQueryParser.parse(MatchQueryParser.java:163)
at 
org.elasticsearch.index.query.QueryParseContext.parseInnerQuery(QueryParseContext.java:207)
at 
org.elasticsearch.index.query.BoolQueryParser.parse(BoolQueryParser.java:107)
at 
org.elasticsearch.index.query.QueryParseContext.parseInnerQuery(QueryParseContext.java:207)
at 
org.elasticsearch.index.query.BoolQueryParser.parse(BoolQueryParser.java:107)
at 
org.elasticsearch.index.query.QueryParseContext.parseInnerQuery(QueryParseContext.java:207)
at 
org.elasticsearch.index.query.BoolQueryParser.parse(BoolQueryParser.java:93)
at 
org.elasticsearch.index.query.QueryParseContext.parseInnerQuery(QueryParseContext.java:207)
at 
org.elasticsearch.index.query.IndexQueryParserService.parse(IndexQueryParserService.java:284)
at 
org.elasticsearch.index.query.IndexQueryParserService.parse(IndexQueryParserService.java:255)
at 
org.elasticsearch.index.percolator.PercolatorExecutor.parseQuery(PercolatorExecutor.java:350)

What is the reason for this WARN and how can I avoid it?

Thanks for any help!



Re: ES-Hive

2014-03-14 Thread Costin Leau
You should not change the configuration in Elasticsearch. By default, ES binds to all available interfaces; specifying 
an IP:port is likely to restrict access rather than extend it.


To find out the configuration options in es-hadoop, look no further than the docs [1]. If the hive job is running on the 
same machine as elasticsearch, the default (localhost:9200) should be enough. As an alternative you can specify the 
public ip of your host: x.y.z.w


As a last step, enable TRACE logging (through log4j) on org.elasticsearch.hadoop package and see what comes out 
including network connectivity.


By the way, make sure you don't have any firewall or proxy set on your system which might be picked up automatically by 
the JVM.


[1] 
http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/configuration.html
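To tell the two failure modes apart quickly: connection refused means nothing answered on the host/port es-hadoop was given, while a version error means something answered but was too old. Curling the root endpoint shows the server version in version.number; here is a minimal sketch of reading it out of that response (the sample body and the 1.0.1 value are illustrative):

```python
import json

def es_version(root_body):
    # The ES root endpoint ("curl http://host:9200/") returns a JSON
    # document whose version.number field carries the server version.
    return json.loads(root_body)["version"]["number"]

sample = ('{"status": 200, "name": "Skald", '
          '"version": {"number": "1.0.1"}, '
          '"tagline": "You Know, for Search"}')
print(es_version(sample))  # 1.0.1
```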

On 3/14/2014 6:50 PM, P lva wrote:

Costin,

About what you asked,
1) Hive server is running on the same machine as elasticsearch,
2) I get a response when I do a curl http://elasticsearchivehost:9200


I feel I'm missing something simple. This is what I've got until now.

1) The first error (connection refused) is because I left the default settings as 
is.

I changed the 'network.host' in elasticsearch.yml to the hostname

2) Second error (cannot discover elasticsearch version) is when I changed the 
'network.host' in elasticsearch.yml to the
hostname.

Looked at the source code and figured I'm expected to pass eshost:port to the 
hive table at table creation.
(https://github.com/elasticsearch/elasticsearch-hadoop/blob/master/src/main/java/org/elasticsearch/hadoop/hive/EsStorageHandler.java#L48)
.
So I included 'es.resource.write'='http://elasticsearchhost:9200' as one of the 
table properties during table creation step.

STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'esdemo/hive',
   'es.mapping.name'='time:@timestamp',
   'es.resource.write'='http://eshost:9200');

Now I get connection refused again. Is that the right way to pass that 
information ?

Thanks for your patience and help




On Friday, March 14, 2014 10:33:14 AM UTC-5, Costin Leau wrote:

Without any extra information I'm afraid I can only guess what might be the 
issue.
Make sure you have the latest Elasticsearch 0.90 or 1.x available on port 
9200 with the HTTP/REST port open.
Also make sure that Hive actually runs on the same machine - not just the 
client but also the server (meaning Hadoop
itself).

You indicate that if you change the network configuration you get an error 
regarding the version - this suggests that:

1. Hive is actually running on a different machine than ES - hence the 
network error
2. After pointing Hive to the actual ES machine, you get an error since 
you're using an old Elasticsearch version
(0.20)

Cheers,

On 3/14/2014 12:19 AM, P lva wrote:
 I have a simple query
 insert into table eslogs select * from eslogs_ext;
 Hive and elasticsearch are running on the same host.

 To execute the script I'm following the directions from the link.

http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/hive.html

http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/hive.html

 There are two elasticsearch nodes, and they can recognize each other (as 
indicated by start up process) , but why would
 hive not be able to pick them up ? Can you explain what could have gone 
wrong ?


 On Thursday, March 13, 2014 4:28:14 PM UTC-5, Costin Leau wrote:

 What does your Hive script look like? Can you confirm the ip/address 
of your Hive and Elasticsearch ? How are you
 executing the script?
 The error indicates an error in your network configuration.

 Cheers,

 P.S. Feel free to post a gist or whatever it's convenient.

 On 3/13/2014 10:38 PM, P lva wrote:
  Hi, I have few weblogs in a hive table that I'd like to visualize 
in kibana.
  ES is on the same node as hive server.
 
  Followed directions from this 
pagehttp://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/hive.html

http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/hive.html
 
http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/hive.html

http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/hive.html
 
  I can create a table  using esstorage handler, but when I tried to 
ingest data into this table I got
 
  Error: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing
  row {***first row of my table**}
  at 
org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:175)
   at 
org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
   at 

Re: query edition problem kibana

2014-03-14 Thread Binh Ly
You'll need to use the query_string syntax:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html

It should be something like:

rglog3.max_bitrate:[301 TO 999]

or:

rglog3.max_bitrate:>300 AND rglog3.max_bitrate:<1000
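In Lucene's query_string syntax, square brackets are inclusive and curly braces are exclusive, so the exclusive range 300 to 1000 can also be written directly. A small helper sketching both forms (the helper names are made up for illustration):

```python
def inclusive_range(field, lo, hi):
    # Lucene syntax: [a TO b] includes both endpoints.
    return "%s:[%d TO %d]" % (field, lo, hi)

def exclusive_range(field, lo, hi):
    # Lucene syntax: {a TO b} excludes both endpoints.
    return "%s:{%d TO %d}" % (field, lo, hi)

print(inclusive_range("rglog3.max_bitrate", 301, 999))
# rglog3.max_bitrate:[301 TO 999]
print(exclusive_range("rglog3.max_bitrate", 300, 1000))
# rglog3.max_bitrate:{300 TO 1000}
```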



Re: WARN while updating index settings

2014-03-14 Thread Binh Ly
Could you perhaps provide a simple reproducible sequence with real data? I'm 
trying to reproduce this, but I'm not sure what settings you are changing and 
what data you had prior to closing the index.



Re: Problem with configuring index template via file

2014-03-14 Thread Binh Ly
I can think of 2 reasons:

1) When you start elasticsearch, it is not pointing to the conf directory 
that you are using

or

2) There is another template that overrides yours

I'm thinking #1 is more likely. How are you running ES, i.e., how is it 
installed and run?
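For intuition, the matching-and-precedence rule works like this: every template whose pattern matches the new index name contributes its settings, and a higher order wins on conflicts. A toy Python illustration of that rule (not Elasticsearch's implementation):

```python
import fnmatch

def effective_settings(index_name, templates):
    # Apply every template whose pattern matches the index name,
    # lowest order first, so higher-order templates win on conflicts.
    matching = [t for t in templates
                if fnmatch.fnmatch(index_name, t["template"])]
    settings = {}
    for t in sorted(matching, key=lambda t: t.get("order", 0)):
        settings.update(t.get("settings", {}))
    return settings

templates = [
    {"template": "logstash-*", "order": 0,
     "settings": {"index.number_of_replicas": 1}},
    {"template": "logstash-*", "order": 2,
     "settings": {"index.number_of_replicas": 4}},
]
print(effective_settings("logstash-111", templates))
# {'index.number_of_replicas': 4}
```

So if logstash-111 still shows 1 replica, either the order-2 template was never loaded, or something with an even higher order overrode it.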



Re: Basic Question on splitting data sources between 2 or more ES systems

2014-03-14 Thread Binh Ly
You can run different instances of LS, each with its own config file. When 
you define your file input, just point it to a unique sincedb location 
(different for each instance):

http://logstash.net/docs/1.3.3/inputs/file#sincedb_path



Re: Occational client.transport.NoNodeAvailableException

2014-03-14 Thread Binh Ly
I'm curious, is there anything else in the es log files? Also are you 
running on EC2 micro instances?



Re: ElasticSearch : No ElasticSearch node is available Exception

2014-03-14 Thread Mark Betz
Just wanted to say thanks for posting this, Igor. Solved a problem for me 
this afternoon, and it would have taken a long time to stumble on this by 
myself :).

On Friday, October 4, 2013 3:33:36 PM UTC-4, Igor Motov wrote:

 Were the spaces after ":" removed as a result of your key obfuscation, or 
 are they indeed missing in your real config?

 access_key:**my_access_key**
 secret_key:**My_secret**

 Because if they are missing, it would explain the error message that you 
 are getting:

 [2013-08-18 19:24:11,710][INFO ][discovery.ec2] [Skald] 
 Exception while retrieving instance list from AWS API: Unable to load AWS 
 credentials from any provider in the chain

 On Friday, October 4, 2013 5:18:32 AM UTC-4, Bjørn Bråthen wrote:

 And, i'm using Play2-elasticsearch 0.7-SNAPSHOT by the way. 

 On Friday, October 4, 2013 8:27:57 AM UTC+2, Ali Emami wrote:

 hey Bjorn, how did you turn client.transport.sniff off? from 
 configuration or from the code?? thanks

 On Tuesday, August 20, 2013 10:04:44 AM UTC-4, Bjørn Bråthen wrote:

 It was a Play2-elasticsearch issue: client.transport.sniff was set to 
 true by default; turning client.transport.sniff to false made it work.

 On Sunday, August 18, 2013 9:52:31 PM UTC+2, Bjørn Bråthen wrote:

 I'm trying to deploy *Elasticsearch* to *Amazon web services EC2*. I 
 have got it up and working for normal http REST requests from my own 
 computer using curl, but when I try to use the Java Client API 
 (w/Play2-elasticsearch) on the same computer it throws me this error:

 [error] application - ElasticSearch : No ElasticSearch node is available. 
 Please check that your configuration is correct, that your ES server 
 is up and reachable from the network. Index has not been created and prepared.
 org.elasticsearch.client.transport.NoNodeAvailableException: No node 
 available
  at org.elasticsearch.client.transport.TransportClientNodesService.
 execute(TransportClientNodesService.java:205) ~[elasticsearch-0.90.
 3.jar:na]
  at org.elasticsearch.client.transport.support.
 InternalTransportIndicesAdminClient.execute(
 InternalTransportIndicesAdminClient.java:85) ~[elasticsearch-0.90.
 3.jar:na]
  at org.elasticsearch.client.support.AbstractIndicesAdminClient.exists
 (AbstractIndicesAdminClient.java:147) ~[elasticsearch-0.90.3.jar:na]
  at org.elasticsearch.action.admin.indices.exists.indices.
 IndicesExistsRequestBuilder.doExecute(IndicesExistsRequestBuilder.java
 :43) ~[elasticsearch-0.90.3.jar:na]
  at org.elasticsearch.action.ActionRequestBuilder.execute(
 ActionRequestBuilder.java:85) ~[elasticsearch-0.90.3.jar:na]
  at org.elasticsearch.action.ActionRequestBuilder.execute(
 ActionRequestBuilder.java:59) ~[elasticsearch-0.90.3.jar:na]
 [info] application - Application has started
 [error] application - 


 I've opened both inbound *9300* and *9200* ports on my EC2, and this 
 is my *elasticsearch.yml*:

 cluster.name: portal
 node.name: Skald
 node.master: true
 node.data: true
 index.number_of_shards: 1
 index.number_of_replicas: 0
 transport.tcp.port: 9300
 http.port: 9200
 discovery.zen.ping.multicast.enabled: false
 cloud.aws.region: us-west-2

 cloud:
   aws:
     access_key: **my_access_key**
     secret_key: **My_secret**
 discovery:
   type: ec2


 I additionally get this info-snippet in the elasticsearch console:
 [2013-08-18 19:24:11,710][INFO ][discovery.ec2] [Skald] 
 Exception while retrieving instance list from AWS API: Unable to load AWS 
 credentials from any provider in the chain


 In the play2-elasticsearch module I've set:
 elasticsearch.local=false
 elasticsearch.client=ec2-63-32-3-23.us-west-2.compute.amazonaws.com:9300
 elasticsearch.cluster.name=portal
 elasticsearch.index.name=index1,index2
 elasticsearch.index.clazzs=indexing.*
 elasticsearch.index.show_request=false


 Anybody got an idea why this isn't working? 





Re: Occational client.transport.NoNodeAvailableException

2014-03-14 Thread ZenMaster80
I will post logs in a bit. I plan to run on EC2, but currently I'm just running 
on a local machine: i7, 4 GB RAM.

I had  int concurrentRequests =  Runtime.getRuntime().availableProcessors(); 
(Returns 8), 
If I change this value to just 1, I don't get the exception, but indexing 
performance slows down considerably. I am not sure if 8 requests is really 
overwhelming the node.
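For what it's worth, the middle ground between 8 concurrent requests and 1 is just capping how many bulks are in flight at once. A rough, client-agnostic sketch of that idea (the class and names are mine, not part of any ES API):

```python
import threading

class BulkThrottle:
    """Caps in-flight bulk requests, similar in spirit to the
    concurrentRequests knob on the Java BulkProcessor."""

    def __init__(self, max_in_flight):
        self._slots = threading.Semaphore(max_in_flight)

    def submit(self, send_bulk):
        # Blocks the producer once max_in_flight requests are pending,
        # instead of letting it pile more work onto the node.
        self._slots.acquire()
        try:
            return send_bulk()
        finally:
            self._slots.release()

throttle = BulkThrottle(max_in_flight=4)  # try values between 1 and 8
results = [throttle.submit(lambda: "ok") for _ in range(10)]
```

Tuning max_in_flight down from the core count trades peak indexing rate for headroom, without going all the way to serial requests.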

On Friday, March 14, 2014 3:58:21 PM UTC-4, Binh Ly wrote:

 I'm curious, is there anything else in the es log files? Also are you 
 running on EC2 micro instances?




Creating dynamic fields from a field

2014-03-14 Thread Pablo Musa
Hey guys,
I have the following problem: Given a title, I want to record that title
but I also want to record two fields for each word in the title, using the
words itself as part of the field name.
For example:
given the title "The greatest band ever - Urban Legion" I would like to
have a document like:

{
  "title": "The greatest band ever - Urban Legion",
  "greatest_x": 1,
  "band_x": 1,
  "ever_x": 1,
  "Urban_x": 1,
  "Legion_x": 1,
  "greatest_y": [],
  "band_y": [],
  "ever_y": [],
  "Urban_y": [],
  "Legion_y": []
}

I was reading about dynamic mapping but I am not sure if I can accomplish
the above. Is it possible to do inside ES?

I could easily do it in an auxiliary application.

Thanks,
Pablo



Re: Creating dynamic fields from a field

2014-03-14 Thread Binh Ly
No, I don't believe you can automatically create fields based on the token 
values of another field. You'd probably have to do this outside of ES.

If it matters, you can call the _analyze API to produce the tokens before 
you inject your fields.
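Since Pablo mentioned an auxiliary application, a minimal sketch of that route - using a naive whitespace tokenizer as a stand-in for the _analyze output; the "_x"/"_y" field names follow his example, and dropping "The" as a stopword is my assumption:

```python
def doc_from_title(title):
    # Naive whitespace tokenization; in practice you would feed the title
    # through the _analyze API and use the tokens it returns instead.
    tokens = [t for t in title.split() if t.isalnum() and t.lower() != "the"]
    doc = {"title": title}
    for tok in tokens:
        doc[tok + "_x"] = 1   # per-word counter field, as in the example
        doc[tok + "_y"] = []  # per-word list field, as in the example
    return doc

doc = doc_from_title("The greatest band ever - Urban Legion")
```

The resulting dict can then be indexed as-is; a dynamic template in the mapping can keep the generated `*_x` / `*_y` fields typed consistently.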



Confusing highlight result when creating many tokens

2014-03-14 Thread Jon-Paul Lussier
Hey Elasticsearch, hopefully someone can at least explain if this is 
intentional and how it happens (I have had other fragment highlighting 
issues not unlike this).

The problem seems simple, I have a 64 character string that I generate 62 
tokens for. Whenever I search for the entire string, I end up getting the 
highlight applied to the 50th fragment instead of the one that actually 
most nearly matches my search query.

Also confusing is if I try a very similar search, trying to use an exact 
match on the SHA1 or MD5 attributes -- highlighting works like I'd expect 
it to.


Please see the gist 
here: https://gist.github.com/jonpaul/d4a9aa7f9c8741933cf5


Currently I'm using 1.0.0-BETA2 so this *may* be a fixed bug, sorry if 
that's the case, I couldn't find anything that matches my problem per se.

Thanks very much in advance for help anyone can provide!




Re: 3,000 events/sec Architecture

2014-03-14 Thread Zachary Lammers
Eric, as an update, I hit OOM with a couple nodes in my cluster today w/ 
16gb ram for ES alone (each data node has 24gb ram) - I was running fine, 
but then I had users kick off regular searches to watch performance, and my 
indexing rates went from 35k/sec down to almost nothing (ran at a lesser 
rate for a while), but as more searches were performed, marvel was showing 
my JVM usage getting dangerously high on a few nodes (almost entirely my VM 
Data nodes, which I thought odd).

I had roughly 27 billion total docs (log events) in the cluster, with daily 
indexes of 3/1-3/14 (today). Ended up trying to up the worst JVM node 
with a few more gig of ram, and it kinda hosed things, so I wiped it and 
started over with a slightly modified config. I may wipe it again and try to 
get LS 1.4b and ES 1.0.1 going this weekend, to see if I can keep my 
indexing rates high enough.

-z

On Wednesday, March 12, 2014 1:50:51 PM UTC-5, Eric wrote:

 Yes, currently logstash is reading files that syslog-ng created. We 
 already had the syslog-ng architecture in place so just kept rolling with 
 that.


 On Tuesday, March 11, 2014 11:16:42 PM UTC-4, Otis Gospodnetic wrote:

 Hi,

 Is that Logstash instance reading files that are produced by the syslog-ng 
 servers? Maybe not, but if yes, have you considered using Rsyslog with 
 omelasticsearch instead to simplify the architecture?

 Otis
 --
 Performance Monitoring * Log Analytics * Search Analytics
 Solr & Elasticsearch Support * http://sematext.com/


 On Tuesday, March 4, 2014 10:11:59 AM UTC-5, Eric wrote:

 Hello,

 I've been working on a POC for Logstash/ElasticSearch/Kibana for about 2 
 months now and everything has worked out pretty good and we are ready to 
 move it to production. Before building out the infrastructure, I want to 
 make sure my shard/node/index setup is correct as that is the main part 
 that I'm still a bit fuzzy on. Overall my setup is this:

 Servers
 Networking Gear                         syslog-ng server
 End Points       --- Load Balancer ---  syslog-ng server --- Logs stored in 5 flat files on SAN storage
 Security Devices                        syslog-ng server
 Etc.

 I have logstash running on one of the syslog-ng servers and is basically 
 reading the input of 5 different files and sending them to ElasticSearch. 
 So within ElasticSearch, I am creating 5 different indexes a day so I can 
 do granular user access control within Kibana.

 unix-$date
 windows-$date
 networking-$date
 security-$date
 endpoint-$date

 My plan is to have 3 ElasticSearch servers with ~10 gig of RAM each on 
 them. For my POC I have 2 and it's working fine for 2,000 events/second. My 
 main concern is how I setup the ElasticSearch servers so they are as 
 efficient as possible. With my 5 different indexes a day, and I plan on 
 keeping ~1 month of logs within ES, is 3 servers enough? Should I have 1 
 master node and the other 2 be just basic setups that are data and 
 searching? Also, will 1 replica be sufficient for this setup or should I do 
 2 to be safe? In my POC, I've had a few issues where I ran out of memory or 
 something weird happened and I lost data for a while so wanted to try to 
 limit that as much as possible. We'll also have quite a few users 
 potentially querying the system so I didn't know if I should setup a 
 dedicated search node for one of these.

 Besides the ES cluster, I think everything else should be fine. I have 
 had a few concerns about logstash keeping up with the amount of entries 
 coming into syslog-ng but haven't seen much in the way of load balancing 
 logstash or verifying if it's able to keep up or not. I've spot checked the 
 files quite a bit and everything seems to be correct but if there is a 
 better way to do this, I'm all ears.

 I'm going to have my Kibana instance installed on the master ES node, 
 which shouldn't be a big deal. I've played with the idea of putting the ES 
 servers on the syslog-ng servers and just having a separate NIC for the ES 
 traffic, but didn't want to bog down the servers a whole lot. 

 Any thoughts or recommendations would be greatly appreciated.

 Thanks,
 Eric
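As an aside on Eric's original plan, the daily-index arithmetic is worth writing down before sizing hardware. A quick sketch (5 primaries and 1 replica are only the ES defaults of the era, not Eric's stated settings):

```python
indexes_per_day = 5       # unix, windows, networking, security, endpoint
retention_days = 30       # ~1 month of logs kept in ES
primaries_per_index = 5   # assumed ES default number_of_shards
replicas = 1
nodes = 3

# Every primary shard is duplicated once per replica.
total_shards = indexes_per_day * retention_days * primaries_per_index * (1 + replicas)
shards_per_node = total_shards / nodes
```

Under those assumptions that is 1500 shards, i.e. 500 per node, which is a lot of per-shard overhead for ~10 GB of RAM each and an argument for fewer primaries per daily index.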





Search query containing an equal character

2014-03-14 Thread Guillaume Loetscher
Hello,

I'm trying to do a query search containing an equal (=) character in it.

I've got plenty of logs looking like this :

22postfix/smtpd[9136]: E4A4E34AA5: client=localhost.localdomain[127.0.0.1]

I want to query all messages that haven't been posted from 
localhost.localdomain.

I've looked at the query documentation
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html)
and tried multiple queries in Kibana and through a curl command, but no luck.

Right now, I've tried this query: -client=localhost.localdomain, but no 
luck; it keeps giving me answers containing this precise string.

I also tried to protect the = character with a backslash.

How is it possible to do a query search with this character?

Thanks a lot,
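For what it's worth, the `=` usually never reaches the index at all: the standard analyzer splits the log line on it, so neither the literal term nor a backslash-escaped version can match. A rough Python approximation of that tokenization (my own crude stand-in, not the real analyzer):

```python
import re

def standard_ish_tokens(text):
    # Crude stand-in for the standard analyzer: lowercase, then split on
    # anything that is not a letter, digit, or an inner dot.
    return [t.lower().strip(".")
            for t in re.split(r"[^A-Za-z0-9.]+", text)
            if t.strip(".")]

tokens = standard_ish_tokens("client=localhost.localdomain[127.0.0.1]")
```

Since no indexed token ever contains `=`, querying the analyzed field for the surviving token instead (e.g. -localhost.localdomain), or using a not filter against a not_analyzed copy of the field, is the more workable route.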



preventing ngram on query term

2014-03-14 Thread ymous . anon1234
How would you disable the ngramming of the query term:
"match": { "username.ngram": "linus" }

the indexer:
"tokenizer": { "customNgram": { "type": "nGram", "min_gram": 3, "max_gram": 5 } }

I don't want hits for "lin", "inu", "nus" as in "Xinus", but I do want hits 
for "tlinustorvalds".

In other words - I don't want the query term "linus" to be broken down to 
[4-2]-grams.

Thanks very much.
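To make the behaviour concrete, here is roughly what a 3-5 nGram tokenizer emits (my own sketch, not ES code). The grams of "linus" overlap the index-time grams of "Xinus" only when the query side is ngrammed too; an un-ngrammed query term behaves exactly as asked:

```python
def ngrams(term, min_gram=3, max_gram=5):
    # Every character n-gram between min_gram and max_gram, mirroring
    # the customNgram tokenizer settings from the question.
    term = term.lower()
    return [term[i:i + n]
            for n in range(min_gram, max_gram + 1)
            for i in range(len(term) - n + 1)]

grams = ngrams("linus")
# Kept whole, the query term "linus" appears among the index-time grams
# of "tlinustorvalds" but not among those of "Xinus".
```

That is why pairing the ngram index analyzer with a plain search analyzer (see the replies below regarding search_analyzer) gives the desired matches.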



Re: ES-Hive

2014-03-14 Thread P lva
I got it to work by replacing
'es.resource.write'='http://eshost:9200'
with
'es.resource.node'='http://eshost:9200'
as one of the TBLPROPERTIES in Hive.

Thanks for your help Costin. 
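For anyone landing on this thread later: in current es-hadoop releases the documented property for the connection endpoint is `es.nodes` (host:port, no scheme), with `es.resource` naming the index/type. A sketch of the table definition under that assumption (column list elided, names taken from the thread):

```sql
CREATE EXTERNAL TABLE eslogs ( ... )
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'esdemo/hive',
              'es.mapping.name' = 'time:@timestamp',
              'es.nodes' = 'eshost:9200');
```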


On Friday, March 14, 2014 1:12:00 PM UTC-5, Costin Leau wrote:

 You should not change the configuration in Elasticsearch. By default, ES 
 binds on all available interfaces - specifying 
 an ip:port is likely to restrict access rather than extend it. 

 To find out the configuration options in es-hadoop, look no further than 
 the docs [1]. If the hive job is running on the 
 same machine as elasticsearch, the default (localhost:9200) should be 
 enough. As an alternative you can specify the 
 public ip of your host: x.y.z.w 

 As a last step, enable TRACE logging (through log4j) on 
 org.elasticsearch.hadoop package and see what comes out 
 including network connectivity. 

 By the way, make sure you don't have any firewall or proxy set on your 
 system which might be picked up automatically by 
 the JVM. 

 [1] 
 http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/configuration.html
  

 On 3/14/2014 6:50 PM, P lva wrote: 
  Costin, 
  
  About what you asked, 
  1) Hive server is running on the same machine as elasticsearch, 
  2) I get a response when I do a curl http://elasticsearchivehost:9200 
  
  
  I feel I'm missing something simple. This is what I've got until now. 
  
  1) First error (connection refused) is because i left the default 
 settings as is. 
  
  I changed the 'network.host' in elasticsearch.yml to the hostname 
  
  2) Second error (cannot discover elasticsearch version) is when I 
 changed the 'network.host' in elasticsearch.yml to the 
  hostname. 
  
  Looked at the source code and figured I'm expected to pass eshost:port 
 to the hive table at table creation: 
 https://github.com/elasticsearch/elasticsearch-hadoop/blob/master/src/main/java/org/elasticsearch/hadoop/hive/EsStorageHandler.java#L48
  So I included 'es.resource.write'='http://elasticsearchhost:9200' as 
 one of the table properties during table creation step. 
  
  STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' 
  TBLPROPERTIES('es.resource' = 'esdemo/hive', 
 'es.mapping.name'='time:@timestamp', 
 'es.resource.write'='http://eshost:9200'); 
  
  Now I get connection refused again. Is that the right way to pass that 
 information ? 
  
  Thanks for you patience and help 
  
  
  
  
  On Friday, March 14, 2014 10:33:14 AM UTC-5, Costin Leau wrote: 
  
  Without any extra information I'm afraid I can only guess what might 
 be the issue. 
  Make sure you have the latest Elasticsearch 0.90 or 1.x available on 
 port 9200 with the HTTP/REST port open. 
  Also make sure that Hive actually runs on the same machine - not 
 just the client but also the server (meaning Hadoop 
  itself). 
  
  You indicate that if you change the network configuration you get an 
 error regarding the version - this suggests that: 
  
  1. Hive is actually running on a different machine than ES - hence 
 the network error 
  2. After pointing Hive to the actual ES machine, you get an error 
 since you're using an old Elasticsearch version 
  (0.20) 
  
  Cheers, 
  
  On 3/14/2014 12:19 AM, P lva wrote: 
   I have a simple query 
   insert into table eslogs select * from eslogs_ext; 
   Hive and elasticsearch are running on the same host. 
   
   To execute the script I'm following the directions from the link. 
  
  http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/hive.html 

   
   There are two elasticsearch nodes, and they can recognize each 
 other (as indicated by start up process) , but why would 
   hive not be able to pick them up ? Can you explain what could have 
 gone wrong ? 
   
   
   On Thursday, March 13, 2014 4:28:14 PM UTC-5, Costin Leau wrote: 
   
   What does your Hive script look like? Can you confirm the 
 ip/address of your Hive and Elasticsearch ? How are you 
   executing the script? 
   The error indicates an error in your network configuration. 
   
   Cheers, 
   
   P.S. Feel free to post a gist or whatever it's convenient. 
   
   On 3/13/2014 10:38 PM, P lva wrote: 
Hi, I have few weblogs in a hive table that I'd like to 
 visualize in kibana. 
ES is on the same node as hive server. 

 Followed directions from this page: 
  http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/hive.html 

Re: preventing ngram on query term

2014-03-14 Thread David Pilato
In 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-query.html:

The analyzer can be set to control which analyzer will perform the analysis 
process on the text. It defaults to the field's explicit mapping definition, or 
the default search analyzer.
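Concretely, that means the query itself can carry its own analyzer. A hedged example, with the field name taken from the thread and the choice of "standard" being my assumption:

```json
{
  "query": {
    "match": {
      "username.ngram": {
        "query": "linus",
        "analyzer": "standard"
      }
    }
  }
}
```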

Also you can define in your mapping a different analyzer for indexing and 
searching:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html

index_analyzer

The analyzer used to analyze the text contents when analyzed during indexing.

search_analyzer

The analyzer used to analyze the field when part of a query string. Can be 
updated on an existing field.


--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


Le 15 mars 2014 à 00:08, ymous.anon1...@gmail.com a écrit :

How would you disable the ngramming of the query term: 
match: {username.ngram: linus 

the indexer: 
tokenizer: {customNgram: {type: nGram, min_gram: 3, max_gram: 5} 

I don't want hits for lin, inu, nus as in Xinus but I do want hits for 
tlinustorvalds 

In other words - I don't want the query term linus to be broken down to 
[4-2]-grams

Thanks very much.
