filter bitsets

2014-05-12 Thread slushi
I was reading this blog post about filter bitsets
http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/

At the end, the conclusion is that the bool filter should be used for 
everything but geo, numeric range and custom script. However the example 
query at the end seems to include a range in the bool filter. Is this a 
typo or am I missing something? Also what about query filters? I would 
assume those wouldn't go in the bool filter either?

{
  "and" : [
    {
      "bool" : {
        "must" : [
          { "term" : {} },
          { "range" : {} },
          { "term" : {} }
        ]
      }
    },
    { "custom_script" : {} },
    { "geo_distance" : {} }
  ]
}
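
For reference, a hedged sketch of how the post's guidance would seem to restructure that example - the bitset-friendly term filters inside the bool, with range, custom_script and geo_distance kept in the outer and. This is only an illustration of the question, not a confirmed answer:

{
  "and" : [
    {
      "bool" : {
        "must" : [
          { "term" : {} },
          { "term" : {} }
        ]
      }
    },
    { "range" : {} },
    { "custom_script" : {} },
    { "geo_distance" : {} }
  ]
}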



Lot of GC in elasticsearch node.

2014-05-12 Thread Abhishek Tiwari
My Elasticsearch node is an AWS EC2 c3.xlarge (7.5 GB memory). 
Elasticsearch starts as:

498  31810 99.6 64.6 163846656 4976944 ?   Sl   06:03  26:10 
/usr/bin/java *-Xms4g -Xmx4g -Xss256k* -Djava.awt.headless=true 
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC 
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly 
-XX:+HeapDumpOnOutOfMemoryError -Delasticsearch 
-Des.pidfile=/var/run/elasticsearch/elasticsearch.pid 
-Des.path.home=/usr/share/elasticsearch -cp 
:/usr/share/elasticsearch/lib/elasticsearch-1.1.1.jar:/usr/share/elasticsearch/lib/*:/usr/share/elasticsearch/lib/sigar/*
 
-Des.default.path.home=/usr/share/elasticsearch 
-Des.default.path.logs=/var/log/elasticsearch 
-Des.default.path.data=/var/lib/elasticsearch 
-Des.default.path.work=/tmp/elasticsearch 
-Des.default.path.conf=/etc/elasticsearch 
org.elasticsearch.bootstrap.Elasticsearch


The node stopped responding (the ip:9200 status page), and so did Kibana. 
It started working fine again after a restart.
I have Logstash-format docs, and the index rotates daily. 
Stats:
  Daily: ~11 GB, ~15 million docs.
  Total: ~195 GB, ~300 million docs.

The logs from around the time it stopped responding are:

[2014-05-12 03:39:08,789][INFO ][cluster.metadata ] [Hannibal King] 
[logstash-2014.05.12] update_mapping [medusa_ex] (dynamic)
[2014-05-12 03:40:52,293][INFO ][monitor.jvm  ] [Hannibal King] 
[gc][old][240428][35773] duration [6.3s], collections [1]/[6.5s], total 
[6.3s]/[4.7h], memory [3.8gb]-[3.6gb]/[3.9gb], all_pools {[young] 
[150.3mb]-[1.7mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old] 
[3.6gb]-[3.6gb]/[3.6gb]}
[2014-05-12 03:44:11,739][INFO ][cluster.metadata ] [Hannibal King] 
[logstash-2014.05.12] update_mapping [medusa_ex] (dynamic)
[2014-05-12 03:45:32,191][INFO ][monitor.jvm  ] [Hannibal King] 
[gc][old][240703][35812] duration [5.2s], collections [1]/[5.8s], total 
[5.2s]/[4.7h], memory [3.7gb]-[3.6gb]/[3.9gb], all_pools {[young] 
[197.4mb]-[9.3mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old] 
[3.5gb]-[3.6gb]/[3.6gb]}
[2014-05-12 04:06:01,224][INFO ][monitor.jvm  ] [Hannibal King] 
[gc][old][241926][35985] duration [6s], collections [1]/[6.2s], total 
[6s]/[4.7h], memory [3.7gb]-[3.6gb]/[3.9gb], all_pools {[young] 
[134.7mb]-[9.9mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old] 
[3.6gb]-[3.5gb]/[3.6gb]}
[2014-05-12 04:08:14,473][INFO ][monitor.jvm  ] [Hannibal King] 
[gc][old][242049][36004] duration [5.8s], collections [1]/[5.9s], total 
[5.8s]/[4.7h], memory [3.8gb]-[3.6gb]/[3.9gb], all_pools {[young] 
[165.1mb]-[2.7mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old] 
[3.6gb]-[3.6gb]/[3.6gb]}
[2014-05-12 04:09:07,473][INFO ][monitor.jvm  ] [Hannibal King] 
[gc][old][242096][36011] duration [6.2s], collections [1]/[6.7s], total 
[6.2s]/[4.7h], memory [3.9gb]-[3.6gb]/[3.9gb], all_pools {[young] 
[265.9mb]-[2.9mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old] 
[3.6gb]-[3.5gb]/[3.6gb]}
[2014-05-12 04:10:08,387][INFO ][monitor.jvm  ] [Hannibal King] 
[gc][old][242152][36020] duration [5.4s], collections [1]/[5.6s], total 
[5.4s]/[4.7h], memory [3.8gb]-[3.5gb]/[3.9gb], all_pools {[young] 
[176.5mb]-[5.8mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old] 
[3.6gb]-[3.5gb]/[3.6gb]}
[2014-05-12 04:13:12,774][INFO ][monitor.jvm  ] [Hannibal King] 
[gc][old][242326][36046] duration [5.6s], collections [1]/[5.8s], total 
[5.6s]/[4.7h], memory [3.8gb]-[3.5gb]/[3.9gb], all_pools {[young] 
[167.4mb]-[12.9mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old] 
[3.6gb]-[3.5gb]/[3.6gb]}
[2014-05-12 04:14:22,729][INFO ][monitor.jvm  ] [Hannibal King] 
[gc][old][242386][36057] duration [6.3s], collections [1]/[6.5s], total 
[6.3s]/[4.7h], memory [3.8gb]-[3.6gb]/[3.9gb], all_pools {[young] 
[224.2mb]-[3.5mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old] 
[3.6gb]-[3.6gb]/[3.6gb]}
[2014-05-12 04:15:12,192][INFO ][monitor.jvm  ] [Hannibal King] 
[gc][old][242431][36064] duration [5.2s], collections [1]/[5.4s], total 
[5.2s]/[4.7h], memory [3.8gb]-[3.6gb]/[3.9gb], all_pools {[young] 
[234mb]-[2.4mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old] 
[3.6gb]-[3.6gb]/[3.6gb]}
[2014-05-12 04:15:32,344][INFO ][monitor.jvm  ] [Hannibal King] 
[gc][old][242445][36067] duration [6.3s], collections [1]/[7.1s], total 
[6.3s]/[4.7h], memory [3.6gb]-[3.7gb]/[3.9gb], all_pools {[young] 
[1.2mb]-[34.7mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old] 
[3.6gb]-[3.6gb]/[3.6gb]}
[2014-05-12 04:15:39,627][INFO ][monitor.jvm  ] [Hannibal King] 
[gc][old][242446][36068] duration [6.7s], collections [1]/[7.2s], total 
[6.7s]/[4.7h], memory [3.7gb]-[3.7gb]/[3.9gb], all_pools {[young] 
[34.7mb]-[45.7mb]/[266.2mb]}{[survivor] [0b]-[0b]/[33.2mb]}{[old] 
[3.6gb]-[3.6gb]/[3.6gb]}
[2014-05-12 04:15:51,547][INFO ][monitor.jvm  ] [Hannibal King] 
[gc][old][242448][36070] duration 

Re: filter bitsets

2014-05-12 Thread slushi
Also, one follow-up question: if I do a terms filter and then a query filter, 
should I put the terms filter in a bool with a single clause? It seems 
strange to do so, but the following passage made me wonder if this is 
the case:

It matters because the Bool filter utilizes BitSets while the And/Or/Not 
 filters do not. If you put a Terms Filter inside of an And…no BitSet will 
 be used, even though it exists.
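
For illustration only (the field names are assumptions, and this is not a confirmed answer), the structure being asked about - a single-clause bool wrapping the terms filter, with the query filter alongside it in an and - would look roughly like:

{
  "and" : [
    { "bool" : { "must" : [ { "terms" : { "tags" : ["a", "b"] } } ] } },
    { "query" : { "match" : { "title" : "some text" } } }
  ]
}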






On Monday, May 12, 2014 2:14:16 AM UTC-4, slushi wrote:

 I was reading this blog post about filter bitsets
 http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/

 At the end, the conclusion is that the bool filter should be used for 
 everything but geo, numeric range and custom script. However the example 
 query at the end seems to include a range in the bool filter. Is this a 
 typo or am I missing something? Also what about query filters? I would 
 assume those wouldn't go in the bool filter either?

 {
   and : [
 {
   bool : {
 must : [
   { term : {} },
   { range : {} },
   { term : {} }
 ]
   }
 },
 { custom_script : {} },
 { geo_distance : {} }
   ]
 }





Re: Lot of GC in elasticsearch node.

2014-05-12 Thread Mark Walkom
You need to reduce your data size, add more memory or add another node.

Basically, you've reached the limits of that node.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 12 May 2014 16:38, Abhishek Tiwari erb...@gmail.com wrote:

 My elasticsearch node is a AWS EC2 c3.xlarge (7.5G mem).
 Elasticsearch starts as-

 498  31810 99.6 64.6 163846656 4976944 ?   Sl   06:03  26:10
 /usr/bin/java *-Xms4g -Xmx4g -Xss256k* -Djava.awt.headless=true
 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
 -XX:+HeapDumpOnOutOfMemoryError -Delasticsearch
 -Des.pidfile=/var/run/elasticsearch/elasticsearch.pid
 -Des.path.home=/usr/share/elasticsearch -cp
 :/usr/share/elasticsearch/lib/elasticsearch-1.1.1.jar:/usr/share/elasticsearch/lib/*:/usr/share/elasticsearch/lib/sigar/*
 -Des.default.path.home=/usr/share/elasticsearch
 -Des.default.path.logs=/var/log/elasticsearch
 -Des.default.path.data=/var/lib/elasticsearch
 -Des.default.path.work=/tmp/elasticsearch
 -Des.default.path.conf=/etc/elasticsearch
 org.elasticsearch.bootstrap.Elasticsearch


 The node stopped responding (the ip:9200 status page), and so did kibana.
 It started working fine on a restart.
 i have logstash format docs wherein the index rotates daily.
 Stats:
   Daily: ~11G docs, ~15  million.
   Total: 195G docs, ~300 million.

 The logs of the time when it stopped responding are-

 [2014-05-12 03:39:08,789][INFO ][cluster.metadata ] [Hannibal
 King] [logstash-2014.05.12] update_mapping [medusa_ex] (dynamic)
 [2014-05-12 03:40:52,293][INFO ][monitor.jvm  ] [Hannibal
 King] [gc][old][240428][35773] duration [6.3s], collections [1]/[6.5s],
 total [6.3s]/[4.7h], memory [3.8gb]-[3.6gb]/[3.9gb], all_pools {[young]
 [150.3mb]-[1.7mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old]
 [3.6gb]-[3.6gb]/[3.6gb]}
 [2014-05-12 03:44:11,739][INFO ][cluster.metadata ] [Hannibal
 King] [logstash-2014.05.12] update_mapping [medusa_ex] (dynamic)
 [2014-05-12 03:45:32,191][INFO ][monitor.jvm  ] [Hannibal
 King] [gc][old][240703][35812] duration [5.2s], collections [1]/[5.8s],
 total [5.2s]/[4.7h], memory [3.7gb]-[3.6gb]/[3.9gb], all_pools {[young]
 [197.4mb]-[9.3mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old]
 [3.5gb]-[3.6gb]/[3.6gb]}
 [2014-05-12 04:06:01,224][INFO ][monitor.jvm  ] [Hannibal
 King] [gc][old][241926][35985] duration [6s], collections [1]/[6.2s], total
 [6s]/[4.7h], memory [3.7gb]-[3.6gb]/[3.9gb], all_pools {[young]
 [134.7mb]-[9.9mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old]
 [3.6gb]-[3.5gb]/[3.6gb]}
 [2014-05-12 04:08:14,473][INFO ][monitor.jvm  ] [Hannibal
 King] [gc][old][242049][36004] duration [5.8s], collections [1]/[5.9s],
 total [5.8s]/[4.7h], memory [3.8gb]-[3.6gb]/[3.9gb], all_pools {[young]
 [165.1mb]-[2.7mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old]
 [3.6gb]-[3.6gb]/[3.6gb]}
 [2014-05-12 04:09:07,473][INFO ][monitor.jvm  ] [Hannibal
 King] [gc][old][242096][36011] duration [6.2s], collections [1]/[6.7s],
 total [6.2s]/[4.7h], memory [3.9gb]-[3.6gb]/[3.9gb], all_pools {[young]
 [265.9mb]-[2.9mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old]
 [3.6gb]-[3.5gb]/[3.6gb]}
 [2014-05-12 04:10:08,387][INFO ][monitor.jvm  ] [Hannibal
 King] [gc][old][242152][36020] duration [5.4s], collections [1]/[5.6s],
 total [5.4s]/[4.7h], memory [3.8gb]-[3.5gb]/[3.9gb], all_pools {[young]
 [176.5mb]-[5.8mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old]
 [3.6gb]-[3.5gb]/[3.6gb]}
 [2014-05-12 04:13:12,774][INFO ][monitor.jvm  ] [Hannibal
 King] [gc][old][242326][36046] duration [5.6s], collections [1]/[5.8s],
 total [5.6s]/[4.7h], memory [3.8gb]-[3.5gb]/[3.9gb], all_pools {[young]
 [167.4mb]-[12.9mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old]
 [3.6gb]-[3.5gb]/[3.6gb]}
 [2014-05-12 04:14:22,729][INFO ][monitor.jvm  ] [Hannibal
 King] [gc][old][242386][36057] duration [6.3s], collections [1]/[6.5s],
 total [6.3s]/[4.7h], memory [3.8gb]-[3.6gb]/[3.9gb], all_pools {[young]
 [224.2mb]-[3.5mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old]
 [3.6gb]-[3.6gb]/[3.6gb]}
 [2014-05-12 04:15:12,192][INFO ][monitor.jvm  ] [Hannibal
 King] [gc][old][242431][36064] duration [5.2s], collections [1]/[5.4s],
 total [5.2s]/[4.7h], memory [3.8gb]-[3.6gb]/[3.9gb], all_pools {[young]
 [234mb]-[2.4mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old]
 [3.6gb]-[3.6gb]/[3.6gb]}
 [2014-05-12 04:15:32,344][INFO ][monitor.jvm  ] [Hannibal
 King] [gc][old][242445][36067] duration [6.3s], collections [1]/[7.1s],
 total [6.3s]/[4.7h], memory [3.6gb]-[3.7gb]/[3.9gb], all_pools {[young]
 [1.2mb]-[34.7mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old]
 [3.6gb]-[3.6gb]/[3.6gb]}
 [2014-05-12 04:15:39,627][INFO ][monitor.jvm  ] [Hannibal
 King] 

Re: Lot of GC in elasticsearch node.

2014-05-12 Thread Abhishek Tiwari


 add more memory


I am indexing ~15 million docs a day, which total ~9 GB. The average doc size is 
~2 KB.

1. How much memory would you suggest for my use case?
2. Also, is it prudent for me to dedicate half of the OS memory to 
Elasticsearch?


On Monday, 12 May 2014 14:03:19 UTC+5:30, Mark Walkom wrote:

 You need to reduce your data size, add more memory or add another node.

 Basically, you've reached the limits of that node.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com javascript:
 web: www.campaignmonitor.com
  

 On 12 May 2014 16:38, Abhishek Tiwari erb...@gmail.com javascript:wrote:

 My elasticsearch node is a AWS EC2 c3.xlarge (7.5G mem). 
 Elasticsearch starts as-

 498  31810 99.6 64.6 163846656 4976944 ?   Sl   06:03  26:10 
 /usr/bin/java *-Xms4g -Xmx4g -Xss256k* -Djava.awt.headless=true 
 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC 
 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly 
 -XX:+HeapDumpOnOutOfMemoryError -Delasticsearch 
 -Des.pidfile=/var/run/elasticsearch/elasticsearch.pid 
 -Des.path.home=/usr/share/elasticsearch -cp 
 :/usr/share/elasticsearch/lib/elasticsearch-1.1.1.jar:/usr/share/elasticsearch/lib/*:/usr/share/elasticsearch/lib/sigar/*
  
 -Des.default.path.home=/usr/share/elasticsearch 
 -Des.default.path.logs=/var/log/elasticsearch 
 -Des.default.path.data=/var/lib/elasticsearch 
 -Des.default.path.work=/tmp/elasticsearch 
 -Des.default.path.conf=/etc/elasticsearch 
 org.elasticsearch.bootstrap.Elasticsearch


 The node stopped responding (the ip:9200 status page), and so did kibana. 
 It started working fine on a restart.
 i have logstash format docs wherein the index rotates daily. 
 Stats:
   Daily: ~11G docs, ~15  million.
   Total: 195G docs, ~300 million.

 The logs of the time when it stopped responding are-

 [2014-05-12 03:39:08,789][INFO ][cluster.metadata ] [Hannibal 
 King] [logstash-2014.05.12] update_mapping [medusa_ex] (dynamic)
 [2014-05-12 03:40:52,293][INFO ][monitor.jvm  ] [Hannibal 
 King] [gc][old][240428][35773] duration [6.3s], collections [1]/[6.5s], 
 total [6.3s]/[4.7h], memory [3.8gb]-[3.6gb]/[3.9gb], all_pools {[young] 
 [150.3mb]-[1.7mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old] 
 [3.6gb]-[3.6gb]/[3.6gb]}
 [2014-05-12 03:44:11,739][INFO ][cluster.metadata ] [Hannibal 
 King] [logstash-2014.05.12] update_mapping [medusa_ex] (dynamic)
 [2014-05-12 03:45:32,191][INFO ][monitor.jvm  ] [Hannibal 
 King] [gc][old][240703][35812] duration [5.2s], collections [1]/[5.8s], 
 total [5.2s]/[4.7h], memory [3.7gb]-[3.6gb]/[3.9gb], all_pools {[young] 
 [197.4mb]-[9.3mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old] 
 [3.5gb]-[3.6gb]/[3.6gb]}
 [2014-05-12 04:06:01,224][INFO ][monitor.jvm  ] [Hannibal 
 King] [gc][old][241926][35985] duration [6s], collections [1]/[6.2s], total 
 [6s]/[4.7h], memory [3.7gb]-[3.6gb]/[3.9gb], all_pools {[young] 
 [134.7mb]-[9.9mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old] 
 [3.6gb]-[3.5gb]/[3.6gb]}
 [2014-05-12 04:08:14,473][INFO ][monitor.jvm  ] [Hannibal 
 King] [gc][old][242049][36004] duration [5.8s], collections [1]/[5.9s], 
 total [5.8s]/[4.7h], memory [3.8gb]-[3.6gb]/[3.9gb], all_pools {[young] 
 [165.1mb]-[2.7mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old] 
 [3.6gb]-[3.6gb]/[3.6gb]}
 [2014-05-12 04:09:07,473][INFO ][monitor.jvm  ] [Hannibal 
 King] [gc][old][242096][36011] duration [6.2s], collections [1]/[6.7s], 
 total [6.2s]/[4.7h], memory [3.9gb]-[3.6gb]/[3.9gb], all_pools {[young] 
 [265.9mb]-[2.9mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old] 
 [3.6gb]-[3.5gb]/[3.6gb]}
 [2014-05-12 04:10:08,387][INFO ][monitor.jvm  ] [Hannibal 
 King] [gc][old][242152][36020] duration [5.4s], collections [1]/[5.6s], 
 total [5.4s]/[4.7h], memory [3.8gb]-[3.5gb]/[3.9gb], all_pools {[young] 
 [176.5mb]-[5.8mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old] 
 [3.6gb]-[3.5gb]/[3.6gb]}
 [2014-05-12 04:13:12,774][INFO ][monitor.jvm  ] [Hannibal 
 King] [gc][old][242326][36046] duration [5.6s], collections [1]/[5.8s], 
 total [5.6s]/[4.7h], memory [3.8gb]-[3.5gb]/[3.9gb], all_pools {[young] 
 [167.4mb]-[12.9mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old] 
 [3.6gb]-[3.5gb]/[3.6gb]}
 [2014-05-12 04:14:22,729][INFO ][monitor.jvm  ] [Hannibal 
 King] [gc][old][242386][36057] duration [6.3s], collections [1]/[6.5s], 
 total [6.3s]/[4.7h], memory [3.8gb]-[3.6gb]/[3.9gb], all_pools {[young] 
 [224.2mb]-[3.5mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old] 
 [3.6gb]-[3.6gb]/[3.6gb]}
 [2014-05-12 04:15:12,192][INFO ][monitor.jvm  ] [Hannibal 
 King] [gc][old][242431][36064] duration [5.2s], collections [1]/[5.4s], 
 total [5.2s]/[4.7h], memory [3.8gb]-[3.6gb]/[3.9gb], all_pools {[young] 
 [234mb]-[2.4mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old] 
 [3.6gb]-[3.6gb]/[3.6gb]}
 

Retrieve mapping of an index using pyes library

2014-05-12 Thread nishidha randad
Hello,

I need some quick help: I need to know how to get the mapping of an index using 
the pyes library. I know the curl command that achieves this, but I need an API call. 
Alternatively, an API that returns the index name for a given alias 
would also do. I found get_alias(alias), but it is deprecated in pyes 
0.19. Please help me.
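
Not a pyes-specific answer, but for reference these are the underlying REST calls (index and alias names here are placeholders); whichever pyes method is used should map onto them:

curl -XGET 'localhost:9200/my_index/_mapping?pretty'
curl -XGET 'localhost:9200/_alias/my_alias?pretty'

The first returns the mapping of an index, the second the concrete index (or indices) behind an alias.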

Thanks,
Nishidha



Re: Lot of GC in elasticsearch node.

2014-05-12 Thread Mark Walkom
It's standard practice to use 50% of system memory for the heap.

How much RAM you need depends on how long you want to keep your data around
for. So, given you have ~200GB now on 4GB of RAM, you can probably
extrapolate that out based on your needs.
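
As a rough way to see how close the heap is to its limit before resizing (a sketch; the 1.x node-stats API exposes per-node JVM heap usage):

curl -XGET 'localhost:9200/_nodes/stats/jvm?pretty'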

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 12 May 2014 19:33, Abhishek Tiwari erb...@gmail.com wrote:

 add more memory


 i am doing 15 million docs, which total to ~9G. The average doc size is
 ~2KB.

 1. How much memory would you suggest for my use-case?
 2.Also, is it prudent for me to have half of OS memory dedicated to
 elasticsearch?


 On Monday, 12 May 2014 14:03:19 UTC+5:30, Mark Walkom wrote:

 You need to reduce your data size, add more memory or add another node.

 Basically, you've reached the limits of that node.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 12 May 2014 16:38, Abhishek Tiwari erb...@gmail.com wrote:

 My elasticsearch node is a AWS EC2 c3.xlarge (7.5G mem).
 Elasticsearch starts as-

 498  31810 99.6 64.6 163846656 4976944 ?   Sl   06:03  26:10
 /usr/bin/java *-Xms4g -Xmx4g -Xss256k* -Djava.awt.headless=true
 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:
 CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
 -XX:+HeapDumpOnOutOfMemoryError -Delasticsearch -Des.pidfile=/var/run/
 elasticsearch/elasticsearch.pid -Des.path.home=/usr/share/elasticsearch
 -cp :/usr/share/elasticsearch/lib/elasticsearch-1.1.1.jar:/usr/
 share/elasticsearch/lib/*:/usr/share/elasticsearch/lib/sigar/*
 -Des.default.path.home=/usr/share/elasticsearch
 -Des.default.path.logs=/var/log/elasticsearch
 -Des.default.path.data=/var/lib/elasticsearch
 -Des.default.path.work=/tmp/elasticsearch 
 -Des.default.path.conf=/etc/elasticsearch
 org.elasticsearch.bootstrap.Elasticsearch


 The node stopped responding (the ip:9200 status page), and so did
 kibana. It started working fine on a restart.
 i have logstash format docs wherein the index rotates daily.
 Stats:
   Daily: ~11G docs, ~15  million.
   Total: 195G docs, ~300 million.

 The logs of the time when it stopped responding are-

 [2014-05-12 03:39:08,789][INFO ][cluster.metadata ] [Hannibal
 King] [logstash-2014.05.12] update_mapping [medusa_ex] (dynamic)
 [2014-05-12 03:40:52,293][INFO ][monitor.jvm  ] [Hannibal
 King] [gc][old][240428][35773] duration [6.3s], collections [1]/[6.5s],
 total [6.3s]/[4.7h], memory [3.8gb]-[3.6gb]/[3.9gb], all_pools {[young]
 [150.3mb]-[1.7mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old]
 [3.6gb]-[3.6gb]/[3.6gb]}
 [2014-05-12 03:44:11,739][INFO ][cluster.metadata ] [Hannibal
 King] [logstash-2014.05.12] update_mapping [medusa_ex] (dynamic)
 [2014-05-12 03:45:32,191][INFO ][monitor.jvm  ] [Hannibal
 King] [gc][old][240703][35812] duration [5.2s], collections [1]/[5.8s],
 total [5.2s]/[4.7h], memory [3.7gb]-[3.6gb]/[3.9gb], all_pools {[young]
 [197.4mb]-[9.3mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old]
 [3.5gb]-[3.6gb]/[3.6gb]}
 [2014-05-12 04:06:01,224][INFO ][monitor.jvm  ] [Hannibal
 King] [gc][old][241926][35985] duration [6s], collections [1]/[6.2s], total
 [6s]/[4.7h], memory [3.7gb]-[3.6gb]/[3.9gb], all_pools {[young]
 [134.7mb]-[9.9mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old]
 [3.6gb]-[3.5gb]/[3.6gb]}
 [2014-05-12 04:08:14,473][INFO ][monitor.jvm  ] [Hannibal
 King] [gc][old][242049][36004] duration [5.8s], collections [1]/[5.9s],
 total [5.8s]/[4.7h], memory [3.8gb]-[3.6gb]/[3.9gb], all_pools {[young]
 [165.1mb]-[2.7mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old]
 [3.6gb]-[3.6gb]/[3.6gb]}
 [2014-05-12 04:09:07,473][INFO ][monitor.jvm  ] [Hannibal
 King] [gc][old][242096][36011] duration [6.2s], collections [1]/[6.7s],
 total [6.2s]/[4.7h], memory [3.9gb]-[3.6gb]/[3.9gb], all_pools {[young]
 [265.9mb]-[2.9mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old]
 [3.6gb]-[3.5gb]/[3.6gb]}
 [2014-05-12 04:10:08,387][INFO ][monitor.jvm  ] [Hannibal
 King] [gc][old][242152][36020] duration [5.4s], collections [1]/[5.6s],
 total [5.4s]/[4.7h], memory [3.8gb]-[3.5gb]/[3.9gb], all_pools {[young]
 [176.5mb]-[5.8mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old]
 [3.6gb]-[3.5gb]/[3.6gb]}
 [2014-05-12 04:13:12,774][INFO ][monitor.jvm  ] [Hannibal
 King] [gc][old][242326][36046] duration [5.6s], collections [1]/[5.8s],
 total [5.6s]/[4.7h], memory [3.8gb]-[3.5gb]/[3.9gb], all_pools {[young]
 [167.4mb]-[12.9mb]/[266.2mb]}{[survivor]
 [33.2mb]-[0b]/[33.2mb]}{[old] [3.6gb]-[3.5gb]/[3.6gb]}
 [2014-05-12 04:14:22,729][INFO ][monitor.jvm  ] [Hannibal
 King] [gc][old][242386][36057] duration [6.3s], collections [1]/[6.5s],
 total [6.3s]/[4.7h], memory [3.8gb]-[3.6gb]/[3.9gb], all_pools {[young]
 [224.2mb]-[3.5mb]/[266.2mb]}{[survivor] 

Marvel: sudden errors with index_stats type

2014-05-12 Thread poohsen
Hi,

I'm sort of new to ES and currently evaluating how well it will work for 
storing and querying our log data.
I installed Marvel and it worked fine for a while but now I'm suddenly 
getting weird errors and the index stats are gone from Kibana:

[2014-05-09 13:35:43,035][ERROR][marvel.agent.exporter] [Bushmaster] create 
failure (index:[.marvel-2014.05.09] type: [index_stats]): 
MapperParsingException[failed to parse [index]]; nested: 
MapperParsingException[failed to parse date field [.marvel-2014.05.08], tried 
both date format [dateOptionalTime], and timestamp number with locale []]; 
nested: IllegalArgumentException[Invalid format: .marvel-2014.05.08];
[2014-05-09 13:35:43,037][ERROR][marvel.agent.exporter] [Bushmaster] create 
failure (index:[.marvel-2014.05.09] type: [index_stats]): 
MapperParsingException[failed to parse [index]]; nested: 
MapperParsingException[failed to parse date field [.marvel-2014.05.09], tried 
both date format [dateOptionalTime], and timestamp number with locale []]; 
nested: IllegalArgumentException[Invalid format: .marvel-2014.05.09];


I created a gist detailing the steps I took so far trying to understand/debug 
what's going on. While debugging, I realized that I'm unable to recreate a 
marvel index with the same mapping that it normally has (either created by 
marvel or by ES automapping). I detailed this in the gist, too:


https://gist.github.com/poohsen/a3d3bb319010bf0c5648


Since I didn't tamper with the Marvel indices to begin with, I don't really see how 
this could suddenly have gone wrong other than because of a bug.
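
One hedged diagnostic step (an assumption on my part, not something from the gist): check how the "index" field ended up mapped in the affected Marvel index, since the error suggests it was dynamically mapped as a date:

curl -XGET 'localhost:9200/.marvel-2014.05.09/_mapping?pretty'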


Any hints are appreciated.


Regards,

Chris







Re: Is it possible to highlight the text with respect to no. of lines instead of no. of fragments

2014-05-12 Thread Anand kumar
Thank you, Nik, for your reply.

On Friday, 9 May 2014 20:25:55 UTC+5:30, Nikolas Everett wrote:




 On Fri, May 9, 2014 at 8:29 AM, Anand kumar anand...@gmail.comjavascript:
  wrote:



 Am having an index of huge content, from which I just want to have the 
 highlighting of the specific text. 

 The highlighted text might have appear as many as times, all I want to 
 have two or more lines before and after the line of highlighted text, by 
 which i can have a snippet of text with highlighted parts are in the middle 
 of the snippet, so that they can be easily located and identified from a 
 huge content of file.

 Is it possible?


 The only segmentation options are based on characters, sentences, and 
 grabbing the contents of the whole field.  The trick with lines is, unless 
 the text contains explicit new lines and you only wrap on new lines, then 
 you have to estimate line breaks based on the rendering context.  Stuff 
 like width in pixels and the font.  If you want to be precise you need the 
 screen dpi as well and a font rendering engine that works similarly.  Some 
 contexts don't properly render ligatures, some do.  And its 1000x worse 
 when you leave English and go to something like Arabic or Sanskrit.  There 
 be dragons.

 But, if you are talking about code, or something else with explicit 
 newlines and that only wraps on newlines, then the answer is still no, but 
 it wouldn't be hard to implement.

 Nik
  



Re: Lot of GC in elasticsearch node.

2014-05-12 Thread Abhishek Tiwari


 How much RAM you need depends on how long you want to keep your data 
 around for. So, given you have ~200GB now on 4GB of RAM, you can probably 
 extrapolate that out based on your needs.


Isn't my problem more with the ~9 GB *daily* index than with the ~200 GB total (20 
days x 9 GB) of indices?
Correct me if I am wrong, but doesn't Kibana ask Elasticsearch for just 
one day/week of indices (based on the query)?
Will Elasticsearch really care if I have 500 days of day-wise 
segregated indices out there if I am only querying the *past 7 days*? 
Is this a total-footprint problem or a daily-throughput problem?



On Monday, 12 May 2014 15:30:36 UTC+5:30, Mark Walkom wrote:

 It's standard practise to use 50% of system memory for the heap.

 How much RAM you need depends on how long you want to keep your data 
 around for. So, given you have ~200GB now on 4GB of RAM, you can probably 
 extrapolate that out based on your needs.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com javascript:
 web: www.campaignmonitor.com


 On 12 May 2014 19:33, Abhishek Tiwari erb...@gmail.com javascript:wrote:

 add more memory


 i am doing 15 million docs, which total to ~9G. The average doc size is 
 ~2KB.

  1. How much memory would you suggest for my use-case?
 2.Also, is it prudent for me to have half of OS memory dedicated to 
 elasticsearch?


 On Monday, 12 May 2014 14:03:19 UTC+5:30, Mark Walkom wrote:

 You need to reduce your data size, add more memory or add another node.

 Basically, you've reached the limits of that node.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com
  

 On 12 May 2014 16:38, Abhishek Tiwari erb...@gmail.com wrote:

 My elasticsearch node is a AWS EC2 c3.xlarge (7.5G mem). 
 Elasticsearch starts as-

 498  31810 99.6 64.6 163846656 4976944 ?   Sl   06:03  26:10 
 /usr/bin/java *-Xms4g -Xmx4g -Xss256k* -Djava.awt.headless=true 
 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:
 CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly 
 -XX:+HeapDumpOnOutOfMemoryError -Delasticsearch -Des.pidfile=/var/run/
 elasticsearch/elasticsearch.pid -Des.path.home=/usr/share/elasticsearch 
 -cp :/usr/share/elasticsearch/lib/elasticsearch-1.1.1.jar:/usr/
 share/elasticsearch/lib/*:/usr/share/elasticsearch/lib/sigar/* 
 -Des.default.path.home=/usr/share/elasticsearch 
 -Des.default.path.logs=/var/log/elasticsearch 
 -Des.default.path.data=/var/lib/elasticsearch 
 -Des.default.path.work=/tmp/elasticsearch 
 -Des.default.path.conf=/etc/elasticsearch 
 org.elasticsearch.bootstrap.Elasticsearch


 The node stopped responding (the ip:9200 status page), and so did 
 kibana. It started working fine on a restart.
 i have logstash format docs wherein the index rotates daily. 
 Stats:
   Daily: ~11G docs, ~15  million.
   Total: 195G docs, ~300 million.

 The logs of the time when it stopped responding are-

 [2014-05-12 03:39:08,789][INFO ][cluster.metadata ] [Hannibal 
 King] [logstash-2014.05.12] update_mapping [medusa_ex] (dynamic)
 [2014-05-12 03:40:52,293][INFO ][monitor.jvm  ] [Hannibal 
 King] [gc][old][240428][35773] duration [6.3s], collections [1]/[6.5s], 
 total [6.3s]/[4.7h], memory [3.8gb]-[3.6gb]/[3.9gb], all_pools {[young] 
 [150.3mb]-[1.7mb]/[266.2mb]}{[survivor] 
 [33.2mb]-[0b]/[33.2mb]}{[old] [3.6gb]-[3.6gb]/[3.6gb]}
 [2014-05-12 03:44:11,739][INFO ][cluster.metadata ] [Hannibal 
 King] [logstash-2014.05.12] update_mapping [medusa_ex] (dynamic)
 [2014-05-12 03:45:32,191][INFO ][monitor.jvm  ] [Hannibal 
 King] [gc][old][240703][35812] duration [5.2s], collections [1]/[5.8s], 
 total [5.2s]/[4.7h], memory [3.7gb]-[3.6gb]/[3.9gb], all_pools {[young] 
 [197.4mb]-[9.3mb]/[266.2mb]}{[survivor] 
 [33.2mb]-[0b]/[33.2mb]}{[old] [3.5gb]-[3.6gb]/[3.6gb]}
 [2014-05-12 04:06:01,224][INFO ][monitor.jvm  ] [Hannibal 
 King] [gc][old][241926][35985] duration [6s], collections [1]/[6.2s], 
 total 
 [6s]/[4.7h], memory [3.7gb]-[3.6gb]/[3.9gb], all_pools {[young] 
 [134.7mb]-[9.9mb]/[266.2mb]}{[survivor] 
 [33.2mb]-[0b]/[33.2mb]}{[old] [3.6gb]-[3.5gb]/[3.6gb]}
 [2014-05-12 04:08:14,473][INFO ][monitor.jvm  ] [Hannibal 
 King] [gc][old][242049][36004] duration [5.8s], collections [1]/[5.9s], 
 total [5.8s]/[4.7h], memory [3.8gb]-[3.6gb]/[3.9gb], all_pools {[young] 
 [165.1mb]-[2.7mb]/[266.2mb]}{[survivor] 
 [33.2mb]-[0b]/[33.2mb]}{[old] [3.6gb]-[3.6gb]/[3.6gb]}
 [2014-05-12 04:09:07,473][INFO ][monitor.jvm  ] [Hannibal 
 King] [gc][old][242096][36011] duration [6.2s], collections [1]/[6.7s], 
 total [6.2s]/[4.7h], memory [3.9gb]-[3.6gb]/[3.9gb], all_pools {[young] 
 [265.9mb]-[2.9mb]/[266.2mb]}{[survivor] 
 [33.2mb]-[0b]/[33.2mb]}{[old] [3.6gb]-[3.5gb]/[3.6gb]}
 [2014-05-12 04:10:08,387][INFO ][monitor.jvm  ] [Hannibal 
 King] [gc][old][242152][36020] duration [5.4s], 

Re: Lot of GC in elasticsearch node.

2014-05-12 Thread Mark Walkom
Yes, Kibana will load whatever you ask for, *but* ES has to maintain
index metadata in memory for every index.
Those two combined are pushing things too far for your heap.
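
One commonly suggested way to reduce that per-index overhead, if the older daily indices are rarely queried, is to close them (a sketch, not advice given in this thread; a closed index frees the heap it was using and can be reopened with _open when needed):

curl -XPOST 'localhost:9200/logstash-2014.04.01/_close'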

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 12 May 2014 20:45, Abhishek Tiwari erb...@gmail.com wrote:

 How much RAM you need depends on how long you want to keep your data
 around for. So, given you have ~200GB now on 4GB of RAM, you can probably
 extrapolate that out based on your needs.


 Isn't my problem more with 9G *daily* index, than with total of 200G(20
 days x 9G) indexes?
 Correct me if i am wrong here but doesn't kibana ask elasticsearch for
 just one day/week of indices(based on the query).
 Will elasticsearch really care if i have 500 days of total day-wise
 segregated indices out there but am performing queries on *just past 7
 days*?
 Is this a total-footprint problem or a daily throughput problem?



 On Monday, 12 May 2014 15:30:36 UTC+5:30, Mark Walkom wrote:

 It's standard practise to use 50% of system memory for the heap.

 How much RAM you need depends on how long you want to keep your data
 around for. So, given you have ~200GB now on 4GB of RAM, you can probably
 extrapolate that out based on your needs.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 12 May 2014 19:33, Abhishek Tiwari erb...@gmail.com wrote:

 add more memory


 i am doing 15 million docs, which total to ~9G. The average doc size is
 ~2KB.

  1. How much memory would you suggest for my use-case?
 2.Also, is it prudent for me to have half of OS memory dedicated to
 elasticsearch?


 On Monday, 12 May 2014 14:03:19 UTC+5:30, Mark Walkom wrote:

 You need to reduce your data size, add more memory or add another node.

 Basically, you've reached the limits of that node.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 12 May 2014 16:38, Abhishek Tiwari erb...@gmail.com wrote:

 My elasticsearch node is a AWS EC2 c3.xlarge (7.5G mem).
 Elasticsearch starts as-

 498  31810 99.6 64.6 163846656 4976944 ?   Sl   06:03  26:10
 /usr/bin/java *-Xms4g -Xmx4g -Xss256k* -Djava.awt.headless=true
 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:
 CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
 -XX:+HeapDumpOnOutOfMemoryError -Delasticsearch -Des.pidfile=/var/run/
 elasticsearch/elasticsearch.pid -Des.path.home=/usr/share/elasticsearch
 -cp :/usr/share/elasticsearch/lib/elasticsearch-1.1.1.jar:/usr/s
 hare/elasticsearch/lib/*:/usr/share/elasticsearch/lib/sigar/*
 -Des.default.path.home=/usr/share/elasticsearch
 -Des.default.path.logs=/var/log/elasticsearch
 -Des.default.path.data=/var/lib/elasticsearch
 -Des.default.path.work=/tmp/elasticsearch -Des.default.path.conf=/etc/
 elasticsearch org.elasticsearch.bootstrap.Elasticsearch


 The node stopped responding (the ip:9200 status page), and so did
 kibana. It started working fine on a restart.
 i have logstash format docs wherein the index rotates daily.
 Stats:
   Daily: ~11G docs, ~15  million.
   Total: 195G docs, ~300 million.

 The logs of the time when it stopped responding are-

 [2014-05-12 03:39:08,789][INFO ][cluster.metadata ] [Hannibal
 King] [logstash-2014.05.12] update_mapping [medusa_ex] (dynamic)
 [2014-05-12 03:40:52,293][INFO ][monitor.jvm  ] [Hannibal
 King] [gc][old][240428][35773] duration [6.3s], collections [1]/[6.5s],
 total [6.3s]/[4.7h], memory [3.8gb]-[3.6gb]/[3.9gb], all_pools {[young]
 [150.3mb]-[1.7mb]/[266.2mb]}{[survivor]
 [33.2mb]-[0b]/[33.2mb]}{[old] [3.6gb]-[3.6gb]/[3.6gb]}
 [2014-05-12 03:44:11,739][INFO ][cluster.metadata ] [Hannibal
 King] [logstash-2014.05.12] update_mapping [medusa_ex] (dynamic)
 [2014-05-12 03:45:32,191][INFO ][monitor.jvm  ] [Hannibal
 King] [gc][old][240703][35812] duration [5.2s], collections [1]/[5.8s],
 total [5.2s]/[4.7h], memory [3.7gb]-[3.6gb]/[3.9gb], all_pools {[young]
 [197.4mb]-[9.3mb]/[266.2mb]}{[survivor]
 [33.2mb]-[0b]/[33.2mb]}{[old] [3.5gb]-[3.6gb]/[3.6gb]}
 [2014-05-12 04:06:01,224][INFO ][monitor.jvm  ] [Hannibal
 King] [gc][old][241926][35985] duration [6s], collections [1]/[6.2s], 
 total
 [6s]/[4.7h], memory [3.7gb]-[3.6gb]/[3.9gb], all_pools {[young]
 [134.7mb]-[9.9mb]/[266.2mb]}{[survivor]
 [33.2mb]-[0b]/[33.2mb]}{[old] [3.6gb]-[3.5gb]/[3.6gb]}
 [2014-05-12 04:08:14,473][INFO ][monitor.jvm  ] [Hannibal
 King] [gc][old][242049][36004] duration [5.8s], collections [1]/[5.9s],
 total [5.8s]/[4.7h], memory [3.8gb]-[3.6gb]/[3.9gb], all_pools {[young]
 [165.1mb]-[2.7mb]/[266.2mb]}{[survivor]
 [33.2mb]-[0b]/[33.2mb]}{[old] [3.6gb]-[3.6gb]/[3.6gb]}
 [2014-05-12 04:09:07,473][INFO ][monitor.jvm  ] [Hannibal
 King] [gc][old][242096][36011] duration [6.2s], collections 

Is there a quick way to set the data dir for ElasticsearchIntegrationTest ?

2014-05-12 Thread mooky
I want Elasticsearch to put its data under the target dir ...



Matching on sibling json nodes ?

2014-05-12 Thread Kristian Rosenvold
We're submitting a json document that looks like this:

{
  "book": {
    "title": "book1",
    "authors": [
      { "name": "auth1", "role": "role1" },
      { "name": "auth2", "role": "role2" }
    ]
  }
}


We would like searches for auth1/role1 to find this document, but searches for 
auth1/role2 to *not* find it. We have used nested queries to make this work, but 
unfortunately nested queries don't work with highlighting. Is there any 
other way to accomplish this? (We are contemplating simply adding a new 
field that combines name and role, but that would also have some drawbacks.)

Kristian



Re: Matching on sibling json nodes ?

2014-05-12 Thread Clinton Gormley
Hi Kristian

You can use nested objects and set include_in_parent to true (it's like
using type:nested and type:object on the same field), then highlight on the
fields in the parent object.
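
A minimal mapping sketch of what is described above, using the field names from the example document (index and type names are placeholders, assuming the 1.x nested-type options):

curl -XPUT 'localhost:9200/books' -d '{
  "mappings": {
    "doc": {
      "properties": {
        "book": {
          "properties": {
            "title": { "type": "string" },
            "authors": {
              "type": "nested",
              "include_in_parent": true,
              "properties": {
                "name": { "type": "string" },
                "role": { "type": "string" }
              }
            }
          }
        }
      }
    }
  }
}'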

clint


On 12 May 2014 13:42, Kristian Rosenvold kristian.rosenv...@gmail.comwrote:

 We're submitting a json document that looks like this:

 {
   book: {
 title : book1,
   authors: [
 {name:auth1, role:role1},
 {name:auth2, role:role2}
   ]
 }
 }


 We would like to do searches that find this for a search on auth1/role1
 but *not* for auth1/role2. We have used nested queries to make this work,
 but unfortunately nested queries dont work with highlighting. Is there any
 other way to accomplish this ? (We are contemplating simply filing a new
 field that combines name and role, but that would also have some drawbacks).

 Kristian

  --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/0b062a6d-5766-4dee-96a5-27aff637f56b%40googlegroups.comhttps://groups.google.com/d/msgid/elasticsearch/0b062a6d-5766-4dee-96a5-27aff637f56b%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.




Re: unable to write data to elasticsearch using hadoop PIG

2014-05-12 Thread hanine haninne
I got the same error, but I don't know what I have to change in my 
/etc/hosts. Thank you for your help.

On Wednesday, 5 March 2014 09:39:46 UTC, Yann Barraud wrote:

 Hi,

 Is your ES instance known by your Hadoop cluster (/etc/hosts) ? 

 It does not even seems to read in it.

 Cheers,
 Yann

 On Wednesday, 5 March 2014 06:32:55 UTC+1, siva mannem wrote:

 I installed ES(at the location /usr/lib/elasticsearch/) on our gateway 
 server and i am able to run some basic curl commands like XPUT and XGET to 
 create some indices and retrieve the data in them.
 i am able to give single line JSON record but i am unable to give JSON 
 file as input to curl XPUT .
 can anybody give me the syntax for giving JSON file as input for curl 
 XPUT command?

 my next issue is i copied  the following 4 elasticsearch-hadoop jar files
 elasticsearch-hadoop-1.3.0.M2.jar  
 elasticsearch-hadoop-1.3.0.M2-sources.jar
 elasticsearch-hadoop-1.3.0.M2-javadoc.jar  
 elasticsearch-hadoop-1.3.0.M2-yarn.jar

 to  /usr/lib/elasticsearch/elasticsearch-0.90.9/lib
 and /usr/lib/gphd/pig/

 i have the following json file j.json
 ++
 {k1:v1 ,  k2:v2 , k3:v3}
 

 in my_hdfs_path.

 my pig script is write_data_to_es.pig
 +
 REGISTER /usr/lib/gphd/pig/elasticsearch-hadoop-1.3.0.M2-yarn.jar;
 DEFINE ESTOR org.elasticsearch.hadoop.pig.EsStorage('es.resource=usa/ca');
 A = LOAD '/my_hdfs_path/j.json' using 
 JsonLoader('k1:chararray,k2:chararray,k3:chararray');
 STORE A into 'usa/ca' USING ESTOR('es.input.json=true');
 ++

 when i run my pig script 
 +
 pig -x mapreduce  write_data_to_es.pig 
 

 i am getting following error
 +
 Input(s):
 Failed to read data from /my_hdfs_path/j.json

 Output(s):
 Failed to produce result in usa/ca

 Counters:
 Total records written : 0
 Total bytes written : 0
 Spillable Memory Manager spill count : 0
 Total bags proactively spilled: 0
 Total records proactively spilled: 0

 Job DAG:
 job_1390436301987_0089


 2014-03-05 00:26:50,839 [main] INFO 
  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  
 - Failed!
 2014-03-05 00:26:50,841 [main] ERROR 
 org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Input(s):
 Failed to read data from /elastic_search/es_hadoop_test.json

 Output(s):
 Failed to produce result in mannem/siva

 Counters:
 Total records written : 0
 Total bytes written : 0
 Spillable Memory Manager spill count : 0
 Total bags proactively spilled: 0
 Total records proactively spilled: 0

 Job DAG:
 job_1390436301987_0089

 2014-03-05 00:26:50,839 [main] INFO 
  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  
 - Failed!
 2014-03-05 00:26:50,841 [main] ERROR 
 org.apache.pig.tools.grunt.GruntParser - *ERROR 2997: Encountered 
 IOException. Out of nodes and retries; caught exception*
 Details at logfile: 
 /usr/lib/elasticsearch/elasticsearch-0.90.9/pig_1393997175206.log
 

 i am using pivotal hadoop version (1.0.1)  which is basically apache 
 hadoop (hadoop-2.0.2)
 and pig version is 0.10.1
 and elastic search version is 0.90.9

 can anybody help me out here?
 thank you so much in advance for your help.





How to use operators in elasticsearch percolator query?

2014-05-12 Thread yatish gupta
Hi, I have installed ES 1.0.1 and tried to check a percolator query with the AND 
operator.

Added a query to percolator:

curl -XPUT 'localhost:9200/testperc/.percolator/1' -d '{"query":{"match":{"message" : "hero AND shine"}}}'

When I tried to percolate a document containing just the word "hero", that query 
matched; ideally it should not.

curl -XGET 'localhost:9200/testperc/message/_percolate' -d '{"doc" : {"message" : "hero"}}'
{"took":10,"_shards":{"total":5,"successful":5,"failed":0},"total":1,"matches":[{"_index":"testperc","_id":"1"}]}


Please help me: how can I use operators in a percolator query?
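
The match query does not treat AND as an operator - it is analyzed as just another term - so the usual approaches (a hedged sketch, reusing the index and field above) are the match query's operator option or a query_string query:

curl -XPUT 'localhost:9200/testperc/.percolator/1' -d '{
  "query": {
    "match": {
      "message": { "query": "hero shine", "operator": "and" }
    }
  }
}'

curl -XPUT 'localhost:9200/testperc/.percolator/2' -d '{
  "query": {
    "query_string": { "default_field": "message", "query": "hero AND shine" }
  }
}'

With either of these registered, a document containing only "hero" should no longer match.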



Re: Disk changes forced resync

2014-05-12 Thread Duncan Innes
Not doing any monitoring yet - this is my dev cluster running on 3 
workstations.

I thought I was quick enough that the rebalance wouldn't have marched ahead 
and changed much - clearly my admin skills need sharpening!

Is there a way to get the cluster to avoid rebalancing when a node is 
removed from the cluster?  I wouldn't want a cluster rebalance starting 
just because I'm patching the OS and need a reboot.
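
One approach often used for planned maintenance (a sketch assuming the 1.x cluster-settings API; not something confirmed in this thread) is to disable shard allocation before taking the node down and re-enable it once the node is back:

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.enable": "none" }
}'

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.enable": "all" }
}'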

Thanks

Duncan



Script average value over hits

2014-05-12 Thread Jo Emil Holen
Hi!

I want to make a script that does some statistics; as I understand it, 
ES doesn't do statistics the way I need it to.

The search I do returns multiple hits like 
this: http://pastebin.com/UkQjDXhm
What I want to do is average the time after 13:37, independent of date. What 
I think would do it is creating a date object, pushing 
parseFloat(date.getMinutes() + "." + date.getSeconds()), and then averaging it 
at the end and putting it into the response. I wrote that like JavaScript, but I 
assume MVEL is the fastest language, so I'd like to do it in MVEL.

Is this at all possible, or would I have to do it on the client side?
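
A rough server-side sketch of that idea (the @timestamp field name and the Joda-time accessor exposed to MVEL scripts are assumptions): extract a time-of-day component in a script and let an avg aggregation do the averaging, e.g.

{
  "size": 0,
  "aggs": {
    "avg_minute_of_hour": {
      "avg": { "script": "doc['@timestamp'].date.minuteOfHour" }
    }
  }
}

The "minutes after 13:37" arithmetic would go into the same script expression rather than being computed client-side.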

Regards Jo Emil



Re: Kibana Time Troubles

2014-05-12 Thread Tate Eskew
Any ideas here?

On Tuesday, May 6, 2014 3:32:45 PM UTC-5, Tate Eskew wrote:

 Hello,
 Maybe someone can help me. My setup:
 AWS Servers using rsyslog (UTC time)  Physical server in datacenter 
 central syslog-ng server (CST). 
 Logstash shipper is running on the central syslog-ng box (CST). It grabs 
 the events coming in, mangles them, throws them into redis. Logstash 
 indexer on another box grabs them out of redis, shoves them in 
 elasticsearch.  

 Everything works as expected for months now, the only problem I have is 
 that the display in Kibana doesn't show the log events for 5 hours because 
 of the Logstash shipper being CST (5 hours behind). Any idea on how to get 
 it to display immediately? Logs display immediately if I send to the 
 central log server from a server that is CST as well. Here is a sample from 
 an AWS box (UTC) that is picked up by the central log server (CST)

 Is there any way to get Kibana to show the events as they come in 
 correctly?  We have lots of physical machines in our datacenters and they 
 are all set to CST, but all of our AWS instances are set to UTC.  As of 
 right now, we don't want to change the central syslog server's timezone to 
 UTC since it still resides in one of our data centers. 

 Any ideas? Is this something we should try to fix at the Logstash config 
 or is this a display fix for Kibana?

 Here is a sample from an AWS box (UTC) that is picked up by the central log 
 server (CST) - Displays 5 hours later/incorrectly

 {
   _index: logstash-2014.05.06,
   _type: syslog,
   _id: mZvpk-_9T4WgA2zxlsxogA,
   _score: null,
   _source: {
 @version: 1,
 @timestamp: 2014-05-05T20:01:26.000-05:00,
 type: syslog,
 syslog_pri: 163,
 syslog_program: ubuntu,
 received_at: 2014-05-05 20:01:27 UTC,
 syslog_severity_code: 3,
 syslog_facility_code: 20,
 syslog_facility: local4,
 syslog_severity: error,
 @source_host: p-aws-emmaplatformsingle01,
 @message: trustinme,
 @host: p-aws-emmaplatformsingle01
   },
   sort: [
 1399338086000
   ]
 }

 Here is a sample from a physical machine in one of our data centers (CST) 
 that is picked up by the central logs server (CST) - Diplays 
 instantly/correctly

 {
   _index: logstash-2014.05.06,
   _type: syslog,
   _id: SjWn9aJWRGKeshylyp1j2Q,
   _score: null,
   _source: {
 @version: 1,
 @timestamp: 2014-05-06T14:01:52.000-05:00,
 type: syslog,
 syslog_pri: 13,
 syslog_program: teskew,
 received_at: 2014-05-06 19:01:53 UTC,
 syslog_severity_code: 5,
 syslog_facility_code: 1,
 syslog_facility: user-level,
 syslog_severity: notice,
 @source_host: p-bna-apix01,
 @message: trustinme,
 @host: p-bna-apix01
   },
   sort: [
 1399402912000
   ]
 }





Aggregation Names

2014-05-12 Thread Andrew Mehler
Hey guys,
I noticed this constraint on agg names:
https://github.com/elasticsearch/elasticsearch/commit/f1248e58
It's not mentioned in the guide or under the breaking changes for 1.1.0 
(instead, the breaking change is buried in the issue, which is labelled an 
enhancement).

It seems to me the most convenient name for a terms agg is the field name 
itself, which very often contains the now non-permitted character '.'.
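
In other words, an aggregation over a dotted field now needs a name without the dot, e.g. (illustration only, field name assumed):

{
  "aggs": {
    "user_name_terms": {
      "terms": { "field": "user.name" }
    }
  }
}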



Phrase Search with Proximity

2014-05-12 Thread K Chenette
What is the best way for us to perform a phrase search where we are 
concerned with the proximity between phrases (not terms)? I have looked at 
the Match Query and the Query String Query, and I see how slop allows for proximity 
between individual terms, but we need to be able to do this at the level of 
phrases, like: "Quick Brown Fox" NEAR(50) "Flying Squirrel".
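
One way to express phrase-to-phrase proximity is with span queries: each phrase becomes an in-order span_near with slop 0, and an outer span_near applies the larger distance (a sketch - the field name is an assumption, and span_term values must match the indexed tokens, so usually lowercase):

{
  "query": {
    "span_near": {
      "clauses": [
        {
          "span_near": {
            "clauses": [
              { "span_term": { "body": "quick" } },
              { "span_term": { "body": "brown" } },
              { "span_term": { "body": "fox" } }
            ],
            "slop": 0,
            "in_order": true
          }
        },
        {
          "span_near": {
            "clauses": [
              { "span_term": { "body": "flying" } },
              { "span_term": { "body": "squirrel" } }
            ],
            "slop": 0,
            "in_order": true
          }
        }
      ],
      "slop": 50,
      "in_order": false
    }
  }
}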

Thanks. 
-- Karen 



Re: unable to write data to elasticsearch using hadoop PIG

2014-05-12 Thread Costin Leau

Check your network settings and make sure that the Hadoop nodes can communicate 
with the ES nodes.
If you install ES alongside Hadoop itself, this shouldn't be a problem.
There are various ways to check this - try ping, tracert, etc...

Please refer to your distro manual/documentation for more information about the 
configuration and setup.
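
For example, a quick check from one of the Hadoop task nodes (the host name is a placeholder) - if this does not return the usual Elasticsearch banner JSON, the job won't reach the cluster either, and es-hadoop's es.nodes setting needs to point at an address the task nodes can actually resolve:

curl -XGET 'http://es-host:9200/'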

Cheers,

On 5/12/14 3:42 PM, hanine haninne wrote:

I had get the same erreur but I don't know what I have to change in my 
/etc/hosts
thank you for your help

On Wednesday, 5 March 2014 09:39:46 UTC, Yann Barraud wrote:

Hi,

Is your ES instance known by your Hadoop cluster (/etc/hosts) ?

It does not even seems to read in it.

Cheers,
Yann

On Wednesday, 5 March 2014 06:32:55 UTC+1, siva mannem wrote:

I installed ES(at the location /usr/lib/elasticsearch/) on our gateway 
server and i am able to run some basic
curl commands like XPUT and XGET to create some indices and retrieve 
the data in them.
i am able to give single line JSON record but i am unable to give JSON 
file as input to curl XPUT .
can anybody give me the syntax for giving JSON file as input for curl 
XPUT command?

my next issue is i copied  the following 4 elasticsearch-hadoop jar 
files
elasticsearch-hadoop-1.3.0.M2.jar
elasticsearch-hadoop-1.3.0.M2-sources.jar
elasticsearch-hadoop-1.3.0.M2-javadoc.jar
elasticsearch-hadoop-1.3.0.M2-yarn.jar

to  /usr/lib/elasticsearch/elasticsearch-0.90.9/lib
and /usr/lib/gphd/pig/

i have the following json file j.json
++
{k1:v1 ,  k2:v2 , k3:v3}


in my_hdfs_path.

my pig script is write_data_to_es.pig
+
REGISTER /usr/lib/gphd/pig/elasticsearch-hadoop-1.3.0.M2-yarn.jar;
DEFINE ESTOR 
org.elasticsearch.hadoop.pig.EsStorage('es.resource=usa/ca');
A = LOAD '/my_hdfs_path/j.json' using 
JsonLoader('k1:chararray,k2:chararray,k3:chararray');
STORE A into 'usa/ca' USING ESTOR('es.input.json=true');
++

when i run my pig script
+
pig -x mapreduce  write_data_to_es.pig


i am getting following error
+
Input(s):
Failed to read data from /my_hdfs_path/j.json

Output(s):
Failed to produce result in usa/ca

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_1390436301987_0089


2014-03-05 00:26:50,839 [main] INFO
  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- Failed!
2014-03-05 00:26:50,841 [main] ERROR 
org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Input(s):
Failed to read data from /elastic_search/es_hadoop_test.json

Output(s):
Failed to produce result in mannem/siva

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_1390436301987_0089

2014-03-05 00:26:50,839 [main] INFO
  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- Failed!
2014-03-05 00:26:50,841 [main] ERROR 
org.apache.pig.tools.grunt.GruntParser - *ERROR 2997: Encountered
IOException. Out of nodes and retries; caught exception*
Details at logfile: 
/usr/lib/elasticsearch/elasticsearch-0.90.9/pig_1393997175206.log


i am using pivotal hadoop version (1.0.1)  which is basically apache 
hadoop (hadoop-2.0.2)
and pig version is 0.10.1
and elastic search version is 0.90.9

can anybody help me out here?
thank you so much in advance for your help.

--
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to
elasticsearch+unsubscr...@googlegroups.com 
mailto:elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/1dd8ff7d-ef53-4614-9300-13b5f6ed66fa%40googlegroups.com
https://groups.google.com/d/msgid/elasticsearch/1dd8ff7d-ef53-4614-9300-13b5f6ed66fa%40googlegroups.com?utm_medium=emailutm_source=footer.
For more options, visit https://groups.google.com/d/optout.


--
Costin


Bulk Load Large Spatial Datasets

2014-05-12 Thread Brian Behling
Hello.

I'm trying to bulk load about 550k records with spatial data into 
Elasticsearch. After about 20 minutes, an error occurs ("No Handlers Can Be 
Found For Logger 'elasticsearch'"), then the connection times out and the 
Python script stops.

The Python loading script was working fine before adding the spatial data.


Anyone have some ideas on how to load large spatial datasets?



Re: Bulk Load Large Spatial Datasets

2014-05-12 Thread Honza Král
Hi Brian,

that message you are seeing is not an error - it's a warning from the
python logging system that you don't have any logging configured. So
when elasticsearch tries to log something it cannot.

I'd suggest setting up your logging and trying again. To set up logging,
just include:

import logging
logging.basicConfig(level=logging.INFO)

at the top of your script.

On Mon, May 12, 2014 at 6:21 PM, Brian Behling brian.behl...@gmail.com wrote:
 Hello.

 I'm trying to bulk load about 550k records with spatial data into
 ElasticSearch. After about 20 mins, an error occurs No Handlers Can Be
 Found For Logger elasticsearch', then the connection times out and the
 Python scripts stops.

 The Python loading script was working fine before adding the spatial data.


 Anyone have some ideas on how to load large spatial datasets?




Re: Bulk Load Large Spatial Datasets

2014-05-12 Thread Brian Behling
Thank you.

I did find the error causing this script to crash. It looks like there are many 
invalid (self-intersecting) polygons that are causing this problem. But that 
should be a topic for another thread if the business rules dictate that we can't 
simplify the geometries.

On Monday, May 12, 2014 10:24:24 AM UTC-6, Honza Král wrote:

 Hi Brian, 

 that message you are seeing is not an error - it's a warning from the 
 python logging system that you don't have any logging configured. So 
 when elasticsearch tries to log something it cannot. 

 I'd suggest to set up your logging and try again. To set up logging 
 just include: 

 import logging 
 logging.basicConfig(level=logging.INFO) 

 at the top of your script. 

 On Mon, May 12, 2014 at 6:21 PM, Brian Behling 
 brian@gmail.com 
 wrote: 
  Hello. 
  
  I'm trying to bulk load about 550k records with spatial data into 
  ElasticSearch. After about 20 mins, an error occurs No Handlers Can Be 
  Found For Logger elasticsearch', then the connection times out and the 
  Python scripts stops. 
  
  The Python loading script was working fine before adding the spatial 
 data. 
  
  
  Anyone have some ideas on how to load large spatial datasets? 
  




sizing for time data flow

2014-05-12 Thread slushi
(apologies in advance for yet another sizing post)

We are indexing approximately 2KB documents and ingesting about 50 million 
documents daily. The index size ends up being about 75GB per day for the 
primary shards (doing replication = 1 so 150GB/day). In our use case, after 
1 month, we throw away 95% of the data but need to keep the rest 
indefinitely. We are planning to use the time data flow mentioned in 
Shay's presentations and are currently thinking about what time period to 
use for each index. With a shorter period, the current month index may 
behave better, but we'll end up accumulating lots of smaller indices after 
the 1 month period. 

We currently have a 4 node setup, each with 12 cores, 96GB of ram and 2TB 
of disk space over 4 disks. By my calculations, to hold one year of data 
with r=1, we would need 150GB/day * 31 for the initial month, then 
150GB/day*31*.05 for historical months = 4.65TB + 2.5TB = 7+TB for 1 year 
of data. This seems pretty tight to me considering additional space may be 
needed for merges, etc. 

   1. Is accumulating a lot of indexes per node a concern here? If we did a 
   daily index with 4 shards and r=1, that would be over 700 shards per node 
   for 1 year. I know that there is a memory limitation on the number of 
   shards that can be managed by a node. 
   2. If we did a monthly index, that would be better for the historical 
   indices, but the current month index would be huge, over 2TB.
   3. Is there any difference here between doing a daily index with less 
   shards vs. a monthly index with more primary shards?
   4. How would having this many shards affect query performance? I assume 
   there is some sweet spot of shards per node that must be found empirically? 
   I would guess it's somewhat related to the number of disks/cores per node?
   5. I am also wondering about the RAM to data ratio and whether we'll get 
   decent query performance. Due to our use case, we can't use routing. Is 
   there any rule of thumb here?
   6. Another option we are considering is to do a daily index for the 
   first month, and then have periodic jobs that combine the historical daily 
   indexes into larger indices - so, for example, the first month = 31 daily 
   indices and following months get rolled up into 1 index per month (see the 
   alias sketch just after this list). But we only want to do this extra work 
   if it's needed.
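
A hypothetical sketch of the alias swap such a roll-up job could end with, after the daily documents have been copied into a monthly index (for example with a scan/scroll reindex); the index and alias names here are made up:

curl -XPOST "http://localhost:9200/_aliases" -d '{
  "actions": [
    { "remove": { "index": "logs-2014.04.01", "alias": "logs-2014.04" } },
    { "remove": { "index": "logs-2014.04.02", "alias": "logs-2014.04" } },
    { "add":    { "index": "logs-2014.04-rollup", "alias": "logs-2014.04" } }
  ]
}'

Queries against the logs-2014.04 alias keep working while the daily indices are deleted afterwards, since the actions in one _aliases call are applied atomically.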



Re: The effect of multi-fields and copy_to on storage size

2014-05-12 Thread Jeremy McLain
Even if the three fields are compressed isn't it still storing three 
compressed copies of the same thing? That is still three times more 
overhead than it needs to be using. It seems very wasteful of space. 
Ideally the space used by the database would be 
size_of_stored_fields_compressed + size_of_index. In my case my database 
will look more like (size_of_stored_fields_compressed x 3) + size_of_index. 
This greatly increases my storage requirements!

If I enabled the type's _source field and disabled individual field storage 
could I still get highlighting info in the query response for those fields?

Thanks, Adrien, for your response.



Re: Filtering nested aggregates

2014-05-12 Thread Ary Borenszweig
A friend of mine made it work. It wasn't working because we were using a 
filter > term inside the nested aggregation with "Tuberculosis", but the 
analyzed value was "tuberculosis". Changing "Tuberculosis" to 
"tuberculosis" made it work. Also, repeating the first query (instead of 
using a filter) makes it work in the nested filter.

Here's one example:

curl -XPOST "http://localhost:9200/test_results/_search?pretty=true" -d'{
  "size": 0,
  "query": {
    "nested": {
      "path": "data",
      "query": {
        "match": {
          "data.condition": "Tuberculosis"
        }
      }
    }
  },
  "aggregations": {
    "data": {
      "nested": {
        "path": "data"
      },
      "aggregations": {
        "filtered_result": {
          "filter": {
            "query": {
              "match": {
                "condition": "Tuberculosis"
              }
            }
          },
          "aggregations": {
            "result": {
              "terms": {
                "field": "data.result"
              }
            }
          }
        }
      }
    }
  }
}
'

  

On Friday, May 9, 2014 2:48:50 PM UTC-3, Ary Borenszweig wrote:

 Hi,

 I have an index where I need to store medical test results. A test result 
 can talk about many conditions and their results: for example, Tuberculosis 
 = positive, Flu = negative. So I modeled my index like this:

 curl -XPUT http://localhost:9200/test_results/; -d'
 {
mappings: {
   result: {
  properties: {
 data: {
   type: nested,
   properties: {
 condition: {type: string},
 result: {type: string}
   }
 }
  }
   }
}
 }'

 I insert one test result with Tuberculosis = positive, Flu = negative:

 curl -XPOST http://localhost:9200/test_results/_bulk; -d'
 {index:{_index:test_results,_type:result}}
 {data: [{condition: Tuberculosis, result: positive}, 
 {condition: FLU, result: negative}]}
 '

 Then, one of the queries I need to do is this one: for Tuberculosis, give 
 me how many positives you have and how many negatives you have (basically: 
 filter by data.condition and group by data.result). So I tried this query:

 curl -XPOST http://localhost:9200/test_results/_search?pretty=true; -d'{
   size: 0,
   query: {
 nested: {
   path: data,
   query: {
 match: {
   data.condition: Tuberculosis
 }
   }
 }
   },
   aggregations: {
 data: {
   nested: {
 path: data
   },
   aggregations: {
 result: {
   terms: {
 field: data.result
   }
 }
   }
 }
   }
 }
 '

 However, the above gives me this result:

  aggregations : {
 data : {
   doc_count : 2,
   result : {
 buckets : [ {
   key : negative,
   doc_count : 1
 }, {
   key : positive,
   doc_count : 1
 } ]
   }
 }
   }

 That is, it gives me one negative result and one positive result. That's 
 because the document has one positive and negative, and it's not discarding 
 the one that has Flu.

 I see in the documentation there's a filter aggregate. I tried using it 
 in many ways:

 1. With term on data.condition:

 curl -XPOST http://localhost:9200/test_results/_search?pretty=true; -d'{
   size: 0,
   query: {
 nested: {
   path: data,
   query: {
 match: {
   data.condition: Tuberculosis
 }
   }
 }
   },
   aggregations: {
 data: {
   nested: {
 path: data
   },
   aggregations: {
 filtered_result: {
   filter: {
 term: { data.condition : Tuberculosis }
   },
   aggregations : {
 result: {
   terms: {
 field: data.result
   }
 }
   }
 }
   }
 }
   }
 }
 '


 2. With term on condition:

 curl -XPOST http://localhost:9200/test_results/_search?pretty=true; -d'{
   size: 0,
   query: {
 nested: {
   path: data,
   query: {
 match: {
   data.condition: Tuberculosis
 }
   }
 }
   },
   aggregations: {
 data: {
   nested: {
 path: data
   },
   aggregations: {
 filtered_result: {
   filter: {
 term: { condition : Tuberculosis }
   },
   aggregations : {
 result: {
   terms: {
 field: data.result
   }
 }
   }
 }
   }
 }
   }
 }
 '

 3. With nested:

 curl -XPOST http://localhost:9200/test_results/_search?pretty=true; -d'{
   size: 0,
   query: {
 nested: {
   path: data,
   query: {
 match: {
   data.condition: Tuberculosis
 }
   }
 }
   },
   aggregations: {
 data: {
   nested: {
 path: data
   },
   aggregations: {
 filtered_result: {
   filter: {
 nested: {
   

Re: Multi DC cluster or separate cluster per DC?

2014-05-12 Thread Deepak Jha
Having a separate cluster is definitely the better way to go. Alternatively, you can 
control shard and replica placement so that they are always placed in the 
same DC; in this way you can avoid inter-DC issues while still having a single 
cluster. I have a similar issue and I am looking at this as one of the 
alternatives. 
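
A hypothetical sketch of that kind of placement control, using a custom node attribute plus index-level allocation filtering; the attribute name "dc", the index name, and the DC value below are made up:

# in elasticsearch.yml on every node in the first data center:  node.dc: dc1
# then pin an index (and its replicas) to nodes tagged dc1:
curl -XPUT "http://localhost:9200/myindex/_settings" -d '{
  "index.routing.allocation.require.dc": "dc1"
}'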

On Saturday, May 10, 2014 1:05:08 AM UTC-7, Sebastian Łaskawiec wrote:

 Thanks for the answer! We've been talking with several other teams in our 
 company and it looks like this is the most recommended and stable setup.

 Regards
 Sebastian

 On Wednesday, May 7, 2014 at 03:23:43 UTC+2, Mark Walkom wrote:

 Go the latter method and have two clusters, ES can be very sensitive to 
 network latency and you'll likely end up with more problems than it is 
 worth. 
 Given you already have the data source of truth being replicated, it's 
 the sanest option to just read that locally.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 6 May 2014 23:51, Sebastian Łaskawiec sebastian...@gmail.com wrote:

 Hi!

 I'd like to ask for advice about deployment in multi DC scenario.

  Currently we operate on 2 Data Centers in active/standby mode. In case of 
  ES we'd like to have a different approach - we'd like to operate in 
  active-active mode (we want to optimize our resources especially for 
  querying). 
 Here are some details about target configuration:

- 4 ES instances per DC. Full cluster will have 8 instances.
- Up to 1 TB of data 
- Data pulled from database using JDBC River
- Database is replicated asynchronously between DCs. Each DC will 
have its own database instance to pull data. 
- Average latency between DCs is about several miliseconds
- We need to operate when passive DC is down

 We know that multi DC configuration might end with Split Brain issue. 
 Here is how we want to prevent it:

- Set node.master: true only in 4 nodes in active DC
- Set node.master: false in passive DC
- This way we'll be sure that new cluster will not be created in 
passive DC 
- Additionally we'd like to set discovery.zen.minimum_master_nodes: 
3 (to avoid Split Brain in active DC)

  Additionally there is a problem with switchover (passive DC becomes active 
 and active becomes passive). In our system it takes about 20 minutes and 
 this is the maximum length of our maintenance window. We were thinking of 
 shutting down whole ES cluster and switch node.master setting in 
 configuration files (as far as I know this settings can not be changed via 
 REST api). Then we'd need to start whole cluster.

 So my question is: is it better to have one big ES cluster operating on 
 both DCs or should we change our approach and create 2 separate clusters 
 (and rely on database replication)? I'd be grateful for advice.

 Regards
 Sebastian







Re: Query string operators seem to not be working correctly

2014-05-12 Thread 'Binh Ly' via elasticsearch
Erich,

A colleague pointed out to me a much more complete explanation than I could 
ever give:

http://searchhub.org//2011/12/28/why-not-and-or-and-not/

But the short of it is that it is working as expected; you just need to map it 
back a bit to Lucene Boolean logic to fully understand why and how it works.



Re: Corruption error after upgrade to 1.0

2014-05-12 Thread thale jacobs
Did you ever get this resolved and if so, how was it resolved?  I am 
experiencing the same issue...  

On Monday, February 17, 2014 4:25:00 PM UTC-5, Mo wrote:

 After upgrading to 1.0 I am unable to index any documents. I get the 
 following error. Could somebody help?
  

 [Aardwolf] Message not fully read (response) for [0] handler 
 future(org.elasticsearch.client.transport.TransportClientNodesService$SimpleNodeSampler$1@5c6e3b4c),
  
 error [true], resetting

 [Aardwolf] failed to get node info for 
 [#transport#-1][inet[/10.80.140.59:9300]], disconnecting...

 org.elasticsearch.transport.RemoteTransportException: Failed to 
 deserialize exception response from stream

 Caused by: org.elasticsearch.transport.TransportSerializationException: 
 Failed to deserialize exception response from stream

 at 
 org.elasticsearch.transport.netty.MessageChannelHandler.handlerResponseError(MessageChannelHandler.java:168)

 at 
 org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:122)

 at 
 org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)

 at 
 org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)

 at 
 org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)

 at 
 org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)

 at 
 org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)

 at 
 org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)

 at 
 org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)

 at 
 org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)

 at 
 org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)

 at 
 org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)

 at 
 org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)

 at 
 org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)

 at 
 org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)

 at 
 org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)

 at 
 org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)

 at 
 org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)

 at 
 org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)

 at 
 org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)

 at 
 org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)

 at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)

 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)

 at java.lang.Thread.run(Unknown Source)

 Caused by: java.io.StreamCorruptedException: unexpected end of block data

 at java.io.ObjectInputStream.readObject0(Unknown Source)

 at java.io.ObjectInputStream.defaultReadFields(Unknown Source)

 at java.io.ObjectInputStream.defaultReadObject(Unknown Source)

 at java.lang.Throwable.readObject(Throwable.java:913)

 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

 at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)

 at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)

 at java.lang.reflect.Method.invoke(Unknown Source)

 at java.io.ObjectStreamClass.invokeReadObject(Unknown Source)

 at java.io.ObjectInputStream.readSerialData(Unknown Source)

 at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)

 at java.io.ObjectInputStream.readObject0(Unknown Source)

 at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
  



Re: Mapping created using Template does not work

2014-05-12 Thread Deepak Jha
Hi Alexander,
Yes it works when I remove the template setting.

On Friday, May 9, 2014 12:26:49 PM UTC-7, Alexander Reelsen wrote:

 Hey,

 can you just take some sample data and index it into elasticsearch 
 manually and see if that works?


 --Alex


 On Thu, May 1, 2014 at 1:53 AM, Deepak Jha dkjh...@gmail.com
  wrote:

 Hi,
 I have set up the ELK stack and I am going by the default index name, which is 
 logstash-YYYY.MM.DD. Since this is the only index format I have, I decided 
 to create a template file, so that whenever a new index gets created I can 
 set up the mapping properties. I am not able to push data to 
 elasticsearch if my index mapping gets created from the template. May I know 
 where I am wrong?

 Here is my mapping file content:
 {
   "X_Server" : {
     "properties" : {
       "@timestamp" : { "type" : "date", "format" : "dateOptionalTime" },
       "@version" : { "type" : "string" },
       "class" : { "type" : "string" },
       "file" : { "type" : "string" },
       "message" : { "type" : "string" },
       "host" : { "type" : "string", "index" : "not_analyzed" }
     }
   }
 }


 My template file content is

 {
   "template" : "logstash-*",
   "settings" : {
     "index.number_of_shards" : 3,
     "index.number_of_replicas" : 1,
     "index.query.default_field" : "@message",
     "index.routing.allocation.total_shards_per_node" : 2,
     "index.auto_expand_replicas" : false
   },
   "mappings" : {
     "X_Server" : {
       "_all" : { "enabled" : false },
       "_source" : { "compress" : false },
       "properties" : {
         "class" : { "type" : "string" },
         "host" : { "type" : "string", "index" : "not_analyzed" },
         "file" : { "type" : "string" },
         "message" : { "type" : "string" }
       }
     }
   }
 }







Re: Query string operators seem to not be working correctly

2014-05-12 Thread Erich Lin
Thanks Binh!

To summarize for everyone else:

1) Queries are parsed left to right
2) NOT sets the Occurs flag of the clause to its right to MUST_NOT
3) AND will change the Occurs flag of the clause to its left to MUST 
unless it has already been set to MUST_NOT
4) AND sets the Occurs flag of the clause to its right to MUST
5) If the default operator of the query parser has been set to "And": OR 
will change the Occurs flag of the clause to its left to SHOULD unless it 
has already been set to MUST_NOT
6) OR sets the Occurs flag of the clause to its right to SHOULD

Practically speaking this means that NOT takes precedence over AND, which 
takes precedence over OR, but only if the default operator for the query 
parser has not been changed from the default ("Or"). If the default 
operator is set to "And" then the behavior is just plain weird. 
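
One way to sidestep the precedence question entirely is to spell the boolean structure out with an explicit bool query instead of a query_string expression. A hypothetical equivalent of "a AND b NOT c" (index, field, and terms are placeholders):

curl -XPOST "http://localhost:9200/my-index/_search" -d '{
  "query": {
    "bool": {
      "must":     [ { "match": { "field": "a" } }, { "match": { "field": "b" } } ],
      "must_not": [ { "match": { "field": "c" } } ]
    }
  }
}'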

Erich

On Monday, May 12, 2014 12:37:24 PM UTC-7, Binh Ly wrote:

 Erich,

 A colleague pointed out to me a much more complete explanation that I 
 could ever do:

 http://searchhub.org//2011/12/28/why-not-and-or-and-not/

 But the short of it is, it is working as expected and just need to map a 
 bit back to Lucene Boolean logic to fully understand why/how it works.




Re: The effect of multi-fields and copy_to on storage size

2014-05-12 Thread Adrien Grand
Hi Jeremy,

On Mon, May 12, 2014 at 7:43 PM, Jeremy McLain gongcheng...@gmail.com wrote:

 Even if the three fields are compressed isn't it still storing three
 compressed copies of the same thing? That is still three times more
 overhead than it needs to be using. It seems very wasteful of space.
 Ideally the space used by the database would be
 size_of_stored_fields_compressed + size_of_index. In my case my database
 will look more like (size_of_stored_fields_compressed x 3) + size_of_index.
 This greatly increases my storage requirements!


It is not storing 3 compressed copies of the same thing, but storing these
3 things (as a whole) compressed. The difference is important because it
means that the 2nd and 3rd copies are effectively stored as references to
the first field value. I would recommend building two indices, once with
the copy_fields, and once without to see what the difference is in practice.
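
A hypothetical way to run that comparison: create the same mapping twice, once with the copy_to fields and once without, index the same documents into both, and then compare the reported store sizes; the index names below are placeholders:

curl "http://localhost:9200/with_copy,without_copy/_stats/store?pretty"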


 If I enabled the type's _source field and disabled individual field
 storage could I still get highlighting info in the query response for those
 fields?


Yes, although I would recommend keeping _source if possible. It makes lots
of things easier, for example you can reindex from elasticsearch itself,
etc.

-- 
Adrien Grand



What is the difference between common terms query vs match query with cutoff_frequency set

2014-05-12 Thread Mike
I was reading up on the match query and noticed that it has a 
cutoff_frequency parameter, which seems to do pretty much what the common 
terms query does.  

   1. What is the difference between the common and match queries?
   2. When would I want to use common terms over match?
   3. Ultimately, would the direction be to have common terms query roll up 
   into the match query (with any differences added to match)?
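
For reference, a hypothetical match query using the cutoff_frequency parameter mentioned above looks like this (the index and field names are placeholders); terms whose document frequency is above the cutoff are handled separately from rarer ones, which is similar in spirit to what the common terms query does:

curl -XPOST "http://localhost:9200/my-index/_search" -d '{
  "query": {
    "match": {
      "body": {
        "query": "the quick brown fox",
        "cutoff_frequency": 0.001
      }
    }
  }
}'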



Marvel Indices taking lot of space ? Can we specify automatic delete of marvel indice ?

2014-05-12 Thread deepakas
Is there a way to set Marvel to delete the Marvel indices after 7 days? It
looks like Marvel is generating around 2 GB of data every day. Our disk got
full twice because of Marvel data. Is there a way to reduce the amount of
data generated by Marvel?

Also, is there any plan to add alert mechanisms to Marvel?
For example, if the Marvel status goes red it would be good to get an email
sent to a specified user. I see the Marvel status as red, but it doesn't show
what is causing the red status. It would be good to get some alerts with the
details when the cluster status goes red.






Re: Marvel Indices taking lot of space ? Can we specify automatic delete of marvel indice ?

2014-05-12 Thread Ivan Brusic
I do not use Marvel, but another monitoring system built on top of
Elasticsearch. I use the Elasticsearch Curator to delete old indices:
https://github.com/elasticsearch/curator

I have a cron entry to run the curator once per day. Perhaps something
already exists in Marvel, not sure since I am not a user.
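
If you prefer not to add a tool, a minimal hand-rolled alternative is a daily cron job that removes the Marvel index from N days ago with the delete index API. A rough sketch, assuming Marvel's default daily index names (.marvel-YYYY.MM.DD) and GNU date syntax:

# delete the .marvel- index from 8 days ago
curl -XDELETE "http://localhost:9200/.marvel-$(date -d '8 days ago' +%Y.%m.%d)"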

Cheers,

Ivan



On Mon, May 12, 2014 at 7:39 AM, deepakas deepak.subhraman...@gmail.com wrote:

 Is there a way to set marvel to delete the marvel indices after 7 days. It
 looks like Marvel is generating around 2 GB of data everyday.  Our disk got
 full 2 times because of Marvel data. Is there a way to reduce the amount of
 data generated by marvel ?

 Also is there any plan to add alert mechanisms in Marvel.
 For example if the Marvel status goto red it will be good to get an email
 for a specified user. I see Marvel status as red. But it doesnt show why it
 is causing red.  It will be good to get some alerts with the details when
 the cluster status goes red.








Can't access S3 from Elastic Search

2014-05-12 Thread IronMan2014
I have a cluster with 2 nodes which works fine.
I am running the latest Elasticsearch version, and I would like to use the 
snapshot/restore API, but I am having a hard time getting it to work. Note that 
it works fine with the fs type; it's AWS I am having a hard time with.

In my 2 instances, I have this in the config.yml files

cloud:
 aws: 

 access_key: XX

 secret_key: YYY

discovery: 

type: ec2




I have created an S3 bucket called my-bucket. Inbound rules allow most 
ports: 22, 80, 9200, 9300.


Tried to register the bucket with Elasticsearch:

PUT /_snapshot/es_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my-bucket"
  }
}


I get this error:

   error: RemoteTransportException[[node 
2][inet[/10.240.78.87:9300]][cluster/repository/put]]; nested: 
RepositoryException[[elasticsrch] failed to create repository]; nested: 
CreationException[Guice creation errors:\n\n1) Error injecting constructor, 
com.amazonaws.AmazonClientException: Unable to load AWS credentials from 
any provider in the chain\n  at 
org.elasticsearch.repositories.s3.S3Repository.init()\n  at 
org.elasticsearch.repositories.s3.S3Repository\n  at 
Key[type=org.elasticsearch.repositories.Repository, annotation=[none]]\n\n1 
error]; nested: AmazonClientException[Unable to load AWS credentials from 
any provider in the chain]; ,

   status: 500

}



Any ideas?




Re: Unable to Send Stats to Monitoring Cluster

2014-05-12 Thread Boaz Leskes
Hi Mario,

We just released Marvel 1.1.1 with a bug fix which I think will solve this:
http://www.elasticsearch.org/guide/en/marvel/current/#_1_1_1

Can you check and see if it helps? (You can remove the 30s setting.)

Cheers,
Boaz

On Sunday, April 27, 2014 10:12:31 PM UTC+2, Boaz Leskes wrote:

 Hi Mario,

 Gists look good to me. 

 One other thing I thought about - do you get this error all the time or 
 does it appear every once in a while? Said differently, do you have data in 
 your monitoring cluster?

 If so, you can try increasing the timeout (defaults to 6s): 
 marvel.agent.exporter.es.timeout: 30s

 This can be done through the yml file or via the Cluster Update Settings 
 API : 
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-update-settings.html#cluster-update-settings
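
A hypothetical sketch of the API variant mentioned above, using the cluster update settings endpoint (the 30s value is just the example from the advice above):

curl -XPUT "http://localhost:9200/_cluster/settings" -d '{
  "transient": { "marvel.agent.exporter.es.timeout": "30s" }
}'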

 Cheers,
 Boaz


 On Fri, Apr 25, 2014 at 11:15 PM, Mario Rodriguez star...@gmail.com wrote:

 https://gist.github.com/anonymous/11303362#file-gistfile1-txt - Prod 
 Server #1
 https://gist.github.com/anonymous/11303451#file-gistfile1-txt - 
 Monitoring Server







Re: Can't access S3 from Elastic Search

2014-05-12 Thread David Pilato
May be it's related to this? 
https://github.com/elasticsearch/elasticsearch-cloud-aws#recommended-s3-permissions
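
If the IAM permissions check out, two other things worth double-checking for the "Unable to load AWS credentials" error are that the cloud-aws plugin is installed on every node and that the cloud.aws.access_key / cloud.aws.secret_key keys are nested correctly in elasticsearch.yml. As a quick hypothetical test, you can also pass credentials directly in the repository settings, if your cloud-aws version supports per-repository credentials (values are placeholders):

curl -XPUT "http://localhost:9200/_snapshot/es_repository" -d '{
  "type": "s3",
  "settings": {
    "bucket": "my-bucket",
    "access_key": "XX",
    "secret_key": "YYY"
  }
}'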

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


On May 12, 2014, at 22:54, IronMan2014 sabdall...@hotmail.com wrote:

I have a cluster with 2 nodes which works fine.
I am running the latest Elastic search version, and I would like to use 
snapshot/restore API, but having hard time getting to work. Note, it works fine 
with fs type, its AWS I am having hard time with.

In my 2 instances, I have this in the config.yml files

cloud:
 aws: 

 access_key: XX

 secret_key: YYY

discovery: 

type: ec2





I have created S3 bucket called it my-bucket, Inbound rules allow most ports, 
22, 80, 9200, 9300



Tried to register the bucket with ElasticSearch:

PUT /_snapshot/es_repository
{

type: s3,

settings: {

  bucket: my-bucket

}


I get this error:

   error: RemoteTransportException[[node 
2][inet[/10.240.78.87:9300]][cluster/repository/put]]; nested: 
RepositoryException[[elasticsrch] failed to create repository]; nested: 
CreationException[Guice creation errors:\n\n1) Error injecting constructor, 
com.amazonaws.AmazonClientException: Unable to load AWS credentials from any 
provider in the chain\n  at 
org.elasticsearch.repositories.s3.S3Repository.init()\n  at 
org.elasticsearch.repositories.s3.S3Repository\n  at 
Key[type=org.elasticsearch.repositories.Repository, annotation=[none]]\n\n1 
error]; nested: AmazonClientException[Unable to load AWS credentials from any 
provider in the chain]; ,

   status: 500

}




Any ideas?






Re: Odd behavior with AND condition

2014-05-12 Thread mdj2
This appears to be caused by the snowball analyzer which is used on the 
tags field. To reproduce the odd behavior:

curl -XDELETE "http://localhost:9200/haystack"

curl -XPOST "http://localhost:9200/haystack/" -d '
{
   "settings": {
      "index": {}
   }
}'

curl -XPOST "http://localhost:9200/haystack/modelresult/_mapping" -d '
{
  "modelresult" : {
    "_boost" : {
      "name" : "boost",
      "null_value" : 1.0
    },
    "properties" : {
      "assigned_to" : {
        "type" : "string",
        "term_vector" : "with_positions_offsets",
        "analyzer" : "snowball"
      },
      "clipped_from" : {
        "type" : "long",
        "index" : "analyzed"
      },
      "created_by" : {
        "type" : "long",
        "index" : "analyzed"
      },
      "django_ct" : {
        "type" : "string"
      },
      "django_id" : {
        "type" : "string"
      },
      "id" : {
        "type" : "string"
      },
      "org" : {
        "type" : "long",
        "index" : "analyzed"
      },
      "tags" : {
        "type" : "string",
        "store" : true,
        "term_vector" : "with_positions_offsets",
        "analyzer" : "snowball"
      },
      "text" : {
        "type" : "string",
        "store" : true,
        "term_vector" : "with_positions_offsets",
        "analyzer" : "snowball"
      },
      "type" : {
        "type" : "long",
        "index" : "analyzed"
      }
    }
  }
}'

curl -XPOST "http://localhost:9200/haystack/modelresult/" -d '{
  "assigned_to": [],
  "created_by": 1,
  "django_ct": "preparations.preparation",
  "django_id": "37",
  "id": "preparations.preparation.37",
  "org": 1,
  "tags": [
    "foo"
  ],
  "text": "Wildlife.wmv\n:)\n",
  "type": 2
}'


echo "Shows no results (good)"
curl "http://127.0.0.1:9200/haystack/_search?q=(tags%3A(%22a%22))&pretty"


echo "Should show no results, but finds a match"
curl "http://127.0.0.1:9200/haystack/_search?q=(org%3A(%221%22)%20AND%20tags%3A(%22a%22))&pretty"

Switching the tags field to the standard analyzer fixes the problem.
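
A quick way to see the difference between the two analyzers is the _analyze API; a hypothetical check against a local node is below. Snowball removes English stopwords such as "a", so the tags clause against the snowball-analyzed field ends up with no terms at all, while standard (depending on its stopword configuration in your version) keeps the token:

curl "http://localhost:9200/_analyze?analyzer=snowball&pretty" -d 'a'
curl "http://localhost:9200/_analyze?analyzer=standard&pretty" -d 'a'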


On Friday, May 9, 2014 3:31:08 PM UTC-7, md...@pdx.edu wrote:

 When I run the query (tags:(a)) in elasticsearch, I get 0 results. My 
 query URL looks like:

 http://127.0.0.1:9200/haystack/_search?q=(tags%3A(%22a%22))

 That is to be expected, since no objects have a tag set to a.

 Now when I change the condition, and add an AND, (org:(1) AND 
 tags:(a)), *I get 3 results back*! The query URL looks like:

 http://127.0.0.1:9200/haystack/_search?q=(org%3A(%221%22)%20AND%20tags%3A(%22a%22))

 Getting *more* results back does not make any sense to me. I would expect 
 that kind of behavior with the OR operator, but AND? What is going on?

 (This is a cross post from 
 stackoverflowhttp://stackoverflow.com/questions/23568699/odd-behavior-with-and-condition-in-elasticsearch
 )




Re: unable to write data to elasticsearch using hadoop PIG

2014-05-12 Thread hanine haninne
Thank you so much for your quick reply.
Here is what I have done:
1 - installed hadoop-1.2.1 (pig-0.12.0 / hive-0.11.0 / ...)
2 - downloaded Elasticsearch-1.0.1 and put it in the same directory as hadoop
3 - copied the following 4 elasticsearch-hadoop jar files
elasticsearch-hadoop-1.3.0.M2.jar  
elasticsearch-hadoop-1.3.0.M2-sources.jar
elasticsearch-hadoop-1.3.0.M2-javadoc.jar  
elasticsearch-hadoop-1.3.0.M2-yarn.jar
to /pig and hadoop/lib
4 - added them to the PIG_CLASSPATH

Note that when I take data from my Desktop and put it into elasticsearch 
using a pig script it works very well, but when I try to get data from 
HDFS it gives me this:

2014-05-12 23:16:31,765 [main] ERROR 
org.apache.pig.tools.pigstats.SimplePigStats - ERROR: java.io.IOException: 
Out of nodes and retries; caught exception
2014-05-12 23:16:31,765 [main] ERROR 
org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2014-05-12 23:16:31,766 [main] INFO  
org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics: 

HadoopVersion  PigVersion  UserId  StartedAt            FinishedAt           Features
1.2.1          0.12.0      hduser  2014-05-12 23:15:34  2014-05-12 23:16:31  GROUP_BY

Failed!

Failed Jobs:
JobIdAliasFeatureMessageOutputs
job_201405122310_0001weblog_count,weblog_group,weblogs
GROUP_BY,COMBINERMessage: Job failed! Error - # of failed Reduce Tasks 
exceeded allowed limit. FailedCount: 1. LastFailedTask: 
task_201405122310_0001_r_00weblogs1/logs2,

Input(s):
Failed to read data from /user/weblogs

Output(s):
Failed to produce result in weblogs1/logs2

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_201405122310_0001


2014-05-12 23:16:31,766 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- Failed!


And here is the script :

weblogs = LOAD '/user/weblogs' USING PigStorage('\t')
AS (client_ip : chararray,
full_request_date : chararray,
day : int,
month : chararray,
month_num : int,
year : int,
hour : int,
minute : int,
second : int,
timezone : chararray,
http_verb : chararray,
uri : chararray,
http_status_code : chararray,
bytes_returned : chararray,
referrer : chararray,
user_agent : chararray
);
weblog_group = GROUP weblogs by (client_ip, year, month_num);
weblog_count = FOREACH weblog_group GENERATE group.client_ip, group.year, 
group.month_num, COUNT_STAR(weblogs) as pageviews;
STORE weblog_count INTO 'weblogs1/logs2' USING 
org.elasticsearch.hadoop.pig.EsStorage();
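
Before the STORE step, a quick sanity check is to verify that each Hadoop worker node can reach the ES HTTP port (the host below is a placeholder); if ES is not on localhost relative to the task nodes, es-hadoop can also be pointed at it explicitly via the es.nodes property passed to EsStorage:

curl -XGET "http://es-host:9200/"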


On Monday, May 12, 2014 at 16:28:20 UTC+1, Costin Leau wrote:

 Check your network settings and make sure that the Hadoop nodes can 
 communicate with the ES nodes. 
 If you install ES besides Hadoop itself, this shouldn't be a problem. 
 There are various way to check this - try ping, tracert, etc... 

 Please refer to your distro manual/documentation for more information 
 about the configuration and setup. 

 Cheers, 

 On 5/12/14 3:42 PM, hanine haninne wrote: 
  I had get the same erreur but I don't know what I have to change in my 
 /etc/hosts 
  thank you for your help 
  
   On Wednesday, March 5, 2014 at 09:39:46 UTC, Yann Barraud wrote: 
  
  Hi, 
  
  Is your ES instance known by your Hadoop cluster (/etc/hosts) ? 
  
  It does not even seems to read in it. 
  
  Cheers, 
  Yann 
  
   On Wednesday, March 5, 2014 at 06:32:55 UTC+1, siva mannem wrote: 
  
  I installed ES(at the location /usr/lib/elasticsearch/) on our 
 gateway server and i am able to run some basic 
  curl commands like XPUT and XGET to create some indices and 
 retrieve the data in them. 
  i am able to give single line JSON record but i am unable to 
 give JSON file as input to curl XPUT . 
  can anybody give me the syntax for giving JSON file as input for 
 curl XPUT command? 
  
  my next issue is i copied  the following 4 elasticsearch-hadoop 
 jar files 
  elasticsearch-hadoop-1.3.0.M2.jar 
  elasticsearch-hadoop-1.3.0.M2-sources.jar 
  elasticsearch-hadoop-1.3.0.M2-javadoc.jar 
  elasticsearch-hadoop-1.3.0.M2-yarn.jar 
  
  to  /usr/lib/elasticsearch/elasticsearch-0.90.9/lib 
  and /usr/lib/gphd/pig/ 
  
  i have the following json file j.json 
  ++ 
  {k1:v1 ,  k2:v2 , k3:v3} 
   
  
  in my_hdfs_path. 
  
  my pig script is write_data_to_es.pig 
  + 
  REGISTER 
 /usr/lib/gphd/pig/elasticsearch-hadoop-1.3.0.M2-yarn.jar; 
  DEFINE ESTOR 
 org.elasticsearch.hadoop.pig.EsStorage('es.resource=usa/ca'); 
  A = LOAD '/my_hdfs_path/j.json' using 
 JsonLoader('k1:chararray,k2:chararray,k3:chararray'); 
  STORE A into 'usa/ca' USING ESTOR('es.input.json=true'); 
  ++ 
  
  when i run my pig script 
  + 

Aggregate children data

2014-05-12 Thread Vlad Mangeym
I am looking for help/ideas/examples. Thank you in advance for your help.

I have 3 types of docs loaded into Elasticsearch: parent, child1, and child2 
(child1 and child2 have their parent type set to parent in the mapping).
Parent documents have, among other fields, factor1: [some number] and 
factor2: [some number]. There are about 20,000 parent documents, each with 
multiple children of both types, with the total number of children of each type 
across all parents around 250,000 (500,000 total). Each child document has a 
field amount: [some number].

I need to find a value for each parent that is the sum of all child1 amounts 
multiplied by parent factor1 divided by sum of all child2 amounts 
multiplied by factor2.

I need to be able to get it for each parent and to be able to search by it: 
find all parents that have value between 0.4 and 0.8 for example...



Re: Query question

2014-05-12 Thread 曾岩
Thank you very much!

--
Eateral

On Thursday, May 8, 2014 at 12:10:42 AM UTC+8, Ivan Brusic wrote:

 Your two clauses, mode and schedule, are joined via an AND, so those two 
 clauses should be part of the must section. The schedule clause is 
 then an OR between two clauses, so it should be a nested bool filter using 
 should. Hopefully that made sense. :)

 Since you are using term queries on what are hopefully non-analyzed fields 
 (numeric fields are always non-analyzed), I will use a match_all query with 
 filters since it should be more efficient. The query should look 
 something like:

 {
   "query": {
     "filtered": {
       "query": {
         "match_all": {}
       },
       "filter": {
         "bool": {
           "must": [
             {
               "term": { "mode": 1 }
             },
             {
               "bool": {
                 "should": [
                   {
                     "term": { "schedule": 1 }
                   },
                   {
                     "term": { "schedule": 3 }
                   }
                 ]
               }
             }
           ]
         }
       }
     }
   }
 }

 -- 
 Ivan


  On Mon, May 5, 2014 at 3:36 AM, 曾岩 eate...@gmail.com wrote:

 Hi,

  I'm new to Elasticsearch and am trying to integrate it into our project, but 
  I have run into a problem. Our data source has two fields, mode and schedule, 
  which are both integers. Through the UI, it should be possible to query 
  records based on these two fields, like: 
  SELECT * FROM doc WHERE mode = 1 AND (schedule = 1 OR schedule = 3)

  I tried the query JSONs below but none return the expected results. Can 
  anyone help? Thank you!

  {
    "query": {
      "bool": {
        "must": [
          { "match": { "mode": 1 } }
        ],
        "should": [
          { "match": { "schedule": 1 } },
          { "match": { "schedule": 3 } }
        ]
      }
    }
  }
  ---
  {
    "query": {
      "filtered": {
        "query": { "match_all": {} },
        "filter": { "and": [ { "term": { "mode": 1 } } ] },
        "filter": { "and": [ { "term": { "schedule": 1 } }, { "term": { "schedule": 3 } } ] }
      }
    }
  }







geohash_precision Units?

2014-05-12 Thread Michael Sander
What are the units of geohash_precision when no unit is explicitly 
specified?  

The docs currently state 
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-geo-point-type.html):

If you enable the geohash option, a geohash “sub-field” will be indexed as, 
eg pin.geohash. The length of the geohash is controlled by the 
geohash_precision parameter, which can either be set to an absolute length 
(eg 12, the default) or to a distance (eg 1km).


When you set geohash_precision to an absolute length, what units are you 
setting it to? Meters? This seems like such a simple question that I bet 
I'm missing something.

Thanks,



Elasticsearch on ZFS best practice

2014-05-12 Thread Patrick Proniewski
Hello,

I'm running an Elasticsearch node on a FreeBSD server, on top of ZFS storage. 
For now I've considered that ES is smart and manages its own cache, so I've 
disabled the primary cache for data, leaving only metadata cacheable. The last 
thing I want is to have data cached twice, once in the ZFS ARC and a second 
time in the application's own cache. I've also disabled compression:

$ zfs get compression,primarycache,recordsize  zdata/elasticsearch
NAME PROPERTY  VALUE SOURCE
zdata/elasticsearch  compression   off   local
zdata/elasticsearch  primarycache  metadata  local
zdata/elasticsearch  recordsize128K  default

It's a general purpose server (web, mysql, mail, ELK, etc.). I'm not looking 
for absolute best ES performance, I'm looking for best use of my resources.
I have 16 GB RAM, and I plan to put a limit on the ARC size (currently consuming 
8.2 GB RAM) so I can mlockall ES memory. But I don't think I'll go the RAM-only 
storage route 
(http://jprante.github.io/applications/2012/07/26/Mmap-with-Lucene.html) as 
I'm running only one node.

How can I estimate the amount of memory I must allocate to ES process?
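
For context, the ES side of that usually comes down to two knobs: the heap size (the ES_HEAP_SIZE environment variable, or -Xms/-Xmx directly) and memory locking (bootstrap.mlockall in elasticsearch.yml). A rough illustrative sketch, with a placeholder value rather than a recommendation for this workload:

# in the environment that starts elasticsearch
export ES_HEAP_SIZE=4g
# and in elasticsearch.yml:
# bootstrap.mlockall: true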

Should I switch primarycache=all back on despite ES already caching data?

What is the best ZFS record/block size to accommodate Elasticsearch/Lucene IOs?

Thanks,
Patrick
