filter bitsets
I was reading this blog post about filter bitsets: http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/

At the end, the conclusion is that the bool filter should be used for everything except geo, numeric range, and custom script filters. However, the example query at the end seems to include a range filter inside the bool filter. Is this a typo, or am I missing something? Also, what about query filters? I would assume those wouldn't go in the bool filter either?

{
  "and": [
    { "bool": { "must": [ { "term": {} }, { "range": {} }, { "term": {} } ] } },
    { "custom_script": {} },
    { "geo_distance": {} }
  ]
}

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/4c50efe8-dce3-4cfc-a78f-ab0417884201%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
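For reference, here is a sketch (as a Python dict, since the query DSL is just JSON) of the structure the blog post recommends as I read it: bitset-friendly filters such as term go inside a bool, while geo, numeric-range, and script filters sit beside it in an and. The field names and values are made up for illustration.

```python
# Sketch of the recommended structure (my reading of the blog post, not
# authoritative): cacheable bitset filters inside `bool`, non-bitset
# filters (geo / numeric range / script) in the surrounding `and`.
# Field names and values below are hypothetical.

def build_filter(bitset_filters, non_bitset_filters):
    """Combine bitset-friendly filters in a bool, the rest in an and."""
    return {
        "and": [
            {"bool": {"must": bitset_filters}},
        ] + non_bitset_filters
    }

query_filter = build_filter(
    bitset_filters=[
        {"term": {"status": "active"}},
        {"term": {"type": "user"}},
    ],
    non_bitset_filters=[
        {"geo_distance": {"distance": "10km",
                          "location": {"lat": 0.0, "lon": 0.0}}},
    ],
)
```

Whether range belongs in the bool (as the post's final example shows) or outside it (as its conclusion suggests) is exactly the question above.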
Lot of GC in elasticsearch node.
My elasticsearch node is an AWS EC2 c3.xlarge (7.5G mem). Elasticsearch starts as:

498 31810 99.6 64.6 163846656 4976944 ? Sl 06:03 26:10 /usr/bin/java *-Xms4g -Xmx4g -Xss256k* -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Delasticsearch -Des.pidfile=/var/run/elasticsearch/elasticsearch.pid -Des.path.home=/usr/share/elasticsearch -cp :/usr/share/elasticsearch/lib/elasticsearch-1.1.1.jar:/usr/share/elasticsearch/lib/*:/usr/share/elasticsearch/lib/sigar/* -Des.default.path.home=/usr/share/elasticsearch -Des.default.path.logs=/var/log/elasticsearch -Des.default.path.data=/var/lib/elasticsearch -Des.default.path.work=/tmp/elasticsearch -Des.default.path.conf=/etc/elasticsearch org.elasticsearch.bootstrap.Elasticsearch

The node stopped responding (the ip:9200 status page), and so did Kibana. It started working fine after a restart. I have logstash-format docs where the index rotates daily.

Stats: Daily: ~11G docs, ~15 million. Total: 195G docs, ~300 million.
The logs from the time it stopped responding are:

[2014-05-12 03:39:08,789][INFO ][cluster.metadata ] [Hannibal King] [logstash-2014.05.12] update_mapping [medusa_ex] (dynamic)
[2014-05-12 03:40:52,293][INFO ][monitor.jvm ] [Hannibal King] [gc][old][240428][35773] duration [6.3s], collections [1]/[6.5s], total [6.3s]/[4.7h], memory [3.8gb]-[3.6gb]/[3.9gb], all_pools {[young] [150.3mb]-[1.7mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old] [3.6gb]-[3.6gb]/[3.6gb]}
[2014-05-12 03:44:11,739][INFO ][cluster.metadata ] [Hannibal King] [logstash-2014.05.12] update_mapping [medusa_ex] (dynamic)
[2014-05-12 03:45:32,191][INFO ][monitor.jvm ] [Hannibal King] [gc][old][240703][35812] duration [5.2s], collections [1]/[5.8s], total [5.2s]/[4.7h], memory [3.7gb]-[3.6gb]/[3.9gb], all_pools {[young] [197.4mb]-[9.3mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old] [3.5gb]-[3.6gb]/[3.6gb]}
[2014-05-12 04:06:01,224][INFO ][monitor.jvm ] [Hannibal King] [gc][old][241926][35985] duration [6s], collections [1]/[6.2s], total [6s]/[4.7h], memory [3.7gb]-[3.6gb]/[3.9gb], all_pools {[young] [134.7mb]-[9.9mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old] [3.6gb]-[3.5gb]/[3.6gb]}
[2014-05-12 04:08:14,473][INFO ][monitor.jvm ] [Hannibal King] [gc][old][242049][36004] duration [5.8s], collections [1]/[5.9s], total [5.8s]/[4.7h], memory [3.8gb]-[3.6gb]/[3.9gb], all_pools {[young] [165.1mb]-[2.7mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old] [3.6gb]-[3.6gb]/[3.6gb]}
[2014-05-12 04:09:07,473][INFO ][monitor.jvm ] [Hannibal King] [gc][old][242096][36011] duration [6.2s], collections [1]/[6.7s], total [6.2s]/[4.7h], memory [3.9gb]-[3.6gb]/[3.9gb], all_pools {[young] [265.9mb]-[2.9mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old] [3.6gb]-[3.5gb]/[3.6gb]}
[2014-05-12 04:10:08,387][INFO ][monitor.jvm ] [Hannibal King] [gc][old][242152][36020] duration [5.4s], collections [1]/[5.6s], total [5.4s]/[4.7h], memory [3.8gb]-[3.5gb]/[3.9gb], all_pools {[young] [176.5mb]-[5.8mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old] [3.6gb]-[3.5gb]/[3.6gb]}
[2014-05-12 04:13:12,774][INFO ][monitor.jvm ] [Hannibal King] [gc][old][242326][36046] duration [5.6s], collections [1]/[5.8s], total [5.6s]/[4.7h], memory [3.8gb]-[3.5gb]/[3.9gb], all_pools {[young] [167.4mb]-[12.9mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old] [3.6gb]-[3.5gb]/[3.6gb]}
[2014-05-12 04:14:22,729][INFO ][monitor.jvm ] [Hannibal King] [gc][old][242386][36057] duration [6.3s], collections [1]/[6.5s], total [6.3s]/[4.7h], memory [3.8gb]-[3.6gb]/[3.9gb], all_pools {[young] [224.2mb]-[3.5mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old] [3.6gb]-[3.6gb]/[3.6gb]}
[2014-05-12 04:15:12,192][INFO ][monitor.jvm ] [Hannibal King] [gc][old][242431][36064] duration [5.2s], collections [1]/[5.4s], total [5.2s]/[4.7h], memory [3.8gb]-[3.6gb]/[3.9gb], all_pools {[young] [234mb]-[2.4mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old] [3.6gb]-[3.6gb]/[3.6gb]}
[2014-05-12 04:15:32,344][INFO ][monitor.jvm ] [Hannibal King] [gc][old][242445][36067] duration [6.3s], collections [1]/[7.1s], total [6.3s]/[4.7h], memory [3.6gb]-[3.7gb]/[3.9gb], all_pools {[young] [1.2mb]-[34.7mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old] [3.6gb]-[3.6gb]/[3.6gb]}
[2014-05-12 04:15:39,627][INFO ][monitor.jvm ] [Hannibal King] [gc][old][242446][36068] duration [6.7s], collections [1]/[7.2s], total [6.7s]/[4.7h], memory [3.7gb]-[3.7gb]/[3.9gb], all_pools {[young] [34.7mb]-[45.7mb]/[266.2mb]}{[survivor] [0b]-[0b]/[33.2mb]}{[old] [3.6gb]-[3.6gb]/[3.6gb]}
[2014-05-12 04:15:51,547][INFO ][monitor.jvm ] [Hannibal King] [gc][old][242448][36070] duration
Re: filter bitsets
Also, one followup question: if I do a terms filter and then a query filter, should I put the terms filter in a bool with a single clause? It seems strange to do so, but the following passage made me wonder if this is the case:

"It matters because the Bool filter utilizes BitSets while the And/Or/Not filters do not. If you put a Terms Filter inside of an And…no BitSet will be used, even though it exists."

On Monday, May 12, 2014 2:14:16 AM UTC-4, slushi wrote:
> I was reading this blog post about filter bitsets http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/ [...]
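A sketch of what the quoted passage seems to imply (again as a Python dict standing in for the JSON DSL): wrap even a lone terms filter in a single-clause bool when combining it with a non-bitset filter via and, so its BitSet is still used. This is my reading of the blog post, not confirmed behavior, and the field names are illustrative.

```python
# My reading of the quoted passage: a lone terms filter combined via `and`
# should still be wrapped in a single-clause bool so its BitSet is used.
# Field names and values are hypothetical.
terms_part = {"terms": {"tags": ["a", "b"]}}
query_part = {"query": {"match": {"body": "hello"}}}

combined = {
    "and": [
        {"bool": {"must": [terms_part]}},  # single clause, still a bool
        query_part,                        # query filter stays outside
    ]
}
```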
Re: Lot of GC in elasticsearch node.
You need to reduce your data size, add more memory, or add another node. Basically, you've reached the limits of that node.

Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com

On 12 May 2014 16:38, Abhishek Tiwari erb...@gmail.com wrote:
> My elasticsearch node is an AWS EC2 c3.xlarge (7.5G mem). [...]
Re: Lot of GC in elasticsearch node.
> add more memory

I am doing 15 million docs, which total ~9G. The average doc size is ~2KB.

1. How much memory would you suggest for my use-case?
2. Also, is it prudent for me to have half of OS memory dedicated to elasticsearch?

On Monday, 12 May 2014 14:03:19 UTC+5:30, Mark Walkom wrote:
> You need to reduce your data size, add more memory, or add another node. Basically, you've reached the limits of that node. [...]
Retrieve mapping of an index using pyes library
Hello, I need some quick help. I need to know how to get the mapping of an index using the pyes library. I know a curl command to achieve the same, but I need an API. Alternatively, if I can get some API to get the index name for a given alias, that would also do. I found get_alias(alias), but it is deprecated in pyes 0.19. Please help me.

Thanks,
Nishidha
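I can't vouch for the exact pyes 0.19 method names, but both operations map onto simple REST endpoints that you can hit directly with only the standard library, which sidesteps the deprecation question entirely. Host and index names below are hypothetical; the endpoints are GET /{index}/_mapping and GET /_alias/{alias}.

```python
import json
import urllib.request

def endpoint(base_url, *parts):
    """Join a base URL and path segments into a REST endpoint URL."""
    return base_url.rstrip("/") + "/" + "/".join(parts)

def get_mapping(base_url, index):
    """Fetch an index's mapping via GET /{index}/_mapping."""
    with urllib.request.urlopen(endpoint(base_url, index, "_mapping")) as resp:
        return json.load(resp)

def get_indices_for_alias(base_url, alias):
    """Resolve an alias to its index names via GET /_alias/{alias}."""
    with urllib.request.urlopen(endpoint(base_url, "_alias", alias)) as resp:
        return list(json.load(resp).keys())

# Usage (hypothetical host and index name):
# mapping = get_mapping("http://localhost:9200", "logstash-2014.05.12")
# indices = get_indices_for_alias("http://localhost:9200", "myalias")
```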
Re: Lot of GC in elasticsearch node.
It's standard practice to use 50% of system memory for the heap. How much RAM you need depends on how long you want to keep your data around for. So, given you have ~200GB now on 4GB of RAM, you can probably extrapolate that out based on your needs.

Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com

On 12 May 2014 19:33, Abhishek Tiwari erb...@gmail.com wrote:
> I am doing 15 million docs, which total ~9G. The average doc size is ~2KB. 1. How much memory would you suggest for my use-case? 2. Also, is it prudent for me to have half of OS memory dedicated to elasticsearch? [...]
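Mark's two rules of thumb (heap is ~50% of system RAM; extrapolate from the current data-to-heap ratio) reduce to quick arithmetic. A sketch using the numbers from this thread; treat the output as a sanity check, not official sizing guidance.

```python
# Rough sizing arithmetic from the rules of thumb in this thread.
# All numbers come from the thread itself; this is a sanity check,
# not official guidance.

def heap_for(ram_gb):
    """~50% of system RAM for the JVM heap."""
    return ram_gb * 0.5

def ram_for_data(data_gb, current_data_gb=200.0, current_heap_gb=4.0):
    """Extrapolate heap need from the current data:heap ratio,
    then double it, since heap should be ~half of total RAM."""
    heap = data_gb * current_heap_gb / current_data_gb
    return heap * 2

c3_xlarge_ram = 7.5
assert heap_for(c3_xlarge_ram) == 3.75  # close to the -Xmx4g already in use
```

On these assumptions, keeping twice the data (~400GB) would call for roughly 8GB of heap and so ~16GB of RAM, i.e. a bigger instance or a second node.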
Marvel: sudden errors with index_stats type
Hi, I'm sort of new to ES and currently evaluating how well it will work for storing and querying our log data. I installed Marvel and it worked fine for a while, but now I'm suddenly getting weird errors and the index stats are gone from Kibana:

[2014-05-09 13:35:43,035][ERROR][marvel.agent.exporter] [Bushmaster] create failure (index:[.marvel-2014.05.09] type: [index_stats]): MapperParsingException[failed to parse [index]]; nested: MapperParsingException[failed to parse date field [.marvel-2014.05.08], tried both date format [dateOptionalTime], and timestamp number with locale []]; nested: IllegalArgumentException[Invalid format: .marvel-2014.05.08];
[2014-05-09 13:35:43,037][ERROR][marvel.agent.exporter] [Bushmaster] create failure (index:[.marvel-2014.05.09] type: [index_stats]): MapperParsingException[failed to parse [index]]; nested: MapperParsingException[failed to parse date field [.marvel-2014.05.09], tried both date format [dateOptionalTime], and timestamp number with locale []]; nested: IllegalArgumentException[Invalid format: .marvel-2014.05.09];

I created a gist detailing the steps I took so far trying to understand/debug what's going on. While debugging, I realized that I'm unable to recreate a marvel index with the same mapping that it normally has (whether created by Marvel or by ES automapping). I detailed this in the gist, too: https://gist.github.com/poohsen/a3d3bb319010bf0c5648

Since I didn't tamper with the Marvel indices to begin with, I don't really see how it suddenly could've gone wrong, other than because of a bug. Any hints are appreciated.

Regards,
Chris
Re: Is it possible to highlight the text with respect to no. of lines instead of no. of fragments
Thank you, Nik, for your reply.

On Friday, 9 May 2014 20:25:55 UTC+5:30, Nikolas Everett wrote:

On Fri, May 9, 2014 at 8:29 AM, Anand kumar anand...@gmail.com wrote:
> I have an index with huge content, from which I just want to highlight specific text. The highlighted text might appear many times; all I want is two or more lines before and after the line containing the highlighted text, so that I get a snippet with the highlighted parts in the middle, and they can be easily located and identified within a huge file. Is it possible?

The only segmentation options are based on characters, sentences, and grabbing the contents of the whole field. The trick with lines is that, unless the text contains explicit newlines and you only wrap on newlines, you have to estimate line breaks based on the rendering context: stuff like width in pixels and the font. If you want to be precise, you need the screen dpi as well, and a font rendering engine that works similarly. Some contexts don't properly render ligatures, some do. And it's 1000x worse when you leave English and go to something like Arabic or Sanskrit. There be dragons. But if you are talking about code, or something else with explicit newlines that only wraps on newlines, then the answer is still no, but it wouldn't be hard to implement.

Nik
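For anyone landing here later, the character-based segmentation Nik mentions is controlled per field in the highlight section of a search request. A sketch (field name hypothetical): fragment_size is measured in characters, so you can approximate "a few lines" of context even though true line-based snippets aren't supported.

```python
# Character-based highlighting options (the segmentation Nik refers to).
# fragment_size counts characters, not lines; "content" is a hypothetical
# field name. ~150 chars approximates 2-3 short lines of context.
search_body = {
    "query": {"match": {"content": "dragons"}},
    "highlight": {
        "fields": {
            "content": {
                "fragment_size": 150,      # characters per snippet
                "number_of_fragments": 3,  # up to 3 snippets per doc
            }
        }
    },
}
```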
Re: Lot of GC in elasticsearch node.
> How much RAM you need depends on how long you want to keep your data around for. So, given you have ~200GB now on 4GB of RAM, you can probably extrapolate that out based on your needs.

Isn't my problem more with the 9G *daily* index than with the ~200G total (20 days x 9G) of indexes? Correct me if I am wrong here, but doesn't kibana ask elasticsearch for just one day/week of indices (based on the query)? Will elasticsearch really care if I have 500 days of day-wise segregated indices out there when I am performing queries on *just the past 7 days*? Is this a total-footprint problem or a daily-throughput problem?

On Monday, 12 May 2014 15:30:36 UTC+5:30, Mark Walkom wrote:
> It's standard practice to use 50% of system memory for the heap. How much RAM you need depends on how long you want to keep your data around for. [...]
Re: Lot of GC in elasticsearch node.
Yes, kibana will load for whatever you ask for, *but* ES has to maintain index metadata for every index in memory. Those two coupled are pushing things too far for your heap. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 12 May 2014 20:45, Abhishek Tiwari erb...@gmail.com wrote: How much RAM you need depends on how long you want to keep your data around for. So, given you have ~200GB now on 4GB of RAM, you can probably extrapolate that out based on your needs. Isn't my problem more with 9G *daily* index, than with total of 200G(20 days x 9G) indexes? Correct me if i am wrong here but doesn't kibana ask elasticsearch for just one day/week of indices(based on the query). Will elasticsearch really care if i have 500 days of total day-wise segregated indices out there but am performing queries on *just past 7 days*? Is this a total-footprint problem or a daily throughput problem? On Monday, 12 May 2014 15:30:36 UTC+5:30, Mark Walkom wrote: It's standard practise to use 50% of system memory for the heap. How much RAM you need depends on how long you want to keep your data around for. So, given you have ~200GB now on 4GB of RAM, you can probably extrapolate that out based on your needs. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 12 May 2014 19:33, Abhishek Tiwari erb...@gmail.com wrote: add more memory i am doing 15 million docs, which total to ~9G. The average doc size is ~2KB. 1. How much memory would you suggest for my use-case? 2.Also, is it prudent for me to have half of OS memory dedicated to elasticsearch? On Monday, 12 May 2014 14:03:19 UTC+5:30, Mark Walkom wrote: You need to reduce your data size, add more memory or add another node. Basically, you've reached the limits of that node. 
Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 12 May 2014 16:38, Abhishek Tiwari erb...@gmail.com wrote: My elasticsearch node is an AWS EC2 c3.xlarge (7.5G mem). Elasticsearch starts as- 498 31810 99.6 64.6 163846656 4976944 ? Sl 06:03 26:10 /usr/bin/java *-Xms4g -Xmx4g -Xss256k* -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Delasticsearch -Des.pidfile=/var/run/elasticsearch/elasticsearch.pid -Des.path.home=/usr/share/elasticsearch -cp :/usr/share/elasticsearch/lib/elasticsearch-1.1.1.jar:/usr/share/elasticsearch/lib/*:/usr/share/elasticsearch/lib/sigar/* -Des.default.path.home=/usr/share/elasticsearch -Des.default.path.logs=/var/log/elasticsearch -Des.default.path.data=/var/lib/elasticsearch -Des.default.path.work=/tmp/elasticsearch -Des.default.path.conf=/etc/elasticsearch org.elasticsearch.bootstrap.Elasticsearch The node stopped responding (the ip:9200 status page), and so did kibana. It started working fine on a restart. I have logstash format docs wherein the index rotates daily. Stats: Daily: ~11G docs, ~15 million. Total: 195G docs, ~300 million.
The logs of the time when it stopped responding are-
[2014-05-12 03:39:08,789][INFO ][cluster.metadata ] [Hannibal King] [logstash-2014.05.12] update_mapping [medusa_ex] (dynamic)
[2014-05-12 03:40:52,293][INFO ][monitor.jvm ] [Hannibal King] [gc][old][240428][35773] duration [6.3s], collections [1]/[6.5s], total [6.3s]/[4.7h], memory [3.8gb]-[3.6gb]/[3.9gb], all_pools {[young] [150.3mb]-[1.7mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old] [3.6gb]-[3.6gb]/[3.6gb]}
[2014-05-12 03:44:11,739][INFO ][cluster.metadata ] [Hannibal King] [logstash-2014.05.12] update_mapping [medusa_ex] (dynamic)
[2014-05-12 03:45:32,191][INFO ][monitor.jvm ] [Hannibal King] [gc][old][240703][35812] duration [5.2s], collections [1]/[5.8s], total [5.2s]/[4.7h], memory [3.7gb]-[3.6gb]/[3.9gb], all_pools {[young] [197.4mb]-[9.3mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old] [3.5gb]-[3.6gb]/[3.6gb]}
[2014-05-12 04:06:01,224][INFO ][monitor.jvm ] [Hannibal King] [gc][old][241926][35985] duration [6s], collections [1]/[6.2s], total [6s]/[4.7h], memory [3.7gb]-[3.6gb]/[3.9gb], all_pools {[young] [134.7mb]-[9.9mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old] [3.6gb]-[3.5gb]/[3.6gb]}
[2014-05-12 04:08:14,473][INFO ][monitor.jvm ] [Hannibal King] [gc][old][242049][36004] duration [5.8s], collections [1]/[5.9s], total [5.8s]/[4.7h], memory [3.8gb]-[3.6gb]/[3.9gb], all_pools {[young] [165.1mb]-[2.7mb]/[266.2mb]}{[survivor] [33.2mb]-[0b]/[33.2mb]}{[old] [3.6gb]-[3.6gb]/[3.6gb]}
[2014-05-12 04:09:07,473][INFO ][monitor.jvm ] [Hannibal King] [gc][old][242096][36011] duration [6.2s], collections
Is there a quick way to set the data dir for ElasticsearchIntegrationTest ?
I want elastic to put its data under the target dir ... -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/e945762c-a26d-464d-bccb-62493373cdb6%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Matching on sibling json nodes ?
We're submitting a JSON document that looks like this: { "book": { "title": "book1", "authors": [ {"name": "auth1", "role": "role1"}, {"name": "auth2", "role": "role2"} ] } } We would like searches that find this document for auth1/role1 but *not* for auth1/role2. We have used nested queries to make this work, but unfortunately nested queries don't work with highlighting. Is there any other way to accomplish this? (We are contemplating simply adding a new field that combines name and role, but that would also have some drawbacks.) Kristian
Re: Matching on sibling json nodes ?
Hi Kristian You can use nested objects and set include_in_parent to true (it's like using type:nested and type:object on the same field), then highlight on the fields in the parent object. clint On 12 May 2014 13:42, Kristian Rosenvold kristian.rosenv...@gmail.com wrote: We're submitting a JSON document that looks like this: { "book": { "title": "book1", "authors": [ {"name": "auth1", "role": "role1"}, {"name": "auth2", "role": "role2"} ] } } We would like searches that find this document for auth1/role1 but *not* for auth1/role2. We have used nested queries to make this work, but unfortunately nested queries don't work with highlighting. Is there any other way to accomplish this? (We are contemplating simply adding a new field that combines name and role, but that would also have some drawbacks.) Kristian
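For reference, a mapping sketch for the book example above (field and type names taken from the post; everything else is an assumption about the poster's setup). include_in_parent indexes the author fields both as nested documents and flattened into the parent:

```json
{
  "mappings": {
    "book": {
      "properties": {
        "title": { "type": "string" },
        "authors": {
          "type": "nested",
          "include_in_parent": true,
          "properties": {
            "name": { "type": "string" },
            "role": { "type": "string" }
          }
        }
      }
    }
  }
}
```

A nested query on authors.name/authors.role still enforces the sibling constraint, while highlighting can target the flattened authors.name field in the parent document.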
Re: unable to write data to elasticsearch using hadoop PIG
I got the same error, but I don't know what I have to change in my /etc/hosts. Thank you for your help. On Wednesday, 5 March 2014 09:39:46 UTC, Yann Barraud wrote: Hi, Is your ES instance known by your Hadoop cluster (/etc/hosts)? It does not even seem to read it. Cheers, Yann On Wednesday, 5 March 2014 06:32:55 UTC+1, siva mannem wrote: I installed ES (at /usr/lib/elasticsearch/) on our gateway server and I am able to run some basic curl commands like XPUT and XGET to create some indices and retrieve the data in them. I am able to give a single-line JSON record but I am unable to give a JSON file as input to curl XPUT. Can anybody give me the syntax for giving a JSON file as input to the curl XPUT command? My next issue: I copied the following 4 elasticsearch-hadoop jar files elasticsearch-hadoop-1.3.0.M2.jar elasticsearch-hadoop-1.3.0.M2-sources.jar elasticsearch-hadoop-1.3.0.M2-javadoc.jar elasticsearch-hadoop-1.3.0.M2-yarn.jar to /usr/lib/elasticsearch/elasticsearch-0.90.9/lib and /usr/lib/gphd/pig/ I have the following json file j.json ++ {"k1":"v1" , "k2":"v2" , "k3":"v3"} ++ in my_hdfs_path. My pig script is write_data_to_es.pig + REGISTER /usr/lib/gphd/pig/elasticsearch-hadoop-1.3.0.M2-yarn.jar; DEFINE ESTOR org.elasticsearch.hadoop.pig.EsStorage('es.resource=usa/ca'); A = LOAD '/my_hdfs_path/j.json' using JsonLoader('k1:chararray,k2:chararray,k3:chararray'); STORE A into 'usa/ca' USING ESTOR('es.input.json=true'); + When I run my pig script with pig -x mapreduce write_data_to_es.pig I am getting the following error + Input(s): Failed to read data from /my_hdfs_path/j.json Output(s): Failed to produce result in usa/ca Counters: Total records written : 0 Total bytes written : 0 Spillable Memory Manager spill count : 0 Total bags proactively spilled: 0 Total records proactively spilled: 0 Job DAG: job_1390436301987_0089 2014-03-05 00:26:50,839 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2014-03-05 00:26:50,841 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Input(s): Failed to read data from /elastic_search/es_hadoop_test.json Output(s): Failed to produce result in mannem/siva Counters: Total records written : 0 Total bytes written : 0 Spillable Memory Manager spill count : 0 Total bags proactively spilled: 0 Total records proactively spilled: 0 Job DAG: job_1390436301987_0089 2014-03-05 00:26:50,839 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed! 2014-03-05 00:26:50,841 [main] ERROR org.apache.pig.tools.grunt.GruntParser - *ERROR 2997: Encountered IOException. Out of nodes and retries; caught exception* Details at logfile: /usr/lib/elasticsearch/elasticsearch-0.90.9/pig_1393997175206.log I am using Pivotal Hadoop version (1.0.1), which is basically Apache Hadoop (hadoop-2.0.2); the Pig version is 0.10.1 and the Elasticsearch version is 0.90.9. Can anybody help me out here? Thank you so much in advance for your help.
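On the side question about feeding a JSON file to curl instead of an inline body: curl's @file syntax reads the request body from disk. A sketch (the index/type/id and file name here are just the ones from the post; the file must be a local copy, not an HDFS path):

```
curl -XPUT 'localhost:9200/usa/ca/1' --data-binary @j.json
```

Using --data-binary rather than -d preserves newlines in the file, which matters if you later use the _bulk API.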
How to use operators in elasticsearch percolator query?
Hi, I have installed ES 1.0.1 and tried to check a percolator query with the AND operator. Added a query to the percolator: curl -XPUT 'localhost:9200/testperc/.percolator/1' -d '{"query":{"match":{"message" : "hero AND shine"}}}' When I tried to check with just the word "hero", that query matches; ideally it should not: curl -XGET 'localhost:9200/testperc/message/_percolate' -d '{"doc" : {"message" : "hero"}}' {"took":10,"_shards":{"total":5,"successful":5,"failed":0},"total":1,"matches":[{"_index":"testperc","_id":1}]} Please help me: how can I use operators in a percolator query?
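The match query does not parse AND/OR keywords - "AND" is simply analyzed as another term. One way that should work (a sketch, not tested against 1.0.1) is to spell the operator out when registering the percolator query:

```json
{
  "query": {
    "match": {
      "message": {
        "query": "hero shine",
        "operator": "and"
      }
    }
  }
}
```

Alternatively, the query_string query does understand AND/OR operator syntax in the query text itself.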
Re: Disk changes forced resync
Not doing any monitoring yet - this is my dev cluster running on 3 workstations. I thought I was quick enough that the rebalance wouldn't have marched ahead and changed much - clearly my admin skills need sharpening! Is there a way to get the cluster to avoid rebalancing when a node is removed from the cluster? I wouldn't want a cluster rebalance starting just because I'm patching the OS and need a reboot. Thanks Duncan
Script average value over hits
Hi! I want to make a script that does some statistics; as I understand it, ES doesn't do statistics the way I need it to. The search I do returns multiple hits like this: http://pastebin.com/UkQjDXhm What I want to do is an average of time after 13:37, independent of date. So what I think would do it is creating a date object, pushing parseFloat(date.getMinutes()+"."+date.getSeconds()), and then averaging it at the end and putting it into the response. I wrote it like javascript, but I assume MVEL is the fastest language, so I'd like to do it in MVEL. Is this at all possible, or would I have to do it on the client side? Regards Jo Emil
Re: Kibana Time Troubles
Any ideas here? On Tuesday, May 6, 2014 3:32:45 PM UTC-5, Tate Eskew wrote: Hello, Maybe someone can help me. My setup: AWS Servers using rsyslog (UTC time) Physical server in datacenter central syslog-ng server (CST). Logstash shipper is running on the central syslog-ng box (CST). It grabs the events coming in, mangles them, throws them into redis. Logstash indexer on another box grabs them out of redis, shoves them in elasticsearch. Everything works as expected for months now, the only problem I have is that the display in Kibana doesn't show the log events for 5 hours because of the Logstash shipper being CST (5 hours behind). Any idea on how to get it to display immediately? Logs display immediately if I send to the central log server from a server that is CST as well. Here is a sample from an AWS box (UTC) that is picked up by the central log server (CST) Is there any way to get Kibana to show the events as they come in correctly? We have lots of physical machines in our datacenters and they are all set to CST, but all of our AWS instances are set to UTC. As of right now, we don't want to change the central syslog server's timezone to UTC since it still resides in one of our data centers. Any ideas? Is this something we should try to fix at the Logstash config or is this a display fix for Kibana? 
Here is a sample from an AWS box (UTC) that is picked up by the central log server (CST) - displays 5 hours later/incorrectly:
{ "_index": "logstash-2014.05.06", "_type": "syslog", "_id": "mZvpk-_9T4WgA2zxlsxogA", "_score": null, "_source": { "@version": "1", "@timestamp": "2014-05-05T20:01:26.000-05:00", "type": "syslog", "syslog_pri": "163", "syslog_program": "ubuntu", "received_at": "2014-05-05 20:01:27 UTC", "syslog_severity_code": 3, "syslog_facility_code": 20, "syslog_facility": "local4", "syslog_severity": "error", "@source_host": "p-aws-emmaplatformsingle01", "@message": "trustinme", "@host": "p-aws-emmaplatformsingle01" }, "sort": [ 1399338086000 ] }
Here is a sample from a physical machine in one of our data centers (CST) that is picked up by the central log server (CST) - displays instantly/correctly:
{ "_index": "logstash-2014.05.06", "_type": "syslog", "_id": "SjWn9aJWRGKeshylyp1j2Q", "_score": null, "_source": { "@version": "1", "@timestamp": "2014-05-06T14:01:52.000-05:00", "type": "syslog", "syslog_pri": "13", "syslog_program": "teskew", "received_at": "2014-05-06 19:01:53 UTC", "syslog_severity_code": 5, "syslog_facility_code": 1, "syslog_facility": "user-level", "syslog_severity": "notice", "@source_host": "p-bna-apix01", "@message": "trustinme", "@host": "p-bna-apix01" }, "sort": [ 1399402912000 ] }
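One way to attack this on the Logstash side is to tell the date filter what timezone the incoming timestamp is in, so events are normalized to UTC at index time. A sketch (the field name and patterns are assumptions about your grok output, and you would need a conditional to apply it only to events from the UTC hosts):

```
filter {
  date {
    match    => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
    timezone => "Etc/UTC"
  }
}
```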
Aggregation Names
Hey guys, I noticed this constraint on agg names https://github.com/elasticsearch/elasticsearch/commit/f1248e58 It's not mentioned in the guide or under breaking changes for 1.1.0 (instead the breaking change is buried in the issue, which is an enhancement). It seems to me the most convenient name for a terms agg is the field name itself, which very often contains the now non-permitted character '.'
Phrase Search with Proximity
What is the best way for us to perform a phrase search where we are concerned with the proximity between phrases (not terms)? I have looked at Match Query and Query String Query and I see how slop allows for proximity between individual terms, but we need to be able to do this at the level of phrases, like: "Quick Brown Fox" NEAR(50) "Flying Squirrel" Thanks. -- Karen
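Span queries can express phrase-to-phrase proximity: each phrase becomes an in-order span_near with slop 0, and an outer span_near sets the allowed distance between the two phrases. A sketch (the field name "body" is an assumption, and span_term values must match the analyzed tokens, hence the lowercase):

```json
{
  "span_near": {
    "clauses": [
      {
        "span_near": {
          "clauses": [
            { "span_term": { "body": "quick" } },
            { "span_term": { "body": "brown" } },
            { "span_term": { "body": "fox" } }
          ],
          "slop": 0,
          "in_order": true
        }
      },
      {
        "span_near": {
          "clauses": [
            { "span_term": { "body": "flying" } },
            { "span_term": { "body": "squirrel" } }
          ],
          "slop": 0,
          "in_order": true
        }
      }
    ],
    "slop": 50,
    "in_order": false
  }
}
```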
Re: unable to write data to elasticsearch using hadoop PIG
Check your network settings and make sure that the Hadoop nodes can communicate with the ES nodes. If you install ES alongside Hadoop itself, this shouldn't be a problem. There are various ways to check this - try ping, tracert, etc... Please refer to your distro manual/documentation for more information about the configuration and setup. Cheers, On 5/12/14 3:42 PM, hanine haninne wrote: I got the same error, but I don't know what I have to change in my /etc/hosts. Thank you for your help. [...] -- Costin
Bulk Load Large Spatial Datasets
Hello. I'm trying to bulk load about 550k records with spatial data into Elasticsearch. After about 20 mins, an error occurs: "No handlers could be found for logger 'elasticsearch'", then the connection times out and the Python script stops. The Python loading script was working fine before adding the spatial data. Anyone have some ideas on how to load large spatial datasets?
Re: Bulk Load Large Spatial Datasets
Hi Brian, that message you are seeing is not an error - it's a warning from the python logging system that you don't have any logging configured, so when elasticsearch tries to log something it cannot. I'd suggest setting up your logging and trying again. To set up logging just include:
import logging
logging.basicConfig(level=logging.INFO)
at the top of your script. On Mon, May 12, 2014 at 6:21 PM, Brian Behling brian.behl...@gmail.com wrote: Hello. I'm trying to bulk load about 550k records with spatial data into Elasticsearch. After about 20 mins, an error occurs: "No handlers could be found for logger 'elasticsearch'", then the connection times out and the Python script stops. The Python loading script was working fine before adding the spatial data. Anyone have some ideas on how to load large spatial datasets?
Re: Bulk Load Large Spatial Datasets
Thank you. I did find the error causing this script to crash. It looks like there are many invalid (self-intersecting) polygons causing this problem. But that's a topic for another thread, if the business rules dictate we can't simplify the geometries. On Monday, May 12, 2014 10:24:24 AM UTC-6, Honza Král wrote: Hi Brian, that message you are seeing is not an error - it's a warning from the python logging system that you don't have any logging configured. So when elasticsearch tries to log something it cannot. I'd suggest to set up your logging and try again. [...]
sizing for time data flow
(apologies in advance for yet another sizing post) We are indexing approximately 2KB documents and ingesting about 50 million documents daily. The index size ends up being about 75GB per day for the primary shards (doing replication = 1, so 150GB/day). In our use case, after 1 month we throw away 95% of the data but need to keep the rest indefinitely. We are planning to use the time data flow mentioned in Shay's presentations and are currently thinking about what time period to use for each index. With a shorter period, the current month index may behave better, but we'll end up accumulating lots of smaller indices after the 1 month period. We currently have a 4 node setup, each with 12 cores, 96GB of ram and 2TB of disk space over 4 disks. By my calculations, to hold one year of data with r=1, we would need 150GB/day * 31 for the initial month, then 150GB/day * 31 * .05 for historical months = 4.65TB + 2.5TB = 7+TB for 1 year of data. This seems pretty tight to me considering additional space may be needed for merges, etc.
1. Is accumulating a lot of indexes per node a concern here? If we did a daily index with 4 shards and r=1, that would be over 700 shards per node for 1 year. I know that there is a memory limitation on the number of shards that can be managed by a node.
2. If we did a monthly index, that would be better for the historical indices, but the current month index would be huge, over 2TB.
3. Is there any difference here between doing a daily index with fewer shards vs. a monthly index with more primary shards?
4. How would having this many shards affect query performance? I assume there is some sweet spot of shards per node that must be found empirically? I would guess it's somewhat related to the number of disks/cores per node?
5. I am also wondering about the RAM to data ratio and whether we'll get decent query performance. Due to our use case, we can't use routing. Is there any rule of thumb here?
6. Another option we are considering is to do a daily index for the first month, and then have periodic jobs to combine the historical daily indexes into larger indices. So for example the first month = 31 daily indices, and following months get rolled up into 1 index per month. But we only want to do this extra work if it's needed.
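The capacity arithmetic above can be sanity-checked with a few lines (all numbers taken from the post):

```python
daily_primary_gb = 75                   # primary-shard data per day
daily_total_gb = daily_primary_gb * 2   # replication = 1 -> 150 GB/day

current_month_gb = daily_total_gb * 31  # full first month kept hot
retained = 0.05                         # 95% dropped after a month
historical_gb = current_month_gb * retained * 11  # remaining 11 months

total_year_tb = (current_month_gb + historical_gb) / 1000.0
print(total_year_tb)  # a bit over 7 TB, matching the estimate above
```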
Re: The effect of multi-fields and copy_to on storage size
Even if the three fields are compressed, isn't it still storing three compressed copies of the same thing? That is still three times more overhead than it needs to be using. It seems very wasteful of space. Ideally the space used by the database would be size_of_stored_fields_compressed + size_of_index. In my case my database will look more like (size_of_stored_fields_compressed x 3) + size_of_index. This greatly increases my storage requirements! If I enabled the type's _source field and disabled individual field storage, could I still get highlighting info in the query response for those fields? Thanks, Adrien, for your response.
Re: Filtering nested aggregates
A friend of mine made it work. It wasn't working because we were using a filter - term inside the nested aggregation with Tuberculosis, but the analyzed value was tuberculosis. Changing Tuberculosis to tuberculosis made it work. Also, repeating the first query (instead of using a filter) makes it work in the nested filter. Here's one example: curl -XPOST http://localhost:9200/test_results/_search?pretty=true; -d'{ size: 0, query: { nested: { path: data, query: { match: { data.condition: Tuberculosis } } } }, aggregations: { data: { nested: { path: data }, aggregations: { filtered_result: { filter: { query: { match: { condition: Tuberculosis } } }, aggregations : { result: { terms: { field: data.result } } } } } } } } ' On Friday, May 9, 2014 2:48:50 PM UTC-3, Ary Borenszweig wrote: Hi, I have an index where I need to store medical test results. A test result can talk about many conditions and their results: for example, Tuberculosis = positive, Flu = negative. So I modeled my index like this: curl -XPUT http://localhost:9200/test_results/; -d' { mappings: { result: { properties: { data: { type: nested, properties: { condition: {type: string}, result: {type: string} } } } } } }' I insert one test result with Tuberculosis = positive, Flu = negative: curl -XPOST http://localhost:9200/test_results/_bulk; -d' {index:{_index:test_results,_type:result}} {data: [{condition: Tuberculosis, result: positive}, {condition: FLU, result: negative}]} ' Then, one of the queries I need to do is this one: for Tuberculosis, give me how many positives you have and how many negatives you have (basically: filter by data.condition and group by data.result). 
So I tried this query: curl -XPOST http://localhost:9200/test_results/_search?pretty=true; -d'{ size: 0, query: { nested: { path: data, query: { match: { data.condition: Tuberculosis } } } }, aggregations: { data: { nested: { path: data }, aggregations: { result: { terms: { field: data.result } } } } } } ' However, the above gives me this result: aggregations : { data : { doc_count : 2, result : { buckets : [ { key : negative, doc_count : 1 }, { key : positive, doc_count : 1 } ] } } } That is, it gives me one negative result and one positive result. That's because the document has one positive and negative, and it's not discarding the one that has Flu. I see in the documentation there's a filter aggregate. I tried using it in many ways: 1. With term on data.condition: curl -XPOST http://localhost:9200/test_results/_search?pretty=true; -d'{ size: 0, query: { nested: { path: data, query: { match: { data.condition: Tuberculosis } } } }, aggregations: { data: { nested: { path: data }, aggregations: { filtered_result: { filter: { term: { data.condition : Tuberculosis } }, aggregations : { result: { terms: { field: data.result } } } } } } } } ' 2. With term on condition: curl -XPOST http://localhost:9200/test_results/_search?pretty=true; -d'{ size: 0, query: { nested: { path: data, query: { match: { data.condition: Tuberculosis } } } }, aggregations: { data: { nested: { path: data }, aggregations: { filtered_result: { filter: { term: { condition : Tuberculosis } }, aggregations : { result: { terms: { field: data.result } } } } } } } } ' 3. With nested: curl -XPOST http://localhost:9200/test_results/_search?pretty=true; -d'{ size: 0, query: { nested: { path: data, query: { match: { data.condition: Tuberculosis } } } }, aggregations: { data: { nested: { path: data }, aggregations: { filtered_result: { filter: { nested: {
Re: Multi DC cluster or separate cluster per DC?
Having a separate cluster is definitely a better way to go. Or you can control the shard and replica placement so that they are always placed in the same DC. That way you can avoid inter-DC issues while still having a single cluster. I have a similar issue and I am looking at this as one of the alternatives. On Saturday, May 10, 2014 1:05:08 AM UTC-7, Sebastian Łaskawiec wrote: Thanks for the answer! We've been talking with several other teams in our company and it looks like this is the most recommended and stable setup. Regards Sebastian On Wednesday, 7 May 2014 03:23:43 UTC+2, Mark Walkom wrote: Go the latter method and have two clusters; ES can be very sensitive to network latency and you'll likely end up with more problems than it is worth. Given you already have the data's source of truth being replicated, it's the sanest option to just read that locally. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 6 May 2014 23:51, Sebastian Łaskawiec sebastian...@gmail.com wrote: Hi! I'd like to ask for advice about deployment in a multi-DC scenario. Currently we operate on 2 data centers in active/standby mode. In the case of ES we'd like to take a different approach - we'd like to operate in active-active mode (we want to optimize our resources, especially for querying). Here are some details about the target configuration: - 4 ES instances per DC. The full cluster will have 8 instances. - Up to 1 TB of data - Data pulled from a database using the JDBC river - The database is replicated asynchronously between DCs. Each DC will have its own database instance to pull data from. - Average latency between DCs is several milliseconds - We need to operate when the passive DC is down We know that a multi-DC configuration might end with a split-brain issue.
Here is how we want to prevent it: - Set node.master: true only on the 4 nodes in the active DC - Set node.master: false in the passive DC - This way we'll be sure that a new cluster will not be created in the passive DC - Additionally we'd like to set discovery.zen.minimum_master_nodes: 3 (to avoid split brain in the active DC) Additionally there is a problem with switchover (the passive DC becomes active and the active becomes passive). In our system it takes about 20 minutes and this is the maximum length of our maintenance window. We were thinking of shutting down the whole ES cluster and switching the node.master setting in the configuration files (as far as I know this setting cannot be changed via the REST API). Then we'd need to start the whole cluster. So my question is: is it better to have one big ES cluster operating on both DCs, or should we change our approach and create 2 separate clusters (and rely on database replication)? I'd be grateful for advice. Regards Sebastian
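For the "keep shards DC-local" alternative mentioned above, shard allocation filtering is the usual mechanism in 1.x. A hypothetical sketch; the attribute name `zone` and the values `dc1`/`dc2` are made up for illustration:

```yaml
# elasticsearch.yml on each node: tag the node with its data center
node.zone: dc1    # use dc2 on nodes in the other site

# Then pin a given index's shards (primaries and replicas) to one DC,
# e.g. via the index settings API:
#   curl -XPUT 'localhost:9200/myindex/_settings' -d '
#     {"index.routing.allocation.include.zone": "dc1"}'
```

Note this keeps each index entirely in one DC; it does not replicate data across DCs, so the database-level replication described in the thread still does that job.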
Re: Query string operators seem to not be working correctly
Erich, A colleague pointed out to me a much more complete explanation than I could ever do: http://searchhub.org//2011/12/28/why-not-and-or-and-not/ But the short of it is, it is working as expected; you just need to map it back to Lucene Boolean logic to fully understand why/how it works.
Re: Corruption error after upgrade to 1.0
Did you ever get this resolved and if so, how was it resolved? I am experiencing the same issue... On Monday, February 17, 2014 4:25:00 PM UTC-5, Mo wrote: After upgrading to 1.0 I am unable to index any documents. I get the following error. Could somebody help?

[Aardwolf] Message not fully read (response) for [0] handler future(org.elasticsearch.client.transport.TransportClientNodesService$SimpleNodeSampler$1@5c6e3b4c), error [true], resetting
[Aardwolf] failed to get node info for [#transport#-1][inet[/10.80.140.59:9300]], disconnecting...
org.elasticsearch.transport.RemoteTransportException: Failed to deserialize exception response from stream
Caused by: org.elasticsearch.transport.TransportSerializationException: Failed to deserialize exception response from stream
    at org.elasticsearch.transport.netty.MessageChannelHandler.handlerResponseError(MessageChannelHandler.java:168)
    at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:122)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.io.StreamCorruptedException: unexpected end of block data
    at java.io.ObjectInputStream.readObject0(Unknown Source)
    at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
    at java.io.ObjectInputStream.defaultReadObject(Unknown Source)
    at java.lang.Throwable.readObject(Throwable.java:913)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at java.io.ObjectStreamClass.invokeReadObject(Unknown Source)
    at java.io.ObjectInputStream.readSerialData(Unknown Source)
    at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
    at java.io.ObjectInputStream.readObject0(Unknown Source)
    at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
Re: Mapping created using Template does not work
Hi Alexander, Yes, it works when I remove the template setting. On Friday, May 9, 2014 12:26:49 PM UTC-7, Alexander Reelsen wrote: Hey, can you just take some sample data and index it into elasticsearch manually and see if that works? --Alex On Thu, May 1, 2014 at 1:53 AM, Deepak Jha dkjh...@gmail.com wrote: Hi, I have set up the ELK stack and I am going by the default index name, which is logstash-.MM.DD. Since this is the only index format I have, I decided to create a template file, so that whenever a new index gets created I can set up the mapping. I am not able to push data to elasticsearch if my index mapping gets created from the template. May I know where I am wrong? Here is my mapping file content:

{
  "X_Server": {
    "properties": {
      "@timestamp": { "type": "date", "format": "dateOptionalTime" },
      "@version": { "type": "string" },
      "class": { "type": "string" },
      "file": { "type": "string" },
      "message": { "type": "string" },
      "host": { "type": "string", "index": "not_analyzed" }
    }
  }
}

My template file content is:

{
  "template": "logstash-*",
  "settings": {
    "index.number_of_shards": 3,
    "index.number_of_replicas": 1,
    "index.query.default_field": "@message",
    "index.routing.allocation.total_shards_per_node": 2,
    "index.auto_expand_replicas": false
  },
  "mappings": {
    "X_Server": {
      "_all": { "enabled": false },
      "_source": { "compress": false },
      "properties": {
        "class": { "type": "string", },
        "host": { "type": "string", "index": "not_analyzed" },
        "file": { "type": "string" },
        "message": { "type": "string" }
      }
    }
  }
}
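Two things stand out in the template as posted, though I can't confirm either is the actual culprit: the trailing comma in the "class" property ("type" : "string", }) makes the JSON invalid for strict parsers, and index.query.default_field points at @message while the mappings only define message. A cleaned-up sketch of the same template:

```json
{
  "template": "logstash-*",
  "settings": {
    "index.number_of_shards": 3,
    "index.number_of_replicas": 1,
    "index.query.default_field": "message",
    "index.routing.allocation.total_shards_per_node": 2,
    "index.auto_expand_replicas": false
  },
  "mappings": {
    "X_Server": {
      "_all": { "enabled": false },
      "_source": { "compress": false },
      "properties": {
        "class": { "type": "string" },
        "host": { "type": "string", "index": "not_analyzed" },
        "file": { "type": "string" },
        "message": { "type": "string" }
      }
    }
  }
}
```

Checking the node logs at index-creation time should show a parse error if the trailing comma is the problem.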
Re: Query string operators seem to not be working correctly
Thanks Binh! To summarize for everyone else:

1) Queries are parsed left to right.
2) NOT sets the Occurs flag of the clause to its right to MUST_NOT.
3) AND will change the Occurs flag of the clause to its left to MUST unless it has already been set to MUST_NOT.
4) AND sets the Occurs flag of the clause to its right to MUST.
5) If the default operator of the query parser has been set to "And": OR will change the Occurs flag of the clause to its left to SHOULD unless it has already been set to MUST_NOT.
6) OR sets the Occurs flag of the clause to its right to SHOULD.

Practically speaking this means that NOT takes precedence over AND, which takes precedence over OR, but only if the default operator for the query parser has not been changed from the default ("Or"). If the default operator is set to "And" then the behavior is just plain weird. Erich

On Monday, May 12, 2014 12:37:24 PM UTC-7, Binh Ly wrote: Erich, A colleague pointed out to me a much more complete explanation than I could ever do: http://searchhub.org//2011/12/28/why-not-and-or-and-not/ But the short of it is, it is working as expected; you just need to map it back to Lucene Boolean logic to fully understand why/how it works.
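Erich's six rules can be written down as a tiny left-to-right simulator. This is an illustrative sketch of the rules exactly as stated above, not Lucene's actual parser:

```python
def occurs_flags(tokens, default_op="OR"):
    """Assign Lucene Occurs flags to terms, scanning left to right.

    tokens: a flat list like ["a", "AND", "b", "OR", "c"].
    Returns {term: flag}. Default operator "OR" gives unmarked terms SHOULD.
    """
    default_flag = "SHOULD" if default_op == "OR" else "MUST"
    clauses = []    # [term, flag] pairs, in order
    pending = None  # flag forced onto the next term by an operator
    for tok in tokens:
        if tok == "NOT":
            pending = "MUST_NOT"                              # rule 2
        elif tok == "AND":
            if clauses and clauses[-1][1] != "MUST_NOT":
                clauses[-1][1] = "MUST"                       # rule 3
            if pending != "MUST_NOT":
                pending = "MUST"                              # rule 4
        elif tok == "OR":
            if default_op == "AND" and clauses and clauses[-1][1] != "MUST_NOT":
                clauses[-1][1] = "SHOULD"                     # rule 5
            if pending != "MUST_NOT":
                pending = "SHOULD"                            # rule 6
        else:
            clauses.append([tok, pending or default_flag])    # rule 1
            pending = None
    return {term: flag for term, flag in clauses}

print(occurs_flags(["a", "AND", "b", "OR", "NOT", "c"]))
```

Playing with inputs like ["a", "OR", "NOT", "b"] makes the "NOT beats AND beats OR" behavior concrete.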
Re: The effect of multi-fields and copy_to on storage size
Hi Jeremy, On Mon, May 12, 2014 at 7:43 PM, Jeremy McLain gongcheng...@gmail.com wrote: Even if the three fields are compressed, isn't it still storing three compressed copies of the same thing? That is still three times more overhead than it needs to be. It seems very wasteful of space. Ideally the space used by the database would be size_of_stored_fields_compressed + size_of_index. In my case my database will look more like (size_of_stored_fields_compressed x 3) + size_of_index. This greatly increases my storage requirements! It is not storing 3 compressed copies of the same thing, but storing these 3 things (as a whole) compressed. The difference is important because it means that the 2nd and 3rd copies are effectively stored as references to the first field value. I would recommend building two indices, one with the copy_to fields and one without, to see what the difference is in practice. If I enabled the type's _source field and disabled individual field storage could I still get highlighting info in the query response for those fields? Yes, although I would recommend keeping _source if possible. It makes lots of things easier; for example, you can reindex from elasticsearch itself. -- Adrien Grand
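Adrien's point, that three copies compressed as a whole cost far less than three separately compressed copies because the repeats collapse into back-references, can be illustrated with any general-purpose compressor. A toy sketch using zlib (which is not what Lucene actually uses), purely to show the effect:

```python
import zlib

# A stand-in for one stored field value
doc = b"The quick brown fox jumps over the lazy dog. " * 20

one_copy = len(zlib.compress(doc))
three_separately = 3 * one_copy                 # the naive "x3" estimate
three_as_a_whole = len(zlib.compress(doc * 3))  # repeats become references

# The combined stream grows far less than 3x
print(one_copy, three_separately, three_as_a_whole)
```

The same intuition applies to copy_to/multi-fields in the stored-fields file, though the real ratio depends on the codec; hence Adrien's advice to measure with two test indices.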
What is the difference between common terms query vs match query with cutoff_frequency set
I was reading up on the match query and noticed that it has a cutoff_frequency parameter, which seems to do pretty much what the common terms query does.

1. What is the difference between the common and match queries?
2. When would I want to use common terms over match?
3. Ultimately, would the direction be to have the common terms query roll up into the match query (with any differences added to match)?
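For side-by-side comparison, the two forms look like this (the body field and the 0.01 threshold are made-up examples). One visible difference is that common exposes extra knobs such as low_freq_operator and minimum_should_match for the two term groups, which plain match with cutoff_frequency does not:

```json
{ "query": { "match": {
    "body": { "query": "how to cook rice", "cutoff_frequency": 0.01 } } } }

{ "query": { "common": {
    "body": { "query": "how to cook rice", "cutoff_frequency": 0.01,
              "low_freq_operator": "and" } } } }
```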
Marvel Indices taking lot of space ? Can we specify automatic delete of marvel indice ?
Is there a way to set Marvel to delete the marvel indices after 7 days? It looks like Marvel is generating around 2 GB of data every day. Our disk filled up twice because of Marvel data. Is there a way to reduce the amount of data generated by Marvel? Also, is there any plan to add alerting to Marvel? For example, if the Marvel status goes red it would be good for a specified user to get an email. I see the Marvel status as red, but it doesn't show what is causing the red status. It would be good to get alerts with the details when the cluster status goes red.
Re: Marvel Indices taking lot of space ? Can we specify automatic delete of marvel indice ?
I do not use Marvel, but another monitoring system built on top of Elasticsearch. I use Elasticsearch Curator to delete old indices: https://github.com/elasticsearch/curator I have a cron entry to run the curator once per day. Perhaps something already exists in Marvel; not sure, since I am not a user. Cheers, Ivan On Mon, May 12, 2014 at 7:39 AM, deepakas deepak.subhraman...@gmail.com wrote: Is there a way to set marvel to delete the marvel indices after 7 days. It looks like Marvel is generating around 2 GB of data everyday. [...] It will be good to get some alerts with the details when the cluster status goes red.
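A hypothetical crontab entry in the spirit of Ivan's setup. Curator's command-line flags changed between releases, so treat the exact options below as an assumption and check curator --help for your installed version:

```shell
# Run once a day at 01:00; delete .marvel-* indices older than 7 days.
# (--prefix/--older-than syntax is from early Curator releases; verify locally.)
0 1 * * * curator --host localhost delete --older-than 7 --prefix '.marvel-'
```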
Can't access S3 from Elastic Search
I have a cluster with 2 nodes which works fine. I am running the latest Elasticsearch version, and I would like to use the snapshot/restore API, but I'm having a hard time getting it to work. Note, it works fine with the fs type; it's AWS I am having a hard time with. In my 2 instances, I have this in the config.yml files:

cloud:
  aws:
    access_key: XX
    secret_key: YYY
discovery:
  type: ec2

I have created an S3 bucket called my-bucket. Inbound rules allow most ports: 22, 80, 9200, 9300. I tried to register the bucket with Elasticsearch:

PUT /_snapshot/es_repository
{
  "type": "s3",
  "settings": { "bucket": "my-bucket" }
}

I get this error:

"error": "RemoteTransportException[[node 2][inet[/10.240.78.87:9300]][cluster/repository/put]]; nested: RepositoryException[[elasticsrch] failed to create repository]; nested: CreationException[Guice creation errors:\n\n1) Error injecting constructor, com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain\n at org.elasticsearch.repositories.s3.S3Repository.init()\n at org.elasticsearch.repositories.s3.S3Repository\n at Key[type=org.elasticsearch.repositories.Repository, annotation=[none]]\n\n1 error]; nested: AmazonClientException[Unable to load AWS credentials from any provider in the chain]; ", "status": 500

Any ideas?
Re: Unable to Send Stats to Monitoring Cluster
Hi Mario, We just released Marvel 1.1.1 with a bug fix which I think will solve this - http://www.elasticsearch.org/guide/en/marvel/current/#_1_1_1 Can you check and see if it helps? (You can remove the 30s setting.) Cheers, Boaz On Sunday, April 27, 2014 10:12:31 PM UTC+2, Boaz Leskes wrote: Hi Mario, Gists look good to me. One other thing I thought about - do you get this error all the time or does it appear every once in a while? Said differently, do you have data in your monitoring cluster? If so, you can try increasing the timeout (defaults to 6s): marvel.agent.exporter.es.timeout: 30s This can be done through the yml file or via the Cluster Update Settings API: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-update-settings.html#cluster-update-settings Cheers, Boaz On Fri, Apr 25, 2014 at 11:15 PM, Mario Rodriguez star...@gmail.com wrote: https://gist.github.com/anonymous/11303362#file-gistfile1-txt - Prod Server #1 https://gist.github.com/anonymous/11303451#file-gistfile1-txt - Monitoring Server
Re: Can't access S3 from Elastic Search
Maybe it's related to this? https://github.com/elasticsearch/elasticsearch-cloud-aws#recommended-s3-permissions -- David ;-) Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs On 12 May 2014, at 22:54, IronMan2014 sabdall...@hotmail.com wrote: I have a cluster with 2 nodes which works fine. I am running the latest Elastic search version, and I would like to use snapshot/restore API, but having hard time getting to work. [...] Any ideas?
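Besides the IAM permissions David links to, "Unable to load AWS credentials from any provider in the chain" can also mean the yml keys simply weren't picked up (for example, the cloud-aws plugin missing on one node, or a node not restarted after the config change). I believe the cloud-aws plugin also accepts credentials directly in the repository settings; a sketch with placeholder keys:

```json
PUT /_snapshot/es_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my-bucket",
    "access_key": "XX",
    "secret_key": "YYY"
  }
}
```

If this form works while the yml form does not, the problem is configuration pickup rather than the keys themselves.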
Re: Odd behavior with AND condition
This appears to be caused by the snowball analyzer which is used on the tags field. To reproduce the odd behavior:

curl -XDELETE 'http://localhost:9200/haystack'
curl -XPOST 'http://localhost:9200/haystack/' -d '{
  "settings": { "index": {} }
}'
curl -XPOST 'http://localhost:9200/haystack/modelresult/_mapping' -d '{
  "modelresult": {
    "_boost": { "name": "boost", "null_value": 1.0 },
    "properties": {
      "assigned_to": { "type": "string", "term_vector": "with_positions_offsets", "analyzer": "snowball" },
      "clipped_from": { "type": "long", "index": "analyzed" },
      "created_by": { "type": "long", "index": "analyzed" },
      "django_ct": { "type": "string" },
      "django_id": { "type": "string" },
      "id": { "type": "string" },
      "org": { "type": "long", "index": "analyzed" },
      "tags": { "type": "string", "store": true, "term_vector": "with_positions_offsets", "analyzer": "snowball" },
      "text": { "type": "string", "store": true, "term_vector": "with_positions_offsets", "analyzer": "snowball" },
      "type": { "type": "long", "index": "analyzed" }
    }
  }
}'
curl -XPOST 'http://localhost:9200/haystack/modelresult/' -d '{
  "assigned_to": [],
  "created_by": 1,
  "django_ct": "preparations.preparation",
  "django_id": "37",
  "id": "preparations.preparation.37",
  "org": 1,
  "tags": [ "foo" ],
  "text": "Wildlife.wmv\n:)\n",
  "type": 2
}'

echo "Shows no results (good)"
curl 'http://127.0.0.1:9200/haystack/_search?q=(tags%3A(%22a%22))&pretty'
echo "Should show no results, but finds a match"
curl 'http://127.0.0.1:9200/haystack/_search?q=(org%3A(%221%22)%20AND%20tags%3A(%22a%22))&pretty'

Switching the tags field to the standard analyzer fixes the problem.

On Friday, May 9, 2014 3:31:08 PM UTC-7, md...@pdx.edu wrote: When I run the query (tags:("a")) in elasticsearch, I get 0 results. My query URL looks like: http://127.0.0.1:9200/haystack/_search?q=(tags%3A(%22a%22)) That is to be expected, since no objects have a tag set to "a". Now when I change the condition and add an AND, (org:("1") AND tags:("a")), *I get 3 results back*!
The query URL looks like: http://127.0.0.1:9200/haystack/_search?q=(org%3A(%221%22)%20AND%20tags%3A(%22a%22)) Getting *more* results back does not make any sense to me. I would expect that kind of behavior with the OR operator, but AND? What is going on? (This is a cross-post from stackoverflow: http://stackoverflow.com/questions/23568699/odd-behavior-with-and-condition-in-elasticsearch )
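The mechanism behind the "more results with AND" surprise can be sketched in a few lines: the snowball analyzer applies an English stopword list, so tags:("a") analyzes away to nothing and its clause silently drops out of the parsed query, leaving only org:1. This is an illustrative simulation, not Lucene's code, and the stopword list here is only an excerpt:

```python
# Excerpt of an English stopword list, as applied by the snowball analyzer
ENGLISH_STOPWORDS = {"a", "an", "the", "and", "or", "not", "to", "of"}

def analyze(text):
    """Crude stand-in for analysis: lowercase, split, drop stopwords."""
    return [t for t in text.lower().split() if t not in ENGLISH_STOPWORDS]

def parse(clauses):
    """Keep only clauses that still have tokens after analysis."""
    return [(field, analyze(value)) for field, value in clauses
            if analyze(value)]

# tags:("a") alone -> no surviving clauses -> no results
print(parse([("tags", "a")]))
# org:("1") AND tags:("a") -> the tags clause vanishes, leaving org:1,
# which is why the AND query matches *more* documents
print(parse([("org", "1"), ("tags", "a")]))
```

This also explains why switching tags to the standard analyzer fixed it: ES's standard analyzer, I believe, ships with an empty stopword list by default, so the "a" token survives and the clause stays in the query.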
Re: unable to write data to elasticsearch using hadoop PIG
Thank you so much for your quick reply. Here is what I had done:

1 - installed hadoop-1.2.1 (pig-0.12.0 / hive-0.11.0 / ...)
2 - downloaded Elasticsearch-1.0.1 and put it in the same file as hadoop
3 - copied the following 4 elasticsearch-hadoop jar files: elasticsearch-hadoop-1.3.0.M2.jar, elasticsearch-hadoop-1.3.0.M2-sources.jar, elasticsearch-hadoop-1.3.0.M2-javadoc.jar, elasticsearch-hadoop-1.3.0.M2-yarn.jar to /pig and hadoop/lib
4 - added them to the PIG_CLASSPATH

Knowing that when I take data from my Desktop and put it in elasticsearch using a pig script it works very well, but when I try to get data from my HDFS it gives me this:

2014-05-12 23:16:31,765 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR: java.io.IOException: Out of nodes and retries; caught exception
2014-05-12 23:16:31,765 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2014-05-12 23:16:31,766 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion: 1.2.1  PigVersion: 0.12.0  UserId: hduser  StartedAt: 2014-05-12 23:15:34  FinishedAt: 2014-05-12 23:16:31  Features: GROUP_BY
Failed! Failed Jobs:
JobId: job_201405122310_0001  Alias: weblog_count,weblog_group,weblogs  Feature: GROUP_BY,COMBINER  Message: Job failed! Error - # of failed Reduce Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201405122310_0001_r_00  Outputs: weblogs1/logs2
Input(s): Failed to read data from /user/weblogs
Output(s): Failed to produce result in weblogs1/logs2
Counters: Total records written : 0  Total bytes written : 0  Spillable Memory Manager spill count : 0  Total bags proactively spilled: 0  Total records proactively spilled: 0
Job DAG: job_201405122310_0001
2014-05-12 23:16:31,766 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
And here is the script:

weblogs = LOAD '/user/weblogs' USING PigStorage('\t') AS (
    client_ip : chararray, full_request_date : chararray, day : int,
    month : chararray, month_num : int, year : int, hour : int,
    minute : int, second : int, timezone : chararray, http_verb : chararray,
    uri : chararray, http_status_code : chararray, bytes_returned : chararray,
    referrer : chararray, user_agent : chararray);
weblog_group = GROUP weblogs by (client_ip, year, month_num);
weblog_count = FOREACH weblog_group GENERATE group.client_ip, group.year, group.month_num, COUNT_STAR(weblogs) as pageviews;
STORE weblog_count INTO 'weblogs1/logs2' USING org.elasticsearch.hadoop.pig.EsStorage();

On Monday, May 12, 2014 16:28:20 UTC+1, Costin Leau wrote: Check your network settings and make sure that the Hadoop nodes can communicate with the ES nodes. If you install ES beside Hadoop itself, this shouldn't be a problem. There are various ways to check this - try ping, tracert, etc. Please refer to your distro manual/documentation for more information about the configuration and setup. Cheers, On 5/12/14 3:42 PM, hanine haninne wrote: I got the same error but I don't know what I have to change in my /etc/hosts. Thank you for your help. On Wednesday, March 5, 2014 09:39:46 UTC, Yann Barraud wrote: Hi, Is your ES instance known by your Hadoop cluster (/etc/hosts)? It does not even seem to read from it. Cheers, Yann On Wednesday, March 5, 2014 06:32:55 UTC+1, siva mannem wrote: I installed ES (at the location /usr/lib/elasticsearch/) on our gateway server and I am able to run some basic curl commands like XPUT and XGET to create some indices and retrieve the data in them. I am able to give a single-line JSON record, but I am unable to give a JSON file as input to curl XPUT. Can anybody give me the syntax for giving a JSON file as input to the curl XPUT command?
My next issue: I copied the following 4 elasticsearch-hadoop jar files
elasticsearch-hadoop-1.3.0.M2.jar
elasticsearch-hadoop-1.3.0.M2-sources.jar
elasticsearch-hadoop-1.3.0.M2-javadoc.jar
elasticsearch-hadoop-1.3.0.M2-yarn.jar
to /usr/lib/elasticsearch/elasticsearch-0.90.9/lib and /usr/lib/gphd/pig/.

I have the following JSON file j.json in my_hdfs_path:

{k1:v1 , k2:v2 , k3:v3}

My pig script is write_data_to_es.pig:

REGISTER /usr/lib/gphd/pig/elasticsearch-hadoop-1.3.0.M2-yarn.jar;
DEFINE ESTOR org.elasticsearch.hadoop.pig.EsStorage('es.resource=usa/ca');
A = LOAD '/my_hdfs_path/j.json' USING JsonLoader('k1:chararray,k2:chararray,k3:chararray');
STORE A INTO 'usa/ca' USING ESTOR('es.input.json=true');

When I run my pig script ...
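Regarding the earlier question about passing a JSON file to curl XPUT: curl reads a request body from a file when the filename is prefixed with @. A minimal sketch (the host, index/type, and document id are assumptions; j.json mirrors the example file from the post):

```shell
# Create a sample JSON document (field names taken from the post).
printf '{"k1":"v1","k2":"v2","k3":"v3"}' > j.json

# The @ prefix tells curl to read the request body from the file; without it,
# curl would send the literal string "j.json" as the body. --data-binary
# preserves the file contents exactly (no stripping of newlines).
curl -s -XPUT 'http://localhost:9200/usa/ca/1' --data-binary @j.json \
  || echo 'could not reach Elasticsearch on localhost:9200'
```

If the document id is omitted (and POST is used instead of PUT), Elasticsearch generates an id automatically.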
Aggregate children data
I am looking for help/ideas/examples. Thank you in advance for your help. I have 3 types of docs loaded into Elasticsearch: parent, child1, child2 (child1 and child2 have their parent type set to parent in the mapping). Parent documents have, among other fields, factor1:[some number] and factor2:[some number]. There are about 20,000 parent documents, each with multiple children of both types, with a total of about 250,000 children of each type across all parents (500,000 total). Each child document has a field amount:[some number]. I need to find a value for each parent that is the sum of all child1 amounts multiplied by the parent's factor1, divided by the sum of all child2 amounts multiplied by factor2. I need to be able to get it for each parent and to be able to search by it: find all parents that have a value between 0.4 and 0.8, for example...
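One common workaround for a derived parent/child ratio like this is to compute the value at index time and store it as a regular numeric field on the parent, so it can be range-filtered directly. A minimal sketch of the arithmetic only (field names are from the post; the sample numbers are made up, and fetching the child documents is left out):

```python
# Sketch: per-parent value = (factor1 * sum of child1 amounts)
#                          / (factor2 * sum of child2 amounts)
# Computed before indexing so a plain range filter (e.g. 0.4 to 0.8) works.

def parent_value(factor1, factor2, child1_amounts, child2_amounts):
    """Return the derived value for one parent, or None if undefined."""
    denominator = factor2 * sum(child2_amounts)
    if denominator == 0:
        return None  # ratio undefined; decide how such parents should be indexed
    return (factor1 * sum(child1_amounts)) / denominator

value = parent_value(factor1=2.0, factor2=5.0,
                     child1_amounts=[10, 20], child2_amounts=[30, 10])
print(value)  # (2.0 * 30) / (5.0 * 40) = 0.3
```

The trade-off is that the stored value must be recomputed whenever a child document changes, but searching becomes a cheap numeric range filter instead of a cross-document calculation at query time.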
Re: Query question
Thank you very much! -- Eateral

On Thursday, May 8, 2014 at 12:10:42 AM UTC+8, Ivan Brusic wrote:
Your two clauses, mode and schedule, are joined via an AND, so those two clauses should be part of the must section. The schedule clause is then an OR between two clauses, so it should be a nested bool filter using should. Hopefully that made sense. :) Since you are using term queries on what are hopefully non-analyzed fields (numeric fields are always non-analyzed), I would use a match_all query with filters since it should be more efficient. The query should look something like:

{
  query: {
    filtered: {
      query: { match_all: {} },
      filter: {
        bool: {
          must: [
            { term: { mode: 1 } },
            { bool: {
                should: [
                  { term: { schedule: 1 } },
                  { term: { schedule: 3 } }
                ]
            } }
          ]
        }
      }
    }
  }
}

-- Ivan

On Mon, May 5, 2014 at 3:36 AM, 曾岩 <eate...@gmail.com> wrote:
Hi, I'm new to Elasticsearch and am trying to integrate it into our project, but I have hit a problem. Our data source has two fields, mode and schedule, which are both integers. Through the UI, it should be possible to query records based on these two fields, like:

SELECT * FROM doc WHERE mode = 1 AND (schedule = 1 OR schedule = 3)

I tried the query JSONs below but neither returns the expected results. Can anyone help? Thank you!

{
  query: {
    bool: {
      must: [
        { match: { mode: 1 } }
      ],
      should: [
        { match: { schedule: 1 } },
        { match: { schedule: 3 } }
      ]
    }
  }
}

---

{
  query: {
    filtered: {
      query: { match_all: {} },
      filter: { and: [ { term: { mode: 1 } } ] },
      filter: { and: [ { term: { schedule: 1 } }, { term: { schedule: 3 } } ] }
    }
  }
}
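For readers skimming the thread, the nested bool filter in Ivan's answer encodes mode == 1 AND (schedule == 1 OR schedule == 3). A plain-Python predicate mirroring that logic (the documents here are hypothetical examples, not from the thread):

```python
# Same boolean structure as the bool filter: an outer AND (must) whose
# second operand is an inner OR (should) over the two schedule values.
def matches(doc):
    return doc.get("mode") == 1 and doc.get("schedule") in (1, 3)

docs = [
    {"mode": 1, "schedule": 1},  # matches: mode ok, schedule in (1, 3)
    {"mode": 1, "schedule": 2},  # fails: schedule is neither 1 nor 3
    {"mode": 2, "schedule": 3},  # fails: mode != 1
]
print([matches(d) for d in docs])  # [True, False, False]
```

This also shows why the original first attempt failed: putting the schedule clauses in a top-level should alongside a must makes them optional scoring hints rather than a required OR.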
geohash_precision Units?
What are the units of geohash_precision when no unit is explicitly specified? The docs (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-geo-point-type.html) currently state:

    If you enable the geohash option, a geohash "sub-field" will be indexed as, eg pin.geohash. The length of the geohash is controlled by the geohash_precision parameter, which can either be set to an absolute length (eg 12, the default) or to a distance (eg 1km).

When you set geohash_precision to an absolute length, what units are you setting it to? Meters? This seems like such a simple question that I bet I'm missing something. Thanks,
Elasticsearch on ZFS best practice
Hello, I'm running an Elasticsearch node on a FreeBSD server, on top of ZFS storage. For now I've assumed that ES is smart and manages its own cache, so I've disabled the primary cache for data, leaving only metadata cacheable. The last thing I want is to have data cached twice: once in the ZFS ARC and a second time in the application's own cache. I've also disabled compression:

$ zfs get compression,primarycache,recordsize zdata/elasticsearch
NAME                 PROPERTY      VALUE     SOURCE
zdata/elasticsearch  compression   off       local
zdata/elasticsearch  primarycache  metadata  local
zdata/elasticsearch  recordsize    128K      default

It's a general-purpose server (web, mysql, mail, ELK, etc.). I'm not looking for the absolute best ES performance; I'm looking for the best use of my resources. I have 16 GB RAM, and I plan to put a limit on the ARC size (currently consuming 8.2 GB RAM) so I can mlockall ES memory. But I don't think I'll go the RAM-only storage route (http://jprante.github.io/applications/2012/07/26/Mmap-with-Lucene.html) as I'm running only one node.

How can I estimate the amount of memory I must allocate to the ES process? Should I switch primarycache=all back on despite ES already caching data? What is the best ZFS record/block size to accommodate Elasticsearch/Lucene IOs?

Thanks, Patrick
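As a rough starting point (the sizing here is an assumption for this 16 GB box, not official guidance): after capping the ARC, give the ES heap a few GB of what remains and lock it in memory so it can never be swapped. In ES 1.x that is the bootstrap.mlockall setting plus the ES_HEAP_SIZE environment variable read by the startup script:

```
# /etc/elasticsearch/elasticsearch.yml
bootstrap.mlockall: true        # requires the process to be allowed to mlock

# Environment for the init/rc script (hypothetical 4g sizing, leaving room
# for the ARC cap and the other services on the box):
ES_HEAP_SIZE=4g
```

If mlockall fails at startup, ES 1.x logs a warning ("Unable to lock JVM memory"); on FreeBSD, check the process's memory-locking limits if that happens.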