Updating a single field in large documents
What strategy would you recommend when you need to frequently update a single field in a large document? What can you do to improve update performance in a case like that? Thanks -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/4f8f9269-fd44-45ba-bbb7-eece817afefd%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
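For what it's worth, the Update API applies a partial change server-side, so only the changed field needs to be sent over the wire (the document is still reindexed internally). A sketch of the request body for POST /index/type/id/_update; the field name is a made-up example:

```python
import json

# Sketch: body for a partial update via the Update API.
# The field name "view_count" is a hypothetical example.
def partial_update_body(field, value):
    """Build an Update API body that touches only one field."""
    return {"doc": {field: value}}

body = partial_update_body("view_count", 42)
print(json.dumps(body))
```

Note that this saves network traffic and a client-side get/modify/put round-trip, but not the server-side reindex cost, which is inherent to how Lucene stores documents.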
update mapping
Is the way to update the mapping of a large index as follows: create an empty index with the new mapping, copy the old data into the new index, then alias the new index to the previous name? If so, what are the recommended tools? Ideally there would be a user interface for IT people to use. Thanks
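The final alias step of the procedure above can be done atomically with the _aliases endpoint. A sketch of the request body for POST /_aliases; the index and alias names are hypothetical:

```python
import json

# Sketch: _aliases body that atomically moves an alias from the old index to
# the new one. Names ("logs", "logs_v1", "logs_v2") are made-up examples.
def alias_swap_body(alias, old_index, new_index):
    return {"actions": [
        {"remove": {"index": old_index, "alias": alias}},
        {"add":    {"index": new_index, "alias": alias}},
    ]}

print(json.dumps(alias_swap_body("logs", "logs_v1", "logs_v2")))
```

Because both actions are applied in one request, clients querying through the alias never see a moment with no index (or both indices) behind it.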
Re: Cluster discovery on Amazon EC2 problem - need urgent help
I am pretty sure you can open the ports for the security group the ELB belongs to, regardless of the AZ (AZ, not region), unless you are using network ACLs. Anyway, not really ES... PM me if you want to continue the AWS discussion :-) On 16/10/2014 3:37 pm, Zoran Jeremic zoran.jere...@gmail.com wrote: For zone availability, I had to go with everything in one zone. The main reason was the problem of connecting ELB-controlled application instances with backend instances (MySQL, MongoDB and Elasticsearch). It's not possible to add a rule to the backend instances using port + ELB security group if the instances are in different zones, so I had to keep everything in one zone.
Re: ElasticSearch- IndexReaders cannot exceed 2147483647
Dear all, Thanks for your replies. The conclusion is that we cannot store more than 2,147,483,647 records per shard as of now. The only option is to increase the shard count. Thanks, Prasath Rajan. On Tuesday, October 14, 2014 9:34:33 PM UTC+5:30, Jörg Prante wrote: You cannot store more than 2G docs per shard in Lucene 4.x codecs. This is a documented Lucene limit: "Similarly, Lucene uses a Java int to refer to document numbers, and the index file format uses an Int32 on-disk to store document numbers. This is a limitation of both the index file format and the current implementation. Eventually these should be replaced with either UInt64 values, or better yet, VInt values which have no limit." https://lucene.apache.org/core/4_9_1/core/org/apache/lucene/codecs/lucene49/package-summary.html#Limitations Jörg. On Tue, Oct 14, 2014 at 5:37 PM, Prasanth R prasanth...@gmail.com wrote: Thanks for the reply. My scenario here is: 1) no nested docs; 2) I don't have any limit per shard. I didn't know about the internal limit of ES. On Oct 14, 2014 8:23 PM, Alexandre Rafalovitch araf...@gmail.com wrote: On 14 October 2014 10:33, Prasanth R wrote: "There is no upper limit..." Well, then you must have an infinitely scalable architecture and a decision for when the content starts getting sharded. So the question is what your individual shard is allowed to grow to, that is, how many documents (including nested ones) you are expecting to have in a single shard. Because Elasticsearch has an internal limit, and you just hit it. So the question is whether it is intentional, unintentional, or the result of a bug. Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
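Since 2,147,483,647 is Integer.MAX_VALUE (2^31 - 1), and the shard count is fixed once an index is created in ES 1.x, increasing it means creating a new index and reindexing into it. A minimal sketch of the settings body (the counts are arbitrary examples):

```python
import json

# Sketch: settings body for the replacement index (sent with PUT /new_index).
# Shard and replica counts here are arbitrary examples; in ES 1.x the shard
# count cannot be changed later, so it must be set up front.
def create_index_settings(shards, replicas=1):
    return {"settings": {"number_of_shards": shards,
                         "number_of_replicas": replicas}}

print(json.dumps(create_index_settings(10)))
```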
Re: What MongoDB can do and ES cannot?
Hi Clinton, Considering the enormous amount of value added to ES since this original question was posted, I am wondering whether the answer has tilted in favor of Elasticsearch. Can we safely say that Elasticsearch can be considered a primary data store? -- View this message in context: http://elasticsearch-users.115913.n3.nabble.com/What-MongoDB-can-do-and-ES-cannot-tp4032654p4064962.html Sent from the ElasticSearch Users mailing list archive at Nabble.com.
Re: Using a nested object property within custom_filters_score script
Hi Veda, I ran into a similar issue to yours. Have you found a solution to your problem? Thanks, Vincent
Re: Many indices.fielddata.breaker errors in logs and cluster slow...
This is caused by elasticsearch trying to load fielddata. Fielddata is used for sorting and faceting/aggregations. When a query has a sort parameter, the node will try to load the fielddata for that field for all documents in the shard, not just those included in the query result. The breaker is tripped when ES estimates there is not enough heap available to load the fielddata, so it rejects the query rather than running the node out of heap space. You should probably start by looking at the queries that are being run to determine what's triggering the error. To deal with it, the options I'm aware of are to add heap space, add more nodes, or look at using doc_values to move fielddata off the heap. Kimbro. On Wed, Oct 15, 2014 at 10:42 PM, Robin Clarke robi...@gmail.com wrote: I'm still having this problem... has anybody got an idea what the cause / solution might be? Thank you! :) On Tuesday, 7 October 2014 14:29:22 UTC+2, Robin Clarke wrote: I'm getting a lot of these errors in my Elasticsearch logs, and am also experiencing a lot of slowness on the cluster...

New used memory 7670582710 [7.1gb] from field [machineName.raw] would be larger than configured breaker: 7666532352 [7.1gb], breaking
...
New used memory 7674188379 [7.1gb] from field [@timestamp] would be larger than configured breaker: 7666532352 [7.1gb], breaking

I've looked at the documentation about memory limits (http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_limiting_memory_usage.html), but I don't really understand what is causing this, and more importantly how to avoid it. My cluster is 10 machines @ 32GB memory and 8 CPU cores each. I have one ES node on each machine with 12GB memory allocated. On each machine there is additionally one logstash agent (1GB) and one redis server (2GB). I have 10 indexes open with one replica per shard, so each node should only be holding 22 shards (two more for kibana-int). I'm using Elasticsearch 1.3.3 and Logstash 1.4.2. Thanks for your help!
-Robin-
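Kimbro's doc_values suggestion amounts to a mapping change. A hypothetical sketch (index, type, and field names invented, loosely echoing the machineName.raw field from the logs), assuming ES 1.x, where doc_values can be enabled per field on not_analyzed string and numeric/date fields:

```python
import json

# Sketch: a mapping that keeps fielddata on disk (doc_values) instead of the
# heap, so sorting/aggregating on the field no longer trips the breaker.
# All names here are hypothetical examples.
mapping = {
    "mappings": {
        "logs": {
            "properties": {
                "machineName": {
                    "type": "string",
                    "index": "not_analyzed",   # doc_values requires not_analyzed strings
                    "doc_values": True,
                }
            }
        }
    }
}
print(json.dumps(mapping))
```

Existing documents would need to be reindexed for the change to take effect, since doc_values are written at index time.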
Re: Announcing elasticsearch plugin for Liferay - elasticray
Elastic (without "search") should be OK, I believe, at least according to the official source: http://www.elasticsearch.org/trademarks/ Regards, Alex. On 17 October 2014 00:33, k...@rknowsys.com wrote: Hi all, I am glad to announce that we have started a GitHub project: https://github.com/R-Knowsys/elasticray For Liferay users: please test it in dev/staging environments. We are working on a v1.0 RC, and once tested this should be production ready from v1.0 (first week of Nov '14). We are fixing some minor issues and should have a 1.0 RC by next week. Query: do we have any trademark issues with naming the plugin elasticray? We chose that name to clearly indicate that this is an elasticsearch plugin for Liferay. Posting this query here as I did not find any other obvious place to ask. Thanks, kc www.rknowsys.com
River MongoDB-Elasticsearch (parent/child)
Hello, I'm looking for a way to create a parent/child relation with the script of the mongodb-ES river plugin. I don't know whether the parent/child relation must already be present in MongoDB to do that. For now, I just have the field parent_id in all documents, with an ID that is the same between a parent and its children. The type (parent or child) is in the field estype, used to dispatch documents into the right type.

My mapping:

POST mongo_index_log
{
  "mappings": {
    "parent": {},
    "child": { "_parent": { "type": "parent" } }
  },
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}

My river:

PUT _river/mongo_index_log/_meta
{
  "type": "mongodb",
  "mongodb": {
    "servers": [ { "host": "127.0.0.1", "port": 27017 } ],
    "options": { "secondary_read_preference": true },
    "db": "test",
    "collection": "mongodb_base",
    "script": "ctx._type = ctx.document.estype; if (ctx._type == 'fils') { ctx._parent = ctx.document.parent_id; }"
  },
  "index": { "name": "mongo_index_log", "type": "mongo_type" }
}

In this case the split into the right type works, but the parent/child relation returns nothing when I run the following query (I expect the parents that have at least one child):

POST mongo_index_log/parent/_search
{
  "query": {
    "bool": {
      "must": [
        { "top_children": { "type": "child", "query": { "match_all": {} } } }
      ]
    }
  }
}

I also tried changing the _id of the parent like this:

"script": "ctx._type = ctx.document.estype; if (ctx._type == 'fils') { ctx._parent = ctx.document.parent_id; } else { ctx._id = ctx.document.parent_id; }"

But in this case the split into the right type doesn't work either, and so neither does the parent/child relation. Any ideas? I looked at this link, but found nothing about my problem: https://github.com/richardwilly98/elasticsearch-river-mongodb/tree/master/manual-testing/issues/64 Have a good day, -- Ludovic M
Understanding HEAP usage
Hi, We are using Elasticsearch for one of our applications, as part of which we have indexed about 3M documents and built two indices around them. We use a cluster of 2 nodes, each with 7.5 GB RAM, and have dedicated 4 GB to ES. What we are seeing is that on one of the nodes the heap used by ES is more than 60% of what is allocated, even though the most obvious consumers like the filter cache and field-data cache are very low to almost zero. So I am trying to understand what else could be consuming memory in ES. Any pointers on what else I should be looking at? Here is a snapshot from ElasticHQ (node 1 / node 2):

Field Size: 0.0 / 0.0
Field Evictions: 0 / 0
Filter Cache Size: 24.0B / 24.0B
Filter Evictions: 0 per query / 0 per query
% ID Cache: 0% / 0%
Total Memory: 7gb / 7gb
Heap Size: 4gb / 4gb
Heap % of RAM: 54.5% / 54.5%
% Heap Used: 66.3% / 26%
GC MarkSweep Frequency: 0s / 0s
GC MarkSweep Duration: 0ms / 0ms
GC ParNew Frequency: 0s / 0s
GC ParNew Duration: 0ms / 0ms

Thanks, Karthik
Re: Filter by specific value without mapping
I tried to remove the mapping and make the fields not_analyzed:

curl -XPUT "http://$HOST:9200/reports" -d '
{
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "store_generic": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": { "type": "string", "index": "not_analyzed" }
          }
        }
      ]
    }
  }
}'

But filtering still returns empty results for "approved", "not approved" or . Search example:

curl -X GET 'http://localhost:9200/reports/_search?pretty' -d '{
  "filter": { "term": { "general.approval": "approved" } }
}'

Should I use a different filtering syntax for this case? On Tuesday, October 7, 2014 5:17:02 PM UTC+3, Ivan Brusic wrote: The fields do not need a custom analyzer, they just need to be marked as not_analyzed. You can set up a dynamic template that states that any new field should be not analyzed: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-root-object-type.html#_dynamic_templates You can still hardcode the mapping for specific fields. Cheers, Ivan. On Oct 7, 2014 2:57 AM, Vladimir Krylov s6n...@gmail.com wrote: What I'm trying to do is get data by filtering on a term with exact matching. I have ES 1.3.2 and I cannot define a mapping up front, as the attributes are dynamic (different users have different attributes). My data:

{ "id": 111, "org_id": 11, "approval": "approved", ... }
{ "id": 112, "org_id": 11, "approval": "not approved", ... }

This request returns results:

curl -X GET 'host:9200/data/_search?pretty' -d '{
  "filter": { "term": { "approval": "approved" } }
}'

But this does not:

curl -X GET 'host:9200/reports/_search?pretty' -d '{
  "filter": { "term": { "approval": "not approved" } }
}'

It's a duplicate of ticket https://github.com/elasticsearch/elasticsearch/issues/8006#issuecomment-58160111, from which I was directed here. David Pilato suggested that the index has probably indexed "not" and "approved" separately, so there is no exact match for "not approved". I tried searching for the word "not" and it works:

curl -X GET 'host:9200/reports/_search?pretty' -d '{
  "filter": { "term": { "approval": "not" } }
}'

So, how can I filter by exact match on "not approved"?
Set _score field value in Elasticsearch
Hello all, I'm trying to achieve one piece of functionality in Elasticsearch but am not able to do it. In SQL we could do something like: SELECT _score AS score_1 FROM sometable. I am trying to copy the value of the score into another field, so that Elasticsearch returns two fields having the same value: _score and score_1. I have already tried custom score queries, but they change the value of the _score field itself, which I DO NOT WANT. I'm already happy with the score returned in _score; I just want the same value duplicated into another field, for example score_1. Is this possible? Is there any functionality provided in Elasticsearch for this?
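One possibility worth checking (not verified here against a live cluster) is script_fields: _score is available inside the script, so the score can be echoed into a second field without touching _score itself. A sketch of such a request body:

```python
import json

# Sketch: a search body whose script_fields entry copies _score into a second
# field, score_1, leaving the normal _score in the response untouched.
query = {
    "query": {"match_all": {}},
    "script_fields": {
        "score_1": {"script": "_score"},
    },
}
print(json.dumps(query))
```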
APT repository sync
Hi, Can someone point me in the right direction for running a local mirror of the elasticsearch APT repositories? Specifically, is there an rsync endpoint available? Thanks! Yapeng
Re: Understanding HEAP usage
Measuring heap usage in Java applications is very different from measuring memory usage for other software. 1. Usually Java allocates all the heap it is going to need up front at startup; at least, we do that in server applications. 2. Java's garbage collection is lazy, so heap usage climbs slowly over time; if you zoom out, it looks like a saw tooth. So it is perfectly normal for one server to be using more heap than another: it is simply at a different place in the saw tooth. It is interesting to compare the depth of the valleys in the saw tooth and the time between peaks. There are other interesting things you can look at too, but one snapshot of % heap used isn't one of them. Nik. On Fri, Oct 17, 2014 at 6:27 AM, karthik jayanthi karthikjayanthi.i...@gmail.com wrote: Hi, We are using Elasticsearch for one of our applications...
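To observe the saw tooth Nik describes rather than a single snapshot, one can sample heap usage over time from the nodes-stats API. The helper below only parses the response shape; the sample payload is a trimmed, made-up example, and the HTTP call itself is left out:

```python
# Sketch: extract heap_used_percent per node from a /_nodes/stats/jvm
# response, so heap can be sampled repeatedly and plotted over time.
def heap_used_percent(stats):
    return {node_id: node["jvm"]["mem"]["heap_used_percent"]
            for node_id, node in stats["nodes"].items()}

# Trimmed, invented example of the response shape.
sample = {"nodes": {"abc123": {"jvm": {"mem": {"heap_used_percent": 66}}}}}
print(heap_used_percent(sample))  # {'abc123': 66}
```

Polling this every minute or so and charting the result makes the valleys and peaks Nik mentions directly visible.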
Re: copy index
You can use the knapsack plugin to export/import data and change mappings (and much more!). For a 1:1 online copy, yes, just one curl command is necessary. https://github.com/jprante/elasticsearch-knapsack Jörg. On Thu, Oct 16, 2014 at 7:55 PM, euneve...@gmail.com wrote: Hi, I can see there are lots of utilities to copy the contents of an index, such as elasticdump, reindexer, streames, etc., and they mostly use scan/scroll. Is there a single curl command to copy an index to a new index? Without too much investigation it looks like scan/scroll requires repeated calls; can you please confirm? If that is the case, what is the simplest supported utility? Alternatively, is there a plugin with a front end to choose the from- and to-index? Thanks in advance
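To confirm the question above: yes, scan/scroll does require repeated calls; each scroll request returns one batch until an empty batch signals the end. A minimal sketch of the loop those copy utilities implement, with stand-in callables replacing the actual HTTP round-trips:

```python
# Sketch of the scan/scroll copy loop: keep fetching scroll batches until an
# empty one comes back, bulk-indexing each batch into the target index.
# scroll_batches and bulk_index are stand-ins for the real HTTP calls.
def copy_index(scroll_batches, bulk_index):
    copied = 0
    for batch in scroll_batches:   # each batch is one scroll round-trip
        if not batch:              # empty batch: the scroll is exhausted
            break
        bulk_index(batch)
        copied += len(batch)
    return copied

target = []
print(copy_index([[{"_id": 1}, {"_id": 2}], [{"_id": 3}], []], target.extend))  # 3
```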
Re: ElasticSearch spark esRDD not returning the aggregate values in aggregated query
Siva, Try the latest build of elasticsearch-hadoop, version 2.1.0.Beta2: http://www.elasticsearch.org/overview/hadoop/download/ The esRDD has been changed to Spark's PairRDD: https://spark.apache.org/docs/1.1.0/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions The RDD will now contain key/value tuples that look like (String, Map[String, Any]), so you could start to walk the JSON key/value hierarchy with something like: esRDD.flatMap { args => args._2.get("aggregations") } (the syntax above is not exact, since your specific query result may have a different first key/value pair as the first object). Best, Jeff Steinmetz Director of Data Science Ekho, Inc. www.ekho.me @jeffsteinmetz On Wednesday, September 17, 2014 6:13:37 AM UTC-7, siva pradeep wrote: Hi, I have a query which filters the rows and then applies an aggregation. When I run the query in Sense it gives me the expected result, but when I run the same query using elasticsearch-spark_2.10 I get the rows filtered by the query but not the aggregation result. I am sure I am missing something but unable to figure out what.
Here is the query:

GET _search
{
  "query": {
    "bool": {
      "must": [
        {
          "filtered": {
            "query": {
              "range": {
                "@timestamp": {
                  "from": "2014-09-03T01:40:37.437Z",
                  "to": "2014-09-03T01:45:11.437Z"
                }
              }
            }
          }
        }
      ]
    }
  },
  "size": 0,
  "fields": ["cid", "entity"],
  "aggs": {
    "cid": {
      "terms": { "field": "cid", "min_doc_count": 2, "size": 100 },
      "aggs": {
        "tn": { "terms": { "field": "entity" } }
      }
    }
  }
}

Query result:

{
  "took": 10005,
  "timed_out": false,
  "_shards": { "total": 10, "successful": 10, "failed": 0 },
  "hits": { "total": 2430, "max_score": 0, "hits": [] },
  "aggregations": {
    "cid": {
      "buckets": [
        {
          "key": "01abcecc9a20cd3d6ae6be3509d014ba@76.96.107.168",
          "doc_count": 2,
          "tn": { "buckets": [ { "key": "15052563268", "doc_count": 2 } ] }
        }
      ]
    }
  }
}

Spark program (the query value is the same JSON as above, as an escaped Scala string):

object PresenceFilter extends App {
  val query: String = """{ "query": { "bool": { "must": [ ... ] } }, "size": 0, "fields": ["cid","entity"], "aggs": { ... } }"""

  val sparkConf = new SparkConf()
    .setAppName("PresenceAnalysis")
    .setMaster("local[4]")
    .set("es.nodes", "prs-wch-10.sys.comcast.net")
    .set("es.port", "9200")
    .set("es.resource", "spresence-2014.09.03/presence")
    .set("es.endpoint", "_search")
    // .set("es.query", query)
  val sc = new SparkContext(sparkConf)
  sc.esRDD.count // returns 2430 rows
}

How do I get the aggregation part of the result (the "aggregations" object shown above) into the program? Please advise. Thanks, Siva P
Re: Sorting by nested fields
Has anybody got another idea? Or is it not possible at all?
Minimum double score in a native script
Hi! I am writing a Java plugin with a customized (native) score script returning a double; basically, I wrote a class extending AbstractDoubleSearchScript. For documents which don't pass a specific test, the score should be the lowest possible, meaning they should end up at the bottom of the results. It is hard for me to find a lower bound for my scores, since they are logarithms of probabilities (the theoretical lower bound is log(0)). In the runAsDouble() method I have tried returning Double.NEGATIVE_INFINITY and also -Double.MAX_VALUE, since Double.MIN_VALUE is not actually the minimum negative value (I guess the name of that constant is not consistent with Integer.MIN_VALUE, but that's a different story). When I return the aforementioned constants I get an error:

java.lang.IllegalArgumentException: docID must be >= 0 and < maxDoc=58514550 (got docID=2147483647)
    at org.apache.lucene.index.BaseCompositeReader.readerIndex(BaseCompositeReader.java:182)
    at org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:109)
    at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:196)
    at org.elasticsearch.search.fetch.FetchPhase.loadStoredFields(FetchPhase.java:228)
    at org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:156)
    at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:340)
    at org.elasticsearch.search.action.SearchServiceTransportAction$11.call(SearchServiceTransportAction.java:308)
    at org.elasticsearch.search.action.SearchServiceTransportAction$11.call(SearchServiceTransportAction.java:305)
    at org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:517)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)

I am using ES 1.2.0 on a single machine.
and the query is formed like this:

{
  "query": {
    "function_score": {
      "query": { // some filters
      },
      "script_score": {
        "script": "my_script",
        "lang": "native",
        "params": { // some parameters
        }
      },
      "score_mode": "first",
      "boost_mode": "replace"
    }
  }
}

Cheers
How to count tuples of 3 variables, sorted
Greetings community, I'm new to elasticsearch, so first of all sorry for my questions being so basic. I developed a flow collector which dumps flows to my elasticsearch server. Right now I use Kibana to perform the Top 10 destination and Top 10 source IP filters, and such. But the query I'm having more difficulty with is finding the Top 10 combinations of (source + dest + dest_port), so that I can know what the top flows are, and from which IPs to which destinations and ports. Example:

{
  "aggs": {
    "tupulo_teste": {
      "value_count": {
        "field": "SRC_ADDR",
        "field": "DST_ADDR",
        "field": "DST_PORT"
      }
    }
  }
}

This does not compute all combinations of (SRC_ADDR, DST_ADDR, DST_PORT), nor even sort them to give the Top 10 hits. If you are familiar with Splunk, I need the equivalent of *stats count by a,b,c | sort 10 -count*. I've tried:

{
  "aggs": {
    "src": {
      "terms": {"field": "SRC_ADDR"},
      "aggs": {
        "dst": {
          "terms": {"field": "DST_ADDR"},
          "aggs": {
            "dstprt": {
              "terms": {"field": "DST_PORT"}
            }
          }
        }
      }
    }
  }
}

but this produces a strange and long combination, also without sorting. Can someone please help me on how to do this result combination, with a sort by occurrence count? Thank you
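One common workaround in ES 1.x is a single terms aggregation whose key is a script concatenating the three fields, so the buckets themselves are the (src, dst, port) tuples, ordered by count. A sketch of the request body (the "|" separator and aggregation name are assumptions):

```python
import json

# Sketch: one terms aggregation keyed on the concatenation of the three
# fields -- roughly the equivalent of Splunk's "stats count by a,b,c | sort
# 10 -count". Script-based terms are slower than a plain field, but they do
# give real tuples in a single, count-ordered bucket list.
body = {
    "size": 0,
    "aggs": {
        "top_flows": {
            "terms": {
                "script": "doc['SRC_ADDR'].value + '|' + "
                          "doc['DST_ADDR'].value + '|' + "
                          "doc['DST_PORT'].value",
                "size": 10,                    # top 10 combinations
                "order": {"_count": "desc"}    # the default, shown explicitly
            }
        }
    }
}
print(json.dumps(body, indent=2))
```

Each returned bucket key can then be split on "|" client-side to recover the tuple.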
Kibana group by terms
I'm using Kibana w/ logstash to view web server logs. I'd like to add a graph that displays uniques of the *entire* User-Agent string. I've tried adding a terms graph, but that breaks the UA string into separate words, which is less than desirable in this situation. Is there a way to do this?
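The usual fix is to give the field an unanalyzed sub-field and point the terms panel at that, so buckets are whole strings rather than words. A sketch of such an ES 1.x mapping (the index/type/field names here are assumptions, not from the post):

```python
import json

# Sketch: the "agent" field keeps its analyzed form for search, and gains a
# not_analyzed "raw" sub-field. A Kibana terms panel on "agent.raw" then
# buckets on the entire User-Agent string.
mapping = {
    "mappings": {
        "logs": {
            "properties": {
                "agent": {
                    "type": "string",
                    "fields": {
                        "raw": {"type": "string", "index": "not_analyzed"}
                    }
                }
            }
        }
    }
}
print(json.dumps(mapping, indent=2))
```

The stock logstash index template already does something like this, creating `.raw` variants of string fields.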
managing snapshots
I'm investigating snapshots and came across some things that aren't clear in the docs. My understanding is that snapshots are incremental and only transfer things that were changed since the last snapshot. (Is that shards, Lucene segments, something else?) One thing that isn't clear: if I create the following

:9200/_snapshot/es_snapshots/snap_yesterday
:9200/_snapshot/es_snapshots/snap_today

can I restore snap_yesterday to get that state back, OR snap_today for today's snapshot? I read in the groups that a new snapshot only stores the changed files. That also leads to the question of how to manage the snapshots. Is there some cleaning I need to (or can) do? If so, how can I ensure that the state of a snapshot is still usable if I delete older ones? Lastly, I was thinking about the idea of multiple snapshot repositories. So in the examples above I might replace es_snapshots with a date, such as:

:9200/_snapshot/yesterday/snap_1
:9200/_snapshot/yesterday/snap_2
:9200/_snapshot/today/snap_1
:9200/_snapshot/today/snap_2

Then I could delete yesterday at some point (or anything older than X days). Are there any thoughts around that, or am I really misunderstanding things? @matthias
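For what it's worth, deleting a snapshot through the DELETE _snapshot API is designed to remove only the files no longer referenced by any remaining snapshot, so newer snapshots stay restorable; explicit per-repository directories shouldn't be needed just for cleanup. A sketch of a date-based retention pass (snapshot naming scheme is an assumption):

```python
from datetime import date, timedelta

# Sketch of a retention policy: snapshot names carry their date (e.g.
# "snap_2014-10-15"); anything older than keep_days becomes a deletion
# candidate, to be removed via DELETE /_snapshot/<repo>/<name>.
def expired(snapshots, today, keep_days=7):
    cutoff = today - timedelta(days=keep_days)
    out = []
    for name in snapshots:
        snap_date = date.fromisoformat(name.split("_", 1)[1])
        if snap_date < cutoff:
            out.append(name)
    return out

snaps = ["snap_2014-10-01", "snap_2014-10-10", "snap_2014-10-16"]
assert expired(snaps, date(2014, 10, 17)) == ["snap_2014-10-01"]
```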
Re: Get only ids with no source Java API
Have you tried setting no fields to be returned, or the explicit setNoFields() method? http://jenkins.elasticsearch.org/job/Elasticsearch%20Master%20Branch%20Javadoc/Elasticsearch_API_Documentation/org/elasticsearch/action/search/SearchRequestBuilder.html#setNoFields() -- Ivan

On Thu, Oct 16, 2014 at 2:45 AM, Ilija Subasic subasic.il...@gmail.com wrote: Is there a way in elasticsearch, using the Java API, to get only the ids of the documents returned for a given query? I run

SearchResponse sr = esClient.prepareSearch(index).setSize(resultSize).setQuery(q).setScroll(new TimeValue(1)).setQuery(fqb).setFetchSource(false).get();

but I get empty hits (`sr.getHits().getHits().length == 0`) although the total count of returned hits is 2 (`sr.getHits().getTotalHits() == 2`). I understand that nothing is returned by elasticsearch because I set fetch source to false, but the ids should somehow be available. My current solution is:

SearchResponse sr = esClient.prepareSearch(index).setSize(resultSize).setQuery(q).setScroll(new TimeValue(1)).setQuery(fqb).setFetchSource("_id", null).get();

However I think that gets the _id field from source, and for speed I would like to avoid this if possible. Thanks, Ilija
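Seen through the REST API, the same idea looks like this: with source fetching disabled the hits array is still populated, and every hit carries its _id in the metadata, so no source field needs to be read. A sketch with a hand-made response (not real client code):

```python
# Sketch: "_source": false suppresses the document body, but each hit still
# includes _index/_type/_id metadata, so ids can be collected directly.
body = {"query": {"match_all": {}}, "_source": False}

# Hand-made example of what a response shaped like ES 1.x would contain:
response = {
    "hits": {
        "total": 2,
        "hits": [{"_id": "a1"}, {"_id": "b2"}],
    }
}

ids = [hit["_id"] for hit in response["hits"]["hits"]]
assert ids == ["a1", "b2"]
```

The Java-API equivalent is what Ivan suggests: ask for no fields at all and read the ids off the returned SearchHit objects.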
Re: Scaling strategies without shard splitting
Hey Nik - Thanks for the response. - Ian On Mon, Oct 13, 2014 at 4:28 PM, Nikolas Everett nik9...@gmail.com wrote: On Mon, Oct 13, 2014 at 11:12 AM, Ian Rose ianr...@fullstory.com wrote: Hi - My team has used Solr in its single-node configuration (without SolrCloud) for a few years now. In our current product we are now looking at transitioning to SolrCloud, but before we make that leap I wanted to also take a good look at whether Elasticsearch would be a better fit for our needs. Although ES has some nice advantages (such as automatic shard rebalancing), I'm trying to figure out how to live in a world without shard splitting. In brief, our situation is as follows: - We use one index (collection in Solr) per customer. - The indexes are going to vary quite a bit in size, following something like a power-law distribution, with many small indexes (let's guess 250k documents), some medium-sized indexes (up to a few million documents), and a few large indexes (hundreds of millions of documents). - So the number of shards required per index will vary greatly and will be hard to predict accurately at creation time. How do people generally approach this kind of problem? Do you just make a best guess at the appropriate number of shards for each new index and then do a full re-index (with more shards) if the number of documents grows bigger than expected? I'm in a pretty similar boat and have done just fine without shard splitting. I maintain the search index for about 900 wikis http://noc.wikimedia.org/conf/all.dblist. Each wiki gets two Elasticsearch indexes, and those indexes vary in size, update rate, and query rate a ton. Most wikis get a single shard for all of their indexes, but many of them use more https://git.wikimedia.org/blob/operations%2Fmediawiki-config.git/747fc7436226774d1735775c2ef41c911d59b5d2/wmf-config%2FInitialiseSettings.php#L13828. I basically just guesstimated and reindexed the ones that were too big into more shards.
We have a script that creates a new index with the new configuration, copies all the documents from the old index to the new one, and then swaps the aliases (that we use for updates and queries) to the new index. Then it re-does any updates or deletes that occurred since the copy script started. Having something like that is pretty common. I rarely use it to change sharding configuration - it's much more common that I'll use it to change how a field in the document is analyzed. Elasticsearch also has another way to handle this problem (we don't use it for other reasons) where you create a single index for all customers and then filter them at query time. You also add routing values to your documents and queries so all documents from the same customer get routed to the same shard. That way you can serve queries for a single customer out of one shard, which is pretty cool. For larger customers that don't fit on a single shard you still create indexes just for them. One thing to watch out for, though, is that Elasticsearch doesn't use the shard's size when determining where to place the shard. It'll check to make sure the shard won't fill the disk beyond some percentage, but it won't try to spread out the large shards, so you can get somewhat unbalanced disk usage. I have an open pull request for something to do that, so this probably won't be true forever, but it is true for now. How big are your documents, and how frequently do you think you'll need shard splitting? If your documents are pretty small you may be able to get away with just reindexing all of them for the customer when you need more shards, like I do. It sure isn't optimal but it gets the job done. Another way to do things is, once your customers get too big, you create a new index and route all of their new data there. You then have to query both indexes. This is _kind of_ how people handle log messages, and it might work, depending on your use case.
Nik
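The reindex-and-swap pattern Nik describes hinges on moving the alias in a single atomic _aliases call, so queries never see a half-built index. A sketch of that final step (index and alias names are made up for illustration):

```python
# Sketch: after copying documents from customer_v1 to customer_v2, one
# _aliases request removes and adds the alias atomically, so readers and
# writers flip over in a single step.
alias_swap = {
    "actions": [
        {"remove": {"index": "customer_v1", "alias": "customer"}},
        {"add":    {"index": "customer_v2", "alias": "customer"}},
    ]
}

def target_indices(actions_body):
    """Return the indices the alias will point to after the swap."""
    return [a["add"]["index"] for a in actions_body["actions"] if "add" in a]

assert target_indices(alias_swap) == ["customer_v2"]
```

Because clients only ever talk to the alias, the old index can be deleted at leisure once the catch-up pass (re-applying updates made during the copy) is done.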
Re: Filters: odd behavior
They are indeed executed in the defined order. Filters that are more specific should be placed early on, and those that cannot be cached (geo/time-based) should be placed last. Cheers, Ivan

On Thu, Oct 16, 2014 at 5:16 AM, @mromagnoli marce.romagn...@gmail.com wrote: Hi everyone, I have a doubt about filters. If I have more than one filter in a filtered query, are they executed in the defined order? And are they filtering in a 'chain' mode, i.e. using the results of the previous filters? Thanks in advance as always.
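Ivan's ordering advice, sketched as an ES 1.x request body (field names and values are assumptions): the cheap, cacheable term filter goes first, and the uncacheable geo filter last, where it only sees the documents that survived the earlier clauses.

```python
# Sketch of filter ordering inside an ES 1.x `and` filter: specific and
# cacheable clauses first, expensive/uncacheable (time-based, geo) last.
query = {
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": {
                "and": [
                    {"term": {"status": "active"}},               # specific, cached
                    {"range": {"timestamp": {"gte": "now-1h"}}},  # time-based
                    {"geo_distance": {"distance": "10km",
                                      "location": {"lat": 40.7, "lon": -74.0}}},
                ]
            }
        }
    }
}

clause_order = [next(iter(f)) for f in query["query"]["filtered"]["filter"]["and"]]
assert clause_order == ["term", "range", "geo_distance"]
```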
elasticsearch fields and elasticsearch-hadoop
Is there an easy way to rename the fields on an index? I have a field named searchTerm that I use for some event tracking, but the elasticsearch-hadoop library assumes all elasticsearch fields are lowercase and converts all field names to lower case. When hadoop tries to retrieve the data from the index, the field doesn't match and I just get a null value back. So the question is: can I rename this field from searchTerm to search_term or searchterm in some easy way? Or do I need to set up a new index, pull all the records from the current index, rename the fields to lowercase, and insert them into the new index? Thanks, Akil
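Elasticsearch fields can't be renamed in place, so the copy-and-rename route is the usual answer. The transform step is trivial; a sketch (fetching and re-indexing via scan/scroll and bulk are omitted):

```python
# Sketch of the per-document transform for a copy-based rename: lowercase
# every field name before indexing into the new index.
def lowercase_keys(doc):
    return {k.lower(): v for k, v in doc.items()}

src = {"searchTerm": "elasticsearch", "userId": 42}
assert lowercase_keys(src) == {"searchterm": "elasticsearch", "userid": 42}
```

After the copy, an alias pointing at the new index lets existing consumers keep their old index name.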
Re: Filters: odd behavior
And there is post-filter as well: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-post-filter.html Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
Future of cardinality aggregation feature.
Guys, I see that the cardinality aggregation is marked as an experimental feature. We are using this feature and find it very useful, but I would like to know whether it will be supported going forward, or if there is any chance of it being removed. Thanks in advance. Regards, -G
Re: Scaling strategies without shard splitting
In my use case I have indexed a union catalog for some hundred libraries, where each library can have a search service, plus their own catalog data that they do not want to share. Elasticsearch offers far more flexibility and performance than Solr, with the ability to extend the cluster automatically by adding nodes (without configuration change), combined with automatic rebalancing of shards, plus the features of index aliases and shard over-allocation; an explanation is here: http://elasticsearch-users.115913.n3.nabble.com/Over-allocation-of-shards-td3673978.html With index aliases, I do not have to perform evil things like shard splitting. No index copy required, no full re-index. That is, I can organize some library catalog index over the machines, and address an index view for each library by assigning several index aliases (e.g. collection names or library identifiers) to the library catalog segments they are interested in, with term filters. Index updates come from a single point of a primary database, plus data packages the libraries can upload. If the amount of input data exceeds the capacity, I can simply start a new node, without touching the configuration. Also, releasing new index versions is a snap with Elasticsearch. The index names carry timestamp information (e.g. ddMMyyHH) and it is easy to organize index versions like rolling windows, with the latest index being the current one to search. Old indices are dropped if they are no longer needed. Jörg
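Jörg's per-library index views can be sketched as filtered aliases: one shared catalog index, plus one alias per library whose term filter restricts it to that library's documents. The names and filter field below are assumptions:

```python
# Sketch: a filtered alias gives each library its own "virtual index" over a
# shared catalog index, without copying or re-indexing anything.
def library_alias(index, library_id):
    return {
        "actions": [{
            "add": {
                "index": index,
                "alias": "catalog_%s" % library_id,
                "filter": {"term": {"library_id": library_id}},
            }
        }]
    }

body = library_alias("catalog_2014101800", "lib42")
assert body["actions"][0]["add"]["alias"] == "catalog_lib42"
```

Queries against `catalog_lib42` then behave exactly like queries against a dedicated index, while capacity is managed once, for the shared index.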
Reduce Disk Space Requirements
Details: Elasticsearch version used: 1.3.4 Docs to index: ~2.2 million Growth in docs: a few hundred docs every week. Number of fields per doc: ~10-15 Tokenizers used: ngram (min: 2, max: 15), path_hierarchy Filters used: word_delimiter, pattern_capture, lowercase, unique Size on disk: ~150 GB (no replicas are active) Problem: Unfortunately, I don't have the luxury of a lot of free disk space at my disposal. Why? [Let me just say I work for a too-big-to-fail organization, if you know what I mean :-)] I need to reduce my index storage footprint by at least 50%. Solutions tried: 1. Ran _flush and _optimize on the index. Didn't affect the size on disk. 2. Decreased the number of primary shards from 5 to 2 (realized this is a useless attempt, as the number of shards doesn't affect disk space). 3. Looked into archiving the index after closing (can't use this solution, as I want our users to search through all of the 2.2 million docs, so I can't archive partial docs). Can you guys suggest any other options to reduce index disk size? Your inputs are much appreciated. Thanks, Parth Gandhi
Re: Reduce Disk Space Requirements
ngram min=2 kills your index space. Use min=3 or higher. Also maybe edge ngram tokenizer might be an alternative. Jörg
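A rough way to see why the ngram settings dominate disk usage: count the grams a single token emits for each size in the min..max range. Raising min_gram trims the shortest grams, which are also the most expensive ones, since 2-grams match almost everything and carry huge postings lists.

```python
# Rough illustration: number of ngrams emitted for one token of the given
# length, summed over gram sizes min_gram..max_gram.
def ngram_count(length, min_gram, max_gram):
    return sum(max(0, length - n + 1)
               for n in range(min_gram, max_gram + 1))

# A 15-character token with the posted settings (min 2, max 15) vs. min 3:
assert ngram_count(15, 2, 15) == 105
assert ngram_count(15, 3, 15) == 91
```

The per-token saving looks modest, but multiplied across every token in 2.2 million documents, and combined with the fat postings of 2-grams, the difference on disk is substantial.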
Re: Java 8 recommended version?
Hi Jilles, 1.7u55 has indeed been the recommended version for a long time, but JDK 8u25 is fine too. The page that you linked is from elasticsearch-hadoop and might be a bit outdated; we are trying to keep up-to-date information about recommended JVMs at the following URL: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup.html#jvm-version For reference, we are also trying to improve our startup scripts so that they fail to start if you are using a JVM with known issues. See for instance https://github.com/elasticsearch/elasticsearch/pull/7580 On Thu, Oct 16, 2014 at 1:03 PM, Jilles van Gurp jillesvang...@gmail.com wrote: I know JDK 7u55 was labeled as OK some time ago, and this is still listed as the official requirement: http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/requirements.html However, time has moved on and I was wondering what the testing status and advice is for more recent JDKs. Particularly, I'd like to know whether Oracle JDK 8u25 is safe for production use (on CentOS 7). We've used JDK 8u20 without issues on our dev servers, but it would be nice to have some guidance on this since we are moving to production soon. The reason we're using Java 8 is that we are using it for our apps as well, and it is kind of nice to have just one JDK to worry about. Also, I suspect there may be some performance benefits given the amount of change that went into e.g. HotSpot. In general, an overview of common VMs and their status with respect to elasticsearch would be nice to have somewhere. There are quite a few different suppliers of VMs at this point, and picking one seems to be a bit of a black art currently. There's OpenJDK, Oracle JDK, Azul's Zulu (essentially OpenJDK as far as I know), and Azul's Zulu Enterprise. You can get each of these for Java 6, 7, and 8. Especially for OpenJDK, it also matters how it was built.
Jilles -- Adrien Grand
Re: Multi Field Aggregation
Hello, I'm having the exact same problem. Have you managed to find a solution? My thread is here: LINK https://groups.google.com/forum/?fromgroups#!topic/elasticsearch/Oum03VSBzHQ Thanks

On Thursday, October 16, 2014 1:57:35 PM UTC+1, Alastair James wrote: Hi there. I am trying to create an aggregation that mimics the following SQL query: SELECT col1, col2, COUNT(*), SUM(metric) FROM table GROUP BY col1, col2 ORDER BY SUM(metric) DESC On the face of it, I could create a terms aggregation for col1, add a terms aggregation for col2 inside it, and the metric aggregations inside that. I could then dynamically build the SQL-like result grid and sort it myself. However, this breaks down for large result sets, or a paginated view of a larger result. The problem is that the ES aggregation system always returns the top N results for each parent and child bucket; thus for each value of col1 I have N values of col2. What I really want is to consider all possible combinations of col1 and col2, in the same way SQL does, and return the top N based on some other metric - in ES speak, a single aggregation where the keys are tuples of (col1, col2). I suppose one way would be to use a script terms aggregation to concatenate each value of col1 and col2, however that's going to be slow. Does anyone else have any ideas? Ideally there would be a tuple aggregation built in, e.g.:

"my_agg": { "tuple": { "fields": ["col1", "col2"] } }

which would produce keys that are objects like:

{ "col1": "value1", "col2": "value2" }

Does anyone know if this would be possible to write as a plugin?
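The script-concatenation workaround mentioned in the thread has a client-side half that is easy to overlook: the buckets come back keyed by the joined string and must be split back into tuples. A sketch with hand-made buckets (the "|" separator is an assumption):

```python
# Sketch: buckets from a terms aggregation whose script concatenated col1
# and col2 with "|". Splitting the key recovers the tuple, and the buckets
# are already sorted by doc_count, matching the SQL ORDER BY COUNT(*) case.
buckets = [
    {"key": "us|chrome", "doc_count": 40},
    {"key": "de|firefox", "doc_count": 25},
]
tuples = [tuple(b["key"].split("|")) + (b["doc_count"],) for b in buckets]
assert tuples[0] == ("us", "chrome", 40)
```

Sorting by a different metric (the SUM(metric) case) would instead need a sub-aggregation per bucket and an "order" clause on the terms aggregation.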
Re: best practice for thread pool queue size
Yes, the particular error is from July. How can I determine the optimal setting for queue size?

On Monday, October 13, 2014 3:21:32 PM UTC-7, Mark Walkom wrote: Increasing queues isn't going to help if there are underlying problems stopping the processing. Based on those errors it looks like you may have network issues, but they are from July? Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com

On 14 October 2014 08:16, Zaki Agha za...@roblox.com wrote: Hi, we have several Elasticsearch clusters. Recently we faced an issue in which one of our nodes experienced queueing; in fact, the queue length was greater than 1000, and subsequent requests were rejected because the queue was full. Should we increase the default queue size? I understand that there are several queues within Elasticsearch (see http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-threadpool.html):

1. Index - default queue_size 200
2. Bulk - default 50
3. Get - default 1000
4. Search - default 1000
5. Suggest - default 1000
6. Percolate - default 1000

Errors:

Error #1:
[[LApp45][SiyuJOHVRRG1udLiFwM9Yw][es1][inet[/xxx.xxx.xxx.xxx:9300]]], id [84124759]
[2014-07-13 04:13:35,332][WARN ][transport] [es2] Received response for a request that has timed out, sent [55372ms] ago, timed out [25372ms] ago, action [discovery/zen/fd/ping], node [[LApp37][FKVv20F4RSiEsxJ4Bo8rMA][es3][inet[/xxx.xxx.xxx.xxx:9300]]], id [80874233]

Error #2:
[2014-07-13 06:28:26,043][WARN ][transport] [es2] Received response for a request that has timed out, sent [55795ms] ago, timed out [25795ms] ago, action [discovery/zen/fd/ping], node

Error #3:
[2014-07-13 06:28:26,049][WARN ][transport] [es2] Received response for a request that has timed out, sent [56023ms] ago, timed out [26023ms] ago, action [discovery/zen/fd/ping], node [[es3][FKVv20F4RSiEsxJ4Bo8rMA][es3][inet[/xxx.xxx.xxx.xxx:9300]]], id [84124758]

Error #4 (there are several errors of this type, all for the same index, aggregated_user_game_points):
[2014-07-13 06:28:26,153][DEBUG][action.search.type] [es2] [aggregated_user_game_points][3], node[8qI5LGo2TxG1S-mQUgEA_w], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@3367563e] lastShard [true] org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 1000) on org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$4@71bd1bf (rest of the error message omitted)
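If, after ruling out the underlying problem, a larger search queue is still wanted, the shape of the change is a cluster settings update. A sketch of the settings body (whether threadpool settings are dynamically updatable depends on the ES version; the value 2000 is an arbitrary assumption, and as Mark notes, a bigger queue only hides back-pressure):

```python
# Sketch of raising the search thread pool queue via cluster settings.
# A transient setting reverts on full cluster restart; use "persistent"
# (or elasticsearch.yml) to keep it.
settings = {
    "transient": {
        "threadpool.search.queue_size": 2000
    }
}
assert settings["transient"]["threadpool.search.queue_size"] > 1000
```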
For more options, visit https://groups.google.com/d/optout.
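[Editor's note] Before raising any queue size, it helps to see which pool is actually rejecting work. The sketch below assumes Elasticsearch 1.x, where the `threadpool.*` settings are dynamically updatable via the cluster settings API; a larger queue only buys headroom if the underlying slowness (here, the network timeouts) is also addressed, since queued requests still consume memory while they wait.

```shell
# Watch queue depth and rejection counts for the search pool on each node:
curl 'localhost:9200/_cat/thread_pool?v&h=host,search.active,search.queued,search.rejected'

# If rejections persist and the nodes have headroom, raise the search queue.
# The value 2000 is illustrative, not a recommendation:
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": { "threadpool.search.queue_size": 2000 }
}'
```

These commands need a running 1.x cluster, so treat them as an operational sketch rather than something to paste blindly.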
Re: [ANN] Elasticsearch CSV plugin for formatting search responses as CSV
This is priceless. Thank you.

On Wednesday, July 16, 2014 12:23:11 AM UTC+1, Jörg Prante wrote:

Hi,

I wrote a little plugin for formatting search responses as CSV (comma-separated values). This format is useful for extracting some (or all) fields from ES JSON and wrapping them into a tabular display, e.g. for exporting to spreadsheet tools.

More info: https://github.com/jprante/elasticsearch-csv

In the hope it's useful,
Jörg

To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/45604748-47dd-4203-853b-8c64ec93f7b9%40googlegroups.com.
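[Editor's note] For readers who cannot install the plugin, the transformation it performs (hits in a JSON search response flattened to CSV rows) can be hand-rolled for simple flat documents. This is a stdlib-only sketch, not the plugin's own code; the sample response is made up for illustration.

```python
import csv
import io
import json

def hits_to_csv(response_json, fields):
    """Flatten the hits of an Elasticsearch _search response body into CSV text.

    `fields` picks which _source keys become columns; keys not listed are ignored.
    """
    response = json.loads(response_json)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for hit in response["hits"]["hits"]:
        writer.writerow(hit["_source"])
    return out.getvalue()

# A made-up two-hit response, standing in for a real _search body:
sample = json.dumps({
    "hits": {"total": 2, "hits": [
        {"_id": "1", "_source": {"user": "alice", "score": 9}},
        {"_id": "2", "_source": {"user": "bob", "score": 7}},
    ]}
})
print(hits_to_csv(sample, ["user", "score"]))
```

Nested `_source` documents would need flattening first; the plugin handles the general case, which is why it exists.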
word delimiter
Hello,

I am experimenting with word_delimiter and have an example with a special character that is indexed. The character is in the type table for the word delimiter. Analysis of the tokenization looks good, but when I attempt to do a match query it doesn't seem to respect the tokenization as expected. The example indexes 'HER2+ Breast Cancer'. Tokenization is 'her2+', 'breast', 'cancer', which is good. However, searching for 'HER2\\+' results in a hit, and so does 'HER2\\-'.

#!/bin/sh
curl -XPUT 'http://localhost:9200/specialchars' -d '{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 1
    },
    "analysis": {
      "filter": {
        "special_character_spliter": {
          "type": "word_delimiter",
          "split_on_numerics": false,
          "type_table": ["+ => ALPHA", "- => ALPHA"]
        }
      },
      "analyzer": {
        "schar_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase", "special_character_spliter"]
        }
      }
    }
  },
  "mappings": {
    "specialchars": {
      "properties": {
        "msg": { "type": "string", "analyzer": "schar_analyzer" }
      }
    }
  }
}'

curl -XPOST 'localhost:9200/specialchars/specialchars/1' -d '{"msg": "HER2+ Breast Cancer"}'
curl -XPOST 'localhost:9200/specialchars/specialchars/2' -d '{"msg": "Non-Small Cell Lung Cancer"}'
curl -XPOST 'localhost:9200/specialchars/specialchars/3' -d '{"msg": "c.2573TG NSCLC"}'
curl -XPOST 'localhost:9200/specialchars/_refresh'

curl -XGET 'localhost:9200/specialchars/_analyze?field=msg&pretty=1' -d 'HER2+ Breast Cancer'
#curl -XGET 'localhost:9200/specialchars/_analyze?field=msg&pretty=1' -d 'Non-Small Cell Lung Cancer'
#curl -XGET 'localhost:9200/specialchars/_analyze?field=msg&pretty=1' -d 'c.2573TG NSCLC'

printf 'HER2+\n'
curl -XGET 'localhost:9200/specialchars/_search?pretty' -d '{
  "query": { "match": { "msg": { "query": "HER2\\+" } } }
}'

printf 'HER2-\n'
curl -XGET 'localhost:9200/specialchars/_search?pretty' -d '{
  "query": { "match": { "msg": { "query": "HER2\\-" } } }
}'

curl -X DELETE 'localhost:9200/specialchars'

-- You received this message because you are subscribed to the Google Groups elasticsearch group. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/becb02b7-72f0-42dd-b347-5f031fa154d3%40googlegroups.com.
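[Editor's note] A match query analyzes its query string with the field's own analyzer, so the first thing to check is whether the exact bytes that reach Elasticsearch (backslash included, after shell and JSON escaping) produce the same tokens as the indexed text. A debugging sketch, assuming the index above exists:

```shell
# Index-time tokens for the stored document:
curl -XGET 'localhost:9200/specialchars/_analyze?field=msg&pretty=1' -d 'HER2+ Breast Cancer'

# Query-time tokens for the escaped strings exactly as the match query sees them.
# If the backslash survives escaping, it is not in the type_table and may act as
# a subword delimiter, giving different tokens than 'her2+':
curl -XGET 'localhost:9200/specialchars/_analyze?field=msg&pretty=1' -d 'HER2\+'
curl -XGET 'localhost:9200/specialchars/_analyze?field=msg&pretty=1' -d 'HER2\-'
```

If the token lists differ, that explains why both escaped queries behave alike. Note also that the backslash escaping shown in the original queries is query_string syntax; a match query takes its text literally, so 'HER2+' needs no escaping there. These commands need a running cluster, so they are a sketch rather than a verified transcript.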