deploying ElasticSearch to a large memory server
Hi all, I have a server with 1.5 TB of memory. I can either run a single ES process on it, or launch a few separate instances (using VMs, Docker, or just different ports on the same server OS). What would be a reasonable number of instances for such a server? Thanks, Tzahi -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/8909b6ad-2435-4804-900a-bfdec2aaddea%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
selecting a server - a single quad socket, or two dual socket
Today we can buy very performant servers at very reasonable price points. For example, the price of two dual-socket servers with 512 GB memory is comparable to that of a single quad-socket server with 1024 GB (1 TB) memory (assuming the same number of cores and the same MHz on each CPU). My gut feeling is that a single quad-socket server will give better performance, since balancing shards and indexes across servers is simpler, especially if a query targets certain shards. Thanks for your opinion. Tzahi
is it possible to get query results from document values ?
Hi all, I need to query an index with tens of millions of short documents. The result set may contain 100,000 documents, and I need to process a single field from each document. If those are simple stored fields in the *.fdt file, it will more or less take forever. I thought document values would answer my need to read a single field from each document, but I cannot make it work. Is there a way to make a query return a single field that is stored as a doc value in the *.dvd file, as opposed to slowly digging it out of the *.fdt file? Thanks
Re: is it possible to get query results from document values ?
Thanks so much. But the answer is very frustrating. Getting large result sets will always be slow, even if I need just a single field. Only aggregations and facets enjoy document values; we commoners need to dig our fields out of the *.fdt file. Bugger, and thanks again
Re: scan query that returns document values only is heavily accessing the *.FDT file .
Thanks. Sorry, I did not stress that these are *document* values and not *field* values. Document values are stored in the DVD file, which is a small, compressed format. I defined it to avoid having to access and parse the Lucene document from the huge FDT file (in my test, the FDT file is 1000 times bigger than the DVD file). See https://lucene.apache.org/core/4_3_1/core/org/apache/lucene/codecs/lucene42/Lucene42DocValuesFormat.html . I am still trying to avoid accessing the FDT file; it makes my query too slow. Thanks again.
scan query that returns document values only is heavily accessing the *.FDT file .
Hi all, I have a test index with 43 million documents. There is a string document value for each document (about 5-10 characters per document). The mapping is:

    {
      "myindex": {
        "mappings": {
          "num_type": {
            "_type": { "store": true },
            "properties": {
              "doc_value": { "type": "string", "doc_values_format": "default" },
              "int1": { "type": "integer", "index": "analyzed", "store": true },
              "int2": { . . .

I need to retrieve only the document values, for queries that may return a result set of about 100,000 documents. I do not need ranking or anything else that will slow this down. My understanding is that if the query is only a filter, ranking is not computed and it is faster. Here is a small Python program to test it:

    import elasticsearch

    es = elasticsearch.Elasticsearch()

    # Start a scan-type scroll that asks only for the doc_value field.
    results = es.search(index="myindex", doc_type="num_type", body={
        "fields": ["doc_value"],
        "size": 1000,
        "query": {"filtered": {
            "query": {"match_all": {}},
            "filter": {"term": {"r_int3": 929}}
        }}
    }, scroll="10s", search_type="scan")

    # Keep pulling pages until the scroll returns no more hits.
    while True:
        results = es.scroll(scroll_id=results["_scroll_id"], scroll="10s")
        if len(results["hits"]["hits"]) == 0:
            break

The query runs pretty slowly, and I see a huge number of accesses to the *.fdt (stored field data) file. But I am asking for a document value field, so why does ES access the *.fdt file? Thanks a lot in advance.
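For reference, the scroll loop above discards the hits it fetches. A minimal sketch of how the single field could be collected from each page follows. The helper name `iter_field_values` is hypothetical, and the response shape (each hit carrying a `fields` object whose values are lists) is an assumption about how the `fields` option is returned; it says nothing about which files ES reads on disk.

```python
from typing import Any, Dict, Iterator


def iter_field_values(client: Any, first_page: Dict[str, Any],
                      field: str, scroll: str = "10s") -> Iterator[Any]:
    """Yield every value of `field` from a scan/scroll search, page by page.

    `client` is anything with a scroll(scroll_id=..., scroll=...) method,
    for example an elasticsearch.Elasticsearch instance.
    """
    page = first_page
    while True:
        # Ask for the next page using the scroll id of the previous response.
        page = client.scroll(scroll_id=page["_scroll_id"], scroll=scroll)
        hits = page["hits"]["hits"]
        if not hits:
            break  # an empty page means the scroll is exhausted
        for hit in hits:
            # Assumed hit shape: {"fields": {"doc_value": ["abc"]}}
            for value in hit.get("fields", {}).get(field, []):
                yield value
```

With the program above, this would be called as `list(iter_field_values(es, results, "doc_value"))`, where `results` is the response of the initial `es.search(...)` call.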
has-child query question
Hi All, When my query contains a has-child query - can I get the child documents as part of their parent documents? Thanks
Re: has-child query question
Thanks so much. I have many small child documents (well, actually records) for each parent, so nested objects would cause all child documents to be re-indexed with each new child. So the only difference between a has_child query and a has_child filter is that the query allows you to influence the score? Again, thanks. I will need to scratch my head quite heavily :(

On Friday, April 4, 2014 1:04:33 AM UTC+3, Binh Ly wrote:
> Unfortunately no. If you can afford to do nested objects instead, then you get back the whole doc with children.
Retrieving parent document according to relations between child documents
I am new to ES, so please bear with me. My data model is a parent-child relationship. The parent document contains attributes of people. The child document contains a time and a location for that person. In a relational model, it would look like:

    CREATE TABLE Parent (personId INT, personName VARCHAR);
    CREATE TABLE Child (personId INT, location VARCHAR, detectionTime DATETIME);

A possible query on this model is: a person named X who was spotted at location A and then, within 10 minutes, was spotted at location B. In SQL, it would look like:

    SELECT Parent.personId, C1.detectionTime
    FROM Parent, Child AS C1, Child AS C2
    WHERE Parent.personId = C1.personId
      AND Parent.personId = C2.personId
      AND C1.location = 'A'
      AND C2.location = 'B'
      AND Parent.personName = 'X'
      AND C2.detectionTime BETWEEN C1.detectionTime
                               AND C1.detectionTime + INTERVAL '10' MINUTE;

The BETWEEN part of the query is the problem. No retrieval system that I am aware of can do it. I guess the way to ask it is to request a parent document with name = X that has child document(s) with location A and child document(s) with location B. Once the parent and child documents are retrieved, the requesting program filters out the results that do not match the "within 10 minutes" condition. This solution is far from optimal:

1. Wasted bandwidth in returning documents that will be filtered out.
2. Wasted computation on ranking and sorting those documents.
3. It invalidates facets.

Is there a way to do the filtering at the shard level? (Even if it requires programming.)
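The two-step workaround described above can be sketched as follows. This is only an illustration of the client-side approach, not a shard-level solution. The child type name `child`, the field names, the use of `match` and `term` clauses, and the function names are all assumptions made for the example, and the sketch presumes the child detections for each candidate parent have already been fetched separately.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

Detection = Tuple[str, datetime]  # (location, detectionTime)


def build_candidate_query(name: str, loc_a: str, loc_b: str) -> dict:
    """Request body: parents named `name` with a child at loc_a AND a child at loc_b.

    The type and field names are hypothetical; adjust them to the real mapping.
    """
    def has_child_at(location: str) -> dict:
        return {"has_child": {"type": "child",
                              "query": {"term": {"location": location}}}}

    return {"query": {"bool": {"must": [
        {"match": {"personName": name}},
        has_child_at(loc_a),
        has_child_at(loc_b),
    ]}}}


def seen_at_a_then_b(detections: Iterable[Detection], loc_a: str, loc_b: str,
                     window: timedelta = timedelta(minutes=10)) -> bool:
    """True if some loc_b detection falls within `window` after a loc_a detection."""
    detections = list(detections)
    times_a = [t for loc, t in detections if loc == loc_a]
    times_b = [t for loc, t in detections if loc == loc_b]
    # Same inclusive bounds as the SQL BETWEEN: ta <= tb <= ta + window.
    return any(ta <= tb <= ta + window for ta in times_a for tb in times_b)


def filter_people(children_by_person: Dict[int, List[Detection]],
                  loc_a: str, loc_b: str) -> List[int]:
    """Keep only the person ids whose detections satisfy the time condition."""
    return [pid for pid, dets in children_by_person.items()
            if seen_at_a_then_b(dets, loc_a, loc_b)]
```

The pairwise comparison is quadratic in the number of detections per person, which should be acceptable for small child sets; sorting both lists and sweeping once would be the obvious refinement for larger ones.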