deploying ElasticSearch to a large memory server

2015-04-21 Thread Tzahi jakubovitz
Hi all,
I have a server with 1.5 TB memory.
I can either use it with a single ES process, or launch a few separate 
instances (using VMs, Docker, or just different ports on the same 
server OS).

What would be a reasonable number of instances for such a server?
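As a rough sizing sketch: Elastic's commonly cited guidance is to keep each node's JVM heap below about 32 GB, so the JVM can keep using compressed object pointers, and to leave around half of the machine's RAM to the operating system's file cache, which Lucene relies on. The function below only turns those two heuristics into an upper bound on nodes per host. The 31 GB cap and the 50% page-cache share are assumptions to adjust, and the function name is hypothetical. CPU cores, disks, and operational overhead will usually argue for fewer nodes. If several nodes share one host, the `cluster.routing.allocation.same_shard.host` setting keeps a primary and its replica from landing on the same machine.

```python
def max_nodes_per_host(total_ram_gb: float,
                       heap_cap_gb: float = 31.0,
                       page_cache_fraction: float = 0.5) -> int:
    """Upper bound on ES nodes per host under two sizing heuristics (assumed)."""
    if total_ram_gb <= 0:
        raise ValueError("total_ram_gb must be positive")
    # RAM left for JVM heaps after reserving a share for the OS page cache.
    heap_budget_gb = total_ram_gb * (1.0 - page_cache_fraction)
    # Each node's heap stays under the compressed-oops threshold; at least one node.
    return max(1, int(heap_budget_gb // heap_cap_gb))


if __name__ == "__main__":
    # 1.5 TB = 1536 GB, the server described in the post.
    print(max_nodes_per_host(1536))
```

For the 1.5 TB machine this gives 24, which is a ceiling implied by the arithmetic, not a recommendation.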


You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
To view this discussion on the web visit
For more options, visit

selecting a server - a single quad socket, or two dual socket

2015-04-21 Thread Tzahi jakubovitz

Today we can buy high-performance servers at very reasonable prices.

For example, the price of two dual-socket servers with 512 GB of memory each is 
comparable to that of a single quad-socket server with 1024 GB (1 TB) of memory 
(assuming the same number of cores and the same clock speed on each CPU).

My gut feeling is that a single quad-socket server will perform better, 
since balancing shards and indices is simpler on one machine, especially 
when a query targets specific shards.

Thanks for your opinion.
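One factor the shard-balancing argument leaves out is replica placement. The toy model below is a hypothetical round-robin allocator, far simpler than Elasticsearch's real one (which also weighs shard counts and disk usage). It assumes several nodes per physical host and mimics the `cluster.routing.allocation.same_shard.host` rule that no two copies of one shard may share a host. Under that rule a single quad-socket host cannot hold any replicas, while two dual-socket hosts can each hold one copy of every shard.

```python
def allocate(num_primaries, num_replicas, nodes, same_shard_host=True):
    """Toy round-robin shard placement (hypothetical; not ES's real allocator).

    nodes: list of (host, node_name) tuples.
    same_shard_host: if True, two copies of one shard may not share a host,
    mimicking cluster.routing.allocation.same_shard.host; otherwise they
    only may not share a node.
    Returns (assignment, unassigned): assignment maps node -> [(shard, copy)],
    where copy 0 is the primary.
    """
    assignment = {node: [] for node in nodes}
    unassigned = []
    cursor = 0
    for shard in range(num_primaries):
        used = set()  # hosts (or nodes) already holding a copy of this shard
        for copy in range(num_replicas + 1):
            for step in range(len(nodes)):
                node = nodes[(cursor + step) % len(nodes)]
                key = node[0] if same_shard_host else node
                if key not in used:
                    assignment[node].append((shard, copy))
                    used.add(key)
                    cursor = (cursor + step + 1) % len(nodes)
                    break
            else:  # no eligible node for this copy
                unassigned.append((shard, copy))
    return assignment, unassigned


if __name__ == "__main__":
    one_host = [("quad", f"node{i}") for i in range(4)]
    two_hosts = [("a", "a1"), ("a", "a2"), ("b", "b1"), ("b", "b2")]
    print(allocate(4, 1, one_host)[1])
    print(allocate(4, 1, two_hosts)[1])
```

With one host, `allocate(4, 1, nodes)` leaves every replica in the unassigned list; spreading the same four nodes across two hosts places all eight copies.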



is it possible to get query results from document values ?

2014-11-25 Thread Tzahi jakubovitz

Hi all,

I need to query an index with tens of millions of short documents.

The result set may contain 100,000 documents, and I need to process a 
single field from each document. If that field is read as a simple stored 
field from the *.fdt file, it will take more or less forever.


I thought doc values would let me read a single field from each document, 
but I cannot make it work.


Is there a way to make a query return a single field that is stored as a doc 
value in the *.dvd file, as opposed to slowly digging it out of the *.fdt 
file?
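A sketch of a search body that asks for a single doc-values field and skips the stored-field lookup. It assumes Elasticsearch 5.x or later, where `docvalue_fields` returns values read from doc values and `"stored_fields": "_none_"` disables loading stored fields (including metadata such as `_id`); it also assumes the field is mapped with doc values (for example a `keyword` field). In the 1.4.x line the closest equivalent was the `fielddata_fields` parameter. The function name and the default `size` are hypothetical; `doc_value`, `r_int3`, and `929` are borrowed from the scan-query post further down. Check these parameters against the documentation for your version.

```python
import json


def docvalue_request(field, filter_field, filter_value, size=1000):
    """Search body returning one doc-values field per hit (ES 5.x+ syntax assumed)."""
    return {
        "size": size,                   # hypothetical page size
        "_source": False,               # don't fetch _source
        "stored_fields": "_none_",      # skip stored fields, including _id
        "docvalue_fields": [field],     # read the value from doc values instead
        "query": {"bool": {"filter": {"term": {filter_field: filter_value}}}},
    }


if __name__ == "__main__":
    print(json.dumps(docvalue_request("doc_value", "r_int3", 929), indent=2))
```

With elasticsearch-py 7.x the body can be passed as `es.search(index="myindex", body=body)`; each hit then carries the value as a list under `hit["fields"]["doc_value"]`.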



Re: is it possible to get query results from document values ?

2014-11-25 Thread Tzahi jakubovitz

Thanks so much.


But the answer is very frustrating.

Getting large result sets will always be slow, even if I need just a 
single field.


Only aggregations and facets get to use doc values; we commoners have to 
dig our fields out of the *.fdt file.
Bugger, and thanks again.


Re: scan query that returns document values only is heavily accessing the *.FDT file .

2014-11-24 Thread Tzahi jakubovitz
Sorry, I did not stress that these are *document* values and not *field* values.
Document values are stored in the DVD file, which uses a small, compressed format. 
I defined them to avoid having to access and parse the Lucene document from 
the huge FDT file (in my test, the FDT file is 1000 times bigger than the DVD file).

I am still trying to avoid accessing the FDT file; it makes my query too slow.

Thanks again.
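A toy illustration of why scanning a doc-values file can be so much cheaper than scanning the stored-fields file: when values are stored row by row, reading one field means touching every document's bytes, whereas a column keeps only that field's values together. JSON byte counts stand in for disk reads here, which is a simplifying assumption; Lucene compresses stored fields in blocks and encodes doc values per type, so real ratios (such as the 1000x mentioned above) will differ. The function names and documents are hypothetical.

```python
import json


def bytes_read_row_layout(docs):
    # Row layout: fetching one field means reading (and parsing) each whole document.
    return sum(len(json.dumps(doc).encode("utf-8")) for doc in docs)


def bytes_read_column_layout(docs, field):
    # Column layout: one field's values sit together, so only they are read.
    return len(json.dumps([doc[field] for doc in docs]).encode("utf-8"))


if __name__ == "__main__":
    # Hypothetical documents: a short doc_value plus a large unrelated payload.
    docs = [{"doc_value": f"v{i:05d}", "payload": "x" * 500} for i in range(1000)]
    row = bytes_read_row_layout(docs)
    col = bytes_read_column_layout(docs, "doc_value")
    print(row, col, round(row / col))
```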


scan query that returns document values only is heavily accessing the *.FDT file .

2014-11-23 Thread Tzahi jakubovitz

Hi all,

I have a test index with 43 million documents. Each document has a string 
doc-values field holding a value of about 5-10 characters.

Mapping is:

{
  "myindex": {
    "mappings": {
      "num_type": {
        "_type": { "store": true },
        "properties": {
          "doc_value": { "type": "string", "doc_values_format": "default" },
          "int1": { "type": "integer", "index": "analyzed", "store": true },
          "int2": { ...

I need to retrieve only the doc values, for queries that may return a result 
set of about 100,000 documents. I do not need ranking or anything else that 
would slow this down.


My understanding is that if the query is only a filter, ranking is not 
computed, and it is faster.

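Here is a small Python program to test it:

    import elasticsearch

    es = elasticsearch.Elasticsearch()

    # Scan search type: no scoring or sorting; results come back in batches
    # through a scroll cursor that stays open for 10 seconds between calls.
    results = es.search(
        index='myindex', doc_type='num_type',
        body={
            'query': {'filtered': {
                'query': {'match_all': {}},
                'filter': {'term': {'r_int3': 929}}}},
            # assumed: the field list was lost from the original post
            'fields': ['doc_value'],
        },
        search_type='scan', scroll='10s')

    while True:
        # Fetch the next batch; each response carries a fresh scroll id.
        results = es.scroll(scroll_id=results['_scroll_id'], scroll='10s')
        if len(results['hits']['hits']) == 0:
            break  # scroll exhausted
        # ... process results['hits']['hits'] here ...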


The query runs pretty slowly, and I see a huge number of accesses to the 
*.fdt (stored field data) file.

But I only ask for a doc-values field, so why does ES access the *.fdt file?

Thanks a lot in advance.
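The program above uses the 1.x `filtered` query with `match_all`, plus the scan search type, to avoid scoring. A sketch of the same filter-only request in both syntaxes, assuming the documented version history: `filtered` was deprecated in 2.0 and removed in 5.0 in favor of a `bool` query whose only clause is a `filter`, and `search_type=scan` was also removed in 5.0, with a scroll sorted by `_doc` as the recommended replacement. The function names are hypothetical; `r_int3` and `929` come from the post.

```python
def filter_only_query_v1(field, value):
    """ES 1.x: 'filtered' query; match_all plus a non-scoring filter."""
    return {"query": {"filtered": {
        "query": {"match_all": {}},
        "filter": {"term": {field: value}},
    }}}


def filter_only_query_modern(field, value):
    """ES 2.x and later: bool query with only a filter clause (no scoring).

    Sorting on _doc is the cheap index-order sort suggested for scrolling
    once search_type=scan is gone.
    """
    return {
        "query": {"bool": {"filter": {"term": {field: value}}}},
        "sort": ["_doc"],
    }
```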



has-child query question

2014-04-03 Thread Tzahi jakubovitz
Hi All,
When my query contains a has-child query - can I get the child documents as 
part of their parent documents?



Re: has-child query question

2014-04-03 Thread Tzahi jakubovitz

Thanks so much.

I have many small child documents (well, actually records) for each parent, 
so nested objects would cause all child documents to be reindexed with each 
new child.


So the only difference between a has_child query and a has_child filter is 
that the query lets you influence the score?


Thanks again; I will need to scratch my head quite heavily.

On Friday, April 4, 2014 1:04:33 AM UTC+3, Binh Ly wrote:

 Unfortunately no. If you can afford to do nested objects instead, then you 
 get back the whole doc with children.
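For later readers: the answer above predates Elasticsearch 1.5, which added `inner_hits`. Placed inside a `has_child` query, it returns the matching child documents alongside each parent hit, which is close to what the original question asked for. The sketch also sets `score_mode`, which controls how the scores of matching children roll up into the parent's score (for example `max`, `sum`, `avg`, or the default `none`); that scoring is the main thing the has_child query adds over the has_child filter in the 1.x releases. The function name, the `detection` child type, and the `location` field are hypothetical.

```python
import json


def has_child_with_inner_hits(child_type, child_query, score_mode="max", inner_size=10):
    """has_child query that also returns the matching children (inner_hits, ES 1.5+)."""
    return {
        "query": {
            "has_child": {
                "type": child_type,                  # hypothetical child mapping type
                "score_mode": score_mode,            # how child scores roll up into the parent
                "query": child_query,
                "inner_hits": {"size": inner_size},  # children returned per parent hit
            }
        }
    }


if __name__ == "__main__":
    body = has_child_with_inner_hits("detection", {"term": {"location": "A"}})
    print(json.dumps(body, indent=2))
```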


Retrieving parent document according to relations between child documents

2014-03-30 Thread Tzahi jakubovitz

I am new to ES, so please bear with me.

My data model is a parent-child relationship.

The parent document contains attributes of people. The child document 
contains a time and a location for that person. In a relational model, it would 
look like:

    CREATE TABLE Parent (
        personId      INT,
        personName    VARCHAR);

    CREATE TABLE child (
        personId      INT,
        location      VARCHAR,
        detectionTime DATETIME);

A possible query on this model is:

A person named X that was spotted at location A, and then, within 10 
minutes, was spotted at location B

In SQL, it would look like:

    SELECT Parent.personId, C1.detectionTime
    FROM Parent, child AS C1, child AS C2
    WHERE Parent.personId = C1.personId
      AND Parent.personId = C2.personId
      AND C1.location = 'A'
      AND C2.location = 'B'
      AND Parent.personName = 'X'
      AND C2.detectionTime BETWEEN C1.detectionTime
                               AND C1.detectionTime + INTERVAL '10' MINUTE;

The BETWEEN part of the query is the problem: no retrieval system that I 
am aware of can do it.

I guess the way to ask it is to request a parent document with name=X 
that has child document(s) with location A and child document(s) with 
location B. Once the parent and child documents are retrieved, the 
requesting program will filter out the results that do not match the 
within-10-minutes condition.
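A sketch of the two-step approach just described, under stated assumptions. The first function builds the parent query (name X, at least one child at location A and one at location B) in 2.x-and-later `bool`/`filter` syntax, assuming `personName` and `location` are mapped as exact-match (not analyzed) fields and that the child mapping type is called `child` (hypothetical). The second function is the client-side step: it applies the 10-minute window from the SQL to the detections returned for one person, using a binary search over the sorted B times rather than a nested loop. Both function names are hypothetical.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def parent_query(name, loc_a, loc_b, child_type="child"):
    """Parents named `name` having a child at loc_a and a child at loc_b."""
    def seen_at(loc):
        return {"has_child": {"type": child_type,
                              "query": {"term": {"location": loc}}}}
    return {"query": {"bool": {"filter": [
        {"term": {"personName": name}},
        seen_at(loc_a),
        seen_at(loc_b),
    ]}}}


def a_then_b_within(detections, loc_a, loc_b, window=timedelta(minutes=10)):
    """Times at loc_a followed by a detection at loc_b within `window`.

    detections: iterable of (location, datetime) pairs for one person.
    Mirrors: C2.detectionTime BETWEEN C1.detectionTime AND C1.detectionTime + 10 min.
    """
    detections = list(detections)
    a_times = sorted(t for loc, t in detections if loc == loc_a)
    b_times = sorted(t for loc, t in detections if loc == loc_b)
    matches = []
    for ta in a_times:
        i = bisect_left(b_times, ta)  # first B detection at or after ta
        if i < len(b_times) and b_times[i] <= ta + window:
            matches.append(ta)
    return matches


if __name__ == "__main__":
    t0 = datetime(2014, 3, 30, 12, 0)
    person = [("A", t0), ("B", t0 + timedelta(minutes=7)),
              ("A", t0 + timedelta(hours=1)), ("B", t0 + timedelta(hours=2))]
    print(a_then_b_within(person, "A", "B"))
```

Unlike the SQL join, which yields one row per matching (C1, C2) pair, `a_then_b_within` reports each qualifying A detection once.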

This solution is far from optimal:

1. Wasted bandwidth in returning documents that will be filtered out.

2. Wasted computation on ranking and sorting those documents.

3. Invalid facets, since they are computed over documents that the client later discards.


Is there a way to do the filtering at the shard level? (Even if it requires 

