deploying ElasticSearch to a large memory server

2015-04-21 Thread Tzahi jakubovitz
Hi all,
I have a server with 1.5 TB memory.
I can either run it as a single ES process, or launch a few separate 
instances (using VMs, Docker, or just different ports on the same 
server OS).

What would be a reasonable number of instances for such a server?

Thanks,
Tzahi

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/8909b6ad-2435-4804-900a-bfdec2aaddea%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


selecting a server - a single quad socket, or two dual socket

2015-04-21 Thread Tzahi jakubovitz


Today we can buy very performant servers at very reasonable price points.

For example, the price of two dual-socket servers with 512 GB memory is 
comparable to a single quad-socket server with 1024 GB (1 TB) memory 
(assuming the same number of cores and the same MHz on each CPU).

My gut feeling is that a single quad server will give better performance 
since balancing shards and indexes across servers is simpler – especially 
if a query targets certain shards.

Thanks for your opinion.

Tzahi



is it possible to get query results from document values ?

2014-11-25 Thread Tzahi jakubovitz


Hi all,

I need to query an index with tens of millions of short documents.

The result set may contain 100,000 documents, and I need to process a 
single field from each document. If those are plain stored fields in the 
*.fdt file, reading them will take more or less forever.

 

I thought document values would answer my need to read a single field 
from each document, but I cannot make it work.

 

Is there a way to make a query return a single field that is stored as a 
document value in the *.dvd file, as opposed to slowly digging it out of 
the *.fdt file?

 
Thanks



Re: is it possible to get query results from document values ?

2014-11-25 Thread Tzahi jakubovitz




Thanks so much.

 

But the answer is very frustrating.

Getting large result sets will always be slow - even if I need just a 
single field.

 

Only aggregations and facets get to use document values - we commoners need 
to dig our fields out of the *.fdt file.
Bugger – and thanks again 



Re: scan query that returns document values only is heavily accessing the *.FDT file .

2014-11-24 Thread Tzahi jakubovitz
Thanks 
Sorry - I did not stress that these are *document* values and not *field* values.
Document values are stored in the DVD file, which is a small, compressed format. 
I defined the field this way to avoid having to access and parse the Lucene 
document from the huge FDT file (in my test, the FDT file is 1000 times bigger 
than the DVD file).
See 
https://lucene.apache.org/core/4_3_1/core/org/apache/lucene/codecs/lucene42/Lucene42DocValuesFormat.html
.

I am still trying to avoid accessing the FDT file - it makes my query too slow.

Thanks again.






scan query that returns document values only is heavily accessing the *.FDT file .

2014-11-23 Thread Tzahi jakubovitz


Hi all,

I have a test index with 43 million documents. There is a string document 
value for each document (about 5-10 characters per document).

Mapping is:

{
  "myindex" : {
    "mappings" : {
      "num_type" : {
        "_type" : {
          "store" : true
        },
        "properties" : {
          "doc_value" : {
            "type" : "string",
            "doc_values_format" : "default"
          },
          "int1" : {
            "type" : "integer",
            "index" : "analyzed",
            "store" : true
          },
          "int2" : {
            .
            .
            .

I need to retrieve the document values only for queries that may return 
about 100,000 documents result set. I do not need ranking or anything else 
that will slow this down.

 

My understanding is that if the query is only a filter – ranking is not 
computed, and it is faster.

Here is a small python program to test it:


import elasticsearch

es = elasticsearch.Elasticsearch()

# Start a scan-type scroll that requests only the doc_value field.
results = es.search("myindex", "num_type",
    {
        "fields": ["doc_value"],
        "size": 1000,
        "query": {"filtered": {
            "query": {"match_all": {}},
            "filter": {"term": {"r_int3": 929}}
        }}
    }, scroll="10s", search_type="scan")

# Keep scrolling until a page comes back with no hits.
while True:
    results = es.scroll(results["_scroll_id"], scroll="10s")
    if len(results["hits"]["hits"]) == 0:
        break
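
The loop above only drains the scroll and throws the hits away. Below is a minimal sketch of how the single field could be collected from each page. It assumes the response shape I believe this API returns, where requested fields come back as lists under hits.hits[].fields; the helper name scroll_field_values and the FakeClient stub are made up for illustration and are not part of the elasticsearch package. The stub is only there so the loop can be exercised without a running cluster.

```python
def scroll_field_values(client, index, doc_type, body, field, scroll="10s"):
    """Yield the first value of `field` from every hit of a scan/scroll search."""
    # The initial scan request is expected to return a scroll id and no hits.
    results = client.search(index=index, doc_type=doc_type, body=body,
                            scroll=scroll, search_type="scan")
    while True:
        results = client.scroll(scroll_id=results["_scroll_id"], scroll=scroll)
        hits = results["hits"]["hits"]
        if not hits:
            break
        for hit in hits:
            # Assumption: requested fields arrive as lists under "fields".
            values = hit.get("fields", {}).get(field, [])
            if values:
                yield values[0]


class FakeClient:
    """Hypothetical in-memory stand-in for elasticsearch.Elasticsearch."""

    def __init__(self, pages):
        self._pages = list(pages)

    def search(self, **kwargs):
        # Mimics a scan request: a scroll id, but no hits yet.
        return {"_scroll_id": "s0", "hits": {"hits": []}}

    def scroll(self, scroll_id, scroll):
        # Hand out one page per call, then an empty page to end the scroll.
        hits = self._pages.pop(0) if self._pages else []
        return {"_scroll_id": scroll_id, "hits": {"hits": hits}}
```

With a real cluster, the same generator would be called as scroll_field_values(es, "myindex", "num_type", body, "doc_value"), with body being the request dictionary from the program above.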

 

The query runs pretty slowly, and I see a huge number of accesses to the 
*.fdt (field data) file.

But I am asking for a document value field, so why does ES access the *.fdt file?

Thanks a lot in advance.

 




has-child query question

2014-04-03 Thread Tzahi jakubovitz
Hi All,
When my query contains a has-child query - can I get the child documents as 
part of their parent documents?

Thanks



Re: has-child query question

2014-04-03 Thread Tzahi jakubovitz


Thanks so much.

I have many small child documents (well - actually records) for each parent 
- so nested objects will cause all child documents to re-index with each 
new child.

 

So the only difference between a has_child query and filter is that the 
query allows you to influence the score?

 

Again thanks – will need to scratch my head quite heavily :(

On Friday, April 4, 2014 1:04:33 AM UTC+3, Binh Ly wrote:

 Unfortunately no. If you can afford to do nested objects instead, then you 
 get back the whole doc with children.




Retrieving parent document according to relations between child documents

2014-03-30 Thread Tzahi jakubovitz


I am new to ES – so, please bear with me.

My data model is parent-child relationship.

The parent document contains attributes of people. The child document 
contains time and location for that person. In a relational model, it would 
look like:

Create table Parent (
    personId int,
    personName varchar);

Create table child (
    personId int,
    location varchar,
    detectionTime datetime);


A possible query on this model is:

A person named X that was spotted at location A, and then, within 10 
minutes, was spotted at location B

In SQL, it would look like:

select Parent.personId, C1.detectionTime
from Parent, child as C1, child as C2
where
    Parent.personId = C1.personId
    and Parent.personId = C2.personId
    and C1.location = 'A'
    and C2.location = 'B'
    and Parent.personName = 'X'
    and C2.detectionTime between C1.detectionTime
                             and C1.detectionTime + interval '10' minute;
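
To make the intended semantics of the SQL above concrete, here is a small runnable sketch using Python's built-in sqlite3 module. The sample rows are invented, and detectionTime is stored as an integer number of seconds (an assumption made only to keep the arithmetic simple), so the 10-minute window becomes 600.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Parent (personId INTEGER, personName TEXT);
    CREATE TABLE child (personId INTEGER, location TEXT, detectionTime INTEGER);
""")
conn.executemany("INSERT INTO Parent VALUES (?, ?)", [(1, "X"), (2, "Y")])
conn.executemany("INSERT INTO child VALUES (?, ?, ?)", [
    (1, "A", 1000), (1, "B", 1300),  # person X: B seen 5 minutes after A
    (1, "B", 5000),                  # person X: B seen far too late
    (2, "A", 1000), (2, "B", 1100),  # right timing, but wrong person
])

QUERY = """
SELECT Parent.personId, C1.detectionTime
FROM Parent
JOIN child AS C1 ON C1.personId = Parent.personId
JOIN child AS C2 ON C2.personId = Parent.personId
WHERE Parent.personName = ?
  AND C1.location = ?
  AND C2.location = ?
  AND C2.detectionTime BETWEEN C1.detectionTime AND C1.detectionTime + ?
"""

# 600 seconds = the 10-minute window between the two sightings.
rows = conn.execute(QUERY, ("X", "A", "B", 600)).fetchall()
print(rows)
```

Only the sighting of person X at location A that is followed by a sighting at B inside the window is returned; the condition compares two child rows of the same parent with each other, which is the part that is hard to express in a retrieval system.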


The "between" part of the query is the problem. No retrieval system that I 
am aware of can do it.

I guess the way to ask it is to request a parent document with name=X 
that has child document(s) with location A and child document(s) with 
location B. Once the parent and child documents are retrieved, the 
requesting program will filter out the results that do not match the 
"within 10 minutes" condition.
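
A rough sketch of that two-step approach, under several assumptions: the child type is called "child", the fields are named as in the SQL model, and the child documents of each matching parent have already been fetched separately (for example with a second query). Both function names are made up for illustration. The first function only builds a request body; the second is the client-side "within 10 minutes" filter.

```python
from datetime import datetime, timedelta


def build_parent_query(name, loc_a, loc_b, child_type="child"):
    """Request body: parents named `name` with children at both locations."""
    def has_child(location):
        return {"has_child": {"type": child_type,
                              "query": {"match": {"location": location}}}}

    return {"query": {"bool": {"must": [
        {"match": {"personName": name}},
        has_child(loc_a),
        has_child(loc_b),
    ]}}}


def sightings_within(children, loc_a, loc_b, window=timedelta(minutes=10)):
    """Return detection times at loc_a that are followed by loc_b in `window`.

    `children` is a list of dicts with "location" and "detectionTime" keys,
    all belonging to one parent.
    """
    times_a = sorted(c["detectionTime"] for c in children
                     if c["location"] == loc_a)
    times_b = [c["detectionTime"] for c in children if c["location"] == loc_b]
    return [t for t in times_a
            if any(t <= tb <= t + window for tb in times_b)]
```

A parent is kept only if sightings_within returns a non-empty list for its children; everything else was fetched for nothing.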

This solution is far from optimal:

1.   Wasted bandwidth in returning documents that will be filtered out.

2.   Wasted computation on ranking and sorting those documents

3.   Invalidates facets

 

Is there a way to do the filtering at the shard level? (Even if it requires 
programming.) 

 
