Re: Problem with keeping Elasticsearch in sync across two data centers

2014-02-25 Thread Amit Soni
Thanks so much everyone for sharing your thoughts!

-Amit.


On Sun, Feb 23, 2014 at 10:24 AM, Hariharan Vadivelu harii...@gmail.com wrote:


 I think with the current ES version you have 3 options.

 - Use the snapshot and restore feature to take a snapshot in one DC and
 restore it in the other
 - Index into both DCs (so two distinct clusters) at the client level
 - Use the tribe node feature to search or index across multiple clusters

 Reference post

 https://groups.google.com/forum/#!searchin/elasticsearch/TribeNodes/elasticsearch/MG1RerVSWOk/qZFWvr0HPSwJ
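
 As a rough sketch of option 1 (the snapshot and restore API, new in ES 1.0):
 register a repository in both clusters, e.g. with PUT _snapshot/dc_backup and
 the body below, then create snapshots in the primary DC
 (PUT _snapshot/dc_backup/snap_1) and restore them in the other
 (POST _snapshot/dc_backup/snap_1/_restore). The repository name, snapshot
 name, and location here are made up for illustration; a shared (or copied)
 filesystem path, or the AWS plugin's S3 repository, both work.

```json
{
  "type": "fs",
  "settings": {
    "location": "/mnt/backups/es_snapshots",
    "compress": true
  }
}
```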

 On Saturday, February 22, 2014 6:03:13 PM UTC-6, Michael Sick wrote:

 Hi Amit,

 Ivan is correct. You might also check out tribe nodes
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/modules-tribe.html
 and see if it fits your needs for cross-DC replication.
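
 For reference, a tribe node is configured by listing the clusters it should
 join; here is a minimal elasticsearch.yml sketch (the cluster names are made
 up for illustration). The tribe node acts as a federated client across both
 clusters; it does not replicate data between them.

```yaml
tribe:
  dc1:
    cluster.name: cluster-dc1
  dc2:
    cluster.name: cluster-dc2
```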

 --Mike

 On Sat, Feb 22, 2014 at 1:32 PM, Amit Soni amits...@gmail.com wrote:

 Hello Michael - I understand that ES is not built to maintain a consistent
 cluster state across data centers. What I am wondering is whether there is
 a way for Elasticsearch to continue replicating data to a different data
 center (with some delay, of course) so that when the primary center fails,
 the failover data center still has most of the data (except perhaps for the
 last few seconds/minutes/hours).

 Overall I am looking for the right way to implement a cross-data-center
 deployment of Elasticsearch!

 -Amit.


 On Fri, Feb 21, 2014 at 9:37 AM, Michael Sick
 michae...@serenesoftware.com wrote:

 Dario,

 I believe that you're looking for tribe nodes:
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/modules-tribe.html

 ES is not built to cluster consistently across DCs / higher network
 latencies.

 On Fri, Feb 21, 2014 at 11:24 AM, Dario Rossi dari...@gmail.com wrote:

 Hi,
 I have the following problem: our application publishes content to an
 Elasticsearch cluster. We use local data-less nodes for querying
 Elasticsearch, so we don't use the HTTP REST API and the local nodes act
 as the load balancer. Now there is a requirement to have the cluster
 replicated to another data center too (and maybe to yet another in the
 future) for resilience.

 At the very beginning we thought of having one large cluster that goes
 across data centers (crazy). This solution has the following problems:

 - The cluster has the split-brain problem (!)
 - The client data-less nodes will try to send requests across different
 data centers (is there a solution to this?). I can't find a way to avoid
 this. We don't want this to happen because of a) latency and b) firewalling
 issues.

 So we started to think that this solution is not really viable, so we
 thought of having one cluster per data center, which seems more sensible.
 But then we have the problem that we must publish data to all clusters
 and, if one fails, we have no means of rolling back (unless we try to set
 up a complicated version-based rollback system). I find this very
 complicated and hard to maintain, although it may be somewhat doable.

 My biggest problem is that we have to keep the data centers in the
 same state at any time, so that if one goes down, we can readily switch to
 the other.

 Any ideas, or can you recommend some support to help us deal with
 this?

 --
 You received this message because you are subscribed to the Google
 Groups elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send
 an email to elasticsearc...@googlegroups.com.

 To view this discussion on the web visit https://groups.google.com/d/
 msgid/elasticsearch/5424a274-3f6b-4c12-9fe6-621e04f87a8d%
 40googlegroups.com.
 For more options, visit https://groups.google.com/groups/opt_out.







Re: upgrade to elasticsearch 1.0 now ClassCastException: class ElasticSearch090PostingsFormat

2014-02-25 Thread Alexander Reelsen
Hey,

You don't by any chance have two Lucene versions in your project, right? I
would like to know more about that class cast exception, but I fear this is
the most verbose output you get?


--Alex


On Tue, Feb 25, 2014 at 8:03 AM, Kevin J. Smith ke...@rootsmith.ca wrote:

 Hi,

 I am using Elasticsearch embedded in a Tomcat 7 webapp container
 (everything running under Java 7). All libs for Elasticsearch are in
 WEB-INF/lib. On v0.90 everything ran swimmingly. We upgraded to v1.0
 (libs and all, and paid attention to breaking API changes), but now on
 Ubuntu Linux, when I make a call to create an index via the following:

 final CreateIndexResponse response = 
 _client.admin().indices().prepareCreate(index).setSource(mapping).execute().actionGet();

 I get the following exception:

 org.elasticsearch.common.util.concurrent.UncategorizedExecutionException: 
 Failed execution
 at 
 org.elasticsearch.action.support.AdapterActionFuture.rethrowExecutionException(AdapterActionFuture.java:90)
 at 
 org.elasticsearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:50)
 at com.bitstew.search.SearchNode.createIndex(SearchNode.java:1507)
 at 
 com.bitstew.search.SystemInit.loadIndexDefinition(SystemInit.java:206)
 at com.bitstew.search.SystemInit.loadIndex(SystemInit.java:81)
 at com.bitstew.search.SystemInit.loadIndices(SystemInit.java:52)
 at 
 com.bitstew.ws.servlet.SystemAction.loadIndices(SystemAction.java:1798)
 at 
 com.bitstew.ws.servlet.SystemAction.executeAction(SystemAction.java:383)
 at 
 com.bitstew.ws.servlet.WebServicesDeployer.service(WebServicesDeployer.java:1888)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
 at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305)
 at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
 at 
 org.tuckey.web.filters.urlrewrite.RuleChain.handleRewrite(RuleChain.java:176)
 at 
 org.tuckey.web.filters.urlrewrite.RuleChain.doRules(RuleChain.java:145)
 at 
 org.tuckey.web.filters.urlrewrite.UrlRewriter.processRequest(UrlRewriter.java:92)
 at 
 org.tuckey.web.filters.urlrewrite.UrlRewriteFilter.doFilter(UrlRewriteFilter.java:381)
 at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
 at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
 at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
 at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
 at 
 org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:502)
 at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
 at 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
 at 
 org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
 at 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
 at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
 at 
 org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)
 at 
 org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
 at 
 org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: java.lang.NoClassDefFoundError: 
 org/apache/lucene/codecs/PostingsFormat
 at java.lang.ClassLoader.defineClass1(Native Method)
 at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
 at 
 java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
 at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
 at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:270)
 at 
 org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1701)
 at 
 

When a Java process with ES Client terminate, does it automatically close the connection?

2014-02-25 Thread Arinto Murdopo
Hi all,

A question on the Java API for Elasticsearch: when a Java process with an ES
client terminates, does it automatically close the connection? Or should we
explicitly close the connection to free the resources?

Best regards, 

Arinto



Re: engine failure, message [OutOfMemoryError[unable to create new native thread]]

2014-02-25 Thread T Vinod Gupta
Thanks for your response, Jörg; somehow I missed replying earlier.
For some strange reason the max threads setting was reset when I did a
reboot, so I had to set it back to a high number.



On Tue, Feb 11, 2014 at 12:10 AM, joergpra...@gmail.com 
joergpra...@gmail.com wrote:

 Your user ran out of thread/process space. This is reported as an OOM in
 Java.

 You can check the nproc entry in /etc/security/limits.conf for the maximum
 settings and compare this with the process table.

 The OS settings regarding threads are usually OK and should not be
 modified. Check whether you have modified the ES default settings for the
 thread pools, and revert those changes to the defaults. If this does
 not help, you should upgrade from 0.90.6 to 0.90.11.
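
 For example, the nproc entries in /etc/security/limits.conf would look like
 the fragment below (assuming ES runs as the user elasticsearch; the value is
 illustrative and should be tuned for the workload). Unlike an ad-hoc ulimit
 change, entries here persist across reboots, taking effect at the next login
 session.

```
# /etc/security/limits.conf
elasticsearch  soft  nproc  4096
elasticsearch  hard  nproc  4096
```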

 Jörg



 On Tue, Feb 11, 2014 at 6:45 AM, T Vinod Gupta tvi...@readypulse.com wrote:

 Hi,
 I had a stable ES cluster on AWS EC2 instances until a week ago, and I
 don't know what's going on: my cluster keeps getting into a bad state
 every few hours. The error says OOM, but I know that is not the reason;
 the instance has enough heap space left. I'm running ES version 0.90.6 and
 giving half the RAM (8 GB) to the ES process, and I see these messages (the
 same kind of message) in the logs on all the machines in the cluster:

 [2014-02-11 03:17:39,936][WARN ][cluster.action.shard ] [Star-Dancer]
 [facebook_022014][1] sending failed shard for [facebook_022014][1],
 node[zO9Pc1GNSuiVMA_Kn2b3UQ], [R], s[STARTED], indexUUID
 [qN3CUSfVS-m2KlgQQtOqxg], reason [engine failure, message
 [OutOfMemoryError[unable to create new native thread]]]

 Any ideas on how to debug this, or how to figure out what's causing it,
 would be really helpful.





long GC pauses, but only on 1 host in the cluster

2014-02-25 Thread T Vinod Gupta
I'm seeing this consistently happen on only 1 host in my cluster; the other
hosts don't have this problem. What could be the reason, and what's the
remedy?

I'm running ES on an EC2 m1.xlarge host: 16 GB RAM on the machine, of which I
allocate 8 GB to ES.

e.g.
[2014-02-25 09:14:38,726][WARN ][monitor.jvm  ] [Lunatica]
[gc][ParNew][1188745][942327] duration [48.3s], collections [1]/[1.1m],
total [48.3s]/[1d], memory [7.9gb]->[6.9gb]/[7.9gb], all_pools {[Code
Cache] [14.5mb]->[14.5mb]/[48mb]}{[Par Eden Space]
[15.7mb]->[14.7mb]/[66.5mb]}{[Par Survivor Space]
[8.3mb]->[0b]/[8.3mb]}{[CMS Old Gen] [7.8gb]->[6.9gb]/[7.9gb]}{[CMS Perm
Gen] [46.8mb]->[46.8mb]/[168mb]}


thanks



Re: upgrade to elasticsearch 1.0 now ClassCastException: class ElasticSearch090PostingsFormat

2014-02-25 Thread joergpra...@gmail.com
Maybe there are two Elasticsearch jar versions in the class path.

Jörg



Re: long GC pauses, but only on 1 host in the cluster

2014-02-25 Thread Mark Walkom
Depends on a lot of things: Java version, ES version, doc size and count,
index size and count, number of nodes.
Also, what are you monitoring the cluster with?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 25 February 2014 20:21, T Vinod Gupta tvi...@readypulse.com wrote:

 I'm seeing this consistently happen on only 1 host in my cluster; the other
 hosts don't have this problem. What could be the reason, and what's the
 remedy?

 I'm running ES on an EC2 m1.xlarge host: 16 GB RAM on the machine, of which
 I allocate 8 GB to ES.

 e.g.
 [2014-02-25 09:14:38,726][WARN ][monitor.jvm  ] [Lunatica]
 [gc][ParNew][1188745][942327] duration [48.3s], collections [1]/[1.1m],
 total [48.3s]/[1d], memory [7.9gb]->[6.9gb]/[7.9gb], all_pools {[Code
 Cache] [14.5mb]->[14.5mb]/[48mb]}{[Par Eden Space]
 [15.7mb]->[14.7mb]/[66.5mb]}{[Par Survivor Space]
 [8.3mb]->[0b]/[8.3mb]}{[CMS Old Gen] [7.8gb]->[6.9gb]/[7.9gb]}{[CMS Perm
 Gen] [46.8mb]->[46.8mb]/[168mb]}


 thanks





Re: Is there a difference between indexing envelopes or polygons.

2014-02-25 Thread Nicolas THOMASSON
Hey, thanks a lot!

Now it works just fine. I didn't see that coming; I thought ES would have
complained if the envelope's coordinates were reversed. My bad...

Nicolas

On Monday, 24 February 2014 at 15:58:24 UTC+1, Alexander Reelsen wrote:

 Hey,

 if there is an error, can you please open a GitHub issue? However, the
 envelope shape expects you to set an upper-left and a lower-right boundary.
 Your coordinates look more like a lower-left and an upper-right (meaning you
 might actually create quite a huge envelope), which obviously does not
 matter for a polygon.
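
 Concretely, assuming the intended box is the one the polygon in the original
 message describes, the envelope equivalent would list the upper-left corner
 first and the lower-right corner second:

```json
{
  "frame": {
    "type": "envelope",
    "coordinates": [[1, 4], [3, 2]]
  }
}
```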


 --Alex


 On Sat, Feb 22, 2014 at 11:14 AM, Nicolas THOMASSON
 nico.th...@gmail.com wrote:

 Hello,

 I'm new to ES. Please forgive me if I'm asking something stupid.

 Is there a fundamental difference between indexing an envelope or 
 indexing a polygon ?

 For example, if I define the area as an envelope:

 {
   "frame": {
     "type": "envelope",
     "coordinates": [[3, 4], [1, 2]]
   }
 }

 or as a polygon:

 {
   "frame": {
     "type": "polygon",
     "coordinates": [[[3, 4], [3, 2], [1, 2], [1, 4], [3, 4]]]
   }
 }

 Since, to my understanding, they both define the same area, should I be able
 to perform the same queries regardless of how I defined the area?
 (Currently I have a search query that returns wrong results on the envelope
 and seems to perform well on the polygon.)

 Thanks for your help,

 Nicolas







fragment_size not used for simple queries

2014-02-25 Thread Neamar Tucote
Hello,

Using the highlight API for a simple query like this:

curl localhost:9200/company_52fb7b90c8318c4dc86b/_search -d'{
  "fields": [],
  "query": {
    "filtered": {
      "query": {
        "match": {
          "_all": "i do not"
        }
      }
    }
  },
  "highlight": {
    "fields": {
      "metadatas.*": {
        "number_of_fragments": 1,
        "fragment_size": 20
      }
    }
  }
}'

This should return snippets whose size does not exceed 20 characters. Most
of the time this works; however, I do have one document, analyzed with the
same mappings, which yields a really long snippet. In fact, it is not
truncated at all and contains the whole text.

Here is a sample working as expected:

{"took":21,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":19,"max_score":0.24860834,"hits":[
  {"_index":"company_52fb7b90c8318c4dc86b","_type":"document","_id":"5309c5949ba7daaa265ffdd8","_score":0.24860834,"highlight":{"metadatas.text":[", and <em>do</em> not hesitate"]}},
  {"_index":"company_52fb7b90c8318c4dc86b","_type":"document","_id":"5309c5949ba7daaa265ffdd6","_score":0.14883985,"highlight":{"metadatas.text":[" take his child.\n<em>I</em> <em>do</em>"]}},
  {"_index":"company_52fb7b90c8318c4dc86b","_type":"document","_id":"5309c57a9ba7daaa265ffdc8","_score":0.1365959,"highlight":{"metadatas.text":[" resident of DC, <em>I</em> am"]}}]}}

And here is the unruly one:

{"took":122,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":19,"max_score":0.24860834,"hits":[
  {"_index":"company_52fb7b90c8318c4dc86b","_type":"document","_id":"5309c5949ba7daaa265ffdd8","_score":0.24860834,"highlight":{"metadatas.text":[", and <em>do</em> not hesitate"]}},
  {"_index":"company_52fb7b90c8318c4dc86b","_type":"document","_id":"5309c5949ba7daaa265ffdd6","_score":0.14883985,"highlight":{"metadatas.text":[" take his child.\n<em>I</em> <em>do</em>"]}},
  {"_index":"company_52fb7b90c8318c4dc86b","_type":"document","_id":"5309c57a9ba7daaa265ffdc8","_score":0.1365959,"highlight":{"metadatas.text":[" resident of DC, <em>I</em> am"]}},
  {"_index":"company_52fb7b90c8318c4dc86b","_type":"document","_id":"5309c57a9ba7daaa265ffdc7","_score":0.13437755,"highlight":{"metadatas.text":[".\n<em>I</em> <em>do</em> not enlighten those who are not eager to learn, nor arouse\nthose who are not anxious to give an explanation themselves. If <em>I</em>\nhave presented one corner of the square and they cannot come\nback to me with the other three, <em>I</em> should not go over the points\nagain.\n― Confucius\nBesides explaining JavaScript, this book tries to be an introduction to the basic\nprinciples of programming. Programming, it turns out, is hard. The\nfundamental rules are, most of the time, simple and clear. But programs,\nwhile built on top of these basic rules, tend to become complex enough to\nintroduce their own rules, their own complexity. Because of this, programming\nis rarely simple or predictable. As Donald Knuth, who is something of a\nfounding father of the field, says, it is an art.\nTo get something out of this book, more than just passive reading is required.\nTry to stay sharp, make an effort to solve the exercises, and only continue on\nwhen you are reasonably sure you understand the material that came before.\nThe computer programmer is a creator of universes for which he\nalone is responsible. Universes of virtually unlimited complexity can\nbe created in the form of computer programs.\n― Joseph Weizenbaum, Computer Power and Human Reason\nA program is many things. It is a piece of text typed by a programmer, it is\nthe directing force that makes the computer <em>do</em> what it does, it is data in the\ncomputer's memory, yet it controls the actions performed on this same\nmemory. Analogies that try to compare programs to objects we are familiar\nwith tend to fall short, but a superficially fitting one is that of a machine. The\ngears of a mechanical watch fit together ingeniously, and if the watchmaker\nwas any good, it will accurately show the time for many years. The elements\nof a program fit together in a similar way, and if the programmer knows what\nhe is doing, the program will run without crashing.\nA computer is a machine built to act as a host for these immaterial machines.\nComputers themselves can only <em>do</em> stupidly straightforward things. The reason\nthey are so useful is that they <em>do</em> these things at an incredibly high speed. A\nprogram can, by ingeniously combining many of these simple actions, <em>do</em> very\ncomplicated things.\nTo some of us, writing computer programs is a fascinating game. A program\nis a building of thought. It is costless to build, weightless, growing easily under\nour typing hands. If we get carried away, its size and complexity will grow out\nof control, confusing even the one who created it. This is the main problem of\nprogramming. It is why so much of today's software tends to crash, fail,\nscrew up.\nWhen a program works, it is beautiful. The art of programming is the skill of\ncontrolling complexity. The great program is subdued, made simple in its\ncomplexity.\nToday, many programmers believe

Re: long GC pauses, but only on 1 host in the cluster

2014-02-25 Thread joergpra...@gmail.com
Is this node showing more activity than the others? What kind of workload is
this: indexing, search? Are caches used, for filters/facets?

Full GC runs caused by CMS Old Gen may be a sign that you are close to the
memory limit and need to add nodes, but it could also mean a lot of other
things.

Jörg



Re: Problem with keeping Elasticsearch in sync across two data centers

2014-02-25 Thread Dario Rossi
I will try the tribe node feature, even if I don't understand it
completely... but I think it deserves some experimentation.

On Tuesday, 25 February 2014 at 08:05:05 UTC, amit.soni wrote:

 Thanks so much everyone for sharing your thoughts!

 -Amit.



Expanding terms

2014-02-25 Thread Petr Janský
Hello,

I'm trying to find a way to:

   1. expand a term - get all the words, with counts, that are relevant for a
   term (or terms)
   2. get the relevant words for a query - a list of all the words that
   are highlighted
   3. get phrases by word - e.g. for the word war: world war, second world
   war, the second world war, ...

and a complicated one:

   1. is there a way to get/highlight only the words that are relevant
   to multiple term conditions, e.g.:

{
  "must": [
    {
      "wildcard": {
        "content_morfo": {
          "value": "v*"
        }
      }
    },
    {
      "wildcard": {
        "content_morfo": {
          "value": "==AA*=="
        }
      }
    }
  ]
}

thx
Petr
  



Is consistent scoring across 2 documents that match either 1 of 2 properties possible?

2014-02-25 Thread Michelle May
Hi,

We've been struggling with this for a few days now, so I think it is time to
pick the expert brains. It's probably best to explain by delving
straight into an example:


1) Assuming we have the following document:

{
    "id": "/people/person1",
    "dob": "1980-04-12",
    "fullname": "Mickey Arthur Mouse",
    "aliasfullname": "Mickey Bernard Mouse"
}

2) When we do this search:

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "fullname": {
              "query": "mickey arthur mouse"
            }
          }
        },
        {
          "match": {
            "aliasfullname": {
              "query": "mickey arthur mouse"
            }
          }
        }
      ]
    }
  }
}

we get a score of 13.37 (for example)


1) Now, assuming we have this document (same as above except no 
aliasfullname)

{
  "id": "/people/person1",
  "dob": "1980-04-12",
  "fullname": "Mickey Arthur Mouse"
}

2) When we do the search from step 2 above, we get a score of 3.76 (for example)


How can we ensure that if a search is done on either the real name or the 
alias name (we won't know which is being searched on), a person with an 
alias does not get scored higher than someone without an alias? What type 
of query could we use to ensure that both searches return the same score? 
We've tried dis_max and have omit_norms: true on the searched fields, but 
nothing gives the same score, so I am beginning to wonder if it is an 
unrealistic expectation. 

Any assistance/advice would be greatly appreciated.
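
For reference, a sketch of the dis_max form mentioned above (with tie_breaker 0 only the best-matching field contributes, avoiding the sum-of-shoulds inflation; note that index-time statistics can still differ between the two documents, so identical scores are not guaranteed):

```json
{
  "query": {
    "dis_max": {
      "tie_breaker": 0,
      "queries": [
        { "match": { "fullname": "mickey arthur mouse" } },
        { "match": { "aliasfullname": "mickey arthur mouse" } }
      ]
    }
  }
}
```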




Re: ES doesn't take into account field level boost in prefix query over catch-all field?

2014-02-25 Thread Maxim Vorobyov
Hi All. I have the same issue and would highly appreciate answer.
Many Thanks! Maxim



Re: Is there a difference between indexing envelopes or polygons.

2014-02-25 Thread Alexander Reelsen
Hey,

the problem here is that elasticsearch can't tell by itself if the envelope
borders need to be reversed or not... maybe you want/need such an envelope
in your calculations. Hard to tell from a machine's perspective :-)


--Alex


On Tue, Feb 25, 2014 at 10:38 AM, Nicolas THOMASSON 
nico.thomas...@gmail.com wrote:

 Hey thanks a lot !

 Now it works just fine. I didn't see that coming - I thought ES would
 complain if the envelope's coordinates were reversed. My bad...

 Nicolas

 Le lundi 24 février 2014 15:58:24 UTC+1, Alexander Reelsen a écrit :

 Hey,

 if there is an error, can you please open a GitHub issue? However, the
 envelope shape expects you to set an upper-left and a lower-right boundary.
 Your coordinates look more like lower left and upper right (meaning you
 might actually create quite a huge envelope) - which obviously does not
 matter for a polygon.


 --Alex


 On Sat, Feb 22, 2014 at 11:14 AM, Nicolas THOMASSON nico.th...@gmail.com
  wrote:

 Hello,

 I'm new to ES. Please forgive me if I'm asking something stupid.

 Is there a fundamental difference between indexing an envelope and
 indexing a polygon?

 For example, if I define the area as an envelope

 {
   "frame": {
     "type": "envelope",
     "coordinates": [[3,4],[1,2]]
   }
 }

 or as a polygon

 {
   "frame": {
     "type": "polygon",
     "coordinates": [[[3,4],[3,2],[1,2],[1,4],[3,4]]]
   }
 }

 Since, in my understanding, they both define the same area, should I be able
 to perform the same queries whichever way I defined the area?
 (Currently I have a search query that returns wrong results on the envelope
 and seems to perform well on the polygon.)

 Thanks for your help,

 Nicolas
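
For reference, following Alex's explanation (upper-left corner first, then lower-right), the envelope equivalent to the polygon above would swap the corners:

```json
{
  "frame": {
    "type": "envelope",
    "coordinates": [[1,4],[3,2]]
  }
}
```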





Re: [Hadoop] Any good tut to start with?

2014-02-25 Thread Yann Barraud
Hi Costin,

I did not see the video. It's a good starting point; I'm not a big fan of
videos though. I might reproduce it using the Hortonworks sandbox.


Cordialement,
Yann Barraud


2014-02-24 13:35 GMT+01:00 Costin Leau costin.l...@gmail.com:

 Have you looked at the video? It does exactly that.

 Is there something missing?


 On 2/24/2014 12:41 PM, Yann Barraud wrote:

 Hi Costin,

 What I'd love to see is a step-by-step tut to have ES and Hadoop working
 together.

 Is there somewhere I can have something like this ?

 Regards,
 Yann

 Le jeudi 20 février 2014 16:25:28 UTC+1, John Pauley a écrit :

 Any more tutorials, say append to list?

 On Wednesday, February 19, 2014 12:54:15 PM UTC-5, Costin Leau wrote:

 Hi,

 We tried to make the docs friendly in this regard - each section
 (from Map/Reduce to Pig) has several examples.
 There's
 also a short video which guides you through the various features
 (with code) available here [1].

 Hope this helps,

 [1] http://www.elasticsearch.org/videos/search-and-analytics-with-hadoop-and-elasticsearch/

 On 19/02/2014 5:11 PM, Yann Barraud wrote:
  Hi everyone,
 
  Do you have a good pointer to a tut to start playing with ES &
 Hadoop? Using the Hortonworks VM for example?
 
  Thanks.
 
  Cheers,
  Yann
 

 --
 Costin


 --
 Costin





Re: Problem with keeping in sync Elasticsearch across two data centers

2014-02-25 Thread Dario Rossi
From the docs it is not clear whether, with two clusters holding the same 
indexes, an indexing operation will take effect on both...

There is a line that leaves me a bit doubtful:

However, there are a few exceptions:

   - The merged view cannot handle indices with the same name in multiple 
   clusters. It will pick one of them and discard the other.


Il giorno martedì 25 febbraio 2014 10:04:05 UTC, Dario Rossi ha scritto:

 I will try the tribe node feature, even if I don't understand it 
 completely... but I think it deserves some experimentation

 Il giorno martedì 25 febbraio 2014 08:05:05 UTC, amit.soni ha scritto:

 Thanks so much everyone for sharing your thoughts!

 -Amit.


 On Sun, Feb 23, 2014 at 10:24 AM, Hariharan Vadivelu 
 hari...@gmail.comwrote:


 I think with current ES version you have 3 options.

 - Use the great snapshot and restore feature to snapshot from a DC and 
 restore in the other one
 - Index in both DC (so two distinct clusters) from a client level
 - Use Tribe node feature to search or index on multiple clusters

 Reference post

 https://groups.google.com/forum/#!searchin/elasticsearch/TribeNodes/elasticsearch/MG1RerVSWOk/qZFWvr0HPSwJ

 On Saturday, February 22, 2014 6:03:13 PM UTC-6, Michael Sick wrote:

 Hi Amit,

 Ivan is correct. You might also check out I believe that you're 
 looking for TribeNodes http://www.elasticsearch.org/guide/en/
 elasticsearch/reference/master/modules-tribe.html and see if it fits 
 your needs for cross-dc replication. 

 --Mike

 On Sat, Feb 22, 2014 at 1:32 PM, Amit Soni amits...@gmail.com wrote:

 Hello Michael - Understand that ES is not built to maintain consistent 
 cluster state across data centers. what I am wondering is whether there 
 is 
 a way for ElasticSearch to continue to replicate data onto a different 
 data 
 center (with some delay of course) so that when the primary center fails, 
 the fail over data center still has most of the data (may be except for 
 the 
 last few seconds/minutes/hours).

 Overall I am looking for a right way to implement cross data center 
 deployment of elastic-search!

 -Amit.


 On Fri, Feb 21, 2014 at 9:37 AM, Michael Sick 
 michae...@serenesoftware.com wrote:

 Dario,

 I believe that you're looking for TribeNodes http://www.
 elasticsearch.org/guide/en/elasticsearch/reference/
 master/modules-tribe.html

 ES is not built to consistently cluster across DC's / larger network 
 lags. 

 On Fri, Feb 21, 2014 at 11:24 AM, Dario Rossi dari...@gmail.comwrote:

 Hi, 
 I've the following problem: our application publishes content to an 
 Elasticsearch cluster. We use local data-less nodes for querying 
 Elasticsearch, so we don't use the HTTP REST API and the local nodes act 
 as the load balancer. Now we have the requirement of replicating the 
 cluster to another data center too (and in the future maybe another 
 one...) for resilience. 

 At the very beginning we thought of having one large cluster that 
 goes across data centers (crazy). This solution has the following 
 problems:

 - The cluster has the split-brain problem (!)
 - The client data-less nodes will try to do requests across different 
 data centers (is there a solution to this???). I can't find a way to avoid 
 this. We don't want this to happen because of a) latency and b) firewalling 
 issues.

 So we started to think that this solution is not really viable. So 
 we thought of having one cluster per data center, which seems more 
 sensible. But then here we have the problem that we must publish data 
 to 
 all clusters and, if one fails, we have no means of rolling back 
 (unless we 
 try to set up a complicated version based rollback system). I find this 
 very complicated and hard to maintain, although can be somewhat doable. 

 My biggest problem is that we have to keep the data centers in the 
 same state at any time, so that if one goes down, we can readily switch 
 to 
 the other.

 Any ideas, or can you recommend something to help us deal with 
 this?
  

dumping index is slow as hell

2014-02-25 Thread Attila Bukor

Hey guys,

I needed to migrate an index to a new cluster, and after a lot of hesitation
I decided to give taskrabbit's elasticsearch-dump a try:
https://github.com/taskrabbit/elasticsearch-dump

I tested it with 10k documents, which worked fine, so I decided to migrate
the real data to the new cluster with the following command:

elasticdump --input=http://oldcluster:9200/my_index \
--output=http://newcluster:9200/my_index

my_index contains ~5 million documents, so I expected it to take a while,
but not *this* long. It's been running since 10 AM UTC+1 yesterday and it's
migrated only a bit over 1.5 million docs so far - in roughly 28 hours.

When it started, it indexed around 100 docs per second, by the time I went
home from work (around 5 PM UTC+1), it was only around 30 docs/s, now it's
around 10 docs/s.

Being a newbie with Elasticsearch, I don't even know how to diagnose the
reason for this slowness. Could you help me with this?

Keep in mind that I'm at work for 2 or 3 more hours today, but after that,
I won't have access to the servers until next Monday. Feel free to suggest
anything in that time too, I will read it and try to reply, but can't look
into anything or do anything about it.

Regards,
Attila Bukor
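
In case it helps: a common cause of dump speed degrading over time is deep from/size paging, which gets slower as the offset grows. A hedged sketch of pulling the same data with the scan/scroll API instead (the host and index names match the command above; the scroll timeout and batch size are illustrative):

```shell
# Start a scan: returns a _scroll_id instead of deep-paged hits
curl -s 'http://oldcluster:9200/my_index/_search?search_type=scan&scroll=5m&size=500' \
  -d '{"query": {"match_all": {}}}'
# Then fetch batches repeatedly, passing the most recent _scroll_id each time
curl -s 'http://oldcluster:9200/_search/scroll?scroll=5m' -d 'SCROLL_ID_HERE'
```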



Re: Aggregation on parent/child documents

2014-02-25 Thread Augusto Uehara
We run 4 instances of ES 1.0.0 using 30 GB for the JVM heap. We run 64-bit 
OpenJDK 1.7.0_25 on Ubuntu servers.

$ ulimit -a
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 515139
max locked memory   (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files  (-n) 64000
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 8192
cpu time   (seconds, -t) unlimited
max user processes  (-u) 515139
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited

And I also disabled swap on linux.

You can use this gist to simulate the issue we have: 
https://gist.github.com/chaos-generator/9143655



Re: fragment_size not used for simple queries

2014-02-25 Thread Luca Cavanna
It would be useful if you could post a complete recreation, mappings 
included. Which highlighter are you using?

On Tuesday, February 25, 2014 10:39:10 AM UTC+1, Neamar Tucote wrote:

 Hello,

 Using the highlight API for a simple query like this:

 curl localhost:9200/company_52fb7b90c8318c4dc86b/_search -d '{
   "fields": [],
   "query": {
     "filtered": {
       "query": {
         "match": {
           "_all": "i do not"
         }
       }
     }
   },
   "highlight": {
     "fields": {
       "metadatas.*": {
         "number_of_fragments": 1,
         "fragment_size": 20
       }
     }
   }
 }'

 This should return snippets whose size does not exceed 20 characters. Most 
 of the time this works; however, I do have one document, analyzed with the 
 same mappings, which yields really long snippets - in fact, the text is not 
 truncated and contains everything.

 Here is a sample working as expected:

 {"took":21,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},
 "hits":{"total":19,"max_score":0.24860834,"hits":[
 {"_index":"company_52fb7b90c8318c4dc86b","_type":"document","_id":"5309c5949ba7daaa265ffdd8","_score":0.24860834,
  "highlight":{"metadatas.text":[", and <em>do</em> not hesitate"]}},
 {"_index":"company_52fb7b90c8318c4dc86b","_type":"document","_id":"5309c5949ba7daaa265ffdd6","_score":0.14883985,
  "highlight":{"metadatas.text":[" take his child.\n<em>I</em> <em>do</em>"]}},
 {"_index":"company_52fb7b90c8318c4dc86b","_type":"document","_id":"5309c57a9ba7daaa265ffdc8","_score":0.1365959,
  "highlight":{"metadatas.text":[" resident of DC, <em>I</em> am"]}}]}}

 And here is the unruly one:

 {"took":122,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},
 "hits":{"total":19,"max_score":0.24860834,"hits":[
 {"_index":"company_52fb7b90c8318c4dc86b","_type":"document","_id":"5309c5949ba7daaa265ffdd8","_score":0.24860834,
  "highlight":{"metadatas.text":[", and <em>do</em> not hesitate"]}},
 {"_index":"company_52fb7b90c8318c4dc86b","_type":"document","_id":"5309c5949ba7daaa265ffdd6","_score":0.14883985,
  "highlight":{"metadatas.text":[" take his child.\n<em>I</em> <em>do</em>"]}},
 {"_index":"company_52fb7b90c8318c4dc86b","_type":"document","_id":"5309c57a9ba7daaa265ffdc8","_score":0.1365959,
  "highlight":{"metadatas.text":[" resident of DC, <em>I</em> am"]}},
 {"_index":"company_52fb7b90c8318c4dc86b","_type":"document","_id":"5309c57a9ba7daaa265ffdc7","_score":0.13437755,
  "highlight":{"metadatas.text":[".\n<em>I</em> <em>do</em> not enlighten those who are not eager to learn, nor arouse\nthose 
 who are not anxious to give an explanation themselves. If <em>I</em>\nhave presented one corner of the square and they cannot 
 come\nback to me with the other three, <em>I</em> should not go over the points\nagain.\n― Confucius\nBesides explaining 
 JavaScript, this book tries to be an introduction to the basic\nprinciples of programming. Programming, it turns out, is 
 hard. The\nfundamental rules are, most of the time, simple and clear. But programs,\nwhile built on top of these basic rules, 
 tend to become complex enough to\nintroduce their own rules, their own complexity. Because of this, programming\nis rarely 
 simple or predictable. As Donald Knuth, who is something of a\nfounding father of the field, says, it is an art.\nTo get 
 something out of this book, more than just passive reading is required.\nTry to stay sharp, make an effort to solve the 
 exercises, and only continue on\nwhen you are reasonably sure you understand the material that came before.\nThe computer 
 programmer is a creator of universes for which he\nalone is responsible. Universes of virtually unlimited complexity 
 can\nbe created in the form of computer programs.\n― Joseph Weizenbaum, Computer Power and Human Reason\nA program is many 
 things. It is a piece of text typed by a programmer, it is\nthe directing force that makes the computer <em>do</em> what 
 it does, it is data in the\ncomputer's memory, yet it controls the actions performed on this same\nmemory. Analogies that 
 try to compare programs to objects we are familiar\nwith tend to fall short, but a superficially fitting one is that of 
 a machine. The\ngears of a mechanical watch fit together ingeniously, and if the watchmaker\nwas any good, it will 
 accurately show the time for many years. The elements\nof a program fit together in a similar way, and if the programmer 
 knows what\nhe is doing, the program will run without crashing.\nA computer is a machine built to act as a host for these 
 immaterial machines.\nComputers themselves can only <em>do</em> stupidly straightforward things. The reason\nthey are so 
 useful is that they <em>do</em> these things at an incredibly high speed. A\nprogram can, by ingeniously combining many 
 of these simple actions, <em>do</em> very\ncomplicated things.\nTo some of us, writing computer programs is a fascinating 
 game. A program\nis a building of thought. It is costless to build, weightless, growing easily under\nour typing hands. 
 If we get carried away, its size and complexity will grow out\nof control, confusing even the one who created it. This 
 is the main problem of\nprogramming. It 

How can I do date-calculation/conversion in an MVEL script in ES 1.0.0?

2014-02-25 Thread h . b . wassenaar
Hello,

I'm considering upgrading from 0.90.3 to 1.0.0, but I've hit a snag with 
one of the MVEL scripts I use to update documents through the update api. 
My update-script uses Joda to parse/format/manipulate dates, but it appears 
that Joda is no longer available to MVEL scripts in version 1.0.0. (I think 
it changed in commit c7f6c52 from November 24th, so it's been like that for 
a while)

Here are some code-snippets of how it currently works

   parser = Joda.forPattern('dateOptionalTime').parser();
   lastdate = parser.parseMillis(update.date);
   prevdate = parser.parseMillis(ctx._source.published);
   timediff = lastdate-prevdate;
   ...
   nextdate=parser.parseMutableDateTime(update.date);
   nextdate.addHours(calculated_hours); 
   ctx._source.nextupdate = 
nextdate.toString('yyyy-MM-dd\'T\'HH:mm:ss\'Z\'');

Is there some way to do similar date/time calculations in ES 1.0.0?

I've considered I could use a native script to do these updates; however, 
when I wrote the update-script I tested this, and to my surprise using a 
native script proved to be significantly slower than using the MVEL script 
for this use-case.


Any help would be much appreciated,
   -- Harmen
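
One workaround worth trying (an untested sketch - it assumes MVEL scripts in 1.0.0 can still instantiate ordinary JDK classes even though the Joda helper is gone) is to use java.text.SimpleDateFormat for parsing/formatting, with plain millisecond arithmetic in between:

```
   sdf = new java.text.SimpleDateFormat('yyyy-MM-dd\'T\'HH:mm:ss\'Z\'');
   lastdate = sdf.parse(update.date).getTime();
   prevdate = sdf.parse(ctx._source.published).getTime();
   timediff = lastdate - prevdate;
   // add the calculated hours as milliseconds instead of MutableDateTime.addHours()
   nextmillis = lastdate + (calculated_hours * 3600000);
   ctx._source.nextupdate = sdf.format(new java.util.Date(nextmillis));
```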



[new project using es] Elasticboard - tracking github data

2014-02-25 Thread Mihnea Dobrescu-Balaur
Hello again,

Using the recently released GitHub river[1], I'm working on an open source 
dashboard for keeping track of GitHub projects. It's in the working 
prototype state right now, and I'm trying to figure out what kind of 
information is desired and relevant.

The idea is that people/orgs who want to use this will self-deploy their 
own instance, but in order to show what the project is about, I set up a 
hosted demo. There's a landing page here[2] and a demo getting data for 
the elasticsearch repo here[3]. I'd love some feedback!

[1] 
https://groups.google.com/forum/#!searchin/elasticsearch/github$20river/elasticsearch/Oy57lUSn_aY/6w6uBgNcq_MJ
[2] http://elasticboard.mihneadb.net/landing.html
[3] http://elasticboard.mihneadb.net/#/elasticsearch/elasticsearch


Thanks,
Mihnea



Re: the document payload of the Delete api

2014-02-25 Thread Binh Ly
Unfortunately, you'll have to GET it first.



Re: DateRange aggregation semantics - include_lower/include_upper?

2014-02-25 Thread Binh Ly
Yes, you are correct. The "from" is inclusive, and the "to" is exclusive.



Re: Elasticsearch 1.0.0 is now GA

2014-02-25 Thread Tony Su
In principle,
I agree with everything you describe about best practice, but those 
practices become more important only when you're managing larger numbers of 
nodes.
 
For those who manage only 5 nodes, the balance may swing in favor of just 
editing each machine's config directly instead of a more centralized 
strategy. It's a cost/benefit question of which approach requires more work.
 
As far as re-making configs with every version change, from what I've seen 
so far I don't think that is the intention of Elasticsearch (currently). 
The configs I see in elasticsearch.yml are largely consistent across major 
and minor versions... although there are exceptions.
 
But the current scenario doesn't even change versions... much.
The scenario is a reasonable and common reaction when repairing a 
package-based installation. SOP is, after attempting a package 
update/upgrade (which fails) and then a forced update (forced re-install in 
place), the normal last attempt is a manual removal and re-installation 
(which is required when upgrading from RC to GA). And then the config file 
is removed.
 
That said, with my latest RC-to-GA upgrades, I noticed the new workaround 
the packager is now doing. Although the config file is still being wiped 
out, a backup of the config file is being created. So, although a bit 
unusual, it works for me and should prevent the worst complaints in the 
future.
 
Feature Request:
Improving on the current packaging practice of creating a config backup, it 
would also be nice if the old config file could be parsed for uncommented 
settings and merged into the new config file.
 
Tony
 
 
 
 

On Monday, February 24, 2014 12:43:04 PM UTC-8, InquiringMind wrote:

 I am not sure what the complaints are all about.

 Over the past 20 years, my best practices are to treat the installed 
 configurations as a template that is subject to change upon reinstallation. 
 Then, I always create my own configuration and point the server to it, and 
 never point a server to the package's installed configuration.

 And then, I maintain all of my customized configurations separately from 
 the installed packages.

 Pointing to the installed configuration that you've modified is really no 
 different than running the installed jars that you have modified. Would you 
 really expect a reinstallation of Elasticsearch to preserve the changes you 
 have made to the originally installed elasticsearch-1.0.0.jar file?

 The beauty of Elasticsearch's configurations is that they document 
 everything but actually set nothing. That's even better than the 
 configurations for the servers I write, in which I set everything, but to 
 the default values in the code. Same end result; different means of getting 
 there. In fact, the installed config is a big part of the package's 
 documentation about what is available to be configured. So I would expect 
 it to change on each installation.

 And for the turn-key servers I developed in the past, where the configs 
 were not maintained by Puppet or Chef or some other automated tool, I would 
 write a post-installation step that would copy the installed config over a 
 target config, but only if that target config did not exist. That way, the 
 customer could modify the target config and their changes would be 
 preserved. But today, our elasticsearch.yml file and other server configs 
 are maintained by Puppet, and because we don't touch the installed config 
 we never have any problems with overwriting on a reinstallation.

 Brian

 On Monday, February 17, 2014 5:14:46 PM UTC-5, Tony Su wrote:

 What?!
  
 Removing and re-installing the ES package either removes the original or 
 overwrites the existing elasticsearch.yml.
  
 This is contrary to conventional packaging from what I've generally seen.
 Typically, when a package is removed, the configuration file is left alone 
 and must be removed manually if desired.
  
 No big deal in my case - I've been working on elasticsearch.yml heavily 
 for several days so I can remember all the customizations I've made - but 
 IMO this is a disaster waiting to happen for clusters with new admins or 
 those who attempt to fix a problem by removing and re-installing.
  
 Leaving the config file alone and re-using is the safer option.
  
 IMO,
 Tony

  



Re: Elasticsearch 1.0.0 is now GA

2014-02-25 Thread Tony Su
One other issue.
 
I have never been able to deploy an elasticsearch.yml which names the 
cluster node the same as the machine hostname, despite the suggestions in 
another thread. It just won't work, and based on another thread I strongly 
suspect the underlying Java code uses single quotes instead of double 
quotes when evaluating the variable. 
 
So, because it's a unique value that needs to be set on each machine, 
that part of the config won't allow simply pointing all nodes to the same 
config file.
 
That is why, short of looking for the error in the Java code, I've been 
looking at various simple and more enterprise-grade tools that write 
individual config files to each node.
 
Tony
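
For what it's worth, one thing that may sidestep the quoting issue entirely (a sketch - I haven't verified it on this setup) is relying on elasticsearch.yml's ${...} environment-variable substitution, so a single shared file still yields per-machine node names:

```yaml
# elasticsearch.yml shared across machines; HOSTNAME must be exported
# in the environment that launches the JVM for this to resolve.
cluster.name: mycluster
node.name: ${HOSTNAME}
```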



Re: nodes spend all time in RamUsageEstimator after upgrade to 0.90.11

2014-02-25 Thread Binh Ly
This is a known issue and will be fixed shortly. For now, what you can do 
is run _optimize on all your indexes with max_num_segments set to 1, as 
below. Note that this may take a while depending on the size of your 
indexes.

http://localhost:9200/_optimize?max_num_segments=1
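
As a hedged sketch (assuming a node listening on localhost:9200; the index
name in the second command is illustrative, not from the thread), the call
can be issued with curl:

```shell
# Merge every index down to one segment (can be I/O heavy on large indexes):
curl -XPOST 'http://localhost:9200/_optimize?max_num_segments=1'

# Or try it on a single index first (index name is illustrative):
curl -XPOST 'http://localhost:9200/logs-2014.02.25/_optimize?max_num_segments=1'
```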




Relation Between Heap Size and Total Data Size

2014-02-25 Thread Umutcan

Hi,

I created an Elasticsearch cluster with 4 instances. Elasticsearch 0.90.10 
is running on all of them. Heap size is 6 GB for each instance, so 
total heap size is 24 GB. I have 5 shards for each index and each shard 
has 1 replica. A new index is created every day, so all indices are 
nearly the same size.


When total data size reaches around 100 GB (replicas included), my 
cluster begins to fail to allocate some of the shards (status yellow). 
After I delete some old indices and restart all the nodes, everything is 
fine (status green). If I do not delete some data, the status eventually 
turns red.


So, I am wondering whether there is any relationship between heap size and 
total data size. Is there any formula to determine heap size based on 
data size?


Thanks,
Umutcan



Re: Relation Between Heap Size and Total Data Size

2014-02-25 Thread Randy
Probably low on disk on at least one machine. Monitor disk usage. Also look in 
the logs and find out what error you are getting. Report back. 

Sent from my iPhone

 On Feb 25, 2014, at 7:25 AM, Umutcan umut...@gamegos.com wrote:
 
 Hi,
 
 I created a Elasticsearch cluster with 4 instance. Elasticsearch 0.90.10 is 
 running all of them. Heap size is 6 GB for all the instances, so total heap 
 size is 24 GB. I have 5 shard for each index and each shard has 1 replica. A 
 new index is created for every day, so all indices have nearly same size.
 
 When total data size reaches around 100 GB (replicas are included), my 
 cluster begins to  fail to allocate some of the shards (status yellow). After 
 I delete some old indices and restart all the nodes, everything is fine 
 (status is green). If I do not delete some data, status eventually turns red.
 
 So, I am wondering that is there any relationship between heap size and total 
 data size? Is there any formula to determine heap size based on data size?
 
 Thanks,
 Umutcan
 



Re: nodes spend all time in RamUsageEstimator after upgrade to 0.90.11

2014-02-25 Thread Benoît
I forgot to say that one consequence is that the 'head' plugin interface 
remains empty.

The following requests time out:
 *  _status
 *  _stats?all=true
 *  _nodes

How can I get some information about the cluster under these conditions? 

Benoît



Re: Elasticsearch 1.0.0 is now GA

2014-02-25 Thread InquiringMind
I always start Elasticsearch from within my own wrapper script, es.sh.

Inside this wrapper script is the following incantation:

NODE_OPT="-Des.node.name=$(uname -n | cut -d'.' -f1)"

This is verified to work on Linux, Mac OS X, and Solaris (at least).

I then pass $NODE_OPT as a command-line argument to the elasticsearch 
start-up script.

BTW, I seem to recall reading that the es. prefix on the node.name variable 
is no longer needed for 1.0 GA. But it still works fine, so I have 
left it there.

This has always worked since ES 0.19.4 (the very first version I installed 
and started using). I worked closely with our deployment engineer, and we 
settled on a set of wrapper scripts that let me start everything on my 
laptop in exactly the same way that it all starts on a production server.
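
A minimal sketch of such a wrapper (the start-up path in the comment is an
assumption for illustration, not Brian's actual script):

```shell
#!/bin/sh
# es.sh - sketch of a wrapper that names the node after the short hostname.
# uname -n prints the host name; cut keeps only the part before the first
# dot, which is what makes this portable across Linux, OS X, and Solaris.
NODE_OPT="-Des.node.name=$(uname -n | cut -d'.' -f1)"
echo "starting with: $NODE_OPT"
# Hand the option through to the stock start-up script (path illustrative):
# exec /usr/local/elasticsearch/bin/elasticsearch $NODE_OPT "$@"
```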

Brian

On Tuesday, February 25, 2014 10:21:29 AM UTC-5, Tony Su wrote:

 One other issue.
  
 I have never been able to deploy an elasticsearch.yml which names the 
 cluster node the same as the machine hostname despite the suggestions in 
 another thread. It just won't work, and based on another thread I strongly 
 suspect the underlying Java code implements single quotes instead of double 
 quotes when evaluating the variable. 
  
 So, because it's a unique variable that needs to be set on each machine, 
 that part of the config won't allow simply pointing all nodes to the same 
 config script.
  
 Is why, short of looking for the error in the Java code I've been looking 
 at various simple and more enterprise tools that write individual config 
 files to each node.
  
 Tony




Re: the document payload of the Delete api

2014-02-25 Thread InquiringMind
And note that if you GET it and save the version number, and then pass the 
version number into the DELETE, you can be sure it will be deleted only if 
nobody else updated it in the meantime.

This all works so much better in Java than in scripts + curl.
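
For completeness, a hedged curl sketch of that optimistic-concurrency flow
(index, type, id, and version number are illustrative):

```shell
# GET the document and note the _version field in the response:
curl -XGET 'http://localhost:9200/myindex/mytype/1'

# DELETE succeeds only if the stored version still matches (here, 3);
# if someone updated the document in the meantime, ES returns a
# version conflict instead of deleting it:
curl -XDELETE 'http://localhost:9200/myindex/mytype/1?version=3'
```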

Brian

On Tuesday, February 25, 2014 9:35:37 AM UTC-5, Binh Ly wrote:

 Unfortunately, you'll have to GET it first.




Re: Default analyzer when the given analyzer not found?

2014-02-25 Thread InquiringMind
Based on posts to this newsgroup early on in my usage of ES (over a year 
now!), I used to put the following in my elasticsearch.yml file. Any field 
that was not explicitly assigned an analyzer and that was deemed by ES to 
be a string would pick up English snowball analyzer with no stop words (my 
preference at the time):

index:
  analysis:
    analyzer:
      # set stemming analyzer with no stop words as the default
      default:
        type: snowball
        language: English
        stopwords: _none_
    filter:
      stopWordsFilter:
        type: stop
        stopwords: _none_

But since then, I've long abandoned this default approach. Instead, I 
explicitly assigned an analyzer to each and every field (you know, like a 
real database!). And then my elasticsearch.yml file now contains the 
following:

# Do not automatically create an index when a document is loaded, and do
# not automatically index unknown (unmapped) fields:
action.auto_create_index: false
index.mapper.dynamic: false

Therefore, I cannot automatically create an index during a load (which 
would then create a useless index without any of the analyzers and mappings 
I've carefully crafted). And I cannot get ES to automatically create a new 
field; this is very helpful when someone uses a low-level tool such as 
curl, and misspells a field name; ES will no longer create, for example, 
the givveName field when it should have been givenName.
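
For what it's worth, one way to sanity-check which analyzer an index
actually applies by default is the _analyze API; a hedged sketch, with an
illustrative index name:

```shell
# With a snowball default configured, this should return stemmed tokens
# (e.g. "running" reduced to something like "run"):
curl 'http://localhost:9200/myindex/_analyze?text=running'
```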

Brian

On Tuesday, February 25, 2014 8:57:30 AM UTC-5, Frederic Meyer wrote:

 Hey there.

 Nearly one year after this initial post, I'm running into the exact same 
 issue, even though ES is now released (1.0).

 Has anybody found a proper solution within ES? I've spent like 1 hour 
 searching for this, without any luck.

 The only ugly workaround that I can think of right now is deal with a fall 
 back language at the data level i.e. before sending documents to be indexed 
 by ES.

 Thanks.




Re: Default analyzer when the given analyzer not found?

2014-02-25 Thread Frederic Meyer
Ah yes, via the default in the yaml configuration file, of course. I'll 
give that a try, thanks!

It is a pity though that the default analyzer doesn't seem to do its job 
of processing all unmatched documents as far as _analyze is concerned.

Thanks
Fred

P.S.: I do understand your position about not indexing documents for which 
you haven't crafted a dedicated analyzer yet. It makes real sense.

On Tuesday, February 25, 2014 5:09:43 PM UTC+1, InquiringMind wrote:

 Based on posts to this newsgroup early on in my usage of ES (over a year 
 now!), I used to put the following in my elasticsearch.yml file. Any field 
 that was not explicitly assigned an analyzer and that was deemed by ES to 
 be a string would pick up English snowball analyzer with no stop words (my 
 preference at the time):

  index:
    analysis:
      analyzer:
        # set stemming analyzer with no stop words as the default
        default:
          type: snowball
          language: English
          stopwords: _none_
      filter:
        stopWordsFilter:
          type: stop
          stopwords: _none_

 But since then, I've long abandoned this default approach. Instead, I 
 explicitly assigned an analyzer to each and every field (you know, like a 
 real database!). And then my elasticsearch.yml file now contains the 
 following:

 # Do not automatically create an index when a document is loaded, and do
 # not automatically index unknown (unmapped) fields:
 action.auto_create_index: false
 index.mapper.dynamic: false

 Therefore, I cannot automatically create an index during a load (which 
 would then create a useless index without any of the analyzers and mappings 
 I've carefully crafted). And I cannot get ES to automatically create a new 
 field; this is very helpful when someone uses a low-level tool such as 
 curl, and misspells a field name; ES will no longer create, for example, 
 the givveName field when it should have been givenName.

 Brian

 On Tuesday, February 25, 2014 8:57:30 AM UTC-5, Frederic Meyer wrote:

 Hey there.

 Nearly one year after this initial post, I'm running into the exact same 
 issue, even though ES is now released (1.0).

 Has anybody found a proper solution within ES? I've spent like 1 hour 
 searching for this, without any luck.

 The only ugly workaround that I can think of right now is deal with a 
 fall back language at the data level i.e. before sending documents to be 
 indexed by ES.

 Thanks.





Re: nodes spend all time in RamUsageEstimator after upgrade to 0.90.11

2014-02-25 Thread Benoît
Thank you Binh Ly,

On Tuesday, February 25, 2014 4:25:59 PM UTC+1, Binh Ly wrote:

 This is a known issue and will be fixed shortly. For now, what you can do 
 is run _optimize on all your indexes and set max_num_segments to 1, like 
 below. Note that this may take a while depending on the size of your 
 indexes.

 http://localhost:9200/_optimize?max_num_segments


Your suggestion confirms what Jörg Prante said here: 
https://groups.google.com/d/msg/elasticsearch/7mrDhqe6LEo/3gjOJka85OYJ
This is a problem with Lucene segments from version 3.x.

I have around 1 TB of indices, so I'm not really eager to run optimize; I 
will try it on one of the smallest indices.

If I stop all the requests to the statistics API, should I see the load 
decrease?

Regards.


Benoît



Re: [Book] Mastering ElasticSearch Review

2014-02-25 Thread Ivan Brusic
I purchased the book when Packt was having a $5 ebook sale a couple of
months ago. Did not really need the book, but it was cheap and I wanted to
support the author who has posted on the mailing list in the past.

Overall a decent book, recommended for anyone getting started with
Elasticsearch. My main complaint was that the book went through each
configuration parameter in detail, resulting in a lot of bloat. Some might
consider such an approach a good thing.

Ivan





On Mon, Feb 24, 2014 at 6:55 PM, Nick Wood nwood...@gmail.com wrote:

 I read Elasticsearch Server several months ago and found it helpful.  But
 I'm hesitant to get any more books that aren't focused on 1.x - hopefully
 we'll see some pop up soon (nudge nudge).





Re: dumping index is slow as hell

2014-02-25 Thread joergpra...@gmail.com
Have you benchmarked your cluster? How many docs can you index per second
with bulk indexing?

Jörg



Re: upgrade to elasticsearch 1.0 now ClassCastException: class ElasticSearch090PostingsFormat

2014-02-25 Thread joergpra...@gmail.com
Not sure, but maybe you have jars with ES classes in the plugins folder
that went astray?

IIRC I have seen these kinds of errors, and it was a plugin with
dependencies that were not compatible.

If that is code you can hack on, the last resort is printing the current
classpath to the log file...

Jörg



Compute TF/IDF across indexes

2014-02-25 Thread Luiz Guilherme Pais dos Santos
Hi,

I'm trying to search across multiple indexes and I couldn't understand the
result of the TF/IDF function. I didn't expect the indexes where the
term is more frequent to be penalized.

Here follows an example:
https://gist.github.com/luizgpsantos/9216108

When searching for the term alice the document {_index: index2,
_type: type, _id: 1} got a score 0.8784157 while {_index: index1,
_type: type, _id: 1} got a score 0.4451987.

In my use case I have one index about sports and another about celebrities,
and when I search for celebrity documents across both indexes, results
from the sports index tend to appear in first place due to the
explanation above (we have few celebrity documents in the sports index). But
the point is that when searching for a celebrity I would expect results
from the celebrity index.

Is there any way to calculate the score not penalizing indexes where the
frequency of a term is higher?

Cheers,

-- 
Luiz Guilherme P. Santos



Re: Compute TF/IDF across indexes

2014-02-25 Thread Ivan Brusic
I have never tried or looked at the code, but off the top of my head
perhaps the DFS query type would work:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-search-type.html#dfs-query-then-fetch

Since the DFS query type calculates the TF/IDF values based on the values
in each individual shard, perhaps it ignores which index the shard belongs
to. Easy to test.
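
A hedged sketch of that test (the field name is illustrative; the indexes
and search term are from the thread's example):

```shell
# Run the same query, but collect distributed term frequencies across all
# shards before scoring, instead of scoring per shard:
curl -XGET 'http://localhost:9200/index1,index2/_search?search_type=dfs_query_then_fetch' -d '
{
  "query": { "match": { "name": "alice" } }
}'
```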

If not, the solution might be tricky. You can eliminate term length
normalization, but your issue is with the IDF. You can create your own
Similarity, but the best you can do is ignore the IDF, which probably would
not be ideal.

Ultimately, you can try script based scoring. The TF/IDF values are exposed
to the scripts, so you can try to apply some type of normalization
yourself. Kludgy and it would impact performance.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html

Hopefully DFS queries would work or someone else has a better idea!

Cheers,

Ivan


On Tue, Feb 25, 2014 at 12:00 PM, Luiz Guilherme Pais dos Santos 
luizgpsan...@gmail.com wrote:

 Hi,

 I'm trying to search across multiple indexes and I couldn't understand the
 result of the TF/TDF function. I didn't expect for the indexes where the
 term is more frequent to get penalized.

 Here follows an example:
 https://gist.github.com/luizgpsantos/9216108

 When searching for the term alice the document {_index: index2,
 _type: type, _id: 1} got a score 0.8784157 while {_index:
 index1, _type: type, _id: 1} got a score 0.4451987.

 In my use case I got one index about sports and another about celebrities
 and when I search for a celebrity documents across sports and celebrities
 indexes, results from sports index tend to appear in first place due to the
 explanation above (we have few celebrities documents in sports index). But
 the point is that when searching for a celebrity I would expect results
 from the celebrity index.

 Is there any way to calculate the score not penalizing indexes where the
 frequency of a term is higher?

 Cheers,

 --
 Luiz Guilherme P. Santos





Re: Removing elasticsearch logs

2014-02-25 Thread Binh Ly
There is currently discussion around this:

https://github.com/elasticsearch/elasticsearch-marvel/issues/95

But in the meantime, try this to see if it helps:

https://github.com/elasticsearch/curator



Re: Put mapping documentation -- What options are available? Specifically, how to store a property but without indexing it?

2014-02-25 Thread Daniel Winterstein
Dear Hariharan, Alex, Luke,

My apologies. You're quite right. The information is there -- I just
didn't read far enough down.

Thank you for your help and persistence.

Best regards,
 - Daniel



Re: ES 1.0.0 Source filtering using the Java API

2014-02-25 Thread Dan
Thanks for your response. I can't see the method 'setFetchSource' in the 
Client class. Are you sure that it is in 1.0.0?

On Tuesday, February 25, 2014 8:41:37 PM UTC, Binh Ly wrote:

 Yes you can use the client.setFetchSource() method:

   SearchResponse response = client.prepareSearch(index)
 .setFetchSource(new String[] {field1, field2}, null)
 .execute()
 .actionGet();





Sorting date fields

2014-02-25 Thread Adrian
Hi all,

I have a question about how sorting during queries works in Elasticsearch. 

I have an index with a custom date format field, on which the sort is applied.
When querying the index for a given keyword, results are returned in the
requested sort order.

However, I've observed that some documents are not present in the result set. I
would have expected these results to be part of the result set, as they would be
in relational systems using the SQL ORDER BY statement. I've verified that
these missing documents are covered by the query using the explain API.
According to the documentation, score computation is not performed when
sorting on fields.

Maybe someone can provide more information on how sorting is done? 

I am using Elasticsearch 1.0.0RC1 on debian whezzy with openjdk7-jdk.

Thanks, Adrian



Re: scalability and creating 1 index per user

2014-02-25 Thread Nikolas Everett
On Tue, Feb 25, 2014 at 4:46 PM, ESUser neerav...@gmail.com wrote:

 Hi All,
 I am exploring elastic search to create one index per user instead of one
 big index for all the users. Each index would be about 6G.
 I am wondering if anyone has tried it and how would it scale?

 I couldn't find that elastic search has limit on maximum number of
 indices. Is it safe/recommended to have say 20K indices for 20K users?
 Would it architecture scale well?


I'm running 1107 indexes right now.  Some of the cluster actions are a bit
slower than I'd like, but I think that is better in 1.0.  I don't think it'd
work well at an order of magnitude larger, but I could be wrong.


 Also, if start with say a 5 nodes cluster now, and add more nodes as I
 need them, does ES redistributes its shards every time I add new nodes? How
 newly added nodes are utilized in a cluster?


It'll smooth the shards out across the new nodes.  There is configuration
for how many concurrent moves can take place and how much bandwidth is OK
per move.  The defaults are a bit slow, especially if you have a fast
network and disks.

Nik



Re: scalability and creating 1 index per user

2014-02-25 Thread Mark Walkom
20K is a lot of indexes, probably too many, as ES will need to maintain
state about each of those in memory, which could mean you have nothing left
for caching indexed data!
You might want to look at
http://www.elasticsearch.org/blog/customizing-your-document-routing/
instead; that way you can reduce your index count but still get the same
usability outcome.
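
A hedged sketch of routing by user id (index, type, field, and ids below are
illustrative, not from the thread):

```shell
# Index a document so it lands on the shard selected by the routing value:
curl -XPUT 'http://localhost:9200/users/doc/1?routing=user42' -d '
{ "owner": "user42", "body": "hello" }'

# Search with the same routing value so only that shard is queried,
# instead of fanning out to every shard in the index:
curl -XGET 'http://localhost:9200/users/_search?routing=user42' -d '
{ "query": { "term": { "owner": "user42" } } }'
```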

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 26 February 2014 08:52, Nikolas Everett nik9...@gmail.com wrote:




 On Tue, Feb 25, 2014 at 4:46 PM, ESUser neerav...@gmail.com wrote:

 Hi All,
 I am exploring elastic search to create one index per user instead of one
 big index for all the users. Each index would be about 6G.
 I am wondering if anyone has tried it and how would it scale?

 I couldn't find that elastic search has limit on maximum number of
 indices. Is it safe/recommended to have say 20K indices for 20K users?
 Would it architecture scale well?


 I'm running 1107 indexes right now.  Some of the cluster actions are a bit
 slower then I'd like but I think that is better in 1.0.  I don't think it'd
 work well an order of magnitude larger but I could be wrong.


 Also, if start with say a 5 nodes cluster now, and add more nodes as I
 need them, does ES redistributes its shards every time I add new nodes? How
 newly added nodes are utilized in a cluster?


 It'll smooth the shards out across the new nodes.  There is configuration
 for how many concurrent moves can take place and how much bandwidth is ok
 per move.  The defaults are a bit slow especially if you have fast network
 and disks.

 Nik





Re: Sorting date fields

2014-02-25 Thread joergpra...@gmail.com
ES loads the values of the fields to sort on into an in-memory cache.

You should update to 1.0.0; maybe you hit a bug that has already been fixed.

Jörg



Re: Elasticsearch 1.0.0 is now GA

2014-02-25 Thread Ivan Brusic
I do not use quotes at all. Simply:

node.name: ${HOSTNAME}

-- 
Ivan


On Tue, Feb 25, 2014 at 7:56 AM, InquiringMind brian.from...@gmail.comwrote:

 I always start Elasticsearch from within my own wrapper script, es.sh.

 Inside this wrapper script is the following incantation:

 NODE_OPT="-Des.node.name=$(uname -n | cut -d'.' -f1)"

 This is verified to work on Linux, Mac OS X, and Solaris (at least).

 I then pass $NODE_OPT as a command-line argument to the elasticsearch 
 start-up script.

 BTW, I seem to recall reading that the es. prefix on the node.name 
 variable is no longer needed for 1.0 GA. But it still works fine, so 
 I have left it there.

 This has always worked since ES 0.19.4 (the very first version I installed
 and started using). I worked closely with our deployment engineer, and we
 settled on a set of wrapper scripts that let me start everything on my
 laptop in exactly the same way that it all starts on a production server.
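For anyone wanting to adapt the idea, here is a minimal sketch of such a wrapper. The script name and install path are assumptions for illustration, not Brian's actual script:

```shell
#!/bin/sh
# Hypothetical es.sh sketch: derive the node name from the short hostname.
SHORT_HOST=$(uname -n | cut -d'.' -f1)        # strip any domain suffix
NODE_OPT="-Des.node.name=${SHORT_HOST}"
echo "node option: ${NODE_OPT}"
# Hand the option to the real start-up script (path is illustrative):
# exec /opt/elasticsearch/bin/elasticsearch ${NODE_OPT} "$@"
```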

 Brian


 On Tuesday, February 25, 2014 10:21:29 AM UTC-5, Tony Su wrote:

 One other issue.

 I have never been able to deploy an elasticsearch.yml which names the
 cluster node the same as the machine hostname, despite the suggestions in
 another thread. It just won't work, and based on another thread I strongly
 suspect the underlying Java code uses single quotes instead of double
 quotes when evaluating the variable.

 So, because it's a unique variable that needs to be set on each machine,
 that part of the config won't allow simply pointing all nodes to the same
 config script.

 That is why, short of hunting for the error in the Java code, I've been
 looking at various tools, simple and more enterprise-grade, that write
 individual config files to each node.

 Tony





Re: ES 1.0.0 Source filtering using the Java API

2014-02-25 Thread Binh Ly
Hmmm, can you please double-check? I can see it in the tests here:

https://github.com/elasticsearch/elasticsearch/blob/v1.0.0/src/test/java/org/elasticsearch/search/source/SourceFetchingTests.java



Re: Sorting date fields

2014-02-25 Thread Adrian
On Tue, Feb 25, 2014 at 11:11:13PM +0100, joergpra...@gmail.com wrote:

Jörg,

 ES loads the values of the fields to sort on into memory cache.

Yes, I've read that - is it known when these caches are flushed?

 You should update to 1.0.0, maybe you hit a bug that has been fixed.

I'll do that. I am just wondering if I am missing something .. 

Best regards, Adrian



Re: ES 1.0.0 Source filtering using the Java API

2014-02-25 Thread Dan
Yes, I can see it. Thanks.


 On 25 Feb 2014, at 22:23, Binh Ly binhly...@yahoo.com wrote:
 
 Hmmm, can you please double-check? I can see it in the tests here:
 
 https://github.com/elasticsearch/elasticsearch/blob/v1.0.0/src/test/java/org/elasticsearch/search/source/SourceFetchingTests.java



Re: upgrade to elasticsearch 1.0 now ClassCastException: class ElasticSearch090PostingsFormat

2014-02-25 Thread Kevin J. Smith
Many, many, way too many, hours later it came down to what everyone was 
suggesting was the problem in the first place: an old elasticsearch jar 
sitting in an abandoned directory but still scanned by Tomcat's class 
loader.

Thanks for your help.



Re: Sorting date fields

2014-02-25 Thread joergpra...@gmail.com
For the cache, see

http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/index-modules-fielddata.html

By default, the field data cache size is unbounded and does not expire. For
a sort, each field to sort on is examined and all of its values are loaded,
so the in-memory sorting can take place. It's exactly what Lucene does.

With the default settings of the field data cache, sort works fine
(unless the field values exceed the available memory).
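For reference, bounding that cache looks roughly like this in elasticsearch.yml. Setting names follow the 1.x fielddata docs linked above; the values are purely illustrative, so verify against your version before relying on them:

```yaml
# Illustrative only: cap the field data cache instead of the unbounded default
indices.fielddata.cache.size: 30%
indices.fielddata.cache.expire: 10m
```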

Maybe you can set up an example of your sort as a demo, so that the error
can be reproduced?

Jörg



copy_to objects?

2014-02-25 Thread asanderson
Does copy_to work with objects?



EsRejectedExecutionException when searching date based indices.

2014-02-25 Thread Alex Clark
 

Hello all, I’m getting failed nodes when running searches and I’m hoping 
someone can point me in the right direction.  I have indices created per 
day to store messages.  The pattern is pretty straightforward: the index 
for January 1 is messages_20140101, for January 2 is messages_20140102 
and so on.  Each index is created against a template that specifies 20 
shards. A full year will give 365 indices * 20 shards = 7300 nodes. I have 
recently upgraded to ES 1.0.

When I search for all messages in a year (either using an alias or 
specifying “messages_2013*”), I get many failed nodes.  The reason given 
is: “EsRejectedExecutionException[rejected execution (queue capacity 1000) 
on 
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$4@651b8924]”).
  
The more often I search, the fewer failed nodes I get (probably caching in 
ES) but I can’t get down to 0 failed nodes.  I’m using ES for analytics so 
the document counts coming back have to be accurate. The aggregate counts 
will change depending on the number of node failures.  We use the Java API 
to create a local node to index and search the documents.  However, we also 
see the issue if we use the URL search API on port 9200.

If I restrict the search to 30 days then I do not see any failures (it's 
under 1000 nodes, so that's as expected).  However, it is a pretty common use case 
for our customers to search messages spanning an entire year.  Any 
suggestions on how I can prevent these failures?  

Thank you for your help!



Need help with a large cluster restart.

2014-02-25 Thread Search User
I have 20 ES data nodes and 10 master nodes in my cluster. I have minimum 
master nodes set to 6 for the cluster to function. I wanted to know if anyone 
knows of a correct way to restart a large cluster. I see different results on 
each cluster restart. Sometimes some of the shards are in the Unassigned 
state and they are stuck there. Sometimes the shards are getting 
re-allocated. So far, I am always doing a full cluster restart. All I want 
to do is restart and come back to the state it was in before the restart. I 
would really appreciate any insight into this, or a link to documentation about 
cluster restarts.

Thanks.



Lost index metadata and overwriting pre-existing index files

2014-02-25 Thread Danny Berger
Hi - I recently experienced some surprising elasticsearch behavior and I'd 
appreciate some verification on the whys behind what we saw. Basically, 
during a cluster restart we lost some index metadata causing those indices 
to not be realized and loaded from the data nodes (raw index files still 
existed on disk), then, before we realized that and had a chance to recover 
them, new incoming data caused the cluster to create new indices under the 
same names, completely overwriting the original, raw index data on disk 
(clearing out and losing a lot of data). If that's unclear or for further 
details, I've posted the scenario and straightforward steps to reproduce: 
https://github.com/dpb587/elasticsearch-lost-index.

These are my core questions...

1. Is it true that index metadata (sharding size, mapping, etc) will only 
ever be stored on master-capable nodes? Previously, my understanding of the 
master was that it was primarily responsible for managing cluster state and 
coordinating cluster balancing, not persisting index metadata. (I'm not 
arguing it doesn't necessarily make sense, just that I didn't realize 
cluster state included the index metadata)

2. Is there documentation on elasticsearch.org which more precisely defines 
the responsibilities of master and data nodes? The only vague references 
I've come across are 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/modules-node.html,
 
the elasticsearch default configuration file, and various non-authoritative 
blog posts and Stack Overflow answers, none of which prompted me to realize 
data nodes would not hold their own metadata.

3. Is it true that elasticsearch (Lucene?) will overwrite existing data 
files without error or warning if the cluster is not aware of the index? If 
so, is there a way to disable that behavior to avoid accidental data loss 
due to misconfiguration (aside from the broad `action.auto_create_index` 
setting)? If not, is there anything else which would explain the behavior 
we saw?

Thank you for your time!

Danny



Re: Put mapping documentation -- What options are available? Specifically, how to store a property but without indexing it?

2014-02-25 Thread Ivan Brusic
Luke? :)


On Tue, Feb 25, 2014 at 1:09 PM, Daniel Winterstein 
daniel.winterst...@gmail.com wrote:

 Dear Hariharan, Alex, Luke,

 My apologies. You're quite right. The information is there -- I just
 didn't read far enough down.

 Thank you for your help & persistence.

 Best regards,
  - Daniel





Re: Compute TF/IDF across indexes

2014-02-25 Thread Luiz Guilherme Pais dos Santos
Hi Ivan,

The DFS query then fetch worked very well!

Thank you!

Cheers,
Luiz Guilherme


On Tue, Feb 25, 2014 at 5:15 PM, Ivan Brusic i...@brusic.com wrote:

 I have never tried or looked at the code, but off the top of my head
 perhaps the DFS query type would work:
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-search-type.html#dfs-query-then-fetch

 Since the DFS query type calculates the TF/IDF values based on the values
 in each individual shard, perhaps it ignores which index the shard belongs
 to. Easy to test.
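A quick way to test it from the command line might look like this. The index names come from the gist; the field name is an assumption for illustration:

```shell
# Hypothetical request: same query, but with the DFS pre-phase that gathers
# global term statistics across all shards before scoring
curl -s 'localhost:9200/index1,index2/_search?search_type=dfs_query_then_fetch' -d '
{ "query": { "match": { "field": "alice" } } }'
```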

 If not, the solution might be tricky. You can eliminate term length
 normalization, but your issue is with the IDF. You can create your own
 Similarity, but the best you can do is ignore the IDF, which probably would
 not be ideal.

 Ultimately, you can try script based scoring. The TF/IDF values are
 exposed to the scripts, so you can try to apply some type of normalization
 yourself. Kludgy and it would impact performance.


 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html

 Hopefully DFS queries would work or someone else has a better idea!

 Cheers,

 Ivan


  On Tue, Feb 25, 2014 at 12:00 PM, Luiz Guilherme Pais dos Santos 
 luizgpsan...@gmail.com wrote:

  Hi,

  I'm trying to search across multiple indexes and I couldn't understand
  the result of the TF/IDF function. I didn't expect the indexes where
  the term is more frequent to get penalized.

 Here follows an example:
 https://gist.github.com/luizgpsantos/9216108

  When searching for the term "alice" the document {"_index": "index2",
  "_type": "type", "_id": "1"} got a score of 0.8784157 while {"_index":
  "index1", "_type": "type", "_id": "1"} got a score of 0.4451987.

 In my use case I got one index about sports and another about celebrities
 and when I search for a celebrity documents across sports and celebrities
 indexes, results from sports index tend to appear in first place due to the
 explanation above (we have few celebrities documents in sports index). But
 the point is that when searching for a celebrity I would expect results
 from the celebrity index.

 Is there any way to calculate the score not penalizing indexes where the
 frequency of a term is higher?

 Cheers,

 --
 Luiz Guilherme P. Santos







-- 
Luiz Guilherme P. Santos



Re: Need help with a large cluster restart.

2014-02-25 Thread Mark Walkom
Some of these will help -
http://gibrown.wordpress.com/2013/12/05/managing-elasticsearch-cluster-restart-time/
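The usual trick those posts cover is to pause shard allocation around the restart so the cluster doesn't start shuffling shards while nodes bounce. A hedged sketch against a local node, using the 0.90/1.0-era setting name (check the cluster-settings docs for your version):

```shell
# Illustrative only: pause allocation, restart the nodes, then re-enable
curl -XPUT 'localhost:9200/_cluster/settings' -d '
{ "transient": { "cluster.routing.allocation.disable_allocation": true } }'
# ... restart the nodes, wait for them to rejoin ...
curl -XPUT 'localhost:9200/_cluster/settings' -d '
{ "transient": { "cluster.routing.allocation.disable_allocation": false } }'
```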

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 26 February 2014 11:57, Search User feedwo...@gmail.com wrote:

 I have 20 ES data nodes and 10 master nodes in my cluster. I have 6
 minimum master nodes for the cluster to function. I wanted to know if any
 one knows of a correct way to restart a large cluster. I see different
 results on each cluster restart. Some times, some of the shards are in
 Unassigned state and they are stuck in that state. Some times the shards
 are getting re-allocated. So far, I am always doing a Full Cluster restart.
 All I want to do is restart and come back to the state where it was before
 restart. I really appreciate any insight into this or a link to a
 documentation about the cluster restarts.

 Thanks.





Interesting question on Transaction Log record mutability

2014-02-25 Thread Yuri Panchenko
Hi guys,

If I turn off automatic indexing and refreshing, and continually execute 
partial updates on the same document (say 100 times), do the updates change 
the same record in the transaction log or do they create 100 entries?  The 
reason I'm curious is that when I ask ES to index (or refresh) after a 
batch of partial updates, will it try to index the same document 100 times 
or just once?  Efficiency seems to be important here.

My data structure is a Customer with lots of Transactions with each record 
containing a date, description, and dollar amount.  I would like to see if 
a denormalized data structure works here by keeping a list of transactions 
on the customer, then updating new transactions into the same customer 
record. But this would be very inefficient if the document would have to be 
reindexed as many times as the number of incoming partial updates.  I'm 
hoping I can control this by turning off indexing/refreshing and let ES 
update the same record in the Transaction log.  I understand that Lucene 
has immutable records, but that does not really mean that the Transaction 
log has to have immutability, right?

Thanks for any feedback/thoughts!!

Yuri




Re: Kibana: showing a ratio

2014-02-25 Thread Andrew Vine
Ok, I'll check it out

On Tuesday, 25 February 2014 00:17:20 UTC+2, Binh Ly wrote:

 Unfortunately not at the moment. But if you're up to it, you can probably 
 easily write a custom panel that will do this for you.




Re: Compute TF/IDF across indexes

2014-02-25 Thread Ivan Brusic
Great, I am glad that it worked. I do not use multi-index searches, so I
was not sure if it would. Good to know that shards from different indices
can be aggregated with DFS queries.

-- 
Ivan


On Tue, Feb 25, 2014 at 6:04 PM, Luiz Guilherme Pais dos Santos 
luizgpsan...@gmail.com wrote:

 Hi Ivan,

 The DFS query then fetch worked very well!

 Thank you!

 Cheers,
 Luiz Guilherme


 On Tue, Feb 25, 2014 at 5:15 PM, Ivan Brusic i...@brusic.com wrote:

 I have never tried or looked at the code, but off the top of my head
 perhaps the DFS query type would work:
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-search-type.html#dfs-query-then-fetch

 Since the DFS query type calculates the TF/IDF values based on the values
 in each individual shard, perhaps it ignores which index the shard belongs
 to. Easy to test.

 If not, the solution might be tricky. You can eliminate term length
 normalization, but your issue is with the IDF. You can create your own
 Similarity, but the best you can do is ignore the IDF, which probably would
 not be ideal.

 Ultimately, you can try script based scoring. The TF/IDF values are
 exposed to the scripts, so you can try to apply some type of normalization
 yourself. Kludgy and it would impact performance.


 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html

 Hopefully DFS queries would work or someone else has a better idea!

 Cheers,

 Ivan


  On Tue, Feb 25, 2014 at 12:00 PM, Luiz Guilherme Pais dos Santos 
 luizgpsan...@gmail.com wrote:

  Hi,

  I'm trying to search across multiple indexes and I couldn't understand
  the result of the TF/IDF function. I didn't expect the indexes where
  the term is more frequent to get penalized.

 Here follows an example:
 https://gist.github.com/luizgpsantos/9216108

  When searching for the term "alice" the document {"_index": "index2",
  "_type": "type", "_id": "1"} got a score of 0.8784157 while {"_index":
  "index1", "_type": "type", "_id": "1"} got a score of 0.4451987.

 In my use case I got one index about sports and another about
 celebrities and when I search for a celebrity documents across sports and
 celebrities indexes, results from sports index tend to appear in first
 place due to the explanation above (we have few celebrities documents in
 sports index). But the point is that when searching for a celebrity I would
 expect results from the celebrity index.

 Is there any way to calculate the score not penalizing indexes where the
 frequency of a term is higher?

 Cheers,

 --
 Luiz Guilherme P. Santos







 --
 Luiz Guilherme P. Santos





Re: Relation Between Heap Size and Total Data Size

2014-02-25 Thread Umutcan
There is enough space on every machine. I looked in the logs and found 
that org.apache.lucene.store.LockObtainFailedException: Lock obtain 
timed out: 
NativeFSLock@/ebs/elasticsearch/elasticsearch-0.90.10/data/elasticsearch/nodes/0/indices/logstash-2014.02.26/0/index/write.lock 
is what causes the shard to fail to start.



On 02/25/2014 05:29 PM, Randy wrote:

Probably low on disk on at least one machine.  Monitor disk usage. Also look in 
the logs and find out what error you are getting. Report back.

Sent from my iPhone


On Feb 25, 2014, at 7:25 AM, Umutcan umut...@gamegos.com wrote:

Hi,

I created an Elasticsearch cluster with 4 instances. Elasticsearch 0.90.10 is 
running on all of them. Heap size is 6 GB for each instance, so total heap 
size is 24 GB. I have 5 shards for each index and each shard has 1 replica. A 
new index is created for every day, so all indices have nearly the same size.

When total data size reaches around 100 GB (replicas included), my cluster 
begins to fail to allocate some of the shards (status yellow). After I delete 
some old indices and restart all the nodes, everything is fine (status is 
green). If I do not delete some data, the status eventually turns red.

So, I am wondering: is there any relationship between heap size and total 
data size? Is there any formula to determine heap size based on data size?

Thanks,
Umutcan





Histogram of high-cardinality aggregate

2014-02-25 Thread Mike Kaplinskiy
Hey folks,

Playing around with the aggregation API, I was wondering whether this is 
possible. Taking the example 
at 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-nested-aggregation.html
 
, how would I get the histogram of the minimum price [not all prices] of 
all the products?



Re: Text Categorization in ES

2014-02-25 Thread prashy
Hi All,

To be specific, I want a query like this: searching for Laptop will
automatically give results for Dell, Sony, HP, Lenovo, Samsung... as well.
As lingo3g is used for clustering the documents, it will store references
for the above terms as well.

For that I have installed Carrot2 and Lingo3g on top of ES.

So what should my query be, with respect to lingo3g, to search for the
specified items? Or is there anything else I have to do to make it work?




--
View this message in context: 
http://elasticsearch-users.115913.n3.nabble.com/Text-Categorization-in-ES-tp4050194p4050512.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.



Re: EsRejectedExecutionException when searching date based indices.

2014-02-25 Thread David Pilato
You are mixing nodes and shards, right?
How many elasticsearch nodes do you have to manage your 7300 shards?
Why did you set 20 shards per index?

You can increase the queue size in elasticsearch.yml but I'm not sure it's the 
right thing to do here.
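For what it's worth, the setting in question looks roughly like this in elasticsearch.yml. The name follows the 1.x thread pool module; the value is purely illustrative, and raising it masks the symptom rather than fixing the shard count:

```yaml
# Illustrative only: enlarge the search queue that is overflowing at 1000
threadpool.search.queue_size: 2000
```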

My 2 cents

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


Le 26 févr. 2014 à 01:36, Alex Clark a...@bitstew.com a écrit :

Hello all, I’m getting failed nodes when running searches and I’m hoping 
someone can point me in the right direction.  I have indices created per day to 
store messages.  The pattern is pretty straight forward: the index for January 
1 is messages_20140101, for January 2 is messages_20140102 and so on.  Each 
index is created against a template that specifies 20 shards. A full year will 
give 365 indices * 20 shards = 7300 nodes. I have recently upgraded to ES 1.0.

When I search for all messages in a year (either using an alias or specifying 
“messages_2013*”), I get many failed nodes.  The reason given is: 
“EsRejectedExecutionException[rejected execution (queue capacity 1000) on 
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$4@651b8924]”.
  The more often I search, the fewer failed nodes I get (probably caching in 
ES) but I can’t get down to 0 failed nodes.  I’m using ES for analytics so the 
document counts coming back have to be accurate. The aggregate counts will 
change depending on the number of node failures.  We use the Java API to create 
a local node to index and search the documents.  However, we also see the issue 
if we use the URL search API on port 9200.

If I restrict the search to 30 days then I do not see any failures (it's under
1000 nodes, so that's as expected).  However, it is a pretty common use case for our
customers to search messages spanning an entire year.  Any suggestions on how I
can prevent these failures?
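[The capacity arithmetic in this thread can be sketched explicitly; the queue capacity of 1000 is taken from the error message, and each shard contributes one shard-level search task:]

```python
# Back-of-the-envelope check of why a year-wide search overflows the
# per-node search queue while a 30-day window does not.
shards_per_index = 20
indices_per_year = 365
queue_capacity = 1000  # from the EsRejectedExecutionException message

# One shard-level search task per shard touched by the request.
year_tasks = shards_per_index * indices_per_year
print(year_tasks)                   # 7300 tasks for a full-year search
print(year_tasks > queue_capacity)  # True -> rejections are expected

month_tasks = shards_per_index * 30
print(month_tasks)                  # 600 tasks, under the queue capacity
```

[This is why the failures vanish for 30-day ranges: fewer shards per index, or fewer indices per request (e.g. monthly instead of daily indices), keeps the task count below the queue capacity.]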

Thank you for your help!

