Re: Problem with keeping Elasticsearch in sync across two data centers

2014-02-25 Thread Amit Soni
Thanks so much everyone for sharing your thoughts!

-Amit.


On Sun, Feb 23, 2014 at 10:24 AM, Hariharan Vadivelu harii...@gmail.com wrote:


 I think with the current ES version you have 3 options.

 - Use the snapshot and restore feature to take a snapshot in one DC and
 restore it in the other
 - Index into both DCs (so two distinct clusters) at the client level
 - Use the tribe node feature to search or index across multiple clusters

 Reference post

 https://groups.google.com/forum/#!searchin/elasticsearch/TribeNodes/elasticsearch/MG1RerVSWOk/qZFWvr0HPSwJ
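
 As a rough sketch of option 1 (the snapshot and restore API, new in ES 1.0):
 register a repository in both clusters, e.g. with PUT _snapshot/dc_backup and
 the body below, then create snapshots in the primary DC
 (PUT _snapshot/dc_backup/snap_1) and restore them in the other
 (POST _snapshot/dc_backup/snap_1/_restore). The repository name, snapshot
 name, and location here are made up for illustration; a shared (or copied)
 filesystem path, or the AWS plugin's S3 repository, both work.

```json
{
  "type": "fs",
  "settings": {
    "location": "/mnt/backups/es_snapshots",
    "compress": true
  }
}
```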

 On Saturday, February 22, 2014 6:03:13 PM UTC-6, Michael Sick wrote:

 Hi Amit,

 Ivan is correct. You might also check out tribe nodes
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/modules-tribe.html
 and see if it fits your needs for cross-DC replication.
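
 For reference, a tribe node is configured by listing the clusters it should
 join; here is a minimal elasticsearch.yml sketch (the cluster names are made
 up for illustration). The tribe node acts as a federated client across both
 clusters; it does not replicate data between them.

```yaml
tribe:
  dc1:
    cluster.name: cluster-dc1
  dc2:
    cluster.name: cluster-dc2
```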

 --Mike

 On Sat, Feb 22, 2014 at 1:32 PM, Amit Soni amits...@gmail.com wrote:

 Hello Michael - I understand that ES is not built to maintain a consistent
 cluster state across data centers. What I am wondering is whether there is
 a way for Elasticsearch to continue replicating data to a different data
 center (with some delay, of course) so that when the primary center fails,
 the failover data center still has most of the data (except perhaps for the
 last few seconds/minutes/hours).

 Overall I am looking for the right way to implement a cross-data-center
 deployment of Elasticsearch!

 -Amit.


 On Fri, Feb 21, 2014 at 9:37 AM, Michael Sick
 michae...@serenesoftware.com wrote:

 Dario,

 I believe that you're looking for tribe nodes:
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/modules-tribe.html

 ES is not built to cluster consistently across DCs / higher network
 latencies.

 On Fri, Feb 21, 2014 at 11:24 AM, Dario Rossi dari...@gmail.com wrote:

 Hi,
 I have the following problem: our application publishes content to an
 Elasticsearch cluster. We use local data-less nodes for querying
 Elasticsearch, so we don't use the HTTP REST API and the local nodes act
 as the load balancer. Now there is a requirement to have the cluster
 replicated to another data center too (and maybe to yet another in the
 future) for resilience.

 At the very beginning we thought of having one large cluster that goes
 across data centers (crazy). This solution has the following problems:

 - The cluster has the split-brain problem (!)
 - The client data-less nodes will try to send requests across different
 data centers (is there a solution to this?). I can't find a way to avoid
 this. We don't want this to happen because of a) latency and b) firewalling
 issues.

 So we started to think that this solution is not really viable, so we
 thought of having one cluster per data center, which seems more sensible.
 But then we have the problem that we must publish data to all clusters
 and, if one fails, we have no means of rolling back (unless we try to set
 up a complicated version-based rollback system). I find this very
 complicated and hard to maintain, although it may be somewhat doable.

 My biggest problem is that we have to keep the data centers in the
 same state at any time, so that if one goes down, we can readily switch to
 the other.

 Any ideas, or can you recommend some support to help us deal with
 this?

 --
 You received this message because you are subscribed to the Google
 Groups elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send
 an email to elasticsearc...@googlegroups.com.

 To view this discussion on the web visit https://groups.google.com/d/
 msgid/elasticsearch/5424a274-3f6b-4c12-9fe6-621e04f87a8d%
 40googlegroups.com.
 For more options, visit https://groups.google.com/groups/opt_out.







Re: upgrade to elasticsearch 1.0 now ClassCastException: class ElasticSearch090PostingsFormat

2014-02-25 Thread Alexander Reelsen
Hey,

You don't by any chance have two Lucene versions in your project, right? I
would like to know more about that class cast exception, but I fear this is
the most verbose output you get?


--Alex


On Tue, Feb 25, 2014 at 8:03 AM, Kevin J. Smith ke...@rootsmith.ca wrote:

 Hi,

 I am using Elasticsearch embedded in a Tomcat 7 webapp container
 (everything running under Java 7). All libs for Elasticsearch are in
 WEB-INF/lib. On v0.90 everything ran swimmingly. We upgraded to v1.0
 (libs and all, and paid attention to breaking API changes), but now on
 Ubuntu Linux, when I make a call to create an index via the following:

 final CreateIndexResponse response = 
 _client.admin().indices().prepareCreate(index).setSource(mapping).execute().actionGet();

 I get the following exception:

 org.elasticsearch.common.util.concurrent.UncategorizedExecutionException: 
 Failed execution
 at 
 org.elasticsearch.action.support.AdapterActionFuture.rethrowExecutionException(AdapterActionFuture.java:90)
 at 
 org.elasticsearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:50)
 at com.bitstew.search.SearchNode.createIndex(SearchNode.java:1507)
 at 
 com.bitstew.search.SystemInit.loadIndexDefinition(SystemInit.java:206)
 at com.bitstew.search.SystemInit.loadIndex(SystemInit.java:81)
 at com.bitstew.search.SystemInit.loadIndices(SystemInit.java:52)
 at 
 com.bitstew.ws.servlet.SystemAction.loadIndices(SystemAction.java:1798)
 at 
 com.bitstew.ws.servlet.SystemAction.executeAction(SystemAction.java:383)
 at 
 com.bitstew.ws.servlet.WebServicesDeployer.service(WebServicesDeployer.java:1888)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
 at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305)
 at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
 at 
 org.tuckey.web.filters.urlrewrite.RuleChain.handleRewrite(RuleChain.java:176)
 at 
 org.tuckey.web.filters.urlrewrite.RuleChain.doRules(RuleChain.java:145)
 at 
 org.tuckey.web.filters.urlrewrite.UrlRewriter.processRequest(UrlRewriter.java:92)
 at 
 org.tuckey.web.filters.urlrewrite.UrlRewriteFilter.doFilter(UrlRewriteFilter.java:381)
 at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
 at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
 at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
 at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
 at 
 org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:502)
 at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
 at 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
 at 
 org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
 at 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
 at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
 at 
 org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)
 at 
 org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
 at 
 org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: java.lang.NoClassDefFoundError: 
 org/apache/lucene/codecs/PostingsFormat
 at java.lang.ClassLoader.defineClass1(Native Method)
 at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
 at 
 java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
 at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
 at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:270)
 at 
 org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1701)
 at 
 

When a Java process with ES Client terminate, does it automatically close the connection?

2014-02-25 Thread Arinto Murdopo
Hi all,

A question on the Java API for Elasticsearch: when a Java process with an ES
client terminates, does it automatically close the connection? Or should we
explicitly close the connection to free the resources?

Best regards, 

Arinto



Re: engine failure, message [OutOfMemoryError[unable to create new native thread]]

2014-02-25 Thread T Vinod Gupta
Thanks for your response, Jörg; somehow I missed replying earlier.
For some strange reason the max threads setting was reset when I did a
reboot, so I had to set it back to a high number.



On Tue, Feb 11, 2014 at 12:10 AM, joergpra...@gmail.com 
joergpra...@gmail.com wrote:

 Your user ran out of thread/process space. This is reported as an OOM in
 Java.

 You can check the nproc entry in /etc/security/limits.conf for the maximum
 settings and compare this with the process table.

 The OS settings regarding threads are usually OK and should not be
 modified. Check whether you have modified the ES default settings for the
 thread pools, and revert those changes to the defaults. If this does
 not help, you should upgrade from 0.90.6 to 0.90.11.
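
 For example, the nproc entries in /etc/security/limits.conf would look like
 the fragment below (assuming ES runs as the user elasticsearch; the value is
 illustrative and should be tuned for the workload). Unlike an ad-hoc ulimit
 change, entries here persist across reboots, taking effect at the next login
 session.

```
# /etc/security/limits.conf
elasticsearch  soft  nproc  4096
elasticsearch  hard  nproc  4096
```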

 Jörg



 On Tue, Feb 11, 2014 at 6:45 AM, T Vinod Gupta tvi...@readypulse.com wrote:

 Hi,
 I had a stable ES cluster on AWS EC2 instances until a week ago, and I
 don't know what's going on: my cluster keeps getting into a bad state
 every few hours. The error says OOM, but I know that is not the reason;
 the instance has enough heap space left. I'm running ES version 0.90.6 and
 giving half the RAM (8 GB) to the ES process, and I see these messages (the
 same kind of message) in the logs on all the machines in the cluster:

 [2014-02-11 03:17:39,936][WARN ][cluster.action.shard ] [Star-Dancer]
 [facebook_022014][1] sending failed shard for [facebook_022014][1],
 node[zO9Pc1GNSuiVMA_Kn2b3UQ], [R], s[STARTED], indexUUID
 [qN3CUSfVS-m2KlgQQtOqxg], reason [engine failure, message
 [OutOfMemoryError[unable to create new native thread]]]

 Any ideas on how to debug this, or how to figure out what's causing it,
 would be really helpful.





long GC pauses, but only on 1 host in the cluster

2014-02-25 Thread T Vinod Gupta
I'm seeing this consistently happen on only 1 host in my cluster; the other
hosts don't have this problem. What could be the reason, and what's the
remedy?

I'm running ES on an EC2 m1.xlarge host: 16 GB RAM on the machine, of which I
allocate 8 GB to ES.

e.g.
[2014-02-25 09:14:38,726][WARN ][monitor.jvm  ] [Lunatica]
[gc][ParNew][1188745][942327] duration [48.3s], collections [1]/[1.1m],
total [48.3s]/[1d], memory [7.9gb]->[6.9gb]/[7.9gb], all_pools {[Code
Cache] [14.5mb]->[14.5mb]/[48mb]}{[Par Eden Space]
[15.7mb]->[14.7mb]/[66.5mb]}{[Par Survivor Space]
[8.3mb]->[0b]/[8.3mb]}{[CMS Old Gen] [7.8gb]->[6.9gb]/[7.9gb]}{[CMS Perm
Gen] [46.8mb]->[46.8mb]/[168mb]}


thanks



Re: upgrade to elasticsearch 1.0 now ClassCastException: class ElasticSearch090PostingsFormat

2014-02-25 Thread joergpra...@gmail.com
Maybe there are two Elasticsearch jar versions in the class path.

Jörg



Re: long GC pauses, but only on 1 host in the cluster

2014-02-25 Thread Mark Walkom
Depends on a lot of things: Java version, ES version, doc size and count,
index size and count, number of nodes.
Also, what are you monitoring the cluster with?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 25 February 2014 20:21, T Vinod Gupta tvi...@readypulse.com wrote:

 I'm seeing this consistently happen on only 1 host in my cluster; the other
 hosts don't have this problem. What could be the reason, and what's the
 remedy?

 I'm running ES on an EC2 m1.xlarge host: 16 GB RAM on the machine, of which
 I allocate 8 GB to ES.

 e.g.
 [2014-02-25 09:14:38,726][WARN ][monitor.jvm  ] [Lunatica]
 [gc][ParNew][1188745][942327] duration [48.3s], collections [1]/[1.1m],
 total [48.3s]/[1d], memory [7.9gb]->[6.9gb]/[7.9gb], all_pools {[Code
 Cache] [14.5mb]->[14.5mb]/[48mb]}{[Par Eden Space]
 [15.7mb]->[14.7mb]/[66.5mb]}{[Par Survivor Space]
 [8.3mb]->[0b]/[8.3mb]}{[CMS Old Gen] [7.8gb]->[6.9gb]/[7.9gb]}{[CMS Perm
 Gen] [46.8mb]->[46.8mb]/[168mb]}


 thanks





Re: Is there a difference between indexing envelopes or polygons.

2014-02-25 Thread Nicolas THOMASSON
Hey, thanks a lot!

Now it works just fine. I didn't see that coming; I thought ES would have
complained if the envelope's coordinates were reversed. My bad...

Nicolas

On Monday, 24 February 2014 at 15:58:24 UTC+1, Alexander Reelsen wrote:

 Hey,

 if there is an error, can you please open a GitHub issue? However, the
 envelope shape expects you to set an upper-left and a lower-right boundary.
 Your coordinates look more like a lower-left and an upper-right (meaning you
 might actually create quite a huge envelope), which obviously does not
 matter for a polygon.
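
 Concretely, assuming the intended box is the one the polygon in the original
 message describes, the envelope equivalent would list the upper-left corner
 first and the lower-right corner second:

```json
{
  "frame": {
    "type": "envelope",
    "coordinates": [[1, 4], [3, 2]]
  }
}
```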


 --Alex


 On Sat, Feb 22, 2014 at 11:14 AM, Nicolas THOMASSON
 nico.th...@gmail.com wrote:

 Hello,

 I'm new to ES. Please forgive me if I'm asking something stupid.

 Is there a fundamental difference between indexing an envelope or 
 indexing a polygon ?

 For example, if I define the area as an envelope:

 {
   "frame": {
     "type": "envelope",
     "coordinates": [[3, 4], [1, 2]]
   }
 }

 or as a polygon:

 {
   "frame": {
     "type": "polygon",
     "coordinates": [[[3, 4], [3, 2], [1, 2], [1, 4], [3, 4]]]
   }
 }

 Since, to my understanding, they both define the same area, should I be able
 to perform the same queries regardless of how I defined the area?
 (Currently I have a search query that returns wrong results on the envelope
 and seems to perform well on the polygon.)

 Thanks for your help,

 Nicolas







fragment_size not used for simple queries

2014-02-25 Thread Neamar Tucote
Hello,

Using the highlight API for a simple query like this:

curl localhost:9200/company_52fb7b90c8318c4dc86b/_search -d'{
  "fields": [],
  "query": {
    "filtered": {
      "query": {
        "match": {
          "_all": "i do not"
        }
      }
    }
  },
  "highlight": {
    "fields": {
      "metadatas.*": {
        "number_of_fragments": 1,
        "fragment_size": 20
      }
    }
  }
}'

This should return snippets whose size does not exceed 20 characters. Most
of the time this works; however, I do have one document, analyzed with the
same mappings, which yields a really long snippet. In fact, it is not
truncated at all and contains the whole text.

Here is a sample working as expected:

{"took":21,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":19,"max_score":0.24860834,"hits":[
  {"_index":"company_52fb7b90c8318c4dc86b","_type":"document","_id":"5309c5949ba7daaa265ffdd8","_score":0.24860834,"highlight":{"metadatas.text":[", and <em>do</em> not hesitate"]}},
  {"_index":"company_52fb7b90c8318c4dc86b","_type":"document","_id":"5309c5949ba7daaa265ffdd6","_score":0.14883985,"highlight":{"metadatas.text":[" take his child.\n<em>I</em> <em>do</em>"]}},
  {"_index":"company_52fb7b90c8318c4dc86b","_type":"document","_id":"5309c57a9ba7daaa265ffdc8","_score":0.1365959,"highlight":{"metadatas.text":[" resident of DC, <em>I</em> am"]}}]}}

And here is the unruly one:

{"took":122,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":19,"max_score":0.24860834,"hits":[
  {"_index":"company_52fb7b90c8318c4dc86b","_type":"document","_id":"5309c5949ba7daaa265ffdd8","_score":0.24860834,"highlight":{"metadatas.text":[", and <em>do</em> not hesitate"]}},
  {"_index":"company_52fb7b90c8318c4dc86b","_type":"document","_id":"5309c5949ba7daaa265ffdd6","_score":0.14883985,"highlight":{"metadatas.text":[" take his child.\n<em>I</em> <em>do</em>"]}},
  {"_index":"company_52fb7b90c8318c4dc86b","_type":"document","_id":"5309c57a9ba7daaa265ffdc8","_score":0.1365959,"highlight":{"metadatas.text":[" resident of DC, <em>I</em> am"]}},
  {"_index":"company_52fb7b90c8318c4dc86b","_type":"document","_id":"5309c57a9ba7daaa265ffdc7","_score":0.13437755,"highlight":{"metadatas.text":[".\n<em>I</em> <em>do</em> not enlighten those who are not eager to learn, nor arouse\nthose who are not anxious to give an explanation themselves. If <em>I</em>\nhave presented one corner of the square and they cannot come\nback to me with the other three, <em>I</em> should not go over the points\nagain.\n― Confucius\nBesides explaining JavaScript, this book tries to be an introduction to the basic\nprinciples of programming. Programming, it turns out, is hard. The\nfundamental rules are, most of the time, simple and clear. But programs,\nwhile built on top of these basic rules, tend to become complex enough to\nintroduce their own rules, their own complexity. Because of this, programming\nis rarely simple or predictable. As Donald Knuth, who is something of a\nfounding father of the field, says, it is an art.\nTo get something out of this book, more than just passive reading is required.\nTry to stay sharp, make an effort to solve the exercises, and only continue on\nwhen you are reasonably sure you understand the material that came before.\nThe computer programmer is a creator of universes for which he\nalone is responsible. Universes of virtually unlimited complexity can\nbe created in the form of computer programs.\n― Joseph Weizenbaum, Computer Power and Human Reason\nA program is many things. It is a piece of text typed by a programmer, it is\nthe directing force that makes the computer <em>do</em> what it does, it is data in the\ncomputer's memory, yet it controls the actions performed on this same\nmemory. Analogies that try to compare programs to objects we are familiar\nwith tend to fall short, but a superficially fitting one is that of a machine. The\ngears of a mechanical watch fit together ingeniously, and if the watchmaker\nwas any good, it will accurately show the time for many years. The elements\nof a program fit together in a similar way, and if the programmer knows what\nhe is doing, the program will run without crashing.\nA computer is a machine built to act as a host for these immaterial machines.\nComputers themselves can only <em>do</em> stupidly straightforward things. The reason\nthey are so useful is that they <em>do</em> these things at an incredibly high speed. A\nprogram can, by ingeniously combining many of these simple actions, <em>do</em> very\ncomplicated things.\nTo some of us, writing computer programs is a fascinating game. A program\nis a building of thought. It is costless to build, weightless, growing easily under\nour typing hands. If we get carried away, its size and complexity will grow out\nof control, confusing even the one who created it. This is the main problem of\nprogramming. It is why so much of today's software tends to crash, fail,\nscrew up.\nWhen a program works, it is beautiful. The art of programming is the skill of\ncontrolling complexity. The great program is subdued, made simple in its\ncomplexity.\nToday, many programmers believe

Re: long GC pauses, but only on 1 host in the cluster

2014-02-25 Thread joergpra...@gmail.com
Is this node showing more activity than the others? What kind of workload is
this: indexing, search? Are caches used, for filters/facets?

Full GC runs caused by CMS Old Gen may be a sign that you are close to the
memory limit and need to add nodes, but it could also mean a lot of other
things.

Jörg



Re: Problem with keeping Elasticsearch in sync across two data centers

2014-02-25 Thread Dario Rossi
I will try the tribe node feature, even if I don't understand it
completely... but I think it deserves some experimentation.

On Tuesday, 25 February 2014 at 08:05:05 UTC, amit.soni wrote:

 Thanks so much everyone for sharing your thoughts!

 -Amit.



Expanding terms

2014-02-25 Thread Petr Janský
Hello,

I'm trying to find a way to:

   1. expand a term - get all the words, with counts, that are relevant for a
   term (or terms)
   2. get the relevant words for a query - a list of all the words that
   are highlighted
   3. get phrases by word - e.g. for the word war: world war, second world
   war, the second world war, ...

and a complicated one:

   1. is there a way to get/highlight only the words that are relevant
   to multiple term conditions, e.g.:

{
  "must": [
    {
      "wildcard": {
        "content_morfo": {
          "value": "v*"
        }
      }
    },
    {
      "wildcard": {
        "content_morfo": {
          "value": "==AA*=="
        }
      }
    }
  ]
}

thx
Petr
  



Is consistent scoring across 2 documents that match either 1 of 2 properties possible?

2014-02-25 Thread Michelle May
Hi,

We've been struggling with this for a few days now, so I think it is time to
pick the expert brains. It's probably best to explain by delving
straight into an example:


1) Assuming we have the following document:

{
    "id": "/people/person1",
    "dob": "1980-04-12",
    "fullname": "Mickey Arthur Mouse",
    "aliasfullname": "Mickey Bernard Mouse"
}

2) When we do this search:

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "fullname": {
              "query": "mickey arthur mouse"
            }
          }
        },
        {
          "match": {
            "aliasfullname": {
              "query": "mickey arthur mouse"
            }
          }
        }
      ]
    }
  }
}

we get a score of 13.37 (for example)


1) Now, assuming we have this document (same as above except no 
aliasfullname)

{
  "id": "/people/person1",
  "dob": "1980-04-12",
  "fullname": "Mickey Arthur Mouse"
}

2) When we do the search from step 2 above, we get a score of 3.76 (for example)


How can we ensure that if a search is done on either the real name or the 
alias name (we won't know which is being searched on), a person with an 
alias does not get scored higher than someone without an alias? What type 
of query could we use to ensure that both searches return the same score? 
We've tried dis_max and have omit_norms: true on the searched fields, but 
nothing gives the same score, so I am beginning to wonder if it is an 
unrealistic expectation. 

Any assistance/advice would be greatly appreciated.
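
For reference, a sketch of the dis_max form mentioned above (with tie_breaker 0 only the best-matching field contributes, avoiding the sum-of-shoulds inflation; note that index-time statistics can still differ between the two documents, so identical scores are not guaranteed):

```json
{
  "query": {
    "dis_max": {
      "tie_breaker": 0,
      "queries": [
        { "match": { "fullname": "mickey arthur mouse" } },
        { "match": { "aliasfullname": "mickey arthur mouse" } }
      ]
    }
  }
}
```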




Re: ES doesn't take into account field level boost in prefix query over catch-all field?

2014-02-25 Thread Maxim Vorobyov
Hi All. I have the same issue and would highly appreciate answer.
Many Thanks! Maxim



Re: Is there a difference between indexing envelopes or polygons.

2014-02-25 Thread Alexander Reelsen
Hey,

the problem here is that elasticsearch can't tell by itself if the envelope
borders need to be reversed or not... maybe you want/need such an envelope
in your calculations. Hard to tell from a machine's perspective :-)


--Alex


On Tue, Feb 25, 2014 at 10:38 AM, Nicolas THOMASSON 
nico.thomas...@gmail.com wrote:

 Hey thanks a lot !

 Now it works just fine. I didn't see that coming - I thought ES would
 complain if the envelope's coordinates were reversed. My bad...

 Nicolas

 Le lundi 24 février 2014 15:58:24 UTC+1, Alexander Reelsen a écrit :

 Hey,

 if there is an error, can you please open a GitHub issue? However, the
 envelope shape expects you to set an upper-left and a lower-right boundary.
 Your coordinates look more like lower left and upper right (meaning you
 might actually create quite a huge envelope) - which obviously does not
 matter for a polygon.


 --Alex


 On Sat, Feb 22, 2014 at 11:14 AM, Nicolas THOMASSON nico.th...@gmail.com
  wrote:

 Hello,

 I'm new to ES. Please forgive me if I'm asking something stupid.

 Is there a fundamental difference between indexing an envelope and
 indexing a polygon?

 For example, if I define the area as an envelope

 {
   "frame": {
     "type": "envelope",
     "coordinates": [[3,4],[1,2]]
   }
 }

 or as a polygon

 {
   "frame": {
     "type": "polygon",
     "coordinates": [[[3,4],[3,2],[1,2],[1,4],[3,4]]]
   }
 }

 Since, in my understanding, they both define the same area, should I be able
 to perform the same queries whichever way I defined the area?
 (Currently I have a search query that returns wrong results on the envelope
 and seems to perform well on the polygon.)

 Thanks for your help,

 Nicolas
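
For reference, following Alex's explanation (upper-left corner first, then lower-right), the envelope equivalent to the polygon above would swap the corners:

```json
{
  "frame": {
    "type": "envelope",
    "coordinates": [[1,4],[3,2]]
  }
}
```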





Re: [Hadoop] Any good tut to start with?

2014-02-25 Thread Yann Barraud
Hi Costin,

I did not see the video. It's a good starting point; I'm not a big fan of
videos though. I might reproduce it using the Hortonworks sandbox.


Cordialement,
Yann Barraud


2014-02-24 13:35 GMT+01:00 Costin Leau costin.l...@gmail.com:

 Have you looked at the video? It does exactly that.

 Is there something missing?


 On 2/24/2014 12:41 PM, Yann Barraud wrote:

 Hi Costin,

 What I'd love to see is a step-by-step tut to have ES and Hadoop working
 together.

 Is there somewhere I can have something like this ?

 Regards,
 Yann

 Le jeudi 20 février 2014 16:25:28 UTC+1, John Pauley a écrit :

 Any more tutorials, say append to list?

 On Wednesday, February 19, 2014 12:54:15 PM UTC-5, Costin Leau wrote:

 Hi,

 We tried to make the docs friendly in this regard - each section
 (from Map/Reduce to Pig) has several examples.
 There's
 also a short video which guides you through the various features
 (with code) available here [1].

 Hope this helps,

 [1] http://www.elasticsearch.org/videos/search-and-analytics-with-hadoop-and-elasticsearch/

 On 19/02/2014 5:11 PM, Yann Barraud wrote:
  Hi everyone,
 
  Do you have a good pointer to a tut to start playing with ES &
 Hadoop? Using the Hortonworks VM for example?
 
  Thanks.
 
  Cheers,
  Yann
 

 --
 Costin


 --
 Costin





Re: Problem with keeping in sync Elasticsearch across two data centers

2014-02-25 Thread Dario Rossi
From the docs it is not clear whether, with two clusters holding the same 
indexes, an indexing operation will take effect on both...

There is a line that leaves me a bit doubtful:

However, there are a few exceptions:

   - The merged view cannot handle indices with the same name in multiple 
   clusters. It will pick one of them and discard the other.


Il giorno martedì 25 febbraio 2014 10:04:05 UTC, Dario Rossi ha scritto:

 I will try the tribe node feature, even if I don't understand it 
 completely... but I think it deserves some experimentation

 Il giorno martedì 25 febbraio 2014 08:05:05 UTC, amit.soni ha scritto:

 Thanks so much everyone for sharing your thoughts!

 -Amit.


 On Sun, Feb 23, 2014 at 10:24 AM, Hariharan Vadivelu 
 hari...@gmail.comwrote:


 I think with current ES version you have 3 options.

 - Use the great snapshot and restore feature to snapshot from a DC and 
 restore in the other one
 - Index in both DC (so two distinct clusters) from a client level
 - Use Tribe node feature to search or index on multiple clusters

 Reference post

 https://groups.google.com/forum/#!searchin/elasticsearch/TribeNodes/elasticsearch/MG1RerVSWOk/qZFWvr0HPSwJ

 On Saturday, February 22, 2014 6:03:13 PM UTC-6, Michael Sick wrote:

 Hi Amit,

 Ivan is correct. You might also check out I believe that you're 
 looking for TribeNodes http://www.elasticsearch.org/guide/en/
 elasticsearch/reference/master/modules-tribe.html and see if it fits 
 your needs for cross-dc replication. 

 --Mike

 On Sat, Feb 22, 2014 at 1:32 PM, Amit Soni amits...@gmail.com wrote:

 Hello Michael - Understand that ES is not built to maintain consistent 
 cluster state across data centers. what I am wondering is whether there 
 is 
 a way for ElasticSearch to continue to replicate data onto a different 
 data 
 center (with some delay of course) so that when the primary center fails, 
 the fail over data center still has most of the data (may be except for 
 the 
 last few seconds/minutes/hours).

 Overall I am looking for a right way to implement cross data center 
 deployment of elastic-search!

 -Amit.


 On Fri, Feb 21, 2014 at 9:37 AM, Michael Sick 
 michae...@serenesoftware.com wrote:

 Dario,

 I believe that you're looking for TribeNodes http://www.
 elasticsearch.org/guide/en/elasticsearch/reference/
 master/modules-tribe.html

 ES is not built to consistently cluster across DC's / larger network 
 lags. 

 On Fri, Feb 21, 2014 at 11:24 AM, Dario Rossi dari...@gmail.comwrote:

 Hi, 
 I've the following problem: our application publishes content to an 
 Elasticsearch cluster. We use local data-less nodes for querying 
 Elasticsearch, so we don't use the HTTP REST API and the local nodes act 
 as the load balancer. Now we have the requirement of replicating the 
 cluster to another data center too (and in the future maybe another 
 one...) for resilience. 

 At the very beginning we thought of having one large cluster that 
 goes across data centers (crazy). This solution has the following 
 problems:

 - The cluster has the split-brain problem (!)
 - The client data-less nodes will try to do requests across different 
 data centers (is there a solution to this???). I can't find a way to avoid 
 this. We don't want this to happen because of a) latency and b) firewalling 
 issues.

 So we started to think that this solution is not really viable. So 
 we thought of having one cluster per data center, which seems more 
 sensible. But then here we have the problem that we must publish data 
 to 
 all clusters and, if one fails, we have no means of rolling back 
 (unless we 
 try to set up a complicated version based rollback system). I find this 
 very complicated and hard to maintain, although can be somewhat doable. 

 My biggest problem is that we have to keep the data centers in the 
 same state at any time, so that if one goes down, we can readily switch 
 to 
 the other.

 Any ideas, or can you recommend something to help us deal with 
 this?
  

dumping index is slow as hell

2014-02-25 Thread Attila Bukor

Hey guys,

I needed to migrate an index to a new cluster, and after a lot of hesitation
I decided to give taskrabbit's elasticsearch-dump a try:
https://github.com/taskrabbit/elasticsearch-dump

I tested it with 10k documents, which worked fine, so I decided to migrate
the real data to the new cluster with the following command:

elasticdump --input=http://oldcluster:9200/my_index \
--output=http://newcluster:9200/my_index

my_index contains ~5 million documents, so I expected it to take a while,
but not *this* long. It's been running since 10 AM UTC+1 yesterday and it's
migrated only a bit over 1.5 million docs so far - in roughly 28 hours.

When it started, it indexed around 100 docs per second, by the time I went
home from work (around 5 PM UTC+1), it was only around 30 docs/s, now it's
around 10 docs/s.

Being a newbie with Elasticsearch, I don't even know how to diagnose the
reason for this slowness. Could you help me with this?

Keep in mind that I'm at work for 2 or 3 more hours today, but after that,
I won't have access to the servers until next Monday. Feel free to suggest
anything in that time too, I will read it and try to reply, but can't look
into anything or do anything about it.

Regards,
Attila Bukor
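
In case it helps: a common cause of dump speed degrading over time is deep from/size paging, which gets slower as the offset grows. A hedged sketch of pulling the same data with the scan/scroll API instead (the host and index names match the command above; the scroll timeout and batch size are illustrative):

```shell
# Start a scan: returns a _scroll_id instead of deep-paged hits
curl -s 'http://oldcluster:9200/my_index/_search?search_type=scan&scroll=5m&size=500' \
  -d '{"query": {"match_all": {}}}'
# Then fetch batches repeatedly, passing the most recent _scroll_id each time
curl -s 'http://oldcluster:9200/_search/scroll?scroll=5m' -d 'SCROLL_ID_HERE'
```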



Re: Aggregation on parent/child documents

2014-02-25 Thread Augusto Uehara
We run 4 instances of ES 1.0.0 using 30 GB for the JVM heap. We run 64-bit 
OpenJDK 1.7.0_25 on Ubuntu servers.

$ ulimit -a
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 515139
max locked memory   (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files  (-n) 64000
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 8192
cpu time   (seconds, -t) unlimited
max user processes  (-u) 515139
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited

And I also disabled swap on linux.

You can use this gist to simulate the issue we have: 
https://gist.github.com/chaos-generator/9143655



Re: fragment_size not used for simple queries

2014-02-25 Thread Luca Cavanna
It would be useful if you could post a complete recreation, mappings 
included. Which highlighter are you using?

On Tuesday, February 25, 2014 10:39:10 AM UTC+1, Neamar Tucote wrote:

 Hello,

 Using the highlight API for a simple query like this:

 curl localhost:9200/company_52fb7b90c8318c4dc86b/_search -d '{
   "fields": [],
   "query": {
     "filtered": {
       "query": {
         "match": {
           "_all": "i do not"
         }
       }
     }
   },
   "highlight": {
     "fields": {
       "metadatas.*": {
         "number_of_fragments": 1,
         "fragment_size": 20
       }
     }
   }
 }'

 This should return snippets whose size does not exceed 20 characters. Most 
 of the time this works; however, I do have one document, analyzed with the 
 same mappings, which yields really long snippets - in fact, the text is not 
 truncated and contains everything.

 Here is a sample working as expected:

 {"took":21,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},
 "hits":{"total":19,"max_score":0.24860834,"hits":[
 {"_index":"company_52fb7b90c8318c4dc86b","_type":"document","_id":"5309c5949ba7daaa265ffdd8","_score":0.24860834,
  "highlight":{"metadatas.text":[", and <em>do</em> not hesitate"]}},
 {"_index":"company_52fb7b90c8318c4dc86b","_type":"document","_id":"5309c5949ba7daaa265ffdd6","_score":0.14883985,
  "highlight":{"metadatas.text":[" take his child.\n<em>I</em> <em>do</em>"]}},
 {"_index":"company_52fb7b90c8318c4dc86b","_type":"document","_id":"5309c57a9ba7daaa265ffdc8","_score":0.1365959,
  "highlight":{"metadatas.text":[" resident of DC, <em>I</em> am"]}}]}}

 And here is the unruly one:

 {"took":122,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},
 "hits":{"total":19,"max_score":0.24860834,"hits":[
 {"_index":"company_52fb7b90c8318c4dc86b","_type":"document","_id":"5309c5949ba7daaa265ffdd8","_score":0.24860834,
  "highlight":{"metadatas.text":[", and <em>do</em> not hesitate"]}},
 {"_index":"company_52fb7b90c8318c4dc86b","_type":"document","_id":"5309c5949ba7daaa265ffdd6","_score":0.14883985,
  "highlight":{"metadatas.text":[" take his child.\n<em>I</em> <em>do</em>"]}},
 {"_index":"company_52fb7b90c8318c4dc86b","_type":"document","_id":"5309c57a9ba7daaa265ffdc8","_score":0.1365959,
  "highlight":{"metadatas.text":[" resident of DC, <em>I</em> am"]}},
 {"_index":"company_52fb7b90c8318c4dc86b","_type":"document","_id":"5309c57a9ba7daaa265ffdc7","_score":0.13437755,
  "highlight":{"metadatas.text":[".\n<em>I</em> <em>do</em> not enlighten those who are not eager to learn, nor arouse\nthose 
 who are not anxious to give an explanation themselves. If <em>I</em>\nhave presented one corner of the square and they cannot 
 come\nback to me with the other three, <em>I</em> should not go over the points\nagain.\n― Confucius\nBesides explaining 
 JavaScript, this book tries to be an introduction to the basic\nprinciples of programming. Programming, it turns out, is 
 hard. The\nfundamental rules are, most of the time, simple and clear. But programs,\nwhile built on top of these basic rules, 
 tend to become complex enough to\nintroduce their own rules, their own complexity. Because of this, programming\nis rarely 
 simple or predictable. As Donald Knuth, who is something of a\nfounding father of the field, says, it is an art.\nTo get 
 something out of this book, more than just passive reading is required.\nTry to stay sharp, make an effort to solve the 
 exercises, and only continue on\nwhen you are reasonably sure you understand the material that came before.\nThe computer 
 programmer is a creator of universes for which he\nalone is responsible. Universes of virtually unlimited complexity 
 can\nbe created in the form of computer programs.\n― Joseph Weizenbaum, Computer Power and Human Reason\nA program is many 
 things. It is a piece of text typed by a programmer, it is\nthe directing force that makes the computer <em>do</em> what 
 it does, it is data in the\ncomputer's memory, yet it controls the actions performed on this same\nmemory. Analogies that 
 try to compare programs to objects we are familiar\nwith tend to fall short, but a superficially fitting one is that of 
 a machine. The\ngears of a mechanical watch fit together ingeniously, and if the watchmaker\nwas any good, it will 
 accurately show the time for many years. The elements\nof a program fit together in a similar way, and if the programmer 
 knows what\nhe is doing, the program will run without crashing.\nA computer is a machine built to act as a host for these 
 immaterial machines.\nComputers themselves can only <em>do</em> stupidly straightforward things. The reason\nthey are so 
 useful is that they <em>do</em> these things at an incredibly high speed. A\nprogram can, by ingeniously combining many 
 of these simple actions, <em>do</em> very\ncomplicated things.\nTo some of us, writing computer programs is a fascinating 
 game. A program\nis a building of thought. It is costless to build, weightless, growing easily under\nour typing hands. 
 If we get carried away, its size and complexity will grow out\nof control, confusing even the one who created it. This 
 is the main problem of\nprogramming. It 

How can I do date-calculation/conversion in an MVEL script in ES 1.0.0?

2014-02-25 Thread h . b . wassenaar
Hello,

I'm considering upgrading from 0.90.3 to 1.0.0, but I've hit a snag with 
one of the MVEL scripts I use to update documents through the update api. 
My update-script uses Joda to parse/format/manipulate dates, but it appears 
that Joda is no longer available to MVEL scripts in version 1.0.0. (I think 
it changed in commit c7f6c52 from November 24th, so it's been like that for 
a while)

Here are some code-snippets of how it currently works

   parser = Joda.forPattern('dateOptionalTime').parser();
   lastdate = parser.parseMillis(update.date);
   prevdate = parser.parseMillis(ctx._source.published);
   timediff = lastdate-prevdate;
   ...
   nextdate=parser.parseMutableDateTime(update.date);
   nextdate.addHours(calculated_hours); 
   ctx._source.nextupdate = 
nextdate.toString('yyyy-MM-dd\'T\'HH:mm:ss\'Z\'');

Is there some way to do similar date/time calculations in ES 1.0.0?

I've considered I could use a native script to do these updates; however, 
when I wrote the update-script I tested this, and to my surprise using a 
native script proved to be significantly slower than using the MVEL script 
for this use-case.


Any help would be much appreciated,
   -- Harmen
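
One workaround worth trying (an untested sketch - it assumes MVEL scripts in 1.0.0 can still instantiate ordinary JDK classes even though the Joda helper is gone) is to use java.text.SimpleDateFormat for parsing/formatting, with plain millisecond arithmetic in between:

```
   sdf = new java.text.SimpleDateFormat('yyyy-MM-dd\'T\'HH:mm:ss\'Z\'');
   lastdate = sdf.parse(update.date).getTime();
   prevdate = sdf.parse(ctx._source.published).getTime();
   timediff = lastdate - prevdate;
   // add the calculated hours as milliseconds instead of MutableDateTime.addHours()
   nextmillis = lastdate + (calculated_hours * 3600000);
   ctx._source.nextupdate = sdf.format(new java.util.Date(nextmillis));
```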



[new project using es] Elasticboard - tracking github data

2014-02-25 Thread Mihnea Dobrescu-Balaur
Hello again,

Using the recently released GitHub river[1], I'm working on an open source 
dashboard for keeping track of GitHub projects. It's in the working 
prototype state right now, and I'm trying to figure out what kind of 
information is desired and relevant.

The idea is that people/orgs who want to use this will self-deploy their 
own instance, but in order to show what the project is about, I set up a 
hosted demo. There's a landing page here[2] and a demo getting data for 
the elasticsearch repo here[3]. I'd love some feedback!

[1] 
https://groups.google.com/forum/#!searchin/elasticsearch/github$20river/elasticsearch/Oy57lUSn_aY/6w6uBgNcq_MJ
[2] http://elasticboard.mihneadb.net/landing.html
[3] http://elasticboard.mihneadb.net/#/elasticsearch/elasticsearch


Thanks,
Mihnea



Re: the document payload of the Delete api

2014-02-25 Thread Binh Ly
Unfortunately, you'll have to GET it first.



Re: DateRange aggregation semantics - include_lower/include_upper?

2014-02-25 Thread Binh Ly
Yes, you are correct. The "from" is inclusive, and the "to" is exclusive.



Re: Elasticsearch 1.0.0 is now GA

2014-02-25 Thread Tony Su
In principle,
I agree with everything you describe about best practice, but those 
practices become more important only when you're managing larger numbers of 
nodes.
 
For those who manage only 5 nodes, the balance may swing in favor of just 
editing each machine's config directly instead of a more centralized 
strategy. It's a cost/benefit question of which approach requires more work.
 
As far as re-making configs with every version change, from what I've seen 
so far I don't think that is the intention of Elasticsearch (currently). 
The configs I see in elasticsearch.yml are largely consistent across major 
and minor versions... although there are exceptions.
 
But the current scenario doesn't even change versions... much.
The scenario is a reasonable and common reaction when repairing a 
package-based installation. SOP is, after attempting a package 
update/upgrade (which fails) and then a forced update (forced re-install in 
place), the normal last attempt is a manual removal and re-installation 
(which is required when upgrading from RC to GA). And then the config file 
is removed.
 
That said, with my latest RC-to-GA upgrades, I noticed the new workaround 
the packager is now doing. Although the config file is still being wiped 
out, a backup of the config file is being created. So, although a bit 
unusual, it works for me and should prevent the worst complaints in the 
future.
 
Feature Request:
Improving on the current packaging practice of creating a config backup, it 
would also be nice if the old config file could be parsed for uncommented 
settings and merged into the new config file.
 
Tony
 
 
 
 

On Monday, February 24, 2014 12:43:04 PM UTC-8, InquiringMind wrote:

 I am not sure what the complaints are all about.

 Over the past 20 years, my best practices are to treat the installed 
 configurations as a template that is subject to change upon reinstallation. 
 Then, I always create my own configuration and point the server to it, and 
 never point a server to the package's installed configuration.

 And then, I maintain all of my customized configurations separately from 
 the installed packages.

 Pointing to the installed configuration that you've modified is really no 
 different than running the installed jars that you have modified. Would you 
 really expect a reinstallation of Elasticsearch to preserve the changes you 
 have made to the originally installed elasticsearch-1.0.0.jar file?

 The beauty of Elasticsearch's configurations is that they document 
 everything but actually set nothing. That's even better than the 
 configurations for the servers I write, in which I set everything, but to 
 the default values in the code. Same end result; different means of getting 
 there. In fact, the installed config is a big part of the package's 
 documentation about what is available to be configured. So I would expect 
 it to change on each installation.

 And for the turn-key servers I developed in the past, where the configs 
 were not maintained by Puppet or Chef or some other automated tool, I would 
 write a post-installation step that would copy the installed config over a 
 target config, but only if that target config did not exist. That way, the 
 customer could modify the target config and their changes would be 
 preserved. But today, our elasticsearch.yml file and other server configs 
 are maintained by Puppet, and because we don't touch the installed config 
 we never have any problems with overwriting on a reinstallation.

 Brian

 On Monday, February 17, 2014 5:14:46 PM UTC-5, Tony Su wrote:

 What?!
  
 Removing and re-installing the ES package either removes the original or 
 overwrites the existing elasticsearch.yml.
  
 This is contrary to conventional packaging from what I've generally seen.
 Typically, when a package is removed, the configuration file is left alone 
 and must be removed manually if desired.
  
 No big deal in my case - I've been working on elasticsearch.yml heavily 
 for several days so I can remember all the customizations I've made - but 
 IMO this is a disaster waiting to happen for clusters with new admins or 
 those who attempt to fix a problem by removing and re-installing.
  
 Leaving the config file alone and re-using is the safer option.
  
 IMO,
 Tony

  



Re: Elasticsearch 1.0.0 is now GA

2014-02-25 Thread Tony Su
One other issue.
 
I have never been able to deploy an elasticsearch.yml which names the 
cluster node the same as the machine hostname, despite the suggestions in 
another thread. It just won't work, and based on another thread I strongly 
suspect the underlying Java code uses single quotes instead of double 
quotes when evaluating the variable. 
 
So, because it's a unique value that needs to be set on each machine, 
that part of the config won't allow simply pointing all nodes to the same 
config file.
 
That is why, short of looking for the error in the Java code, I've been 
looking at various simple and more enterprise-grade tools that write 
individual config files to each node.
 
Tony
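
For what it's worth, one thing that may sidestep the quoting issue entirely (a sketch - I haven't verified it on this setup) is relying on elasticsearch.yml's ${...} environment-variable substitution, so a single shared file still yields per-machine node names:

```yaml
# elasticsearch.yml shared across machines; HOSTNAME must be exported
# in the environment that launches the JVM for this to resolve.
cluster.name: mycluster
node.name: ${HOSTNAME}
```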



Re: nodes spend all time in RamUsageEstimator after upgrade to 0.90.11

2014-02-25 Thread Binh Ly
This is a known issue and will be fixed shortly. For now, what you can do 
is run _optimize on all your indexes with max_num_segments set to 1, as 
below. Note that this may take a while depending on the size of your 
indexes.

http://localhost:9200/_optimize?max_num_segments=1
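
As a hedged sketch (assuming a node listening on localhost:9200; the index
name in the second command is illustrative, not from the thread), the call
can be issued with curl:

```shell
# Merge every index down to one segment (can be I/O heavy on large indexes):
curl -XPOST 'http://localhost:9200/_optimize?max_num_segments=1'

# Or try it on a single index first (index name is illustrative):
curl -XPOST 'http://localhost:9200/logs-2014.02.25/_optimize?max_num_segments=1'
```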




Relation Between Heap Size and Total Data Size

2014-02-25 Thread Umutcan

Hi,

I created an Elasticsearch cluster with 4 instances. Elasticsearch 0.90.10 
is running on all of them. Heap size is 6 GB for each instance, so 
total heap size is 24 GB. I have 5 shards for each index and each shard 
has 1 replica. A new index is created every day, so all indices are 
nearly the same size.


When total data size reaches around 100 GB (replicas included), my 
cluster begins to fail to allocate some of the shards (status yellow). 
After I delete some old indices and restart all the nodes, everything is 
fine (status green). If I do not delete some data, the status eventually 
turns red.


So, I am wondering whether there is any relationship between heap size and 
total data size. Is there any formula to determine heap size based on 
data size?


Thanks,
Umutcan



Re: Relation Between Heap Size and Total Data Size

2014-02-25 Thread Randy
Probably low on disk on at least one machine. Monitor disk usage. Also look in 
the logs and find out what error you are getting. Report back. 

Sent from my iPhone

 On Feb 25, 2014, at 7:25 AM, Umutcan umut...@gamegos.com wrote:
 
 Hi,
 
 I created a Elasticsearch cluster with 4 instance. Elasticsearch 0.90.10 is 
 running all of them. Heap size is 6 GB for all the instances, so total heap 
 size is 24 GB. I have 5 shard for each index and each shard has 1 replica. A 
 new index is created for every day, so all indices have nearly same size.
 
 When total data size reaches around 100 GB (replicas are included), my 
 cluster begins to  fail to allocate some of the shards (status yellow). After 
 I delete some old indices and restart all the nodes, everything is fine 
 (status is green). If I do not delete some data, status eventually turns red.
 
 So, I am wondering that is there any relationship between heap size and total 
 data size? Is there any formula to determine heap size based on data size?
 
 Thanks,
 Umutcan
 



Re: nodes spend all time in RamUsageEstimator after upgrade to 0.90.11

2014-02-25 Thread Benoît
I forgot to say that one consequence is that the 'head' plugin interface 
remains empty.

The following requests time out:
 *  _status
 *  _stats?all=true
 *  _nodes

How can I get some information about the cluster under these conditions? 

Benoît



Re: Elasticsearch 1.0.0 is now GA

2014-02-25 Thread InquiringMind
I always start Elasticsearch from within my own wrapper script, es.sh.

Inside this wrapper script is the following incantation:

NODE_OPT="-Des.node.name=$(uname -n | cut -d'.' -f1)"

This is verified to work on Linux, Mac OS X, and Solaris (at least).

I then pass $NODE_OPT as a command-line argument to the elasticsearch 
start-up script.

BTW, I seem to recall reading that the es. prefix on the node.name variable 
is no longer needed for 1.0 GA. But it still works fine, so I have 
left it there.

This has always worked since ES 0.19.4 (the very first version I installed 
and started using). I worked closely with our deployment engineer, and we 
settled on a set of wrapper scripts that let me start everything on my 
laptop in exactly the same way that it all starts on a production server.
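
A minimal sketch of such a wrapper (the start-up path in the comment is an
assumption for illustration, not Brian's actual script):

```shell
#!/bin/sh
# es.sh - sketch of a wrapper that names the node after the short hostname.
# uname -n prints the host name; cut keeps only the part before the first
# dot, which is what makes this portable across Linux, OS X, and Solaris.
NODE_OPT="-Des.node.name=$(uname -n | cut -d'.' -f1)"
echo "starting with: $NODE_OPT"
# Hand the option through to the stock start-up script (path illustrative):
# exec /usr/local/elasticsearch/bin/elasticsearch $NODE_OPT "$@"
```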

Brian

On Tuesday, February 25, 2014 10:21:29 AM UTC-5, Tony Su wrote:

 One other issue.
  
 I have never been able to deploy an elasticsearch.yml which names the 
 cluster node the same as the machine hostname despite the suggestions in 
 another thread. It just won't work, and based on another thread I strongly 
 suspect the underlying Java code implements single quotes instead of double 
 quotes when evaluating the variable. 
  
 So, because it's a unique variable that needs to be set on each machine, 
 that part of the config won't allow simply pointing all nodes to the same 
 config script.
  
 Is why, short of looking for the error in the Java code I've been looking 
 at various simple and more enterprise tools that write individual config 
 files to each node.
  
 Tony




Re: the document payload of the Delete api

2014-02-25 Thread InquiringMind
And note that if you GET it and save the version number, and then pass the 
version number into the DELETE, you can be sure it will be deleted only if 
nobody else updated it in the meantime.

This all works so much better in Java than in scripts + curl.
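
For completeness, a hedged curl sketch of that optimistic-concurrency flow
(index, type, id, and version number are illustrative):

```shell
# GET the document and note the _version field in the response:
curl -XGET 'http://localhost:9200/myindex/mytype/1'

# DELETE succeeds only if the stored version still matches (here, 3);
# if someone updated the document in the meantime, ES returns a
# version conflict instead of deleting it:
curl -XDELETE 'http://localhost:9200/myindex/mytype/1?version=3'
```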

Brian

On Tuesday, February 25, 2014 9:35:37 AM UTC-5, Binh Ly wrote:

 Unfortunately, you'll have to GET it first.




Re: Default analyzer when the given analyzer not found?

2014-02-25 Thread InquiringMind
Based on posts to this newsgroup early on in my usage of ES (over a year 
now!), I used to put the following in my elasticsearch.yml file. Any field 
that was not explicitly assigned an analyzer and that was deemed by ES to 
be a string would pick up English snowball analyzer with no stop words (my 
preference at the time):

index:
  analysis:
    analyzer:
      # set stemming analyzer with no stop words as the default
      default:
        type: snowball
        language: English
        stopwords: _none_
    filter:
      stopWordsFilter:
        type: stop
        stopwords: _none_

But since then, I've long abandoned this default approach. Instead, I 
explicitly assigned an analyzer to each and every field (you know, like a 
real database!). And then my elasticsearch.yml file now contains the 
following:

# Do not automatically create an index when a document is loaded, and do
# not automatically index unknown (unmapped) fields:
action.auto_create_index: false
index.mapper.dynamic: false

Therefore, I cannot automatically create an index during a load (which 
would then create a useless index without any of the analyzers and mappings 
I've carefully crafted). And I cannot get ES to automatically create a new 
field; this is very helpful when someone uses a low-level tool such as 
curl, and misspells a field name; ES will no longer create, for example, 
the givveName field when it should have been givenName.
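
For what it's worth, one way to sanity-check which analyzer an index
actually applies by default is the _analyze API; a hedged sketch, with an
illustrative index name:

```shell
# With a snowball default configured, this should return stemmed tokens
# (e.g. "running" reduced to something like "run"):
curl 'http://localhost:9200/myindex/_analyze?text=running'
```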

Brian

On Tuesday, February 25, 2014 8:57:30 AM UTC-5, Frederic Meyer wrote:

 Hey there.

 Nearly one year after this initial post, I'm running into the exact same 
 issue, even though ES is now released (1.0).

 Has anybody found a proper solution within ES? I've spent like 1 hour 
 searching for this, without any luck.

 The only ugly workaround that I can think of right now is deal with a fall 
 back language at the data level i.e. before sending documents to be indexed 
 by ES.

 Thanks.




Re: Default analyzer when the given analyzer not found?

2014-02-25 Thread Frederic Meyer
Ah yes, via the default in the yaml configuration file, of course. I'll 
give that a try, thanks!

It is a pity though that the default analyzer doesn't seem to do its job 
of processing all unmatched documents as far as _analyze is concerned.

Thanks
Fred

P.S.: I do understand your position about not indexing documents for which 
you haven't crafted a dedicated analyzer yet. It makes real sense.

On Tuesday, February 25, 2014 5:09:43 PM UTC+1, InquiringMind wrote:

 Based on posts to this newsgroup early on in my usage of ES (over a year 
 now!), I used to put the following in my elasticsearch.yml file. Any field 
 that was not explicitly assigned an analyzer and that was deemed by ES to 
 be a string would pick up English snowball analyzer with no stop words (my 
 preference at the time):

  index:
    analysis:
      analyzer:
        # set stemming analyzer with no stop words as the default
        default:
          type: snowball
          language: English
          stopwords: _none_
      filter:
        stopWordsFilter:
          type: stop
          stopwords: _none_

 But since then, I've long abandoned this default approach. Instead, I 
 explicitly assigned an analyzer to each and every field (you know, like a 
 real database!). And then my elasticsearch.yml file now contains the 
 following:

 # Do not automatically create an index when a document is loaded, and do
 # not automatically index unknown (unmapped) fields:
 action.auto_create_index: false
 index.mapper.dynamic: false

 Therefore, I cannot automatically create an index during a load (which 
 would then create a useless index without any of the analyzers and mappings 
 I've carefully crafted). And I cannot get ES to automatically create a new 
 field; this is very helpful when someone uses a low-level tool such as 
 curl, and misspells a field name; ES will no longer create, for example, 
 the givveName field when it should have been givenName.

 Brian

 On Tuesday, February 25, 2014 8:57:30 AM UTC-5, Frederic Meyer wrote:

 Hey there.

 Nearly one year after this initial post, I'm running into the exact same 
 issue, even though ES is now released (1.0).

 Has anybody found a proper solution within ES? I've spent like 1 hour 
 searching for this, without any luck.

 The only ugly workaround that I can think of right now is deal with a 
 fall back language at the data level i.e. before sending documents to be 
 indexed by ES.

 Thanks.





Re: nodes spend all time in RamUsageEstimator after upgrade to 0.90.11

2014-02-25 Thread Benoît
Thank you Binh Ly,

On Tuesday, February 25, 2014 4:25:59 PM UTC+1, Binh Ly wrote:

 This is a known issue and will be fixed shortly. For now, what you can do 
 is run _optimize on all your indexes and set max_num_segments to 1, like 
 below. Note that this may take a while depending on the size of your 
 indexes.

 http://localhost:9200/_optimize?max_num_segments


Your suggestion confirms what Jörg Prante said here: 
https://groups.google.com/d/msg/elasticsearch/7mrDhqe6LEo/3gjOJka85OYJ
This is a problem with Lucene segments from version 3.x.

I have around 1 TB of indices, so I'm not really eager to run optimize; I 
will try it on one of the smallest indices.

If I stop all the requests to the statistics API, should I see the load 
decrease?

Regards.


Benoît



Re: [Book] Mastering ElasticSearch Review

2014-02-25 Thread Ivan Brusic
I purchased the book when Packt was having a $5 ebook sale a couple of
months ago. Did not really need the book, but it was cheap and I wanted to
support the author who has posted on the mailing list in the past.

Overall a decent book, recommended for anyone getting started with
Elasticsearch. My main complaint was that the book went through each
configuration parameter in detail, resulting in a lot of bloat. Some might
consider such an approach a good thing.

Ivan





On Mon, Feb 24, 2014 at 6:55 PM, Nick Wood nwood...@gmail.com wrote:

 I read Elasticsearch Server several months ago and found it helpful.  But
 I'm hesitant to get any more books that aren't focused on 1.x - hopefully
 we'll see some pop up soon (nudge nudge).





Re: dumping index is slow as hell

2014-02-25 Thread joergpra...@gmail.com
Have you benchmarked your cluster? How many docs can you index per second
with bulk indexing?

Jörg



Re: upgrade to elasticsearch 1.0 now ClassCastException: class ElasticSearch090PostingsFormat

2014-02-25 Thread joergpra...@gmail.com
Not sure, but maybe you have jars with ES classes in the plugins folder
that went astray?

IIRC I have seen these kinds of errors, and it was a plugin with
dependencies that were not compatible.

If that is code you can hack on, the last resort is printing the current
classpath to the log file...

Jörg



Compute TF/IDF across indexes

2014-02-25 Thread Luiz Guilherme Pais dos Santos
Hi,

I'm trying to search across multiple indexes and I couldn't understand the
result of the TF/IDF function. I didn't expect the indexes where the
term is more frequent to be penalized.

Here follows an example:
https://gist.github.com/luizgpsantos/9216108

When searching for the term alice the document {_index: index2,
_type: type, _id: 1} got a score 0.8784157 while {_index: index1,
_type: type, _id: 1} got a score 0.4451987.

In my use case I have one index about sports and another about celebrities,
and when I search for celebrity documents across both indexes, results
from the sports index tend to appear in first place due to the
explanation above (we have few celebrity documents in the sports index). But
the point is that when searching for a celebrity I would expect results
from the celebrity index.

Is there any way to calculate the score not penalizing indexes where the
frequency of a term is higher?

Cheers,

-- 
Luiz Guilherme P. Santos



Re: Compute TF/IDF across indexes

2014-02-25 Thread Ivan Brusic
I have never tried or looked at the code, but off the top of my head
perhaps the DFS query type would work:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-search-type.html#dfs-query-then-fetch

Since the DFS query type calculates the TF/IDF values based on the values
in each individual shard, perhaps it ignores which index the shard belongs
to. Easy to test.
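
A hedged sketch of that test (the field name is illustrative; the indexes
and search term are from the thread's example):

```shell
# Run the same query, but collect distributed term frequencies across all
# shards before scoring, instead of scoring per shard:
curl -XGET 'http://localhost:9200/index1,index2/_search?search_type=dfs_query_then_fetch' -d '
{
  "query": { "match": { "name": "alice" } }
}'
```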

If not, the solution might be tricky. You can eliminate term length
normalization, but your issue is with the IDF. You can create your own
Similarity, but the best you can do is ignore the IDF, which probably would
not be ideal.

Ultimately, you can try script based scoring. The TF/IDF values are exposed
to the scripts, so you can try to apply some type of normalization
yourself. Kludgy and it would impact performance.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html

Hopefully DFS queries would work or someone else has a better idea!

Cheers,

Ivan


On Tue, Feb 25, 2014 at 12:00 PM, Luiz Guilherme Pais dos Santos 
luizgpsan...@gmail.com wrote:

 Hi,

 I'm trying to search across multiple indexes and I couldn't understand the
 result of the TF/TDF function. I didn't expect for the indexes where the
 term is more frequent to get penalized.

 Here follows an example:
 https://gist.github.com/luizgpsantos/9216108

 When searching for the term alice the document {_index: index2,
 _type: type, _id: 1} got a score 0.8784157 while {_index:
 index1, _type: type, _id: 1} got a score 0.4451987.

 In my use case I got one index about sports and another about celebrities
 and when I search for a celebrity documents across sports and celebrities
 indexes, results from sports index tend to appear in first place due to the
 explanation above (we have few celebrities documents in sports index). But
 the point is that when searching for a celebrity I would expect results
 from the celebrity index.

 Is there any way to calculate the score not penalizing indexes where the
 frequency of a term is higher?

 Cheers,

 --
 Luiz Guilherme P. Santos





Re: Removing elasticsearch logs

2014-02-25 Thread Binh Ly
There is currently discussion around this:

https://github.com/elasticsearch/elasticsearch-marvel/issues/95

But in the meantime, try this to see if it helps:

https://github.com/elasticsearch/curator



Re: Put mapping documentation -- What options are available? Specifically, how to store a property but without indexing it?

2014-02-25 Thread Daniel Winterstein
Dear Hariharan, Alex, Luke,

My apologies. You're quite right. The information is there -- I just
didn't read far enough down.

Thank you for your help and persistence.

Best regards,
 - Daniel



Re: ES 1.0.0 Source filtering using the Java API

2014-02-25 Thread Dan
Thanks for your response. I can't see the method 'setFetchSource' in the 
Client class. Are you sure that it is in 1.0.0?

On Tuesday, February 25, 2014 8:41:37 PM UTC, Binh Ly wrote:

 Yes you can use the client.setFetchSource() method:

   SearchResponse response = client.prepareSearch(index)
 .setFetchSource(new String[] {field1, field2}, null)
 .execute()
 .actionGet();





Sorting date fields

2014-02-25 Thread Adrian
Hi all,

I have a question about how sorting during queries works in Elasticsearch. 

I have an index with a custom date format field, on which the sort is applied.
When querying the index for a given keyword, results are returned in the
requested sort order.

However, I've observed that some documents are not present in the result set. I
would have expected these results to be part of the result set, as they would be
in relational systems using the SQL ORDER BY statement. I've verified that
these missing documents are covered by the query using the explain API.
According to the documentation, score computation is not performed when
sorting on fields.

Maybe someone can provide more information on how sorting is done? 

I am using Elasticsearch 1.0.0RC1 on debian whezzy with openjdk7-jdk.

Thanks, Adrian



Re: scalability and creating 1 index per user

2014-02-25 Thread Nikolas Everett
On Tue, Feb 25, 2014 at 4:46 PM, ESUser neerav...@gmail.com wrote:

 Hi All,
 I am exploring elastic search to create one index per user instead of one
 big index for all the users. Each index would be about 6G.
 I am wondering if anyone has tried it and how would it scale?

 I couldn't find that elastic search has limit on maximum number of
 indices. Is it safe/recommended to have say 20K indices for 20K users?
 Would it architecture scale well?


I'm running 1107 indexes right now.  Some of the cluster actions are a bit
slower than I'd like, but I think that is better in 1.0.  I don't think it'd
work well at an order of magnitude larger, but I could be wrong.


 Also, if start with say a 5 nodes cluster now, and add more nodes as I
 need them, does ES redistributes its shards every time I add new nodes? How
 newly added nodes are utilized in a cluster?


It'll smooth the shards out across the new nodes.  There is configuration
for how many concurrent moves can take place and how much bandwidth is OK
per move.  The defaults are a bit slow, especially if you have a fast
network and disks.

Nik



Re: scalability and creating 1 index per user

2014-02-25 Thread Mark Walkom
20K is a lot of indexes, probably too many, as ES will need to maintain
state about each of those in memory, which could mean you have nothing left
for caching indexed data!
You might want to look at
http://www.elasticsearch.org/blog/customizing-your-document-routing/
instead; that way you can reduce your index count but still get the same
usability outcome.
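
A hedged sketch of routing by user id (index, type, field, and ids below are
illustrative, not from the thread):

```shell
# Index a document so it lands on the shard selected by the routing value:
curl -XPUT 'http://localhost:9200/users/doc/1?routing=user42' -d '
{ "owner": "user42", "body": "hello" }'

# Search with the same routing value so only that shard is queried,
# instead of fanning out to every shard in the index:
curl -XGET 'http://localhost:9200/users/_search?routing=user42' -d '
{ "query": { "term": { "owner": "user42" } } }'
```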

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 26 February 2014 08:52, Nikolas Everett nik9...@gmail.com wrote:




 On Tue, Feb 25, 2014 at 4:46 PM, ESUser neerav...@gmail.com wrote:

 Hi All,
 I am exploring elastic search to create one index per user instead of one
 big index for all the users. Each index would be about 6G.
 I am wondering if anyone has tried it and how would it scale?

 I couldn't find that elastic search has limit on maximum number of
 indices. Is it safe/recommended to have say 20K indices for 20K users?
 Would it architecture scale well?


 I'm running 1107 indexes right now.  Some of the cluster actions are a bit
 slower then I'd like but I think that is better in 1.0.  I don't think it'd
 work well an order of magnitude larger but I could be wrong.


 Also, if start with say a 5 nodes cluster now, and add more nodes as I
 need them, does ES redistributes its shards every time I add new nodes? How
 newly added nodes are utilized in a cluster?


 It'll smooth the shards out across the new nodes.  There is configuration
 for how many concurrent moves can take place and how much bandwidth is ok
 per move.  The defaults are a bit slow especially if you have fast network
 and disks.

 Nik





Re: Sorting date fields

2014-02-25 Thread joergpra...@gmail.com
ES loads the values of the fields to sort on into an in-memory cache.

You should update to 1.0.0; maybe you hit a bug that has already been fixed.

Jörg



Re: Elasticsearch 1.0.0 is now GA

2014-02-25 Thread Ivan Brusic
I do not use quotes at all. Simply:

node.name: ${HOSTNAME}

-- 
Ivan


On Tue, Feb 25, 2014 at 7:56 AM, InquiringMind brian.from...@gmail.comwrote:

 I always start Elasticsearch from within my own wrapper script, es.sh.

 Inside this wrapper script is the following incantation:

 NODE_OPT="-Des.node.name=$(uname -n | cut -d'.' -f1)"

 This is verified to work on Linux, Mac OS X, and Solaris (at least).

 I then pass $NODE_OPT as a command-line argument to the elasticsearch 
 start-up script.

 BTW, I seem to recall reading that the es. prefix on the node.name 
 variable is no longer needed for 1.0 GA. But it still works fine, so 
 I have left it there.

 This has always worked since ES 0.19.4 (the very first version I installed
 and started using). I worked closely with our deployment engineer, and we
 settled on a set of wrapper scripts that let me start everything on my
 laptop in exactly the same way that it all starts on a production server.
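For anyone wanting to adapt the idea, here is a minimal sketch of such a wrapper. The script name and install path are assumptions for illustration, not Brian's actual script:

```shell
#!/bin/sh
# Hypothetical es.sh sketch: derive the node name from the short hostname.
SHORT_HOST=$(uname -n | cut -d'.' -f1)        # strip any domain suffix
NODE_OPT="-Des.node.name=${SHORT_HOST}"
echo "node option: ${NODE_OPT}"
# Hand the option to the real start-up script (path is illustrative):
# exec /opt/elasticsearch/bin/elasticsearch ${NODE_OPT} "$@"
```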

 Brian


 On Tuesday, February 25, 2014 10:21:29 AM UTC-5, Tony Su wrote:

 One other issue.

 I have never been able to deploy an elasticsearch.yml which names the
 cluster node the same as the machine hostname, despite the suggestions in
 another thread. It just won't work, and based on another thread I strongly
 suspect the underlying Java code uses single quotes instead of double
 quotes when evaluating the variable.

 So, because it's a unique variable that needs to be set on each machine,
 that part of the config won't allow simply pointing all nodes to the same
 config script.

 That is why, short of hunting for the error in the Java code, I've been
 looking at various tools, simple and more enterprise-grade, that write
 individual config files to each node.

 Tony





Re: ES 1.0.0 Source filtering using the Java API

2014-02-25 Thread Binh Ly
Hmmm, can you please double-check? I can see it in the tests here:

https://github.com/elasticsearch/elasticsearch/blob/v1.0.0/src/test/java/org/elasticsearch/search/source/SourceFetchingTests.java



Re: Sorting date fields

2014-02-25 Thread Adrian
On Tue, Feb 25, 2014 at 11:11:13PM +0100, joergpra...@gmail.com wrote:

Jörg,

 ES loads the values of the fields to sort on into memory cache.

Yes, I've read that - is it known when these caches are flushed?

 You should update to 1.0.0, maybe you hit a bug that has been fixed.

I'll do that. I am just wondering if I am missing something .. 

Best regards, Adrian



Re: ES 1.0.0 Source filtering using the Java API

2014-02-25 Thread Dan
Yes, I can see it. Thanks.


 On 25 Feb 2014, at 22:23, Binh Ly binhly...@yahoo.com wrote:
 
 Hmmm, can you please double-check? I can see it in the tests here:
 
 https://github.com/elasticsearch/elasticsearch/blob/v1.0.0/src/test/java/org/elasticsearch/search/source/SourceFetchingTests.java



Re: upgrade to elasticsearch 1.0 now ClassCastException: class ElasticSearch090PostingsFormat

2014-02-25 Thread Kevin J. Smith
Many, many, way too many, hours later it came down to what everyone was 
suggesting was the problem in the first place: an old elasticsearch jar 
sitting in an abandoned directory but still scanned by Tomcat's class 
loader.

Thanks for your help.



Re: Sorting date fields

2014-02-25 Thread joergpra...@gmail.com
For the cache, see

http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/index-modules-fielddata.html

By default, the field data cache size is unbounded and does not expire. For
a sort, each field to sort on is examined and all of its values are loaded,
so the in-memory sorting can take place. It's exactly what Lucene does.

With the default settings of the field data cache, sort works fine
(unless the field values exceed the available memory).
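For reference, bounding that cache looks roughly like this in elasticsearch.yml. Setting names follow the 1.x fielddata docs linked above; the values are purely illustrative, so verify against your version before relying on them:

```yaml
# Illustrative only: cap the field data cache instead of the unbounded default
indices.fielddata.cache.size: 30%
indices.fielddata.cache.expire: 10m
```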

Maybe you can set up an example of your sort as a demo, so that the error
can be reproduced?

Jörg



copy_to objects?

2014-02-25 Thread asanderson
Does copy_to work with objects?



EsRejectedExecutionException when searching date based indices.

2014-02-25 Thread Alex Clark
 

Hello all, I’m getting failed nodes when running searches and I’m hoping 
someone can point me in the right direction.  I have indices created per 
day to store messages.  The pattern is pretty straightforward: the index 
for January 1 is messages_20140101, for January 2 is messages_20140102 
and so on.  Each index is created against a template that specifies 20 
shards. A full year will give 365 indices * 20 shards = 7300 nodes. I have 
recently upgraded to ES 1.0.

When I search for all messages in a year (either using an alias or 
specifying “messages_2013*”), I get many failed nodes.  The reason given 
is: “EsRejectedExecutionException[rejected execution (queue capacity 1000) 
on 
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$4@651b8924]”).
  
The more often I search, the fewer failed nodes I get (probably caching in 
ES) but I can’t get down to 0 failed nodes.  I’m using ES for analytics so 
the document counts coming back have to be accurate. The aggregate counts 
will change depending on the number of node failures.  We use the Java API 
to create a local node to index and search the documents.  However, we also 
see the issue if we use the URL search API on port 9200.

If I restrict the search to 30 days then I do not see any failures (it's 
under 1000 nodes, so that's as expected).  However, it is a pretty common use case 
for our customers to search messages spanning an entire year.  Any 
suggestions on how I can prevent these failures?  

Thank you for your help!



Need help with a large cluster restart.

2014-02-25 Thread Search User
I have 20 ES data nodes and 10 master nodes in my cluster. I have minimum 
master nodes set to 6 for the cluster to function. I wanted to know if anyone 
knows of a correct way to restart a large cluster. I see different results on 
each cluster restart. Sometimes some of the shards are in the Unassigned 
state and they are stuck there. Sometimes the shards are getting 
re-allocated. So far, I am always doing a full cluster restart. All I want 
to do is restart and come back to the state it was in before the restart. I 
would really appreciate any insight into this, or a link to documentation about 
cluster restarts.

Thanks.



Lost index metadata and overwriting pre-existing index files

2014-02-25 Thread Danny Berger
Hi - I recently experienced some surprising elasticsearch behavior and I'd 
appreciate some verification on the whys behind what we saw. Basically, 
during a cluster restart we lost some index metadata causing those indices 
to not be realized and loaded from the data nodes (raw index files still 
existed on disk), then, before we realized that and had a chance to recover 
them, new incoming data caused the cluster to create new indices under the 
same names, completely overwriting the original, raw index data on disk 
(clearing out and losing a lot of data). If that's unclear or for further 
details, I've posted the scenario and straightforward steps to reproduce: 
https://github.com/dpb587/elasticsearch-lost-index.

These are my core questions...

1. Is it true that index metadata (sharding size, mapping, etc) will only 
ever be stored on master-capable nodes? Previously, my understanding of the 
master was that it was primarily responsible for managing cluster state and 
coordinating cluster balancing, not persisting index metadata. (I'm not 
arguing it doesn't necessarily make sense, just that I didn't realize 
cluster state included the index metadata)

2. Is there documentation on elasticsearch.org which more precisely defines 
the responsibilities of master and data nodes? The only vague references 
I've come across are 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/modules-node.html,
 
the elasticsearch default configuration file, and various non-authoritative 
blog posts and Stack Overflow answers, none of which prompted me to realize 
data nodes would not hold their own metadata.

3. Is it true that elasticsearch (Lucene?) will overwrite existing data 
files without error or warning if the cluster is not aware of the index? If 
so, is there a way to disable that behavior to avoid accidental data loss 
due to misconfiguration (aside from the broad `action.auto_create_index` 
setting)? If not, is there anything else which would explain the behavior 
we saw?

Thank you for your time!

Danny



Re: Put mapping documentation -- What options are available? Specifically, how to store a property but without indexing it?

2014-02-25 Thread Ivan Brusic
Luke? :)


On Tue, Feb 25, 2014 at 1:09 PM, Daniel Winterstein 
daniel.winterst...@gmail.com wrote:

 Dear Hariharan, Alex, Luke,

 My apologies. You're quite right. The information is there -- I just
 didn't read far enough down.

 Thank you for your help & persistence.

 Best regards,
  - Daniel





Re: Compute TF/IDF across indexes

2014-02-25 Thread Luiz Guilherme Pais dos Santos
Hi Ivan,

The DFS query then fetch worked very well!

Thank you!

Cheers,
Luiz Guilherme


On Tue, Feb 25, 2014 at 5:15 PM, Ivan Brusic i...@brusic.com wrote:

 I have never tried or looked at the code, but off the top of my head
 perhaps the DFS query type would work:
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-search-type.html#dfs-query-then-fetch

 Since the DFS query type calculates the TF/IDF values based on the values
 in each individual shard, perhaps it ignores which index the shard belongs
 to. Easy to test.
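A quick way to test it from the command line might look like this. The index names come from the gist; the field name is an assumption for illustration:

```shell
# Hypothetical request: same query, but with the DFS pre-phase that gathers
# global term statistics across all shards before scoring
curl -s 'localhost:9200/index1,index2/_search?search_type=dfs_query_then_fetch' -d '
{ "query": { "match": { "field": "alice" } } }'
```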

 If not, the solution might be tricky. You can eliminate term length
 normalization, but your issue is with the IDF. You can create your own
 Similarity, but the best you can do is ignore the IDF, which probably would
 not be ideal.

 Ultimately, you can try script based scoring. The TF/IDF values are
 exposed to the scripts, so you can try to apply some type of normalization
 yourself. Kludgy and it would impact performance.


 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html

 Hopefully DFS queries would work or someone else has a better idea!

 Cheers,

 Ivan


  On Tue, Feb 25, 2014 at 12:00 PM, Luiz Guilherme Pais dos Santos 
 luizgpsan...@gmail.com wrote:

  Hi,

  I'm trying to search across multiple indexes and I couldn't understand
  the result of the TF/IDF function. I didn't expect the indexes where
  the term is more frequent to get penalized.

 Here follows an example:
 https://gist.github.com/luizgpsantos/9216108

  When searching for the term "alice" the document {"_index": "index2",
  "_type": "type", "_id": "1"} got a score of 0.8784157 while {"_index":
  "index1", "_type": "type", "_id": "1"} got a score of 0.4451987.

 In my use case I got one index about sports and another about celebrities
 and when I search for a celebrity documents across sports and celebrities
 indexes, results from sports index tend to appear in first place due to the
 explanation above (we have few celebrities documents in sports index). But
 the point is that when searching for a celebrity I would expect results
 from the celebrity index.

 Is there any way to calculate the score not penalizing indexes where the
 frequency of a term is higher?

 Cheers,

 --
 Luiz Guilherme P. Santos







-- 
Luiz Guilherme P. Santos



Re: Need help with a large cluster restart.

2014-02-25 Thread Mark Walkom
Some of these will help -
http://gibrown.wordpress.com/2013/12/05/managing-elasticsearch-cluster-restart-time/
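The usual trick those posts cover is to pause shard allocation around the restart so the cluster doesn't start shuffling shards while nodes bounce. A hedged sketch against a local node, using the 0.90/1.0-era setting name (check the cluster-settings docs for your version):

```shell
# Illustrative only: pause allocation, restart the nodes, then re-enable
curl -XPUT 'localhost:9200/_cluster/settings' -d '
{ "transient": { "cluster.routing.allocation.disable_allocation": true } }'
# ... restart the nodes, wait for them to rejoin ...
curl -XPUT 'localhost:9200/_cluster/settings' -d '
{ "transient": { "cluster.routing.allocation.disable_allocation": false } }'
```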

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 26 February 2014 11:57, Search User feedwo...@gmail.com wrote:

 I have 20 ES data nodes and 10 master nodes in my cluster. I have 6
 minimum master nodes for the cluster to function. I wanted to know if any
 one knows of a correct way to restart a large cluster. I see different
 results on each cluster restart. Some times, some of the shards are in
 Unassigned state and they are stuck in that state. Some times the shards
 are getting re-allocated. So far, I am always doing a Full Cluster restart.
 All I want to do is restart and come back to the state where it was before
 restart. I really appreciate any insight into this or a link to a
 documentation about the cluster restarts.

 Thanks.





Interesting question on Transaction Log record mutability

2014-02-25 Thread Yuri Panchenko
Hi guys,

If I turn off automatic indexing and refreshing, and continually execute 
partial updates on the same document (say 100 times), do the updates change 
the same record in the transaction log or do they create 100 entries?  The 
reason I'm curious is that when I ask ES to index (or refresh) after a 
batch of partial updates, will it try to index the same document 100 times 
or just once?  Efficiency seems to be important here.

My data structure is a Customer with lots of Transactions with each record 
containing a date, description, and dollar amount.  I would like to see if 
a denormalized data structure works here by keeping a list of transactions 
on the customer, then updating new transactions into the same customer 
record. But this would be very inefficient if the document would have to be 
reindexed as many times as the number of incoming partial updates.  I'm 
hoping I can control this by turning off indexing/refreshing and let ES 
update the same record in the Transaction log.  I understand that Lucene 
has immutable records, but that does not really mean that the Transaction 
log has to have immutability, right?

Thanks for any feedback/thoughts!!

Yuri




Re: Kibana: showing a ratio

2014-02-25 Thread Andrew Vine
Ok, I'll check it out

On Tuesday, 25 February 2014 00:17:20 UTC+2, Binh Ly wrote:

 Unfortunately not at the moment. But if you're up to it, you can probably 
 easily write a custom panel that will do this for you.




Re: Compute TF/IDF across indexes

2014-02-25 Thread Ivan Brusic
Great, I am glad that it worked. I do not use multi-index searches, so I
was not sure if it would. Good to know that shards from different indices
can be aggregated with DFS queries.

-- 
Ivan


On Tue, Feb 25, 2014 at 6:04 PM, Luiz Guilherme Pais dos Santos 
luizgpsan...@gmail.com wrote:

 Hi Ivan,

 The DFS query then fetch worked very well!

 Thank you!

 Cheers,
 Luiz Guilherme


 On Tue, Feb 25, 2014 at 5:15 PM, Ivan Brusic i...@brusic.com wrote:

 I have never tried or looked at the code, but off the top of my head
 perhaps the DFS query type would work:
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-search-type.html#dfs-query-then-fetch

 Since the DFS query type calculates the TF/IDF values based on the values
 in each individual shard, perhaps it ignores which index the shard belongs
 to. Easy to test.

 If not, the solution might be tricky. You can eliminate term length
 normalization, but your issue is with the IDF. You can create your own
 Similarity, but the best you can do is ignore the IDF, which probably would
 not be ideal.

 Ultimately, you can try script based scoring. The TF/IDF values are
 exposed to the scripts, so you can try to apply some type of normalization
 yourself. Kludgy and it would impact performance.


 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html

 Hopefully DFS queries would work or someone else has a better idea!

 Cheers,

 Ivan


  On Tue, Feb 25, 2014 at 12:00 PM, Luiz Guilherme Pais dos Santos 
 luizgpsan...@gmail.com wrote:

  Hi,

  I'm trying to search across multiple indexes and I couldn't understand
  the result of the TF/IDF function. I didn't expect the indexes where
  the term is more frequent to get penalized.

 Here follows an example:
 https://gist.github.com/luizgpsantos/9216108

  When searching for the term "alice" the document {"_index": "index2",
  "_type": "type", "_id": "1"} got a score of 0.8784157 while {"_index":
  "index1", "_type": "type", "_id": "1"} got a score of 0.4451987.

 In my use case I got one index about sports and another about
 celebrities and when I search for a celebrity documents across sports and
 celebrities indexes, results from sports index tend to appear in first
 place due to the explanation above (we have few celebrities documents in
 sports index). But the point is that when searching for a celebrity I would
 expect results from the celebrity index.

 Is there any way to calculate the score not penalizing indexes where the
 frequency of a term is higher?

 Cheers,

 --
 Luiz Guilherme P. Santos







 --
 Luiz Guilherme P. Santos





Re: Relation Between Heap Size and Total Data Size

2014-02-25 Thread Umutcan
There is enough space on every machine. I looked in the logs and found 
that org.apache.lucene.store.LockObtainFailedException: Lock obtain 
timed out: 
NativeFSLock@/ebs/elasticsearch/elasticsearch-0.90.10/data/elasticsearch/nodes/0/indices/logstash-2014.02.26/0/index/write.lock 
is what causes the shard to fail to start.



On 02/25/2014 05:29 PM, Randy wrote:

Probably low on disk on at least one machine.  Monitor disk usage. Also look in 
the logs and find out what error you are getting. Report back.

Sent from my iPhone


On Feb 25, 2014, at 7:25 AM, Umutcan umut...@gamegos.com wrote:

Hi,

I created an Elasticsearch cluster with 4 instances. Elasticsearch 0.90.10 is 
running on all of them. Heap size is 6 GB for each instance, so total heap 
size is 24 GB. I have 5 shards for each index and each shard has 1 replica. A 
new index is created for every day, so all indices have nearly the same size.

When total data size reaches around 100 GB (replicas included), my cluster 
begins to fail to allocate some of the shards (status yellow). After I delete 
some old indices and restart all the nodes, everything is fine (status is 
green). If I do not delete some data, the status eventually turns red.

So, I am wondering: is there any relationship between heap size and total 
data size? Is there any formula to determine heap size based on data size?

Thanks,
Umutcan





Histogram of high-cardinality aggregate

2014-02-25 Thread Mike Kaplinskiy
Hey folks,

Playing around with the aggregation API, I was wondering whether this is 
possible. Taking the example 
at 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-nested-aggregation.html
 
, how would I get the histogram of the minimum price [not all prices] of 
all the products?



Re: Text Categorization in ES

2014-02-25 Thread prashy
Hi All,

To be specific, I want a query like this: searching for Laptop will
automatically give results for Dell, Sony, HP, Lenovo, Samsung... as well.
As lingo3g is used for clustering the documents, it will store references
for the above terms as well.

For that I have installed Carrot2 and Lingo3g on top of ES.

So what should my query be, with respect to lingo3g, to search for the
specified items? Or is there anything else I have to do to make it work?




--
View this message in context: 
http://elasticsearch-users.115913.n3.nabble.com/Text-Categorization-in-ES-tp4050194p4050512.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.



Re: EsRejectedExecutionException when searching date based indices.

2014-02-25 Thread David Pilato
You are mixing nodes and shards, right?
How many elasticsearch nodes do you have to manage your 7300 shards?
Why did you set 20 shards per index?

You can increase the queue size in elasticsearch.yml but I'm not sure it's the 
right thing to do here.
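For what it's worth, the setting in question looks roughly like this in elasticsearch.yml. The name follows the 1.x thread pool module; the value is purely illustrative, and raising it masks the symptom rather than fixing the shard count:

```yaml
# Illustrative only: enlarge the search queue that is overflowing at 1000
threadpool.search.queue_size: 2000
```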

My 2 cents

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


Le 26 févr. 2014 à 01:36, Alex Clark a...@bitstew.com a écrit :

Hello all, I’m getting failed nodes when running searches and I’m hoping 
someone can point me in the right direction.  I have indices created per day to 
store messages.  The pattern is pretty straight forward: the index for January 
1 is messages_20140101, for January 2 is messages_20140102 and so on.  Each 
index is created against a template that specifies 20 shards. A full year will 
give 365 indices * 20 shards = 7300 nodes. I have recently upgraded to ES 1.0.

When I search for all messages in a year (either using an alias or specifying 
“messages_2013*”), I get many failed nodes.  The reason given is: 
“EsRejectedExecutionException[rejected execution (queue capacity 1000) on 
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$4@651b8924]”.
  The more often I search, the fewer failed nodes I get (probably caching in 
ES) but I can’t get down to 0 failed nodes.  I’m using ES for analytics so the 
document counts coming back have to be accurate. The aggregate counts will 
change depending on the number of node failures.  We use the Java API to create 
a local node to index and search the documents.  However, we also see the issue 
if we use the URL search API on port 9200.

If I restrict the search to 30 days then I do not see any failures (it's under
1000 nodes, so that's as expected).  However, it is a pretty common use case for our
customers to search messages spanning an entire year.  Any suggestions on how I
can prevent these failures?
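[The capacity arithmetic in this thread can be sketched explicitly; the queue capacity of 1000 is taken from the error message, and each shard contributes one shard-level search task:]

```python
# Back-of-the-envelope check of why a year-wide search overflows the
# per-node search queue while a 30-day window does not.
shards_per_index = 20
indices_per_year = 365
queue_capacity = 1000  # from the EsRejectedExecutionException message

# One shard-level search task per shard touched by the request.
year_tasks = shards_per_index * indices_per_year
print(year_tasks)                   # 7300 tasks for a full-year search
print(year_tasks > queue_capacity)  # True -> rejections are expected

month_tasks = shards_per_index * 30
print(month_tasks)                  # 600 tasks, under the queue capacity
```

[This is why the failures vanish for 30-day ranges: fewer shards per index, or fewer indices per request (e.g. monthly instead of daily indices), keeps the task count below the queue capacity.]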

Thank you for your help!

