Re: Exception cause unwrapping ran for 10 levels
Jörg,

Done: https://github.com/elasticsearch/elasticsearch/issues/4639

While investigating this issue today, I ran a query against the timestamps at which the exceptions occurred, and the data had been indexed. I ran the query because we were worried that no data was being indexed while the exceptions were happening, and that data was therefore being lost.

Thank you.

Jason

On Tue, Jan 7, 2014 at 4:34 PM, joergpra...@gmail.com wrote:

Yes, it looks like two nodes, node1 and node4, do not agree about an update action, and a version conflict is bouncing back and forth between them. I am not sure whether this happens during index recovery or while an update is executed, but it is definitely worth raising an issue on the Elasticsearch GitHub so the core team can have a look. It might be some kind of deadlock.

Jörg

-- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAKdsXoGKNB-1KXab4eWhDnKpe4szdPsidEWq2his2j%3DfPwU7Zw%40mail.gmail.com . For more options, visit https://groups.google.com/groups/opt_out.
Sorting by deeply nested filters
Hi,

I am trying to sort with a twice-nested filter, but this doesn't seem to work. Is this even supposed to be possible? I can provide the query if necessary, but it is quite complicated and would need some obfuscating.

Sincerely,
Vesa Marttila
Re: Query: Parents with at least x children of type y
Changing the parent doc should be avoided, because new child types may be added or old ones removed, and we want the child types to be independent of the parent type. So it seems that there is currently no way to run such queries with Elasticsearch?

Thanks for helping.

On Tuesday, 7 January 2014 10:39:41 UTC+1, David Pilato wrote:

I would probably add a num_of_children field to the parent doc and update it when a child is added or removed. But I guess it depends on your actual use case!

-- David ;-) Twitter: @dadoonet / @elasticsearchfr / @scrutmydocs

On 7 Jan 2014 at 08:15, Alexander Stautner alexander...@gmail.com wrote:

Sorry for bumping, but I need to know whether the question above can be answered with Elasticsearch.

On Thursday, 2 January 2014 15:22:32 UTC+1, Alexander Stautner wrote:

Hello,

after some research without any results, I have a question about parent/child relations.

The case: I have a parent of type parent_type which has children of different types, e.g. child_type_1, child_type_2, child_type_3.

My question: is there any way to get only the parents which have at least x children of type child_type_2 with a specific value in an attribute? For example:

parent_type: family
child_type_1: girl, attribute: name
child_type_2: boy, attribute: name
child_type_3: cat, attribute: name

And I want all families which have at least three girls named Jane.

Thank you for your help,
Alex
Order results by value in one of the array entries.
Hi,

I'm trying to order the results of a query by a specific entry in an array. Here is a sample entry:

{
  "product_name": "product alfa",
  "product_id": "4a86c92ccd26111d7ba0eada7da6a75af",
  "description": "This is a sample product",
  "image_id": "product_a.jpg",
  "inventory": [
    { "warehouse": "warehouse_a", "stock": 99 },
    { "warehouse": "warehouse_b", "stock": 19 },
    { "warehouse": "warehouse_c", "stock": 99 }
  ]
}

If there were more products containing alfa, I would (for example) want to sort them by the stock of one warehouse. I'm currently using a query like:

POST _search
{
  "query": {
    "match": {
      "product_name": { "query": "alfa", "type": "phrase" }
    }
  },
  "filter": {
    "bool": {
      "must": [
        { "term": { "availability.warehouse": "warehouse_a" } }
      ]
    }
  }
}

I would like the results sorted by stock (for warehouse_a only), descending. Any ideas?
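One possible way to express the sort asked for above, sketched in Python as a request body. This is only a sketch under assumptions the post does not confirm: it assumes `inventory` is mapped as a `nested` type, and it relies on the `nested_path` / `nested_filter` sort options of the 0.90-era sort API. The helper names `stock_sort_body` and `sort_locally` are hypothetical; the second one only mimics the intended ordering in plain Python so the expected result can be checked without a cluster.

```python
import json


def stock_sort_body(phrase, warehouse):
    """Build a search body that sorts hits by the stock of one warehouse.

    Assumes `inventory` is a nested field (not stated in the original post).
    """
    return {
        "query": {
            "match": {"product_name": {"query": phrase, "type": "phrase"}}
        },
        "sort": [
            {
                "inventory.stock": {
                    "order": "desc",
                    "nested_path": "inventory",
                    # Only inventory entries for this warehouse feed the sort value.
                    "nested_filter": {
                        "term": {"inventory.warehouse": warehouse}
                    },
                }
            }
        ],
    }


def sort_locally(products, warehouse):
    """Reference ordering: descending stock, counting the given warehouse only."""

    def stock(product):
        # Products without an entry for the warehouse sort as if stock were 0.
        return max(
            (e["stock"] for e in product["inventory"] if e["warehouse"] == warehouse),
            default=0,
        )

    return sorted(products, key=stock, reverse=True)


if __name__ == "__main__":
    print(json.dumps(stock_sort_body("alfa", "warehouse_a"), indent=2))
```

Without a nested mapping, the entries of an object array are flattened, so the stock of warehouse_b could not be told apart from that of warehouse_a at sort time.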
Re: Sorting by deeply nested filters
Just to add: the filter works as desired when used in queries; the problems occur only when it is used for sorting.

Vesa
Re: ElasticsearchHadoop Hive integration issue
Hi,

The 'es.resource' you specified is incorrect: you need to specify both an index and a type, e.g. myIndex/products.

P.S. Are you using M1 or the current master? The latter should give a proper error (and message).

Thanks,
Costin

On 07/01/2014 9:48 AM, Badal Mohapatra wrote:

Hi,

I am trying to index data from a Hive table into Elasticsearch, using the latest elasticsearch-hadoop-master plugin. My Elasticsearch version is 0.90.9 and my Hive version is hive-0.11.0. Following the documentation of the elasticsearch-hadoop plugin (Hive integration), I successfully created an external table with the command below:

CREATE EXTERNAL TABLE es_products (
  sku int, rating float, name string, type string, saleprice float,
  department string, manufacturer string, userid string,
  category_name string, query string)
STORED BY 'org.elasticsearch.hadoop.hive.ESStorageHandler'
TBLPROPERTIES('es.resource' = 'products');

Even though the external table is created, I can neither insert data into it nor query it. When I run

select * from es_products;

I get the exception below:

hive> select * from es_products;
OK
Failed with exception java.io.IOException:java.lang.StringIndexOutOfBoundsException: String index out of range: -1
Time taken: 1.699 seconds

Can someone please suggest what I am doing wrong?

Kind Regards,
Badal
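The index/type requirement described in the reply can be illustrated with a small check, sketched here in Python. This is not the connector's code: `parse_es_resource` is a hypothetical helper, and the idea that the "-1" in the exception comes from a failed lookup of the "/" separator is only a guess consistent with the message.

```python
def parse_es_resource(resource):
    """Split an 'index/type' resource string; reject values without a type.

    Hypothetical helper for illustration only, not part of elasticsearch-hadoop.
    """
    index, sep, doc_type = resource.partition("/")
    if not sep or not index or not doc_type:
        raise ValueError(
            "es.resource must be '<index>/<type>', got %r" % resource
        )
    return index, doc_type


if __name__ == "__main__":
    print(parse_es_resource("myIndex/products"))  # index and type are both present
    try:
        parse_es_resource("products")  # the value used in the table definition
    except ValueError as err:
        print(err)
```

With the table above, changing the property to something like 'es.resource' = 'myIndex/products' is the fix the reply suggests.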
Re: Hipchat Elasticsearch
Here are some related links, including a video of a talk: http://www.meetup.com/Elasticsearch-San-Francisco/events/141698772/

-- Ivan

On Tue, Jan 7, 2014 at 1:43 AM, Ümit Seren uemit.se...@gmail.com wrote:

Interesting read about elasticsearch in HipChat: http://highscalability.com/blog/2014/1/6/how-hipchat-stores-and-indexes-billions-of-messages-using-el.html
Upgrades causing Elasticsearch downtime
Hello,

We've upgraded Elasticsearch twice over the last month and have experienced downtime (roughly 8 minutes) during the rollout. I'm not sure whether we are doing something wrong.

We use EC2 instances for our Elasticsearch cluster and CloudFormation to manage our stack. When we deploy a new version of, or a change to, Elasticsearch, we upload the new artefact, double the number of EC2 instances and wait for the new instances to join the cluster.

For example, 6 nodes form a cluster on v0.90.7. We upload the 0.90.9 version via our deployment process and double the number of nodes in the cluster (to 12). The 6 new nodes join the cluster with the 0.90.9 version. We then want to remove each of the 0.90.7 nodes. We do this by shutting down a node (using the head plugin), waiting for the cluster to rebalance the shards and then terminating the EC2 instance. We then repeat with the next node. We leave the master node until last so that the re-election happens just once.

The issue we have found in the last two upgrades is that while the penultimate node is shutting down, the master starts throwing errors and the cluster goes red. To fix this we stopped the Elasticsearch process on the master and had to restart each of the other nodes (though perhaps they would have rebalanced themselves given more time?). We see an increase in error responses sent to our clients during this time.

We've set our search queue size to 300, and we start to see the queue fill up:

at java.lang.Thread.run(Thread.java:724)
2014-01-07 15:58:55,508 DEBUG action.search.type [Matt Murdock] [92036651] Failed to execute fetch phase
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 300) on org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction$2@23f1bc3
  at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:61)
  at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)

But we also see the following error, which we've been unable to diagnose:

2014-01-07 15:58:55,530 DEBUG index.shard.service [Matt Murdock] [index-name][4] Can not build 'doc stats' from engine shard state [RECOVERING]
org.elasticsearch.index.shard.IllegalIndexShardStateException: [index-name][4] CurrentState[RECOVERING] operations only allowed when started/relocated
  at org.elasticsearch.index.shard.service.InternalIndexShard.readAllowed(InternalIndexShard.java:765)

Are we doing anything wrong, or has anyone else experienced this?

Thanks,
Jenny
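The removal order described above (old nodes one at a time, master last) can be sketched as a plan of steps. The Python below is only an illustration and is not taken from the thread: draining a node through the `cluster.routing.allocation.exclude._ip` setting before shutting it down, and waiting on the cluster health API, is an assumed variation on the procedure in the post. `removal_plan`, the IP addresses and the `SHUTDOWN` pseudo-step are hypothetical placeholders, and nothing is sent to a cluster.

```python
def removal_plan(old_nodes, master):
    """Return ordered (method, target, body) steps for retiring old nodes.

    The master is retired last, so that re-election happens only once.
    """
    if master not in old_nodes:
        raise ValueError("master must be one of the old nodes")
    ordered = [n for n in old_nodes if n != master] + [master]
    excluded, steps = [], []
    for ip in ordered:
        excluded.append(ip)
        # Ask the cluster to move shards off every node excluded so far.
        steps.append((
            "PUT",
            "/_cluster/settings",
            {"transient": {
                "cluster.routing.allocation.exclude._ip": ",".join(excluded)
            }},
        ))
        # Block until the cluster is green and no shard is still relocating.
        steps.append((
            "GET",
            "/_cluster/health?wait_for_status=green&wait_for_relocating_shards=0",
            None,
        ))
        # Placeholder: stop the node and terminate the instance by whatever means.
        steps.append(("SHUTDOWN", ip, None))
    return steps


if __name__ == "__main__":
    for step in removal_plan(["10.0.0.1", "10.0.0.2", "10.0.0.3"], master="10.0.0.2"):
        print(step)
```

The design choice here is to empty a node before stopping it, so that stopping it does not itself trigger recovery; whether that would avoid the red state described in the post is untested.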
Re: score based on term frequency only
Great feature. However, it looks like it is only available in the master branch: https://github.com/elasticsearch/elasticsearch/issues/3772

-- Ivan

On Tue, Jan 7, 2014 at 8:31 AM, Britta Weber britta.we...@elasticsearch.com wrote:

You could also use a script as described here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html

Cheers,
Britta

On Mon, Jan 6, 2014 at 2:13 AM, Ivan Brusic i...@brusic.com wrote:

You could provide your own Similarity class as a plugin. I don't have any sample code in front of me, but it would be based on TFIDFSimilarity, and you would basically need to ignore the norms and other values. http://lucene.apache.org/core/4_6_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html

The IDF portion could probably remain, since it ranks the different terms in your query, not the score of each term.

Cheers,
Ivan

On Sun, Jan 5, 2014 at 1:57 PM, Kevin S kevinste...@gmail.com wrote:

I would like to score based entirely on term count. For example, given the following two documents:

1) { apple }
2) { apple apple }

Searching for apple ranks the first before the second. I wish to rank the second, in which the term occurs twice, with a higher score. Can someone please point me in the right direction for this? Thank you.
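The ranking asked for in the original question can be illustrated outside Elasticsearch with a few lines of Python. This is not a Similarity plugin or a scoring script, only a sketch of the target behavior: the score is the raw number of occurrences of the term, with no length normalization and no IDF. The function names are hypothetical, and whitespace tokenization stands in for a real analyzer.

```python
def tf_score(document, term):
    """Score a document by the raw count of the term (case-insensitive)."""
    return document.lower().split().count(term.lower())


def rank_by_tf(documents, term):
    """Return the matching documents, highest term count first."""
    scored = [(tf_score(doc, term), doc) for doc in documents]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in ranked if score > 0]


if __name__ == "__main__":
    print(rank_by_tf(["apple", "apple apple", "pear"], "apple"))
```

A custom similarity or script, as suggested in the replies, would need to reproduce this count on the Elasticsearch side.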
Scoring and Relative-ness based on Business Rules
What is the best way to make products more relevant outside of the default scoring? I have an unknown number of business rules that will dictate a document's relevance. That is, even if one document scores higher than another, the other document may be more relevant to the user.

Given two products with similar titles but different attributes and the query ipad, I'd like to promote one over the other:

{
  "title_simple": "iPad Mini Case",
  "description_simple": "Royce Leather iPad Mini Case:...",
  "category": "Computers Accessories",
  "brand": "Royce Leather",
  "id": 794809052574
}

{
  "title_simple": "Apple iPad mini (16GB, Wi-Fi + Sprint 4G, White)",
  "description_simple": "iPad mini features a beautiful 7.9\" display...",
  "category": "Electronics",
  "brand": "Apple",
  "id": 885909689712
}

A simple query scores the iPad case high:

{ "query": { "term": { "title_simple": "ipad" } } }

But business rules dictate that the actual iPad be at the top. I can run a filter or score based on the attribute or brand to get what I'm looking for:

{
  "query": {
    "function_score": {
      "query": { "term": { "title_simple": "ipad" } },
      "functions": [{
        "filter": { "term": { "category_simple": "electronics" } },
        "boost_factor": 2
      }]
    }
  }
}

But building a bunch of these isn't scalable or reasonable. I have an unknown number of these rules, and that number will continue to grow. Some other examples:

- query xbox should promote consoles over games
- query macbook should promote Apple computers over macbook sleeves
- query Apple should promote Apple products and not food

Building a thousand queries based on function filters is unreasonable and unscalable.

Some possible solutions I've considered:

- Building a lookup table that will build the filter portion of the query (this could become unmaintainable)
- Including a pre-calculated score in the document (unfortunately, this doesn't work on a per-query basis, as the score may change based on the user's needs)
- Extending the DefaultSimilarity class (I'm not sure how this helps me in this scenario, though)

What have other people done to solve these problems? Is there something else that I'm missing that could help?

Here's a runnable gist - https://gist.github.com/dlmitchell/826e8fb7ca89bed30e4a/raw/613be2c202b26f5899bdcfeac714737beb49/sample_mapping.sh
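The first option in the list above, a lookup table that builds the filter portion of the query, could look roughly like the following Python sketch. The `ipad` rule reproduces the function_score query from the post; the `xbox` rule, its `consoles` category value and the names `BOOST_RULES` and `build_query` are made up for illustration.

```python
# Maps a user query term to (filter, boost_factor) pairs.
BOOST_RULES = {
    "ipad": [({"term": {"category_simple": "electronics"}}, 2)],
    # Hypothetical rule: the category value is an assumption, not from the post.
    "xbox": [({"term": {"category_simple": "consoles"}}, 2)],
}


def build_query(user_query, rules=BOOST_RULES):
    """Wrap the base term query in a function_score if rules exist for the term."""
    term = user_query.lower()
    base = {"term": {"title_simple": term}}
    matched = rules.get(term, [])
    if not matched:
        # No business rule for this term: fall back to the plain query.
        return {"query": base}
    functions = [
        {"filter": flt, "boost_factor": boost} for flt, boost in matched
    ]
    return {"query": {"function_score": {"query": base, "functions": functions}}}


if __name__ == "__main__":
    import json
    print(json.dumps(build_query("iPad"), indent=2))
```

The query-building code stays fixed while the rules live in data, but the table still has to be maintained per search term, which is the maintenance concern raised in the post.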
Re: Too many open files
Hi,

my model is quite slow with just a few thousand documents. I realised that, when opening a node

node = builder.client(clientOnly).data(!clientOnly).local(local).node();
client = node.client();

from my Java program to ES with such a small model, ES automatically creates 10 sockets. Coincidentally, I have 10 shards (?).

* Is this the expected behavior?
* Can I reduce the number of ES shards dynamically to reduce the number of sockets, or should I redeploy my ES install?
* By opening other connections I eventually reach up to 200 simultaneously open sockets, and I am afraid that, when fetching highlight information, some of the results are randomly being lost. Could these missing results somehow be a consequence of too many open sockets?

Thanks for your pointers.
Beta2 Java Client: java.nio.channels.UnresolvedAddressException
Hi,

I'm having difficulty connecting with the Java client to 1.0.0.Beta2. The cluster is up and healthy, and monitoring works fine using elasticsearch Head, elasticsearch HQ, etc. This is the stack trace I am getting: https://gist.github.com/dav-rob/8304130

Thanks,
David.
incrementally scaling ES from the small data
Hi,

I plan to start with a small project, initially with a small amount of data (a few thousand records), to learn how ES responds, and then to increase data and resources incrementally on demand, up to big data, taking advantage of ES scalability.

Is there a document describing such a strategy, i.e.:

* how to properly configure a small, basic deployment with good performance on low resources (shards, nodes, clusters...)?
* how to then detect when resources (shards, nodes...) need to be added incrementally as the data load increases?

All the docs that I find on scaling ES start from deployments with millions or billions of records. Alternatively, is there any advice on properly configuring ES for small data, as a starting point?

Thanks
Re: Too many open files
I guess my problem with the excessive number of sockets could also be a consequence of having 2 JVMs running ES, one embedded in Tomcat and a second embedded in another Java app, as described here: https://groups.google.com/forum/?hl=en-GB#!topicsearchin/elasticsearch/scale%7Csort:date%7Cspell:true/elasticsearch/m9IWpGzoLLE

Does anyone have experience running a single embedded ES (as jar files), for example in Tomcat's lib folder, consumed by several Tomcat apps and by other standalone apps in different JVMs? Any opinion on this configuration as a starting point?
Re: Scoring and Relative-ness based on Business Rules
I think you will find that for small documents that aren't really documents at all but a mass of data points, such as a product library, you won't use the built-in scoring at all. The built-in scoring works well for books and articles (long works of text). For a product library, you will use an array of custom boosts through the function score query.

The key is to get all those data points into your documents so that you can boost on matches. For example:

- For xbox, you could have a keywords field that includes xbox only for consoles. Or maybe Xbox is the title of the product, while games just have Xbox listed as their console compatibility, so only matches in the title score higher.
- For macbook, you could have an accessories flag, where items flagged as an accessory receive a negative boost.
- For Apple food vs. Apple products, you can use sales data or user history.

The key to relevancy that works for your organization is providing Elasticsearch with all the data points it needs to base its decisions on. For products, your best solution is a big old set of constant score queries wrapped in some wild function score queries.
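The document-side approach described in this reply can be sketched as a single query shape that is the same for every search term, with the per-product knowledge stored in the documents. In the Python below, the `keywords` and `is_accessory` fields, the boost values and the name `product_query` are all hypothetical. The "negative boost" for accessories is expressed as a `boost_factor` below 1, which is one assumed way of demoting a match with function_score.

```python
def product_query(user_query):
    """Build one generic function_score query; the rules live in the documents."""
    term = user_query.lower()
    return {
        "query": {
            "function_score": {
                "query": {"term": {"title_simple": term}},
                "functions": [
                    # Promote products whose curated keywords contain the term
                    # (e.g. "xbox" listed only on consoles).
                    {"filter": {"term": {"keywords": term}}, "boost_factor": 3},
                    # Demote anything flagged as an accessory (cases, sleeves).
                    {"filter": {"term": {"is_accessory": True}}, "boost_factor": 0.5},
                ],
            }
        }
    }


if __name__ == "__main__":
    import json
    print(json.dumps(product_query("MacBook"), indent=2))
```

The trade-off against a per-term lookup table is that the effort moves to index time: each product has to carry the keywords and flags that the query boosts on.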
Transport Client hangs in my web application during search.
I have a web application in which I create a Transport Client as a Spring singleton and inject it into my service. When my controller receives a request, it calls the service, which uses the transport client to execute the query and return the results. When I deploy this application in Tomcat, the client is created, but when I execute the query, the client hangs. If I create the client for every request (in my service) and run the query, everything is fine. Can someone help me understand this behavior? The following is my code to create the Client object:

Settings settings = ImmutableSettings.settingsBuilder()
    .put("cluster.name", "mysearchcluster")
    .put("client.transport.sniff", true)
    .build();
Client client = new TransportClient(settings)
    .addTransportAddress(new InetSocketTransportAddress("10.150.200.101", 9300));

Thanks
Re: Scoring and Relative-ness based on Business Rules
Thanks for your answer. So, instead of relying on queries to pull out the right stuff, you're suggesting to model the documents to the queries. This suggests that there's a custom boost for every search term, which is what I was hoping to avoid, if only because of the impossible task of going through all our data and determining what to boost/not boost. This also implies that there's another key/value store of queries-to-boost keywords, which again could get costly to maintain. If I'm understanding you correctly, it would look similar to what I previously posted, but only with a larger (possibly dynamic) set of boost queries. Doing so is primarily a manual task - are there more automatic ways to build up relevancy, or even tools/processes that help? On Tuesday, January 7, 2014 11:50:40 AM UTC-8, Justin Treher wrote: I think you will find that for small documents, that aren't actually documents at all, but really a mass of data points, such as a product library, you won't even use the built in scoring at all. The built in scoring works well for books and articles (long works of text). For a product library, you will use an array of custom boosts through the function score query. The key is to get all those data points in your documents so that you can boost on matches. For example, with xbox, you could have a keywords field that includes xbox just for consoles. Maybe Xbox is the title of the product while games just have Xbox listed as their console compatibility. Only matches in the titles will score higher. For the macbook, you could have an accessories flag where items flagged as an accessory receive a negative boost. For Apple food vs. Apple products, you can use sales data or user history. The key to having relevancy that works for your organization is by providing all the data points to elasticsearch to base its decisions. For products, your best solution is a big old set of constant score queries wrapped in some wild function score queries. 
On Tuesday, January 7, 2014 12:36:43 PM UTC-5, David Mitchell wrote: What is the best way to make products more relevant outside of the default scoring? I have an unknown number of business rules that will dictate a document's relevance. That is, even if one document scores higher than another, it's possible that the other document will be more relevant to the user. Given two products with similar titles but different attributes and the query "ipad", I'd like to promote one over the other: { "title_simple": "iPad Mini Case", "description_simple": "Royce Leather iPad Mini Case:...", "category": "Computers Accessories", "brand": "Royce Leather", "id": 794809052574 } { "title_simple": "Apple iPad mini (16GB, Wi-Fi + Sprint 4G, White)", "description_simple": "iPad mini features a beautiful 7.9\" display...", "category": "Electronics", "brand": "Apple", "id": 885909689712 } A simple query scores the iPad case high: { "query": { "term": { "title_simple": "ipad" } } } But business rules dictate that the actual iPad be on the top. I can run a filter or score based on the attribute or brand to get what I'm looking for: { "query": { "function_score": { "query": { "term": { "title_simple": "ipad" } }, "functions": [{ "filter": { "term": { "category_simple": "electronics" } }, "boost_factor": 2 }] } } } But building a bunch of these isn't scalable or reasonable. I have an unknown number of these and that number will continue to grow. Some other examples: - query "xbox" should promote consoles over games - query "macbook" should promote Apple computers over macbook sleeves - query "Apple" should promote Apple products and not food Building a thousand queries based on function filters is unreasonable and unscalable. Some possible solutions I've considered: - building a lookup table that will build the filter portion of the query (this could get unmaintainable) - including a pre-calculated score in the document (unfortunately, this doesn't work on a per-query basis, as the score may change based on the user's needs) - extending the DefaultSimilarity class (I'm not sure how this helps me in this scenario, though) What have other people done to solve these problems? Is there something else that I'm missing that could help? Here's a runnable gist - https://gist.github.com/dlmitchell/826e8fb7ca89bed30e4a/raw/613be2c202b26f5899bdcfeac714737beb49/sample_mapping.sh
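The first option listed above, a lookup table that builds the filter portion of the query, could be sketched roughly as follows. This is only an illustration in Python, assuming the function_score syntax from the example query (a filter plus boost_factor); the BOOST_RULES table, its contents (including the "consoles" category) and the build_query helper are hypothetical names and are not taken from the gist.

```python
import json

# Hypothetical rule table: search term -> list of (field, value, boost_factor).
# In practice this could live in its own index or a key/value store.
BOOST_RULES = {
    "ipad": [("category_simple", "electronics", 2)],
    "xbox": [("category_simple", "consoles", 2)],
}


def build_query(term, field="title_simple"):
    """Wrap a term query in function_score when boost rules exist for the term."""
    base = {"term": {field: term}}
    rules = BOOST_RULES.get(term, [])
    if not rules:
        # No business rule for this term: fall back to the plain query.
        return {"query": base}
    functions = [
        {"filter": {"term": {f: v}}, "boost_factor": boost}
        for f, v, boost in rules
    ]
    return {"query": {"function_score": {"query": base, "functions": functions}}}


if __name__ == "__main__":
    print(json.dumps(build_query("ipad"), indent=2))
```

The query-building code stays fixed while the rules grow as data, which addresses the "thousand queries" concern only partly: someone still has to maintain the table.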
Any fix timeline for split brain issue: 2488
Hi, is there any timeline for a fix for https://github.com/elasticsearch/elasticsearch/issues/2488? Thanks!
Re: Match exact substring in not analyzed field
This is an interesting problem. I generally take a dim view of stop words. I would prefer that the client side avoid searching on them, if that is desired, rather than have the engine ignore them. Then phrase matching can work properly: a query such as "The Wall" can look for just "Wall" (ignoring "The" as a stop word), but the Google-like +The Wall can look for "The Wall". Yeah, I know that ES is not Google; I only look to Google for ideas that are nice and for hints about their implementation based upon their external behavior. Then, your problem could be solved using a phrase query with no slop. Maybe your testMulti field is analyzed but with no stop words ignored. Or maybe testMulti.raw is analyzed but with no stop words ignored. Either way, you'd have the full set of words indexed for a phrase query to quickly find the sub-match. At least, much, much more quickly than a grep-style wildcard search against a non-analyzed form of the field. I also used phrases within my own table-based synonym matching. Instead of using ES synonyms, I create a separate type with lists of synonyms. A query for a synonym is first directed to that type to fetch a list of synonyms; then an OR query is generated. This has proven to be fast enough. It has the benefit of allowing the synonyms to be updated with no changes to the 97 million documents that are already indexed. And synonyms can be phrases, for example: HUGE - VERY BIG. So now a synonym query for HUGE can find "The Very Big Dog". Likewise, a synonym query for the phrase VERY BIG can find "The Huge Dog". Really cool; just a matter of Java coding on the front end. And ES does the heavy lifting underneath. But I digress a little... Hope this helps. Brian
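The table-based synonym approach described above (look up the synonyms first, then generate an OR query of zero-slop phrases) might look something like the sketch below. It is written in Python rather than Brian's Java, and the SYNONYMS dict merely stands in for the separate synonym type he mentions; the synonym_query helper and the table contents are hypothetical.

```python
import json

# Hypothetical synonym table; a plain dict stands in for the lookup that
# would normally be fetched from a dedicated Elasticsearch type.
SYNONYMS = {
    "huge": ["very big"],
    "very big": ["huge"],
}


def synonym_query(field, text):
    """Build an OR (bool/should) of zero-slop phrase queries for text and its synonyms."""
    variants = [text] + SYNONYMS.get(text.lower(), [])
    should = [{"match_phrase": {field: {"query": v, "slop": 0}}} for v in variants]
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


if __name__ == "__main__":
    print(json.dumps(synonym_query("testMulti", "HUGE"), indent=2))
```

Because the expansion happens at query time, updating the table changes search behavior without touching the documents that are already indexed.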
Re: incrementally scaling ES from the small data
As a really, really rough guide: start with a small instance, 4-8G RAM (2-4G heap). Keep loading documents until things start to slow down (i.e. query/update responsiveness drops). Add a new node. Rinse and repeat. If you have one node there is no point using replicas, as they have nowhere to go. You can easily add replicas later though, so it's no big deal. Shards are a little harder; start with the standard/default of 8 shards and go from there. Using aliases can allow you to reindex your data later if you feel you may want to change this. You can monitor your cluster with a range of monitoring plugins: elasticHQ, kopf, elasticsearch-monitoring, bigdesk. Just search for them on GitHub. As Boaz mentioned, it really does depend on what you are doing. Chances are you will go through all this and get to a point where you want to rebuild your cluster with all your gained knowledge! Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 8 January 2014 09:18, Boaz Leskes b.les...@gmail.com wrote: Hi Adolfo, The best way to scale depends on your data and how it behaves. You can watch this great talk by Shay about two use cases to get inspired: http://www.elasticsearch.org/videos/big-data-search-and-analytics/ Cheers, Boaz On Tuesday, January 7, 2014 8:13:18 PM UTC+1, Adolfo Rodriguez wrote: Hi, I plan to start with a small project, initially with small data (a few thousand records) to learn how ES responds, and to incrementally increase data and resources on demand, up to big data, taking advantage of ES scalability. Is there a document describing such a strategy, i.e.: * how to properly configure a small basic deployment with good performance on low resources (shards, nodes, clusters...)? * then, how to detect the need to incrementally add resources (shards/nodes...) as the data load increases? All the docs that I find on scaling ES start with deployments with millions or billions of records. Alternatively, any advice on properly configuring ES for small data (as a starting point)? Thanks
Re: incrementally scaling ES from the small data
Thanks both for your comments. "Shards are a little harder; start with the standard/default of 8 shards and go from there." * This is the point that is confusing me the most. For a very small initial deployment, with a few thousand docs, why not just define 1 shard with no replica? What criteria did you use to set 8 shards as a default? (BTW, the defaults in ES 0.90.5 are 5 successful shards and 5 unassigned shards, aren't they?) * Suppose that you start with the smallest setup: 1 cluster, 1 node, 1 shard, no replica. Will I be able to incrementally scale any of these settings up? And will I also be able to scale any of these settings down afterwards (or will I need to repopulate ES in any particular case)? The idea is to test different configs. * In my current particular case, can I scale down my current 5 shards/1 replica (the 0.90.5 default, AFAIK) to 1 shard/no replica, and start from there? The reason I am concerned about this is that I see a lot of sockets (maybe 200 on my system, with 2 ES instances in different apps on the same machine) and want to understand where they come from and how to allocate the optimum. I watched Shay's presentation yesterday but could not grasp this info. Thanks
Re: incrementally scaling ES from the small data
Elasticsearch uses consistent hashing, so you cannot change the number of shards for an index. If you can reindex data, then you can create a new index with a different number of shards and simply reindex. If your data is temporal in nature, you can create a new index per day/week/month, and these new indices can have a different shard value. You can search against multiple indices even if they have different shard values. IMHO, shard values in the high single digits (5-10) are a great starting point. Even with a single-node cluster, the default number of shards (5) should not cause any performance issues. Cheers, Ivan On Tue, Jan 7, 2014 at 4:47 PM, Adolfo Rodriguez pellyado...@yahoo.es wrote: Thanks both for your comments. "Shards are a little harder; start with the standard/default of 8 shards and go from there." * This is the point that is confusing me the most. For a very small initial deployment, with a few thousand docs, why not just define 1 shard with no replica? What criteria did you use to set 8 shards as a default? (BTW, the defaults in ES 0.90.5 are 5 successful shards and 5 unassigned shards, aren't they?) * Suppose that you start with the smallest setup: 1 cluster, 1 node, 1 shard, no replica. Will I be able to incrementally scale any of these settings up? And will I also be able to scale any of these settings down afterwards (or will I need to repopulate ES in any particular case)? The idea is to test different configs. * In my current particular case, can I scale down my current 5 shards/1 replica (the 0.90.5 default, AFAIK) to 1 shard/no replica, and start from there? The reason I am concerned about this is that I see a lot of sockets (maybe 200 on my system, with 2 ES instances in different apps on the same machine) and want to understand where they come from and how to allocate the optimum. I watched Shay's presentation yesterday but could not grasp this info. Thanks
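Why the shard count is fixed once an index exists can be seen from how a document is assigned to a shard: roughly, a hash of the routing value (the document id by default) modulo the number of primary shards. The sketch below is a toy illustration in Python, not Elasticsearch's actual hash function (zlib.crc32 is only a stand-in, and shard_for is a hypothetical helper). It shows that changing the divisor sends many existing ids to a different shard, which is why a reindex into a new index is needed.

```python
import zlib


def shard_for(doc_id, num_shards):
    """Toy routing: hash(id) % number_of_shards (crc32 stands in for the real hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


# Count how many ids would no longer be found on "their" shard if the
# number of shards changed from 5 to 6 without reindexing.
ids = [f"doc-{i}" for i in range(1000)]
moved = sum(1 for d in ids if shard_for(d, 5) != shard_for(d, 6))

if __name__ == "__main__":
    print(f"{moved} of {len(ids)} ids would map to a different shard")
```

Time-based indices, as suggested above, sidestep the problem: each new index can be created with whatever shard count suits the data volume at that point.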
Design practices for hosting multiple clusters/on-demand cluster creation?
While ES is still in a pre-deployment stage at my job, there is growing interest in it. For various reasons, a monster cluster holding everyone's stuff is simply not possible. Individual projects require complete control over their data, and the culture and security requirements here are such that doing something like always naming project 1's indexes PROJECT_1_something will not fly. We have a fairly beefy Hadoop cluster hosting our content currently, along with a separate head node acting as the master. In this situation, is it simply a matter of starting up new processes on each node pointed at different configuration profiles and tying specific ports to specific projects/clusters? Basically, is there an established way to build on-demand clusters, given a set of resources? We'll layer something in front of it to deal with access control/etc. Thanks! -Josh
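The "different configuration profiles and specific ports per project" idea could be sketched as a small generator of per-project settings. This is a hedged illustration only: the port offsets, the data_root path and the project_config helper are hypothetical choices, and discovery settings (which each cluster would also need so that nodes only join their own cluster) are deliberately left out. The setting names themselves (cluster.name, http.port, transport.tcp.port, path.data) are standard elasticsearch.yml keys.

```python
def project_config(project, index, base_http=9200, base_transport=9300,
                   data_root="/data/es"):
    """Return elasticsearch.yml-style settings for one project's cluster on a shared host."""
    settings = {
        "cluster.name": f"{project}-cluster",          # separate cluster per project
        "http.port": base_http + index,                # REST port for this project
        "transport.tcp.port": base_transport + index,  # node-to-node port
        "path.data": f"{data_root}/{project}",         # keep data directories apart
    }
    return "\n".join(f"{key}: {value}" for key, value in settings.items())


if __name__ == "__main__":
    # Hypothetical project names; each gets its own ports and data path.
    for i, name in enumerate(["project1", "project2"]):
        print(project_config(name, i))
        print("---")
```

One process per project per host would then be started with its own generated file; a distinct cluster.name is what keeps the processes from joining each other.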
Re: incrementally scaling ES from the small data
Thanks Ivan, "Elasticsearch uses consistent hashing, so you cannot change the number of shards for an index." So, I understand that, once the index is created, it is only possible to scale nodes, clusters and replicas up and down, but not shards. Interesting. "IMHO, shard values in the high single digits (5-10) is a great starting point. Even with a single node cluster, the default number of shards (5) should not cause any performance issues." I am worried about the 200 established sockets on my machine (running 2 ES instances), since I suspect they are causing some random data loss when getting highlighting information. And I was wondering if setting just 1 shard/0 replicas on each ES would get rid of these unwanted sockets (?). Why is it advised to start with (5-10) rather than with (1-0) * 2 ES? Any reason? Thanks
Re: incrementally scaling ES from the small data
An increase of shards will not cause an increase in sockets used. Each node's shard action is responsible for gathering the responses from each shard at the file level before sending the response back to the client. Since each shard is actually its own Lucene index, an increase of shards will increase metrics at the IO level, especially the number of open file descriptors. It is advised to start off with 5 because that allows you to scale an index horizontally without needing to reindex. You can increase your cluster from 1 to 5 nodes and each node will have a piece of the index instead of the entire index. Beyond that number, you can distribute the index with more replicas. More shards increase availability IMHO. Ultimately you do not want large shards for performance reasons. -- Ivan On Tue, Jan 7, 2014 at 5:23 PM, Adolfo Rodriguez pellyado...@yahoo.es wrote: I am worried about the 200 established sockets on my machine (running 2 ES instances), since I suspect they are causing some random data loss when getting highlighting information. And I was wondering if setting just 1 shard/0 replicas on each ES would get rid of these unwanted sockets (?). Why is it advised to start with (5-10) rather than with (1-0) * 2 ES? Any reason?
How to index an existing json file
Hi, I am just starting with ElasticSearch. I would like to know how to index a simple JSON document, books.json, that has the following in it. Where do I place the document? I placed it in the root directory of Elasticsearch and in the /bin folder. {“books”:[{“name”:”life in heaven”,”author”:”Mike Smith”},{“name”:”get rich”,”author”:”Joe Shmoe”},{“name”:”luxury properties”,”author”:”Linda Jones”]}} $ curl -XPUT http://localhost:9200/books/book/1 -d @books.json Warning: Couldn't read data from file books.json, this makes an empty POST. {"error":"MapperParsingException[failed to parse, document is empty]","status":400} Thanks
Re: How to index an existing json file
The JSON file is read by the curl command, so in your example it should be in the directory from which you executed the command (the current directory). -- Ivan On Tue, Jan 7, 2014 at 6:00 PM, ZenMaster80 sabdall...@gmail.com wrote: Hi, I am just starting with ElasticSearch. I would like to know how to index a simple JSON document, books.json, that has the following in it. Where do I place the document? I placed it in the root directory of Elasticsearch and in the /bin folder. {“books”:[{“name”:”life in heaven”,”author”:”Mike Smith”},{“name”:”get rich”,”author”:”Joe Shmoe”},{“name”:”luxury properties”,”author”:”Linda Jones”]}} $ curl -XPUT http://localhost:9200/books/book/1 -d @books.json Warning: Couldn't read data from file books.json, this makes an empty POST. {"error":"MapperParsingException[failed to parse, document is empty]","status":400} Thanks
Re: How to index an existing json file
Great. Do you know why I am getting {"error":"MapperParsingException[failed to parse]; nested: JsonParseException[Unrecognized token 'life': was expecting ('true', 'false' or 'null')\n at [Source: [B@5c9a9d06; line: 1, column: 35]]; ","status":400} data: {“books”:[{“name”:”life in heaven”,”author”:”Mike Smith”},{“name”:”get rich”,”author”:”Joe Shmoe”},{“name”:”luxury properties”,”author”:”Linda Jones”}]} On Tuesday, January 7, 2014 9:06:01 PM UTC-5, Ivan Brusic wrote: The JSON file is read by the curl command, so in your example it should be in the directory from which you executed the command (the current directory). -- Ivan On Tue, Jan 7, 2014 at 6:00 PM, ZenMaster80 sabda...@gmail.com wrote: Hi, I am just starting with ElasticSearch. I would like to know how to index a simple JSON document, books.json, that has the following in it. Where do I place the document? I placed it in the root directory of Elasticsearch and in the /bin folder. {“books”:[{“name”:”life in heaven”,”author”:”Mike Smith”},{“name”:”get rich”,”author”:”Joe Shmoe”},{“name”:”luxury properties”,”author”:”Linda Jones”]}} $ curl -XPUT http://localhost:9200/books/book/1 -d @books.json Warning: Couldn't read data from file books.json, this makes an empty POST. {"error":"MapperParsingException[failed to parse, document is empty]","status":400} Thanks
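One possible cause of the parse error above, assuming the typographic (curly) quotes shown in the posted data are really in books.json and are not just an artifact of the mail client, is that JSON only accepts straight double quotes. A quick local check before sending the file with curl could look like the Python sketch below; load_json_lenient and the CURLY_TO_STRAIGHT table are hypothetical helpers written for this illustration.

```python
import json

# Typographic quotes that editors and word processors often substitute.
CURLY_TO_STRAIGHT = {"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"}


def load_json_lenient(text):
    """Parse text as JSON; on failure, retry after replacing typographic quotes."""
    try:
        return json.loads(text)
    except ValueError:
        fixed = text.translate(str.maketrans(CURLY_TO_STRAIGHT))
        return json.loads(fixed)


# A shortened version of the posted data, curly quotes included.
sample = (
    "{\u201cbooks\u201d:[{\u201cname\u201d:\u201dlife in heaven\u201d,"
    "\u201cauthor\u201d:\u201dMike Smith\u201d}]}"
)

if __name__ == "__main__":
    print(load_json_lenient(sample))
```

Blindly replacing quotes would also alter curly quotes that legitimately appear inside string values, so this is meant as a diagnostic aid rather than a general fix.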
How many metadata fields exist of MP3 file ?
Hi all, I am wondering how many metadata fields of an MP3 file exist when I post the file into ElasticSearch using mapper-attachment. In Solr we can get the field information through the endpoint SOLR_HOST/update/extract?extractOnly=true, but is there any way to get such information in ElasticSearch? Besides MP3 files, how about doc files? I know ElasticSearch uses Tika to support these operations; can you give me an example of fetching a particular field from a particular file format? Regards, Ivan
Re: Too many open files
Happily, the problem of missing highlight records appears to be gone after a config change. * Initially I had 2 ES nodes in 2 different apps (a Tomcat and a standalone), configured identically (both listening for incoming TransportClient requests on port 9300 and both opened with client(false)), and a third ES instance, opened with new TransportClient(), connecting to them to fetch highlighting info. It looks like this third ES was randomly losing highlighting records. (?) * What I did to fix it was a configuration change to have only one client(false) ES listening for TransportClients and 2 new TransportClient()s connecting to it. It looks like this change fixes the issue, which was some kind of coupling between the two client(false) ESs listening on port 9300. Regards
Re: incrementally scaling ES from the small data
Thanks Ivan, makes sense. I still could not test how sockets relate to shards, why I automatically get 10 established sockets when opening a client with node = builder.client(clientOnly).data(!clientOnly).local(local).node(); client = node.client(); on the default ES configuration, and many more sockets afterwards (up to 200), or how this number changes when increasing/decreasing the number of shards. But happily I managed to fix the initial issue of highlighting info being randomly lost with a config change, as described here: https://groups.google.com/d/msg/elasticsearch/3t6UL_vzM7o/TLnV2m2B1NAJ so sockets do not look like an issue anymore. Regards.
Re: Transport Client hangs in my web application during search.
Does it show anything in the log? Perhaps add a try/catch block to your code and set a query timeout. HTH /Jason On Wed, Jan 8, 2014 at 4:41 AM, Search User feedwo...@gmail.com wrote: I have a web application in which I create a Transport Client using Spring (singleton) and inject it into my service. When I receive a request in my controller, the controller calls the service and the service uses the transport client to execute the query and return the results. When I deploy this application in Tomcat, the client is created, but when I execute the query, the client hangs. If I create the client for every request (in my service) and run the query, everything is fine. Can someone help me understand this behavior? Following is my code to create the Client object. Settings settings = ImmutableSettings.settingsBuilder().put("cluster.name", "mysearchcluster").put("client.transport.sniff", true).build(); Client client = new TransportClient(settings).addTransportAddress(new InetSocketTransportAddress("10.150.200.101", 9300)); Thanks
Re: How to index an existing json file
Start with a clean index: curl -XDELETE http://localhost:9200/books/ You probably have a bad mapping (some docs already indexed?). If you still have problems, please gist a full curl recreation. See http://www.elasticsearch.org/help/ -- David ;-) Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs On 8 Jan 2014, at 03:10, ZenMaster80 sabdall...@gmail.com wrote: Great. Do you know why I am getting {"error":"MapperParsingException[failed to parse]; nested: JsonParseException[Unrecognized token 'life': was expecting ('true', 'false' or 'null')\n at [Source: [B@5c9a9d06; line: 1, column: 35]]; ","status":400} data: {“books”:[{“name”:”life in heaven”,”author”:”Mike Smith”},{“name”:”get rich”,”author”:”Joe Shmoe”},{“name”:”luxury properties”,”author”:”Linda Jones”}]} On Tuesday, January 7, 2014 9:06:01 PM UTC-5, Ivan Brusic wrote: The JSON file is read by the curl command, so in your example it should be in the directory from which you executed the command (the current directory). -- Ivan On Tue, Jan 7, 2014 at 6:00 PM, ZenMaster80 sabda...@gmail.com wrote: Hi, I am just starting with ElasticSearch. I would like to know how to index a simple JSON document, books.json, that has the following in it. Where do I place the document? I placed it in the root directory of Elasticsearch and in the /bin folder. {“books”:[{“name”:”life in heaven”,”author”:”Mike Smith”},{“name”:”get rich”,”author”:”Joe Shmoe”},{“name”:”luxury properties”,”author”:”Linda Jones”]}} $ curl -XPUT http://localhost:9200/books/book/1 -d @books.json Warning: Couldn't read data from file books.json, this makes an empty POST. {"error":"MapperParsingException[failed to parse, document is empty]","status":400} Thanks
Re: Transport Client hangs in my web application during search.
Your code looks good to me. Don't create multiple clients; use only one for your whole application. As Jason wrote, look at the logs. -- David ;-) Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs On 8 Jan 2014, at 07:40, Jason Wee peich...@gmail.com wrote: Does it show anything in the log? Perhaps add a try/catch block to your code and set a query timeout. HTH /Jason On Wed, Jan 8, 2014 at 4:41 AM, Search User feedwo...@gmail.com wrote: I have a web application in which I create a Transport Client using Spring (singleton) and inject it into my service. When I receive a request in my controller, the controller calls the service and the service uses the transport client to execute the query and return the results. When I deploy this application in Tomcat, the client is created, but when I execute the query, the client hangs. If I create the client for every request (in my service) and run the query, everything is fine. Can someone help me understand this behavior? Following is my code to create the Client object. Settings settings = ImmutableSettings.settingsBuilder().put("cluster.name", "mysearchcluster").put("client.transport.sniff", true).build(); Client client = new TransportClient(settings).addTransportAddress(new InetSocketTransportAddress("10.150.200.101", 9300)); Thanks
Re: Design practices for hosting multiple clusters/on-demand cluster creation?
You could look at the Chef cookbook: https://github.com/elasticsearch/cookbook-elasticsearch http://www.elasticsearch.org/tutorials/deploying-elasticsearch-with-chef-solo/ Does it help?

-- David ;-) Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 8 Jan 2014, at 02:01, Josh Harrison hij...@gmail.com wrote:

While ES is still in a pre-deployment stage at my job, there is growing interest in it. For various reasons, a monster cluster holding everyone's stuff is simply not possible. Individual projects require complete control over their data, and the culture and security requirements here are such that doing something like always naming project 1's indexes PROJECT_1_something will not fly. We have a fairly beefy Hadoop cluster hosting our content currently, along with a separate head node acting as the master. In this situation, is it simply a matter of starting up new processes on each node pointed at different configuration profiles and tying specific ports to specific projects/clusters? Basically, is there an established way to build on-demand clusters, given a set of resources? We'll layer something in front of it to deal with access control, etc.

Thanks! -Josh