More like this scoring algorithm unclear
Hi,

I have a question about why the 'more like this' algorithm scores some documents higher than others while they are, at first glance, the same. What I've done is index wishlist documents that contain one property, product_id, which holds an array of product ids (e.g. [1234, …]). What I'm trying to do is find similar wishlists for a given wishlist with id x.

The MLT API seems to work: it returns other documents that contain at least one of the product ids from the original list. But what I see is, for example, that I get 10 hits and the first 6 hits contain the same (and only one) product_id, which is present in the original wishlist. What I would expect is that the score of those first 6 is the same. However, what I see is that only the first 2 have the same score, the next 2 a lower score, and the next 2 lower still. Why is this?

Also, I'm trying to rewrite the MLT API call as an MLT query, but somehow it doesn't work. I would expect that I need to take the entire content of the original product_id property and feed it in as the 'like_text'. The documentation is not very clear and doesn't provide examples, so I'm a little lost. Hope someone can give some pointers.

Thanks,
Maarten

-- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/0e2827b2-5a21-4cff-b773-ebdd861c5972%40googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.
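For the second part of the question, a minimal sketch of what the equivalent more_like_this query body might look like, assuming product_id is an analyzed field and using hypothetical ids 1234, 5678 and 9012. Note that min_term_freq defaults to 2, which can never match a product id that occurs only once in the field, so it usually needs lowering to 1; min_doc_freq (default 5) may need lowering too. As for the score differences: with the default search type, IDF statistics are computed per shard, so identical matches landing on different shards can score differently; searching with search_type=dfs_query_then_fetch makes the scores comparable.

```json
{
  "query": {
    "more_like_this": {
      "fields": ["product_id"],
      "like_text": "1234 5678 9012",
      "min_term_freq": 1,
      "min_doc_freq": 1
    }
  }
}
```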
Re: How many metadata fields exist for an MP3 file?
Have a look at https://github.com/elasticsearch/elasticsearch-mapper-attachments/blob/master/src/main/java/org/elasticsearch/index/mapper/attachment/AttachmentMapper.java#L376

You will see that the mapper attachment reads:

Metadata.DATE
Metadata.TITLE
Metadata.AUTHOR
Metadata.KEYWORDS
Metadata.CONTENT_TYPE
Metadata.CONTENT_LENGTH

Does it help?

-- David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr

On 8 January 2014 at 05:05:10, HongXuan Ji (hxua...@gmail.com) wrote:

Hi all,

I am wondering how many metadata fields of MP3 files exist when I post an mp3 file into ElasticSearch using the mapper-attachment plugin. In Solr we can get the field information through the endpoint SOLR_HOST/update/extract?extractOnly=true; in ElasticSearch, is there any way to get such information? Besides MP3 files, what about doc files? I know ElasticSearch uses Tika to support these operations; can you give me an example of fetching a particular field for a particular file format?

Regards, Ivan
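As a side note, a sketch of how those metadata values can be exposed as stored sub-fields in a mapping that uses the attachment type, so they can be returned or queried individually (the type name "doc" and field name "file" are assumptions):

```json
{
  "mappings": {
    "doc": {
      "properties": {
        "file": {
          "type": "attachment",
          "fields": {
            "title":  { "store": "yes" },
            "author": { "store": "yes" },
            "date":   { "store": "yes" },
            "content_type": { "store": "yes" }
          }
        }
      }
    }
  }
}
```

With a mapping like this you can ask for fields such as file.title in the fields section of a search request.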
Re: Elasticsearch-Hadoop Hive integration issue
Hi Costin,

Thanks for your kind reply. After specifying the type in es.resource I am now able to index. I am using M1 and will try with master once indexing is done.

Regards,
Badal

On Tuesday, 7 January 2014 16:21:01 UTC+5:30, Costin Leau wrote:

Hi,

The 'es.resource' you specified is incorrect - you need to specify both an index and a type - e.g. myIndex/products.

P.S. Are you using M1 or the current master? The latter should give a proper error (and message).

Thanks,

On 07/01/2014 9:48 AM, Badal Mohapatra wrote:

Hi,

I am trying to index data from a Hive table into Elasticsearch using the latest elasticsearch-hadoop-master plugin. My Elasticsearch version is 0.90.9 and my Hive version is hive-0.11.0. As per the documentation of the elasticsearch-hadoop plugin (Hive integration), I successfully created an external table with the command below:

CREATE EXTERNAL TABLE es_products (sku int, rating float, name string, type string, saleprice float, department string, manufacturer string, userid string, category_name string, query string) STORED BY 'org.elasticsearch.hadoop.hive.ESStorageHandler' TBLPROPERTIES('es.resource' = 'products');

Even though the external table is created, I am not able to insert data into or even query the external table. When I do a select * from es_products; I get the exception below:

hive> select * from es_products;
OK
Failed with exception java.io.IOException:java.lang.StringIndexOutOfBoundsException: String index out of range: -1
Time taken: 1.699 seconds

Can someone please suggest what/where I am wrong?

Kind Regards,
Badal

-- Costin
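To restate the fix in this thread: with both an index and a type in es.resource, the table definition from the original post would look roughly like this (the index name myindex is an assumption):

```sql
CREATE EXTERNAL TABLE es_products (
  sku INT, rating FLOAT, name STRING, type STRING, saleprice FLOAT,
  department STRING, manufacturer STRING, userid STRING,
  category_name STRING, query STRING)
STORED BY 'org.elasticsearch.hadoop.hive.ESStorageHandler'
TBLPROPERTIES('es.resource' = 'myindex/products');
```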
Re: How to query a custom REST handler in Elasticsearch using the Java API
Hi,

I am not facing any issue with the NodesInfoAction or the custom endpoint code. The REST endpoint is working fine if I curl to it using: curl -XGET 'localhost:9200/_mastering/nodes?pretty'. I am trying to find a way to do this from an embedded node. In other words, something like the code below:

Node node = NodeBuilder.nodeBuilder().clusterName("elasticsearch").node();
Client client = node.client();
SearchResponse response = client.prepareSearch().setSearchType("/_mastering/nodes").setQuery(QueryBuilders.queryString()).execute().actionGet();

P.S. This code snippet doesn't actually work, but I want to query /_mastering/nodes through the Java API.

On Friday, 3 January 2014 13:35:13 UTC+5:30, Jörg Prante wrote:

You have wrapped a NodesInfoAction, so all you have to do is:

NodesInfoResponse response = client.admin().cluster().prepareNodesInfo().all().execute().actionGet();

That is the Java API.

Jörg
Re: Replicating one cluster to another cluster
First, and most important, the good news: ES 1.0.0.Beta2 has the snapshot/restore feature in place, so it should be easy to take a snapshot and restore the result to a target cluster. The snapshots are also incremental.

Second, there is also news for the knapsack plugin. In the next knapsack plugin version, due this week, a full copy from cluster1 to cluster2 will be as simple as:

curl -XPOST 'http://cluster1node:port1/_export/copy?cluster=cluster2name&host=cluster2node&port=port2'

The limitations are that you must have the knapsack plugin installed on cluster1node, the same JVM version in cluster1 and cluster2, the same ES version in cluster1 and cluster2, and all your indexes must have stored fields, preferably the _source field. Also, cluster1 must not modify the indexes while the _export/copy is running, or cluster2 may end up with different data (there is no inherent locking). In the new knapsack export version, you will be able to use arbitrary ES queries to select subsets of the cluster data to copy, so only the hits of a query can be transferred.

Jörg
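For reference, a sketch of the snapshot/restore workflow as exposed by the 1.0 API (the repository name and filesystem location are assumptions; the target cluster must register a repository pointing at the same shared location before restoring):

```sh
# register a shared-filesystem repository on the source cluster
curl -XPUT 'http://localhost:9200/_snapshot/my_backup' -d '{
  "type": "fs",
  "settings": { "location": "/mnt/backups/my_backup" }
}'

# take a snapshot (incremental after the first one)
curl -XPUT 'http://localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true'

# on the target cluster, after registering the same repository:
curl -XPOST 'http://localhost:9200/_snapshot/my_backup/snapshot_1/_restore'
```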
Re: Unique Count in aggregations
I haven't tried the aggregations module, but if what you want is unique terms, I think you can do that using the terms facet as well. Note that if you use the terms facet, you will have to choose a size large enough to ensure that ES returns all the terms and does not discard some when size is less than the number of unique terms.

Vaidik Kapoor
vaidikkapoor.info

On 8 January 2014 14:41, Konstantinos Zacharakis kzach...@gmail.com wrote:

Hello,

I would like to ask about the support of unique terms in aggregations. Shay had mentioned in issue #1044 (https://github.com/elasticsearch/elasticsearch/issues/1044) that once the aggregation framework was done you planned to add this feature. Since aggregations have been here since Beta2, how close on your roadmap is unique terms support? Should we expect it in the 1.0.0 release?

Kind Regards,
Kostas
Re: Unique Count in aggregations
Hi Vaidik,

That method is fine when the term cardinality is low, and it can also be achieved using the aggregations framework. However, when cardinality is high, the memory footprint will also be high and for sure not so safe.

On Wednesday, 8 January 2014 11:15:22 UTC+2, Vaidik Kapoor wrote:

Haven't tried the aggregations module. But if what you want are unique terms, I think you can do that using the terms facet as well. [...]
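Until a dedicated distinct-count feature lands, the terms-facet workaround discussed above can be written against the aggregations framework as in this sketch, where my_field is a placeholder; "size": 0 asks for all buckets, which carries exactly the memory risk mentioned for high-cardinality fields:

```json
{
  "size": 0,
  "aggs": {
    "unique_values": {
      "terms": { "field": "my_field", "size": 0 }
    }
  }
}
```

The number of distinct terms is then simply the number of buckets returned.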
cassandra river plugin installation issue
I have downloaded the river from https://github.com/eBay/cassandra-river and changed the settings in CassandraRiver.java as per my Cassandra setup:

if (riverSettings.settings().containsKey("cassandra")) {
    @SuppressWarnings("unchecked")
    Map<String, Object> couchSettings = (Map<String, Object>) settings.settings().get("cassandra");
    this.clusterName = XContentMapValues.nodeStringValue(couchSettings.get("cluster_name"), "Test Cluster");
    this.keyspace = XContentMapValues.nodeStringValue(couchSettings.get("keyspace"), "topic_space");
    this.columnFamily = XContentMapValues.nodeStringValue(couchSettings.get("column_family"), "users");
    this.batchSize = XContentMapValues.nodeIntegerValue(couchSettings.get("batch_size"), 1000);
    this.hosts = XContentMapValues.nodeStringValue(couchSettings.get("hosts"), "localhost:9160");
    this.username = XContentMapValues.nodeStringValue(couchSettings.get("username"), "USERNAME");
    this.password = XContentMapValues.nodeStringValue(couchSettings.get("password"), "P$$WD");
} else {
    /* Set default values */
    this.clusterName = "Test Cluster";
    this.keyspace = "topic_space";
    this.columnFamily = "users";
    this.batchSize = 1000;
    this.hosts = "localhost:9160";
    this.username = "USERNAME";
    this.password = "P$$WD";
}

When I build with Maven (mvn clean package), the test section of the Maven log shows:

--- T E S T S ---
Running org.elasticsearch.river.cassandra.CassandraRiverIntegrationTest
Configuring TestNG with: org.apache.maven.surefire.testng.conf.TestNG652Configurator@67eaf25d
Exception in thread Queue-Indexer-thread-0 java.lang.NullPointerException
    at org.elasticsearch.river.cassandra.CassandraRiver$Indexer.run(CassandraRiver.java:149)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

The same NullPointerException is thrown from Queue-Indexer-thread-2, thread-4, and thread-5. I tried the same after installing the plugin in ES, and it shows the same error continuously. Does anybody have any idea what's going wrong with my setup?
Re: cassandra river plugin installation issue
CassandraRiver.java:149 contains:

logger.info("Starting thread with {} keys", this.keys.rowColumnMap.size());

where rowColumnMap is a map that may be empty, which is why this error occurs. At first I built the river module normally and installed it as a plugin in ES, but when I ran the script:

curl -XPUT 'localhost:9200/_river/userinfo/_meta' -d '{
  "type": "cassandra",
  "cassandra": {
    "cluster_name": "Test Cluster",
    "keyspace": "topic_space",
    "column_family": "users",
    "batch_size": 100,
    "hosts": "localhost:9160"
  },
  "index": {
    "index": "userinfo",
    "type": "users"
  }
}'

the same error appeared in the ES console, exactly as I copied from the Maven console, and it is also not fetching data from Cassandra into ES.
Re: How many metadata fields exist for an MP3 file?
I would recommend not using the mapper attachment but managing that on your side. For example, I removed the mapper attachment from the fsriver project to have finer control (see https://github.com/dadoonet/fsriver/issues/38).

BTW, I'm not aware of how you can get the ALBUM field using Tika. Any pointer? It could be nice to add it to fsriver as well.

-- David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr

On 8 January 2014 at 10:49:47, HongXuan Ji (hxua...@gmail.com) wrote:

Thanks for the reply. Besides the six standard fields, I also want to get at the extra fields. For example, in Solr we can extract the album field from an MP3 file. Is this also supported in ElasticSearch? I just tested: I posted an mp3 file into ES, but the fields of the mp3 file contain only the six fields. Ideas? Thanks a lot.
Re: cassandra river plugin installation issue
One change I made in the cassandra-river project is to bump the Cassandra jar version from 1.3 to 2.0.3 in pom.xml, as I am using Cassandra 2.0.4. Any idea what's going wrong?
Re: cassandra river plugin installation issue
So probably

CassandraCFData cassandraData = db.getCFData(columnFamily, start, 1000);

did not get any data from Cassandra? I have never played with this plugin or with Cassandra, so I'm afraid I can't help more here!

-- David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr
Re: cassandra river plugin installation issue
OK, thanks for pointing this out.
Please help: I insert logs into Elasticsearch, but it uses too much memory. How can I solve it? Thanks
Dear all,

I am inserting logs into Elasticsearch; each log is about 2 MB and contains about 3000 keys and values. After inserting a batch of them, about 30 GB of memory was used, and then Elasticsearch became very slow and it was hard to insert more logs. Could someone help me figure out how to solve this? Thanks very much.
Re: Order results by value in one of the array entries.
Hi Jun,

Thanks for your reply. I'm not sure how I can get that to work. In my project I need to boost/order only by the stock of warehouse_a; how do I use only the value for that entry in the array?

Thanks,
Johan

On Wednesday, January 8, 2014 4:35:50 AM UTC, Jun Ohtani wrote:

Hi Johan,

You can try script-based sorting:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-sort.html#_script_based_sorting

Or the function score query:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#_script_score

I hope this helps.

Regards,
Jun Ohtani
blog: http://blog.johtani.info
twitter: http://twitter.com/johtani

On 2014/01/07 19:45, Johan E wrote:

Hi,

I'm trying to order the result of a query by a specified entry in an array. Here is a sample document:

{
  "product_name": "product alfa",
  "product_id": "4a86c92ccd26111d7ba0eada7da6a75af",
  "description": "This is a sample product",
  "image_id": "product_a.jpg",
  "inventory": [
    { "warehouse": "warehouse_a", "stock": 99 },
    { "warehouse": "warehouse_b", "stock": 19 },
    { "warehouse": "warehouse_c", "stock": 99 }
  ]
}

If there were more products containing "alfa", I would (for example) want to sort them by the stock of a given warehouse. I'm currently using a query like:

POST _search
{
  "query": {
    "match": {
      "product_name": {
        "query": "alfa",
        "type": "phrase"
      }
    }
  },
  "filter": {
    "bool": {
      "must": [
        { "term": { "availability.warehouse": "warehouse_a" } }
      ]
    }
  }
}

I would like the results sorted by stock (for warehouse_a only), descending. Any ideas?
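One way to sort by the stock of a single warehouse is a nested sort, assuming inventory is mapped as a nested type (a sketch; if inventory is a plain object field, the array entries are flattened at index time and a per-entry sort like this is not possible):

```json
{
  "query": {
    "match": { "product_name": { "query": "alfa", "type": "phrase" } }
  },
  "sort": [
    {
      "inventory.stock": {
        "order": "desc",
        "nested_filter": {
          "term": { "inventory.warehouse": "warehouse_a" }
        }
      }
    }
  ]
}
```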
Is it possible to do a text query against a (pre)defined set of fields?
Hi,

I am indexing some large documents in an index. When making full-text queries, I have generally used {"text": {"_all": "some text search"}} to find all possible results. However, the documents contain a few private fields that should be queryable only by a certain user group.

What I was wondering is whether there is a way to define some kind of alias for a set of fields (or, even better, for all fields except a set of fields) in the mapping definition. I could then query {"text": {"alias_for_public_fields": "some text search"}} while the private fields would not be searched. Is this possible already?

I know that it's possible to list all fields in the query and leave out the private ones, but as there can be hundreds of fields that should be queryable and only 2-3 private fields, listing fields explicitly adds significant overhead to the queries.

Best regards,
Ville
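One possibility, if the public/private split is fixed at mapping time: exclude the private fields from _all with include_in_all, so that querying _all searches only the public fields. A sketch with hypothetical field names; note the private fields remain directly queryable by name, so this is a convenience, not access control:

```json
{
  "mappings": {
    "document": {
      "properties": {
        "public_field":  { "type": "string" },
        "private_field": { "type": "string", "include_in_all": false }
      }
    }
  }
}
```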
Re: How many metadata fields exist for an MP3 file?
Hi David, I only got the ALBUM field by using the endpoint of Solr, which is HOST/solr/update/extract?extractOnly=true. So it seems the mapper attachment does not support the extra field extraction. right? BTW, can you give me some tutorial about the fsriver? I am also curious what's the plugin for ? What's the purpose of the plugin? Best, Ivan David Pilato於 2014年1月8日星期三UTC+8下午6時23分03秒寫道: I would recommend not to use the mapper attachment but to manage that on your side. I removed for example mapper attachment from fsriver project to have a finer control. (see https://github.com/dadoonet/fsriver/issues/38) BTW, I'm not aware on how you can get ALBUM field using Tika. Any pointer? Could be nice to add it to fsriver as well. -- *David Pilato* | *Technical Advocate* | *Elasticsearch.com* @dadoonet https://twitter.com/dadoonet | @elasticsearchfrhttps://twitter.com/elasticsearchfr Le 8 janvier 2014 at 10:49:47, HongXuan Ji (hxu...@gmail.com javascript:) a écrit: Thanks for the reply. Except for the six standard fields, I also want to know the extra field. For example, in Solr we can extract the album field in MP3 file. Does this function also support in ElasticSearch? I just tested: I post a mp3 file into ES, but the fields of the mp3 file contains only the six fields. Ideas? Thanks a lot. David Pilato於 2014年1月8日星期三UTC+8下午4時34分07秒寫道: Have a look at https://github.com/elasticsearch/elasticsearch-mapper-attachments/blob/master/src/main/java/org/elasticsearch/index/mapper/attachment/AttachmentMapper.java#L376 You will see that mapper attachment reads: Metadata.DATE Metadata.TITLE Metadata.AUTHOR Metadata.KEYWORDS Metadata.CONTENT_TYPE Metadata.CONTENT_LENGTH Does it help? 
-- David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet https://twitter.com/dadoonet | @elasticsearchfr https://twitter.com/elasticsearchfr

On 8 January 2014 at 05:05:10, HongXuan Ji (hxu...@gmail.com) wrote:

Hi all, I am wondering how many metadata fields of an MP3 file exist when I post the MP3 file into ElasticSearch using the mapper-attachment plugin. In Solr we can get the field information through the endpoint SOLR_HOST/update/extract?extractOnly=true; is there any way to get such information in ElasticSearch? And beyond MP3 files, what about doc files? I know ElasticSearch uses Tika to support this; can you give me an example of fetching a particular field from a particular file format? Regards, Ivan

-- You received this message because you are subscribed to the Google Groups "elasticsearch" group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/742f86b9-9dd8-4354-ae50-26332f0c4dc0%40googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.
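For reference, those extracted values can be exposed as queryable subfields of the attachment field. Below is a minimal mapping sketch; the index and field names ("person", "file") are made up for illustration, and the subfield syntax follows the mapper-attachments plugin README, so treat this as a sketch rather than a definitive recipe:

```
{
  "person": {
    "properties": {
      "file": {
        "type": "attachment",
        "fields": {
          "title":    { "store": "yes" },
          "author":   { "store": "yes" },
          "date":     { "store": "yes" },
          "keywords": { "store": "yes" }
        }
      }
    }
  }
}
```

With a mapping like this, the extracted title should be searchable as file.title, the author as file.author, and so on.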
Re: Pls help me: i insert log to elasticsearch, but it use too much memory, how to solve it?thanks
On Wednesday, January 8, 2014 6:58:02 PM UTC+8, xjj2...@gmail.com wrote: Dear all: I insert logs into elasticsearch; each log is about 2M and contains about 3000 keys and values. After inserting some logs it used about 30G of memory, then elasticsearch became very slow and it was hard to insert any more logs. Could someone help me solve this? Thanks very much. The following is my log format: { user1: [{costprice: 122}, {sellprice: 124}, {stock: 12}, {sell: 122}, {}, {}] }
Re: Pls help me: i insert log to elasticsearch, but it use too much memory, how to solve it?thanks
On Wednesday, January 8, 2014 6:58:02 PM UTC+8, xjj2...@gmail.com wrote: Dear all: I insert logs into elasticsearch; each log is about 2M and contains about 3000 keys and values. After inserting some logs it used about 30G of memory, then elasticsearch became very slow and it was hard to insert any more logs. Could someone help me solve this? Thanks very much. The following is my log format: { user1: [{costprice: 122}, {sellprice: 124}, {stock: 12}, {sell: 122}, {}, {}], ..., product: [{}], name: [] } There is information for about 4000 users, so a log may be 2M. Thanks
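One likely cause of the memory use, given the log format above, is that every user id (user1, user2, ...) becomes a distinct field name in the index mapping, so thousands of users per log create thousands of mappers that live on the heap. A minimal sketch of the difference, using made-up values and a hypothetical restructuring where user ids become field values instead of field names:

```python
# Original shape (sketch): each user id is a field name, so every new
# user adds new entries to the index mapping.
log_per_field = {
    "user%d" % i: [{"costprice": 122}, {"sellprice": 124},
                   {"stock": 12}, {"sell": 122}]
    for i in range(1, 4001)
}

# Restructured shape (sketch): a fixed set of field names, one entry per
# user. The mapping stays the same size no matter how many users appear.
log_uniform = {
    "users": [
        {"name": "user%d" % i, "costprice": 122, "sellprice": 124,
         "stock": 12, "sell": 122}
        for i in range(1, 4001)
    ]
}

def field_names(obj, prefix=""):
    """Collect the distinct field names a document would add to a mapping."""
    names = set()
    if isinstance(obj, dict):
        for k, v in obj.items():
            full = prefix + "." + k if prefix else k
            names.add(full)
            names |= field_names(v, full)
    elif isinstance(obj, list):
        for v in obj:
            names |= field_names(v, prefix)
    return names

print(len(field_names(log_per_field)))  # tens of thousands of distinct fields
print(len(field_names(log_uniform)))    # a handful
```

The point of the sketch: with the per-user-field shape, the mapping grows with the data, which matches the FieldMapper-heavy heap histogram posted later in this thread.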
Re: Pls help me: i insert log to elasticsearch, but it use too much memory, how to solve it?thanks
Do you insert that using bulk? -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet | @elasticsearchfr
Re: How many metadata fields exist of MP3 file ?
Mapper attachment does not support extra field extraction. Maybe you could open an issue there? https://github.com/elasticsearch/elasticsearch-mapper-attachments About FSRiver, I think everything is described here: https://github.com/dadoonet/fsriver#filesystem-river-for-elasticsearch Is there something you don't understand? -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet | @elasticsearchfr

On 8 January 2014 at 12:24:11, HongXuan Ji (hxua...@gmail.com) wrote: Hi David, I only got the ALBUM field by using the Solr endpoint HOST/solr/update/extract?extractOnly=true. So it seems the mapper attachment does not support extra field extraction, right? BTW, can you give me a tutorial on fsriver? I am also curious what the plugin is for. What is the purpose of the plugin? Best, Ivan

On Wednesday, 8 January 2014 at 18:23:03 UTC+8, David Pilato wrote: I would recommend not using the mapper attachment but managing that on your side. For example, I removed mapper attachment from the fsriver project to have finer control (see https://github.com/dadoonet/fsriver/issues/38). BTW, I'm not aware of how you can get the ALBUM field using Tika. Any pointer? It could be nice to add it to fsriver as well. -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet | @elasticsearchfr

On 8 January 2014 at 10:49:47, HongXuan Ji (hxu...@gmail.com) wrote: Thanks for the reply. Besides the six standard fields, I also want to get extra fields. For example, in Solr we can extract the album field from an MP3 file. Is this also supported in ElasticSearch? I just tested: I posted an MP3 file into ES, but the mp3 document contains only the six fields. Ideas? Thanks a lot.
Re: Pls help me: i insert log to elasticsearch, but it use too much memory, how to solve it?thanks
On Wednesday, January 8, 2014 7:30:21 PM UTC+8, David Pilato wrote: Do you insert that using bulk?

No, I insert the logs one by one and use thrift to transport them. I set heap_size=30G; after inserting some logs it had used the full 30G of memory. I haven't changed elasticsearch.yml except for the heap_size and thrift.frame (most values are the defaults). Thanks,
Re: How many metadata fields exist of MP3 file ?
OK, I will post the issue later. About the river: the first line says "This river plugin helps to index documents from your local file system and using SSH." Does that mean I can store a bunch of PDF files in a local directory and, by using the river plugin, search the files in that directory? In fact, I started studying ElasticSearch this week and I am not very familiar with what "filesystem" means here. Thanks a lot. Ivan
Re: Pls help me: i insert log to elasticsearch, but it use too much memory, how to solve it?thanks
That was not really my question. Are you using the BULK feature? -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet | @elasticsearchfr

On 8 January 2014 at 12:38:00, xjj210...@gmail.com wrote: I use elasticsearch version 0.90.2.
Re: How many metadata fields exist of MP3 file ?
Yes. It indexes documents available on your local hard drive. -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet | @elasticsearchfr
Re: Pls help me: i insert log to elasticsearch, but it use too much memory, how to solve it?thanks
No, I don't use bulk. You mean using bulk may solve the problem? Thanks

On Wednesday, January 8, 2014 7:43:41 PM UTC+8, David Pilato wrote: That was not really my question. Are you using the BULK feature?
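For what it's worth, the bulk API being discussed takes a newline-delimited body: one action line, then one source line per document. A minimal sketch of building such a body (the index and type names "logs"/"log" and the document contents are made up for illustration):

```python
import json

docs = [
    {"user": "user1", "costprice": 122, "sellprice": 124},
    {"user": "user2", "costprice": 98, "sellprice": 101},
]

lines = []
for doc in docs:
    # Action line naming the target index/type, then the document itself.
    lines.append(json.dumps({"index": {"_index": "logs", "_type": "log"}}))
    lines.append(json.dumps(doc))

# The bulk body must end with a trailing newline.
body = "\n".join(lines) + "\n"
print(body)
```

A body like this would then be POSTed to the _bulk endpoint, e.g. with curl -XPOST localhost:9200/_bulk --data-binary @body.json. Batching many documents per request reduces per-request overhead compared with indexing one log at a time over thrift.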
memory surges in client app when a node dies
Hi all, I have a situation where, if a node in our cluster dies (for whatever reason), the client app experiences a surge in memory usage and full GCs, and essentially dies. I think this is because the client holds on to the connections for a while before realising the node is dead. Does this sound possible? And does anyone have tips for how to deal with it? My thinking so far is: 1. More memory. 2. A circuit-breaker pattern or some such, to make sure the app disconnects more quickly when ES is not responding. But are there ways to configure the ES client to improve the behaviour here? Thanks, Nic
Re: Pls help me: i insert log to elasticsearch, but it use too much memory, how to solve it?thanks
I use elasticsearch version 0.90.2.

On Wednesday, January 8, 2014 7:30:21 PM UTC+8, David Pilato wrote: Do you insert that using bulk?
Re: Pls help me: i insert log to elasticsearch, but it use too much memory, how to solve it?thanks
I only insert the logs into elasticsearch. I will do the following work: 1: write the data to elasticsearch; 2: then search the data. Now, when I insert the data into ES, it uses too much memory. I wonder why ES uses so much memory. Could you give me some suggestions? Thanks. I used jmap to watch the pid. The result is as follows (I changed heap_size to 1G to watch the memory use):

num   #instances   #bytes      class description
--
1:    229353   18348240   java.util.WeakHashMap$Entry[]
2:    229353   12843768   java.util.WeakHashMap
3:    145045   8703384    org.elasticsearch.index.mapper.FieldMapper[]
4:    229353   7339296    java.lang.ref.ReferenceQueue
5:    235890   5661360    org.elasticsearch.common.collect.RegularImmutableMap$TerminalEntry
6:    229346   5504304    org.apache.lucene.util.CloseableThreadLocal
7:    57303    4125816    org.elasticsearch.index.mapper.core.LongFieldMapper
8:    85939    3836608    char[]
9:    155465   3731160    org.elasticsearch.common.collect.RegularImmutableMap$NonTerminalEntry
10:   229353   3669648    java.lang.ThreadLocal
11:   229353   3669648    java.lang.ref.ReferenceQueue$Lock
12:   229353   3669648    java.util.concurrent.atomic.AtomicInteger
13:   114662   3669184    org.elasticsearch.index.analysis.NamedAnalyzer
14:   28698    3518912    org.elasticsearch.common.collect.RegularImmutableMap$LinkedEntry[]
15:   145044   3481056    java.util.Arrays$ArrayList
16:   145044   3481056    org.elasticsearch.index.mapper.FieldMappers
17:   114620   2750880    org.elasticsearch.index.analysis.NumericLongAnalyzer
18:   52044    2081760    org.apache.lucene.document.FieldType
19:   85939    2062536    java.lang.String
20:   57499    1839968    org.elasticsearch.index.mapper.FieldMapper$Names
21:   114683   1834928    org.apache.lucene.analysis.Analyzer$PerFieldReuseStrategy
22:   114662   1834592    org.apache.lucene.analysis.Analyzer$GlobalReuseStrategy
23:   57493    1379832    org.elasticsearch.index.fielddata.FieldDataType
24:   57332    1375968    org.elasticsearch.index.mapper.core.NumberFieldMapper$1
25:   57303    1375272    org.elasticsearch.common.Explicit
26:   14321    1267344    byte[]
27:   37088    1186816    java.util.HashMap$Entry
28:   14300    915200     org.elasticsearch.index.mapper.object.ObjectMapper
29:   2180660520          java.lang.Object[]
30:   14349    573960     org.elasticsearch.common.collect.RegularImmutableMap
31:   16458    526656     org.elasticsearch.common.collect.RegularImmutableList
32:   14314    343536     org.apache.lucene.index.Term
33:   14314    343536     org.apache.lucene.util.BytesRef
34:   14293    343032     org.elasticsearch.common.collect.RegularImmutableMap$EntrySet
35:   14293    343032     org.elasticsearch.common.collect.RegularImmutableAsList
36:   14293    343032     org.elasticsearch.common.collect.ImmutableMapValues
37:   8        279936     java.util.HashMap$Entry[]
38:   14314    229024     java.lang.Object
39:   14314    229024     org.elasticsearch.common.lucene.search.TermFilter
40:   216451936           org.elasticsearch.index.mapper.ObjectMappers
41:   1        16400      java.lang.String[]
42:   119      8568       org.elasticsearch.index.mapper.core.StringFieldMapper
43:   1        8208       org.elasticsearch.common.jackson.core.sym.CharsToNameCanonicalizer$Bucket[]
44:   28       1120       org.elasticsearch.common.collect.SingletonImmutableBiMap
45:   14       728        org.elasticsearch.index.mapper.RootMapper[]
46:   7        728        org.elasticsearch.index.mapper.DocumentMapper
47:   7        672        org.elasticsearch.index.mapper.internal.TimestampFieldMapper
48:   28       672        org.elasticsearch.common.collect.SingletonImmutableSet
49:   7        616        org.elasticsearch.index.mapper.internal.TTLFieldMapper
50:   7        560        org.elasticsearch.index.mapper.internal.SourceFieldMapper
51:   7        560        org.elasticsearch.index.mapper.internal.SizeFieldMapper
52:   7        504        org.elasticsearch.index.mapper.object.RootObjectMapper
53:   7        504        org.elasticsearch.index.mapper.internal.BoostFieldMapper
54:   21       504        org.elasticsearch.index.analysis.FieldNameAnalyzer
55:   14       448        java.util.concurrent.locks.ReentrantLock$NonfairSync
56:   7        392        org.elasticsearch.index.mapper.internal.UidFieldMapper
57:   7        392        org.elasticsearch.index.mapper.internal.IdFieldMapper
58:   7        392
Re: Order results by value in one of the array entries.
I ended up changing the format of the JSON, with warehouse stock in separate entries in an array. This way I can check for it and get the stock at the same time.
Re: memory surges in client app when a node dies
We're using the Java transport client. The problem only happens when the app is dealing with a high number of requests. I wondered whether it was because the client takes a little bit of time to detect that the node is unavailable: potentially up to 10 seconds in total (with default settings, 5 seconds to ping the node and another 5 for the timeout). And perhaps even after the node has been dropped, the existing connections to it still need to time out (not sure what the default is here)?

On Wednesday, 8 January 2014 13:19:29 UTC, Jason Wee wrote:
It should not be possible, right? If you configure the client app with two or more Elasticsearch nodes, it should detect when a node is down and not use it during indexing/querying. What client are you using? Jason

On Wed, Jan 8, 2014 at 7:48 PM, nicola...@guardian.co.uk wrote:
Hi all, I have a situation where if a node in our cluster dies (for whatever reason) the client app experiences a surge in memory usage, full GCs, and essentially dies. I think this is because the client holds on to the connections for a while before realising the node is dead. Does this sound possible? And does anyone have tips for how to deal with this? My thinking so far is: 1. More memory. 2. A circuit-breaker pattern or some such, to make sure the app disconnects quicker when ES is not responding. But are there ways to configure the ES client to improve the behaviour here? Thanks, Nic
Re: memory surges in client app when a node dies
I think you probably replied just after mine! We are using the transport client, yes. And to clarify, ES itself is fine during these periods; it is the client app that has problems.

On Wednesday, 8 January 2014 13:34:29 UTC, Jörg Prante wrote:
Have you tried TransportClient? TransportClient does not share heap memory with a cluster node. The setting client.transport.ping_timeout checks whether the connected nodes still respond. By default it is 5 seconds; I use values up to 30 seconds to survive long GCs without disconnects. Jörg
Re: Beta2 Java Client: java.nio.channels.UnresolvedAddressException
There are FQDNs like vcll36a-1001.equity.csfb.com which cannot be resolved by your DNS settings, it seems. 14 eth interfaces are quite a lot to try to connect to; I would reduce them using the network interface alias names ES provides: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-network.html Jörg
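For example, a minimal sketch of pinning ES to a single interface in elasticsearch.yml, using the alias syntax from the network module docs (the interface name eth0 is an assumption; adjust it to the host):

```yaml
# elasticsearch.yml: bind and publish on one interface
# instead of letting ES pick among many NICs
network.host: _eth0:ipv4_
```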
Re: memory surges in client app when a node dies
ES TransportClient uses a RetryListener which is a bit flaky in case of exceptions caused by faulty nodes. Some users reported an explosion of port use and connection retries, and this may also push the client memory to its limit. If you have stack traces that show abnormal behavior, it would be worth raising a GitHub issue. Jörg
No hit using scan/scroll with has_parent filter
Hi folks, I use a parent/child mapping configuration which works flawlessly with classic search requests, e.g. using has_parent to find child documents with criteria on the parent documents. I am trying to get all child document IDs that match a given set of criteria using scan and scroll, which also works well, until I introduce the has_parent filter, in which case the scroll request returns no hits (although total_hits is correct). Is this a known issue? I can provide sample mapping files and queries with associated/expected results. Please note that this behavior was noticed on 0.90.6 but is still present in 0.90.9. Thanks, best regards, -- Jean-Baptiste Lièvremont
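For reference, a sketch of the kind of request involved, sent with search_type=scan&scroll=1m (the parent type "blog" and field "tag" are made-up placeholders, not from the original post):

```json
{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "has_parent": {
          "parent_type": "blog",
          "filter": { "term": { "tag": "something" } }
        }
      }
    }
  },
  "fields": []
}
```

With "fields": [] only the document IDs come back in each scroll page, which matches the "get all child document IDs" use case described above.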
Re: More like this scoring algorithm unclear
Hey Maarten, I would use the explain: true option to see just why your documents are being scored higher than others. MoreLikeThis uses the same full-text scoring as far as I know, so term position would affect the score. http://lucene.apache.org/core/3_0_3/api/contrib-queries/org/apache/lucene/search/similar/MoreLikeThis.html Justin
Searching indexed fields without analysing
Hi. I've deployed Elasticsearch with Logstash and Kibana to take in Windows logs from my OSSEC log server, following this guide: http://vichargrave.com/ossec-log-management-with-elasticsearch/ I've tweaked the Logstash config to extract some specific fields from the logs, such as User_Name. I'm having some issues searching on these fields, though. These searches work as expected:
User_Name: *
User_Name: john.smith
User_Name: john.*
NOT User_Name: john.*
But I'm having problems with computer accounts, which take the format w-dc-01$: they're being split on the - and the $ is ignored. So a search for w-dc-01 returns all the servers named w-anything. Also I can't do NOT User_Name: *$ to exclude computer accounts. The mappings are created automatically by Logstash, and GET /logstash-2014.01.08/_mapping shows:

"User_Name": {
  "type": "multi_field",
  "fields": {
    "User_Name": { "type": "string", "omit_norms": true },
    "raw": { "type": "string", "index": "not_analyzed", "omit_norms": true, "index_options": "docs", "include_in_all": false, "ignore_above": 256 }
  }
}

My (limited) understanding is that not_analyzed should stop the field being split, so that my search matches the full name, but it doesn't. I'm trying both Kibana and curl to get results. Hope this makes sense. I really like the look of Elasticsearch, but being able to search on extracted fields like this is pretty key to me using it. Thanks.
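Given a multi_field mapping like the one above, the not_analyzed copy lives under the "raw" subfield, so the exact-match search has to target that subfield explicitly. A sketch of a term query against it (index name and value are illustrative):

```json
{
  "query": {
    "term": { "User_Name.raw": "w-dc-01$" }
  }
}
```

In Lucene query-string syntax (as used in Kibana) the equivalent would be along the lines of User_Name.raw:"w-dc-01$"; queries against plain User_Name keep hitting the analyzed copy, which is why the hyphen splitting and dropped $ are observed.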
Re: How to query custom rest handler in elastic search using Java api
The CustomRestAction code you posted contains *exactly* the Java code you need to execute the same action as the REST action. If you still want to use the REST URL, you cannot use the elasticsearch libraries. /_mastering/nodes is not a valid search type. The action does not even execute a query, technically; it retrieves node-level information. Cheers, Ivan

On Wed, Jan 8, 2014 at 12:46 AM, Shishir Kumar shishir.su...@gmail.com wrote:
Hi, I am not facing any issue with the NodesInfoAction or the custom endpoint code. The REST endpoint is working fine if I curl to it using: curl -XGET 'localhost:9200/_mastering/nodes?pretty'. I am trying to find out a way to do this from an embedded node. In other words, something like below:

Node node = NodeBuilder.nodeBuilder().clusterName("elasticsearch").node();
Client client = node.client();
SearchResponse response = client.prepareSearch()
    .setSearchType("/_mastering/nodes")
    .setQuery(QueryBuilders.queryString())
    .execute().actionGet();

P.S. The code snippet doesn't actually work, but I want to query /_mastering/nodes through the Java API.

On Friday, 3 January 2014 13:35:13 UTC+5:30, Jörg Prante wrote:
You have wrapped a NodesInfoAction, so all you have to do is:

NodesInfoResponse response = client.admin().cluster().prepareNodesInfo().all().execute().actionGet();

That is the Java API. Jörg
Odd hot MVEL
Does anyone know what might be causing MVEL to do this:

100.3% (501.3ms out of 500ms) cpu usage by thread 'elasticsearch[elastic1002][search][T#23]'
9/10 snapshots sharing following 47 elements
java.lang.Throwable.fillInStackTrace(Native Method)
java.lang.Throwable.fillInStackTrace(Throwable.java:782)
java.lang.Throwable.<init>(Throwable.java:265)
java.lang.Exception.<init>(Exception.java:66)
java.lang.RuntimeException.<init>(RuntimeException.java:62)
java.lang.IllegalArgumentException.<init>(IllegalArgumentException.java:53)
sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:606)
org.elasticsearch.common.mvel2.optimizers.impl.refl.nodes.GetterAccessor.getValue(GetterAccessor.java:43)
org.elasticsearch.common.mvel2.optimizers.impl.refl.nodes.MapAccessorNest.getValue(MapAccessorNest.java:54)
org.elasticsearch.common.mvel2.optimizers.impl.refl.nodes.VariableAccessor.getValue(VariableAccessor.java:37)
org.elasticsearch.common.mvel2.ast.ASTNode.getReducedValueAccelerated(ASTNode.java:108)
org.elasticsearch.common.mvel2.MVELRuntime.execute(MVELRuntime.java:86)
org.elasticsearch.common.mvel2.compiler.CompiledExpression.getDirectValue(CompiledExpression.java:123)
org.elasticsearch.common.mvel2.compiler.CompiledExpression.getValue(CompiledExpression.java:119)
org.elasticsearch.common.mvel2.compiler.CompiledExpression.getValue(CompiledExpression.java:106)
org.elasticsearch.common.mvel2.ast.Substatement.getReducedValueAccelerated(Substatement.java:44)
org.elasticsearch.common.mvel2.ast.BinaryOperation.getReducedValueAccelerated(BinaryOperation.java:114)
org.elasticsearch.common.mvel2.ast.BinaryOperation.getReducedValueAccelerated(BinaryOperation.java:114)
org.elasticsearch.common.mvel2.compiler.ExecutableAccessor.getValue(ExecutableAccessor.java:42)
org.elasticsearch.common.mvel2.optimizers.impl.refl.nodes.MethodAccessor.executeAndCoerce(MethodAccessor.java:164)
org.elasticsearch.common.mvel2.optimizers.impl.refl.nodes.MethodAccessor.getValue(MethodAccessor.java:73)
org.elasticsearch.common.mvel2.ast.ASTNode.getReducedValueAccelerated(ASTNode.java:108)
org.elasticsearch.common.mvel2.MVELRuntime.execute(MVELRuntime.java:86)
org.elasticsearch.common.mvel2.compiler.CompiledExpression.getDirectValue(CompiledExpression.java:123)
org.elasticsearch.common.mvel2.compiler.CompiledExpression.getValue(CompiledExpression.java:119)
org.elasticsearch.script.mvel.MvelScriptEngineService$MvelSearchScript.run(MvelScriptEngineService.java:191)
org.elasticsearch.script.mvel.MvelScriptEngineService$MvelSearchScript.runAsDouble(MvelScriptEngineService.java:206)
org.elasticsearch.common.lucene.search.function.ScriptScoreFunction.score(ScriptScoreFunction.java:54)

It isn't an error. Looking at MVEL's source, it looks like it catches this exception and works around it by inspecting the function, casting the arguments appropriately, and then retrying. I imagine it'd be nice and fast if I didn't get the types wrong, but it works anyway, which feels a bit trappy at scale. I know this is caused by scoring tons of documents in a FunctionScore, which is a pretty strong argument for moving all FunctionScoring into a rescore for protection, but what in the world am I doing with MVEL to make it do this? My candidate MVEL looks like this:

log10( ($doc['a'].empty ? 0 : $doc['a']) + ($doc['b'].empty ? 0 : $doc['b']) + 2 )

I'm trying to reproduce it with the debugger and Elasticsearch's tests, but I haven't had any luck yet, so I'd love to hear if anyone else has seen this. Nik
Re: Upgrades causing Elastic Search downtime
Thanks both for the replies. Our rebalance process doesn't take too long (~5 mins per node). I had some of the plugins (head, paramedic, bigdesk) open as I was closing down the old nodes and didn't see any split-brain issue, although I agree we can lead ourselves down this route by doubling the instances. We want our cluster to rebalance as we bring nodes in and out, so disabling allocation is not going to work for us, unless I'm misunderstanding?

On Tuesday, 7 January 2014 22:16:46 UTC, Mark Walkom wrote:
You can also use cluster.routing.allocation.disable_allocation to reduce the need of waiting for things to rebalance. Regards, Mark Walkom, Infrastructure Engineer, Campaign Monitor. email: ma...@campaignmonitor.com web: www.campaignmonitor.com

On 8 January 2014 04:41, Ivan Brusic iv...@brusic.com wrote:
Although elasticsearch should support clusters of nodes with different minor versions, I have seen issues between minor versions. Version 0.90.8 did contain an upgrade of Lucene (4.6), but that does not look like it would cause your issue. You could look at the GitHub issues tagged 0.90.[8-9] and see if something applies in your case. A couple of points about upgrading: if you want to use the double-the-nodes technique (which should not be necessary for minor version upgrades), you could decommission a node using the Shard API. Here is a good writeup: http://blog.sematext.com/2012/05/29/elasticsearch-shard-placement-control/ Since you doubled the number of nodes in the cluster, the minimum_master_nodes setting would be temporarily incorrect and potential split-brain clusters might occur. In fact, it might have occurred in your case, since the cluster state seems incorrect. Merely hypothesizing. Cheers, Ivan

On Tue, Jan 7, 2014 at 9:26 AM, Jenny Sivapalan jennifer@gmail.com wrote:
Hello, we've upgraded Elasticsearch twice over the last month and have experienced downtime (roughly 8 minutes) during the roll-out. I'm not sure if it is something we are doing wrong or not. We use EC2 instances for our Elasticsearch cluster and CloudFormation to manage our stack. When we deploy a new version of or change to Elasticsearch, we upload the new artefact, double the number of EC2 instances, and wait for the new instances to join the cluster. For example, 6 nodes form a cluster on v0.90.7. We upload the 0.90.9 version via our deployment process and double the number of nodes in the cluster (12). The 6 new nodes join the cluster with the 0.90.9 version. We then want to remove each of the 0.90.7 nodes. We do this by shutting down a node (using the head plugin), waiting for the cluster to rebalance the shards, and then terminating the EC2 instance, then repeating with the next node. We leave the master node until last so that it does the re-election just once. The issue we have found in the last two upgrades is that while the penultimate node is shutting down, the master starts throwing errors and the cluster goes red. To fix this we've stopped the Elasticsearch process on the master and have had to restart each of the other nodes (though perhaps they would have rebalanced themselves over a longer time period?). We find that we send an increased number of error responses to our clients during this time. We've set our queue size for search to 300 and we start to see the queue get full:

at java.lang.Thread.run(Thread.java:724)
2014-01-07 15:58:55,508 DEBUG action.search.type [Matt Murdock] [92036651] Failed to execute fetch phase
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 300) on org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction$2@23f1bc3
at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:61)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)

But we also see the following error, which we've been unable to find the diagnosis for:

2014-01-07 15:58:55,530 DEBUG index.shard.service [Matt Murdock] [index-name][4] Can not build 'doc stats' from engine shard state [RECOVERING]
org.elasticsearch.index.shard.IllegalIndexShardStateException: [index-name][4] CurrentState[RECOVERING] operations only allowed when started/relocated
at org.elasticsearch.index.shard.service.InternalIndexShard.readAllowed(InternalIndexShard.java:765)

Are we doing anything wrong, or has anyone experienced this? Thanks, Jenny
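The allocation-disabling approach mentioned earlier in this thread can be toggled through the cluster settings API around each node shutdown (a sketch using the 0.90.x setting name quoted above; remember to set it back to false once the node is gone):

```json
{
  "transient": {
    "cluster.routing.allocation.disable_allocation": true
  }
}
```

This body would be sent as a PUT to /_cluster/settings before stopping a node, so shards are not shuffled while the node is being replaced.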
Re: More like this scoring algorithm unclear
Hi, thanks. I'm not quite sure how to do that. I'm using:

http://localhost:9200/lists/list/[id of list]/_mlt?mlt_field=product_id&min_term_freq=1&min_doc_freq=1

The body does not seem to be respected (I'm using the elasticsearch head plugin) if I add:

{ "explain": true }

I've been trying to rewrite the MLT API call as an MLT query, but no luck so far. Any suggestions? Thanks, Maarten
Timestamp and _timestamp
Hi all, I have an ES cluster with four nodes and 157 indexes. There are about 140 million entries that occupy around 50 GB (1 primary with one replica). There are 2 data nodes, one pure master, and one client node that serves as a gateway for web requests. In the last few days I started to observe that the cluster becomes very unstable, and every few hours one of the data servers stops unexpectedly. The only solution was to reboot all data nodes to be able to process further logging. My mapping contains these definitions:

"Timestamp": { "type": "date", "format": "date_time" }

and

"_timestamp": { "enabled": true, "path": "Timestamp" }

After some tests I discovered that if I make a request that filters on Timestamp, the CPU load becomes very high and the cluster gets unstable; all incoming events are rejected. When I make requests filtering on _timestamp, everything works as expected. My question is: why is this happening, and what is the source of this behavior? Any ideas how to fix it? Thanks in advance, Nickolay Kolev
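For concreteness, a sketch of the filter shape being contrasted here (the date values are placeholders; the field name is the only thing that differs between the fast and the slow variant reported above):

```json
{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "range": {
          "_timestamp": {
            "from": "2014-01-01T00:00:00",
            "to": "2014-01-08T00:00:00"
          }
        }
      }
    }
  }
}
```

Swapping "_timestamp" for "Timestamp" gives the variant the poster reports as driving CPU load up.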
Re: Is it possible to do a text query against a (pre)defined set of fields?
Ville, perhaps: don't include the private fields in _all. Then a query against _all would be restricted to the (perhaps hundreds of) public fields. A query that includes the private fields would need to list _all and then the private fields. But since you have only 2 or 3 private fields, there shouldn't be much overhead on the query. Brian
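Brian's suggestion corresponds to setting include_in_all: false on the private fields in the mapping. A sketch (type and field names are made up for illustration):

```json
{
  "mappings": {
    "doc": {
      "properties": {
        "public_text":  { "type": "string" },
        "private_note": { "type": "string", "include_in_all": false }
      }
    }
  }
}
```

With this mapping, a query against _all only sees public_text; a query that should also cover the private fields lists them explicitly alongside _all.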
Can wildcard, matched fields have relevance scoring?
Hi, I am doing an 'exists' query on a field that is matched into a text field. The results all come back with the same score. Example:

_metadata:[* TO *] // match documents where this field exists
matched fields: ['text']

This searches only documents that contain a field called _metadata and highlights that field into the 'text' field. I want the results to be ranked based on the size of the _metadata field or the number of matches. Is that possible?
Re: Is it possible to do a text query against a (pre)defined set of fields?
Hi, well, I think I've misunderstood something here. Isn't _all a special field that includes all indexed fields? Is there a possibility to change the fields included in _all? Ville
Re: Does the server support streaming?
You are correct: ES nodes consume data request by request before it is passed on through the cluster. The same goes for bulk indexing requests; such requests are temporarily pushed to buffers, but they are split by lines and executed as single actions. So to reduce network round trips, the best thing is to use the bulk API. What is left is a few percent to optimize, which is not worth much. With gzip, ES HTTP provides transparent compression. The main challenges are HTTP overhead (headers can't be compressed) and base64, if you use binary data with ES. Please note that you must evaluate the bulk responses too, in order to validate the notification of bulk success at the document level. It is possible to extend the whole ES API to WebSocket as well, so besides JSON, it could also be possible to transfer JSON text frames or SMILE/binary frames on a single bi-directional channel. HTTP must use two channels for this, so with WebSocket you can reduce connection resources by half. In this sense, the Netty channel / REST / Java API could be extended for special realtime WebSocket streaming applications, like pub/sub. I experimented with that some time ago on ES 0.20: https://github.com/jprante/elasticsearch-transport-websocket (needs updating). From what I understand, the Thrift transport plugin compiles the ES API, operates in a streaming-like fashion, and provides a solution that reduces HTTP overhead: https://github.com/elasticsearch/elasticsearch-transport-thrift Jörg
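The bulk API mentioned above packs many single actions into one request body of newline-delimited JSON, one action line followed by its source line (index, type, and field names here are illustrative):

```json
{ "index": { "_index": "logs", "_type": "event", "_id": "1" } }
{ "message": "first event" }
{ "index": { "_index": "logs", "_type": "event", "_id": "2" } }
{ "message": "second event" }
```

This body is POSTed to /_bulk; per Jörg's note, the per-item entries in the bulk response still have to be checked, since individual actions can fail even when the HTTP request succeeds.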
Re: More like this scoring algorithm unclear
scoring algorithm is still vague but i got the query to act like the API, although the results are different so i'm still doing it wrong, here's an example: { explain: true, query: { more_like_this: { fields: [ PRODUCT_ID ], like_text: 104004855475 1001004002067765 100200494210 1002004004499883, min_term_freq: 1, min_doc_freq: 1, max_query_terms: 1, percent_terms_to_match: 0.5 } }, from: 0, size: 50, sort: [], facets: {} } the like_text contains product_id's from a wishlist for which i want to find similair lists Op woensdag 8 januari 2014 16:50:53 UTC+1 schreef Maarten Roosendaal: Hi, Thanks, i'm not quite sure how to do that. I'm using: http://localhost:9200/lists/list/[id of list]/_mlt?mlt_field=product_idmin_term_freq=1min_doc_freq=1 the body does not seem to be respected (i'm using the elasticsearch head plugin) if i ad: { explain: true } i've been trying to rewrite the mlt api as an mlt query but no luck so far. Any suggestions? Thanks, Maarten Op woensdag 8 januari 2014 16:14:25 UTC+1 schreef Justin Treher: Hey Maarten, I would use the explain:true option to see just why your documents are being scored higher than others. MoreLikeThis using the same fulltext scoring as far as I know, so term position would affect score. http://lucene.apache.org/core/3_0_3/api/contrib-queries/org/apache/lucene/search/similar/MoreLikeThis.html Justin On Wednesday, January 8, 2014 3:04:47 AM UTC-5, Maarten Roosendaal wrote: Hi, I have a question about why the 'more like this' algorithm scores documents higher than others, while they are (at first glance) the same. What i've done is index wishlist-documents which contain 1 property: product_id, this property contains an array of product_id's (e.g. [1234, , , ]. What i'm trying to do is find similair wishlist for a given wishlist with id x. The MLT API seems to work, it returns other documents which contain at least 1 of the product_id's from the original list. But what is see is that, for example. 
I get 10 hits; the first 6 hits contain the same (and only one) product_id, which is present in the original wishlist. What I would expect is that the score of the first 6 is the same. However, what I see is that only the first 2 have the same score, the next 2 a lower score, and the next 2 even lower. Why is this? Also, I'm trying to write the MLT API call as an MLT query, but somehow it doesn't work. I would expect that I need to take the entire content of the original product_id property and feed it as input for the 'like_text'. The documentation is not very clear and doesn't provide examples, so I'm a little lost. Hope someone can give some pointers. Thanks, Maarten -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/c7032391-2456-47a0-a3b8-1f5fe61127e7%40googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.
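For anyone following along with rewriting the _mlt API call as a more_like_this query, here is a minimal Python sketch of how the request body can be assembled from a wishlist's product ids. The field name and parameter values follow the thread's example; the helper name itself is made up:

```python
import json

def build_mlt_query(product_ids, size=50):
    """Build a more_like_this query body equivalent to the _mlt API call
    discussed above: the wishlist's product_ids, space-separated, become
    the like_text. Parameters mirror the thread's example."""
    return {
        "query": {
            "more_like_this": {
                "fields": ["product_id"],
                "like_text": " ".join(str(p) for p in product_ids),
                "min_term_freq": 1,  # each product_id occurs once per wishlist
                "min_doc_freq": 1,   # don't drop products that are rare overall
            }
        },
        "size": size,
    }

body = build_mlt_query([104004855475, 1001004002067765])
print(json.dumps(body, indent=2))
```

The body would then be POSTed to the index's _search endpoint rather than the _mlt endpoint.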
Elasticsearch Missing Data
Hello, I've had my elasticsearch instance running for about a week with no issues, but last night it stopped working. When I went to look in Kibana, it stopped logging around 20:45 on 1/7/14. I then restarted the service on both elasticsearch servers and it started logging again, and it back-pulled some logs from 07:10 that morning, even though I restarted the service around 10:00. So my questions are: 1. Why did it stop working? I don't see any obvious errors. 2. When I restarted it, why didn't it go back and pull all of the data instead of just some of it? I see that there are no unassigned shards:

curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "my-elasticsearch",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 40,
  "active_shards" : 80,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0
}

Are there any additional queries or logs I can look at to see what is going on? On a slight side note, when I restarted my 2nd elasticsearch server it isn't reading from the /etc/elasticsearch.yml file like it should. It isn't creating the node name correctly or putting the data files in the spot I have configured. I'm using CentOS and doing everything via /etc/init.d/elasticsearch on both servers; the elasticsearch1 server reads everything correctly but elasticsearch2 does not. Thanks for your help. Eric
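One thing worth noting about the health output above: a green status only says that every shard copy is allocated; it says nothing about whether documents were actually indexed during an outage. As a small illustration, here is a hypothetical helper that interprets a parsed _cluster/health response:

```python
def health_summary(health):
    """Interpret a parsed _cluster/health response like the one above.
    Green + zero unassigned shards means allocation is healthy, which
    does not rule out gaps on the ingest side."""
    issues = []
    if health["status"] != "green":
        issues.append("status is %s" % health["status"])
    if health["unassigned_shards"] > 0:
        issues.append("%d unassigned shards" % health["unassigned_shards"])
    if health["relocating_shards"] > 0:
        issues.append("%d shards relocating" % health["relocating_shards"])
    return issues or ["cluster is green; check the ingest pipeline for gaps"]
```

So in Eric's case the missing data most likely never reached elasticsearch, and the shipper's logs are the next place to look.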
Re: Strategy for keeping Elasticsearch updated with MySQL
I would do 1/, to have more near-real-time search. Also, I like the idea that I have an object in memory and I simply push it to MySQL and to ES at the same time. No need to read the object from MySQL again to index it in another process (proposition 2). That said, you could also use a message queue in the middle if you want to be able at some point to stop your ES cluster without stopping your application. This is what I did in the past. My 2 cents -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet | @elasticsearchfr

On 8 January 2014 at 20:13:40, arthurX (fc28...@gmail.com) wrote: Hello! I use MySQL as my primary datastore and use Elasticsearch to further index the documents. My problem is keeping the data in ES in sync with MySQL. Currently I have two methods in mind: 1. Whenever I add or update an entry in MySQL, do the same action in ES. 2. Run some cron jobs that periodically keep ES in sync with the data in MySQL. For method 2 I wonder how I can check if an entry is already indexed in Elasticsearch. And would it be efficient at all if I have to check every entry to see if it is updated? I am new to the technology and I am afraid I have missed some really obvious and established solutions here. Or is there otherwise a normal way this situation is handled?
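The dual-write approach (option 1) David describes can be sketched as follows. The `db` and `es_client` handles and method names here are hypothetical stand-ins; real code would use a MySQL driver and an Elasticsearch client:

```python
def save_product(db, es_client, product):
    """Option 1 from the thread: write to MySQL and index into ES in the
    same code path, reusing the in-memory object for both writes."""
    doc_id = db.insert("products", product)  # primary store is written first
    es_client.index(index="products", id=doc_id, body=product)
    return doc_id
```

If ES must be able to go down without blocking the application, the second call is what would be replaced by a publish to a message queue.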
Very open Elasticsearch installation
I've spent the past six months or so writing and deploying a replacement on-site search system, and we've finally decided it was time to blog about it: http://blog.wikimedia.org/2014/01/06/wikimedia-moving-to-elasticsearch/. I figure this list might find this useful because everything (http://git.wikimedia.org/summary/?r=mediawiki/extensions/CirrusSearch.git) is open source (http://git.wikimedia.org/tree/operations%2Fpuppet.git/production/modules%2Felasticsearch) and public (http://ganglia.wikimedia.org/latest/?c=Elasticsearch%20cluster%20eqiad&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2). It looks like we're averaging about 3000 queries per second (http://ganglia.wikimedia.org/latest/?r=day&m=es_queries&s=by+name&c=Elasticsearch+cluster+eqiad&max_graphs=0&z=small&hc=4) and about 300 updates per second (http://ganglia.wikimedia.org/latest/?r=day&m=es_indexes&s=by+name&c=Elasticsearch+cluster+eqiad&max_graphs=0&z=small&hc=4) at the moment, which doesn't make us a very big installation, but we're excited in our own little way. If you have time, please give it a shot here https://it.wikipedia.org/wiki/Speciale:Ricerca, here https://en.wikisource.org/wiki/Special:Search, here https://www.mediawiki.org/wiki/Special:Search, or, if you don't mind somewhat uglier results, here https://www.wikidata.org/wiki/Special:Search. If you notice anything fishy, please let me know or file a bug: https://bugzilla.wikimedia.org/enter_bug.cgi?product=MediaWiki%20extensions&component=CirrusSearch. We're a pretty small team, but we'll get to everything eventually. Wish me/us luck. Over the next few months we'll be doubling the number of documents indexed, doubling the update rate, and ramping up the query rate by about an order of magnitude. Thanks for reading, Nik
Re: Very open Elasticsearch installation
This is really awesome, Nik! Congrats to your team. I'm a bit disappointed that this search gives no result: https://en.wikisource.org/w/index.php?title=Special%3ASearch&profile=default&search=elasticsearch&fulltext=Search :-) Best -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet | @elasticsearchfr
Re: Strategy for keeping Elasticsearch updated with MySQL
Hi Arthur, I did something similar years ago when I was working for a newspaper. We kept articles in a database and full-text search was done with an external program. There was a trigger on the article tables that, on every change operation, added a record to a queue table, something like this: article_id, operation_type, table_name. Then there was a cron job every minute that read from this table and:
- On delete: deleted the entry from the index.
- On update: deleted the entry, generated a new simple page for the article (only title and content), and put it on the indexer to be indexed.
- On insert: generated a new simple page for the article (only title and content) and put it on the indexer to be indexed.
Articles were placed in a directory like this: /root_dir/table_name/id/content.html. This path was returned and easily parsed to generate the appropriate link to the article. After success the respective record was removed from the queue, and we had near real time. This can be done with ES, but much more easily. Best regards, Nickolay Kolev
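Nickolay's cron-driven queue table can be sketched like this. Each queue entry is (article_id, operation_type, table_name) as he describes; the `indexer` object and its method names are hypothetical stand-ins for the external indexer:

```python
def process_queue(queue, indexer):
    """One cron pass over the change-queue table described above.
    Successfully handled entries are returned so they can be deleted
    from the queue table."""
    done = []
    for article_id, op, table in queue:
        if op == "delete":
            indexer.delete(table, article_id)
        else:
            # "insert" and "update" both end with (re)indexing;
            # an update removes the stale entry first.
            if op == "update":
                indexer.delete(table, article_id)
            indexer.index(table, article_id)
        done.append((article_id, op, table))
    return done
```

With ES the indexer step collapses to a single index/delete API call per entry, which is why the same design is much simpler there.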
Re: Very open Elasticsearch installation
On Wed, Jan 8, 2014 at 3:49 PM, David Pilato da...@pilato.fr wrote: This is really awesome, Nik! Congrats to your team. I'm a bit disappointed that this search gives no result: https://en.wikisource.org/w/index.php?title=Special%3ASearch&profile=default&search=elasticsearch&fulltext=Search :-)

Looks like we don't have any books about Elasticsearch. It does show up here: https://it.wikipedia.org/w/index.php?title=Speciale%3ARicerca&profile=default&search=elasticsearch&fulltext=Search, but I can't read it. You can also find technical stuff about the integration and our rollout plan over here: https://www.mediawiki.org/w/index.php?title=Special%3ASearch&profile=default&search=elasticsearch&fulltext=Search. I'll let you know when you can find https://en.wikipedia.org/wiki/Elasticsearch with it, but that might take some time. There is way too much search traffic for us to be the default there.
Re: Is it possible to do a text query against a (pre)defined set of fields?
Ville, By default, the _all field includes all of the indexed fields. So, for your private fields, explicitly exclude them from the _all field by adding the following to their properties: "include_in_all": false. See the ES guide for more details; specifically, this might help: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-all-field.html. I typically disable the _all field completely to cut down dramatically on disk space and build times. But everything else in ES has worked like a charm, so I'm sure this would work for you without too much trouble. Good luck! Brian
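A minimal sketch of such a mapping, with made-up type and field names (only the include_in_all flag is the point here):

```python
import json

# Hypothetical mapping: "title" stays searchable via _all,
# while "secret" is kept out of _all as Brian suggests.
mapping = {
    "document": {
        "properties": {
            "title":  {"type": "string"},
            "secret": {"type": "string", "include_in_all": False},
        }
    }
}
print(json.dumps(mapping, indent=2))
```

The mapping would be submitted with a PUT to the index's _mapping endpoint.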
Re: incrementally scaling ES from the small data
Adolfo, Still could not test how sockets relate to shards and why I automatically get 10 established sockets when opening a client:

node = builder.client(clientOnly).data(!clientOnly).local(local).node();
client = node.client();

on the default ES configuration, and many many more sockets afterwards (up to 200), and how this number changes when increasing/decreasing the number of shards.

Of course, your application should create only one client and then let all threads within the application share that one client. Each client, especially the NodeClient, typically creates a thread pool behind it. It's a very heavy-weight object, so do not create more than one of them. But it's perfectly thread-safe and can (should) be used by as many threads in your application as desired. Brian
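The create-once, share-everywhere pattern Brian describes can be sketched as a lazily created singleton. The `SearchClient` class here is a made-up stand-in for a heavyweight, thread-safe client such as the Java NodeClient:

```python
import threading

class SearchClient:
    """Stand-in for an expensive, thread-safe client object:
    costly to construct, cheap and safe to share."""
    def search(self, q):
        return {"query": q}

_client = None
_lock = threading.Lock()

def get_client():
    """Return the single shared client, creating it on first use.
    Double-checked locking keeps later calls lock-free."""
    global _client
    if _client is None:
        with _lock:
            if _client is None:
                _client = SearchClient()
    return _client
```

Every thread then calls get_client() instead of constructing its own client.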
Kibana Static Dashboard ?
I am modifying the guided.json dashboard. Down in the Events panel I would like to tell Kibana to statically filter out specific records. I tried adding this to the file:

"query": {
  "filtered": {
    "query": {
      "bool": {
        "should": [
          { "query_string": { "query": "record-type: traffic-stats" } }
        ]
      }
    }
  }
},

Doesn't appear to work.
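One likely issue with the snippet above: a `should` clause only boosts matching records, it doesn't exclude anything. To filter records out, the clause would normally go in a `must_not`. A hedged sketch of that body shape, using the thread's field and value:

```python
def exclude_records(query_string):
    """Build a filtered query whose bool filter excludes (rather than
    boosts) records matching the given query string. A sketch only;
    where exactly this body fits in guided.json is a separate question."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "bool": {
                        "must_not": [
                            {"query_string": {"query": query_string}}
                        ]
                    }
                },
            }
        }
    }
```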
Re: Very open Elasticsearch installation
I can't put it any other way: your move to ES is a landmark. Thank you, Nik, for making this public; it helps me a lot in spreading the word for more openness... https://en.wikisource.org/w/index.php?title=Special%3ASearch&profile=default&search=Hello+World&fulltext=Search The search suggestion is a bit surprising (but it does work :)), and what a difference to the old search: https://de.wikisource.org/wiki/Spezial:Suche Jörg
Re: Kibana Static Dashboard ?
Hello Jay, Can't you do the same from the Kibana side by adding a must_not filter? Once you save that dashboard, you can always go back to the same link to see the same static dashboard. Thanks, Vineeth
Re: Kibana Static Dashboard ?
As I understand Kibana, when a dashboard is saved it is placed into elasticsearch. I don't want it in elasticsearch; I want it in a static file.
allow_explicit_index and _bulk
The documentation on URL-based access control (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/url-access-control.html) implies that _bulk still works if you set rest.action.multi.allow_explicit_index: false, as long as you specify the index in the URL. However, I can't get it to work:

POST /foo/bar/_bulk
{ "index": {} }
{ "_id": 1234, "baz": "foobar" }

returns "explicit index in bulk is not allowed". Should this work?
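For reference, a bulk body that leaves the index and type to the URL looks like the one in the post: empty `index` action lines, optionally carrying an `_id`. A small sketch of how such an NDJSON body can be assembled (helper name made up):

```python
import json

def bulk_body(docs):
    """Build a _bulk body with no _index/_type in the action lines, so the
    target comes from the URL (e.g. POST /foo/bar/_bulk). An _id in the
    source dict is moved into the action line."""
    lines = []
    for doc in docs:
        doc = dict(doc)  # don't mutate the caller's dict
        action = {"index": {}}
        if "_id" in doc:
            action["index"]["_id"] = doc.pop("_id")
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # bulk bodies must end with a newline
```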
Re: incrementally scaling ES from the small data
BTW, I was very wrong when I mentioned that elasticsearch uses consistent hashing. It actually uses modulo-based hashing, which is why the number of shards cannot change: the modulus is fixed. Working on too many things at once while replying. :)
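The modulo-based routing mentioned here can be illustrated in a few lines. ES uses its own hash function internally; crc32 below is just a deterministic stand-in:

```python
import zlib

def shard_for(routing_key, num_shards):
    """Illustration of modulo-based routing: a document's routing value
    hashes to a fixed shard number in [0, num_shards)."""
    return zlib.crc32(routing_key.encode("utf-8")) % num_shards

# Changing num_shards changes the modulus, so most keys would land on
# different shards -- which is why an index's shard count is fixed at
# creation time.
```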
Re: No hit using scan/scroll with has_parent filter
Hi Jean, Can you share how you execute the scan request with the has_parent filter (via a gist or something like that)? Martijn

On 8 January 2014 15:17, Jean-Baptiste Lièvremont jean-baptiste.lievrem...@sonarsource.com wrote: Hi folks, I use a parent/child mapping configuration which works flawlessly with classic search requests, e.g. using has_parent to find child documents with criteria on the parent documents. I am trying to get all child document IDs that match a given set of criteria using scan and scroll, which also works well - until I introduce the has_parent filter, in which case the scroll request returns no hit (although total_hits is correct). Is it a known issue? I can provide sample mapping files and queries with associated/expected results. Please note that this behavior was noticed on 0.90.6 but is still present in 0.90.9. Thanks, best regards, -- Jean-Baptiste Lièvremont

-- Kind regards, Martijn van Groningen
SSL and org.elasticsearch.transport.NodeDisconnectedException
I have an es_client (Java/Dropwizard) application. It communicates with elasticsearch just fine over a plaintext connection. I have followed the instructions at https://github.com/sonian/elasticsearch-jetty to set up SSL for ES. However, when I start my es_client it reports the following every 5 seconds:

INFO [2014-01-08 23:02:14,814] org.elasticsearch.client.transport: [Karolina Dean] failed to get node info for [#transport#-1][inet[localhost/127.0.0.1:9443]], disconnecting...
! org.elasticsearch.transport.NodeDisconnectedException: [][inet[localhost/127.0.0.1:9443]][cluster/nodes/info] disconnected

How can I go about figuring this one out? Thanks, Maciej
Re: SSL and org.elasticsearch.transport.NodeDisconnectedException
Actually, digging around a bit more, I think I should revise my question: is it currently possible to have a Java API client talk to Elasticsearch via SSL? I see that https://github.com/elasticsearch/elasticsearch/pull/2105 (Add SSL support to Netty transport layer for Client/Node-to-Node communication) was rejected. Maybe it is simply a feature which does not (yet) exist.
Re: How to index an existing json file
Thank you for the binary flag tip. It is also in the documentation here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-bulk.html

On Tuesday, January 7, 2014 9:00:33 PM UTC-5, ZenMaster80 wrote: Hi, I am just starting with ElasticSearch. I would like to know how to index a simple JSON document, books.json, that has the following in it. Where do I place the document? I placed it in the root directory of elasticsearch and in the /bin folder.

{"books":[{"name":"life in heaven","author":"Mike Smith"},{"name":"get rich","author":"Joe Shmoe"},{"name":"luxury properties","author":"Linda Jones"}]}

$ curl -XPUT "http://localhost:9200/books/book/1" -d @books.json
Warning: Couldn't read data from file "books.json", this makes an empty POST.
{"error":"MapperParsingException[failed to parse, document is empty]","status":400}

Thanks
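A common follow-up with files like books.json: rather than storing the whole wrapped array as one document, each book usually becomes its own document via the bulk API linked above. A sketch of that conversion, with index/type names taken from the thread's curl example:

```python
import json

def books_to_bulk(books_doc, index="books", doc_type="book"):
    """Turn a {"books": [...]} wrapper document into a _bulk body that
    indexes each book separately, per the bulk format in the linked docs.
    The sequential _id assignment is just for illustration."""
    lines = []
    for i, book in enumerate(books_doc["books"], start=1):
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": i}}))
        lines.append(json.dumps(book))
    return "\n".join(lines) + "\n"
```

The resulting body would then be sent with curl's --data-binary flag, which preserves the newlines that _bulk requires.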
Filter and Query same taking some time
Hi, I had implemented ES search queries for all our use cases, but when I learned that some of them can be solved with filters, I implemented that too. However, I don't see any gain (in response time) from the filters. My search queries are:

1. Filter

{
  "size": 100,
  "query": { "match_all": {} },
  "filter": {
    "bool": {
      "must": { "term": { "color": "red" } }
    }
  },
  "version": true
}

2. Query

{
  "size": 100,
  "query": {
    "bool": {
      "must": {
        "match": {
          "color": { "query": "red", "type": "boolean", "operator": "AND" }
        }
      }
    }
  },
  "version": true
}

By default the term filter should be cached, but I don't see a performance gain. Do I need to change some parameter as well? I am using ES 0.90.1 with 16 GB of heap space given to ES. Thanks, Arjit
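One thing to check with query 1: the top-level "filter" element is applied after the query as a post-filter, so it does not benefit from filter caching the way a filter inside a filtered query does. A hedged sketch of the filtered-query form, using the thread's field and value:

```python
def filtered_term_query(field, value, size=100):
    """Build a filtered query where the (cacheable) term filter restricts
    the document set before scoring, instead of post-filtering hits.
    A sketch only; measure against your own data to confirm the gain."""
    return {
        "size": size,
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"term": {field: value}},  # term filters are cached by default
            }
        },
        "version": True,
    }
```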
Why not use rivers in production?
Getting reintroduced to ES, and a co-worker recommended I listen to the intro webinar that Drew Raines gave, as he mentioned something specific about rivers. Listening through it, I heard him say that they (assuming ES) don't recommend using rivers in production because a river is tied to one node. Having looked through a lot of the documentation on rivers, I do see that you can specify which rivers run on which nodes, so I wasn't sure what the exact implication of this statement was. Drew? Or anyone else care to comment? We're getting ready to push all of our data from MongoDB into ES so that we can search it and use Kibana for analysis, so any insight into this would be great. Thank you :). -warner
Re: High load average running on ES node
Hi Jörg, Thanks a lot for your detailed reply. Can you please explain how I can reconfigure ES for efficient cache usage? Thanks, Arjit

On Sun, Jan 5, 2014 at 10:55 PM, joergpra...@gmail.com joergpra...@gmail.com wrote: The load is not much of a surprise for an 8-core CPU node; I have also observed loads of 80-100. This high load, when induced by indexing, can be significantly reduced by using a high-performance input/output disk subsystem, such as SSD. The disks are the slowest part of the system and generate high I/O wait, which is responsible for increasing the CPU load. GC generates high load too; this is mostly related to expensive queries that use filters or caches. The overall performance of the JVM becomes very poor in that case. You have several options:
- rewriting queries or reconfiguring ES for efficient cache usage
- adding nodes
- decreasing the heap slightly to smooth the steep edge when stop-the-world GC kicks in (whether your ES cluster can work with less heap depends on the workload)
G1 GC does not help against query/filter load and does not decrease CPU load; in fact, it puts more CPU load on the machines so that it can make a better trade-off with less stop-the-world time. G1 GC helps to push the stop-the-world periods under a certain limit so ES nodes do not disconnect so easily; it has no steep edge when performing stop-the-world GC phases. Please note that currently G1 GC seems safe only with Java 7 or Java 8 and ES versions that have replaced GNU trove4j with the HPPC library, that is, 0.90.9 or 1.0.0.Beta2. Jörg
Re: Pls help me: I insert logs to elasticsearch, but it uses too much memory, how to solve it? Thanks
The environment is as follows: -- elasticsearch v0.90 (I use 0.90.9; the problem still exists). -- Java version is 1.7.0_45. On Wednesday, January 8, 2014 6:58:02 PM UTC+8, xjj2...@gmail.com wrote: Dear all: I insert 1 logs into elasticsearch; each log is about 2M, and there are about 3000 keys and values. When I insert about 2, it uses about 30G of memory, and then elasticsearch is very slow and it's hard to insert logs. Could someone help me solve this? Thanks very much.
Re: Pls help me: I insert logs to elasticsearch, but it uses too much memory, how to solve it? Thanks
Just wondering if you are hitting the same RAM usage when inserting without thrift? Could you test that? Could you also gist the output of: curl -XGET 'http://localhost:9200/_nodes?all=true&pretty=true' -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet | @elasticsearchfr
Re: Why not use rivers in production?
A river instance is a singleton in the cluster. That means a river works on only a single node. It can be reallocated to another node when the first node fails. I think that's what Drew meant. Basically, rivers do not scale. My 2 cents -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet | @elasticsearchfr
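To illustrate the singleton nature: a river is registered once for the whole cluster by putting a `_meta` document into the `_river` index, and the cluster then picks a single node to run it on. A sketch using the MongoDB river plugin — the plugin must be installed separately, and all names below are illustrative:

```shell
# Register a MongoDB river; it will run on one node in the cluster.
# River name, db, collection, and target index are all made up here.
curl -XPUT 'http://localhost:9200/_river/my_mongo_river/_meta' -d '{
  "type": "mongodb",
  "mongodb": { "db": "mydb", "collection": "wishlists" },
  "index": { "name": "wishlists", "type": "wishlist" }
}'
```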
Re: Filter and Query same taking some time
You probably won't see any difference the first time you execute it unless you are using warmers. With a second query, you should see the difference. How many documents do you have in your dataset? -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet | @elasticsearchfr On 9 January 2014 at 06:14:06, Arjit Gupta (arjit...@gmail.com) wrote: Hi, I had implemented ES search queries for all our use cases, but when I learned that some of our use cases can be solved by filters, I implemented that too. However, I don't see any gain (in response time) from filters. My search queries are:

1. Filter

{ "size": 100, "query": { "match_all": {} }, "filter": { "bool": { "must": { "term": { "color": "red" } } } }, "version": true }

2. Query

{ "size": 100, "query": { "bool": { "must": { "match": { "color": { "query": "red", "type": "boolean", "operator": "AND" } } } } }, "version": true }

By default the term filter should be cached, but I don't see a performance gain. Do I need to change some parameter as well? I am using ES 0.90.1 with 16 GB of heap given to ES. Thanks, Arjit
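One thing worth noting about the "Filter" request quoted above: a top-level `filter` element in the search body is applied after the query phase, so it does not benefit from the filter cache the way a `filtered` query does. If caching is the goal, the usual 0.90.x form would be something like the following sketch (same field and value as the example; `"_cache": true` is shown explicitly, though term filters should be cached by default):

```json
{
  "size": 100,
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": { "term": { "color": "red", "_cache": true } }
    }
  },
  "version": true
}
```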
Re: Filter and Query same taking some time
I have 100,000 documents which are similar. In the response I am getting the whole document, not just the ID. I am executing the query multiple times. Thanks, Arjit
Re: SSL and org.elasticsearch.transport.NodeDisconnectedException
The jetty plugin replaces the HTTP layer (9200), not the transport layer (9300). The TransportClient uses the transport layer (9300). -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet | @elasticsearchfr On 9 January 2014 at 02:02:48, Maciej Stoszko (maciek...@gmail.com) wrote: On Wednesday, January 8, 2014 5:19:10 PM UTC-6, Maciej Stoszko wrote: I have an es_client (java/dropwizard) application. It communicates with elasticsearch just fine over a plaintext connection. I have followed the instructions at https://github.com/sonian/elasticsearch-jetty to set up SSL for ES. However, when I start my es_client it reports the following every 5 seconds: INFO [2014-01-08 23:02:14,814] org.elasticsearch.client.transport: [Karolina Dean] failed to get node info for [#transport#-1][inet[localhost/127.0.0.1:9443]], disconnecting... ! org.elasticsearch.transport.NodeDisconnectedException: [][inet[localhost/127.0.0.1:9443]][cluster/nodes/info] disconnected How can I go about figuring this one out? Thanks, Maciej Actually, digging around a bit more, I think I should revise my question: is it currently possible to have a Java API client talk to Elasticsearch via SSL? I see that https://github.com/elasticsearch/elasticsearch/pull/2105 (Add SSL support to Netty transport layer for Client/Node-to-Node communication) was rejected. Maybe it is simply a feature which does not (yet) exist.
Elasticsearch Hadoop
Hi, To index Hadoop data into Elasticsearch, as I understand it, we create an external table with the ES storage handler and then copy the data from another internal Hive table. Doesn't that duplicate the data in HDFS? Is there any way to use the internal Hive tables directly for indexing, instead of having two tables with the same data? Kind Regards, Badal
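For context, the usual es-hadoop pattern looks roughly like the sketch below (table, column, and index names are illustrative, and the exact storage-handler class name varies between es-hadoop versions). Note that the external table is only a mapping onto an Elasticsearch index — its rows live in ES, not in HDFS — so the INSERT streams data out to ES rather than duplicating it inside HDFS:

```sql
-- External table backed by Elasticsearch; no data is stored in HDFS for it.
CREATE EXTERNAL TABLE logs_es (id BIGINT, msg STRING)
STORED BY 'org.elasticsearch.hadoop.hive.ESStorageHandler'
TBLPROPERTIES('es.resource' = 'logs/entry');

-- Stream rows from the internal Hive table into the ES-backed table.
INSERT OVERWRITE TABLE logs_es SELECT id, msg FROM logs_internal;
```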
Re: Converting queries returning certain distinct records to ES
Maybe you could find a way to do that with a single query if you design your documents in another way? Or use facets for the first query and an ids filter for the second? It's hard to tell without a concrete example of the JSON documents. -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet | @elasticsearchfr On 9 January 2014 at 01:28:06, heat...@hodgetastic.com wrote: Hello, I am currently trying to migrate an SQL application to Elasticsearch. I need to be able to select a collection of results from an index which, for given search conditions, have distinct pairings of two certain columns. In SQL I do the following two queries: Query 1: SELECT column_A, column_B, GROUP_CONCAT(table_name.id) id FROM `table_name` WHERE `column_?` = 'something' GROUP BY column_A, column_B, column_? Query 2: SELECT `table_name`.* FROM `table_name` WHERE `column_?` = 'something' AND (`table_name.id` IN (ids_from_previous_query)) The first query returns a list of ids from table_name such that each id satisfies the condition `column_?` = 'something' and the record with that id has a distinct [column_A, column_B]. The second query then returns all the records satisfying `column_?` = 'something', but only from that range of ids (I realise I probably do not need to repeat `column_?` = 'something' in the second query). The result is that each record returned by the second query satisfies the condition `column_?` = 'something', and I am only returned one record for each [column_A, column_B] pairing. Since there is not really a 'distinct' option yet, I am having trouble finding a way to replicate this output with ES, and wondered if anyone might have any thoughts as to how I might go about it? At the moment I am open to any mapping / query combinations that will achieve what I need.
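To make the facets + ids-filter suggestion concrete, here is a sketch against a hypothetical mapping (`cond_field` stands in for `column_?`, and `ab_pair` is a made-up field). One caveat: a terms facet over two separate fields merges their values into a single set rather than yielding distinct pairs, so a common workaround is to index a combined field such as "valueA|valueB" and facet on that:

```json
{
  "size": 0,
  "query": { "term": { "cond_field": "something" } },
  "facets": {
    "distinct_pairs": { "terms": { "field": "ab_pair", "size": 100 } }
  }
}
```

A second query can then fetch the full records for the chosen ids with an ids filter:

```json
{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": { "ids": { "values": ["id_1", "id_2"] } }
    }
  }
}
```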