storage use by attachment plugin
Dear All, I have about 20GB of documents, and I want to index all of the document content using the attachment plugin. My question is: what will the size of the index be? Will it also be 20GB? Thank you
Re: Extremely slow indexing -- java throwing http exception errors
Hey, judging from the exception this looks like an unstable network connection? Are you using persistent HTTP connections? Pinging the nodes from each other is not a problem, I guess? --Alex

On Thu, Jun 19, 2014 at 12:12 AM, alekjouhar...@gmail.com wrote:

Hello all, So here's the issue: our cluster was previously very underutilized as far as resource consumption goes, and after some config changes (see complete config below) we were able to push resource consumption up, but we are still indexing documents at the same sluggish rate of 400 docs/second. Redis and Logstash are definitely not the bottlenecks, and the indexing seems to be growing exponentially worse as we pull in more data. We are using Elasticsearch v1.1.1. The Java HTTP exception errors would definitely explain the sluggishness, as there seems to be a socket timeout every second, like clockwork -- but I'm at a loss for what could be causing the errors in the first place. We are running Redis, Logstash, Kibana and the ES master (no data) on one node, and have our Elasticsearch data instance on another node. Network latency is definitely not so atrocious that it would be an outright bottleneck, and data gets to the secondary node fast enough -- but it is backed up in indexing. Any help would be greatly appreciated, and I thank you all in advance!

### ES CONFIG ###
index.indexing.slowlog.threshold.index.warn: 10s
index.indexing.slowlog.threshold.index.info: 5s
index.indexing.slowlog.threshold.index.debug: 2s
index.indexing.slowlog.threshold.index.trace: 500ms
monitor.jvm.gc.young.warn: 1000ms
monitor.jvm.gc.young.info: 700ms
#monitor.jvm.gc.young.debug: 400ms
monitor.jvm.gc.old.warn: 10s
monitor.jvm.gc.old.info: 5s
#monitor.jvm.gc.old.debug: 2s
cluster.name: iislog-cluster
node.name: VM-ELKIIS
discovery.zen.ping.multicast.enabled: true
discovery.zen.ping.unicast.hosts: [192.168.6.145]
discovery.zen.ping.timeout: 5
node.master: true
node.data: false
index.number_of_shards: 10
index.number_of_replicas: 0
bootstrap.mlockall: true
index.refresh_interval: 30
indices.memory.index_buffer_size: 50%
index.translog.flush_threshold_ops: 5
index.store.type: mmapfs
index.store.compress.stored: true
threadpool.search.type: fixed
threadpool.search.size: 20
threadpool.search.queue_size: 100
threadpool.index.type: fixed
threadpool.index.size: 20
threadpool.index.queue_size: 100

### JAVA ERRORS IN ES LOG ###
[2014-06-18 09:39:09,565][DEBUG][http.netty ] [VM-ELKIIS] Caught exception while handling client http traffic, closing connection [id: 0x7561184c, /192.168.6.3:6206 => /192.168.6.21:9200]
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Re: Splunk vs. Elasticsearch performance?
You are correct that Elasticsearch ships with developer settings - that is exactly what a packaged ES is meant for. If you find issues when configuring and setting up ES for critical use, it would be nice to post your issues so others can find help too, and maybe share their solutions, because there are ES installations that run successfully in critical environments. From a bare statement that your dev teams hate it, it is rather impossible for me to learn the reason why this is so. Facts are more important than emotions for fixing software issues. The power of open source is that such issues can be fixed with the help of a public discussion in the community. With closed software products, you cannot rely on issues being discussed publicly to find the best way to fix them. Jörg

On Thu, Jun 19, 2014 at 2:48 PM, Thomas Paulsen monokit2...@googlemail.com wrote:

We had a 2.2TB/day installation of Splunk and ran it on VMware with 12 indexers and 2 search heads. Each indexer had 1000 IOPS guaranteed. The system is slow but OK to use. We tried Elasticsearch and we were able to get the same performance with the same number of machines. Unfortunately, with Elasticsearch you need almost double the amount of storage, plus a LOT of patience to make it run. It took us six months to set it up properly, and even now the system is quite buggy and unstable, and from time to time we lose data with Elasticsearch. I don't recommend ELK for a critical production system; for dev work it is OK, if you don't mind the hassle of setting it up and operating it. The costs you save by not buying a Splunk license you have to invest in consultants to get it up and running. Our dev teams hate Elasticsearch and prefer Splunk.

On Saturday, 19 April 2014 00:07:44 UTC+2, Mark Walkom wrote:

That's a lot of data! I don't know of any installations that big but someone else might. What sort of infrastructure are you running Splunk on now, and what are your current and expected retention? Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com

On 19 April 2014 07:33, Frank Flynn faultle...@gmail.com wrote:

We have a large Splunk instance. We load about 1.25 TB of logs a day. We have about 1,300 loaders (servers that collect and load logs - they may do other things too). As I look at Elasticsearch / Logstash / Kibana, does anyone know of a performance comparison guide? Should I expect to run on very similar hardware? More? Or less? Sure, it depends on exactly what we're doing, the exact queries and the frequency we'd run them, but I'm trying to get any kind of idea before we start. Are there any white papers or other documents about switching? It seems an obvious choice, but I can find very few performance comparisons (I did see that Elasticsearch just hired the former VP of Products at Splunk, Gaurav Gupta - but there were few numbers in that article either). Thanks, Frank
Re: Clarification on has_child filter memory requirements
Hey, not all parent documents (and not the data), just their ids. Still, this can accumulate, which is the reason why you should monitor the size of that data structure (exposed in the nodes stats). Hope that helps. --Alex

On Thu, Jun 19, 2014 at 6:03 AM, Drew Kutcharian d...@venarc.com wrote:

Based on the official docs ( http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-has-child-filter.html ): {quote} memory considerations: With the current implementation, all _parent field values and all _id field values of parent documents are loaded into memory (heap) via field data in order to support fast lookups, so make sure there is enough memory for it. {/quote} Does this mean that all the parent docs will be loaded into memory, or only the ones matching the filter? If the former is true, it would mean that one should keep the size of the parent objects to a minimum, right? In addition, say has_child is part of a conjunction (regular filter AND has_child) - would ES still load all the parent docs, or only the ones that matched the first filter? Thanks, Drew
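As an aside, Alex's "monitor the size of that data structure" can be automated against the nodes stats API. A minimal sketch with the 1.x Java client - an already-built Client instance is assumed, and fielddata is the section the quoted docs refer to:

    import org.elasticsearch.action.admin.cluster.node.stats.NodeStats;
    import org.elasticsearch.action.admin.cluster.node.stats.NodesStatsResponse;

    // Pull index-level stats from every node and print the fielddata heap usage,
    // which is where the _parent/_id lookup values are loaded.
    NodesStatsResponse stats = client.admin().cluster().prepareNodesStats()
            .setIndices(true)
            .execute().actionGet();
    for (NodeStats node : stats.getNodes()) {
        System.out.println(node.getNode().getName() + " fielddata: "
                + node.getIndices().getFieldData().getMemorySize());
    }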
Re: problem indexing with my analyzer
Information: My note_source contains pictures (.jpg, .png, ...) in base64, plus text. For my mapping I have used: type = string, analyzer = reuters (the name of my analyzer). Any idea?

On Thursday, June 19, 2014 at 17:57:46 UTC+2, Tanguy Bernard wrote:

Hello, I have an issue when I index a particular field, note_source (SQL longtext). I use the same analyzer for every field (except date_source and id_source), but for note_source I get a monitor.jvm warning. When I remove note_source, everything is fine. If I don't use the analyzer on note_source, everything is fine, but if I use my analyzer on note_source I get crashes. I think I have enough memory; I have used ES_HEAP_SIZE. Maybe my problem is with accents (ASCII, UTF-8)? Can you help me with this?

*My Setting*

public function createSetting($pf){
    $params = array('index' => $pf, 'body' => array(
        'settings' => array(
            'number_of_shards' => 5,
            'number_of_replicas' => 0,
            'analysis' => array(
                'filter' => array(
                    'nGram' => array(
                        'token_chars' => array(),
                        'type' => 'nGram',
                        'min_gram' => 3,
                        'max_gram' => 250
                    )
                ),
                'analyzer' => array(
                    'reuters' => array(
                        'type' => 'custom',
                        'tokenizer' => 'standard',
                        'filter' => array('lowercase', 'asciifolding', 'nGram')
                    )
                )
            )
        )
    ));
    $this->elasticsearchClient->indices()->create($params);
    return;
}

*My Indexing*

public function indexTable($pf,$typeElement){
    $params = array(
        'index' => '_river',
        'type' => $typeElement,
        'id' => '_meta',
        'body' => array(
            'type' => 'jdbc',
            'jdbc' => array(
                'url' => 'jdbc:mysql://ip/name',
                'user' => 'root',
                'password' => 'mdp',
                'index' => $pf,
                'type' => $typeElement,
                'sql' => 'select id_source as _id, id_sous_theme, titre_source, desc_source, note_source, adresse_source, type_source, date_source from source',
                'max_bulk_requests' => 5,
            )
        )
    );
    $this->elasticsearchClient->index($params);
}

Thanks in advance.
Re: Losing data after Elasticsearch restart
Hey, the exception you showed can happen when you remove an alias. However, you mentioned a NullPointerException in your first post, which is not contained in this stacktrace, so it seems that one is still missing. Also, please retry with a newer version of Elasticsearch. --Alex

On Thu, Jun 19, 2014 at 5:13 AM, Rohit Jaiswal rohit.jais...@gmail.com wrote:

Hi Alexander, We sent you the stack trace. Can you please enlighten us on this? Thanks, Rohit

On Mon, Jun 16, 2014 at 10:25 AM, Rohit Jaiswal rohit.jais...@gmail.com wrote:

Hi Alexander, Thanks for your reply. We plan to upgrade in the long run, however we need to fix the data loss problem on 0.90.2 in the immediate term. Here is the stack trace -

10:09:37.783 PM [22:09:37,783][WARN ][indices.cluster ] [Storm] [b7a76aa06cfd4048987d1117f3e0433a][0] failed to start shard
org.elasticsearch.indices.recovery.RecoveryFailedException: [b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery failed from [Jeffrey Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]] into [Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]
    at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:293)
    at org.elasticsearch.indices.recovery.RecoveryTarget.access$300(RecoveryTarget.java:62)
    at org.elasticsearch.indices.recovery.RecoveryTarget$2.run(RecoveryTarget.java:163)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.transport.RemoteTransportException: [Jeffrey Mace][inet[/10.4.35.200:9300]][index/shard/recovery/startRecovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: [b7a76aa06cfd4048987d1117f3e0433a][0] Phase[2] Execution failed
    at org.elasticsearch.index.engine.robin.RobinEngine.recover(RobinEngine.java:1147)
    at org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:526)
    at org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:116)
    at org.elasticsearch.indices.recovery.RecoverySource.access$1600(RecoverySource.java:60)
    at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:328)
    at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:314)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.transport.RemoteTransportException: [Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]
Caused by: org.elasticsearch.indices.InvalidAliasNameException: [b7a76aa06cfd4048987d1117f3e0433a] Invalid alias name [1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown alias name was passed to alias Filter
    at org.elasticsearch.index.aliases.IndexAliasesService.aliasFilter(IndexAliasesService.java:99)
    at org.elasticsearch.index.shard.service.InternalIndexShard.prepareDeleteByQuery(InternalIndexShard.java:382)
    at org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryOperation(InternalIndexShard.java:628)
    at org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:447)
    at org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:416)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

[22:09:37,799][WARN ][cluster.action.shard ] [Storm] sending failed shard for [b7a76aa06cfd4048987d1117f3e0433a][0], node[FiW6mbR5ThqqSii5Wc28lQ], [R], s[INITIALIZING], reason [Failed to start shard, message [RecoveryFailedException[[b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery failed from [Jeffrey Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]] into [Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]]; nested: RemoteTransportException[[Jeffrey Mace][inet[/10.4.35.200:9300]][index/shard/recovery/startRecovery]]; nested: RecoveryEngineException[[b7a76aa06cfd4048987d1117f3e0433a][0] Phase[2] Execution failed]; nested: RemoteTransportException[[Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]]; nested: InvalidAliasNameException[[b7a76aa06cfd4048987d1117f3e0433a]
Re: Count request does not support [filter]. Why?
Hey, I'm not a hundred percent sure what you mean here. The post_filter setting? There are two possibilities: either use search_type=count, or use a filtered query in the count API. See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-count.html and http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-search-type.html#count Also, be aware that the execution models are a bit different (which may result in different performance), see http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-post-filter.html#search-request-post-filter Hope this helps; if not, please refine your question. --Alex

On Thu, Jun 19, 2014 at 3:23 PM, Andrew Gaydenko andrew.gayde...@gmail.com wrote:

Count request does not support [filter]. Why? How do I count with the same filter and query (minus size, fields, from) that I will probably use to fetch hits after counting?
Re: Very frequent ES OOM's potential segment merge problems
Hey, can you provide more information about the OOM exception? You should also use the nodes stats API to monitor your system, so you can more easily spot where this memory consumption stems from. Also, are you just indexing, or doing searches/queries/gets as well? --Alex

On Thu, Jun 19, 2014 at 10:35 PM, Paul Sabou paul.sa...@gmail.com wrote:

Hi,

*Situation:* We are using ES 1.2.1 on a machine with 32GB RAM, a fast SSD and 12 cores. The machine runs Ubuntu 14.0.x LTS. The ES process has 12GB of RAM allocated. We have an index into which we inserted 105 million small documents, so the ES data folder is around 50GB in size (we see this by using du -h . on the folder). The new document insertion rate is rather small (i.e. 100-300 small docs per second).

*The problem:* We experienced rather frequent ES OOMs (Out of Memory), at a rate of around one every 15 mins. To lower the load on the index we deleted 104+ million docs (i.e. mostly small log entries) by deleting everything in one type: curl -XDELETE http://localhost:9200/index_xx/type_yy so that we ended up with an ES index with several thousand docs. After this we started to experience massive disk IO (10-20 MB/s reads and 1 MB/s writes) and more frequent OOMs (at a rate of around one every 7 minutes). We restarted ES after every OOM and kept monitoring the data folder size. Over the next hour the size went down to around 36GB, but now it's stuck there (it doesn't go down in size even after several hours).

*Questions*: Is this a problem related to segment merging running out of memory? If so, how can it be solved? If not, what could be the problem? Thanks, Paul.
Re: ElasticSearch Node.Client Options
Hey, a client node with a full 10gb heap where garbage collection does not free anything means those objects are still in use (which clearly explains THAT the OOM happens, but not WHY). Do you have huge searches going on, spanning a lot of shards with deep pagination (all the time)? Do you have some sort of backup mechanism which might be responsible for this? Anything from a search perspective which might lead to excessive memory usage? --Alex

On Fri, Jun 20, 2014 at 12:15 AM, VB vishal.batgh...@gmail.com wrote:

And this stack trace.

[2014-06-04 14:47:12,939][INFO ][cluster.service ] [BUS2F2801F3] master {new [ELS-10.76.121.131][dg_r12_nQbqIT_oJfjTwTg][inet[/10.76.121.131:9300]]{data=false, max_local_storage_nodes=1, master=true}, previous [ELS-10.76.121.130][BlGygpFmRn6uQNbgiEfl0A][inet[/10.76.121.130:9300]]{data=false, max_local_storage_nodes=1, master=true}}, removed {[ELS-10.76.121.130][BlGygpFmRn6uQNbgiEfl0A][inet[/10.76.121.130:9300]]{data=false, max_local_storage_nodes=1, master=true},}, reason: zen-disco-master_failed ([ELS-10.76.121.130][BlGygpFmRn6uQNbgiEfl0A][inet[/10.76.121.130:9300]]{data=false, max_local_storage_nodes=1, master=true})
[2014-06-04 14:48:03,969][WARN ][monitor.jvm ] [BUS2F2801F3] [gc][old][55503][489] duration [49.6s], collections [1]/[49.9s], total [49.6s]/[4.5h], memory [9.9gb]->[9.9gb]/[9.9gb], all_pools {[young] [532.5mb]->[532.5mb]/[532.5mb]}{[survivor] [51.3mb]->[42.8mb]/[66.5mb]}{[old] [9.3gb]->[9.3gb]/[9.3gb]}
[2014-06-04 14:48:40,256][WARN ][monitor.jvm ] [BUS2F2801F3] [gc][old][55504][490] duration [35.7s], collections [1]/[36.2s], total [35.7s]/[4.5h], memory [9.9gb]->[9.9gb]/[9.9gb], all_pools {[young] [532.5mb]->[532.5mb]/[532.5mb]}{[survivor] [42.8mb]->[58.6mb]/[66.5mb]}{[old] [9.3gb]->[9.3gb]/[9.3gb]}
[2014-06-04 14:49:30,335][WARN ][monitor.jvm ] [BUS2F2801F3] [gc][old][55505][491] duration [49.9s], collections [1]/[50s], total [49.9s]/[4.5h], memory [9.9gb]->[9.9gb]/[9.9gb], all_pools {[young] [532.5mb]->[532.5mb]/[532.5mb]}{[survivor] [58.6mb]->[63.7mb]/[66.5mb]}{[old] [9.3gb]->[9.3gb]/[9.3gb]}
[2014-06-04 14:49:30,350][INFO ][discovery.zen] [BUS2F2801F3] master_left [[ELS-10.76.121.131][dg_r12_nQbqIT_oJfjTwTg][inet[/10.76.121.131:9300]]{data=false, max_local_storage_nodes=1, master=true}], reason [failed to ping, tried [3] times, each with maximum [30s] timeout]
[2014-06-04 14:49:30,865][WARN ][discovery.zen] [BUS2F2801F3] not enough master nodes after master left (reason = failed to ping, tried [3] times, each with maximum [30s] timeout), current nodes: {[ELS-10.76.125.37][j3VQFYDaQLujkprUnke02w][inet[/10.76.125.37:9300]]{max_local_storage_nodes=1, master=false},[ELS-10.76.122.38][5V8bqkEzTP2TzMukB5_j-Q][inet[/10.76.122.38:9300]]{max_local_storage_nodes=1, master=false},[ELS-10.76.125.48][TGlF1uv8Q5GpgBVvIcvRAQ][inet[/10.76.125.48:9300]]{max_local_storage_nodes=1, master=false},[EDSFB1ABF7][MqLDnM5mSLqIicIuyJk7IQ][inet[/10.76.122.19:9300]]{client=true, data=false, master=false},[ELS-10.76.120.62][evcNI2CqSs-Zz44Jdzn0aw][inet[/10.76.120.62:9300]]{client=true, data=false, max_local_storage_nodes=1, master=false},[BUS9364B62][YZPjEsvhT6OjM9ti5Lxwkg][inet[/10.76.123.123:9300]]{client=true, data=false, master=false},[ELS-10.76.125.38][RyeswSy8SquV5H8Vfsw75Q][inet[/10.76.125.38:9300]]{max_local_storage_nodes=1, master=false},[EDSFB1200C][XUNaWVlYQUOVZlJMv3nHMA][inet[/10.76.122.18:9300]]{client=true, data=false, master=false},[ELS-10.76.124.214][H8N9nIU0TKyGv_prKyRVCQ][inet[/10.76.124.214:9300]]{max_local_storage_nodes=1, master=false},[EDS1A1F2240][ET2u1qImQCCvqc-1gRvQbQ][inet[/10.76.120.87:9300]]{client=true, data=false, master=false},[ELS-10.76.125.40][hp4wvQxER-mMPygey2Iqgg][inet[/10.76.125.40:9300]]{max_local_storage_nodes=1, master=false},[ELS-10.76.122.67][BiXop5iCRgGQyGvxazMkQg][inet[/10.76.122.67:9300]]{max_local_storage_nodes=1, master=false},[ELS-10.76.121.129][pf9xpva7Q4izIy6Nj4S4iQ][inet[/10.76.121.129:9300]]{data=false, max_local_storage_nodes=1, master=true},[EDSFB21E69][RabnwdLbT1WCp9gIE-_AXw][inet[/10.76.122.20:9300]]{client=true, data=false, master=false},[EDI1AE4FD76][UF1RMWe6RYaZGp6BU3x-VA][inet[/10.76.124.228:9300]]{client=true, data=false, master=false},[ELS-10.76.125.46][nXceQp40TjOSctChaGVtKw][inet[/10.76.125.46:9300]]{max_local_storage_nodes=1, master=false},[EDI1A1EA928][rWlelgQuT7KHSfyIejmLPg][inet[/10.76.120.82:9300]]{client=true, data=false, master=false},[ELS-10.76.121.188][oWldDeY4TJioki90moNySw][inet[/10.76.121.188:9300]]{max_local_storage_nodes=1, master=false},[ELS-10.76.122.34][kPSYm9G8R8i_z2skK_jq1g][inet[/10.76.122.34:9300]]{max_local_storage_nodes=1, master=false},[ELS-10.76.125.43][JMgOIZFBSzaQZ9bVagG57w][inet[/10.76.125.43:9300]]{max_local_storage_nodes=1, master=false},[EDI1AE3EE57][7JHGaYjzS3uI7PLN8Ynm-Q][inet[/10.76.124.227:9300]]{client=true, data=false,
Re: puppet-elasticsearch options
Hi Andrej, Thank you for using the puppet module :-) The 'port' and 'discovery minimum' settings are both configuration settings for the elasticsearch.yml file. You can set those in the 'config' option variable, for example:

elasticsearch::instance { 'instancename':
  config => {
    'http.port' => '9210',
    'discovery.zen.minimum_master_nodes' => 3,
  }
}

For the logging part, management of the logging.yml file is very limited at the moment, but I hope to get some feedback on extending that. The thresholds for the slowlogs can be set in the same config option variable. See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-slowlog.html#index-slow-log for more information. If you have any further questions, let me know. Cheers

On Thursday, June 19, 2014 9:53:10 AM UTC+1, Andrej Rosenheinrich wrote:

Hi, I am playing around with puppet-elasticsearch 0.4.0. It works well so far (thanks!), but I am missing a few options I haven't seen in the documentation. As I couldn't figure it out immediately by reading the scripts, maybe someone can help me quickly with this: - there is an option to change the port (9200), but this is only the http port. Is there an option to change the tcp transport port as well? - how can I configure logging? I am thinking of logfile names and loglevel, maybe even thresholds for the slowlog. Maybe this is interesting enough to add to the documentation? - is there an option in the module to easily configure memory usage? - how can I configure the discovery minimum? I am aware that I could go ahead and manipulate the elasticsearch.yml file with puppet; I am just curious whether there are options for my questions already implemented in the module that I have missed. So if someone could give me a hint or an example it would be really helpful! Thanks in advance! Andrej
efficient way to store the result of a large slow query
Hi guys, Just wondering: what is the most efficient way of executing a query that takes time (parent/child documents) and returns a large number of entries, and storing the result in evenly divided random blocks on HDFS? E.g. the query will return 100 million records and I want each random 1 million stored in a different location (file/folder) on HDFS. I assume I could execute the query with scroll, and then whenever I receive 1 million records back, spawn another thread to commit them to HDFS? Is there a way to run the query in a distributed way and have 100 threads query ES at the same time, each getting a random 1 million back (without duplicates)? Will es-hadoop help in this case? Appreciate your input! Chen
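Chen's scroll idea looks roughly like this with the 1.x Java client. A sketch only: the index name, scroll timeout, and batch size are illustrative, and an existing Client instance is assumed:

    import org.elasticsearch.action.search.SearchResponse;
    import org.elasticsearch.action.search.SearchType;
    import org.elasticsearch.common.unit.TimeValue;
    import org.elasticsearch.index.query.QueryBuilders;
    import org.elasticsearch.search.SearchHit;

    // Open a scan-type scroll: no scoring or sorting, cheap for bulk export.
    SearchResponse scroll = client.prepareSearch("myindex")   // illustrative index name
            .setSearchType(SearchType.SCAN)
            .setScroll(new TimeValue(60000))                  // keep the context alive 60s
            .setQuery(QueryBuilders.matchAllQuery())
            .setSize(1000)                                    // hits per shard, per round trip
            .execute().actionGet();
    while (true) {
        // With SCAN, the initial search returns no hits; each scroll call fetches a batch.
        scroll = client.prepareSearchScroll(scroll.getScrollId())
                .setScroll(new TimeValue(60000))
                .execute().actionGet();
        if (scroll.getHits().getHits().length == 0) {
            break;  // scroll drained
        }
        for (SearchHit hit : scroll.getHits()) {
            // buffer hit.getSourceAsString(); every ~1M docs, hand the buffer
            // to a writer thread that appends it to the next HDFS file
        }
    }

A single scroll like this is sequential; parallelizing across shards is essentially what es-hadoop automates, so it is worth a look for the 100-thread variant.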
Re: Type Ahead feature for contact list
Thanks for the help. I am able to see the correct results now, but could you please suggest how to write the following query in Java?

curl -X POST localhost:9200/hotels/_suggest -d '
{
  "hotels" : {
    "text" : "m",
    "completion" : {
      "field" : "name_suggest"
    }
  }
}'
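With the 1.x Java API, that curl request translates roughly as follows (an existing Client is assumed; "hotels" is both the index and the suggestion name, as in the curl body above):

    import org.elasticsearch.action.suggest.SuggestResponse;
    import org.elasticsearch.search.suggest.completion.CompletionSuggestionBuilder;

    // Build the same completion suggestion as the curl body and execute it.
    SuggestResponse response = client.prepareSuggest("hotels")
            .addSuggestion(new CompletionSuggestionBuilder("hotels")
                    .field("name_suggest")
                    .text("m"))
            .execute().actionGet();
    // The options for the named suggestion hold the completions.
    System.out.println(response.getSuggest().getSuggestion("hotels"));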
Re: Storing auto generated _id under different name
I'm using elasticsearch as the database for a service. It would make things easier. For example, I could just return the _source field when other apps query my service. Related to that: on the javascript client side, I am inserting the _id field into the _source JSON object as id and using that as the model for two-way data binding. If the id field were in the source already, I wouldn't have to keep track of this.

On Tuesday, June 17, 2014 4:26:07 PM UTC-7, Adrien Grand wrote:

No, it isn't possible. Why would you like to have the id of the document included in _source?

On Tue, Jun 17, 2014 at 8:16 PM, Johny Lam john...@gmail.com wrote:

Is it possible to have the _id be auto-generated and store it so that it's in the _source field under a different name, like say id instead of _id?

-- Adrien Grand
Combine elasticsearch/logstash/kibana with hadoop
Hi, for a performance improvement I'm trying to combine elasticsearch/logstash/kibana with hadoop (cdh4). Unfortunately, I'm familiar only with HDFS, where I store logs. In my opinion, the combination of elasticsearch and hadoop should use HDFS as storage and transparently use hadoop map/reduce functionality for search. I read through the elasticsearch-hadoop documentation and unfortunately didn't understand how this combination could help me with Kibana log analysis. The documentation says "Elasticsearch real-time search and analytics natively integrated with Hadoop." But what should I configure: Hadoop with Elasticsearch, or Elasticsearch with Hadoop? As for the first one, I found only Java code snippets, nothing about the Hadoop configuration, so it seems that I should be familiar with Java programming. As for the second one, I found only the Hadoop HDFS Snapshot/Restore plugin, but I guess it was developed for index backup/restore - am I right? Anyway, are my expectations right? Or was elasticsearch-hadoop developed for developers only, and is it not suitable for elasticsearch/logstash/kibana + hadoop?
Re: Snapshot Restore in a cluster of two nodes
Hey, can you be more precise and create a fully fledged example (generating the repository, executing the snapshot on cluster one, executing the restore on cluster two, etc.) and include the concrete error message, in order to find out what 'the process breaks' means here? Also provide info about the elasticsearch and JVM versions. Thanks! Snapshots are always done per index (the primary shards) and not per node, so there must be something else going on. Is it possible that only one node has write access to the repository? --Alex

On Thu, Jun 19, 2014 at 3:36 PM, Daniel Bubenheim daniel.bubenh...@googlemail.com wrote:

Hello, we have a cluster of two nodes. Every index in this cluster consists of 2 shards and one replica. We want to make use of snapshot & restore to transfer data between two clusters. When we make our snapshot on node one, only the primary shard is included; the replica shard is missing. While restoring on the other cluster the process breaks because of the missing second shard. Do we have to make a snapshot on each node to include both primary shards so that we can restore the whole index, or am I missing something here? Thanks in advance, Daniel
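For reference, the "fully fledged example" Alex asks about looks roughly like this with the 1.x Java API. Repository name, snapshot name, and path are illustrative; note that the location must be a share every node can write to, which is exactly the failure mode Alex hints at:

    import org.elasticsearch.action.admin.cluster.snapshots.create.CreateSnapshotResponse;
    import org.elasticsearch.common.settings.ImmutableSettings;

    // Register a shared-fs repository. The location must be a path that BOTH
    // nodes can read and write (e.g. an NFS mount), otherwise shards get missed.
    client.admin().cluster().preparePutRepository("backup")
            .setType("fs")
            .setSettings(ImmutableSettings.settingsBuilder()
                    .put("location", "/mnt/shared/es-backup")  // illustrative path
                    .build())
            .execute().actionGet();

    // Snapshot covers the primary shards of every index, regardless of node.
    CreateSnapshotResponse snap = client.admin().cluster()
            .prepareCreateSnapshot("backup", "snap1")
            .setWaitForCompletion(true)
            .execute().actionGet();
    System.out.println(snap.getSnapshotInfo().state());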
Re: How does shingle filter work on match_phrase in query phase?
Hello, Let's say you have an indexed text "t1 t2 t3" with shingles. The token positions are also indexed, so you get: t1 (at pos 1), "t1 t2" (pos 1), t2 (pos 2), "t2 t3" (pos 2) and t3 (pos 3). So if you are searching with a match_phrase for "t1 t2 t3" (even if it is not tokenized as shingles), it will match the document, because t1, t2 and t3 are considered next to each other (based on their recorded positions) for this document. Cédric Hourcade c...@wal.fr

On Fri, Jun 20, 2014 at 7:04 AM, 陳智清 walker0...@gmail.com wrote:

How does the shingle filter work with match_phrase in the query phase? After analyzing the phrase "t1 t2 t3", the shingle filter produced five tokens: t1, t2, t3, "t1 t2", "t2 t3". Will match_phrase still give "t1 t2 t3" a match? How does it work? Thank you.
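The recorded positions Cédric describes can be inspected directly with the analyze API; a sketch against the 1.x Java client (an existing Client is assumed, using the stock standard tokenizer and the built-in shingle filter, which emits unigrams plus bigrams by default):

    import org.elasticsearch.action.admin.indices.analyze.AnalyzeResponse;

    // Run "t1 t2 t3" through standard tokenizer + shingle filter and
    // print each emitted token together with its recorded position.
    AnalyzeResponse tokens = client.admin().indices()
            .prepareAnalyze("t1 t2 t3")
            .setTokenizer("standard")
            .setTokenFilters("shingle")
            .execute().actionGet();
    for (AnalyzeResponse.AnalyzeToken t : tokens.getTokens()) {
        System.out.println(t.getTerm() + " @ pos " + t.getPosition());
    }

The output should show "t1" and "t1 t2" sharing position 1, "t2" and "t2 t3" sharing position 2, and "t3" at position 3 - which is why the phrase still matches.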
Re: Very frequent ES OOM's potential segment merge problems
java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot complete merge
    at org.apache.lucene.index.IndexWriter.commitMerge(IndexWriter.java:3546)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4272)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3728)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
    at org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:106)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)

On Thursday, June 19, 2014 10:35:28 PM UTC+2, Paul Sabou wrote: [...]
Re: 100% CPU on 1 Node with JMeter Tests
Hello, It wouldn't surprise me if both Black Mamba and Slapstick were hitting 100%; they have more shards and have to handle more requests than the other nodes. But in your case it's only one node. First, are your http requests evenly spread over the 4 nodes? You could also check that all your shards are about the same size. To check if it's a hardware problem I would:
- disable shard rebalancing
- stop the cluster
- swap the whole data directories between Black Mamba and Slapstick
- start the cluster and rerun the benchmark
You'll then see if the problem comes from the 3 shards or from the server itself. Cédric Hourcade c...@wal.fr

On Thu, Jun 19, 2014 at 7:40 PM, sai...@roblox.com wrote:

Bump

On Wednesday, June 18, 2014 6:20:58 PM UTC-7, sai...@roblox.com wrote:

One out of 4 nodes always spikes to 100% CPU when we do load tests using JMeter (50 threads, 50 loops) with any query (match_all, filtered query, etc.). That particular node has 3 shards, 2 of them primaries. The other nodes have less than 40% CPU on them at the same time. The heap is set at 30GB on all of them. This is the gist for hot threads https://gist.github.com/RobloxSai/9f040bbd5ab7b58f2b1d when the test was running. Is there anything else that can be done to improve the performance? The query response times jump to 5-8 seconds when the CPU is hammered. [image: 4 Nodes Setup] https://lh3.googleusercontent.com/-EDnXAEg34cA/U6I5fb2zNOI/AB4/DqybJhq3Yhc/s1600/4+Nodes+Setup.png I had previously posted the specs of the servers on another thread https://groups.google.com/forum/#!topic/elasticsearch/P1o_4bVvECA. Here are the server specs:

*Machine Specs:*
Processor: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
Number of CPU cores: 24
Number of physical CPUs: 2
Installed RAM: [~256 GB total] 128 GB, 128 GB, 16 MB
Drive: Two 278GB SAS drives configured in RAID 0

*OS:*
Arch: 64bit (x86_64)
OS type: Linux
Kernel: 2.6.32-431.5.1.el6.x86_64
OS version: Red Hat Enterprise Linux Server release 6.5 (Santiago)
Java version: Java 1.7.0_51 (Java 7u51 x64 version for Linux).
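For the first step of Cédric's checklist, allocation (and with it rebalancing) can be switched off cluster-wide; a sketch with the 1.x Java client (an existing Client is assumed; a persistent setting is used because transient settings do not survive the full cluster restart his procedure requires):

    import org.elasticsearch.common.settings.ImmutableSettings;

    // Persistently disable shard allocation before stopping the cluster;
    // set the value back to "all" after the benchmark run.
    client.admin().cluster().prepareUpdateSettings()
            .setPersistentSettings(ImmutableSettings.settingsBuilder()
                    .put("cluster.routing.allocation.enable", "none")
                    .build())
            .execute().actionGet();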
Re: Count request does not support [filter]. Why?
Sorry, I wasn't clear enough. I mean the Java client's CountRequest.source() argument content, { "filter": ... } in particular.
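Following Alex's earlier suggestion, the filter can be wrapped in a filtered query instead of being passed at the top level; a sketch with the 1.x Java client (index, field, and value are illustrative):

    import org.elasticsearch.action.count.CountResponse;
    import org.elasticsearch.index.query.FilterBuilders;
    import org.elasticsearch.index.query.QueryBuilders;

    // The count API accepts a query, so express the filter as a filtered query.
    CountResponse resp = client.prepareCount("myindex")
            .setQuery(QueryBuilders.filteredQuery(
                    QueryBuilders.matchAllQuery(),
                    FilterBuilders.termFilter("status", "active")))
            .execute().actionGet();
    long count = resp.getCount();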
Re: problem indexing with my analyzer
Does it mean you're applying the reuters analyzer to your base64-encoded pictures? I guess it generates a really huge number of tokens for each entry because of your nGram filter (with a max at 250). Cédric Hourcade c...@wal.fr

On Fri, Jun 20, 2014 at 9:09 AM, Tanguy Bernard bernardtanguy1...@gmail.com wrote: [...]
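Rough arithmetic illustrates Cédric's point (the image size here is only an assumed example): with min_gram 3 and max_gram 250, a field of N characters yields about the sum over g from 3 to 250 of (N - g + 1), i.e. roughly 248 * N grams. A single base64-encoded image of around 100 KB (N ~ 100,000) would therefore produce on the order of 25 million tokens, many of them up to 250 characters long, which would go a long way toward explaining the monitor.jvm warnings and crashes.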
Elasticsearch queries always return all the data stored in the index
hello, https://stackoverflow.com/questions/24323480/elasticsearch-queries-always-return-all-the-datas-stored-in-the-index# I'm trying to index and query an index stored in ES 1.2. I both create and populate the index with the Java API, using the TransportClient. I have the following mapping:

get /tp/carte/_mapping
{
  "tp": {
    "mappings": {
      "carte": {
        "properties": {
          "adherents": {
            "properties": {
              "birthday": { "type": "date", "format": "dateOptionalTime" },
              "firstname": { "type": "string" },
              "lastname": { "type": "string" }
            }
          },
          "dateEdition": { "type": "date", "format": "dateOptionalTime" }
        }
      }
    }
  }
}

When I search an object by its ID, it works fine, but when I try to query the content of one of my nested objects, *ES always returns all the objects stored in the index*. I also tried to create the objects manually with Sense and I see the same behaviour. Example of my insert:

put /tp/carte/20454795
{
  "dateEdition": "2014-06-01T22:00:00.000Z",
  "adherents": [
    { "birthday": "1958-05-05T23:00:00.000Z", "firstname": "ANDREW", "lastname": "DOE" },
    { "birthday": "1964-03-01T23:00:00.000Z", "firstname": "ROBERT", "lastname": "DOE" },
    { "birthday": "1989-02-27T23:00:00.000Z", "firstname": "DAVID", "lastname": "DOE" },
    { "birthday": "1990-12-11T23:00:00.000Z", "firstname": "JOHN", "lastname": "DOE" }
  ]
}

Finally, you can find below a query executed in Sense:

get /tp/carte/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "adherents.lastname": { "query": "DOE" } } }
      ]
    }
  }
}

How can I fix that? Thanks. Regards, Alexandre
How to set the query resultset size to infinite
Hi all, I just joined the mailing list, so sorry if this topic has been discussed before. I would like to set the query size to infinite (or no limit). http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-from-size.html This page explains what the parameters do, but there are no details on how to set the size to no limit, or (if that is not possible) what the max value accepted by ES for this parameter is. I tried setting the value to -1, as I've read somewhere that this would be recognized as no limit, but instead it defaults to 10. Any help? Thanks, Nuno
Re: Elasticsearch queries always return all the data stored in the index
Hey Alexandre, This is correct. You are searching for a carte which contains an adherent. Elasticsearch gives you a carte object as an answer. And elasticsearch gives you back exactly what you have indexed. That being said, I think you could look at the parent/child feature for that use case. Or you could have one carte object per adherent? Makes sense? -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet | @elasticsearchfr

On 20 June 2014 at 11:06:40, Alexandre Touret (alexan...@touret.info) wrote: [...]
Re: Elasticsearch queries always return all the data stored in the index
Hello, thanks for your response. When I add another carte:

put /tp/carte/20450813
{
  "dateEdition": "2014-06-01T22:00:00.000Z",
  "adherents": [
    { "birthday": "1963-03-22T23:00:00.000Z", "firstname": "FLORENCE", "lastname": "SMITH" },
    { "birthday": "2001-10-12T22:00:00.000Z", "firstname": "M ANGELO", "lastname": "SMITH" },
    { "birthday": "2003-07-30T22:00:00.000Z", "firstname": "M LILI", "lastname": "SMITH" }
  ]
}

and I run the query described above, I get both of the two 'carte' documents. Is that normal? Do you have an example or a link illustrating the parent/child feature? Thanks

On Friday, 20 June 2014 11:12:04 UTC+2, David Pilato wrote: [...]
Re: ElasticSearch queries always return all the datas stored in the index
Searching for DOE gives you that answer? If so, it's not normal IMHO. You should try to reproduce it with a full Sense script recreation so we can replay it and help you from here. See http://www.elasticsearch.org/help/ for information. About parent/child, you can read this: http://www.elasticsearch.org/blog/managing-relations-inside-elasticsearch/ -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet | @elasticsearchfr
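For readers who want to try the parent/child route David points to, here is a minimal, untested Sense sketch. It reuses the index and field names from this thread, but the exact mapping and the has_child query are an illustration of the general ES 1.x feature, not something taken from the original posts:

    PUT /tp
    { "mappings": {
        "carte": { "properties": { "dateEdition": { "type": "date", "format": "dateOptionalTime" } } },
        "adherent": {
          "_parent": { "type": "carte" },
          "properties": {
            "birthday": { "type": "date", "format": "dateOptionalTime" },
            "firstname": { "type": "string" },
            "lastname": { "type": "string" }
          }
        }
    } }

    PUT /tp/adherent/1?parent=20454795
    { "birthday": "1958-05-05T23:00:00.000Z", "firstname": "ANDREW", "lastname": "DOE" }

    POST /tp/carte/_search
    { "query": { "has_child": { "type": "adherent", "query": { "match": { "lastname": "DOE" } } } } }

With this layout each adherent is its own document, and the has_child query returns only the carte documents that actually have a matching child.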
Re: How to set the query resultset size to infinite
You don't want to do that! If your need is to extract (download) 1,000,000,000 records, you need to use the scan & scroll API: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/scan-scroll.html#scan-scroll -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet | @elasticsearchfr

Le 20 juin 2014 à 11:08:00, Nuno Carvalho (nuno...@gmail.com) a écrit :

Hi all, I just joined the mailing list, so sorry if this topic has been discussed before. I would like to set the query size to infinite (or no limit). http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-from-size.html This page explains what the parameters do, but there are no details on how to set the size to no limit or, if that is not possible, what the maximum value accepted by ES for this parameter is. I tried setting the value to -1, as I've read somewhere that this would be recognized as no limit, but instead it defaults to 10. Any help? Thanks, Nuno
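As a rough sketch of the scan & scroll flow David links to (ES 1.x REST API; the index name and batch size here are invented for illustration):

    POST /myindex/_search?search_type=scan&scroll=1m
    { "query": { "match_all": {} }, "size": 100 }

The response contains a _scroll_id and no hits yet. You then call the scroll endpoint repeatedly, each time passing the _scroll_id from the previous response as the request body:

    POST /_search/scroll?scroll=1m
    <the _scroll_id from the previous response>

Each call returns up to size times number-of-shards documents; stop when a response comes back with an empty hits list.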
Re: problem indexing with my analyzer
Yes, I am applying reuters to my documents (composed of text and pictures). My goal is to be able to search the text of a document for any word or any part of a word. Yes, the problem is my nGram filter. How do I solve it? Decrease the nGram max? Switch to another analyzer that still satisfies my goal?

Le vendredi 20 juin 2014 10:58:49 UTC+2, Cédric Hourcade a écrit : Does it mean you are applying the reuters analyzer to your base64-encoded pictures? I guess it generates a really huge number of tokens for each entry because of your nGram filter (with a max at 250). Cédric Hourcade c...@wal.fr

On Fri, Jun 20, 2014 at 9:09 AM, Tanguy Bernard bernardt...@gmail.com wrote: Information: my note_source contains pictures (.jpg, .png ...) in base64, as well as text. For my mapping I have used: type => "string", analyzer => "reuters" (the name of my analyzer). Any idea?

Le jeudi 19 juin 2014 17:57:46 UTC+2, Tanguy Bernard a écrit : Hello, I have an issue when I index one particular field, note_source (a SQL longtext). I use the same analyzer for every field (except date_source and id_source), but with note_source I get a monitor.jvm warning. When I remove note_source, everything is fine. If I don't use my analyzer on note_source, everything is fine, but if I do use it on note_source I get crashes. I think I have enough memory; I have set ES_HEAP_SIZE. Maybe my problem is with accents (ASCII, UTF-8)? Can you help me with this?

My Setting:

    public function createSetting($pf) {
        $params = array('index' => $pf, 'body' => array(
            'settings' => array(
                'number_of_shards' => 5,
                'number_of_replicas' => 0,
                'analysis' => array(
                    'filter' => array(
                        'nGram' => array(
                            'token_chars' => array(),
                            'type' => 'nGram',
                            'min_gram' => 3,
                            'max_gram' => 250
                        )
                    ),
                    'analyzer' => array(
                        'reuters' => array(
                            'type' => 'custom',
                            'tokenizer' => 'standard',
                            'filter' => array('lowercase', 'asciifolding', 'nGram')
                        )
                    )
                )
            )
        ));
        $this->elasticsearchClient->indices()->create($params);
        return;
    }

My Indexing:

    public function indexTable($pf, $typeElement) {
        $params = array(
            'index' => '_river',
            'type' => $typeElement,
            'id' => '_meta',
            'body' => array(
                'type' => 'jdbc',
                'jdbc' => array(
                    'url' => 'jdbc:mysql://ip/name',
                    'user' => 'root',
                    'password' => 'mdp',
                    'index' => $pf,
                    'type' => $typeElement,
                    'sql' => 'select id_source as _id, id_sous_theme, titre_source, desc_source, note_source, adresse_source, type_source, date_source from source',
                    'max_bulk_requests' => 5,
                )
            )
        );
        $this->elasticsearchClient->index($params);
    }

Thanks in advance.
Re: ElasticSearch queries always return all the datas stored in the index
Yes, my request for DOE always returns that answer.
Re: ElasticSearch queries always return all the datas stored in the index
It looks like you are doing a GET rather than a POST; if so, your query body is ignored. Cédric Hourcade c...@wal.fr
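If that is indeed what is happening, the client-side fix is simply to send the body with POST. A minimal sketch, reusing the query from this thread:

    POST /tp/carte/_search
    { "query": { "bool": { "must": [ { "match": { "adherents.lastname": { "query": "DOE" } } } ] } } }

When the body is actually transmitted, only the carte documents containing a DOE adherent come back; a search request whose body is dropped falls back to match_all, which is exactly the "returns everything" symptom described above.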
Re: ElasticSearch queries always return all the datas stored in the index
That's right! Thanks for your help :) Regards
Re: ElasticSearch queries always return all the datas stored in the index
No. GET works for running searches. It could be an issue if you are using an old Sense version and not Marvel. -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet | @elasticsearchfr
Re: problem indexing with my analyzer
I set max_gram=20. It's better, but at the end I still get this, many times:

[2014-06-20 11:42:14,201][WARN ][monitor.jvm ] [ik-test2] [gc][young][528][263] duration [2s], collections [1]/[2.1s], total [2s]/[43.9s], memory [536mb]->[580.2mb]/[1015.6mb], all_pools {[young] [22.5mb]->[22.3mb]/[66.5mb]}{[survivor] [14.9kb]->[49.3kb]/[8.3mb]}{[old] [513.4mb]->[557.8mb]/[940.8mb]}

I set ES_HEAP_SIZE to 2G. I think that's enough. Something wrong?
Re: ElasticSearch queries always return all the datas stored in the index
I just upgraded to ES 1.2.1 and the latest release of Marvel. I have the same behaviour.
Re: How to set the query resultset size to infinite
Right... that makes sense :) I'll give it a try, thank you! Nuno
Re: problem indexing with my analyzer
The users copy/paste the content of an HTML page and I index that information. I take the entire document, images included; I can't change this behaviour. I set max_gram=20. It's better, but I still get the monitor.jvm warnings above, many times, even with ES_HEAP_SIZE at 2G. Something wrong?

Le vendredi 20 juin 2014 11:45:22 UTC+2, Cédric Hourcade a écrit : If you are only searching in the text, you should index the images in another field, with no analyzer (index: not_analyzed), or even better index: no (not indexed at all). If you need to retrieve the image data, it is still in the _source. But to be honest, I wouldn't even store this kind of information in ES: your index is going to be bigger, merges are going to be slower... I'd keep the binary files stored elsewhere. Cédric Hourcade c...@wal.fr
Re: ElasticSearch queries always return all the datas stored in the index
Ah yes, sorry, you are right. I am using some old tools :) Cédric Hourcade c...@wal.fr
Re: How does shingle filter work on match_phrase in query phase?
Hello Hourcade, Thanks for your response. Does that mean different values should be set for index_analyzer and search_analyzer (e.g. index_analyzer: shingle and search_analyzer: standard)? What if I want to reuse the same shingle analyzer for both index and search: will the match_phrase "t1 t2 t3" still give me a match? I know that setting a different search_analyzer makes the match_phrase "t1 t2 t3" searchable, but if I do that, do I get any benefit from shingles? Instead I get a bigger index size. I assume shingles are there to make match_phrase searches faster. But after shingling, searching a phrase of 3 tokens "t1 t2 t3" becomes searching a phrase of 5 tokens, and I don't know how the shingle filter arranges the positions for a correct phrase query. So how can match_phrase be faster? Thank you.

On Friday, 20 June 2014 at 16:18:03 UTC+8, Cédric Hourcade wrote: Hello, Let's say you have indexed the text "t1 t2 t3" with shingles. The token positions are also indexed, so you get: t1 (at pos 1), "t1 t2" (pos 1), t2 (pos 2), "t2 t3" (pos 2) and t3 (pos 3). So if you search with a match_phrase for "t1 t2 t3" (even if it is not tokenized as shingles), it will match the document, because t1, t2 and t3 are considered next to each other (based on their recorded positions) in this document. Cédric Hourcade c...@wal.fr

On Fri, Jun 20, 2014 at 7:04 AM, 陳智清 walke...@gmail.com wrote: How does the shingle filter work with match_phrase in the query phase? After analyzing the phrase "t1 t2 t3", the shingle filter produced five tokens: t1, t2, t3, "t1 t2" and "t2 t3". Will match_phrase still give "t1 t2 t3" a match? How does that work? Thank you.
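A quick way to see the positions Cédric describes is the analyze API. A sketch against ES 1.x, using the built-in shingle filter (the exact response format is abbreviated here):

    GET /_analyze?tokenizer=whitespace&filters=shingle&text=t1 t2 t3

The response lists t1 at position 1, "t1 t2" at position 1, t2 at position 2, "t2 t3" at position 2, and t3 at position 3, matching the layout described above.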
Re: How do people typically handle shard failures in their results?
If it fails on the primary shard, then a failure is returned. If it worked, and a replica failed, then that replica is deemed a failed replica and will get allocated somewhere else in the cluster. Maybe an example of where a failure on "all" shards would help here? On Jun 18, 2014, at 11:45, mooky nick.minute...@gmail.com wrote: If I understand correctly, we can get an OK response from elastic (i.e. no error), but if there are shard failures in the response, it potentially means that results are incomplete/incorrect. From my observation, we can get failures on all shards - and elastic still returns OK (which was a bit surprising to me). What kinds of approaches do people typically use to deal with shard failures? For my application, if there are shard failures, essentially my results are inaccurate/incorrect - so I need to return an error to the client. Returning bad results is worse than returning an error. I am inclined to turn any shard failure into an exception. Is this quite common? Does it make sense to add a feature to the elastic API? (i.e. request.setTreatShardFailuresAsErrors(true))
Re: How do people typically handle shard failures in their results?
On Fri, Jun 20, 2014 at 7:08 AM, Shay Banon kim...@gmail.com wrote: If it fails on the primary shard, then a failure is returned. If it worked, and a replica failed, then that replica is deemed a failed replica, and will get allocated somewhere else in the cluster. Maybe an example of where a failure on "all" shards would help here?

I think it's more about searches: they can fail on one shard but not another for all sorts of reasons. Queue full, an unfortunate script, a bug, or only one shard had results and the query asked for something weird, like using the postings highlighter when postings aren't stored. Lots of reasons. I log the event and move on. I toyed with outputting a warning to the user but didn't have time to implement it. We're pretty diligent with our logs, so we'd notice the log entry and run it down. If the failure is caused by the queue being full on only one node, we'd likely notice that real quick, as Ganglia would lose it. This happened to me recently when we put a node without an SSD into a cluster with SSDs. It couldn't keep up and dropped a ton of searches. In our defense, we didn't know the rest of the cluster had SSDs, so we were doubly surprised. Nik
Re: How does shingle filter work on match_phrase in query phase?
Yes, you can use two different analyzers. In your case what you can do is:
- for indexing, you apply a shingle filter.
- for the query, you also apply a shingle filter, but this time you disable the unigrams (output_unigrams: false), so it will only generate the shingles, in your case "t1 t2" and "t2 t3".
It will match your document. Cédric Hourcade c...@wal.fr
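A minimal index-creation sketch of that setup (untested, ES 1.x syntax; the index, analyzer and field names are invented for illustration):

    PUT /myindex
    { "settings": { "analysis": {
        "filter": {
          "shingles_with_unigrams": { "type": "shingle", "output_unigrams": true },
          "shingles_only": { "type": "shingle", "output_unigrams": false }
        },
        "analyzer": {
          "shingle_index": { "type": "custom", "tokenizer": "standard", "filter": ["lowercase", "shingles_with_unigrams"] },
          "shingle_search": { "type": "custom", "tokenizer": "standard", "filter": ["lowercase", "shingles_only"] }
        }
      } },
      "mappings": { "doc": { "properties": {
        "body": { "type": "string", "index_analyzer": "shingle_index", "search_analyzer": "shingle_search" }
      } } } }

A match_phrase on body then tokenizes the query text into shingles only, while the index still holds both the unigrams and the shingles.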
Re: How do people typically handle shard failures in their results?
Ahh, I see. If it's related to searches, then yes, the search response includes details about the total shards the search was executed on, the successful shards, and the failed shards. They are important to check in order to understand whether one got partial results. In the REST API, if there is a total failure, it will return the worst status code out of all the shards in the response. In the Java API, the search response will be returned (with no exception), so the content of the search response has to be checked (which is good practice anyhow). It might make sense to raise an exception in the Java API if all shards failed; I am on the fence on this one, since a check needs to be performed on the result anyhow.
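For reference, the shard accounting Shay describes lives in the _shards section of every search response. A trimmed sketch of what to inspect (all values invented; the hits section is omitted):

    {
      "took": 12,
      "timed_out": false,
      "_shards": {
        "total": 10,
        "successful": 9,
        "failed": 1,
        "failures": [ { "index": "myindex", "shard": 3, "status": 503, "reason": "EsRejectedExecutionException[rejected execution ...]" } ]
      }
    }

A client that must never serve partial results can simply treat failed > 0 as an error, which is the approach mooky suggests above.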
Re: How does shingle filter work on match_phrase in query phase?
I got it! Thank you!
Re: problem indexing with my analyzer
If your base64 encodes are long, they are going to be split into a lot of tokens by the standard tokenizer. These tokens are often going to be a lot longer than standard words, so your nGram filter will generate even more tokens, far more than with standard text. That may well be your problem. You should really try to strip the encoded images from your documents with a simple regex before indexing them. If you need to keep the source, put the raw text in an unindexed field, and the cleaned one in another.
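A minimal mapping sketch of that two-field layout (untested; the index and field names are invented), using the ES 1.x index options mentioned earlier in the thread:

    PUT /myindex
    { "mappings": { "source": { "properties": {
        "note_source_raw": { "type": "string", "index": "no" },
        "note_source_clean": { "type": "string", "analyzer": "reuters" }
      } } } }

note_source_raw keeps the original pasted HTML, images included, without being indexed or analyzed; note_source_clean receives the text with the base64 blobs stripped out, and is the only field the nGram-based reuters analyzer ever sees.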
Re: ES v1.1 continuous young gc pauses old gc, stops the world when old gc happens and splits cluster
Mike - The above sounds like it happened because machines were sending too many indexing requests and merging was unable to keep pace. The usual suspects would be not enough CPU or disk bandwidth. This doesn't sound related to the memory constraints posted in the original issue of this thread. Do you see GC traces in the logs?

On Friday, June 20, 2014 9:40:48 AM UTC-4, Michael Hart wrote: We're seeing the same thing. ES 1.1.0, JDK 7u55 on Ubuntu 12.04, 5 data nodes, 3 separate masters, all are 15GB hosts with 7.5GB heaps, storage is SSD. The data set is ~1.6TB according to Marvel. Our daily indices are roughly 33GB in size, with 5 shards and 2 replicas. I'm still investigating what happened yesterday, but I do see in Marvel a large spike in the Indices Current Merges graph just before the node dies, and a corresponding increase in JVM heap. When heap hits 99%, everything grinds to a halt. Restarting the node fixes the issue, but this is the third or fourth time it's happened. I'm still researching how to deal with this, but a couple of things I am looking at are:
- increase the number of shards so that the segment merges stay smaller (is that even a legitimate sentence?). I'm still reading through the Index Modules Merge page http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-merge.html for more details.
- look at store-level throttling http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-store.html#store-throttling
I would love to get some feedback on my ramblings. If I find anything more I'll update this thread. cheers mike

On Thursday, June 19, 2014 4:05:54 PM UTC-4, Bruce Ritchie wrote: Java 8 with G1GC perhaps? It'll have more overhead but perhaps it'll be more consistent wrt pauses.

On Wednesday, June 18, 2014 2:02:24 PM UTC-4, Eric Brandes wrote: I'd just like to chime in with a "me too". Is the answer just more nodes? In my case this is happening every week or so.

On Monday, April 21, 2014 9:04:33 PM UTC-5, Brian Flad wrote: My dataset currently is 100GB across a few daily indices (~5-6GB and 15 shards each). Data nodes are 12 CPU, 12GB RAM (6GB heap).

On Mon, Apr 21, 2014 at 6:33 PM, Mark Walkom ma...@campaignmonitor.com wrote: How big are your data sets? How big are your nodes? Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com

On 22 April 2014 00:32, Brian Flad bfla...@gmail.com wrote: We're seeing the same behavior with 1.1.1, JDK 7u55, 3 master nodes (2 min master), and 5 data nodes. Interestingly, we see the repeated young GCs on only a node or two at a time. Cluster operations (such as recovering unassigned shards) grind to a halt. After restarting a GCing node, everything returns to normal operation in the cluster. Brian F

On Wed, Apr 16, 2014 at 8:00 PM, Mark Walkom ma...@campaignmonitor.com wrote: In both your instances, if you can, have 3 master-eligible nodes, as it will reduce the likelihood of a split cluster: you will always have a majority quorum. Also look at discovery.zen.minimum_master_nodes to go with that. However, you may just be reaching the limit of your nodes, which means the best option is to add another node (which also neatly solves your split brain!). Ankush, it would help if you can update Java; most people recommend u25, but we run u51 with no problems.
Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 17 April 2014 07:31, Dominiek ter Heide domin...@gmail.com wrote: We are seeing the same issue here. Our environment: - 2 nodes - 30GB Heap allocated to ES - ~140GB of data - 639 indices, 10 shards per index - ~48M documents After starting ES everything is good, but after a couple of hours we see the Heap build up towards 96% on one node and 80% on the other. We then see the GC take very long on the 96% node: TOuKgmlzaVaFVA][elasticsearch1.trend1.bottlenose.com][inet[/192.99.45.125: 9300]]]) [2014-04-16 12:04:27,845][INFO ][discovery] [elasticsearch2.trend1] trend1/I3EHG_XjSayz2OsHyZpeZA [2014-04-16 12:04:27,850][INFO ][http ] [ elasticsearch2.trend1] bound_address {inet[/0.0.0.0:9200]}, publish_address {inet[/192.99.45.126:9200]} [2014-04-16 12:04:27,851][INFO ][node ] [elasticsearch2.trend1] started [2014-04-16 12:04:32,669][INFO ][indices.store] [elasticsearch2.trend1] updating indices.store.throttle.max_bytes_per_sec from [20mb] to [1gb], note, type is [MERGE] [2014-04-16 12:04:32,669][INFO ][cluster.routing.allocation.decider] [elasticsearch2.trend1] updating [cluster.routing.allocation.node_initial_primaries_recoveries] from [4] to [50] [2014-04-16 12:04:32,670][INFO
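Picking up Mike's store-level throttling note from this thread: the merge throttle is a dynamic cluster setting in 1.x, so it can be adjusted without a restart. A minimal sketch of how those limits are set; the 20mb value is just the 1.x default shown for illustration, not a recommendation from this thread:

curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": {
    "indices.store.throttle.type": "merge",
    "indices.store.throttle.max_bytes_per_sec": "20mb"
  }
}'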
Re: Losing data after Elasticsearch restart
Hi Alexander, Here is the stack trace for the NullPointerException - [23:24:38,929][DEBUG][action.bulk ] [Rasputin, Mikhail] [17f85dcb67b64a13bfef2be74595087e][0], node[a-eZTR9XRiWq-o0QmsM2aA], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.bulk.BulkShardRequest@22b11bbf] java.lang.NullPointerException at org.elasticsearch.action.bulk.TransportBulkAction$2.onResponse(TransportBulkAction.java:247) at org.elasticsearch.action.bulk.TransportBulkAction$2.onResponse(TransportBulkAction.java:242) at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performReplicas(TransportShardReplicationOperationAction.java:607) at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:533) at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:430) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) [23:24:38,940][DEBUG][action.bulk ] [Rasputin, Mikhail] [17f85dcb67b64a13bfef2be74595087e][0], node[a-eZTR9XRiWq-o0QmsM2aA], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.bulk.BulkShardRequest@768475c4] java.lang.NullPointerException at org.elasticsearch.action.bulk.TransportBulkAction$2.onResponse(TransportBulkAction.java:247) at org.elasticsearch.action.bulk.TransportBulkAction$2.onResponse(TransportBulkAction.java:242) at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performReplicas(TransportShardReplicationOperationAction.java:607) at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:533) at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:430) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) Thanks, Rohit On Fri, Jun 20, 2014 at 12:02 AM, Alexander Reelsen a...@spinscale.de wrote: Hey, the exception you showed can possibly happen when you remove an alias. However, you mentioned a NullPointerException in your first post, which is not contained in the stacktrace, so it seems that one is still missing. Also, please retry with a newer version of Elasticsearch. --Alex On Thu, Jun 19, 2014 at 5:13 AM, Rohit Jaiswal rohit.jais...@gmail.com wrote: Hi Alexander, We sent you the stack trace. Can you please enlighten us on this? Thanks, Rohit On Mon, Jun 16, 2014 at 10:25 AM, Rohit Jaiswal rohit.jais...@gmail.com wrote: Hi Alexander, Thanks for your reply. We plan to upgrade in the long run, however we need to fix the data loss problem on 0.90.2 in the immediate term. 
Here is the stack trace - 10:09:37.783 PM [22:09:37,783][WARN ][indices.cluster ] [Storm] [b7a76aa06cfd4048987d1117f3e0433a][0] failed to start shard org.elasticsearch.indices.recovery.RecoveryFailedException: [b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery failed from [Jeffrey Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]] into [Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]] at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:293) at org.elasticsearch.indices.recovery.RecoveryTarget.access$300(RecoveryTarget.java:62) at org.elasticsearch.indices.recovery.RecoveryTarget$2.run(RecoveryTarget.java:163) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) Caused by: org.elasticsearch.transport.RemoteTransportException: [Jeffrey Mace][inet[/10.4.35.200:9300 ]][index/shard/recovery/startRecovery] Caused by: org.elasticsearch.index.engine.RecoveryEngineException: [b7a76aa06cfd4048987d1117f3e0433a][0] Phase[2] Execution failed at org.elasticsearch.index.engine.robin.RobinEngine.recover(RobinEngine.java:1147) at org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:526) at org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:116) at org.elasticsearch.indices.recovery.RecoverySource.access$1600(RecoverySource.java:60) at
Re: problem indexing with my analyzer
Thank you, Cédric Hourcade! On Friday, June 20, 2014 15:32:29 UTC+2, Cédric Hourcade wrote: If your base64 encodes are long, they are going to be split into a lot of tokens by the standard tokenizer. These tokens are often going to be a lot longer than standard words, so your nGram filter will generate even more tokens, far more than with standard text. That may be your problem there. You should really try to strip the encoded images from your documents with a simple regex before indexing them. If you need to keep the source, put the raw text in an unindexed field, and the cleaned one in another. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/b62f4e12-1b54-4621-986a-93411404f7af%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
guarding from double-start
There were a couple of times during my development workflow when I started the ES script a second time. It results in a red status (I use Elastic HQ) and a non-working cluster, so I'm forced to regenerate all indexes (with all test data) again, which takes noticeable time. At the moment I use this script under Linux to start ES:

ES_MAX_MEM=512M
export ES_MAX_MEM
cd /ES-dir/bin
./elasticsearch.in.sh
./elasticsearch -f

Can you, please, suggest a trick to avoid falling into red? -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/a79fba10-3fad-4c76-bc19-d744c2f79ef2%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: searching on nested docs - getting back the nested docs as a response
I am not sure highlight will work, as I suspect it will encounter the same obstacle, see: https://github.com/elasticsearch/elasticsearch/issues/5245 As for suggestion #2, this will break our current schema and will require a significant model change (we store the data in MongoDB as well) - so I am not sure we are not better off waiting until #3022 is solved? In the meantime, any workaround will be appreciated... can we do some in-memory searching again? (using native Lucene somehow?...) On Friday, June 20, 2014 1:13:42 AM UTC+3, Itamar Syn-Hershko wrote: It is very hard to give you concrete advice without knowing more about your domain and use cases, but here are 2 points that came to mind: 1. You can make use of the highlighting features to show the content that matched. Highlighters can return whole blocks of text, and by using positionIncrements correctly you can get this right. 2. Yes, Elasticsearch is a document-oriented storage, but is it really necessary for you to index entire books as one document? I'd most certainly look at indexing sections or chapters, maybe even pages, as single documents and use string references to the book ID. Unless you use data from the book level along with full-text searches on the texts - and even then, in some scenarios I would consider denormalization. -- Itamar Syn-Hershko http://code972.com | @synhershko https://twitter.com/synhershko Freelance Developer Consultant Author of RavenDB in Action http://manning.com/synhershko/ On Thu, Jun 19, 2014 at 10:13 PM, liorg lior...@gmail.com wrote: Well, assuming we have a book type. The book holds a lot of metadata, let's say something like the following:

{
  "author": { "name": "Jose", "lastName": "Martin" },
  "sections": [{
    "chapters": [{
      "pages": [
        { "pageNum": 1, "numOfChars": 1000, "text": "let my people...", "numofWords": 125 },
        { "pageNum": 2, "numOfChars": 1005, "text": "let my people go...", "numofWords": 150 }
      ],
      "chapterName": "the start"
    }, {
      "pages": [
        { "pageNum": 3, "numOfChars": 1000, "text": "will do...", "numofWords": 125 },
        { "pageNum": 4, "numOfChars": 1005, "text": "will do later on...", "numofWords": 150 }
      ],
      "chapterName": "the end"
    }],
    "sectionName": "prologue"
  }]
}

We want to search for all the pages that have "let my people" in their text and more than 100 words. So, when we use ES we can use nested objects and query on the nested page object - but the actual returned values are the books (parents) that have those matching pages. Now, if we want to show the user the pages he was looking for - we cannot do that, as we get the whole book type returned with all its metadata and not just the nested objects that matched the criteria... We need to search again (maybe in memory?) for the pages that matched the criteria in order to display the user's search results... (the whole type is returned, as ES does not yet support returning the nested objects that matched the criteria). I hope it is better understood now. On Thursday, June 19, 2014 7:22:13 PM UTC+3, Itamar Syn-Hershko wrote: This is usually something that's being solved using parent-child, but the question here really is what you mean by needing to retrieve both books and pages. Can you describe the actual scenario and what you are trying to achieve? 
-- Itamar Syn-Hershko http://code972.com | @synhershko https://twitter.com/synhershko Freelance Developer Consultant Author of RavenDB in Action http://manning.com/synhershko/ On Thu, Jun 19, 2014 at 7:12 PM, liorg lior...@gmail.com wrote: Hi, we have a somewhat complex type holding some nested docs with arrays (let's assume a hierarchy of books, where for each book we have an array of pages containing its metadata). We want to search on the nested doc - search for all the books that have the term XYZ in one of their pages - but we want to get back not only the book, but the pages themselves. We've understood that this is problematic to achieve with ES (see https://github.com/elasticsearch/elasticsearch/issues/3022). We have a problem achieving it with the parent-child model, as the data model comes from our already existing MongoDB model (and besides, we're not sure a parent-child model fits here). So... 1. Is there any workaround we can use to get the results of the nested doc? (the actual pages?) 2. If not, is there a recommended way we can search the data again in memory after it was narrowed down by the ES server?... 3. Any advice will be appreciated, as this is quite a big obstacle in our way to implementing a solution using ES. thanks, Lior -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearc...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/ msgid/elasticsearch/7602d608-5730-472e-8259-763ff29614ea%
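For what it's worth, Itamar's suggestion #2 can be sketched concretely: index each page as its own document with a string reference to the book, and the page-level query becomes trivial. Index, type, and field names below are hypothetical, not from the thread:

curl -XPUT 'http://localhost:9200/library/page/p1' -d '{
  "bookId": "book-42",
  "pageNum": 1,
  "numofWords": 125,
  "text": "let my people..."
}'

curl -XGET 'http://localhost:9200/library/page/_search?pretty' -d '{
  "query": {
    "filtered": {
      "query": { "match": { "text": "let my people" } },
      "filter": { "range": { "numofWords": { "gt": 100 } } }
    }
  }
}'

Each hit is now a page, so there is nothing to re-search in memory; fetching the owning book is a cheap second lookup by bookId.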
Kibana Terms panel showing date fields as longs?
Hello :) I have some log data indexed in ES that I am trying to visualize in Kibana, and I'm getting strange behavior related to dates. I have a Terms panel with the following settings:

Terms mode: terms
Field: date
Length: 10
Order: count

For some reason, the date column in the panel is showing up as a long, not a date:

COUNT BY DATE
Term       Count
146629440  96597
146638080  60063
146620800  59480

The Table panel showing all my log entries knows that field is a date, and it displays as a date correctly. If I curl the request to ES, it appears ES is returning it as a long, not a date:

curl -XGET 'http://localhost:9200/test/_search?pretty' -d '{
  "facets": {
    "terms": {
      "terms": { "field": "date", "size": 10, "order": "count", "exclude": [] },
      "facet_filter": {
        "fquery": {
          "query": {
            "filtered": {
              "query": { "bool": { "should": [ { "query_string": { "query": "_type:test_type" } } ] } },
              "filter": { "bool": { "must": [ { "range": { "date": { "from": 1465925902106, "to": 1466769177326 } } } ] } }
            }
          }
        }
      }
    }
  },
  "size": 0
}'

returns:

{
  "took": 387,
  "timed_out": false,
  "_shards": { "total": 10, "successful": 10, "failed": 0 },
  "hits": { "total": 48173413, "max_score": 0.0, "hits": [] },
  "facets": {
    "terms": {
      "_type": "terms",
      "missing": 0,
      "total": 365090,
      "other": 0,
      "terms": [
        { "term": 146629440, "count": 96697 },
        { "term": 146638080, "count": 60343 },
        { "term": 146620800, "count": 59579 },
        { "term": 146612160, "count": 51592 },
        { "term": 146603520, "count": 48859 },
        { "term": 146594880, "count": 48020 }
      ]
    }
  }
}

Is there something I can do to have Kibana recognize the term is a date and display it as 2014-06-17 like the Table panel does? Thanks so much! -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAND3DpjWrZD8xiKCEDzXmcvydoQzztN-4q1%2BVr3rhaH4H0HEUQ%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
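One workaround, hedged since it depends on the Kibana 3 panel rather than on ES: a terms facet on a date field will always return the raw long values, but a date_histogram facet buckets by time, and Kibana's histogram panel renders those buckets as dates. The equivalent raw request would look roughly like this:

curl -XGET 'http://localhost:9200/test/_search?pretty' -d '{
  "size": 0,
  "facets": {
    "by_date": {
      "date_histogram": { "field": "date", "interval": "day" }
    }
  }
}'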
Return the number of matched terms for a given result.
Hi, Is it possible to get Elasticsearch to return the number of terms matched per result in a query? I know these are evaluated, as they make up the score, but there doesn't seem to be a way to get a simple count. For example, with :query => {:in => {:user_ids => [user_ids...], :minimum_should_match => 1}} I would like to know how many user_ids were matched. Thanks, Dan -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/b5cd6753-f166-4ae5-8c61-844650efa859%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
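One approach worth checking against your ES version: named queries. If each term clause carries a _name, every hit comes back with a matched_queries array, and its length is the number of user_ids that matched. A sketch with a hypothetical index name and IDs:

curl -XGET 'http://localhost:9200/myindex/_search?pretty' -d '{
  "query": {
    "bool": {
      "should": [
        { "term": { "user_ids": { "value": 101, "_name": "uid_101" } } },
        { "term": { "user_ids": { "value": 102, "_name": "uid_102" } } }
      ],
      "minimum_should_match": 1
    }
  }
}'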
Re: guarding from double-start
Use start-stop-daemon or adapt /etc/init.d/elasticsearch to set up a pidfile guarding the ES instance. Or just run it this way: pgrep -f elasticsearch || ./start_es.sh On Friday, June 20, 2014 3:21:08 PM UTC+1, Andrew Gaydenko wrote: There were a couple of times during my development workflow when I started the ES script a second time. It results in a red status (I use Elastic HQ) and a non-working cluster, so I'm forced to regenerate all indexes (with all test data) again, which takes noticeable time. At the moment I use this script under Linux to start ES:

ES_MAX_MEM=512M
export ES_MAX_MEM
cd /ES-dir/bin
./elasticsearch.in.sh
./elasticsearch -f

Can you, please, suggest a trick to avoid falling into red? -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/d78daaaf-305b-45b4-ad9a-e34cf1adbb22%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: guarding from double-start
You can either use the startup scripts that come with the package when you install via apt/yum [1] or use the service wrapper [2]. [1] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-repositories.html [2] https://github.com/elasticsearch/elasticsearch-servicewrapper -- Ivan On Fri, Jun 20, 2014 at 7:49 AM, Maciej Dziardziel fied...@gmail.com wrote: use start-stop-daemon or adapt /etc/init.d/elasticsearch to set up pidfile guarding es instance. Or just run this way: pgrep -f elasticsearch || ./start_es.sh On Friday, June 20, 2014 3:21:08 PM UTC+1, Andrew Gaydenko wrote: There were a couple of times during development workflow I have started ES script the second time. It results in red status (I use Elastic HQ) and not-working. So I'm forced to regenerate all indexes (with all test data) again. It takes noticeable time. At the moment I use this script ES_MAX_MEM=512M export ES_MAX_MEM cd /ES-dir/bin ./elasticsearch.in.sh ./elasticsearch -f under Linux to start ES. Can you. please, suggest a trick to avoid falling in red? -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/d78daaaf-305b-45b4-ad9a-e34cf1adbb22%40googlegroups.com https://groups.google.com/d/msgid/elasticsearch/d78daaaf-305b-45b4-ad9a-e34cf1adbb22%40googlegroups.com?utm_medium=emailutm_source=footer . For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQDQMVO4sf-%3Dgq_cnQRX6cTP1RG7_HquR_tAoVa6A_VoFg%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
Re: guarding from double-start
On Friday, June 20, 2014 6:49:04 PM UTC+4, Maciej Dziardziel wrote: Use start-stop-daemon or adapt /etc/init.d/elasticsearch to set up a pidfile guarding the ES instance. Or just run it this way: pgrep -f elasticsearch || ./start_es.sh Aha, thanks! - in my case pgrep is the most appropriate. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/115162b2-d679-48f0-a06e-24c47f74d079%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
[ANN] Elasticsearch Thrift transport plugin 2.2.0 released
Heya, We are pleased to announce the release of the Elasticsearch Thrift transport plugin, version 2.2.0. The Thrift transport plugin allows you to use the REST interface over Thrift. https://github.com/elasticsearch/elasticsearch-transport-thrift/ Release Notes - elasticsearch-transport-thrift - Version 2.2.0 Update: * [28] - Update to elasticsearch 1.2.0 (https://github.com/elasticsearch/elasticsearch-transport-thrift/issues/28) Doc: * [25] - Add documentation on missing settings (https://github.com/elasticsearch/elasticsearch-transport-thrift/issues/25) Issues, pull requests, and feature requests are warmly welcome on the elasticsearch-transport-thrift project repository: https://github.com/elasticsearch/elasticsearch-transport-thrift/ For questions or comments around this plugin, feel free to use the elasticsearch mailing list: https://groups.google.com/forum/#!forum/elasticsearch Enjoy, -The Elasticsearch team -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/53a462bc.814db40a.27a2.5605SMTPIN_ADDED_MISSING%40gmr-mx.google.com. For more options, visit https://groups.google.com/d/optout.
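For reference, installing the plugin on a 1.2.x node should be the usual plugin incantation (worth double-checking against the project README):

bin/plugin --install elasticsearch/elasticsearch-transport-thrift/2.2.0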
Re: Splunk vs. Elastic search performance?
Thomas, Thanks for your insights and experiences. As I am someone who has explored and used ES for over a year but is relatively new to the ELK stack, your data points are extremely valuable. Let me offer some of my own views. Re: double the storage. I strongly recommend ELK users to disable the _all field. The entire text of the log events generated by logstash ends up in the message field (and not @message as many people incorrectly post). So the _all field is just redundant overhead with no value add. The result is a dramatic drop in database file sizes and dramatic increase in load performance. Of course, you need to configure ES to use the message field as the default for a Lucene Kibana query. During the year that I've used ES and watched this group, I have been on the front line of a brand new product with a smart and dedicated development team working steadily to improve the product. Six months ago, the ELK stack eluded me and reports weren't encouraging (with the sole exception of the Kibana web site's marketing pitch). But ES has come a long way since six months ago, and the ELK stack is much more closely integrated. The Splunk UI is carefully crafted to isolate users from each other and prevent external (to the Splunk db itself, not to our company) users from causing harm to data. But Kibana seems to be meant for a small cadre of trusted users. What if I write a dashboard with the same name as someone else's? Kibana doesn't even begin to discuss user isolation. But I am confident that it will. How can I tell Kibana to set the default Lucene query operator to AND instead of OR. Google is not my friend: I keep getting references to the Ruby versions of Kibana; that's ancient history by now. Kibana is cool and promising, but it has a long way to go for deployment to all of the folks in our company who currently have access to Splunk. Logstash has a nice book that's been very helpful, and logstash itself has been an excellent tool for prototyping. The book has been invaluable in helping me extract dates from log events and handling all of our different multiline events. But it still doesn't explain why the date filter needs a different array of matching strings to get the date that the grok filter has already matched and isolated. And recommendations to avoid the elasticsearch_http output and use elasticsearch (via the Node client) directly contradict the fact that logstash's 1.1.1 version of the ES client library is not compatible with the most recent 1.2.1 version of ES. And logstash is also a resource hog, so we eventually plan to replace it with Perl and Apache Flume (already in use) and pipe it into my Java bulk load tool (which is always kept up-to-date with the versions of ES we deploy!!). Because we send the data via Flume to our data warehouse, any losses in ES will be annoying but won't be catastrophic. And the front-end following of rotated log files will be done using the GNU *tail -F* command and option. This GNU tail command with its uppercase -F option follows rotated log files perfectly. I doubt that logstash can do the same, and we currently see that neither can Splunk (so we sporadically lose log events in Splunk too). So GNU tail -F piped into logstash with the stdin filter works perfectly in my evaluation setup and will likely form the first stage of any log forwarder we end up deploying, Brian On Thursday, June 19, 2014 8:48:34 AM UTC-4, Thomas Paulsen wrote: We had a 2,2TB/d installation of Splunk and ran it on VMWare with 12 Indexer and 2 Searchheads. 
Each indexer had 1000 IOPS guaranteed assigned. The system is slow but OK to use. We tried Elasticsearch and were able to get the same performance with the same number of machines. Unfortunately, with Elasticsearch you need almost double the amount of storage, plus a LOT of patience to make it run. It took us six months to set it up properly, and even now the system is quite buggy and unstable, and from time to time we lose data with Elasticsearch. I don't recommend ELK for a critical production system; for just dev work it is OK, if you don't mind the hassle of setting up and operating it. The costs you save by not buying a Splunk license you have to invest in consultants to get it up and running. Our dev teams hate Elasticsearch and prefer Splunk.
boolean multi-field silently ignored in 1.2.1
I'm seeing multi-fields of type boolean silently being reduced to a normal boolean field in 1.2.1, which wasn't the behavior in 0.90.9. See https://gist.github.com/Omega359/0c2a93690b4db30693a1 for an example of this. Is this expected? To me it seems like it should work - the boolean field mapper seems to be calling out to multiFieldsBuilder - but I'm not versed enough in the internals of ES to know where, if at all, it's broken. Bruce -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ccc5b263-24a2-45c5-97d1-46a93799eb58%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Penalty or boost from a boolean property
Function_score is the way to go IMHO. Best -- David ;-) Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs Le 20 juin 2014 à 19:50, hugo lassiege hlassi...@gmail.com a écrit : Hi, I'm looking for help :) This is maybe trivial but I can't find the good solution. I have some documents and those documents have two boolean properties, basically thumbs up and thumbs down to show that the administrator approve or not those documents. I try to boost a document if it is thumbsup or demote the document if it is thumbsdown. It's not a filter, the document could be retrieved, it's just more or less relevant. I tried with two should clauses in the global request : { bool : { should : [ { term : { champ1 : valeur1 } }, { term : { champ2 : valeur2 } }, { term : { thumbsup : true } }, { term : { thumbsdown : false} } ] } } But I get some irrelevant documents because they match the last conditions. What would be the best method for this use case ? -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ba3964f0-fbc8-4e0c-be3f-c38af8221410%40googlegroups.com. For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/088863F1-E2EA-45A6-9368-D9AA69E717FE%40pilato.fr. For more options, visit https://groups.google.com/d/optout.
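A minimal function_score sketch of what David suggests, reusing the field names from the quoted question; the index name and boost factors are illustrative and worth tuning:

curl -XGET 'http://localhost:9200/myindex/_search?pretty' -d '{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "should": [
            { "term": { "champ1": "valeur1" } },
            { "term": { "champ2": "valeur2" } }
          ]
        }
      },
      "functions": [
        { "filter": { "term": { "thumbsup": true } }, "boost_factor": 2 },
        { "filter": { "term": { "thumbsdown": true } }, "boost_factor": 0.5 }
      ],
      "score_mode": "multiply"
    }
  }
}'

This keeps the thumbs fields out of the matching clauses entirely, so they can only scale the score and never pull in otherwise irrelevant documents.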
Penalty or boost from a boolean property
Hi, I'm looking for help :) This is maybe trivial, but I can't find a good solution. I have some documents, and those documents have two boolean properties, basically thumbs-up and thumbs-down, to show whether the administrator approves those documents or not. I try to boost a document if it is thumbsup, or demote the document if it is thumbsdown. It's not a filter; the document could still be retrieved, it's just more or less relevant. I tried with two should clauses in the global request:

{
  "bool": {
    "should": [
      { "term": { "champ1": "valeur1" } },
      { "term": { "champ2": "valeur2" } },
      { "term": { "thumbsup": true } },
      { "term": { "thumbsdown": false } }
    ]
  }
}

But I get some irrelevant documents because they match the last conditions. What would be the best method for this use case? -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ba3964f0-fbc8-4e0c-be3f-c38af8221410%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Getting complete value from ElasticSearch query
I have the following structure in my ElasticSearch:

{
  "_index": "3_exposureindex",
  "_type": "exposuresearch",
  "_id": "12738",
  "_version": 4,
  "_score": 1,
  "_source": {
    "Name": "test2_update",
    "Description": "",
    "CreateUserId": 8,
    "SourceId": null,
    "Id": 12738,
    "ExposureId": 12738,
    "CreateDate": "2014-06-20T16:18:50.500",
    "UpdateDate": "2014-06-20T16:19:57.547",
    "UpdateUserId": 8
  },
  "fields": {
    "_parent": 1
  }
}

I am trying to get both the data in `_source` and the data in `fields`. When I run the query:

{ "query": { "terms": { "Id": [ 12738 ] } } }

all I get are the values contained in `_source`, whereas if I run the query:

{ "fields": [ "_parent" ], "query": { "terms": { "Id": [ 12738 ] } } }

then I only get the `fields`. Is there a way to get both? I will be grateful for any help. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/cdb02319-f6ee-455e-bf13-762df7e33a82%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Getting complete value from ElasticSearch query
I forgot to mention that I have asked the same question in StackOverflow http://stackoverflow.com/questions/24333655/getting-complete-value-from-elasticsearch-query On Friday, June 20, 2014 11:52:49 AM UTC-7, Vinay Pandey wrote: I have the following structure on my ElasticSearch: { _index: 3_exposureindex _type: exposuresearch _id: 12738 _version: 4 _score: 1 _source: { Name: test2_update Description: CreateUserId: 8 SourceId: null Id: 12738 ExposureId: 12738 CreateDate: 2014-06-20T16:18:50.500 UpdateDate: 2014-06-20T16:19:57.547 UpdateUserId: 8 } fields: { _parent: 1 } } I am trying to get both, the data in `_source` as well as that in `fields`, when I run the query: { query: { terms: { Id: [ 12738 ] } } } All I get are the values contained in `_source`, whereas, if I run the query: { fields: [ _parent ], query: { terms: { Id: [ 12738 ] } } } Then I only the `fields`. Is there a way to get both? I will be grateful for any help. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/f70efa60-f62c-4dc0-9812-e02a3a900ea4%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
deleting documents that are missing fields
I can easily query for documents that are missing a particular term field; however, I'd like to free up that space and remove those documents. I've tried this with no luck:

DELETE /my_index/pages/_search
{
  "filter": {
    "missing": { "field": "sentences", "existence": true, "null_value": true }
  }
}

It works fine to find them, but I can't find an easy way to remove them, and I have about 2 million to remove as well. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/c2d41bfb-145d-402e-a5aa-2f0329278bd9%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Getting complete value from ElasticSearch query
This just got answered: you should be able to specify _source in the fields. Example:

{ "fields": [ "_parent", "_source" ], "query": { "terms": { "Id": [ 12738 ] } } }

On Friday, June 20, 2014 11:52:49 AM UTC-7, Vinay Pandey wrote: I have the following structure on my ElasticSearch: { _index: 3_exposureindex _type: exposuresearch _id: 12738 _version: 4 _score: 1 _source: { Name: test2_update Description: CreateUserId: 8 SourceId: null Id: 12738 ExposureId: 12738 CreateDate: 2014-06-20T16:18:50.500 UpdateDate: 2014-06-20T16:19:57.547 UpdateUserId: 8 } fields: { _parent: 1 } } I am trying to get both, the data in `_source` as well as that in `fields`, when I run the query: { query: { terms: { Id: [ 12738 ] } } } All I get are the values contained in `_source`, whereas, if I run the query: { fields: [ _parent ], query: { terms: { Id: [ 12738 ] } } } Then I only the `fields`. Is there a way to get both? I will be grateful for any help. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/00e590cf-352d-4ebf-800d-113565ee7fbe%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
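Putting the answer together as a full request against the index from the question:

curl -XGET 'http://localhost:9200/3_exposureindex/exposuresearch/_search?pretty' -d '{
  "fields": [ "_parent", "_source" ],
  "query": { "terms": { "Id": [ 12738 ] } }
}'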
Terms aggregation for multiple fields
Hi Team, I am new to elasticsearch and learning about the search API/query API in elasticsearch. I have a requirement to fetch data from ES. My data is as below (assume a table format):

Prop-Name  Type      Use
Place1     Sale      Office
Place2     Lease     Office
Place3     SubLease  Office
Place4     Sale      Industry

So in Type I have Sale, Lease, SubLease as distinct values for the property. Similarly for Use I have 7 distinct types. I have loaded the data into ES. My need: at page load, I need to show the count of each Type and each Use. Upon selection of a Type, I need to filter the Use counts, and vice versa. Assume we have 30 places in total for Type Sale; then the Use might have Office 15 and Industry 15. When I select Office 15, I need to find in the documents how many of each Type belong to Office. 1. All the time, I have to populate the distinct values (3 Types and 7 Uses) and their counts based on the selection of each. 2. How do I do aggregation if the Use field has values such as Multi-family and I want to show them as one aggregated value? My current query brings back two results for this value. Regards Madhavan.TR -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/8f803b70-d8ff-4dbd-a4bb-0f71ecaec679%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
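A hedged sketch of both parts: map Type and Use as not_analyzed strings so multi-word values like Multi-family stay as a single bucket (the standard analyzer is what splits them in two), then ask for both breakdowns plus a filtered sub-aggregation for the drill-down in one request. Index, type, and value spellings here are hypothetical:

curl -XPUT 'http://localhost:9200/properties' -d '{
  "mappings": {
    "property": {
      "properties": {
        "Type": { "type": "string", "index": "not_analyzed" },
        "Use":  { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}'

curl -XGET 'http://localhost:9200/properties/property/_search?pretty' -d '{
  "size": 0,
  "aggs": {
    "by_type": { "terms": { "field": "Type" } },
    "by_use":  { "terms": { "field": "Use" } },
    "office_only": {
      "filter": { "term": { "Use": "Office" } },
      "aggs": { "type_breakdown": { "terms": { "field": "Type" } } }
    }
  }
}'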
Re: cassandra river plugin installation issue
Hi, The issue was not with the Hector API; it has been fixed by using WITH COMPACT STORAGE when creating column families in Cassandra. I have posted it here: http://stackoverflow.com/questions/21089453/cassandra-column-name-trailing-with-blank-characters -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/9de84b4d-0d99-4483-bd1e-5f9471c0b97d%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: deleting documents that are missing fields
I do not use delete by query, but have you tried using a fully formed query and not just a filter? Perhaps an implicit match_all query is not being set. Try using a filtered query with a match_all query and your filter. http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-filtered-query.html -- Ivan On Fri, Jun 20, 2014 at 12:13 PM, Jeff Dupont jeff.dup...@gmail.com wrote: I can easily query for documents that are missing a particular term field, however I'd like to free up that space and remove those documents. I've tried this with no luck: DELETE /my_index/pages/_search { filter : { missing : { field : sentences, existence : true, null_value : true } } } It works fine to find them, but i can't find an easy way to remove them and I have about 2million to remove as well. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/c2d41bfb-145d-402e-a5aa-2f0329278bd9%40googlegroups.com https://groups.google.com/d/msgid/elasticsearch/c2d41bfb-145d-402e-a5aa-2f0329278bd9%40googlegroups.com?utm_medium=emailutm_source=footer . For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQCguLamXCnrtV-bA-Ed03pGdB%2BVMrAt5-CYkqkvfnDaGw%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
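Along the lines Ivan suggests, a delete-by-query against the _query endpoint (not _search) with a filtered query should do it. A minimal sketch, reusing the index and field names from the post above; worth testing on a copy first:

curl -XDELETE 'http://localhost:9200/my_index/pages/_query' -d '{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "missing": { "field": "sentences", "existence": true, "null_value": true }
      }
    }
  }
}'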
Re: get rid of _all to optimize storage and perfs (Re: Splunk vs. Elastic search performance?)
Patrick, Here's my template, along with where the _all field is disabled. You may wish to add this setting to your own template, and then also add the index setting to ignore malformed data (if someone's log entry occasionally slips in null or no-data instead of the usual numeric value):

{
  "automap": {
    "template": "logstash-*",
    "settings": {
      "index.mapping.ignore_malformed": true
    },
    "mappings": {
      "_default_": {
        "numeric_detection": true,
        "_all": { "enabled": false },
        "properties": {
          "message": { "type": "string" },
          "host": { "type": "string" },
          "UUID": { "type": "string", "index": "not_analyzed" },
          "logdate": { "type": "string", "index": "no" }
        }
      }
    }
  }
}

Brian -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/a145cb1e-4013-4a6b-a58d-9a42368d8107%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: HIVE-Elasticsearch [mapr-elasticsearch] write to elasticsearch issue
Hi Costin, Thanks for the tip. I replaced the old version of Jackson and it works now :). Cheers Shankar On Sunday, June 15, 2014 3:09:27 AM UTC-6, Costin Leau wrote: What version of MapR are you using? MapR uses an old version of Jackson, which es-hadoop should detect and use an appropriate code path for. There are various fixes: 1. I've pushed a fix on the 2.x branch which improves detection - you can try the 2.0.1.BUILD-SNAPSHOT version here [a] 2. You can upgrade the Jackson version in MapR to version 1.7 or higher (vanilla Hadoop uses 1.8.8). This approach works with the current es-hadoop and also gives you a performance boost for serializing data. Cheers, [a] https://github.com/elasticsearch/elasticsearch-hadoop#development-snapshot On 6/13/14 11:30 PM, shankarr...@gmail.com wrote: Hi, I am trying to integrate elasticsearch with a MapR Hadoop cluster. I am using the hive-elasticsearch integration document. I am able to read data from the elasticsearch node. However, I am not able to write data into the elasticsearch node, which is my primary requirement. Request to kindly guide me. I always get the following errors: 2014-06-13 14:15:45,814 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS maprfs:/user/hive/warehouse/dev.db/_tmp.shankar/02_0 2014-06-13 14:15:45,947 FATAL org.apache.hadoop.hive.ql.exec.mr.ExecMapper: java.lang.NoSuchMethodError: org.codehaus.jackson.JsonGenerator.writeUTF8String([BII)V at org.elasticsearch.hadoop.serialization.json.JacksonJsonGenerator.writeUTF8String(JacksonJsonGenerator.java:123) at org.elasticsearch.hadoop.mr.WritableValueWriter.write(WritableValueWriter.java:47) at org.elasticsearch.hadoop.hive.HiveWritableValueWriter.write(HiveWritableValueWriter.java:83) at org.elasticsearch.hadoop.hive.HiveWritableValueWriter.write(HiveWritableValueWriter.java:38) at org.elasticsearch.hadoop.hive.HiveValueWriter.write(HiveValueWriter.java:69) at org.elasticsearch.hadoop.hive.HiveValueWriter.write(HiveValueWriter.java:111) at org.elasticsearch.hadoop.hive.HiveValueWriter.write(HiveValueWriter.java:55) at org.elasticsearch.hadoop.hive.HiveValueWriter.write(HiveValueWriter.java:41) at org.elasticsearch.hadoop.serialization.builder.ContentBuilder.value(ContentBuilder.java:258) at org.elasticsearch.hadoop.serialization.bulk.TemplatedBulk.doWriteObject(TemplatedBulk.java:92) at org.elasticsearch.hadoop.serialization.bulk.TemplatedBulk.write(TemplatedBulk.java:79) at org.elasticsearch.hadoop.hive.EsSerDe.serialize(EsSerDe.java:128) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:582) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:348) at org.apache.hadoop.mapred.Child$4.run(Child.java:282) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1117) at org.apache.hadoop.mapred.Child.main(Child.java:271) 2014-06-13 14:15:45,947 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 3 finished. closing... 2014-06-13 14:15:45,947 INFO org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0 2014-06-13 14:15:45,947 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 finished. closing... 2014-06-13 14:15:45,947 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 finished. closing... 2014-06-13 14:15:45,947 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 2 finished. closing... 2014-06-13 14:15:45,948 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 2 Close done 2014-06-13 14:15:45,948 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 Close done 2014-06-13 14:15:45,948 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 Close done 2014-06-13 14:15:45,948 INFO
issues with file input from logstash to elastic - please read
Guys, it's been more than a week I've been struggling with this issue; if possible, please give it a look and try to help :-( I have a config file that I'm running Logstash with, which is supposed to fetch the log file I specified in it and stream it to Elasticsearch. The problem is that it worked twice and that's it. NO changes were made to the file, and most of the time it doesn't load the data and doesn't show any error msg. When I change the input from 'file' to 'stdin' it works fine. This is the config file, which I believe is syntactically correct, since it did work twice...

input {
  file {
    path => "C:\elasticsearch-1.2.0\testLog.txt"
    start_position => "beginning"
  }
}
output {
  elasticsearch {
    host => "localhost"
    index => "tester3"
    protocol => "http"
  }
}

-- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/8b0634eb-dd2c-47f3-9959-2e48bdcc349d%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
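A hedged aside: "worked twice and then stopped" is the classic sincedb symptom - the file input records how far it has read each file, and start_position only applies to files it has never seen. For testing, discarding the sincedb and using forward slashes in the path (it is treated as a glob) are worth a try; the sincedb_path value and the path style here are assumptions to verify against the Logstash docs:

input {
  file {
    # forward slashes, since the path is treated as a glob pattern
    path => "C:/elasticsearch-1.2.0/testLog.txt"
    start_position => "beginning"
    # test-only assumption: discard read offsets so the file is re-read every run
    sincedb_path => "NUL"
  }
}
output {
  elasticsearch {
    host => "localhost"
    index => "tester3"
    protocol => "http"
  }
}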
Disabling date detection [Hive-Elasticsearch]
Hi, My write to ES from MapR fails because automatic date detection is enabled. Is there a way to disable date detection from the external Hive table properties? Request to please guide me regarding this. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ed7e40b0-b896-4633-88fc-efdf2bead65a%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
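One workaround, hedged: dynamic date detection is an index-mapping setting on the Elasticsearch side rather than a Hive table property, so pre-creating the index with detection turned off, before the external table writes to it, should avoid the problem. The index name is hypothetical:

curl -XPUT 'http://localhost:9200/myindex' -d '{
  "mappings": {
    "_default_": { "date_detection": false }
  }
}'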
Re: boolean multi-field silently ignored in 1.2.1
heya bruce that looks like a bug - please open an issue clint On 20 June 2014 19:41, Bruce Ritchie bruce.ritc...@gmail.com wrote: I'm seeing multi-fields of type boolean silently being reduced to a normal boolean field in 1.2.1 which wasn't the behavior in 0.90.9. See https://gist.github.com/Omega359/0c2a93690b4db30693a1 for an example of this. Is this expected? To me it seems like it should work - the boolean field mapper seems to be calling out to multiFieldsBuilder - but I'm not versed enough in the internals of ES to know where if at all it's broken. Bruce -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ccc5b263-24a2-45c5-97d1-46a93799eb58%40googlegroups.com https://groups.google.com/d/msgid/elasticsearch/ccc5b263-24a2-45c5-97d1-46a93799eb58%40googlegroups.com?utm_medium=emailutm_source=footer . For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAPt3XKSpOKM38EJDpVkXyTdNuKtL%2BE5dDHBEV89K2LPP4oS2-A%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
Re: issues with file input from logstash to elastic - please read
You'll have better luck sending this to the Logstash mailing list :) Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 21 June 2014 08:02, Eitan Vesely eitan...@gmail.com wrote: Guys, its been more than a week i've been struggling with this issue, if possible, please give it a look and try to help :-( i have a config file that im running logstash with which is suppose to fetch the log file i specified in it and stream it to elasticsearch. problem is that it worked twice and thats it. NO changes made to the file and most of the times it doest load the data and doesnt show any error msg. when i change the input from file to stdin' it works fine. this is the config file, which i belive the syntax is correct since it did work twice... input{ file{ path = C:\elasticsearch-1.2.0\testLog.txt start_position = beginning } } output{ elasticsearch{ host= localhost index= tester3 protocol= http } } -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/8b0634eb-dd2c-47f3-9959-2e48bdcc349d%40googlegroups.com https://groups.google.com/d/msgid/elasticsearch/8b0634eb-dd2c-47f3-9959-2e48bdcc349d%40googlegroups.com?utm_medium=emailutm_source=footer . For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAEM624YhwCh2XQ1BjK5c5czTy3t0Wa%3DK46st6Gr5Ei%3D5JAkCyg%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
Re: How to find the number of authors who have written between 2-3 books?
Alternatively, if you model this with parent-child, then you can use min_children/max_children, which is available in the next release http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-has-child-filter.html#_min_max_children_2 clint On 20 June 2014 17:15, Mike mnilsson2...@gmail.com wrote: I'm ok with the count returned being some estimate. Say in this simple example, if it returned 1 for just Joe, or 3 for John, Joe, and Jack, that would be ok too. I am also ok with restructuring my data in any way to more efficiently get this number. You mentioned creating a reference-count document. How would that look? 1 doc per unique author, with a count of the total number of books he wrote, so then I can do a range aggregation on that number? What if I wanted to find the number of authors who have written between 2-3 books that have a title containing E, F, H, or I (still 2 in this case, John and Joe)? On Thursday, June 19, 2014 6:43:41 PM UTC-4, Itamar Syn-Hershko wrote: This is a Map/Reduce operation; you'll be better off maintaining a ref-count document IMO than trying to hack the aggregations framework to support this. Another reason for doing it that way is that in a distributed environment some aggregations can't be computed to an exact value - the terms bucketing is one example. So if you need exact values, I'd go for a model that does it. -- Itamar Syn-Hershko http://code972.com | @synhershko https://twitter.com/synhershko Freelance Developer Consultant Author of RavenDB in Action http://manning.com/synhershko/ On Fri, Jun 20, 2014 at 1:34 AM, Mike mnilss...@gmail.com wrote: Assume each document is a book:

{ "title": "A", "author": "Mike" }
{ "title": "B", "author": "Mike" }
{ "title": "C", "author": "Mike" }
{ "title": "D", "author": "Mike" }
{ "title": "E", "author": "John" }
{ "title": "F", "author": "John" }
{ "title": "G", "author": "John" }
{ "title": "H", "author": "Joe" }
{ "title": "I", "author": "Joe" }
{ "title": "J", "author": "Jack" }

What is the best way to find the number of authors who have written between 2-3 books? In this case it would be 2, John and Joe. I know I can do a terms aggregation on author, set size to be very very large, and then on the client side traverse through the thousands of authors and count how many had between 2-3. Is there a more efficient way to do this? The cardinality aggregation is almost what I want, if only I could specify a min and max term count. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearc...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/22fc4e6d-bcac-426c-a343-ff1d36fc25de%40googlegroups.com. For more options, visit https://groups.google.com/d/optout. 
-- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAPt3XKSyio7izuxr5UL4SD5uiA5J7rwtfyP742W3robxfk7s6A%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
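Once min_children/max_children is available, the filter clint mentions would look roughly like this, assuming books are indexed as children of an author parent type (index and type names are hypothetical):

curl -XGET 'http://localhost:9200/library/author/_search?pretty' -d '{
  "query": {
    "filtered": {
      "filter": {
        "has_child": {
          "type": "book",
          "min_children": 2,
          "max_children": 3,
          "filter": { "match_all": {} }
        }
      }
    }
  }
}'

The hit count of that search is then the number of authors with 2-3 books; putting a query on the child titles inside has_child covers the "titles containing E, F, H, or I" variant.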
Re: Splunk vs. Elastic search performance?
I wasn't aware that the elasticsearch_http output wasn't recommended? When I spoke to a few of the ELK devs a few months ago, they indicated that there was minimal performance difference, at the greater benefit of not being locked to specific LS+ES versioning. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 21 June 2014 02:43, Brian brian.from...@gmail.com wrote: Thomas, Thanks for your insights and experiences. As I am someone who has explored and used ES for over a year but is relatively new to the ELK stack, your data points are extremely valuable. Let me offer some of my own views. Re: double the storage. I strongly recommend ELK users to disable the _all field. The entire text of the log events generated by logstash ends up in the message field (and not @message as many people incorrectly post). So the _all field is just redundant overhead with no value add. The result is a dramatic drop in database file sizes and dramatic increase in load performance. Of course, you need to configure ES to use the message field as the default for a Lucene Kibana query. During the year that I've used ES and watched this group, I have been on the front line of a brand new product with a smart and dedicated development team working steadily to improve the product. Six months ago, the ELK stack eluded me and reports weren't encouraging (with the sole exception of the Kibana web site's marketing pitch). But ES has come a long way since six months ago, and the ELK stack is much more closely integrated. The Splunk UI is carefully crafted to isolate users from each other and prevent external (to the Splunk db itself, not to our company) users from causing harm to data. But Kibana seems to be meant for a small cadre of trusted users. What if I write a dashboard with the same name as someone else's? Kibana doesn't even begin to discuss user isolation. But I am confident that it will. How can I tell Kibana to set the default Lucene query operator to AND instead of OR. Google is not my friend: I keep getting references to the Ruby versions of Kibana; that's ancient history by now. Kibana is cool and promising, but it has a long way to go for deployment to all of the folks in our company who currently have access to Splunk. Logstash has a nice book that's been very helpful, and logstash itself has been an excellent tool for prototyping. The book has been invaluable in helping me extract dates from log events and handling all of our different multiline events. But it still doesn't explain why the date filter needs a different array of matching strings to get the date that the grok filter has already matched and isolated. And recommendations to avoid the elasticsearch_http output and use elasticsearch (via the Node client) directly contradict the fact that logstash's 1.1.1 version of the ES client library is not compatible with the most recent 1.2.1 version of ES. And logstash is also a resource hog, so we eventually plan to replace it with Perl and Apache Flume (already in use) and pipe it into my Java bulk load tool (which is always kept up-to-date with the versions of ES we deploy!!). Because we send the data via Flume to our data warehouse, any losses in ES will be annoying but won't be catastrophic. And the front-end following of rotated log files will be done using the GNU *tail -F* command and option. This GNU tail command with its uppercase -F option follows rotated log files perfectly. 
During the year that I've used ES and watched this group, I have been on the front line of a brand-new product with a smart and dedicated development team working steadily to improve it. Six months ago the ELK stack eluded me, and reports weren't encouraging (with the sole exception of the Kibana web site's marketing pitch). But ES has come a long way since then, and the ELK stack is much more closely integrated.

The Splunk UI is carefully crafted to isolate users from each other and prevent external (to the Splunk db itself, not to our company) users from causing harm to data. Kibana, by contrast, seems to be meant for a small cadre of trusted users. What if I write a dashboard with the same name as someone else's? Kibana doesn't even begin to address user isolation. But I am confident that it will. And how can I tell Kibana to set the default Lucene query operator to AND instead of OR? Google is not my friend here: I keep getting references to the Ruby versions of Kibana, and that's ancient history by now. Kibana is cool and promising, but it has a long way to go before it can be deployed to all of the folks in our company who currently have access to Splunk.

Logstash has a nice book that's been very helpful, and logstash itself has been an excellent tool for prototyping. The book has been invaluable in helping me extract dates from log events and handle all of our different multiline events. But it still doesn't explain why the date filter needs a different array of matching strings to get at the date that the grok filter has already matched and isolated. And recommendations to avoid the elasticsearch_http output and use the elasticsearch output (via the Node client) directly contradict the fact that logstash's embedded 1.1.1 version of the ES client library is not compatible with the most recent 1.2.1 version of ES.

Logstash is also a resource hog, so we eventually plan to replace it with Perl and Apache Flume (already in use) and pipe the result into my Java bulk-load tool (which is always kept up to date with the versions of ES we deploy!). Because we send the data via Flume to our data warehouse, any losses in ES will be annoying but not catastrophic. The front-end following of rotated log files will be done using the GNU tail -F command; with its uppercase -F option, GNU tail follows rotated log files perfectly. I doubt that logstash can do the same, and we currently see that neither can Splunk (so we sporadically lose log events in Splunk too). So GNU tail -F piped into logstash with the stdin input works perfectly in my evaluation setup and will likely form the first stage of any log forwarder we end up deploying. Brian

On Thursday, June 19, 2014 8:48:34 AM UTC-4, Thomas Paulsen wrote:

We had a 2.2 TB/day installation of Splunk and ran it on VMware with 12 indexers and 2 search heads. Each indexer had 1000 IOPS guaranteed. The system is slow but OK to use. We tried Elasticsearch and were able to get the same performance with the same number of machines. Unfortunately, with Elasticsearch you need almost double the amount of storage, plus a LOT of patience to make it run. It took us six months to set it up properly, and even now the system is quite buggy and unstable, and from time to time we lose data with Elasticsearch. I don't recommend ELK for a critical production system; for just dev work it is OK, if you don't mind the hassle of setting it up and operating it. The costs you save by not buying a Splunk license you have to invest into consultants to get it up and running. Our dev teams hate Elasticsearch and prefer Splunk.
Re: guarding from double-start
And in your config file, set:

node.max_local_storage_nodes: 1

That way you won't start two nodes on a single instance.

On 20 June 2014 16:54, Andrew Gaydenko andrew.gayde...@gmail.com wrote: On Friday, June 20, 2014 6:49:04 PM UTC+4, Maciej Dziardziel wrote: Use start-stop-daemon, or adapt /etc/init.d/elasticsearch to set up a pidfile guarding the ES instance. Or just run it this way: pgrep -f elasticsearch || ./start_es.sh Aha, thanks! In my case pgrep is the most appropriate.
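For the pidfile approach Maciej mentions, a minimal sketch of a guarded launcher (paths are made up; ES 1.x accepts -d to daemonize and -p to write a pidfile):

#!/bin/sh
PIDFILE=/var/run/elasticsearch.pid
# refuse to start if a live process already owns the pidfile
if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
    echo "elasticsearch already running (pid $(cat "$PIDFILE"))" >&2
    exit 1
fi
exec /opt/elasticsearch/bin/elasticsearch -d -p "$PIDFILE"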
Re: problem indexing with my analyzer
You seriously don't want ngrams of length 3..250. That's ENORMOUS. Typically you set min/max to 3 or 4, and that's it. http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_ngrams_for_partial_matching.html#_ngrams_for_partial_matching

On 20 June 2014 16:05, Tanguy Bernard bernardtanguy1...@gmail.com wrote: Thank you Cédric Hourcade! On Friday, June 20, 2014 at 15:32:29 UTC+2, Cédric Hourcade wrote: If your base64 encodings are long, they are going to be split into a lot of tokens by the standard tokenizer. These tokens are often much longer than ordinary words, so your nGram filter will generate even more tokens, far more than with ordinary text. That may be your problem. You should really try to strip the encoded images from your documents with a simple regex before indexing them. If you need to keep the source, put the raw text in an unindexed field and the cleaned text in another.
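As a concrete illustration of sane ngram bounds, index settings along these lines (the index, filter, and analyzer names are invented for illustration):

curl -XPUT 'localhost:9200/myindex' -d '{
  "settings": {
    "analysis": {
      "filter": {
        "partial_match": { "type": "nGram", "min_gram": 3, "max_gram": 4 }
      },
      "analyzer": {
        "partial_match_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "lowercase", "partial_match" ]
        }
      }
    }
  }
}'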
Adding order to a terms aggregator results in ArrayIndexOutOfBoundsException
I have a simple document schema on which I am trying to run the following query:

curl -XPOST 'localhost:9200/indexName/topn/_search?pretty' -d '{
  "aggregations": {
    "applid": {
      "terms": {
        "field": "applid",
        "size": 3,
        "order": { "tt>byt_sum": "desc" }
      },
      "aggregations": {
        "tt": {
          "filter": {
            "and": {
              "filters": [
                { "range": { "t": { "from": 140321160, "to": 140321610, "include_lower": true, "include_upper": true } } },
                { "terms": { "gid": [ "abcd" ] } }
              ]
            }
          },
          "aggregations": {
            "byt_sum": { "sum": { "field": "byt" } }
          }
        }
      }
    }
  }
}'

This gives me back an error:

error: SearchPhaseExecutionException[Failed to execute phase [query], all shards failed; shardFailures {[rcP5ncimTpmcUZgvn5cgSw][indexName][0]: ArrayIndexOutOfBoundsException[null]}{[vauVf2XOQvOobpqIbp0REQ][indexName][2]: RemoteTransportException[[Bloodstorm][inet[/10.0.0.91:9300]][search/phase/query]]; nested: ArrayIndexOutOfBoundsException; }{[vauVf2XOQvOobpqIbp0REQ][indexName][1]: RemoteTransportException[[Bloodstorm][inet[/10.0.0.91:9300]][search/phase/query]]; nested: ArrayIndexOutOfBoundsException; }{[vauVf2XOQvOobpqIbp0REQ][indexName][4]: RemoteTransportException[[Bloodstorm][inet[/10.0.0.91:9300]][search/phase/query]]; nested: ArrayIndexOutOfBoundsException; }{[vauVf2XOQvOobpqIbp0REQ][indexName][3]: RemoteTransportException[[Bloodstorm][inet[/10.0.0.91:9300]][search/phase/query]]; nested: ArrayIndexOutOfBoundsException; }], status: 500

When I take the "order": { "tt>byt_sum": "desc" } out, this seems to work fine. Also, the error only occurs for certain "gid": [ "abcd" ] parameters. For example, it works for "gid": [ "1234" ]. Could you suggest what is going wrong here?

Elasticsearch version:

{ "status": 200, "name": "Kylun", "version": { "number": "1.1.1", "build_hash": "f1585f096d3f3985e73456debdc1a0745f512bbc", "build_timestamp": "2014-04-16T14:27:12Z", "build_snapshot": false, "lucene_version": "4.7" }, "tagline": "You Know, for Search" }
Re: guarding from double-start
On Saturday, June 21, 2014 2:33:28 AM UTC+4, Clinton Gormley wrote: And in your config file, set: node.max_local_storage_nodes: 1 That way you won't start two nodes on a single instance. Great, thanks!
Re: Splunk vs. Elastic search performance?
Mark, I've read one post (I can't remember where) saying that the Node client was preferred, but I have also read that the HTTP interface adds minimal overhead. So yes, I am currently using logstash with the HTTP interface and it works fine.

I also performed some experiments with clustering (not much, due to resource and time constraints) and used unicast discovery. Then I read someone who strongly recommended multicast discovery, and I started to feel like I'd gone down the wrong path. Then I watched the ELK webinar and heard that unicast discovery was preferred. I think it's not a big deal either way; use whatever works best for your particular networking infrastructure.

In addition, I was recently given this link: http://aphyr.com/posts/317-call-me-maybe-elasticsearch. It hasn't dissuaded me at all, but it is a thought-provoking read. I am a little confused by some things, though. In all of my high-performance banging on ES, even with my time-to-live test feature enabled, I never lost any documents at all. But I wasn't using auto-id; I was specifying my own unique ID. And when run in my 3-node cluster (slow, due to being hosted by 3 VMs running on a dual-core machine), I still didn't lose any data. So I am not sure about the high-data-loss scenarios he describes in his missive; I have seen no evidence of any data loss due to false insert positives at all. Brian
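A minimal sketch of the caller-supplied-ID indexing Brian describes, via the bulk API (the index, type, and ID scheme here are invented for illustration; --data-binary preserves the newlines that bulk requires):

curl -XPOST 'localhost:9200/_bulk' --data-binary '
{ "index": { "_index": "logs-2014.06.20", "_type": "event", "_id": "web01-1403280000-000042" } }
{ "message": "GET /index.html 200", "@timestamp": "2014-06-20T12:00:00Z" }
'

With an explicit _id, a retried insert overwrites the same document instead of creating a duplicate, which may be one reason the auto-id loss scenarios did not show up in this testing.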
Re: issues with file input from logstash to elastic - please read
Eitan, my recommendation is to use the stdin input in logstash and avoid its file input. For testing, pipe the file into your logstash instance. But in production, you should run the GNU version of tail -F (uppercase F option) to correctly follow all forms of rotated logs, and then pipe that output into your logstash instance. I don't know just how robust logstash's file input is, but the GNU version of tail with the -F option is perfect, so there's no guesswork and no dependency on hope. Note that even Splunk has a currently open bug about losing data while trying to follow a rotated file.

Also, I added the multiline processing to the filters; it didn't seem to work when applied as a stdin codec. Now it all works very well together. Anyway, that's what our group is doing. And yes, the logstash-users group (https://groups.google.com/forum/#!forum/logstash-users) is also rather active and is a good place for logstash-specific help. Brian
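To make that concrete, a sketch of the pipeline Brian describes (the file paths and the multiline pattern are invented; adjust both to your own logs):

tail -F /var/log/myapp/app.log | /opt/logstash/bin/logstash -f tail-pipeline.conf

with tail-pipeline.conf along these lines:

input { stdin { } }
filter {
  # join indented continuation lines onto the previous event
  multiline {
    pattern => "^\s"
    what => "previous"
  }
}
output { elasticsearch_http { host => "localhost" } }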
Elasticsearch cluster on Azure using ubuntu. The nodes don't see each other
I just posted this question on Stackoverflow: I have been setting up an Elasticsearch cluster on Azure, using Ubuntu VMs, following the tutorial on the plugin page (elasticsearch-cloud-azure) on GitHub. I've managed to configure everything and I have elasticsearch running, but I have 3 clusters of 1 node instead of 1 cluster of 3 nodes. I guess that the problem comes from:

cloud:
    azure:
        keystore: /path/to/keystore
        password: your_password_for_keystore
        subscription_id: your_azure_subscription_id
        service_name: your_azure_cloud_service_name
discovery:
    type: azure

I'm not sure what your_azure_cloud_service_name should be. I have all my nodes inside a Virtual Network, so they can communicate with each other. By default, on Azure, each time I create a VM a new Cloud Service containing only that VM is created. Should that value be different for each of the nodes in my cluster? I'm a bit lost on that one...
update field type in existing mapping in elastic search
Hi, can you please provide inputs on how to update an existing field type in a mapping? Below is the requirement: I have created contractIndex and its type is contract. In it I have the fields contractid as long and contract number as long, but I want to change the contract number type to string. Thanks, Srikanth.
Re: Elasticsearch cluster on Azure using ubuntu. The nodes don't see each other
You must create each VM under the same cloud service:

azure vm create azure-elasticsearch-cluster

Here the cloud service name is azure-elasticsearch-cluster. -- David ;-) Twitter: @dadoonet / @elasticsearchfr / @scrutmydocs
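To spell that out, a sketch of creating several VMs under one cloud service (image name, user, and SSH ports are placeholders; if I recall the old azure CLI correctly, --connect attaches a new VM to the existing cloud service rather than creating a fresh one):

# first node creates the cloud service
azure vm create azure-elasticsearch-cluster <ubuntu-image-name> <username> <password> --ssh 22

# subsequent nodes join the same cloud service
azure vm create --connect azure-elasticsearch-cluster <ubuntu-image-name> <username> <password> --ssh 2222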
Re: update field type in existing mapping in elastic search
You can't. You basically need to reindex. That said, you can try to use a multifield, which adds a string representation of the same field. But old values (old docs) won't have this new field populated. HTH -- David ;-) Twitter: @dadoonet / @elasticsearchfr / @scrutmydocs
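A sketch of the multifield approach David mentions, using the names from the question (I've assumed the field is literally called contractnumber; on ES 1.x, adding sub-fields via "fields" to an existing field is a permitted mapping update):

curl -XPUT 'localhost:9200/contractIndex/contract/_mapping' -d '{
  "contract": {
    "properties": {
      "contractnumber": {
        "type": "long",
        "fields": {
          "as_string": { "type": "string" }
        }
      }
    }
  }
}'

Documents indexed after this change can then be searched via contractnumber.as_string.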
Re: relation between snapshot restore and update_mapping
I just discovered that these strange update_mapping log lines come from a completely unrelated thing, so please take this post as invalid and accept my apologies.

On Thursday, June 19, 2014 1:21:32 PM UTC-4, JoeZ99 wrote:

This is a somewhat bizarre question. I really hope somebody jumps in, because I'm losing my mind. We've set up a system by which our one-machine cluster gets updated indexes that have been built in other clusters, by restoring snapshots. Long story short: for a few hours, the cluster is restoring snapshots, each of them containing information about two indexes. Of course, the global_state flag is set to false, because we don't want to recover the cluster, just those two indexes. Say that during those few hours the cluster has restored about 500 snapshots, one after another (there are never two restore processes running at the same time). By the logs, it goes flawlessly:

[2014-06-19 00:00:01,318][INFO ][snapshots] [Svarog] restore [backups-1:5e51361312cb68f41e1cb1fa5597672a_ts20140618235915350570] is done
[2014-06-19 00:00:02,363][INFO ][repositories] [Svarog] update repository [backups-1]
[2014-06-19 00:00:08,653][INFO ][cluster.metadata] [Svarog] [5e51361312cb68f41e1cb1fa5597672a_ts20140617220817522348] deleting index
[2014-06-19 00:00:09,286][INFO ][cluster.metadata] [Svarog] [5e51361312cb68f41e1cb1fa5597672a_phonetic_ts20140617220817904810] deleting index
[2014-06-19 00:00:09,815][INFO ][repositories] [Svarog] update repository [backups-1]
[2014-06-19 00:00:15,570][INFO ][repositories] [Svarog] update repository [backups-1]
[2014-06-19 00:00:15,938][INFO ][repositories] [Svarog] update repository [backups-1]
[2014-06-19 00:00:16,208][INFO ][repositories] [Svarog] update repository [backups-1]
[2014-06-19 00:00:20,669][INFO ][snapshots] [Svarog] restore [backups-1:70e3583358803e70dc60a83953aaca9e_ts20140618235930121779] is done
[2014-06-19 00:00:21,585][INFO ][repositories] [Svarog] update repository [backups-1]
[2014-06-19 00:00:26,992][INFO ][cluster.metadata] [Svarog] [70e3583358803e70dc60a83953aaca9e_ts20140617220848057264] deleting index
[2014-06-19 00:00:27,601][INFO ][cluster.metadata] [Svarog] [70e3583358803e70dc60a83953aaca9e_phonetic_ts20140617220848563815] deleting index

After restoring a snapshot, the outdated versions of the indices are removed (because the indices recovered from the snapshot are newer). This goes quite well, and there is no significant load on the machine while it happens. But at some point the cluster starts to issue update_mapping commands for no apparent reason (I'm almost sure there's been no interaction from outside):

[2014-06-19 04:38:36,293][INFO ][snapshots] [Svarog] restore [backups-1:99cbf66451446e6fe770878e84b4349b_ts20140619043819745139] is done
[2014-06-19 04:38:37,238][INFO ][repositories] [Svarog] update repository [backups-1]
[2014-06-19 04:38:44,016][INFO ][cluster.metadata] [Svarog] [99cbf66451446e6fe770878e84b4349b_ts20140604042653951289] deleting index
[2014-06-19 04:38:44,517][INFO ][cluster.metadata] [Svarog] [99cbf66451446e6fe770878e84b4349b_phonetic_ts20140604042655159506] deleting index
[2014-06-19 05:57:24,721][INFO ][repositories] [Svarog] update repository [backups-1]
[2014-06-19 05:57:34,869][INFO ][repositories] [Svarog] update repository [backups-1]
[2014-06-19 05:57:35,234...
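For context, the per-index restore with global state excluded that JoeZ99 describes looks something like this (the repository name is taken from the logs above; the snapshot name and index list are placeholders):

curl -XPOST 'localhost:9200/_snapshot/backups-1/some_snapshot_name/_restore' -d '{
  "indices": "index_a,index_b",
  "include_global_state": false
}'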
Re: Clarification on has_child filter memory requirements
Thanks Alex. What do you mean by "not all parent documents (and not the data), just their ids"? What decides which parent document ids get loaded? Also, are these ids loaded per query, or do they stay around longer? I ask because in our use case we're going to keep adding more and more parents and children. - Drew

On Jun 20, 2014, at 12:04 AM, Alexander Reelsen a...@spinscale.de wrote: Hey, not all parent documents (and not the data), just their ids. Still, this can accumulate, which is the reason why you should monitor the size of that data structure (it is exposed in the nodes stats). Hope that helps. --Alex

On Thu, Jun 19, 2014 at 6:03 AM, Drew Kutcharian d...@venarc.com wrote: Based on the official docs (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-has-child-filter.html): {quote} Memory considerations: With the current implementation, all _parent field values and all _id field values of parent documents are loaded into memory (heap) via field data in order to support fast lookups, so make sure there is enough memory for it. {/quote} Does this mean that all the parent docs will be loaded into memory, or only the ones matching the filter? If the former is true, then it would mean that one should keep the size of the parent objects to a minimum, right? In addition, say has_child is part of a conjunction (regular filter AND has_child); would ES still load all the parent docs, or only the ones that matched the first filter? Thanks, Drew
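For monitoring the structure Alex mentions: on ES 1.x the parent/child id data is reported in the id_cache section of the nodes stats, so something like the following should expose it (I believe the metric filter below is valid on 1.x; if not, the unfiltered nodes stats include the same section):

curl 'localhost:9200/_nodes/stats/indices/id_cache?pretty'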