can I specify a repository in the configuration file?
The standard procedure to register a repository is to issue a PUT command to the cluster. I'd like to automate the process, so that a script can build a search engine server and register a repository in it. However, I can't trust that the server will be ready and listening immediately after it's been started, so I'm afraid that if I issue the PUT command right after starting the Elasticsearch server, the server won't be ready yet and no repository will be registered. That's why I thought of specifying the repository in the Elasticsearch config file, but I haven't found any documentation regarding this. Is this possible? Or, at least, is there any signal or event I can listen to to know when an Elasticsearch server is ready? -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/3733f7f1-1272-4ed0-9484-f4d85a8b6e90%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
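As far as I know there is no supported way to declare a snapshot repository in the config file, but the readiness problem can be solved by polling the cluster health endpoint before issuing the PUT. A minimal sketch, assuming a local node; the repository name, bucket, and retry counts are all made up for the example:

```shell
#!/usr/bin/env bash
# Poll the cluster until it answers, then register the repository.
# ES_URL, my_backups and the bucket name are placeholders.
ES_URL="${ES_URL:-http://localhost:9200}"

wait_for_es() {
  local tries="${1:-30}"
  local i
  for ((i = 0; i < tries; i++)); do
    # A 200 from _cluster/health means the REST layer is up and the
    # cluster has formed; wait_for_status makes it block a bit longer.
    if curl -sf "$ES_URL/_cluster/health?wait_for_status=yellow&timeout=5s" >/dev/null; then
      return 0
    fi
    sleep 2
  done
  return 1
}

register_repo() {
  curl -sf -XPUT "$ES_URL/_snapshot/my_backups" -d '{
    "type": "s3",
    "settings": { "bucket": "my-backups-bucket" }
  }'
}

# Usage: wait_for_es 30 && register_repo
```

The script only registers the repository once the health call succeeds, so it can be run immediately after starting the server.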
Re: does snapshot restore lead to a memory leak?
Igor, yes, that's right. My index-only machines are booted just for the indexing/snapshotting task. Once there are no more tasks in the queue, those machines are terminated. They only handle a few indices each time (their only purpose is to snapshot). I will do as you tell me. I guess I'd better wait for the timeframe in which most of the restores occur, because that's when the memory consumption grows the most, so expect those postings in 5 or 6 hours.

On Wednesday, July 2, 2014 10:29:53 AM UTC-4, Igor Motov wrote: So, your search-only machines are running out of memory, while your index-only machines are doing fine. Did I understand you correctly? Could you send me node stats (curl localhost:9200/_nodes/stats?pretty) from the machine that runs out of memory? Please run stats a few times with a 1-hour interval; I would like to see how memory consumption is increasing over time. Please also run nodes info once (curl localhost:9200/_nodes) and post here (or send me by email) the results. Thanks!

On Wednesday, July 2, 2014 10:15:46 AM UTC-4, JoeZ99 wrote: Hey, Igor, thanks for answering! And sorry for the delay, I didn't catch the update. I explain:
- We have one cluster of one machine which is only meant for serving search requests. The goal is not to index anything to it. It contains 1.7k indices, give or take.
- Every day, those 1.7k indices are reindexed and snapshotted in pairs to an S3 repository (producing 850 snapshots).
- Every day, the read-only cluster from the first point restores those 850 snapshots from that same S3 repository to update its 1.7k indices.

It works like a real charm. Load has dropped dramatically, and we can set up a farm of temporary machines to do the indexing duties. But memory consumption never stops growing. We don't get any out-of-memory error or anything.
In fact, there is nothing in the logs that shows any error, but after a week or a few days the host has its memory almost exhausted and Elasticsearch is not responding. The memory consumption is, of course, way beyond the HEAP_SIZE. We have to restart it, and when we do we get the following error:

java.util.concurrent.RejectedExecutionException: Worker has already been shutdown
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.registerTask(AbstractNioSelector.java:120)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:72)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:36)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:56)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:36)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioChannelSink.execute(AbstractNioChannelSink.java:34)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.execute(DefaultChannelPipeline.java:636)
    at org.elasticsearch.common.netty.channel.Channels.fireExceptionCaughtLater(Channels.java:496)
    at org.elasticsearch.common.netty.channel.AbstractChannelSink.exceptionCaught(AbstractChannelSink.java:46)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.notifyHandlerException(DefaultChannelPipeline.java:658)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:781)
    at org.elasticsearch.common.netty.channel.Channels.write(Channels.java:725)
    at org.elasticsearch.common.netty.handler.codec.oneone.OneToOneEncoder.doEncode(OneToOneEncoder.java:71)
    at org.elasticsearch.common.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:59)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582)
    at org.elasticsearch.common.netty.channel.Channels.write(Channels.java:704)
    at org.elasticsearch.common.netty.channel.Channels.write(Channels.java:671)
    at org.elasticsearch.common.netty.channel.AbstractChannel.write(AbstractChannel.java:248)
    at org.elasticsearch.http.netty.NettyHttpChannel.sendResponse(NettyHttpChannel.java:158)
    at org.elasticsearch.rest.action.search.RestSearchAction$1.onResponse(RestSearchAction.java:106)
    at org.elasticsearch.rest.action.search.RestSearchAction$1.onResponse(RestSearchAction.java:98)
    at org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction$AsyncAction.innerFinishHim(TransportSearchQueryAndFetchAction.java:94
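Collecting the node stats Igor asked for can be scripted rather than run by hand; here is a rough sketch, where the file names and the number of rounds are my own invention (only the two curl endpoints and the one-hour interval come from the thread):

```shell
#!/usr/bin/env bash
# Capture node stats every hour so memory growth can be compared later.
ES_URL="${ES_URL:-http://localhost:9200}"

capture_stats() {
  local rounds="${1:-4}"
  local i
  for ((i = 1; i <= rounds; i++)); do
    # One stats snapshot per hour, each saved to its own file.
    curl -s "$ES_URL/_nodes/stats?pretty" > "nodes_stats_$i.json"
    if [ "$i" -lt "$rounds" ]; then
      sleep 3600
    fi
  done
  # Node info only needs to be captured once.
  curl -s "$ES_URL/_nodes?pretty" > "nodes_info.json"
}

# Usage: capture_stats 4
```

Diffing the successive `nodes_stats_*.json` files should show which memory pool is growing.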
Re: relation between snapshot restore and update_mapping
I just discovered these strange update_mapping log lines come from a completely unrelated thing, so please take this post as invalid and accept my apologies.

On Thursday, June 19, 2014 1:21:32 PM UTC-4, JoeZ99 wrote: This is a somewhat bizarre question. I really hope somebody jumps in, because I'm losing my mind. We've set up a system by which our one-machine cluster gets updated indices that have been built in other clusters, by restoring snapshots. Long story short: for a few hours, the cluster is restoring snapshots, each one of them containing information about two indices. Of course, the global_state flag is set to false, because we don't want to recover the cluster, just those two indices. Say during those few hours the cluster has restored about 500 snapshots, one after another (there are never two restore processes running at the same time). By looking at the logs, it goes flawlessly:

[2014-06-19 00:00:01,318][INFO ][snapshots] [Svarog] restore [backups-1:5e51361312cb68f41e1cb1fa5597672a_ts20140618235915350570] is done
[2014-06-19 00:00:02,363][INFO ][repositories] [Svarog] update repository [backups-1]
[2014-06-19 00:00:08,653][INFO ][cluster.metadata] [Svarog] [5e51361312cb68f41e1cb1fa5597672a_ts20140617220817522348] deleting index
[2014-06-19 00:00:09,286][INFO ][cluster.metadata] [Svarog] [5e51361312cb68f41e1cb1fa5597672a_phonetic_ts20140617220817904810] deleting index
[2014-06-19 00:00:09,815][INFO ][repositories] [Svarog] update repository [backups-1]
[2014-06-19 00:00:15,570][INFO ][repositories] [Svarog] update repository [backups-1]
[2014-06-19 00:00:15,938][INFO ][repositories] [Svarog] update repository [backups-1]
[2014-06-19 00:00:16,208][INFO ][repositories] [Svarog] update repository [backups-1]
[2014-06-19 00:00:20,669][INFO ][snapshots] [Svarog] restore [backups-1:70e3583358803e70dc60a83953aaca9e_ts20140618235930121779] is done
[2014-06-19 00:00:21,585][INFO ][repositories] [Svarog] update repository [backups-1]
[2014-06-19 00:00:26,992][INFO ][cluster.metadata] [Svarog] [70e3583358803e70dc60a83953aaca9e_ts20140617220848057264] deleting index
[2014-06-19 00:00:27,601][INFO ][cluster.metadata] [Svarog] [70e3583358803e70dc60a83953aaca9e_phonetic_ts20140617220848563815] deleting index

After restoring a snapshot, outdated versions of the indices are removed (because the indices recovered from the snapshot are newer). This goes quite well, and there is no significant load on the machine while doing it. But at some point the cluster starts to issue update_mapping commands with no apparent reason (I'm almost sure there's been no interaction from outside)...

[2014-06-19 04:38:36,293][INFO ][snapshots] [Svarog] restore [backups-1:99cbf66451446e6fe770878e84b4349b_ts20140619043819745139] is done
[2014-06-19 04:38:37,238][INFO ][repositories] [Svarog] update repository [backups-1]
[2014-06-19 04:38:44,016][INFO ][cluster.metadata] [Svarog] [99cbf66451446e6fe770878e84b4349b_ts20140604042653951289] deleting index
[2014-06-19 04:38:44,517][INFO ][cluster.metadata] [Svarog] [99cbf66451446e6fe770878e84b4349b_phonetic_ts20140604042655159506] deleting index
[2014-06-19 05:57:24,721][INFO ][repositories] [Svarog] update repository [backups-1]
[2014-06-19 05:57:34,869][INFO ][repositories] [Svarog] update repository [backups-1]
[2014-06-19 05:57:35,234]...
can the url repository type be used as a read-only repo for the s3 repository?
The url type is used in combination with the fs type: some machines can write/read snapshots in an fs-type repository, while other machines can only read, through a url repository which points to the same location the fs repository points at. Is this behavior by any chance possible using S3 repositories?
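For reference, the fs/url pairing described above looks roughly like this as curl commands (repository names and paths are examples only; url repositories never allow snapshot creation, only reads):

```shell
#!/usr/bin/env bash
# Sketch of the fs (read/write) + url (read-only) repository pairing.
# backups_rw, backups_ro and /mnt/backups are illustrative names.
ES_URL="${ES_URL:-http://localhost:9200}"

register_writable_fs_repo() {
  # Machines that create snapshots register the fs repository.
  curl -s -XPUT "$ES_URL/_snapshot/backups_rw" -d '{
    "type": "fs",
    "settings": { "location": "/mnt/backups" }
  }'
}

register_readonly_url_repo() {
  # Machines that only restore register a url repository pointing
  # at the very same location; writes through it are rejected.
  curl -s -XPUT "$ES_URL/_snapshot/backups_ro" -d '{
    "type": "url",
    "settings": { "url": "file:///mnt/backups" }
  }'
}
```

If I remember correctly, later Elasticsearch releases added a `readonly` setting on repositories, which would give S3 repositories the equivalent behavior, but I am not certain it applies to the version discussed in this thread.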
ignore_missing flag in snapshot listing?
When I want to list the snapshots that are within a certain repository, I issue the following command:

curl -XGET http://localhost:9200/_snapshot/repository_name/_all

As I understand it, this is the only way of doing it. However, chances are that while I'm issuing that command, some other process may delete a snapshot. I've found that, given the right conditions, this can provoke an error response from the list command. I figure that when the _all API call is made, Elasticsearch finds out about every snapshot in the repository, then fetches info for every one of them (by doing something similar to what curl -XGET http://localhost:9200/_snapshot/repository_name/snapshot_name does), and then returns a listing with the collected info. If a snapshot is deleted by some other process AFTER Elasticsearch has got all the snapshot names, and BEFORE Elasticsearch starts collecting info for every one of them, an error about Elasticsearch not being able to find that particular snapshot is thrown. My questions:
- Is it possible to use a flag like ignore_missing_snapshots or something like that when making the curl -XGET http://localhost:9200/_snapshot/repository_name/_all call?
- Can I list by prefix, telling Elasticsearch I want to list only the snapshots that start with a certain prefix? Something like -XGET http://localhost:9200/_snapshot/repository_name/_all?prefix=prefix. That way I could make sure the listing process doesn't interfere with the possible deletion process.
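If I remember correctly, later Elasticsearch releases answer both questions directly: the get-snapshot API accepts an `ignore_unavailable=true` parameter and wildcard snapshot names. For a version without those, a client-side retry is one workaround (repository name and retry count here are just examples):

```shell
#!/usr/bin/env bash
# Client-side workaround: retry the listing when a concurrent delete
# makes it fail mid-flight. Repository name is an example.
ES_URL="${ES_URL:-http://localhost:9200}"

list_snapshots() {
  local repo="$1" tries="${2:-3}" out i
  for ((i = 0; i < tries; i++)); do
    if out=$(curl -sf "$ES_URL/_snapshot/$repo/_all"); then
      printf '%s\n' "$out"
      return 0
    fi
    # A snapshot was probably deleted while we were listing; try again.
    sleep 1
  done
  return 1
}

# Usage: list_snapshots backups-1
```

This does not remove the race, it just retries until a listing completes without hitting a mid-flight deletion.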
Using snapshot/restore to separate indexing from searching
As I posted before, our system does not fit very well in a cluster structure, because we have many small indices (about 1k indices with an average of 6k records each). We guessed that with so many small indices, the cluster spent too much time and resources deciding which node should be master, where to locate absurdly small shards, etc. Bottom line is that the cluster always ended up not working right. BTW, I suspect that with a few advanced tuning options (shard routing and the like) we may be able to put it on again, but unfortunately we can't find that kind of knowledge in the standard doc. If any of you have any hint on this, it would be greatly appreciated!

Anyway, we need to scale the system somehow, and this is what we've come up with:
- Our indices can have configuration variations that make a reindex necessary at any time. It doesn't happen a lot, but it happens, and with 1k indices it's bound to happen.
- Indexing data is regenerated every day, so every day the whole set of indices is re-created (we figured it's much faster to recreate an index than to update an existing one by replacing every one of its records).

We would like the machines used for serving search results to be used only for that, and never for indexing/reindexing ops, because we don't want the user experience to suffer from searching against a server that is already loaded with heavy indexing. In our ideal scenario, indexing/reindexing would be done on devoted machines, which can be as many as needed, and searching would be done on different machines. We plan to use the snapshot/restore feature for that. Any time an index/reindex is needed, it would be done on one of the indexing machines, and then the fresh index would be snapshotted, to be restored to the search machine afterwards. We would need some client-side control to make sure only one snapshot process runs at a time; it's my understanding that this restriction does not apply to the restore process (i.e. you can have more than one restore process running on a cluster). Individual item indexing can happen occasionally, but I figure when that happens we can just index to both the searching machines and the indexing machines, because it's never going to be big. (Please read cluster wherever I wrote machine.) How crazy does this whole thing sound? Is there any other way we can get some scalability?
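One round-trip of the workflow described above could be sketched like this; the cluster URLs, repository, and index/snapshot names are all invented for the example:

```shell
#!/usr/bin/env bash
# One index -> snapshot -> restore round-trip between the two clusters.
# INDEX_ES, SEARCH_ES, the "backups" repo and all names are illustrative.
INDEX_ES="${INDEX_ES:-http://indexer:9200}"
SEARCH_ES="${SEARCH_ES:-http://search:9200}"

snapshot_index() {
  local index="$1" snap="$2"
  # wait_for_completion=true blocks until the snapshot is finished,
  # which helps enforce the one-snapshot-at-a-time requirement.
  curl -sf -XPUT "$INDEX_ES/_snapshot/backups/$snap?wait_for_completion=true" -d "{
    \"indices\": \"$index\",
    \"include_global_state\": false
  }"
}

restore_index() {
  local snap="$1"
  # Restore only the snapshotted indices, never the cluster state.
  curl -sf -XPOST "$SEARCH_ES/_snapshot/backups/$snap/_restore" -d '{
    "include_global_state": false
  }'
}

# Usage: snapshot_index products_20140702 snap_20140702 \
#          && restore_index snap_20140702
```

Serializing the calls in the driving script (snapshot first, restore after it returns) gives the client-side control mentioned in the post.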
Re: Is there any way to force Replica inside particular Node
Well, I guess it depends on the number-of-shards and number-of-replicas settings you may have. Say you have 5 shards, 1 replica in your settings. If you have two nodes up and running, you should have 5 shards on each one. Among those shards, some will be primaries and some will be replicas. Index ops can only be done on primary shards; read ops can be done on either a primary shard or a replica shard. But which shard ends up on which node is up to Elasticsearch: it can be 3 primary shards and 2 replica shards on the first node, and the other way around on the second, for instance. There are some tweaks you can do, like shard allocation and the awareness settings: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-cluster.html

On Tuesday, March 11, 2014 11:36:27 AM UTC-4, Michael Lussier wrote: Are you positive you have two nodes up and running? This should be a green state with 2 nodes, 2 shards, 1 replica.
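The awareness tweak mentioned above can be applied through the cluster settings API; a minimal sketch, where the attribute name `zone` and its values are invented for the example:

```shell
#!/usr/bin/env bash
# Example of the shard-allocation knob linked above: have the cluster
# spread primaries and replicas across a node attribute called "zone".
ES_URL="${ES_URL:-http://localhost:9200}"

enable_zone_awareness() {
  curl -s -XPUT "$ES_URL/_cluster/settings" -d '{
    "persistent": {
      "cluster.routing.allocation.awareness.attributes": "zone"
    }
  }'
}

# Each node then declares its own attribute, e.g. in elasticsearch.yml:
#   node.zone: zone_a
```

With that in place, Elasticsearch tries to keep a primary and its replica on nodes with different `zone` values, which is the closest supported thing to pinning replicas.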
Re: Zero Downtime Reindexing
How about, while the scan is being done, letting updates go to the old index but with an extra field? Once the alias points to the new index, it's just a query to fetch the documents with that new field from the old index and then reindex them into the new one. If the alias change / new index creation is unsuccessful, then update the old index to remove that new field.

On Friday, February 21, 2014 3:11:52 AM UTC-5, Andrew Kane wrote: I tried to post a reply yesterday but it looks like it never made it. Thank you all for the quick replies. Here's a slightly better explanation of where I believe the race condition occurs. When the scan/scroll starts, the alias is still pointing to the old index, so updates go to the old index. Let's say you update Document 1. If the scroll/scan has already passed Document 1, the new index never sees the update. The three solutions you mentioned, Nik, are to either: 1. Keep track of updates manually [tedious] 2. Pause the jobs that perform the updates [out of sync] 3. Send updates to both indexes [also tedious] However, none of these seem ideal. - Andrew

On Tuesday, February 18, 2014 8:41:18 PM UTC-8, Andrew Kane wrote: Hi, I've followed the documentation for zero-downtime mapping changes and it works great. http://www.elasticsearch.org/blog/changing-mapping-with-zero-downtime/ However, there is a (pretty big) race condition with this approach: while reindexing, changes may not make it to the new index. I've looked all over and haven't found a single solution to address this. The best attempt I've seen is to buffer updates, but this is tedious and still leaves a race condition (with a smaller window). My initial thoughts were to create a write alias that points to the old and new indices and use versioning. However, there is no way to write to multiple indices atomically. It seems like this issue should affect most Elasticsearch users (whether they realize it or not). Does anyone have a good solution to this? Thanks, Andrew
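The catch-up step of the marker-field idea proposed at the top of this thread could be sketched as follows; the field name `reindex_dirty` and the index name are invented for the example, and the query shape is just one way to express "documents that have the field":

```shell
#!/usr/bin/env bash
# Find everything in the old index that was touched during the scan,
# i.e. every document carrying the invented reindex_dirty marker field.
ES_URL="${ES_URL:-http://localhost:9200}"

fetch_dirty_docs() {
  local old_index="$1"
  curl -sf -XGET "$ES_URL/$old_index/_search" -d '{
    "query": {
      "constant_score": {
        "filter": { "exists": { "field": "reindex_dirty" } }
      }
    },
    "size": 1000
  }'
}

# Each hit returned here would then be indexed into the new index.
```

For large numbers of dirty documents this query would itself have to be run with scan/scroll rather than a fixed `size`.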
Re: can't order by _boost field, even when index:not_analyzed
Ok. Good to know; I wish I had known this a few days earlier ;-) I was really losing my mind over this! Yet another reason for dropping index-time-defined boost, I guess. I really wish there were some way of defining per-document boost at index time. txs!

On Monday, February 24, 2014 5:14:48 PM UTC-5, Binh Ly wrote: Yup, this is a known bug. Since _boost is being deprecated and replaced by function_score, this will likely not be fixed. For now, if you want to sort on a boost value, either remove the _boost from your mapping, or introduce another field that you don't refer to from _boost.
can't order by _boost field, even when index:not_analyzed
According to Martijn's remarks on https://groups.google.com/forum/#!topic/elasticsearch/A5DSgvnTnC0 and also this issue https://github.com/elasticsearch/elasticsearch/issues/3752, search results should be sortable by the _boost field. My particular setup:

{
  "product": {
    "properties": {
      "_boost": { "type": "float", "null_value": 1.0, "index": "not_analyzed" }
    }
  }
}

When using a match_all query, I can sort on any numeric field, but sorting on _boost does not work. I'm using 0.90.10. Also, I don't know if this is related, but when I try to get the mapping via localhost:9200/index_name/_mapping, the _boost field doesn't show up. I guess it's because of its special nature.
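The workaround suggested in Binh Ly's reply earlier in this thread (keep the boost value in a plain numeric field, separate from _boost, and sort on that) would look roughly like this; the index, type, and field name `boost_value` are invented for the example:

```shell
#!/usr/bin/env bash
# Sketch of the suggested workaround: a plain, sortable float field
# kept apart from the special _boost field. Names are illustrative.
ES_URL="${ES_URL:-http://localhost:9200}"

create_mapping() {
  curl -s -XPUT "$ES_URL/products/product/_mapping" -d '{
    "product": {
      "properties": {
        "boost_value": { "type": "float", "null_value": 1.0 }
      }
    }
  }'
}

sorted_search() {
  # Sorting on an ordinary numeric field works where _boost does not.
  curl -s -XGET "$ES_URL/products/product/_search" -d '{
    "query": { "match_all": {} },
    "sort": [ { "boost_value": "desc" } ]
  }'
}
```

The same `boost_value` field can then feed a function_score query once the code moves off _boost.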
Re: Do I need to wrap a MissingField Filter inside a bool filter?
Just for the purpose of clarification: the bitset feature is equivalent to the cacheable feature. The AND/OR filters can't cache their results, since they always have to compare against other docs, but if a filter can be translated into a bitset, then it can be saved for future reference, and a doc doesn't have to be examined again to see whether it complies with the filter; a single look at its bitset records should be enough. Is that right?

On Tuesday, February 18, 2014 2:24:29 PM UTC-5, Binh Ly wrote: The missing filter should be cached by default, if that's what you wanted to know: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-missing-filter.html So no need to bool it, if that's what you're worried about.
the fastest filter under the sun?
Which is the fastest possible filter I could use to exclude documents from a search, provided that I have control over what fields those documents have? In an ecommerce search engine, we used not to index products that were out of stock. However, if for some reason the ecommerce owner wants to show the out-of-stock products, she has to reindex them again. Following a tip from Igor Motov in another thread, I'm considering adding some out_of_stock field to those items, letting them all be indexed, and using a filtered alias. Since I have control over the field I'm using for filtering, I figured I could use the one that allows me the fastest filter under the sun. I've assumed that would be the missing field filter, with existence: true, i.e. only the documents that don't have the out_of_stock field pass the filter (I don't mind the field's value). Am I right? Is this the fastest/most performant filter I can use?
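The filtered-alias idea from the post, written out as a curl command; the index, alias, and the exact filter shape are illustrative (missing-filter syntax as in the 1.x docs):

```shell
#!/usr/bin/env bash
# An alias that only exposes documents WITHOUT the out_of_stock field.
# Index and alias names are illustrative.
ES_URL="${ES_URL:-http://localhost:9200}"

create_in_stock_alias() {
  curl -s -XPOST "$ES_URL/_aliases" -d '{
    "actions": [
      {
        "add": {
          "index": "products",
          "alias": "products_in_stock",
          "filter": {
            "missing": { "field": "out_of_stock", "existence": true }
          }
        }
      }
    ]
  }'
}

# Searches against products_in_stock then skip out-of-stock documents.
```

Since the missing filter is cached as a bitset by default, searching through the alias should add very little overhead per query.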
using the internal transport module for moving data between clusters
The transport module is the module Elasticsearch uses for moving shards around in the cluster. Can it be used somehow to move index data between different clusters? The point here is to avoid the whole scanning-at-the-source / indexing-at-the-destination thing, which is essentially the solution all the moving-data-between-clusters implementations I've seen are based on. Now that I have your attention, this is my case:
- We have around 700 indices, each one of around 7k records. Relatively small.
- The ES cluster does not work well with so many small indices; it wastes too much time deciding which node is master and which is not.
- We need to separate indexing from searching.
- One solution is to index on one machine and then transfer the index to the search machine.
- If we do it the standard way, it implies indexing the dump from the index machine into the search machine, so no performance is gained.
- One solution would be to move data between source and destination the same way ES moves data inside a cluster, which I bet is much more efficient than the dump/reindex approach.

Is this possible?