You are correct on all counts.

On Wed, Mar 5, 2014 at 2:07 AM, Ravikumar Govindarajan <[email protected]> wrote:

> Aaron,
>
> I find that shard-servers are registered as ephemeral nodes in ZK.
>
> Does this mean that, when a shard-server process gets killed {OOM-killer
> etc...}, the layout gets re-built?
>

Just to point out another feature. There is a GC watcher feature in Blur that monitors the heap level after each GC, and if the heap isn't below a configured level, all the running queries get interrupted. This exists to try and prevent OOM-killers.

> Rolling-upgrade or restarts will be performed in shard-servers one by one.
> Will the layout get re-built here also? [In this case, layout calculation
> is performed multiple times in a very short-span of time]
>

Depending on your cluster size, both in servers and in data, one might want to kill and restart more than one shard server at a time. In the future we would like to allow indexes to be migrated to other servers before the node is killed, allowing for a smoother transition.

Aaron

> --
> Ravi
>
>
> On Mon, Feb 17, 2014 at 9:20 PM, Ravikumar Govindarajan <
> [email protected]> wrote:
>
> > This will cancel all current queries with a
> >> BlurException of type BACK_PRESSURE, the controller will automatically
> >> retry the queries that were cancelled
> >
> >
> > This is a real nice feature and most handy in production too, as a
> > protection against both intentional and un-intentional bad queries
> >
> >
> > Since every update to the index via thrift mutates are committed and
> >> synced to HDFS, it is possible to have other shards follow the lead shard
> >> (the
> >
> > writer)
> >
> > I think I am back to data-locality we previously discussed with your "lead
> > shard" remark. It is now possible to pin data-nodes to a file during writes
> > in hadoop. https://issues.apache.org/jira/browse/HDFS-2576
> >
> > We can always pin a shard to 3 shard-servers [assuming shard-server is run
> > in datanodes] and persist this in ZK. Failures can be easily handled using
> > this info
> >
> > The big advantage of this method is, even in case of failures
> > short-circuit reads are always utilized. I know you are not a big-fan of
> > it, but still is it worthwhile to pursue it?
> >
> > One downside to this approach is, cluster balance might get skewed and
> > need to be properly handled in HDFS
> >
> > --
> > Ravi
> >
>
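
For illustration only, the GC watcher idea mentioned earlier in this message (check heap usage after a collection and interrupt running queries if it is still above a configured level) could be sketched roughly as below. This is not Blur's actual implementation. The class name HeapWatcherSketch, the thread registry, and the ratio parameter are all hypothetical; only the java.lang.management calls are standard JDK API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of a post-GC heap check; not Blur's real GC watcher. */
public class HeapWatcherSketch {
    // Threads currently executing queries (assumed registry, for illustration).
    private final Set<Thread> runningQueries = ConcurrentHashMap.newKeySet();
    // Fraction of the pool maximum that may remain in use after a GC.
    private final double maxUsedRatio;

    public HeapWatcherSketch(double maxUsedRatio) {
        this.maxUsedRatio = maxUsedRatio;
    }

    public void register(Thread t) { runningQueries.add(t); }

    public void unregister(Thread t) { runningQueries.remove(t); }

    /** True when the heap still in use after a GC exceeds the configured ratio. */
    public static boolean overLimit(long usedAfterGc, long max, double maxUsedRatio) {
        if (max <= 0) {
            return false; // the JVM reports -1 when the maximum is undefined
        }
        return (double) usedAfterGc / max > maxUsedRatio;
    }

    /** Checks post-GC usage of each heap pool once; returns how many threads were interrupted. */
    public int checkOnce() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue;
            }
            // Usage measured after the most recent collection; may be null if unsupported.
            MemoryUsage afterGc = pool.getCollectionUsage();
            if (afterGc == null) {
                continue;
            }
            if (overLimit(afterGc.getUsed(), afterGc.getMax(), maxUsedRatio)) {
                return interruptAll();
            }
        }
        return 0;
    }

    /** Interrupts every registered query thread; returns the number interrupted. */
    public int interruptAll() {
        int count = 0;
        for (Thread t : runningQueries) {
            t.interrupt();
            count++;
        }
        return count;
    }
}
```

In a sketch like this, checkOnce() would be called periodically or from a GC notification listener, and the query code would have to check the interrupt flag and surface the cancellation to the caller (in Blur's case, as the BACK_PRESSURE error quoted above, so that the controller can retry).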
