Cool, thanks for sharing the info!

On Fri, May 6, 2016 at 3:48 AM, Ravikumar Govindarajan <
[email protected]> wrote:

> We migrated about 30% of our data in Blur.
>
> But our servers were struggling to index the remaining 60% of data coupled
> with new incoming documents for the already migrated 30% user-base...
>
> We wanted to isolate migration (data-pumping) from normal indexing/search
> requests..
>
> Blur made it very easy for me.
>
> In MasterBasedFactory, I reserve a shard-range for migration & make sure
> only a certain set of machines serve it. Once the shards reach a desired
> size (15-20GB), I de-reserve it (normal machines pick it up & start serving
> incoming requests immediately). Newer reserved shards are allocated to
> continue our data-pumping uninterrupted.
>
> This was a big problem for us & the hidden gems of Blur continue to
> surprise & help us so much.
>
> Thanks a lot...
>
> --
> Ravi
>
> On Wed, Mar 19, 2014 at 12:07 PM, Ravikumar Govindarajan <
> [email protected]> wrote:
>
> > Sure, will take up the 0.2.2 codebase. Thanks for all your help
> >
> >
> > On Tue, Mar 18, 2014 at 4:24 PM, Aaron McCurry <[email protected]> wrote:
> >
> >> On Tue, Mar 18, 2014 at 1:30 AM, Ravikumar Govindarajan <
> >> [email protected]> wrote:
> >>
> >> > >
> >> > > If I understand this one. Favor the primary response until a certain
> >> > > amount of time has passed then fall back to the secondary response
> >> > > assuming it's available to return.
> >> >
> >> >
> >> > Exactly. This is one such option. Another option is the
> >> > first-past-the-post
> >> >
> >> > Buffer cache? Are you referring to block cache?
> >> >
> >> >
> >> > Yup. Was referring to the block-cache here. But like you said, we can
> >> > just let it fall off the LRU
> >> >
> >> > The interesting thing here is that Blur is fully committed to disk (HDFS)
> >> > upon each mutate
> >> >
> >> > I think this is a new feature that I have missed in Blur. Will for sure
> >> > check it out. This auto-solves the stale-read issue also
> >> >
> >> > The problem now is, I am doing quite low-level changes on top of Blur.
> >> > Some of them are..
> >> >
> >> > 1. Online Shard-Creation
> >> > 2. Externalizing RowId->Shard mapping via BlurPartitioner
> >> > 3. Splitting shards upon reaching configured size
> >> > 4. Secondary read-only shard for availability...
> >> >
> >>
> >> I would love to hear more about the details of the implementations of
> >> these. :-)
> >>
> >>
> >> >
> >> > and many more such things needed for our app
> >> >
> >> > Hope to share and get feedback for these changes from the Blur community
> >> > once the system survives a couple of production-cycles.
> >> >
> >>
> >> That would be awesome. Based on your other email, I would strongly
> >> recommend you take a look at the 0.2.2 codebase. It has MANY fixes,
> >> performance improvements, and stability enhancements. Let us know if you
> >> have any questions.
> >>
> >> Aaron
> >>
> >>
> >> >
> >> > --
> >> > Ravi
> >> >
> >> >
> >> > On Mon, Mar 17, 2014 at 7:17 PM, Aaron McCurry <[email protected]>
> >> > wrote:
> >> >
> >> > > On Sat, Mar 15, 2014 at 12:57 PM, Ravikumar Govindarajan <
> >> > > [email protected]> wrote:
> >> > >
> >> > > > Aaron,
> >> > > >
> >> > > > I was thinking about another way of utilizing read-only shards
> >> > > >
> >> > > > Instead of logic/intelligence for finding a primary replica
> >> > > > struggling/down, can we opt for pushing the logic to the client-side?
> >> > > >
> >> > > > We can take a few approaches as below
> >> > > >
> >> > > > 1. Query both primary/secondary shards in parallel and return
> >> > > > whichever comes first
> >> > > >
> >> > > > 2. Query both primary/secondary shards in parallel. Wait for the
> >> > > > primary response as per a configured delay. If not forthcoming,
> >> > > > return the secondary's response
> >> > > >
> >> > >
> >> > > If I understand this one. Favor the primary response until a certain
> >> > > amount of time has passed then fall back to the secondary response
> >> > > assuming it's available to return.
> >> > >
> >> > > >
> >> > > > These are useful only when the client agrees to a "stale-read" scenario.
> >> > > > "stale-read" in this case will be the last-commit of the index.
> >> > > >
> >> > > > What I am aiming at is, in the case of layout-conscious apps [layout
> >> > > > does not change when a VM update/crash/hang is restarted], we can
> >> > > > always fall back on replica reads, resulting in greater availability
> >> > > > but lesser consistency
> >> > > >
> >> > > > A secondary-replica layout needs to be present in ZK. Replica-shards
> >> > > > should always be served from a server other than the primary. Maybe
> >> > > > we can switch off buffer-cache for replica reads, as it is used only
> >> > > > temporarily
> >> > > >
> >> > >
> >> > > Buffer cache? Are you referring to block cache? Or a query cache? Just
> >> > > as a FYI, Blur's query cache is currently disabled. As for the block
> >> > > cache, maybe. The block cache seems to help performance quite a bit and
> >> > > usually does so at little cost. Also, we could flush the secondary
> >> > > shard from the cache from time to time. Or we could just let it fall
> >> > > out of the LRU.
> >> > >
> >> > > >
> >> > > > 95% of apps queue their indexing operations and can always retry
> >> > > > after the primary comes back online.
> >> > > >
> >> > >
> >> > > The interesting thing here is that Blur is fully committed to disk
> >> > > (HDFS) upon each mutate. So assuming that the secondary shard has
> >> > > refreshed, the primary shard being down just means that you can't write
> >> > > to that shard. Reads should be in the same state.
> >> > >
> >> > > >
> >> > > > Please let me know your views on this
> >> > > >
> >> > >
> >> > > I like all these ideas, the only thing I would add is that we would
> >> > > need to build these sorts of options into Blur on a configured
> >> > > per-table basis. Querying both primary and secondary shards at the same
> >> > > time could produce the most consistent response times but at the cost
> >> > > of CPU resources (obviously).
> >> > >
> >> > > Thanks for the thoughts and ideas! I like it!
> >> > >
> >> > > Aaron
> >> > >
> >> > > >
> >> > > > --
> >> > > > Ravi
> >> > > >
> >> > > >
> >> > > > On Sat, Mar 8, 2014 at 8:56 PM, Aaron McCurry <[email protected]>
> >> > > > wrote:
> >> > > >
> >> > > > > On Fri, Mar 7, 2014 at 5:42 AM, Ravikumar Govindarajan <
> >> > > > > [email protected]> wrote:
> >> > > > >
> >> > > > > > >
> >> > > > > > > Well it works that way for OOMs and for when the process drops
> >> > > > > > > hard (Think kill -9). However when a shard server is shutdown
> >> > > > > > > it currently ends its session in ZooKeeper, thus triggering a
> >> > > > > > > layout change.
> >> > > > > >
> >> > > > > >
> >> > > > > > Yes, maybe we can have a config to determine whether it should
> >> > > > > > end/maintain the session in ZK when doing a normal shutdown and
> >> > > > > > then a subsequent restart. That way, both MTTR-conscious and
> >> > > > > > layout-conscious settings can be supported.
> >> > > > > >
> >> > > > >
> >> > > > > That's a neat idea. Once we have shards being served on multiple
> >> > > > > servers we should definitely take a look at this. When we implement
> >> > > > > the multi-shard serving I would guess that there will be 2 layout
> >> > > > > strategies (they might be implemented together).
> >> > > > >
> >> > > > > 1. Would be to get the N replicas online on different servers.
> >> > > > > 2. Would be the writing leader for the shard, assuming that it's needed.
> >> > > > >
> >> > > > >
> >> > > > > >
> >> > > > > > How do you think we can detect that a particular shard-server is
> >> > > > > > struggling/shut-down and hence incoming search-requests need to
> >> > > > > > go to some other server?
> >> > > > > >
> >> > > > > > I am listing a few paths off the top of my head
> >> > > > > >
> >> > > > > > 1. Process baby-sitters like supervisord, alerting controllers
> >> > > > > > 2. Tracking the first network-exception in the controller and
> >> > > > > > diverting to the read-only instance. Periodically maybe re-try
> >> > > > > > 3. Take a statistics-based decision, based on previous response
> >> > > > > > times etc..
> >> > > > > >
> >> > > > >
> >> > > > > Adding to this one, and this may be obvious, but measuring the
> >> > > > > response time in comparison with other shards. Meaning if the
> >> > > > > entire cluster is experiencing an increase in load and all response
> >> > > > > times are increasing we wouldn't want to start killing off shard
> >> > > > > servers inadvertently. Looking for outliers.
> >> > > > >
> >> > > > > > 4. Build some kind of leasing mechanism in ZK etc...
> >> > > > > >
> >> > > > >
> >> > > > > I think that all of these are good approaches. Likely, to determine
> >> > > > > that a node is misbehaving and should be killed/not used anymore we
> >> > > > > would want multiple ways to measure that condition and then vote on
> >> > > > > the need to kick it out.
> >> > > > >
> >> > > > > Aaron
> >> > > > >
> >> > > > > >
> >> > > > > > --
> >> > > > > > Ravi
> >> > > > > >
> >> > > > > >
> >> > > > > > On Fri, Mar 7, 2014 at 8:01 AM, Aaron McCurry <[email protected]>
> >> > > > > > wrote:
> >> > > > > >
> >> > > > > > > On Thu, Mar 6, 2014 at 6:30 AM, Ravikumar Govindarajan <
> >> > > > > > > [email protected]> wrote:
> >> > > > > > >
> >> > > > > > > > I came to know about the zk.session.timeout variable just
> >> > > > > > > > now, while reading more about this problem.
> >> > > > > > > >
> >> > > > > > > > This will only trigger the dead-node notification after the
> >> > > > > > > > configured timeout exceeds. Setting it to 3-4 mins must be
> >> > > > > > > > fine for OOMs and rolling-restarts.
> >> > > > > > > >
> >> > > > > > >
> >> > > > > > > Well it works that way for OOMs and for when the process drops
> >> > > > > > > hard (Think kill -9). However when a shard server is shutdown
> >> > > > > > > it currently ends its session in ZooKeeper, thus triggering a
> >> > > > > > > layout change.
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > >
> >> > > > > > > > The only extra stuff I am looking for is to divert search
> >> > > > > > > > calls to a read-only shard instance during this 3-4 mins time
> >> > > > > > > > to avoid mini-outages
> >> > > > > > > >
> >> > > > > > >
> >> > > > > > > Yes, and I think that the controllers will automatically spread
> >> > > > > > > the queries across those servers that are online. The
> >> > > > > > > BlurClient class already takes a list of connection strings and
> >> > > > > > > treats all connections as equals. For example, its current use
> >> > > > > > > is to provide the client with all the controllers' connection
> >> > > > > > > strings. Internally if any one of the controllers goes down or
> >> > > > > > > has a network issue another controller is automatically retried
> >> > > > > > > without the user having to do anything. There is back-off,
> >> > > > > > > ping, and pooling logic in the BlurClientManager that the
> >> > > > > > > BlurClient utilizes.
> >> > > > > > >
> >> > > > > > > Aaron
> >> > > > > > >
> >> > > > > > > >
> >> > > > > > > > --
> >> > > > > > > > Ravi
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > On Thu, Mar 6, 2014 at 3:34 PM, Ravikumar Govindarajan <
> >> > > > > > > > [email protected]> wrote:
> >> > > > > > > >
> >> > > > > > > > > What do you think of giving some extra leeway for
> >> > > > > > > > > shard-server failover cases?
> >> > > > > > > > >
> >> > > > > > > > > Ex: Whenever a shard-server process gets killed, the
> >> > > > > > > > > controller-node does not immediately update the layout, but
> >> > > > > > > > > rather marks it as a suspect.
> >> > > > > > > > >
> >> > > > > > > > > When we have a read-only back-up of the shard, searches can
> >> > > > > > > > > continue unhindered. Indexing during this time can be
> >> > > > > > > > > diverted to a queue, which will store and retry ops when the
> >> > > > > > > > > shard-server comes online again.
> >> > > > > > > > >
> >> > > > > > > > > Over a configured number of attempts/time, if the
> >> > > > > > > > > shard-server does not come up, then one controller-server
> >> > > > > > > > > can authoritatively mark it as down and update the layout.
> >> > > > > > > > >
> >> > > > > > > > > --
> >> > > > > > > > > Ravi
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
>
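
A minimal sketch of the shard-range reservation idea from the May 2016 note above, assuming a custom layout helper along the lines of Ravi's MasterBasedFactory rather than any actual Blur API: reserved shards are served only by dedicated migration machines, and a shard is de-reserved once it reaches the desired size so that the normal machines pick it up. All class, method, and field names here are hypothetical.

// Hypothetical layout helper: pins a reserved shard range to dedicated
// "migration" servers and releases shards back to the normal pool once they
// are de-reserved. Illustrative only, not a Blur API.
import java.util.List;
import java.util.Set;

public class ReservedRangeLayout {

  private final Set<Integer> reservedShards;     // shards currently used for data-pumping
  private final List<String> migrationServers;   // machines dedicated to migration
  private final List<String> normalServers;      // machines serving live traffic

  public ReservedRangeLayout(Set<Integer> reservedShards,
                             List<String> migrationServers,
                             List<String> normalServers) {
    this.reservedShards = reservedShards;
    this.migrationServers = migrationServers;
    this.normalServers = normalServers;
  }

  /** Pick the server that should host the given shard. */
  public String serverFor(int shardId) {
    List<String> pool = reservedShards.contains(shardId) ? migrationServers : normalServers;
    return pool.get(shardId % pool.size());
  }

  /**
   * De-reserve a shard once it reaches the desired size (e.g. 15-20 GB);
   * on the next layout pass the normal servers pick it up and start serving it.
   */
  public void deReserveIfLarge(int shardId, long shardSizeBytes, long thresholdBytes) {
    if (shardSizeBytes >= thresholdBytes) {
      reservedShards.remove(shardId);
    }
  }
}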
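
The "Externalizing RowId->Shard mapping via BlurPartitioner" item could be pictured roughly as follows: an explicit row-to-shard override consulted first (for example during an online shard split), with a hash fallback in the spirit of Blur's hash-based partitioning. This wrapper is a hypothetical illustration, not Blur code.

// Hypothetical externalized RowId -> shard mapping: explicit overrides win,
// otherwise fall back to a stable hash across the configured shard count.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ExternalizedPartitioner {

  private final Map<String, Integer> rowIdToShard = new ConcurrentHashMap<>();
  private final int numberOfShards;

  public ExternalizedPartitioner(int numberOfShards) {
    this.numberOfShards = numberOfShards;
  }

  /** Pin a row id to a specific shard (e.g. while splitting an oversized shard). */
  public void pin(String rowId, int shard) {
    rowIdToShard.put(rowId, shard);
  }

  /** Resolve the shard for a row id: explicit mapping first, hash otherwise. */
  public int shardFor(String rowId) {
    Integer pinned = rowIdToShard.get(rowId);
    if (pinned != null) {
      return pinned;
    }
    // Same spirit as hash-based partitioning: stable and evenly spread.
    return (rowId.hashCode() & Integer.MAX_VALUE) % numberOfShards;
  }
}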
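
The two client-side read options from the Mar 15 message (first-past-the-post, and favoring the primary for a configured delay before falling back to the secondary) can be sketched with plain java.util.concurrent primitives. The query suppliers below stand in for whatever per-replica search call the application makes; they are assumptions, not Blur APIs.

// Sketch of the two hedged-read strategies discussed in the thread.
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

public class HedgedReads {

  private static final ExecutorService POOL = Executors.newCachedThreadPool();

  /** Option 1: first-past-the-post, return whichever replica answers first. */
  @SuppressWarnings("unchecked")
  public static <T> T firstPastThePost(Supplier<T> queryPrimary, Supplier<T> querySecondary)
      throws Exception {
    CompletableFuture<T> primary = CompletableFuture.supplyAsync(queryPrimary, POOL);
    CompletableFuture<T> secondary = CompletableFuture.supplyAsync(querySecondary, POOL);
    return (T) CompletableFuture.anyOf(primary, secondary).get();
  }

  /** Option 2: query both in parallel, favor the primary for a configured delay. */
  public static <T> T primaryPreferred(Supplier<T> queryPrimary, Supplier<T> querySecondary,
                                       long delayMillis) throws Exception {
    CompletableFuture<T> primary = CompletableFuture.supplyAsync(queryPrimary, POOL);
    CompletableFuture<T> secondary = CompletableFuture.supplyAsync(querySecondary, POOL);
    try {
      // Stale-read caveat from the thread: the secondary only reflects the last commit.
      return primary.get(delayMillis, TimeUnit.MILLISECONDS);
    } catch (TimeoutException slowPrimary) {
      return secondary.get();
    }
  }
}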
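
Aaron's suggestion to look for latency outliers rather than absolute slowness might look something like the rough sketch below: a shard server is flagged only when its response time deviates from the cluster-wide distribution, so a cluster-wide load increase does not get individual servers kicked out. The threshold is illustrative.

// Flag a server only when it is an outlier relative to the cluster, not when
// the whole cluster is slow.
import java.util.Collection;

public class LatencyOutlierDetector {

  private final double deviations; // e.g. 3.0 => flag beyond 3 standard deviations

  public LatencyOutlierDetector(double deviations) {
    this.deviations = deviations;
  }

  public boolean isOutlier(double serverLatencyMillis, Collection<Double> clusterLatenciesMillis) {
    double mean = clusterLatenciesMillis.stream()
        .mapToDouble(Double::doubleValue).average().orElse(0.0);
    double variance = clusterLatenciesMillis.stream()
        .mapToDouble(l -> (l - mean) * (l - mean)).average().orElse(0.0);
    double stdDev = Math.sqrt(variance);
    // If everything is slow, the mean rises too and no single server is singled out.
    return serverLatencyMillis > mean + deviations * stdDev;
  }
}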
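
The "end vs. maintain the ZooKeeper session on a normal shutdown" idea reduces to something like the sketch below, written against the plain ZooKeeper client API. The endSessionOnShutdown flag is hypothetical and not an actual Blur configuration property.

// Conditionally end the ZooKeeper session on a clean shutdown.
import org.apache.zookeeper.ZooKeeper;

public class ShutdownHandler {

  private final ZooKeeper zooKeeper;
  private final boolean endSessionOnShutdown; // hypothetical config flag

  public ShutdownHandler(ZooKeeper zooKeeper, boolean endSessionOnShutdown) {
    this.zooKeeper = zooKeeper;
    this.endSessionOnShutdown = endSessionOnShutdown;
  }

  public void shutdown() throws InterruptedException {
    if (endSessionOnShutdown) {
      // MTTR-conscious: ephemeral registrations vanish immediately and the
      // controllers re-layout right away.
      zooKeeper.close();
    }
    // Layout-conscious: skip the close so the session only expires after the
    // configured session timeout, leaving the layout untouched long enough
    // for a quick restart.
  }
}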
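
Finally, the "mark it as a suspect" grace period from the Mar 6 message, with indexing diverted to a retry queue while searches continue against the read-only backup, might be outlined as below. This is a hypothetical controller-side sketch, not how Blur's controllers are implemented.

// Track suspect shard servers, queue writes during the grace period, and only
// update the layout after the configured time has elapsed.
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

public class SuspectTracker {

  private final long graceMillis;                       // e.g. 3-4 minutes
  private final Map<String, Long> suspects = new ConcurrentHashMap<>();
  private final Queue<Runnable> pendingWrites = new ConcurrentLinkedQueue<>();

  public SuspectTracker(long graceMillis) {
    this.graceMillis = graceMillis;
  }

  /** Called when a shard server's process dies or its ZooKeeper session drops. */
  public void markSuspect(String shardServer) {
    suspects.putIfAbsent(shardServer, System.currentTimeMillis());
  }

  /** Called when the server re-registers; queued writes are replayed. */
  public void clearSuspect(String shardServer) {
    suspects.remove(shardServer);
    Runnable write;
    while ((write = pendingWrites.poll()) != null) {
      write.run();
    }
  }

  /** Writes are stored for retry instead of failing while the server is suspect. */
  public void queueWrite(Runnable write) {
    pendingWrites.add(write);
  }

  /** Only after the grace period does a controller authoritatively update the layout. */
  public boolean shouldUpdateLayout(String shardServer) {
    Long since = suspects.get(shardServer);
    return since != null && System.currentTimeMillis() - since >= graceMillis;
  }
}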
