Hey Doug, I think the sys/RS_STATS table is a good design for this data flow -- we might create a convenience API to make it easier for the balancer to grab the data.
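Something as thin as the following would do. This is only a sketch -- every name in it (RangeStats, StatsReader, ...) is made up, not an existing API:

#include <map>
#include <string>
#include <vector>

// Sketch of a convenience wrapper over a sys/RS_STATS scan.  Every name
// below is a placeholder -- none of this exists in the tree yet.

// One row of per-range statistics as persisted by the RangeServers.
struct RangeStats {
  std::string range_id;                    // identifies the range
  std::string server;                      // range server holding it
  std::map<std::string, double> metrics;   // e.g. "load_average" -> 0.75
};

// Hides the scanner details so a balancer can just ask for the data.
class StatsReader {
public:
  virtual ~StatsReader() { }

  // Scan sys/RS_STATS and return the latest stats for every range.  At
  // 3K servers x 10K ranges x ~1K per range that is ~30GB, so a real
  // implementation would want to push filtering/aggregation into the
  // scan rather than materialize everything in the Master (see Doug's
  // note below).
  virtual void fetch_latest(std::vector<RangeStats> &stats) = 0;

  // Restrict the scan to the rows for a single range server.
  virtual void fetch_for_server(const std::string &server,
                                std::vector<RangeStats> &stats) = 0;
};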
The balancer needs an API that allows us to collect state data and response / feedback data from the system (operational / performance metrics) so we can learn the relationships there. The API also needs to present controls to the balancer so that it can take actions to try and move the system toward states that are associated with good performance metrics. We'll make this more concrete soon ...
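As a very rough first cut at the shape of that API (all names are placeholders; receive_monitoring_data() is the method Doug mentions in his mail below):

#include <cstddef>
#include <string>
#include <vector>

// Sketch only: the feedback path and the control path of the balancer.
// Everything here is a placeholder except receive_monitoring_data(),
// which Doug proposes in his mail quoted below.

// A snapshot of operational/performance metrics for one range server,
// as delivered on the monitoring interval (e.g. every 30 seconds).
struct ServerMetrics {
  std::string server;
  size_t range_count;
  double load_average;
};

// A single control action: move a range between servers.
struct RangeMove {
  std::string range_id;
  std::string from_server;
  std::string to_server;
};

class LoadBalancer {
public:
  virtual ~LoadBalancer() { }

  // Feedback: periodic monitoring data pushed in by the Master; bulk
  // per-range state comes from sys/RS_STATS rather than this call.
  virtual void receive_monitoring_data(
      const std::vector<ServerMetrics> &snapshot) = 0;

  // Control: propose moves intended to drive the system toward states
  // associated with good performance metrics.
  virtual void balance(std::vector<RangeMove> &plan) = 0;
};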
On Dec 24, 7:27 am, Doug Judd <[email protected]> wrote:
> It's a little late, sorry. So, correct me if I'm wrong: the training data is just another name for the feedback that is collected. The training data would come from two places, the sys/RS_STATS table and an additional LoadBalancer method, receive_monitoring_data(), which is how it would be fed the monitoring data. I suspect that's the missing API that Charles was referring to.
>
> One thing to keep in mind is that the reason we chose to persist the per-range state in the sys/RS_STATS table is so that the system would be designed for scale. When doing back-of-the-envelope calculations, I use 3K range servers, 10K ranges per server, and approximately 1K worth of data per range. This comes out to about 30GB (3,000 x 10,000 x 1KB), which makes it infeasible to pull the data over to the Master and feed it into the LoadBalancer. That's why we've introduced the sys/RS_STATS table.
>
> - Doug
>
> On Thu, Dec 23, 2010 at 10:04 PM, Doug Judd <[email protected]> wrote:
> > I think my confusion came from the Wikipedia definitions for supervised <http://en.wikipedia.org/wiki/Supervised_learning> vs. unsupervised <http://en.wikipedia.org/wiki/Unsupervised_learning> learning. The supervised learning page is the only one that discusses the use of training data.
> >
> > The descriptions you give for online vs. batch sound very similar: get feedback, take action, get feedback, take action... From a system infrastructure standpoint, is there any difference? Does the traditional connotation of the word "batch" come into play at all? For example, does a batch system accumulate a large "batch" of feedback that is processed at once, whereas an online system processes feedback more continuously?
> >
> > On Thu, Dec 23, 2010 at 7:52 PM, Gordon <[email protected]> wrote:
> >> These are great comments -- Doug, on the learning problem, I think you are referring to online learning versus batch learning. In an online learning setting we would get system state information, take actions, and then get feedback from the system to learn better policies or better estimates for the value of actions for any given state.
> >>
> >> In a batch learning setting we could accumulate training data in the form of system state and system performance as feedback -- then the balancer takes actions to try and move the system to states that have experienced good system performance in the past.
> >>
> >> In each of these cases we have a notion of collecting feedback (i.e. system performance) associated with system state or directly from our actions as we go about the various learning tasks, so it's supervised learning in both cases.
> >>
> >> On Fri, Dec 24, 2010 at 1:46 AM, Doug Judd <[email protected]> wrote:
> >>> Hi Charles,
> >>>
> >>> Thanks for the feedback. Comments inline ...
> >>>
> >>> On Thu, Dec 23, 2010 at 8:12 AM, Charles <[email protected]> wrote:
> >>>> [...]
> >>>>
> >>>> 1. Although the design document talks about passing training data to the LoadBalancer object, glancing briefly at the pseudocode for the class definition it's not clear to me what the API for passing the training data is or what the format would be.
> >>>
> >>> We were thinking more along the lines of unsupervised learning. We certainly could explore a supervised learning approach. Any ideas for what the training data should look like? Also, how could we go about generating it?
> >>>
> >>>> 2. I'm assuming that the LoadBalancer has full access to the items in the data table itself during the balance operation, in case it wants to collect data about them for use in the prediction. Not clear if this would be useful b/c of the cost involved in gathering this data, but interesting to explore.
> >>>
> >>> When you say "data table" I assume you're referring to the sys/RS_STATS table. The LoadBalancer will have full access to the items in this table. In fact, this table exists solely to feed the LoadBalancer performance statistics. This table is populated by the RangeServers directly, to minimize the impact of statistics gathering on the system. This means that the load balancer and the RangeServers will need to coordinate on what information gets collected, how often, and how much historical information will be kept around.
> >>>
> >>>> 3. I would expect that practical implementations of LoadBalancer would want a way to serialise their nontrivial state (presumably using HT itself), but I'm not sure if there's any special API support required for that. (Maybe a reserved table for LB data?)
> >>>
> >>> The LoadBalancer runs in the Master process (at least for now). The Master has a meta log (MML) that is written to the underlying DFS. Currently the plan is to have the basic balancer serialize state about in-progress balance operations in the MML. We can use the MML to persist other LoadBalancer state as well. However, the MML is designed to hold a very small amount of data. If you think there might be a need for persisting a very large amount of state, we can consider another system table (sys/balancer).
> >>>
> >>>> 4. It may be worth providing a convenience implementation of LoadBalancer that works in the batch setting like the basic algorithm, i.e., a superclass for load balancers that want to operate once a day based on data that has been collected in the last 24 hours.
> >>>
> >>> Sounds good.
> >>>
> >>>> 5. A LoadBalancer might want to use different strategies for the cases of adding a new range server versus high variance among servers. Is there a way for the master to signal which of these situations is the case?
> >>>
> >>> The monitoring data that gets fed into the LoadBalancer on a regular interval (e.g. 30 seconds) will contain range server and range count information for all of the range servers. If a new range server suddenly appears with a range count of zero, that would imply that a new server was added.
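Agreed -- and the balancer can draw that inference itself from consecutive snapshots. A sketch, reusing the hypothetical ServerMetrics struct from my sketch above (trimmed to what we need; none of this is existing API):

#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Sketch: infer "a new range server was added" from two consecutive
// monitoring snapshots.
struct ServerMetrics {
  std::string server;
  size_t range_count;
};

// Returns servers that appear in the current snapshot with no ranges
// and were absent from the previous snapshot.
std::vector<std::string>
new_empty_servers(const std::vector<ServerMetrics> &prev,
                  const std::vector<ServerMetrics> &curr) {
  std::set<std::string> seen;
  for (size_t i = 0; i < prev.size(); i++)
    seen.insert(prev[i].server);

  std::vector<std::string> added;
  for (size_t i = 0; i < curr.size(); i++) {
    // A server we have not seen before, holding zero ranges, is
    // almost certainly a freshly added server.
    if (seen.count(curr[i].server) == 0 && curr[i].range_count == 0)
      added.push_back(curr[i].server);
  }
  return added;
}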
> >>>> 6. Of course an effective challenge problem would also require a test workload that is challenging enough to be representative of real usages of the load balancer. As close to real usage as possible would be best, to try to forestall the danger of designing ML algorithms that are strong enough to learn the features of the synthetic problem generator but not those of real data.
> >>>
> >>> We'll work on pulling some real-world workload together. The realtime Twitter stream sample <http://dev.twitter.com/pages/streaming_api> might be a good place to start.
> >>>
> >>>> 7. It is unclear what the optimal granularity for aggregating the range counts would be (it could be less than 30 sec, or more). Might want to make this a settable parameter of the master. Note that this is orthogonal to how often the master decides to send data to the load balancer; e.g., the master could send data every thirty seconds consisting of 6 bins of counts recorded every 5 sec.
> >>>
> >>> The LoadBalancer API has a method for publishing the stats gathering interval, so balancers would have a way to change it. There's not much cost associated with sending data to the LoadBalancer, so I don't think the vector approach would be necessary. We empirically chose 30 seconds as the default because there is some overhead involved in gathering statistics, including network communication with all of the range servers and mutex locking/unlocking for each range managed by each server.
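Makes sense on the default. For what it's worth, if a balancer ever does want the finer granularity from (7), the binning is cheap to keep on the balancer's side of the API -- a sketch, again with made-up names, for e.g. six 5-second bins per 30-second interval:

#include <cstddef>
#include <vector>

// Sketch of the binned-counts idea from item 7: accumulate counts in
// fixed-width bins (e.g. 5 seconds) and inspect them six at a time per
// 30-second interval.  Hypothetical class, not existing API.
class BinnedCounter {
public:
  BinnedCounter(size_t num_bins) : bins_(num_bins, 0), current_(0) { }

  // Record one event (e.g. one request against a range) in the
  // currently open bin.
  void record() { bins_[current_]++; }

  // Called every bin width (e.g. every 5 seconds): close the current
  // bin and open the next one, wrapping around the ring.
  void advance() {
    current_ = (current_ + 1) % bins_.size();
    bins_[current_] = 0;
  }

  // The full window (e.g. 6 bins = 30 seconds at 5-second width),
  // which could be shipped instead of a single aggregate count.
  const std::vector<unsigned long> &window() const { return bins_; }

private:
  std::vector<unsigned long> bins_;
  size_t current_;
};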
> >>>> 8. Wrt the objective functions, there are different performance metrics that are of interest to the user, and the user might want to have knobs to say (e.g.) exactly what SLA they would like satisfied. But it's not clear to me whether this is part of load balancing (i.e., deciding which ranges are served by which server) or auto-scaling (deciding how many servers to have). It may be too early to lock down an API on this without having more experience with practical SML/Optimization load balancers.
> >>>
> >>> We were thinking that "load average" would be a good overall objective performance metric. But now that you mention it, I suppose optimizing for query latency or overall throughput might yield a different balance. Adding this sort of user input is trivial to do.
> >>>
> >>> On a related note, there are a number of other places in the system that could benefit from machine learning. Query cache size is one that comes to mind. Currently the query cache size is statically configured, with a default size of 50MB. It would be great to learn the optimal size for this cache to improve query latency.
> >>>
> >>> - Doug
> >>
> >> --
> >> Gordon Rios -- Cork Constraint Computation Centre
> >> http://www.4c.ucc.ie/web/people.jsp?id=144
> >> http://www.linkedin.com/in/gordonrios
> >> Ireland: +353 86 089 2416
> >> USA: +1 650 906 3473
