Hello Doug, others,

Gordon pointed me to the design document. This is very interesting.
A few comments below.

1. Although the design document talks about passing training data to
the LoadBalancer object, glancing briefly at the pseudocode for the
class definition it's not clear to me what the API for passing the
training data is or what the format would be.

2. I'm assuming that the LoadBalancer has full access to the items in
the data table itself during the balance operation, in case it wants
to collect data about them for use in the prediction.  Not clear if
this would be useful b/c of the cost involved in gathering this data,
but interesting to explore.

3. I would expect that practical implementations of LoadBalancer would
want a way to serialise their nontrivial state (presumably using HT
itself), but not sure if there's any special API support required for
that.  (Maybe a reserved table for LB data?)

4. It may be worth providing a convenience implementation of
LoadBalancer that works in the batch setting like the basic algorithm,
i.e., a superclass for load balancers that want to operate once a day
based on data that has been collected in the last 24 hours.

5. A LoadBalancer might want to use different strategies for the cases
of adding a new range server versus high variance among servers.  Is
there a way for the master to signal which of these situations is the
case?

6. Of course an effective challenge problem would also require a test
workload that is challenging enough to be representative of real
usages of the load balancer.  As close to real usage as possible would
be best, to try to forestall the danger of designing ML algorithms
that are strong enough to learn the features of the synthetic problem
generator but not those of real data.

7. It is unclear what the optimal granularity for aggregating the
range counts would be (could be less than 30 sec, or more).  Might
want to make this a settable parameter of the master.  Note that this
is orthogonal to how often the master decides to send data to the load
balancer, e.g., the master could send a message every 30 sec that
contains 6 bins of counts, each covering 5 sec.

8. Wrt the objective functions, different performance metrics may be
of interest to the user, and the user might want to have knobs to say
(e.g.) exactly what SLA they would like satisfied.  But
it's not clear to me whether this is part of load balancing (i.e.,
deciding which ranges are served by which server) or auto-scaling
(deciding how many servers to have).  It may be too early to lock down
an API on this without having more experience with practical
SML/Optimization load balancers.
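
To make points 4, 5 and 7 a bit more concrete, here is a rough Python
sketch of how they might fit together.  Every name in it
(BalanceReason, RangeStats, BatchLoadBalancer, MoveHottest,
bin_seconds, the "rs-new" server id) is hypothetical and not taken
from the design document, and the choice to plan immediately when a
new server is added is my own assumption, not a recommendation.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class BalanceReason(Enum):
    """Hypothetical signal from the master saying why it is calling (point 5)."""
    NEW_SERVER = "new_server"
    HIGH_VARIANCE = "high_variance"


@dataclass
class RangeStats:
    """Hypothetical stats message: per-range counts in fixed-width bins (point 7)."""
    bin_seconds: int              # aggregation granularity, set on the master
    counts: Dict[str, List[int]]  # range id -> one count per bin, oldest first

    def span_seconds(self) -> int:
        """Wall-clock time covered by this one message."""
        n_bins = max((len(c) for c in self.counts.values()), default=0)
        return n_bins * self.bin_seconds


class BatchLoadBalancer(ABC):
    """Hypothetical convenience superclass (point 4): buffers incoming stats
    and calls plan() only once a full window of data has accumulated."""

    def __init__(self, window_seconds: int = 24 * 60 * 60):
        self.window_seconds = window_seconds
        self._buffer: List[RangeStats] = []
        self._buffered_seconds = 0

    def receive(self, stats: RangeStats, reason: BalanceReason) -> Dict[str, str]:
        """Called by the master at its own send interval. Returns a mapping
        of range id -> target server, or {} if nothing is planned yet."""
        self._buffer.append(stats)
        self._buffered_seconds += stats.span_seconds()
        # Assumption: a new server is handled at once; otherwise wait for a full window.
        window_full = self._buffered_seconds >= self.window_seconds
        if reason is BalanceReason.NEW_SERVER or window_full:
            batch, self._buffer, self._buffered_seconds = self._buffer, [], 0
            return self.plan(batch, reason)
        return {}

    @abstractmethod
    def plan(self, batch: List[RangeStats], reason: BalanceReason) -> Dict[str, str]:
        """Subclasses decide which ranges move to which server."""


class MoveHottest(BatchLoadBalancer):
    """Toy subclass: moves the single busiest range to a made-up server id."""

    def plan(self, batch, reason):
        totals: Dict[str, int] = {}
        for stats in batch:
            for range_id, counts in stats.counts.items():
                totals[range_id] = totals.get(range_id, 0) + sum(counts)
        if not totals:
            return {}
        hottest = max(totals, key=totals.get)
        return {hottest: "rs-new"}
```

Here the send interval and the bin width are independent: a message
with bin_seconds=5 and six counts per range covers 30 sec, and the
same message type would work unchanged with 1 sec or 60 sec bins.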

Best wishes

Charles

--
Charles Sutton * [email protected] * http://homepages.inf.ed.ac.uk/csutton
Lecturer * School of Informatics * University of Edinburgh

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
