Re: sampling error

Donal Murtagh Fri, 13 Feb 2004 13:02:03 -0800

Hi Jay,

Thanks for your comments. I largely agree with what you're saying, but
the focus of my study is really not the load on the network, but
rather, the performance of the sampling algorithms. To put it another
way, I don't really care about the network per se, I'm just using this
data as a means of evaluating the adaptive sampling algorithms.


I probably should have pointed this out...

Regards,
Dónal

 --- Jay Warner <[EMAIL PROTECTED]> wrote:
> May I stick in $0.02 worth? (And to all those who figure that is what
> it is worth, I say, "So be it!"  :)
> 
> You are measuring the 'cumulative byte count' as an indicator of 'load
> on the machine.'  So far, so good.  Can I surmise that at some point
> you (or a manager) are going to stand up at a meeting and say, "the
> current load is xxx, we expect it to increase to yyy in 6 months.
> Therefore, we recommend...."
> 
> My point is, you are measuring load now, with an eye to _predicting_
> load at some future point.  You may include specific conditions in the
> network that you find influence the load, but my point is the same.
> 
> Your measurements taken today are _estimates_ of the load under a
> specific set of conditions.  As soon as you predict the load at a
> different time (i.e., an as yet unmeasured point) then your 'specific
> set of conditions' must be _defined_ to include the conditions you
> used now, and those you will use in future.  A de facto definition
> perhaps, but a definition, nonetheless.
> 
> What is the mean load under those conditions?  Your measurements at
> any moment will be near, but slightly different from, that mean.  They
> will be different because of 'minor' variations in the conditions,
> minor variations in the source of the load, etc.  Not because the
> measurement is imprecise, as you point out.
> 
> I don't feel your proposed method of obtaining an 'error' estimate
> will get you home.  More likely, you would do well to measure the load
> for multiple periods in a short time, and work out the est. standard
> deviation from that.  This could serve as your 'process capability,'
> as it were - an indication of how madly the load fluctuates over a
> short period of the day.
> 
> I dare say that your network load is, on average, different at 9:30 am
> local time than at 12:30 pm, or at 11:00 pm or 3 am.  It will possibly
> be different on Wednesday than on Sunday.
> 
> If you obtain measurements of load over a long period of time, say a
> week or two, then the standard deviation would be the equivalent of
> 'product variability.'  This will clearly be larger than the 'process
> capability' or short term measure of variation.  It would indicate the
> variation in load that a user could expect, whenever they try to do
> work on the network.  Clearly, your selection of times to measure load
> will influence the significance of the stdev to the user.  A system
> that reports an uptime of 98.5% does not tell a user much of interest,
> when the user only is involved for 8 hours a day and the machine idles
> for 16 hours.
> 
> Help any?
> Jay
> 
> Don wrote:
> 
> >Greetings,
> >
> >I'm involved in a research project which measures the load on a
> >(computer) network. The response variable is the cumulative byte
> >count, which is measured at various times (which are determined by an
> >adaptive sampling technique).
> >
> >The measurements taken at these times are assumed to be accurate, so
> >I am using the following technique to judge the accuracy of the
> >sampling:
> >
> >Assuming we measure the cumulative byte count after 10s and 20s, and
> >record 100kb, and 200kb respectively....
> >
> >1. Linearly interpolate between these 2 points to get
> >
> >11s - 110kb
> >12s - 120kb
> >...
> >
> >2. Calculate the difference between these interpolated values and
> >the actual values at 11s,12s,...
> >
> >3. Use RMSE, SSE, or similar to get an overall measure of error
> >
> >
> >The obvious question is "How do you know what the actual value is at
> >11s, 12s,...?"
> >The answer is that I am using an off-line data set, rather than
> >doing the experiment in real-time to test the sampling algorithm.
> >
> >Anyway, my question is: how valid is this method of assessing the
> >accuracy of the sampling technique given that there is no estimate
> >of "pure error" at the sample points?
> >
> >Thanks in Advance,
> >Dónal
> >.
> >.
> >=================================================================
> >Instructions for joining and leaving this list, remarks about the
> >problem of INAPPROPRIATE MESSAGES, and archives are available at:
> >.                  http://jse.stat.ncsu.edu/                    .
> >=================================================================
> >
> >
> >  
> >
> 
> -- 
> Jay Warner
> Principal Scientist
> Warner Consulting, Inc.
> 4444 North Green Bay Road
> Racine, WI 53404-1216
> USA
> 
> Ph:   (262) 634-9100
> FAX:  (262) 681-1133
> email:        [EMAIL PROTECTED]
> web:  http://www.a2q.com
> 
> The A2Q Method (tm) -- What do you want to improve today?
> 
> 
> 
>  
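For reference, the check described in the original post (interpolate
linearly between the sampled points, compare against the off-line
values, summarise with RMSE) could be sketched in Python roughly as
follows. The function names, the dict-based trace layout and the byte
counts between 10s and 20s are made up for illustration; only the
100kb and 200kb endpoints come from the post.

```python
import math
from bisect import bisect_right


def interpolate(sample_t, sample_y, t):
    """Linearly interpolate the cumulative byte count at time t."""
    # Locate the sampled interval [t0, t1] that contains t.
    i = bisect_right(sample_t, t) - 1
    i = max(0, min(i, len(sample_t) - 2))
    t0, t1 = sample_t[i], sample_t[i + 1]
    y0, y1 = sample_y[i], sample_y[i + 1]
    return y0 + (y1 - y0) * (t - t0) / (t1 - t0)


def interpolation_rmse(trace, sample_times):
    """RMSE of the interpolated values against the off-line trace.

    trace: dict mapping time (s) to the actual cumulative byte count (kb).
    sample_times: times picked by the sampling algorithm (hypothetical input).
    """
    sampled = set(sample_times)
    sample_t = sorted(sampled)
    if len(sample_t) < 2:
        raise ValueError("need at least two sample points to interpolate")
    sample_y = [trace[t] for t in sample_t]
    # Only unsampled points inside the sampled range contribute an error.
    errors = [
        interpolate(sample_t, sample_y, t) - actual
        for t, actual in trace.items()
        if sample_t[0] <= t <= sample_t[-1] and t not in sampled
    ]
    if not errors:
        raise ValueError("no unsampled points to compare against")
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Made-up off-line trace: time (s) -> cumulative byte count (kb).
trace = {10: 100, 11: 112, 12: 125, 13: 131, 14: 140, 15: 150,
         16: 158, 17: 171, 18: 180, 19: 189, 20: 200}
print(interpolation_rmse(trace, [10, 20]))  # 2.0 for this made-up trace
```

The sampled points themselves are left out of the average here, since
their interpolation error is zero by construction. That is a choice
made for this sketch; the post does not say either way.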


        
        
                