Hi Steve!

I would ask on the list which behavior is used most often and decide then... We 
could even just take out the "short"/negative functionality altogether.

But in principle I would keep 0 as a special case, which should probably 
return nan.

As for the hw rra: my concern is that with a hw rra you can only select single 
data sources, and then you have to wait for some time to get the data - you 
even have to modify the rrd file to add it. IMO that does not make it very 
practical or intuitive.

The cdef approach instead gives an immediate response when you want to see 
what a prediction looks like and whether it is sensible to apply it. That is 
especially true if you have 500k data points available and only rarely look at 
them with prediction - then putting the cost into the rendering is cheaper, 
from a CPU and disk perspective, than running hw every 5 minutes for 500k data 
points...

Also I have to admit I have never looked at hw in more detail because of those 
limitations, plus the fact that I do not fully understand the mathematics and 
the reasoning behind it - or how to make use of the data and display the 
certainty/uncertainty values.

That is why I started implementing the predict part, which we had running 
before outside of rrd as a separate script creating/filling a prediction rrd 
file. That had the one advantage that the predicted values did not change when 
switching between resolutions, but again at the cost of disk space and 
computations done every 5 minutes (but only for a subset of data).

As for rol, ror: I am not sure what exactly you would like to do, but if it is 
just shifting the data by x seconds into the future/past, then it is possible. 
But even then, doing the shift + window average functionality would probably 
take 146 RPN arguments (48 (=1800/300*8) times "x,shift,ror" with different 
shifts, i.e. 144 arguments, plus "48,average") to achieve the same thing as 
"86400,-8,1800,x,PREDICT" - so even less efficient...
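
To make the argument count concrete, here is a small Python sketch that builds 
both token lists and counts them. The "ror" and "average" operators are the 
hypothetical ones from this thread (not existing RPN operators), and the 
choice of shifts (1 to 8 days back, one extra shift per 300 second step inside 
the window) is only my assumption of how such an expansion would look.

```python
# Sketch: count RPN tokens for the two formulations discussed in this thread.
# "ror" and "average" are hypothetical operators, not existing RRDtool RPN.

STEP = 300      # seconds per data point (assumed RRA step)
WINDOW = 1800   # averaging window in seconds
DAYS = 8        # number of daily shifts (assumed: 1..8 days back)
DAY = 86400


def ror_expression():
    """Emulate shift + window average with one 'x,shift,ror' per sample."""
    tokens = []
    for day in range(1, DAYS + 1):
        for k in range(WINDOW // STEP):
            shift = day * DAY + k * STEP
            tokens += ["x", str(shift), "ror"]
    count = DAYS * (WINDOW // STEP)
    tokens += [str(count), "average"]
    return tokens


def predict_expression():
    """The compact form using the negative step count."""
    return [str(DAY), str(-DAYS), str(WINDOW), "x", "PREDICT"]


ror_tokens = ror_expression()
predict_tokens = predict_expression()
print(len(ror_tokens), len(predict_tokens))
```

The first list has 48 * 3 + 2 tokens, the second one has 5.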

But in general, this would make it easy to compare a week's traffic. Then 
again - if I remember correctly - you can achieve the same thing with def, 
where you can shift the data, but that would be even more complex in terms of 
the number of arguments to rrd graph... Also you can use rol/ror on cdefs and 
not only on the raw data, so you can add up defs first and only then do the 
shifts...

As an afterthought: the approach of using step count=1 does not make much sense 
besides the direct implementation of rol/ror (the way I understand it) with or 
without some averaging. 

So your "86400,x,ror" can be written as "86400,1,300,x,PREDICT" (not sure about 
the 300 for the window, you might need to replace it with 1); sigma will give 
nan here! 
Similarly, a running average over the past hour would be: 
0,1,3600,x,PREDICT
(at least if I remember correctly how the calculations are done)
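
A rough Python model of those two cases, in case it helps to check the idea. 
This is a simplified sketch and not the rrdtool code: window_values, average 
and sigma are made-up helper names, the window is assumed to end at t - shift, 
and sigma is assumed to be a sample standard deviation (which is undefined for 
a single value, hence the nan).

```python
import math


def window_values(series, step, shifts, window, i):
    """Collect the known values of every window ending `shift` seconds before index i."""
    n_window = max(1, window // step)
    values = []
    for shift in shifts:
        end = i - shift // step
        for k in range(n_window):
            j = end - k
            if 0 <= j < len(series) and not math.isnan(series[j]):
                values.append(series[j])
    return values


def average(values):
    return sum(values) / len(values) if values else math.nan


def sigma(values):
    # Assumption: sample standard deviation, nan with fewer than 2 values.
    if len(values) < 2:
        return math.nan
    mean = average(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))


# Toy series with a 300 second step: value == index.
series = [float(v) for v in range(20)]

# Pure shift (one shift, window of one step): behaves like the "ror" case.
shifted = window_values(series, 300, [900], 300, 10)

# Running average over the past hour: shift 0, window 3600.
running = window_values(series, 300, [0], 3600, 15)

print(average(shifted), sigma(shifted))
print(average(running))
```

With a single shift and a window of one step only one value is collected, so 
the average is just the shifted value and sigma has nothing to work with.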

So it is very flexible in what you can really do with it, if you get creative - 
maybe we should add these to the documentation after verifying that they work 
as expected...

We might also create some RPN aliases for those to shorten the arguments 
needed and make them easier to read...

When you have decided on the final format of the negative step count approach 
for predict, I might also create a patch to do the percentile calculations.

But one concern I am starting to have is what happens if you apply that to 
data of much lower resolution - say when graphing a year with 
percentile-predictions. Then the numbers will change dramatically when you 
switch resolutions, as you will no longer have 48 data points from which to 
calculate your percentile (at 2% resolution) but say only 8 values (at 14% 
resolution) - depending on the exact rra definitions.
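
The 2% and 14% figures can be reproduced with a few lines of Python. This 
assumes that "resolution" means the spacing between the percentile levels that 
n sorted samples can represent, i.e. 100/(n-1); percentile_levels is just an 
illustrative helper name.

```python
def percentile_levels(n):
    """Percentile levels (in percent) that n sorted samples can represent exactly."""
    return [100.0 * k / (n - 1) for k in range(n)]


for n in (48, 8):
    levels = percentile_levels(n)
    step = levels[1] - levels[0]
    print(n, "samples -> one step every", round(step, 1), "percent")
```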

Also, from a mathematical perspective, I am not sure whether the averaging 
done by the rrdtool consolidation function and a percentile computation on top 
of that really play well together and give sensible results... 
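
A toy example of the effect I have in mind, as a Python sketch. The data is 
made up (48 samples with three spikes), the consolidation is a plain AVERAGE 
over groups of 6, and nearest_rank_percentile is only one of several possible 
percentile definitions, not necessarily what a patch would use.

```python
def nearest_rank_percentile(values, pct):
    """Pick the sorted sample closest to the requested percentile rank."""
    ordered = sorted(values)
    idx = round(pct / 100.0 * (len(ordered) - 1))
    return ordered[idx]


def consolidate_average(values, factor):
    """Average consecutive groups of `factor` samples, like an AVERAGE rra."""
    return [
        sum(values[i:i + factor]) / factor
        for i in range(0, len(values), factor)
    ]


# Made-up high resolution data: flat at 10 with three short spikes to 100.
raw = [10.0] * 48
for spike in (5, 20, 41):
    raw[spike] = 100.0

coarse = consolidate_average(raw, 6)   # 8 consolidated values

p95_raw = nearest_rank_percentile(raw, 95)
p95_coarse = nearest_rank_percentile(coarse, 95)
print(p95_raw, p95_coarse)
```

On the raw data the 95th percentile lands on a spike, on the consolidated data 
it lands on a spike that has already been averaged down, so the two graphs 
would show very different "95%" values for the same period.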

Somehow I fear that this could result in misinterpretations of data by people 
not too deeply trained in statistics - and even I myself do not seem to be well 
enough educated to say if there is a risk... (Besides a hunch that this could 
produce unexpected artifacts...)

But if you look short term at highest resolutions it should not be an issue...

Ciao, Martin

> On 26.04.2014, at 23:27, Steve Shipway <s.ship...@auckland.ac.nz> wrote:
> 
> Thanks for the clarification... I'm still not sure that I think the 0-shift 
> should be included when using a negative count but that's likely personal 
> preference and, as you say, the explicit list is always available anyway.
> 
> Maybe it should also allow the use of 0 to mean a single 0-shift -- ie
>  0,x,PREDICT 
> to be the same as 
>  0,1,x,PREDICT
> 
> Of course, currently with the negative setup  not including n, this is also 
> the same as
>  s,-1,x,PREDICT 
> for any value of s, which is a bit pointless and the main reason I felt 
> something was amiss.
> 
> I have a paragraph written up on this behaviour for the RRDTool book I'm 
> working on (yes! it lives! like a zombie it pulls itself from the 2-year-old 
> grave...) so I'll email this to Tobi for inclusion in the online manual if he 
> wants.
> 
> As for a predict_median and predict_percentile -- as you say, they could be 
> very CPU-expensive, and also if you're going that far it's likely that you'd 
> just set up a few HW RRAs to use instead.  Still, it never hurts to have more 
> tools in your toolbox...
> 
> If we're adding new operations to the RPN, my choice would be for ROL and ROR 
> (rotate top 3 stack items) and some date calculations (given an epoch time, 
> extract day of week, hour of day, etc in local timezone).  Maybe I'll 
> download the latest dev snapshot and see what I can do.
> 
> Steve
> 
> Steve Shipway
> University of Auckland ITS
> UNIX Systems Design Lead
> s.ship...@auckland.ac.nz
> Ph: +64 9 373 7599 ext 86487

_______________________________________________
rrd-developers mailing list
rrd-developers@lists.oetiker.ch
https://lists.oetiker.ch/cgi-bin/listinfo/rrd-developers
