Hi Brian, Thanks for looking into this matter.
Yes, my comments still stand. Using an approximate sampling for the tail of the base layer (or strip) should not be muddled with the pre-calculations of the Ziggurat layers. In fact, I would have far less objection if the pre-calculations were done with the exact erfc function, while keeping the current approximate algorithm for the tail generation intact. Actually, this is what I was suggesting in my original email. For the following discussion, let us define PARAM_R_APPROX = 3.44428647676 // current approx value PARAM_R_EXACT = 3.442619855899 // exact value A. With PARAM_R_APPROX, we are actually approximating the whole Gaussian distribution with an approx distribution that has the same shape as Guassian up to PARAM_R_APPROX, and then an exponential tail afterward. This approximate distribution does not match the original Gaussian between 0 and PARAM_R_APPROX. They differ by a constant factor. This factor is close to but not equal to 1.0. B. With PARAM_R_EXACT and the approx exponential tail algorithm, we make no approximation to the Gaussian distribution between 0 and PARAM_R_EXACT. But we are approximating the conditional (the condition being that the random variable is greater that PARAM_R_EXACT) distribution of the tail to the shape of an exponential. (BTW, this approximation implies a discontinuity of the density function at PARAM_R_EXACT, but that is beside the point) Under this light, it is clear that alternative A is worse than B. From a conceptual standpoint, it is much worse because we are sacrificing the accuracy of the whole distribution just for the tail approximation. From a numerical standpoint, it is not clear how large is the impact to user programs. My guess is not a whole lot with the current parameters (ie 128 layers), but I wouldn't vouch for that. Yet, I believe it is to the best interest of the GSL community to have this fixed, in the spirit of keeping Open Source software the highest quality through collaboration. This kind of discrepancy is virtually impossible to spot for Closed Source software. Regards. Joseph Pang Brian Gough wrote: At Fri, 2 May 2008 10:22:40 -0700 (PDT), Joseph Pang wrote: The tables in the above file, implementing the Ziggurat method for Gaussian, carry inaccurate numerical values. The problem is most likely due to the inaccurate value of PARAM_R, which is defined as 3.44428647676. The accurate value should be 3.442619855899. Even though the difference seems small, it affects all the values in all the tables to various degree. The extent of this impact to the accuracy of user programs is not known. By the way, the accurate value did appear in the original paper by Marsaglia and Tsang. The reason for the inaccurate values is due to the use of erfc approximation in the original C program that generates the Ziggurat tables ( I dug up that C program somewhere from the Web). It is easy to fix by using the accurate erfc function in this program to re-generate the tabulated values. Hello, Thanks for your email. In the gausszig.c file there was a note which says * 2) use an acceptance sampling from an exponential wedge * exp(-R*(x-R/2)) for the tail of the base strip to simplify the * implementation. The area of exponential wedge is used in * calculating 'v' and the coefficients in ziggurat table, so the * coefficients differ slightly from those in the Marsaglia and * Tsang paper. Do your comments still stand after taking this into account? _______________________________________________ Bug-gsl mailing list [email protected] http://lists.gnu.org/mailman/listinfo/bug-gsl
