Hi Leonard, On Wednesday, 2006-11-15 03:21:48 +0200, Leonard Mada wrote:
> 1. I have corrected some issues/bugs in the initial ANOVA code Please note that before we can integrate code or data contributed we need a signed Joint Copyright Assignment form (JCA) filled-out, see http://contributing.openoffice.org/programming.html#jca > // CALCULATING ANOVA > // (C) LEONARD MADA (2006) > // > // THIS IS FREE SOFTWARE > // > // THIS SOFTWARE IS RELEASED UNDER BOTH > // THE GNU GPL AS WELL AS THE LGPL Note that when contributing code under the JCA such headers are without effect and would be removed. Furthermore, this isn't Fortran ;-) and it would be much more eye-friendly if you fixed your CapsLock key and refrained from using only capitalized letters in comments. Please use normal capitalization instead. Thanks. > void ScInterpreter::ScANOVA() > { > // WE GET EITHER A SINGLE MATRIX WHERE EVERY COLUMN IS A SEPARATE > VARIABLE > // DISADVANTAGE: ONLY ONE COLUMN PER VARIABLE > // OR MULTIPLE MATRICES, EACH MATRIX IS ONE VARIABLE > // DISADVANTAGE: > // CALC FUNCTIONS ACCEPT ONLY 30 PARAMS > // SO THERE ARE AT MOST 30 VARIABLES Not quite true. The UI in the Formula AutoPilot knows only 30 parameters at maximum, the compiler and interpreter actually can handle more. The theoretical maximum currently is 256 due to the BYTE variable used as parameter count. But designing the parameters is more a question of how other spreadsheet applications do it. We should follow that. For passing area references or arrays/matrices you may also want to take a look at other functions that handle multiple ranges like SUM() or AVERAGE(), implemented in interpr1.cxx method ScInterpreter::IterateParameters(). The number of elements in a matrix is limited, and obtaining a matrix from a parameter that is passed as an area reference is additional unnecessary overhead. > SCSIZE iVarNr = GetByte() /* NUMBER OF PARAMETERS */; > [...] > ScMatrixRef pMat[iVarNr]; // INITIALISE DATA-MATRICES > SCSIZE nR[iVarNr], nC[iVarNr]; // INITIALISE MATRIX DIMENSIONS This actually doesn't work. Automatically allocating variable-length arrays on the stack is a GCC extension and doesn't work with all compilers. Instead it should be ScMatrixRef pMat = new ScMatrixRef[iVarNr]; > SCSIZE nMAX = 0; // MAX DIMENSION OF ANY OF THE MATRICES > for(SCSIZE i= 0; i<iVarNr; i++) { > ScMatrixRef pMat[i] = GetMatrix(); > if (!pMat[i]) { > // NO DATA MATRIX - INVALID PARAMETERS > SetIllegalParameter(); Here we need to delete the new'ed pMat array before returning from the method. Or better, use std::auto_ptr from the beginnning, that does it automatically. > return; > [...] > if( nMAX == 1 ) { > SetNoValue(); Same here, delete new'ed memory. > double fSumX[iVarNr] ; // THIS STORES THE MEANS FOR THE INDIVIDUAL > VARIABLES > double fValX[iVarNr] [nMAX]; // THE VALUES Also this doesn't work, the arrays need to be dynamically allocated. I also don't see why we need fValX[iVarNr][nMAX] elements, where maybe most of it will be unused if only one matrix has nMAX elements. I think this can be much improved by storing just the needed elements. Furthermore, at least in this matrx scenario, to me it looks like copying the single matrix elements to fValX[][] is not necessary at all since the matrix has it all. However, if changed to area reference iteration this may be necessary to avoid iterating twice. But then taking care of the memory allocation footprint would be even more important. > fSumX[iCount] = fSumX[iCount] / jCount; // THIS IS THE MEAN > nR[iCount] = jCount; // STORE HOW MANY DATA VALUES EXIST FOR > THIS VAR > N += jCount; // ADD FOR GRAND TOTAL = TOTAL No OF DATA VALUES > jCount = 0; // RESET jCount FOR NEXT VARIABLE Btw, code is much more readable if you align the trailing comments (and use proper capitalization, of course ;-) fSumX[iCount] = fSumX[iCount] / jCount; // This is the mean. nR[iCount] = jCount; // Store how many data values exist for this var. N += jCount; // Add for grand total = total no of data values. jCount = 0; // Reset jCount for next variable. > void ScInterpreter::ScANOVAMono() > { > // WE GOT A SINGLE MATRIX WHERE EVERY COLUMN IS A SEPARATE VARIABLE > // DISADVANTAGE: ONLY ONE COLUMN PER VARIABLE > // BUT IT IS EASYER TO USE AND IT IS NOT LIMITED TO 30 VARIABLES I probably would avoid the duplicated implementation and make this a special case of the general ANOVA instead. However, also here it is more a matter of how other spreadsheet programs handle the parameters. Eike -- OOo/SO Calc core developer. Number formatter stricken i18n transpositionizer. OpenOffice.org Engineering at Sun: http://blogs.sun.com/GullFOSS Please don't send personal mail to the [EMAIL PROTECTED] account, which I use for mailing lists only and don't read from outside Sun. Thanks. --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]