I'm posting this in a separate thread so that it doesn't get mixed up with the one about migration cases.
Although many people are used to tools such as MS Excel, and often treat it as the "gold standard", in practice it is not a very reliable tool if you need precise and accurate results, let alone stable behavior in its calculations. For example, there are serious problems and errors in MS Excel's statistical functions that have survived since very old versions:

*Title:* On the Accuracy of Statistical Distributions in Microsoft Excel 2010
*Content:* Most of the errors in Microsoft Excel 97 and Excel 2003 pointed out in my previous papers have been eliminated in Excel 2010. But there are still too many deficiencies to be found in Excel 2010 and in my opinion one cannot yet say that Excel is a good and user-friendly program for scientific statistical purposes.
*Contact Information:* [email protected]
*URL:* http://www.csdassn.org/reportdetail.cfm?ID=1520

*Title:* Microsoft Excel 2000 and 2003 Faults, Problems, Workarounds and Fixes
*Content:* This project attempts to consolidate most of the criticisms, reported errors and faults in Excel's statistical applications, and to evaluate their claims; it also extends the analysis to additional functions and routines in both Excel 2000 and 2003 versions as of February 2005. More extensive testing was done on many functions and routines to discover regions of the parameter space where the functions and routines would return erroneous values, or return error codes. The other purpose is to describe workarounds and fixes that overcome these faults and deficiencies in Excel-2000. Problems, faults and errors that still remain in Excel-2003 are also discussed. If the problem in Excel-2000 has been fixed in Excel-2003, it will be discussed. If there is no explicit indication of a change in Excel-2003, then it can be assumed that the problem still occurs in Excel-2003.
In addition, some more intensive investigations were made on some routines and functions to identify “hidden” properties or computation limits. Further, some of Microsoft’s Knowledge Base Articles (KBA’s) relevant to the functions and routines are also evaluated.

The project comprises 13 main papers (sections), and 26 separate notes that expand on some issues or areas of concern:

Section 1: Introduction. A general review on the use of Excel in teaching introductory statistics and for general data analysis.
Section 2: General problems with Excel. Introduces ideas about the meaning and use of problem, fault, defect and error terms, as they relate to the user and to the programmer (software developer).
Section 3: Excel computation and display issues. Gives a road map on how things occur in Excel. Describes the IEEE-754 standard and its limitations. Differences between exact mathematical equation results and the implementation in Excel. Describes the difficulties of linguistic translations to Excel inputs. Describes the problems of using the display as criteria for accuracy.
Section 4: The testing program for accuracy. Describes the basic methods used to test Excel outputs for accuracy. The STRD data sets. Discusses issues regarding the ability to obtain test data sets and precise output values. In many cases there is no agreed on computational method. Describes accuracy rating methods.
Section 5: Univariate analysis. Tests on the 22 Excel univariate (or descriptive statistics) functions.
Section 6: Analysis of variance (ANOVA). Tests on the ANOVA routines.
Section 7: Covariance and correlation. Tests on the covariance and correlation functions.
Section 8: Linear and polynomial regression. Reviews the problems with Excel 2000 regression, the improvements in Excel 2003, and the remaining deficiencies.
Section 9: Nonlinear regression. Lists previous tests on non-linear equation fitting using Solver. Solver basic deficiencies are discussed.
Comparisons to other software products are made.
Section 10: Statistical distributions and related functions. A general description of the distribution functions available in Excel.
Section 11: Testing for accuracy and reliability of statistical distributions. Descriptions and methods on testing these distributions.
Section 12: Results of new tests on statistical distributions. Discrete, continuous density, continuous cumulative, and continuous inverse are covered in four subsections.
Section 13: Statistical tests, tests of significance and tests of a hypothesis. Tests on the t test, F test and Z test functions and routines. Discusses the problems and reported faults with these. Discusses the Behrens-Fisher problem and Excel’s implementation of a solution.
Section 14: Random number generation. Discusses the Excel random number generators for both 2000 and 2003 and gives the results of tests.
Section 15: Add-in packages. PHSTAT1, PHSTAT2, DDXL and MEGASTAT were evaluated. Only MEGASTAT is acceptable.
Section 16: Bibliography

There are 26 notes that expand on parts of the testing project. They are referred to in the sections. An expanded XLS file is included that gives worksheets on how to easily generate the charts commonly found in introductory statistics textbooks. The documents are available at www.daheiser.info
*Website:* http://www.daheiser.info
*Contact Information:* David Heiser, MS. Carmichael, California
*URL:* http://www.csdassn.org/reportdetail.cfm?ID=509

Interestingly, Gnumeric had similar problems, except that in its case the fix was quick and (for the most part) correct.

*Title:* Fixing Statistical Errors in Spreadsheet Software: The Cases of Gnumeric and Excel
*Content:* The open source spreadsheet package "Gnumeric" was such a good clone of Microsoft Excel that it even had errors in its statistical functions similar to those in Excel's statistical functions.
When apprised of the errors in v1.0.4, the developers of Gnumeric indicated that they would try to fix the errors. Indeed, Gnumeric v1.1.2 has largely fixed its flaws, while Microsoft has not fixed its errors through many successive versions. Persons who desire to use a spreadsheet package to perform statistical analyses are advised to use Gnumeric rather than Excel.
*Contact Information:* B. D. McCullough, Department of Decision Science, LeBow College of Business, Drexel University, Philadelphia, PA 19104
*URL:* http://www.csdassn.org/reportdetail.cfm?ID=508

Of course, for simple calculations MS Excel, LibreOffice Calc and Gnumeric are probably all OK, but if you need *correct results* in your calculations, Gnumeric turns out to be a good option (a far better one is R, but that is a topic for another occasion).

Bottom line: software is not good, nor a "gold standard", just because it is commercial, and it is not badly made or incorrect just because it is FLOSS, or vice versa :-)

I hope the confusion is clear (as a professor from my youth used to say).

Regards
-- 
Jesus M. Castagnetto <[email protected]>
Web: http://www.castagnetto.com/
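
P.S. To make the kind of problem discussed in these reports concrete, here is a minimal Python sketch of how an algebraically correct formula can give wrong answers in IEEE-754 double precision. It compares the textbook one-pass ("calculator") formula for the sample variance with the two-pass formula. The function names and the data set are illustrative assumptions, and this is not claimed to be the formula any particular Excel or Gnumeric version uses.

```python
def one_pass_variance(xs):
    """Sample variance via the 'calculator' formula:
    (sum(x^2) - (sum(x))^2 / n) / (n - 1).
    Algebraically exact, but it subtracts two huge, nearly equal numbers."""
    n = len(xs)
    total = 0.0
    total_sq = 0.0
    for x in xs:
        total += x
        total_sq += x * x
    return (total_sq - total * total / n) / (n - 1)


def two_pass_variance(xs):
    """Sample variance computed from deviations around the mean."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


# Illustrative data: small spread on top of a large offset.
# The deviations from the mean are -6, -3, 3, 6, so the exact
# sample variance is (36 + 9 + 9 + 36) / 3 = 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

print("one-pass:", one_pass_variance(data))  # far from 30
print("two-pass:", two_pass_variance(data))  # 30.0
```

The squares are around 1e18, well beyond the 2^53 limit up to which doubles represent integers exactly, so the one-pass version loses the small differences that actually determine the variance. The two-pass version avoids this by subtracting the mean before squaring.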
