Re: Project idea: Calc for Statistics
On Fri, Dec 14, 2012 at 8:55 AM, Andrew Douglas Pitonyak wrote:
>
> On 12/06/2012 12:12 PM, Rob Weir wrote:
>>
>> So two entirely different questions:
>>
>> 1) Improving the accuracy of the statistical (and other numerical)
>> methods we already have.
>>
>> 2) Extending the range of numerical methods we provide out-of-the-box
>
> My first thought when I read this was adding extended-precision interval
> arithmetic; now that would be fun :-)
>
>> I think #1 is a no-brainer, but it does require some expertise. The
>> hard part is determining whether we have improved. For most problems
>> we probably already get the same results as SPSS, R or other standard
>> statistical packages. To really make an improvement we need to test
>> the edge cases, the "poorly conditioned" and more complex cases.
>>
>> For #2, it probably makes sense to define a bridge to R. R is now
>> the standard and there are hundreds of libraries that extend the
>> environment. You can call R routines from SAS or SPSS. I just got
>> the new Mathematica 9 upgrade, and guess what? They've now added the
>> ability to call R. So some seamless way of calling R routines and
>> embedding R plots in Calc would be great.
>
> I considered upgrading Mathematica, but I am too busy to play around with it
> these days

I've played around a little. It now has some built-in functions for analyzing and graphing social networks, e.g., Facebook, Twitter. Not sure it is very useful, but perhaps a new software category of "mathertainment"...

> Surprised that they integrate with R. Not because R is a bad thing, just
> something I had not expected because Mathematica already does so much out of
> the box. Provides instant access to their huge repository of extra stuff.

So Mathematica out-of-the-box likely has as much as R has out-of-the-box. But the free 3rd-party packages on CRAN (over 5000 of them) are a big win for R. The ecosystem is as important (or more so) than the standalone app.

I think of it like Python -- IMHO the language itself is undistinguished, but the existence of libraries for every problem domain makes it my go-to tool for many problems.

I wonder if there is a lesson here? Imagine we magically made our templates and extensions repository 10x better (by some metric, not necessarily size). Or what about content repositories, e.g., clip art, form letters, etc. Our value proposition then becomes more about the strength of the ecosystem and less about basic editing features.

-Rob

> --
> Andrew Pitonyak
> My Macro Document: http://www.pitonyak.org/AndrewMacro.odt
> Info: http://www.pitonyak.org/oo.php
Re: Project idea: Calc for Statistics
On 12/06/2012 12:12 PM, Rob Weir wrote:
> So two entirely different questions:
>
> 1) Improving the accuracy of the statistical (and other numerical)
> methods we already have.
>
> 2) Extending the range of numerical methods we provide out-of-the-box

My first thought when I read this was adding extended-precision interval arithmetic; now that would be fun :-)

> I think #1 is a no-brainer, but it does require some expertise. The
> hard part is determining whether we have improved. For most problems
> we probably already get the same results as SPSS, R or other standard
> statistical packages. To really make an improvement we need to test
> the edge cases, the "poorly conditioned" and more complex cases.
>
> For #2, it probably makes sense to define a bridge to R. R is now
> the standard and there are hundreds of libraries that extend the
> environment. You can call R routines from SAS or SPSS. I just got
> the new Mathematica 9 upgrade, and guess what? They've now added the
> ability to call R. So some seamless way of calling R routines and
> embedding R plots in Calc would be great.

I considered upgrading Mathematica, but I am too busy to play around with it these days.

Surprised that they integrate with R. Not because R is a bad thing, just something I had not expected because Mathematica already does so much out of the box. Provides instant access to their huge repository of extra stuff.

--
Andrew Pitonyak
My Macro Document: http://www.pitonyak.org/AndrewMacro.odt
Info: http://www.pitonyak.org/oo.php
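[Editor's note: the interval-arithmetic aside above is easy to make concrete even at plain double precision. The toy Python class below is purely illustrative (nothing like it exists in Calc): it keeps a [lo, hi] pair and widens every result outward by one ulp, so the exact mathematical result of each operation on the endpoints is always enclosed.]

```python
import math

class Interval:
    """Toy closed interval [lo, hi]; every operation widens its result
    outward by one ulp so the exact value stays enclosed."""

    def __init__(self, lo, hi=None):
        self.lo = lo
        self.hi = lo if hi is None else hi

    @staticmethod
    def _outward(lo, hi):
        # Directed rounding emulated with nextafter (Python 3.9+).
        return Interval(math.nextafter(lo, -math.inf),
                        math.nextafter(hi, math.inf))

    def __add__(self, other):
        return Interval._outward(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        # Product interval endpoints come from the four corner products.
        p = [self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi]
        return Interval._outward(min(p), max(p))

    def __contains__(self, x):
        return self.lo <= x <= self.hi
```

For example, `Interval(0.1) + Interval(0.2)` yields an interval that contains 0.3 even though `0.1 + 0.2 != 0.3` in doubles. A real ("extended precision") version would need directed rounding for division and the elementary functions as well, which is where the fun Andrew mentions actually lies.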
Re: Project idea: Calc for Statistics
Hi Regina;

> From: Regina Henschel
>
> Hi Pedro,
>
> Pedro Giffuni schrieb:
>> Hi guys;
>>
>> FWIW, while I was playing with the new random number generator I went
>> around looking for some references and I found this paper from the Journal
>> of Statistical Software (2010) titled "On the Numerical Accuracy of
>> Spreadsheets":
>>
>> http://www.jstatsoft.org/v34/i04/paper
>>
>> It basically shows that Calc, among other spreadsheet programs, is not
>> really well suited for statistical analysis.
>
> They use an old version of Calc. In the meantime Calc has got a lot of
> accuracy improvements. And the new implementations in Excel 2010 are far more
> accurate than the old ones. The specific results of the paper are outdated. Of
> course the general problem of using spreadsheets for data exploration remains.

That's refreshing to know, thank you! The article linked by Tsutomu is somewhat more up to date and indeed mentions that Excel has been working hard in that field too. The list towards the end of your message is very interesting too. I will have a look .. when I find time.

>> Something rather amazing is that the major statistics suites have been moving
>> towards a more "spreadsheet-like" environment. I am personally a fan of
>> Minitab as it brings many functions that I needed for quality control in a
>> previous job. The price of the software package sky-rocketed in a few years
>> though :(.
>
> I'm not familiar with special statistical software. One problem with Calc is
> that users do not know how to use the functions in Calc for their purpose, for
> example making an ANOVA. So providing wizards would be helpful.

Hmm .. I haven't looked at how Excel does ANOVA. We surely have the tools to do ANOVA, but people do expect to see it as a handy script somewhere. The statistical packages out there are not very different in that sense, and in many ways they emulate Excel.
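[Editor's note: to make the ANOVA point concrete, the computation such a wizard would wrap is only a few sums. A rough one-way ANOVA sketch in plain Python follows; it is illustrative only, not Calc code, and the sample data in the usage note is made up.]

```python
def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)                       # number of groups
    n = sum(len(g) for g in groups)       # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n

    # Between-group sum of squares: group sizes times squared distance
    # of each group mean from the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    # Within-group sum of squares: squared deviations from each group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2
                    for g in groups for x in g)

    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within
```

For the two groups [1, 2, 3] and [2, 3, 4] this yields F = 1.5 with (1, 4) degrees of freedom; a wizard would additionally look up the p-value from the F distribution, which Calc already exposes as FDIST.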
>> One approach could be improving our local functions to match more
>> demanding specifications: some of that will necessarily have to be done.
>> Another approach could be facilitating interactions with software like R,
>
> https://issues.apache.org/ooo/show_bug.cgi?id=66589

Yes, as I said, that approach has many followers (Hi Rob :) ). Working on one approach doesn't mean we forget the others.

>> and I am aware that approach has many followers. A third approach, which
>> I would like to suggest as a future project, would be developing a scaddin
>> focused on statistics and making full use of the functions from boost that
>> we already have available as a module but are not using to their full
>> extent.
>
> I know that Calc is really inaccurate in some corner cases and a comparison
> with the solutions from boost would be good. One problem is that Calc is
> limited to double precision because of the MSVC compiler. As far as I know,
> boost uses its own types to get better precision.

I am really hesitant to depend on the math functions in boost for the base Calc because most users don't need such stuff and keeping up to date with Boost can be painful. It's also rather nice to have our own implementations of the basic functions. With the boost stuff we get better performance and precision, but we still have to add the same high-level functions/scripts for things like ANOVA.

It would be fine to use boost in scaddins, I think, and that would leave us a lot of space for experimentation without interfering with the basic Calc. This is all wishful thinking though; I doubt I will have the time for this soon.

>> I know we are all busy with other stuff to improve for the 4.0 release, just
>> thought I'd leave the idea for the future.
>
> I had done a lot for statistical functions under the mentorship of Eike in
> the past, but now I'm more interested in Draw.

Yes, I noticed :).
FWIW, my favorite drawing utility is Xara, which was copylefted some time ago but never picked up many followers :(. Armin's work is absolutely cool though.

> Some problems which need to be solved are:
> - Adapt FDIST, FINV, and TDIST to ODF
> - New algorithm needed in ScInterpreter::GetBetaDist, see "FIXME" there
> - Better detection of singular matrices
> - Change the LINEST function to check for collinearity (Excel compatibility)

Thanks for this shortlist.

Pedro.
Re: Project idea: Calc for Statistics
Hi Pedro,

Pedro Giffuni schrieb:
> Hi guys;
>
> FWIW, while I was playing with the new random number generator I went
> around looking for some references and I found this paper from the Journal
> of Statistical Software (2010) titled "On the Numerical Accuracy of
> Spreadsheets":
>
> http://www.jstatsoft.org/v34/i04/paper
>
> It basically shows that Calc, among other spreadsheet programs, is not
> really well suited for statistical analysis.

They use an old version of Calc. In the meantime Calc has got a lot of accuracy improvements. And the new implementations in Excel 2010 are far more accurate than the old ones. The specific results of the paper are outdated. Of course the general problem of using spreadsheets for data exploration remains.

> Something rather amazing is that the major statistics suites have been moving
> towards a more "spreadsheet-like" environment. I am personally a fan of
> Minitab as it brings many functions that I needed for quality control in a
> previous job. The price of the software package sky-rocketed in a few years
> though :(.

I'm not familiar with special statistical software. One problem with Calc is that users do not know how to use the functions in Calc for their purpose, for example making an ANOVA. So providing wizards would be helpful.

> One approach could be improving our local functions to match more
> demanding specifications: some of that will necessarily have to be done.
> Another approach could be facilitating interactions with software like R,

https://issues.apache.org/ooo/show_bug.cgi?id=66589

> and I am aware that approach has many followers. A third approach, which
> I would like to suggest as a future project, would be developing a scaddin
> focused on statistics and making full use of the functions from boost that
> we already have available as a module but are not using to their full
> extent.

I know that Calc is really inaccurate in some corner cases and a comparison with the solutions from boost would be good.
One problem is that Calc is limited to double precision because of the MSVC compiler. As far as I know, boost uses its own types to get better precision.

> I know we are all busy with other stuff to improve for the 4.0 release, just
> thought I'd leave the idea for the future.

I had done a lot for statistical functions under the mentorship of Eike in the past, but now I'm more interested in Draw. Some problems which need to be solved are:

- Adapt FDIST, FINV, and TDIST to ODF
- New algorithm needed in ScInterpreter::GetBetaDist, see "FIXME" there
- Better detection of singular matrices
- Change the LINEST function to check for collinearity (Excel compatibility)

Kind regards
Regina
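[Editor's note: on the LINEST item above, the Excel-compatible behaviour is to detect predictor columns that are linear combinations of earlier ones and zero their coefficients. The detection itself reduces to a rank check with a tolerance. The pure-Python sketch below is illustrative only and is not taken from Calc or Excel; it runs Gaussian elimination with partial pivoting over the predictor columns and reports the columns that never yield a usable pivot.]

```python
def collinear_columns(cols, tol=1e-10):
    """Return indices of columns that are (numerically) linear
    combinations of earlier columns, via Gaussian elimination."""
    # rows[i][j] = observation i of predictor column j (work on a copy).
    rows = [list(r) for r in zip(*cols)]
    n_rows, n_cols = len(rows), len(cols)
    dependent, pivot_row = [], 0
    for j in range(n_cols):
        if pivot_row >= n_rows:          # more columns than observations
            dependent.append(j)
            continue
        # Partial pivoting: pick the largest remaining entry in column j.
        p = max(range(pivot_row, n_rows), key=lambda i: abs(rows[i][j]))
        if abs(rows[p][j]) < tol:
            dependent.append(j)          # no usable pivot: column j depends
            continue                     # on the earlier columns
        rows[pivot_row], rows[p] = rows[p], rows[pivot_row]
        for i in range(n_rows):
            if i != pivot_row and rows[i][j] != 0.0:
                factor = rows[i][j] / rows[pivot_row][j]
                rows[i] = [a - factor * b
                           for a, b in zip(rows[i], rows[pivot_row])]
        pivot_row += 1
    return dependent
```

A production version would more likely use a rank-revealing QR or SVD rather than elimination, but the tolerance question (what counts as "numerically zero") is the same either way, and it is exactly where Excel compatibility gets delicate.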
Re: Project idea: Calc for Statistics
On Thu, Dec 6, 2012 at 10:57 AM, Pedro Giffuni wrote:
> Hi guys;
>
> FWIW, while I was playing with the new random number generator I went
> around looking for some references and I found this paper from the Journal
> of Statistical Software (2010) titled "On the Numerical Accuracy of
> Spreadsheets":
>
> http://www.jstatsoft.org/v34/i04/paper

Two other relevant papers:

http://arc.nucapt.northwestern.edu/~karnesky/sdarticle.pdf
http://www.csdassn.org/software_reports/gnumeric.pdf

> It basically shows that Calc, among other spreadsheet programs, is not
> really well suited for statistical analysis.
>
> Something rather amazing is that the major statistics suites have been moving
> towards a more "spreadsheet-like" environment. I am personally a fan of
> Minitab as it brings many functions that I needed for quality control in a
> previous job. The price of the software package sky-rocketed in a few years
> though :(.
>
> One approach could be improving our local functions to match more
> demanding specifications: some of that will necessarily have to be done.
> Another approach could be facilitating interactions with software like R,
>
> and I am aware that approach has many followers. A third approach, which
> I would like to suggest as a future project, would be developing a scaddin
> focused on statistics and making full use of the functions from boost that
> we already have available as a module but are not using to their full
> extent.

So two entirely different questions:

1) Improving the accuracy of the statistical (and other numerical) methods we already have.

2) Extending the range of numerical methods we provide out-of-the-box

I think #1 is a no-brainer, but it does require some expertise. The hard part is determining whether we have improved. For most problems we probably already get the same results as SPSS, R or other standard statistical packages.
To really make an improvement we need to test the edge cases, the "poorly conditioned" and more complex cases.

For #2, it probably makes sense to define a bridge to R. R is now the standard and there are hundreds of libraries that extend the environment. You can call R routines from SAS or SPSS. I just got the new Mathematica 9 upgrade, and guess what? They've now added the ability to call R. So some seamless way of calling R routines and embedding R plots in Calc would be great.

-Rob

> I know we are all busy with other stuff to improve for the 4.0 release, just
> thought I'd leave the idea for the future.
>
> cheers,
>
> Pedro.
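[Editor's note: one concrete example of a "poorly conditioned" case, and the sort of high-mean, low-spread dataset the spreadsheet-accuracy papers use as a stress test: the one-pass textbook variance formula (Σx² − (Σx)²/n)/(n−1) cancels catastrophically when the data sits on a large offset, while Welford's one-pass update stays accurate. The Python sketch below is an illustration of the numerical issue, not Calc's implementation.]

```python
def variance_naive(xs):
    # One-pass textbook formula: subtracts two huge, nearly equal numbers,
    # so almost all significant digits cancel.
    n = len(xs)
    s, s2 = sum(xs), sum(x * x for x in xs)
    return (s2 - s * s / n) / (n - 1)

def variance_welford(xs):
    # Welford's online algorithm: numerically stable single pass that
    # accumulates squared deviations from a running mean.
    mean, m2 = 0.0, 0.0
    for k, x in enumerate(xs, start=1):
        delta = x - mean
        mean += delta / k
        m2 += delta * (x - mean)
    return m2 / (len(xs) - 1)

# Values 4, 7, 13, 16 shifted by 1e9; the true sample variance is
# unchanged by the shift and equals 30.
data = [1e9 + v for v in (4.0, 7.0, 13.0, 16.0)]
```

On this data the naive formula does not just lose digits, it returns a negative "variance", while Welford recovers 30 exactly; passing a test like this is what "determining whether we have improved" looks like in practice.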
Re: Project idea: Calc for Statistics
Hi,

I found the following paper several weeks ago in [1] (written in Japanese), which describes the paper that Pedro mentioned.

"On the accuracy of statistical procedures in Microsoft Excel 2010", Submitted but rejected, January 2012
http://homepages.ulb.ac.be/~gmelard/Recherche.htm

And I have seen that some people want to use tools such as the data analysis tools provided in Excel. The Apache Commons Math library provides statistical tools written in Java. It would also be good material for building an analysis tool as an extension, if someone wanted to. I had started to make such a thing, but it's discontinued.

[1] http://oku.edu.mie-u.ac.jp/~okumura/blog/node/2585

Tsutomu

2012/12/7, Pedro Giffuni :
> Hi guys;
>
> FWIW, while I was playing with the new random number generator I went
> around looking for some references and I found this paper from the Journal
> of Statistical Software (2010) titled "On the Numerical Accuracy of
> Spreadsheets":
>
> http://www.jstatsoft.org/v34/i04/paper
>
> It basically shows that Calc, among other spreadsheet programs, is not
> really well suited for statistical analysis.
>
> Something rather amazing is that the major statistics suites have been moving
> towards a more "spreadsheet-like" environment. I am personally a fan of
> Minitab as it brings many functions that I needed for quality control in a
> previous job. The price of the software package sky-rocketed in a few years
> though :(.
>
> One approach could be improving our local functions to match more
> demanding specifications: some of that will necessarily have to be done.
> Another approach could be facilitating interactions with software like R,
>
> and I am aware that approach has many followers. A third approach, which
> I would like to suggest as a future project, would be developing a scaddin
> focused on statistics and making full use of the functions from boost that
> we already have available as a module but are not using to their full
> extent.
> I know we are all busy with other stuff to improve for the 4.0 release, just
> thought I'd leave the idea for the future.
>
> cheers,
>
> Pedro.
Project idea: Calc for Statistics
Hi guys;

FWIW, while I was playing with the new random number generator I went around looking for some references and I found this paper from the Journal of Statistical Software (2010) titled "On the Numerical Accuracy of Spreadsheets":

http://www.jstatsoft.org/v34/i04/paper

It basically shows that Calc, among other spreadsheet programs, is not really well suited for statistical analysis.

Something rather amazing is that the major statistics suites have been moving towards a more "spreadsheet-like" environment. I am personally a fan of Minitab as it brings many functions that I needed for quality control in a previous job. The price of the software package sky-rocketed in a few years though :(.

One approach could be improving our local functions to match more demanding specifications: some of that will necessarily have to be done. Another approach could be facilitating interactions with software like R, and I am aware that approach has many followers. A third approach, which I would like to suggest as a future project, would be developing a scaddin focused on statistics and making full use of the functions from boost that we already have available as a module but are not using to their full extent.

I know we are all busy with other stuff to improve for the 4.0 release, just thought I'd leave the idea for the future.

cheers,

Pedro.