Since I implemented the RExcel interface, I would also like this discussion to be continued on r-devel or some other list where I can follow it.
Let me add some thoughts:

As Leonhard Mada suggested, probably the most needed connection between R and a spreadsheet program is a way to transfer dataframes easily from the spreadsheet to R and analysis results back from R to Excel. To do this, one needs a mechanism to transfer large amounts of data of different types. An additional complication on the way back from R to the spreadsheet is that R results quite often have data types not supported by the spreadsheet program (complex numbers, for example).

The convenience tool that people really want in the first place is a menu item which transfers a range of data from the spreadsheet to R. The question then is: should the users see the R command line? There has to be a way of telling R what kind of analysis to perform. Using a menu like the one supplied by RCommander is a sensible option. It would also reuse all the work invested in designing a good menu structure for end users.

Getting results back into the spreadsheet is more difficult, not technically, but from the design point of view. Analysis results in R usually are not arrays but lists, i.e. compounds of compounds of data of different basic types and of different sizes. There is no clear general rule for how to put R results into spreadsheet ranges. The basic compound data type in spreadsheets is the array, whereas the data types in R are much more complicated, so the conceptual mapping of result lists to spreadsheet ranges has to be designed differently for different types of analyses and results.

Of course, a brute force method (implemented for example by the connection mechanism between NAG and Excel) would be to "just print" the results into the spreadsheet as strings. This way, spreadsheet rows become printed lines without further structure, and numbers in the results are not easily accessible for further computations on the spreadsheet.
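To make the mapping problem a bit more concrete, here is a minimal sketch in Python of one possible alternative to "just printing": flattening a nested result into (label, value) rows so that numbers stay numbers in the cells. This is only an illustration and not how RExcel or any other existing tool does it. The names flatten and to_cell and the sample result are hypothetical, and falling back to text for unsupported types such as complex numbers is an assumption, not a recommendation.

```python
from typing import Any, List, Tuple

Cell = Any  # what a spreadsheet cell can hold here: number, string, bool or None


def to_cell(value: Any) -> Cell:
    """Convert a scalar into something a spreadsheet cell can store."""
    if value is None or isinstance(value, (bool, int, float, str)):
        return value
    # Assumption: unsupported types (e.g. complex numbers) fall back to text.
    return str(value)


def flatten(result: Any, prefix: str = "") -> List[Tuple[str, Cell]]:
    """Flatten a nested result into (label, value) rows, one row per scalar."""
    rows: List[Tuple[str, Cell]] = []
    if isinstance(result, dict):
        # Named components: extend the label with the component name.
        for key, item in result.items():
            label = f"{prefix}.{key}" if prefix else str(key)
            rows.extend(flatten(item, label))
    elif isinstance(result, (list, tuple)):
        # Unnamed components: extend the label with a 1-based index.
        for index, item in enumerate(result, start=1):
            rows.extend(flatten(item, f"{prefix}[{index}]"))
    else:
        rows.append((prefix, to_cell(result)))
    return rows


# Hypothetical analysis result, standing in for a nested R list.
result = {
    "statistic": 4.2,
    "df": 2,
    "conf_int": [0.1, 0.9],
    "roots": [complex(1, 2)],
    "method": "made-up test",
}

rows = flatten(result)
for label, value in rows:
    print(label, value, sep="\t")
```

Each row could go into two adjacent cells, so the numeric entries remain usable in further formulas. The sketch also shows the limits of a generic rule: a single flat layout like this ignores the shape that a particular kind of analysis would suggest.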
Such a "transfer data frame and get results" connection, however, is not really using the spreadsheet program as a spreadsheet program, but as a data grid and output formatting machinery, since it is completely independent of the spreadsheet program's most important feature: automatic recalculation triggered by changes of cell values.

A really tight integration of R and a spreadsheet can extend the spreadsheet program's computational engine by the complete R engine. It could allow spreadsheet formulas like RApply("pchisq",A1,A2), which would have R compute the value of the chi-squared distribution function with arguments in cells A1 and A2 of the spreadsheet. Changing the value in A1 would trigger R to recalculate the chi-square value. In this case, the connection between R and the spreadsheet program has to be very fast, since the spreadsheet program essentially is using R as a dynamically linked library. The problem of incompatible data types also becomes much harder to deal with. The results of R computations are put directly into spreadsheet ranges, so having R results consisting of lists makes things really difficult.

Thomas Baier and I recently published a paper in Computational Statistics which discusses different models of integration between R and spreadsheets. Excel is used as an example, but the concepts are independent of the concrete implementation. It is accessible at http://dx.doi.org/10.1007/s00180-007-0023-6. If you cannot access it, write to me and I will send you a copy.

Currently, we are working on a cross-platform alternative to using COM to connect the spreadsheet to R. The platforms in mind are (at least) Windows, Linux and MacOS (X). The spreadsheet program of choice for our next integration will be Gnumeric, where the integration is already being worked on by students.
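As a rough illustration of the recalculation model behind a formula like RApply("pchisq",A1,A2), here is a small Python sketch. It is not RExcel code and does not talk to R. The Sheet class, the ENGINE table and pchisq_even_df are hypothetical stand-ins, and the chi-squared function uses a closed form that is valid only for even degrees of freedom. Recalculation is a single pass without dependency ordering, which is enough for formulas that refer only to plain value cells.

```python
import math
from typing import Callable, Dict, Tuple


def pchisq_even_df(x: float, df: int) -> float:
    """Chi-squared distribution function, closed form for even df only.

    Stand-in for the R engine; not a general replacement for R's pchisq.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this stand-in only handles positive even df")
    if x <= 0:
        return 0.0
    half = x / 2.0
    tail = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return 1.0 - math.exp(-half) * tail


# Hypothetical table of functions the "engine" offers to the sheet.
ENGINE: Dict[str, Callable[..., float]] = {"pchisq": pchisq_even_df}


class Sheet:
    """Toy spreadsheet: value cells plus formula cells that call the engine."""

    def __init__(self) -> None:
        self.values: Dict[str, float] = {}
        self.formulas: Dict[str, Tuple[str, Tuple[str, ...]]] = {}

    def set_value(self, cell: str, value: float) -> None:
        self.values[cell] = value
        self._recalculate()  # every change of a cell value triggers recalculation

    def set_formula(self, cell: str, func: str, *refs: str) -> None:
        self.formulas[cell] = (func, refs)
        self._recalculate()

    def _recalculate(self) -> None:
        # Single pass, no dependency ordering: adequate only when formulas
        # reference plain value cells, as in this sketch.
        for cell, (func, refs) in self.formulas.items():
            if all(ref in self.values for ref in refs):
                args = [self.values[ref] for ref in refs]
                self.values[cell] = ENGINE[func](*args)


sheet = Sheet()
sheet.set_value("A1", 2.0)                       # quantile
sheet.set_value("A2", 2)                         # degrees of freedom
sheet.set_formula("B1", "pchisq", "A1", "A2")    # like RApply("pchisq",A1,A2)
first = sheet.values["B1"]

sheet.set_value("A1", 4.0)                       # changing A1 recomputes B1
second = sheet.values["B1"]
print(first, second)
```

In a real integration, the engine call would cross a process or library boundary on every recalculation, which is why the speed of the connection matters so much in this model.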
--
Erich Neuwirth, Didactic Center for Computer Science
University of Vienna
Visit our SunSITE at http://sunsite.univie.ac.at
Phone: +43-1-4277-39464 Fax: +43-1-4277-9394