[Rd] Structure of the object: list() and attr()
Hello! I am writing code in which I define objects of a new class. When I started, it was a simple data.frame with attributes, but it is getting more elaborate, and I would like to hear any pros and cons of switching to a list structure, where one slot would be a data.frame while the other slots would take over the role of the attributes.

With regards, Gregor Gorjanc
--
University of Ljubljana, Biotechnical Faculty, Zootechnical Department
PhD student
Groblje 3, SI-1230 Domzale, Slovenia, Europe
URI: http://www.bfro.uni-lj.si/MR/ggorjan
mail: gregor.gorjanc at bfro.uni-lj.si
tel: +386 (0)1 72 17 861
fax: +386 (0)1 72 17 888
--
One must learn by doing the thing; for though you think you know it, you have no certainty until you try. Sophocles ~ 450 B.C.
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Any interest in "merge" and "by" implementations specifically for sorted data?
Kevin, Whether or not the R core developers want to merge these functions into base R, they would make a great little package on CRAN. That way others could easily use them, and for yourself, the package automatically gets updated with new versions of R. It sounds like you're done with the hard parts. All that you need to do is add some documentation along with a couple of configuration files, and you're done. - Tom
Re: [Rd] Structure of the object: list() and attr()
I would suggest using S4 classes to represent more complex classes. I don't think a list structure is much different (morally) from a data.frame with lots of attributes hanging off it. Either way, the slots and their types don't share a common definition that can be checked. + seth
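A minimal sketch of the S4 route suggested above, assuming a hypothetical class name myData with slots data and my.attr: the class definition fixes the slot types, and a validity function adds further checks that new() and validObject() enforce.

```r
library(methods)

# Hypothetical class: a data.frame slot plus a slot that replaces the old attribute.
setClass("myData",
         slots = c(data = "data.frame", my.attr = "numeric"),
         validity = function(object) {
           # Return TRUE when valid, or a character string describing the problem.
           if (length(object@my.attr) != 1L)
             return("my.attr must have length 1")
           TRUE
         })

x <- new("myData", data = data.frame(a = 1:10), my.attr = 33)
validObject(x)      # checks slot classes, then runs the validity function
x@my.attr           # slot access takes over from attr(x, "my.attr")
dim(x@data)

# A slot of the wrong type is rejected when the object is created.
bad <- tryCatch(new("myData", data = data.frame(a = 1), my.attr = "33"),
                error = function(e) conditionMessage(e))
```

The trade-off is that data.frame methods are no longer inherited automatically; they have to be written for myData, or the class could instead be declared with contains = "data.frame".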
Re: [Rd] Structure of the object: list() and attr()
The key issue is inheritance. If you use a data frame with attributes, then you can inherit data frame methods without further definition, e.g.

x <- structure(data.frame(a = 1:10), my.attr = 33, class = c("myclass", "data.frame"))
dim(x) # inherits the dim method

but if you do it this way, then you need to define your own method for each one you want:

x <- structure(list(.Data = data.frame(a = 1:10), my.attr = 33), class = "myclass")
dim.myclass <- function(x) dim(x$.Data)
dim(x)

and so on for every method you want.

On 8/1/06, Gorjanc Gregor wrote:
Hello! I am writing code in which I define objects of a new class. When I started, it was a simple data.frame with attributes, but it is getting more elaborate, and I would like to hear any pros and cons of switching to a list structure, where one slot would be a data.frame while the other slots would take over the role of the attributes.
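The two approaches can be run side by side as a sketch. Here the list-wrapper class is given the hypothetical name mylist so that its dim method cannot shadow dim.data.frame for the first object. Because nrow() is implemented in terms of dim(), it starts working for the wrapper as soon as dim.mylist exists.

```r
# Approach 1: inherit from data.frame and hang the extra information off an attribute.
x1 <- structure(data.frame(a = 1:10), my.attr = 33,
                class = c("myclass", "data.frame"))
dim(x1)              # dispatches to dim.data.frame with no extra code
names(x1)
attr(x1, "my.attr")

# Approach 2: list wrapper (hypothetical class name "mylist"); each method is explicit.
x2 <- structure(list(.Data = data.frame(a = 1:10), my.attr = 33),
                class = "mylist")
dim.mylist <- function(x) dim(x$.Data)
dim(x2)
nrow(x2)             # nrow() calls dim(), so it now works too
x2$my.attr
```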
Re: [Rd] Any interest in "merge" and "by" implementations specifically for sorted data?
Hi Tom, Thomas Lumley recommended the same thing (a CRAN package) last night. I have just finished debugging the routines and validating them for use without NAs. I had to fix a number of typos in my code for the other functions, but now they all work properly. I still need to test, debug, and validate them for use with NAs, with na.rm set to both TRUE and FALSE. Once I have validated them (i.e. that they return exactly the same things as unlist(lapply(split(x,i),FUNCTION)) does), I will put together an external package and make it available on CRAN. I am in the stupid position of knowing how to add the functions internally to R with no problems, but I still have to learn how to build and add external packages. So, something else to learn! Thanks, Kevin
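For anyone in the same position, utils::package.skeleton() generates the directory layout of an add-on package from objects in an R session. A sketch, assuming a hypothetical package name (mygroups) and a hypothetical tapply()-based function standing in for the igroup code:

```r
# Hypothetical stand-in; Kevin's igroup* functions are in his patches, not here.
groupSums <- function(x, i, na.rm = FALSE) {
  as.vector(tapply(x, i, sum, na.rm = na.rm))
}

# Writes DESCRIPTION, NAMESPACE, R/ and man/ stubs for the named objects.
pkg_dir <- tempdir()
utils::package.skeleton(name = "mygroups", list = "groupSums",
                        environment = environment(), path = pkg_dir, force = TRUE)
list.files(file.path(pkg_dir, "mygroups"), recursive = TRUE)
```

The generated DESCRIPTION and Rd files are only stubs; they need to be filled in before R CMD build and R CMD check will give a clean result.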
Re: [Rd] R | vnc | X11 fonts
Evan Cooch wrote: Quick follow-up: works fine with fluxbox (and, as noted, default twm). Simply can't get it to work with the GNOME desktop, which is ultimately what I would like to use.

The difference between twm and metacity (or other GNOME window managers) is that twm uses X11 core fonts, whereas GNOME is xft/fontconfig-aware, and as far as I know R's X11() uses the core font APIs and is not xft-aware. You haven't said anything about your xorg setup, specifically whether you are using a font server (it is the default on FC5, so unless you have changed it, you are using one). If that's the case, changing this line in /etc/X11/xorg.conf

FontPath "unix/:7100"

to use *real* font paths may help. HTL
Re: [Rd] Install R-patched_2006-07-13 on i386-pc-solaris2.10 with Sun Studio 11
Dear R-developers: Has anybody installed R-patched_2006-07-13 on i386-pc-solaris2.10 with Sun Studio 11? I need your help/advice, please. Thank you very much, Latchezar Dimitrov

-Original Message-
From: Latchezar Dimitrov
Sent: Wednesday, July 26, 2006 4:48 PM
To: 'Prof Brian Ripley'
Cc: r-devel@stat.math.ethz.ch
Subject: RE: [Rd] Install R-patched_2006-07-13 on i386-pc-solaris2.10 with Sun Studio 11

Dear Prof. Ripley and R-developers: Thank you very much for the reply. Please see below.

-Original Message-
From: Prof Brian Ripley
Sent: Wednesday, July 26, 2006 2:37 AM
To: Latchezar Dimitrov
Cc: r-devel@stat.math.ethz.ch
Subject: Re: [Rd] Install R-patched_2006-07-13 on i386-pc-solaris2.10 with Sun Studio 11

On Wed, 26 Jul 2006, Latchezar Dimitrov wrote:
Dear R-developers: I'm trying to build a 64-bit R-patched_2006-07-24 on a SunFire V40z running Solaris 10 with a 64-bit kernel, using the Sun Studio 11 compilers. Everything runs OK until it gets to building package tools (all.R), where it fails. Below is how I tried it (I can provide any additional info if needed). Any help, please?

You seem to have gcc 4.1.1 in your paths, so can you try that instead.

True (that's why I gave the env); however, that is C/C++ only. I cannot build gcc Fortran the right way, and since R is at the top of my list I switched to Sun Studio 11. I included the libraries and the path in order to use makeinfo and readline, which I compiled with that gcc (I built most essential GNU utils with it too, as well as some applications, and everything seems OK). To avoid misuse of it, I explicitly specified all the programs that I knew could be mistaken (CC, CXX, etc.), and GNU ld was renamed. I am as sure as one can be that there was no (mis)use of /usr/local/ except for readline and makeinfo (I believe the build did not get that far, though). One additional thing that I missed in my previous e-mail is that the R binary was built seemingly properly, i.e., it starts OK, though with complaints about missing base parts (obviously).

Now, to make it clear that it has nothing to do with that gcc, I removed all the paths from the environment, and to make sure nothing leaks through I renamed /usr/local so it stays out of the way. Then I set --with-readline=no and started clean (as you may have already noticed, I build in a directory separate from src, obj-R, which was completely empty at the beginning). It failed the very same way.

Further, I decided to give a 32-bit build a try. To make things more interesting, I restored /usr/local and the path to /usr/local/bin in order to be able to use makeinfo (since the only readline I had was a 64-bit one, I gave it up). And voila, it built like a charm. The only tests that I noticed failing were those involving tcltk (I had only 64-bit ones installed). The only problem still remaining is that I could not care less about the 32-bit version. Please find attached the log of the 64-bit attempt and a little of the 32-bit success at the end. So would someone please help me find out what is wrong and build my favorite 64-bit R? Thank you very much, Latchezar Dimitrov

This looks like a fairly fundamental problem with your current build, possibly a mis-compile.

Thank you very much
Latchezar Dimitrov
Wake Forest Univ. School of Medicine

# echo $PATH
/opt/SUNWspro/bin:/usr/sbin:/usr/bin:/usr/openwin/bin:/usr/ccs/bin:/usr/openwin/bin:/usr/dt/bin:/usr/platform/i86pc/sbin:/opt/SUNWvts/bin:/opt/SUNWexplo/bin:/usr/local/bin
# echo $CC
cc
# echo $CXX
CC
# echo $CFLAGS
-xarch=amd64 -xmodel=medium
# echo $CXXFLAGS
-xarch=amd64 -xmodel=medium
# echo $FCFLAGS
-xarch=amd64 -xmodel=medium
# echo $FFLAGS
-xarch=amd64 -xmodel=medium
# echo $LDFLAGS
-xarch=amd64 -xmodel=medium
# echo $R_BROWSER
/usr/sfw/bin/mozilla
# echo $r_arch
amd64
# echo $LD_LIBRARY_PATH
/usr/openwin/lib:/usr/local/gcc-4.1.1-x86-bootstrap/lib/gcc/i386-pc-solaris2.10/4.1.1/amd64:/usr/local/lib:/usr/local/gcc-4.1.1-x86-bootstrap/lib/gcc/i386-pc-solaris2.10/4.1.1/amd64:/usr/local/lib:/usr/local/gcc-4.1.1-x86-bootstrap/lib/gcc/i386-pc-solaris2.10/4.1.1/amd64:/usr/local/lib
# ../src/R-patched_2006-07-24/configure --prefix=/opt/R-2.3.1-patched_2006-07-24-Sun_Studio_11 --with-readline --disable-mbcs R_PAPERSIZE=letter --disable-rpath --with-bzlib --with-zlib --with-spcre --with-tcltk --disable-R-profiling --disable-nls
Error in parseNamespaceFile(package, package.lib, mustExist =
Re: [Rd] R | vnc | X11 fonts
Thanks very much, Hin-Tak, for the useful summary of some key points.
Re: [Rd] compiling R | multi-Opteron | BLAS source
The R-devel version of R provides a pluggable BLAS, which makes such tests fairly easy (although building the BLAS themselves is not). On dual Opterons, using multiple threads is often not worthwhile and can be counter-productive (Doug Bates has found some dramatic examples, and you can see them in my timings below).

Here are timings for FC3, gcc 3.4.6, dual Opteron 252, 64-bit build of R. ACML 3.5.0 is by far the easiest to install (on R-devel all you need to do is link libacml.so to lib/libRblas.so) and pretty competitive, so that is what I normally use. These timings are only repeatable to within a few per cent, even after averaging quite a few runs.

set.seed(123)
X <- matrix(rnorm(1e6), 1000)
system.time(for(i in 1:25) X%*%X)
system.time(for(i in 1:25) solve(X))
system.time(for(i in 1:10) svd(X))

Times in seconds as user / system / elapsed (the two child-process fields were 0.000 in every run):

BLAS                          25 x X%*%X            25 x solve(X)          10 x svd(X)
internal BLAS (-O3)           96.939 0.341 97.375   110.316 1.652 112.006  165.550 1.131 166.806
Goto 1.03, 1 thread           12.949 0.191 13.143    23.201 1.449  24.652   43.318 1.016  44.361
Goto 1.03, dual CPU           15.038 0.244  8.488    26.569 2.239  19.814   59.912 1.799  50.350
ACML 3.5.0 (single-threaded)  13.794 0.368 14.164    22.990 1.695  24.710   48.267 1.373  49.662
ATLAS 3.6.0, single-threaded  16.164 0.404 16.572    26.200 1.704  27.907   50.150 1.462  51.619
ATLAS 3.6.0, multi-threaded   17.657 0.468  9.775    38.388 2.353  30.141   95.611 3.039  88.917

On Sun, 23 Jul 2006, Evan Cooch wrote:
Greetings. A quick perusal of some of the posts to this mailing list suggests that the level of the questions is probably beyond someone working at my level, but at the risk of looking foolish publicly (something I find I get increasingly comfortable with as I get older), here goes. My research group recently purchased a multi-Opteron system (a bunch of 880 chips), running 64-bit RHEL 4 (which we have site-licensed at our university, so it cost us nothing; good price) with SMP support built into the kernel (perhaps obviously, for a multi-processor system). Several of our users use [R], which I've only used on a few occasions. However, it is part of my task to get [R] installed for folks using this system. While the simple, basic compile sequence (./configure, make, make check, make install) went smoothly, it's pretty clear from our benchmarks that the [R] code isn't running as 'rocket-fast' as it should for a system like this. So, I dug a bit deeper. Most of the jobs we want to run could benefit from BLAS support (lots of array manipulations and other bits of linear algebra), and from a few other compilation optimizations, and here is where I seek advice.

1) It looks like there are 3-4 flavours: LAPACK, ATLAS, ACML (AMD chips...), and Goto. In reading what I can find, it seems that there are reasons not to use ACML (single-thread) despite the AMD chips, reasons to avoid ATLAS (some hassles compiling on RHEL 4 boxes), reasons to avoid LAPACK (ibid), but apparently no problems with Goto BLAS. Is that a reasonable summary?
At the risk of starting a larger discussion, I'm simply looking to get BLAS support yielding the fastest [R] code with the minimum of hassle (while tweaking lines of configure files, weird linker sequences and all that used to appeal when I was a student, I don't have time for it at this stage). So, any quick recommendation for *which* BLAS library? My quick assessment suggests Goto BLAS, but I'm hoping for some confirmation.

3) Compilation of BLAS: I can compile for 32-bit or 64-bit. Presumably, given we've invested in 64-bit chips and a 64-bit OS, we'd like to consider a 64-bit compilation. Which, also presumably, means we'd need a 64-bit compilation of [R]. While I've read the short blurb on CRAN concerning 64-bit vs 32-bit compilations (data size vs speed), I'd be happy to have both on our machine. But I'm not sure how one specifies 64 bits in the [R] compilation: what flags do I need to set during ./configure, or what config file do I need to edit? Thanks very much in advance, and, again, apologies
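Whatever configure settings end up being used, the running R can report which kind of build it is. A small sketch (is_64bit is a hypothetical variable name); it does not answer which flags to pass to ./configure, only confirms the result afterwards:

```r
# Pointer size in bytes: 8 for a 64-bit build of R, 4 for a 32-bit build.
is_64bit <- .Machine$sizeof.pointer == 8L
R.version$arch     # the architecture string R was built for
if (is_64bit) "64-bit R" else "32-bit R"
```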
Re: [Rd] compiling R | multi-Opteron | BLAS source
Thanks very much. I followed your advice and have tried a variety of permutations (using ACML and LAPACK). For the most part, I'm still 'playing' with multiple threads, but given the performance I'm getting (quad Opteron 880, 16 GB RAM, 64-bit FC5), I'll stick with that for now (although, based on your examples, a single-thread build is worth considering for comparison; the svd test is pretty compelling). Here are some 'average values' from my machine for the benchmarks you posted:

ACML 3.5.0, multi-threaded (compiled with gcc 4.0.1 and gfortran); user / system / elapsed in seconds, child fields 0.000:
system.time(for(i in 1:25) X%*%X)     11.75  0.335  3.900
system.time(for(i in 1:25) solve(X))  22.410 2.621 13.481
system.time(for(i in 1:10) svd(X))    67.384 4.28  38.585

Needless to say, on a system of this level most things run pretty fast, except the svd benchmark, which lags, consistent with what you showed in your results. What is somewhat intriguing is why the svd example varies so much between (say) the internal BLAS (165 s) and Goto BLAS (43 s), for a single-thread compilation. But it does look as if ACML is holding its own. Cheers...
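A rough way to read the multi-threaded rows in this thread: on Linux, system.time() counts user CPU time over all threads of the R process, so user time divided by elapsed time approximates how many CPUs were busy on average. The sketch below computes that ratio for the X%*%X runs, using only numbers quoted in this thread (the data frame and column names are illustrative).

```r
# User and elapsed seconds for 25 x X%*%X, copied from the timings above.
timings <- data.frame(
  build   = c("Goto 1.03, dual CPU", "ATLAS 3.6.0, multi-threaded",
              "ACML 3.5.0, multi-threaded (quad 880)"),
  user    = c(15.038, 17.657, 11.75),
  elapsed = c(8.488, 9.775, 3.900),
  stringsAsFactors = FALSE
)
# A ratio above 1 means more than one CPU was working on average.
timings$cpus_busy <- round(timings$user / timings$elapsed, 2)
timings
```

By this measure the dual-CPU runs used about 1.8 CPUs and the quad 880 run about 3, which is why their elapsed times drop even though their total CPU time goes up.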
[Rd] Artefacts in (screen viewed) PDF output
This issue is probably to do with on-screen viewing of PDF files written from R (2.3.1, Windows XP, RHEL 4), not with how the files are produced. So the question is mainly whether others have seen similar behaviour, and whether a remedy is known. When neighbouring polygons are written with the same fill colour and no border colour, PDF files show traces of what are probably unstroked lines or interstices when viewed on-screen in at least acroread (7.0) on both Windows XP and RHEL 4 (though not in xpdf 3.0 on RHEL 4). This is intrusive when many neighbouring polygons share a fill colour, for example on election party-share maps, where borders are suppressed for clarity. An example is:

library(maps)
us <- map("state", fill = TRUE, plot = FALSE)
pdf("borders.pdf")
plot(us, type = "n", axes = FALSE, asp = 1)
polygon(us, col = "blue", border = NA)
dev.off()

Using polygon(us, col = "blue", border = "transparent") gives the same result. Curiously, the same is also observed with postscript() and external conversion to PDF (epstopdf), although viewing the EPS file on RHEL 4 in ggv does not show any artefacts up to 400%. My feeling is that the output files are correct but that acroread is introducing interstices when rendering to screen. I do not have a printer with high enough resolution to check properly, but I believe that acroread-printed output does not have the artefacts. They are, however, visible when acroread is used in presentation mode. Any insight would be very useful.

Roger
--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of Economics and Business Administration, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
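One workaround sometimes used for such seams (not confirmed anywhere in this thread) is to stroke each polygon's border in its own fill colour, which covers hairline gaps at the cost of enlarging every polygon by half the line width. The self-contained sketch below writes the same two abutting squares both ways so the files can be compared in a viewer; it uses base graphics rather than the maps package, and draw_abutting() and the file names are hypothetical.

```r
# Draw two abutting blue squares into a PDF with the given border setting.
draw_abutting <- function(file, border) {
  pdf(file)
  on.exit(dev.off())                 # close the device even if plotting fails
  plot(c(0, 2), c(0, 1), type = "n", axes = FALSE, asp = 1,
       xlab = "", ylab = "")
  polygon(c(0, 1, 1, 0), c(0, 0, 1, 1), col = "blue", border = border)
  polygon(c(1, 2, 2, 1), c(0, 0, 1, 1), col = "blue", border = border)
  invisible(file)
}

# As in the message: no border, so a seam may show in some viewers.
no_border   <- draw_abutting(file.path(tempdir(), "no_border.pdf"), border = NA)
# Workaround: border drawn in the fill colour.
same_colour <- draw_abutting(file.path(tempdir(), "same_colour.pdf"), border = "blue")
```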
Re: [Rd] Any interest in "merge" and "by" implementations specifically for sorted data?
Hi, My last word on this topic until I get a working external R package... The igroup code has now been validated both with and without NAs, and both with and without removing them. Thanks to Bill, Tom, Thomas, and everyone for your helpful comments and hints. The results of my validation run are below, in case anyone is interested. So my code now officially works. If anyone wants patches against the latest development version of R to play around with (do your own timings, etc.), please just let me know and I will send the patches privately. I will start to work on an external package next week when I have more time. Hope this helps, Kevin

> x <- rnorm(2e6)
> i <- rep(1:1e6,2)
> y <- runif(2e6)
> is.na(x[y > 0.8]) <- TRUE
> suma = unlist(lapply(split(x,i),sum,na.rm=T))
> names(suma) <- NULL
> sumb = igroupSums(x,i,na.rm=T)
> all.equal(suma,sumb)
[1] TRUE
> suma = unlist(lapply(split(x,i),sum,na.rm=F))
> names(suma) <- NULL
> sumb = igroupSums(x,i,na.rm=F)
> all.equal(suma,sumb)
[1] TRUE
> maxa = unlist(lapply(split(x,i),max,na.rm=T))
There were 50 or more warnings (use warnings() to see the first 50)
> names(maxa)<-NULL
> maxb <- igroupMaxs(x,i,na.rm=T)
> all.equal(maxa, maxb)
[1] TRUE
> maxa = unlist(lapply(split(x,i),max,na.rm=F))
> names(maxa)<-NULL
> maxb <- igroupMaxs(x,i,na.rm=F)
> all.equal(maxa, maxb)
[1] TRUE
> mina = unlist(lapply(split(x,i),min,na.rm=T))
There were 50 or more warnings (use warnings() to see the first 50)
> names(mina)<-NULL
> minb <- igroupMins(x,i,na.rm=T)
> all.equal(mina, minb)
[1] TRUE
> mina = unlist(lapply(split(x,i),min,na.rm=F))
> names(mina)<-NULL
> minb <- igroupMins(x,i,na.rm=F)
> all.equal(mina, minb)
[1] TRUE
> meana = unlist(lapply(split(x,i),mean,na.rm=T))
> names(meana)<-NULL
> meanb <- igroupMeans(x,i,na.rm=T)
> all.equal(meana, meanb)
[1] TRUE
> meana = unlist(lapply(split(x,i),mean,na.rm=F))
> names(meana)<-NULL
> meanb <- igroupMeans(x,i,na.rm=F)
> all.equal(meana, meanb)
[1] TRUE
> proda = unlist(lapply(split(x,i),prod,na.rm=T))
> names(proda)<-NULL
> prodb <- igroupProds(x,i,na.rm=T)
> all.equal(proda, prodb)
[1] TRUE
> proda = unlist(lapply(split(x,i),prod,na.rm=F))
> names(proda)<-NULL
> prodb <- igroupProds(x,i,na.rm=F)
> all.equal(proda, prodb)
[1] TRUE
> cnta <- unlist(lapply(split(x,i),length))
> names(cnta) <- NULL
> cntb <- igroupCounts(x,i,na.rm=F)
> all.equal(cnta,cntb)
[1] TRUE
> anya <- unlist(lapply(split((x > 1.0),i),any,na.rm=T))
> names(anya)<-NULL
> anyb <- igroupAnys((x > 1.0),i,na.rm=T)
> all.equal(anya,anyb)
[1] TRUE
> anya <- unlist(lapply(split((x > 1.0),i),any,na.rm=F))
> names(anya)<-NULL
> anyb <- igroupAnys((x > 1.0),i,na.rm=F)
> all.equal(anya,anyb)
[1] TRUE
> alla <- unlist(lapply(split((x > 1.0),i),all,na.rm=T))
> names(alla)<-NULL
> allb <- igroupAlls((x > 1.0),i,na.rm=T)
> all.equal(alla,allb)
[1] TRUE
> alla <- unlist(lapply(split((x > 1.0),i),all,na.rm=F))
> names(alla)<-NULL
> allb <- igroupAlls((x > 1.0),i,na.rm=F)
> all.equal(alla,allb)
[1] TRUE
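The repeated split/lapply/all.equal pattern in the transcript can be wrapped in a small helper. This is a sketch: reference_grouped() and check_grouped() are hypothetical names, and tapply_sums() is a tapply()-based stand-in, not Kevin's igroupSums(), which is only available as his patches.

```r
# Reference result: the split/lapply idiom from the transcript, names dropped.
reference_grouped <- function(x, i, FUN, ...) {
  r <- unlist(lapply(split(x, i), FUN, ...))
  names(r) <- NULL
  r
}

# TRUE when the candidate agrees with the reference; extra arguments such as
# na.rm are passed to both.
check_grouped <- function(candidate, x, i, FUN, ...) {
  isTRUE(all.equal(reference_grouped(x, i, FUN, ...), candidate(x, i, ...)))
}

# Stand-in candidate; NOT Kevin's igroupSums().
tapply_sums <- function(x, i, na.rm = FALSE) {
  as.vector(tapply(x, i, sum, na.rm = na.rm))
}

set.seed(1)
x <- rnorm(2000)
i <- rep(1:1000, 2)
x[runif(2000) > 0.8] <- NA     # about 20% missing, chosen for illustration
check_grouped(tapply_sums, x, i, sum, na.rm = TRUE)
check_grouped(tapply_sums, x, i, sum, na.rm = FALSE)
```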
[Rd] read.table with more cols than headers
I am trying to understand the behaviour of read.table() reading delimited files (with header=TRUE and fill=TRUE) when there are more (possibly spurious) columns than headings. I give below four small data files, all of which have one or two extra columns added to one line. Reading the first file produces an error message, the second produces a column of NA, the third adds an extra row, and the fourth ignores the extra columns with no message and no NA. Most unintuitive! Here are my attempts to understand this, with questions interpolated.

The behaviour on the first file seems self-explanatory. The number of headings determines the number of columns, and extra data columns are not allowed. (On the other hand, the help page ?read.table says that the number of columns is determined from the first five rows, which suggests that the header line is not the only determiner. If headers, when present, are indeed the only determiner, perhaps this should be mentioned in the help. Are headers actually equivalent to specifying the same set of names using the col.names argument?)

For the second file, the first column is being taken as row names. This agrees with the help, which says that if the header line has one less entry than the number of columns, the first column is taken to be the row names. OK, perhaps not the ideal solution for this data file, but clearly documented behaviour.

In the third file, the extra columns are being taken to be a new row. This seems wrong, because the help says that cases correspond to lines. There is no suggestion in the documentation that a line of the file could contain multiple cases. This is the result I have most trouble with. I guess I could prevent this behaviour with flush=TRUE.

File 4 is curious. Here the number of columns has been determined, using the first five rows of the file, to be two. The extra column on line 6 can't change this, so the first column doesn't become row names.
But in that case, shouldn't the extra column found on line 6 produce an error message, the same as for file 1? Specifying colClasses to be a vector of length more than 2 when reading file 3 will produce a result similar to file 4, but with a warning. It is not clear to me why colClasses should have an influence, since it doesn't change the determination of the number of columns. Why a warning here, but an error for file 1 and no message for file 4? Any comments gratefully received. Gordon

test1.txt:
X,Y
a,2
b,4,,
c,6

test2.txt:
X,Y
a,2
b,4,
c,6

test3.txt:
X,Y
a,2
b,4
c,6
d,8
e,10,,
f,12

test4.txt:
X,Y
a,2
b,4
c,6
d,8
e,10,
f,12

> read.csv("test1.txt")
Error in read.table(file = file, header = header, sep = sep, quote = quote,  :
        more columns than column names
> read.csv("test2.txt")
  X  Y
a 2 NA
b 4 NA
c 6 NA
> read.csv("test3.txt")
  X  Y
1 a  2
2 b  4
3 c  6
4 d  8
5 e 10
6   NA
7 f 12
> read.csv("test4.txt")
  X  Y
1 a  2
2 b  4
3 c  6
4 d  8
5 e 10
6 f 12
> read.csv("test3.txt",colClasses=c(NA,NA))
  X  Y
1 a  2
2 b  4
3 c  6
4 d  8
5 e 10
6   NA
7 f 12
> read.csv("test3.txt",colClasses=c(NA,NA,NA,NA))
  X  Y
1 a  2
2 b  4
3 c  6
4 d  8
5 e 10
6 f 12
Warning message:
cols = 2 != length(data) = 4 in: read.table(file = file, header = header, sep = sep, quote = quote,
> sessionInfo()
R version 2.4.0 Under development (unstable) (2006-07-25 r38698)
i386-pc-mingw32

locale:
LC_COLLATE=English_Australia.1252;LC_CTYPE=English_Australia.1252;LC_MONETARY=English_Australia.1252;LC_NUMERIC=C;LC_TIME=English_Australia.1252

attached base packages:
[1] methods stats graphics grDevices utils datasets base
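The experiment can be reproduced in one script. This sketch writes the four files above to a temporary directory, counts the fields on each line with count.fields() (these per-line counts are what read.table() has to reconcile when it sizes the table from the first five lines of input), and captures the error from the first file instead of stopping. The object names are illustrative, and the results for files 3 and 4 may differ in later versions of R.

```r
# Recreate the four files from the message in a temporary directory.
files <- list(
  test1 = c("X,Y", "a,2", "b,4,,", "c,6"),
  test2 = c("X,Y", "a,2", "b,4,", "c,6"),
  test3 = c("X,Y", "a,2", "b,4", "c,6", "d,8", "e,10,,", "f,12"),
  test4 = c("X,Y", "a,2", "b,4", "c,6", "d,8", "e,10,", "f,12")
)
paths <- vapply(names(files), function(nm) {
  p <- file.path(tempdir(), paste0(nm, ".txt"))
  writeLines(files[[nm]], p)
  p
}, character(1))

# Number of comma-separated fields on each line of each file.
fields <- lapply(paths, count.fields, sep = ",")

# Read each file, keeping the error object instead of stopping.
results <- lapply(paths, function(p)
  tryCatch(read.csv(p), error = function(e) e))
results$test2    # first column used as row names, Y all NA
```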