I believe the question here is related to the sign on the compact row names representation: why is it sometimes `c(NA, <positive>)` and sometimes `c(NA, <negative>)` -- why the difference in sign?
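The two representations can be inspected directly with `.row_names_info()`. What follows is only a minimal sketch in base R, using the data frames from the original post. `reset_row_names` is a hypothetical helper name made up for illustration, not an existing function, and printed output is left out because it may differ between R versions.

```r
# The two data frames from the original post.
df  <- data.frame(a = 1:10, b = 1)
df2 <- df[1:nrow(df), ]

# type = 0L returns the row.names attribute as stored internally,
# without expanding the compact c(NA, n) form.
.row_names_info(df,  type = 0L)
.row_names_info(df2, type = 0L)

# type = 1L (the default) returns n, negated when the row names are
# "automatic"; type = 2L returns abs(n), so it ignores the sign.
.row_names_info(df)
.row_names_info(df2)
.row_names_info(df2, type = 2L)

# Hypothetical helper (not part of base R): if the row names are just
# 1..n, reset them so they are stored in the "automatic" form.
reset_row_names <- function(x) {
  stopifnot(is.data.frame(x))
  if (identical(row.names(x), as.character(seq_len(nrow(x))))) {
    row.names(x) <- NULL
  }
  x
}

df2_reset <- reset_row_names(df2)
.row_names_info(df2_reset, type = 0L)
identical(df, df2_reset)
```

Resetting only touches data frames whose row names are already 1..n, so no meaningful row names are dropped. It makes the row.names attribute consistent, nothing more; it is a workaround sketch under those assumptions, not a statement about how R itself should resolve the difference.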
To the best of my knowledge, older versions of R used the signed-ness of
compact row.names to differentiate between different 'types' of
data.frames, but that should no longer be necessary. Unless there is some
reason not to, I believe R should standardize on one representation, and
consider it a bug if the other is seen.

Of course, I could be wrong, so I offer my understanding only as a way of
invoking Cunningham's law...

Cheers,
Kevin

On Mon, Nov 10, 2014 at 12:05 PM, Joshua Ulrich <josh.m.ulr...@gmail.com> wrote:
> On Mon, Nov 10, 2014 at 12:35 PM, Dr Gregory Jefferis
> <jeffe...@mrc-lmb.cam.ac.uk> wrote:
>> Dear R-devel,
>>
>> Can anyone help me to understand this? It seems that subscripting the rows
>> of a data.frame without actually changing their order, somehow changes an
>> internal representation of row.names that is revealed by e.g.
>> dput/dump/serialize
>>
>> I have read the docs and inspected the (R) code for data.frame, rownames,
>> row.names and dput without enlightenment.
>>
> Look at ?.row_names_info (which is mentioned in the See Also section
> of ?row.names) and its type argument. Also see the discussion here:
> http://stackoverflow.com/q/26468746/271616
>
>> df=data.frame(a=1:10, b=1)
>> dput(df)
>> df2=df[1:nrow(df), ]
>> # R thinks they are equal (so do I!)
>> all.equal(df, df2)
>> dput(df2)
>>
>> Looking at the output of the dputs
>>
>>> dput(df)
>>
>> structure(list(a = 1:10, b = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1)), .Names =
>> c("a",
>> "b"), row.names = c(NA, -10L), class = "data.frame")
>>>
>>> dput(df2)
>>
>> structure(list(a = 1:10, b = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1)), .Names =
>> c("a",
>> "b"), row.names = c(NA, 10L), class = "data.frame")
>>
>> we have row.names = c(NA, -10L) in the first case and row.names = c(NA, 10L)
>> in the second, so somehow these objects have a different representation
>>
>> Can anyone explain why? This has come up because
>>
> The first are "automatic". The second are a compact form of 1:10, as
> mentioned in ?row.names. I'm not certain of the root cause/reason,
> but the second object will not have "automatic" rownames because you
> have subset it with a non-missing 'i'.
>
>>> library(digest)
>>> digest(df)==digest(df2)
>>
>> [1] FALSE
>>
>> digest uses serialize under the hood, but serialize, dput and dump all show
>> the same effect (I've pasted an example below using dump, md5sum from base
>> R).
>>
>> Many thanks for any enlightenment! More generally is there any way to
>> calculate a digest of a data.frame that could get round this issue or is
>> that not possible?
>>
>> Best wishes,
>>
>> Greg.
>>
>>
>> A digest using base R:
>>
>> library(tools)
>> td=tempfile()
>> dir.create(td)
>> tempfiles=file.path(td,c("df", "df2"))
>> dump("df",tempfiles[1])
>> dump("df2",tempfiles[2])
>> md5sum(tempfiles)
>>
>> # different md5sum
>>
>>> sessionInfo() # for my laptop but also observed on R 3.1.2
>>
>> R version 3.1.1 (2014-07-10)
>> Platform: x86_64-apple-darwin13.1.0 (64-bit)
>>
>> locale:
>> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
>>
>> attached base packages:
>> [1] tools stats graphics grDevices utils datasets methods
>> base
>>
>> other attached packages:
>> [1] nat_1.5.14 nat.utils_0.4.2 digest_0.6.4 Rvcg_0.9
>> devtools_1.6.1 igraph_0.7.1
>> [7] testthat_0.9.1 rgl_0.93.1098
>>
>> loaded via a namespace (and not attached):
>> [1] codetools_0.2-9 filehash_2.2-2 nabor_0.4.3 parallel_3.1.1
>> plyr_1.8.1
>> [6] Rcpp_0.11.3 rstudio_0.98.1062 rstudioapi_0.1 XML_3.98-1.1
>> yaml_2.1.13
>>
>> --
>> Gregory Jefferis, PhD
>> Division of Neurobiology
>> MRC Laboratory of Molecular Biology
>> Francis Crick Avenue
>> Cambridge Biomedical Campus
>> Cambridge, CB2 OQH, UK
>>
>> http://www2.mrc-lmb.cam.ac.uk/group-leaders/h-to-m/g-jefferis
>> http://jefferislab.org
>> http://flybrain.stanford.edu
>
> --
> Joshua Ulrich | about.me/joshuaulrich
> FOSS Trading | www.fosstrading.com

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel