It seems to me that, abstractly, a dataframe is just as straightforwardly a sequence of tuples/observations as a vector is a sequence of scalars. R's convention is that a length-1 vector represents a scalar, and similarly, a 1-row dataframe can represent a tuple (though a tuple can also be represented as a list). Of course, a dataframe can *also* be interpreted as a list of vectors.
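To make the two readings concrete, here is a minimal sketch on a toy dataframe. The data and the names df, rows, tuple_df and tuple_list are made up for illustration and do not come from the thread. It only shows that length() counts columns (the list-of-vectors view), while unique() and duplicated() already work row by row (the sequence-of-tuples view).

```r
# Hypothetical toy data, not taken from the thread.
df <- data.frame(x = c(1L, 2L, 2L),
                 y = c("a", "b", "b"),
                 stringsAsFactors = FALSE)

# List-of-vectors view: the "elements" of df are its columns.
is.list(df)   # a data frame is a list underneath
length(df)    # counts columns, not rows
names(df)     # column names

# Sequence-of-tuples view: one 1-row data frame per observation.
rows <- lapply(seq_len(nrow(df)), function(i) df[i, , drop = FALSE])
length(rows)  # counts rows

# A single tuple, either as a 1-row data frame or as a plain list.
tuple_df   <- df[1, , drop = FALSE]
tuple_list <- as.list(tuple_df)

# unique() and duplicated() take the row view for data frames.
duplicated(df)  # flags the repeated third row
unique(df)      # keeps the two distinct rows
```

Functions that treat a dataframe as a plain list see the columns as elements, whereas the data frame methods for unique() and duplicated() see the rows.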
Just as a sequence of scalars can be interpreted as a set of scalars by the order- and repetition-ignoring homomorphism, so can a sequence of tuples. It seems to me natural that set operations should follow that interpretation.

           -s

On 5/30/09, G. Jay Kerns <gke...@ysu.edu> wrote:
> Dear R-devel,
>
> Please see the recent thread on R-help, "Odd Behavior Out of
> setdiff(...) - addition of duplicate entries is not identified" posted
> by Jason Rupert. I gave an answer, then read David Winsemius' answer,
> and then did some follow-up investigation.
>
> I would like to change my answer.
>
> My current version of setdiff() is acting in a way that I do not
> understand, and a way that I suspect has changed. Consider the
> following, derived from Jason's OP:
>
> The base package setdiff(), atomic vectors:
>
> x <- 1:100
> y <- c(x,x)
>
> setdiff(x, y) # integer(0)
> setdiff(y, x) # integer(0)
>
> z <- 1:25
>
> setdiff(x,z) # 26:100
> setdiff(z,x) # integer(0)
>
>
> Everything is fine.
>
> Now look at base package setdiff(), data frames???
>
> ################################
> A <- data.frame(x = 1:100)
> B <- rbind(A, A)
>
> setdiff(A, B) # df 1:100?
> setdiff(B, A) # df 1:100?
>
> C <- data.frame(x = 1:25)
>
> setdiff(A, C) # df 1:100?
> setdiff(C, A) # df 1:25?
>
> ############################
>
>
> I have read ?setdiff 37 times now, and I cannot divine any
> interpretation that matches the above output. From the source, it
> appears that
>
> match(x, y, 0L) == 0L
>
> is evaluating to TRUE, of length equal to the columns of x, and then
>
> x[match(x, y, 0L) == 0L]
>
> is returning the entire data frame.
>
> Compare with the output from package "prob", which uses a setdiff that
> operates row-wise:
>
>
> ###########################
> library(prob)
> A <- data.frame(x = 1:100)
> B <- rbind(A, A)
>
> setdiff(A, B) # integer(0)
> setdiff(B, A) # integer(0)
>
> C <- data.frame(x = 1:25)
>
> setdiff(A, C) # 26:100
> setdiff(C, A) # integer(0)
>
>
>
> IMHO, the entire notion of "set" and "element" is problematic in the
> df case, so I am not advocating the adoption of the prob:::setdiff
> approach; rather, setdiff is behaving in a way that I cannot believe
> with my own eyes, and I would like to alert those who can speak as to
> why this may be happening.
>
> Thanks to Jason for bringing this up, and to David for catching the
> discrepancy.
>
> Session info is below. I use the binaries prepared by the Debian
> group so I do not have the latest patched-revision-4440986745343b.
> This must have been related to something which has been fixed since
> April 17, and in that case, please disregard my message.
>
> Yours truly,
> Jay
>
>
>> sessionInfo()
> R version 2.9.0 (2009-04-17)
> x86_64-pc-linux-gnu
>
> locale:
> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] prob_0.9-1
>
>
> --
>
> ***************************************************
> G. Jay Kerns, Ph.D.
> Associate Professor
> Department of Mathematics & Statistics
> Youngstown State University
> Youngstown, OH 44555-0002 USA
> Office: 1035 Cushwa Hall
> Phone: (330) 941-3310 Office (voice mail)
>                  -3302 Department
>                  -3170 FAX
> E-mail: gke...@ysu.edu
> http://www.cc.ysu.edu/~gjkerns/
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel