On Thu, Sep 20, 2012 at 10:57 AM, Stefan Th. Gries <stgr...@gmail.com> wrote:
> From my book on corpus linguistics with R:
>
> # (10) Imagine you have two vectors a and b such that
> a<-c("d", "d", "j", "f", "e", "g", "f", "f", "i", "g")
> b<-c("a", "g", "d", "f", "g", "a", "f", "a", "b", "g")
>
> # Of these vectors, you can create frequency lists by writing
> freq.list.a<-table(a); freq.list.b<-table(b)
>
> # How do you merge these two frequency lists without merging the two
> # vectors first? More specifically, if I delete a and b from your
> # memory,
> rm(a); rm(b)
> # how do you generate the following table only from freq.list.a and
> # freq.list.b, i.e., without any reference to a and b themselves? Before
> # you complain about this question as being unrealistic, consider the
> # possibility that you generated the frequency lists of two corpora
> # (here, a and b) that are so large that you cannot combine them into
> # one (a.and.b<-c(a, b)) and generate a frequency list of that combined
> # vector (table(a.and.b)) ...
> # joint.freqs
> # a b d e f g i j
> # 3 1 3 1 5 5 1 1
>
> joint.freqs<-vector(length=length(sort(unique(c(names(freq.list.a),
>    names(freq.list.b)))))) # You generate an empty vector joint.freqs (i)
> # that is as long as there are different types in both a and b (but note
> # that, as requested, this information is not taken from a or b, but
> # from their frequency lists) ...
> names(joint.freqs)<-sort(unique(c(names(freq.list.a),
>    names(freq.list.b)))) # ... and (ii) whose elements have these
> # different types as names.
> joint.freqs[names(freq.list.a)]<-freq.list.a # The elements of the new
> # vector joint.freqs that have the same names as the frequencies in the
> # first frequency list are assigned the respective frequencies.
> joint.freqs[names(freq.list.b)]<-joint.freqs[names(freq.list.b)]+freq.list.b
> # The elements of the new vector joint.freqs that have the same names
> # as the frequencies in the second frequency list are assigned the sum
> # of the values they already have (either the ones from the first
> # frequency list or just zeroes) and the respective frequencies.
> joint.freqs # look at the result
>
> # Another shorter and more elegant solution was proposed by Claire
> # Crawford (but uses a function which will only be introduced later in
> # the book)
> freq.list.a.b<-c(freq.list.a, freq.list.b) # first the two frequency
> # lists are merged into a single vector ...
> joint.freqs<-as.table(tapply(freq.list.a.b, names(freq.list.a.b),
>    sum)) # ... and then the sums of all numbers that share the same names
> # are computed
> joint.freqs # look at the result
>
> # The shortest, but certainly not memory-efficient way to do this
> # involves just using the frequency lists to create one big vector with
> # all elements and tabulate that.
> table(c(rep(names(freq.list.a), freq.list.a), rep(names(freq.list.b),
>    freq.list.b))) # kind of cheating but possible with short vectors ...
>
Try:

rowsum(freq.list.a.b, names(freq.list.a.b))

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
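For anyone who wants to run the rowsum() suggestion end to end, here is a minimal sketch. It assumes freq.list.a.b is built with c() exactly as in the quoted message. The names sums, joint.freqs.rowsum and joint.freqs.tapply are made up for this illustration. One thing to keep in mind is that rowsum() hands back a one-column matrix rather than a table, so the sketch pulls out that column to get a named vector.

```r
# Data from the quoted message
a <- c("d", "d", "j", "f", "e", "g", "f", "f", "i", "g")
b <- c("a", "g", "d", "f", "g", "a", "f", "a", "b", "g")
freq.list.a <- table(a)
freq.list.b <- table(b)
rm(a, b)  # from here on only the two frequency lists are used

# Concatenate the two frequency lists into one named integer vector
freq.list.a.b <- c(freq.list.a, freq.list.b)

# rowsum() adds up all values that share a group label; the result is a
# one-column matrix whose row names are the sorted labels
sums <- rowsum(freq.list.a.b, names(freq.list.a.b))

# Take the single column; the row names carry over as element names
joint.freqs.rowsum <- sums[, 1]

# The tapply() version from the quoted message, kept for comparison
joint.freqs.tapply <- tapply(freq.list.a.b, names(freq.list.a.b), sum)

joint.freqs.rowsum  # look at the result
```

Both versions should give the same counts as the joint.freqs table in the quoted message. If a table object is needed, wrapping the named vector in as.table() should do it.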