On Oct 4, 2010, at 16:57, Johannes Graumann wrote:

> Hi,
>
> I'm turning my wheels on this and keep coming around to the same wrong
> solution - please have a look and give a hand ...
>
> The premise is: a DF like so
>
>> loremIpsum <- "Lorem ipsum dolor sit amet, consectetur adipiscing elit.
> Quisque leo ipsum, ultricies scelerisque volutpat non, volutpat et nulla.
> Curabitur consequat ullamcorper tellus id imperdiet. Duis semper malesuada
> nulla, blandit lobortis diam fringilla at. Vestibulum nec tellus orci, eu
> sollicitudin quam. Phasellus sit amet enim diam. Phasellus mattis hendrerit
> varius. Curabitur ut tristique enim. Lorem ipsum dolor sit amet, consectetur
> adipiscing elit. Sed convallis, tortor id vehicula facilisis, nunc justo
> facilisis tellus, sed eleifend nisi lacus id purus. Maecenas tempus
> sollicitudin libero, molestie laoreet metus dapibus eu. Mauris justo ante,
> mattis et pulvinar a, varius pretium eros. Curabitur fringilla dui ac dui
> rutrum pretium. Donec sed magna adipiscing nisi accumsan congue sed ac est.
> Vivamus lorem urna, tristique quis accumsan quis, ullamcorper aliquet
> velit."
>> tmpDF <- data.frame(Column1=rep(unlist(strsplit(loremIpsum," ")),length.out=510),Column2=runif(510,min=0,max=1e8))
>
> is to be split into DFs with 50 entries in an ordered manner according to
> Column2 (the first DF is to contain the rows with the 50 largest numbers, ...).
>
> Here is what I have been doing:
>
>> binSize <- 50
>> splitMembership <-
> pmin(ceiling(order(tmpDF[["Column2"]],decreasing=TRUE)/binSize),floor(nrow(tmpDF)/binSize))
>> splitList <- split(tmpDF,splitMembership)
>
> Distribution seems to work ...
>> sapply(splitList,nrow)
>
> But this is NOT what I wanted ...
>> sapply(splitList,function(x){max(x[["Column2"]])})
> This was supposed to give me bins that are Column2-sorted, and bin one should
> have a higher max than 2 than 3 ...
>
> Can anyone point out where (my now 3 reimplementations) fail?
>
> Thanks, Stupid Joh

Dear Stupid Joh,

Have you considered something along the lines of the following, with x standing for your data frame?

    o <- order(-x$Column2)
    xx <- x[o,]
    split(xx, (seq_len(NROW(x))-1) %/% 50)

Your code above is a bit hard to follow, but it seems to work better with rank() instead of order(). order() returns the row indices in sorted order, whereas rank() gives each row's position in the sorted sequence, and that position is what the bin membership needs:

> splitMembership <-
+ pmin(ceiling(rank(-tmpDF[["Column2"]])/binSize),floor(nrow(tmpDF)/binSize))
> splitList <- split(tmpDF,splitMembership)
> sapply(splitList,nrow)
 1  2  3  4  5  6  7  8  9 10
50 50 50 50 50 50 50 50 50 60
> sapply(splitList,function(x){max(x[["Column2"]])})
       1        2        3        4        5        6
99877498 90567877 81965382 69112280 59814266 52130373
       7        8        9       10
41557660 32630212 21226996 11880032

--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk  Priv: pda...@gmail.com
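
For what it is worth, the order() versus rank() difference is easier to see on a toy vector than on 510 random numbers. The following is only a minimal sketch: the vector v, the bin size of 2, and all variable names are made up for illustration and are not part of the original thread.

```r
# Hypothetical toy data (not from the thread): four values, bins of size 2.
v <- c(10, 40, 20, 30)
binSize <- 2

# order(-v) is a permutation of indices: "which element comes 1st, 2nd, ...".
ord <- order(-v)
# rank(-v) has one entry per element: "which position does this element take".
rnk <- rank(-v)

# Using order() as if it were a rank assigns elements to the wrong bins.
binsByOrder <- ceiling(ord / binSize)
maxByOrder <- sapply(split(v, binsByOrder), max)

# Using rank() gives the intended membership: largest values in bin 1.
binsByRank <- ceiling(rnk / binSize)
maxByRank <- sapply(split(v, binsByRank), max)

# Alternative route: sort first, then cut into consecutive blocks.
sorted <- v[ord]
binsBySort <- (seq_along(sorted) - 1) %/% binSize + 1
maxBySort <- sapply(split(sorted, binsBySort), max)

print(rbind(byOrder = maxByOrder, byRank = maxByRank, bySort = maxBySort))
```

Only the rank-based and sort-based rows have decreasing maxima from bin 1 to bin 2. One caveat: rank() averages ties by default, so for data with repeated values something like ties.method="first" may be needed to keep the bin sizes exact; with runif() output ties are not a practical concern.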