Hi,

Thanks for including code and data so that we could reproduce what you're
doing.

Your problem is that you tell ddply to split the dataset by runNumber and
cat1, which results in four groups. ddply then applies my.summary() to each
of them. Only one group (cat1 = 1, runNumber = 1) contains both a row with
start = TRUE and a row with end = TRUE, so it has both start.loc and
end.loc and works fine. The other three groups are broken. The group with
cat1 = 2 and runNumber = 1 has neither start.loc nor end.loc, while the two
groups with runNumber = 2 each have only one of the two. In those groups the
missing value is a zero-length vector, which data.frame() cannot combine
with the length-one peak, hence "arguments imply differing number of rows:
0, 1". The error disappears if you split the dataset only by runNumber, as
then each group has both start.loc and end.loc.
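
To see this for yourself, here is a minimal sketch in base R that counts the
start and end markers in each runNumber/cat1 group. It rebuilds only the
columns it needs from your post; the names counts, n_start and n_end are
mine, not part of your code.

```r
# Rebuild the relevant columns from the original post
mydata <- data.frame(
  cat1     = c(1, 1, 2, 2, 1, 1, 1, 2, 2),
  sequence = c(1, 2, 3, 4, 5, 8, 9, 10, 11)
)

# Same helpers as in the original post
first <- function(x) c(TRUE, diff(x) != 1)
last  <- function(x) c(diff(x) != 1, TRUE)

mydata$start     <- first(mydata$sequence)
mydata$end       <- last(mydata$sequence)
mydata$runNumber <- cumsum(mydata$start)

# Number of start and end markers in each runNumber/cat1 group
# (n_start and n_end are illustrative names)
counts <- aggregate(cbind(n_start = start, n_end = end) ~ runNumber + cat1,
                    data = mydata, FUN = sum)
print(counts)
```

Any group with a zero in either column is one where my.summary() ends up
passing a zero-length vector to data.frame().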

If you want to apply my.summary() to each of these four groups, you're going
to have to fix the earlier code that assigns the start and end variables.
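
I don't know what you want start and end to mean within a
runNumber/cat1 group, so the following is only one possible reading: it
assumes that the first row of each group is its start and the last row is
its end. The helper names grp, pos and len are mine, and I use base R's
split() and lapply() here only so the sketch runs without plyr.

```r
# Data from the original post (only the columns needed here)
mydata <- data.frame(
  cat1        = c(1, 1, 2, 2, 1, 1, 1, 2, 2),
  location    = c(3002737, 3017821, 3027730, 3036220, 3053984,
                  3090500, 3103304, 3090500, 3103304),
  item_values = c(100, 102, 103, 104, 105, 106, 107, 106, 107),
  sequence    = c(1, 2, 3, 4, 5, 8, 9, 10, 11)
)
mydata$runNumber <- cumsum(c(TRUE, diff(mydata$sequence) != 1))

# ASSUMPTION: start/end are the first/last row within each
# runNumber/cat1 group, not within each run as in the original code
grp <- interaction(mydata$runNumber, mydata$cat1, drop = TRUE)
pos <- ave(seq_along(grp), grp, FUN = seq_along)  # row position in group
len <- ave(seq_along(grp), grp, FUN = length)     # size of the group
mydata$start <- pos == 1
mydata$end   <- pos == len

# Same summary as in the original post, slightly condensed
my.summary <- function(x) {
  data.frame(start_of_the_location = x$location[x$start],
             end_of_the_location   = x$location[x$end],
             peak_value            = max(x$item_values))
}

# Every group now has exactly one start and one end row
result <- do.call(rbind, lapply(split(mydata, grp), my.summary))
print(result)
```

With start and end assigned this way, a ddply() call split by
.(runNumber, cat1) should also run, since each group then gives
my.summary() exactly one start.loc and one end.loc.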

Jonathan


On Wed, Jul 28, 2010 at 7:59 AM, jd6688 <jdsignat...@gmail.com> wrote:

>
> mydata <- read.table(textConnection("
>
>  Id cat1 location item_values p-values sequence
> a111 1 3002737     100             0.01       1
> a112 1 3017821     102             0.05       2
> a113 2 3027730     103             0.02       3
> a114 2 3036220     104             0.04       4
> a115 1 3053984     105             0.03       5
> a118 1 3090500     106             0.02       8
> a119 1 3103304     107             0.03       9
> a120 2 3090500     106             0.02       10
> a121 2 3103304     107             0.03       11
>
> "), header = TRUE)
>
> closeAllConnections()
>
>
>
> first <- function(x)c(TRUE, diff(x)!=1)
>
>
>
>
> last <- function(x)c(diff(x)!=1, TRUE)
>
>
>
> mydata$start <- first(mydata$sequence)
> mydata$end <- last(mydata$sequence)
>
> mydata$runNumber <- cumsum(first(mydata$sequence))
>
> #load library
> library(plyr)
>
>
> ddply(mydata[, -1], .(runNumber,cat1), function(x) {max(x$item_values)})
>
>
>
> my.summary <- function(x) {
>  start.loc <- x$location[which(x$start == TRUE)]
>  end.loc <- x$location[which(x$end == TRUE)]
>  peak <- max(x$item_values)
>  output <- data.frame(
>                      start_of_the_location = start.loc,
>                      end_of_the_location = end.loc,
>                      peak_value = peak)
>  return(output)
> }
>
>
> ddply(mydata[, -1], .(runNumber,cat1), my.summary)
>
> why ddply returned the following error
>
>  Error in data.frame(start_of_the_location = start.loc, end_of_the_location
> = end.loc,  :
>  arguments imply differing number of rows: 0, 1
> > mydata[,-1]
>  cat1 location item_values p.values sequence start   end runNumber
> 1    1  3002737         100     0.01        1  TRUE FALSE         1
> 2    1  3017821         102     0.05        2 FALSE FALSE         1
> 3    2  3027730         103     0.02        3 FALSE FALSE         1
> 4    2  3036220         104     0.04        4 FALSE FALSE         1
> 5    1  3053984         105     0.03        5 FALSE  TRUE         1
> 6    1  3090500         106     0.02        8  TRUE FALSE         2
> 7    1  3103304         107     0.03        9 FALSE FALSE         2
> 8    2  3090500         106     0.02       10 FALSE FALSE         2
> 9    2  3103304         107     0.03       11 FALSE  TRUE         2
> >
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/error-arguments-imply-differing-number-tp2305014p2305014.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


