Re: [R] How to delete the replicate rows by summing up the numeric columns

Yi Tue, 29 Jun 2010 19:27:08 -0700

Thank you very much for response. Finally I took David's way. Others' work
well for this specific case. But I find problem is there are more than one
column of character variables.


z=c('ab','ah','bc','ah','dv')
x=substr(z,start=1,stop=1)
y=substr(z,start=2,stop=2)
v1=5:9
v2=7:11
data=data.frame(x,y,z,v1,v2)

### I want to sum up v1 and v2 wrt z only, and delete duplicat rows. I do
not care x and y here, just keep them only.

data$summed <- ave(data$v1, data$z, FUN=sum)
data[!duplicated(data$z),c('z','x','y','summed')]
### I tried to use 'melt' or 'cast' to solve the problem (as Nikhil and
Dennis sugguested). But it seems since x and y are also charactor variables,
it just does not work here.

Basically, my problem is perfectly answered.  But if you want to comment on
this example, please do it.  I really appreciate it.

Let's say, if you think we can use cast or aggregate functions to delete
duplicate rows by summing up the numerica columns where there are several
columns are charactor variables. I feel no way to deal with two types at the
same time.
Thank you.

On Tue, Jun 29, 2010 at 6:04 PM, Dennis Murphy <djmu...@gmail.com> wrote:

> Hi:
>
> If you can deal with alphabetic order, the following seems to work:
>
> v <- aggregate(third ~ first, data = data, FUN = sum)
> v$second <- levels(data$second)
> v[, c(1, 3, 2)]
>   first  second third
> 1     b  Brazil     2
> 2     c   China    15
> 3     e England    13
> 4     f  France     8
> 5     j   Japan     5
> 6     k   Korea     4
> 7     u     usa     8
>
> v$second works in this case because the levels are ordered and all are used
> when inserted in v. That's not a guarantee in more complicated problems and
> frankly, this one is a kludge.
>
> A plyr version would be
>
> v <- ddply(data, .(first), summarise, third = sum(third), second = second)
> v[!duplicated(v$first), c(1, 3, 2)]
>   first   second third
> 1     b  Brazil     2
> 2     c   China    15
> 4     e England    13
> 6     f  France     8
> 7     j   Japan     5
> 8     k   Korea     4
> 9     u     usa     8
>
> The advantage of ddply over aggregate in this case is that ddply allows one
> to insert second as an 'identity' of sorts; however, the result contains
> duplicate rows, so we need to remove them in the second statement.
>
> Using melt and cast from the reshape package,
> mm <- melt(data, id = c('first', 'second'))
> (ms <- cast(mm, first + second ~ . , sum))
>   first  second (all)
> 1     b  Brazil     2
> 2     c   China    15
> 3     e England    13
> 4     f  France     8
> 5     j   Japan     5
> 6     k   Korea     4
> 7     u     usa     8
>
> names(ms)[3] <- 'third'
>
> This seems to be the cleanest version of the three in terms of getting both
> ID variables into the final result.
>
> HTH,
> Dennis
>
>  On Tue, Jun 29, 2010 at 12:05 PM, Yi <liuyi.fe...@gmail.com> wrote:
>
>> Hi, folks,
>>
>> I am sorry that I did not state the problem correctly yesterday.
>>
>> Please let me address the problem by the following codes:
>>
>> first=c('u','b','e','k','j','c','u','f','c','e')
>>
>> second=c('usa','Brazil','England','Korea','Japan','China','usa','France','China','England')
>> third=1:10
>> data=data.frame(first,second,third)
>>
>> ## You may understand values in the first column are the unique codes for
>> those in the second column.
>> ####So 'u' is only for usa. Replicate values appear the same rows for the
>> first and second columns.
>> ### Now I want to delete replicate rows with the same values in first
>> (sceond) rows
>> ####and sum up values in the third column for the same values.
>>
>> mm=melt(data,id='first')
>> sum=cast(mm,first~variable,sum) ### This does not work.
>>
>> ###I tried another way to do this
>> mm= melt(data, id='first',measure='third')
>> sum=cast(mm,first~variable,sum)
>>
>> ## But then the problem is how to 'merge' the result with the second
>> column
>> in the dataset.
>>
>>
>> The expected dataframe is like this:
>>
>> (I showed a wrong expected dataframe yesterday.)
>>
>>     first   second  third
>> 1      u     usa      8
>> 2      b   Brazil     2
>> 3      e  England   13
>> 4      k   Korea     4
>> 5      j   Japan      5
>> 6      c   China     15
>> 8      f  France     8
>>
>> Thanks in advance.
>>
>>        [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html<http://www.r-project.org/posting-guide.html>
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to delete the replicate rows by summing up the numeric columns

Reply via email to