Re: [R] get maximum 3 rows by column elements in data frame

2015-11-09 Thread PIKAL Petr
Hi

I am not completely sure what you want, so here is my guess.

> dat<-structure(list(Measure_id = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 
> 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 
> 5), i = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 4, 4, 4, 4, 5, 5, 
> 5, 5, 5, 5, 5, 5, 5, 5, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7), j = c(1, 1, 1, 1, 1, 
> 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 
> 3, 3, 3, 3, 5, 5, 5, 5, 5), value = c(1.5, 2, 1, 0, 2, 2, 1.5, 0, 0, 1, 1, 2, 
> 0, 1, 2, 2, 0.5, 1, 0, 2, 2, 1.5, 0, 0, 1, 1.5, 2.5, 0, 1, 2, 1.5, 1, 0, 0, 
> 1, 1, 2, 0, 0, 1), rank = c(0.75, 1, 1, 0, 1, 1, 0.75, 0, 0, 0.5, 0.5, 1, 0, 
> 1, 1, 1, 1, 1, NaN, 1, 1, 0.6, NaN, 0, 0.5, 0.75, 1, NaN, 1, 1, 1, 0.5, NaN, 
> NaN, 1, 0.667, 1, NaN, NaN, 1)), .Names = c("Measure_id", "i", 
> "j", "value", "rank"), row.names = c(NA, 40L), class = "data.frame")

I named your data dat. First I computed a decreasing order by Measure_id and rank.

> ooo<-order(dat[,1], dat[,5], decreasing=TRUE)

Then I reordered the rows.
> dat<-dat[ooo,]

Finally, this one-liner selects the first 3 rows (the highest ranks) within each Measure_id:

> do.call(rbind, lapply(split(dat, dat[,1]), head, 3))
     Measure_id i j value rank
1.6           1 2 3   2.0    1
1.16          1 4 3   2.0    1
1.21          1 5 1   2.0    1
2.2           2 2 1   2.0    1
2.12          2 2 4   2.0    1
2.17          2 4 3   0.5    1
3.3           3 2 1   1.0    1
3.18          3 4 3   1.0    1
3.8           3 2 3   0.0    0
4.14          4 2 4   1.0    1
4.29          4 5 2   1.0    1
4.4           4 2 1   0.0    0
5.5           5 2 1   2.0    1
5.15          5 2 4   2.0    1
5.20          5 4 3   2.0    1
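
The same sort, split and head steps can be sketched as one self-contained function that leaves the input data frame unchanged and makes the treatment of NaN ranks explicit. The helper name top_n_by_group and the small data frame toy are made up for illustration and are not part of the original post.

```r
# Hypothetical helper: keep the n rows with the highest value of `by`
# within each level of `group`.
top_n_by_group <- function(d, group, by, n = 3) {
  # Sort by group (ascending), then by the ranking column (descending).
  # Negating the numeric column reverses its order; NaN/NA are put last.
  ord <- order(d[[group]], -d[[by]], na.last = TRUE)
  sorted <- d[ord, ]
  # split() yields one data frame per group; head() keeps the first n rows.
  pieces <- lapply(split(sorted, sorted[[group]]), head, n)
  do.call(rbind, pieces)
}

# Toy data, invented for this example
toy <- data.frame(
  Measure_id = c(1, 1, 1, 1, 2, 2, 2, 2),
  rank       = c(0.5, 1, NaN, 0.75, 0.2, NaN, 0.9, 0.4)
)

res <- top_n_by_group(toy, "Measure_id", "rank", n = 3)
print(res)
```

Note that in the order() call above, decreasing=TRUE applies to both keys, so Measure_id is sorted in decreasing order as well. The final result still lists the groups in ascending order because split() returns its pieces in the order of the factor levels.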

Is this what you wanted?

Cheers
Petr

> -----Original Message-----
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Ragia
> Ibrahim
> Sent: Monday, November 09, 2015 11:55 AM
> To: r-help@r-project.org
> Subject: [R] get maximum 3 rows by column elements in data frame
>
> Dear group,
>
> I have the following data frame:
>
> dput(df_all_nodes)
>
> structure(list(Measure_id = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4,
> 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3,
> 4, 5), i = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 4, 4, 4,
> 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7), j =
> c(1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 1, 1, 1,
> 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 5, 5, 5, 5, 5), value = c(1.5, 2,
> 1, 0, 2, 2, 1.5, 0, 0, 1, 1, 2, 0, 1, 2, 2, 0.5, 1, 0, 2, 2, 1.5, 0, 0,
> 1, 1.5, 2.5, 0, 1, 2, 1.5, 1, 0, 0, 1, 1, 2, 0, 0, 1), rank = c(0.75,
> 1, 1, 0, 1, 1, 0.75, 0, 0, 0.5, 0.5, 1, 0, 1, 1, 1, 1, 1, NaN, 1, 1,
> 0.6, NaN, 0, 0.5, 0.75, 1, NaN, 1, 1, 1, 0.5, NaN, NaN, 1,
> 0.667, 1, NaN, NaN, 1)), .Names = c("Measure_id", "i", "j",
> "value", "rank"), row.names = c(NA, 40L), class = "data.frame")
>
> I want to get the maximum 3 rows in each group of Measure_id, e.g. for
> Measure_id 1 get the rows with the max ranks (select the max for each
> measure depending on the rank column).
>
> How can I do that?
> Best regards,
> Ragia
>
>
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



Re: [R] get maximum 3 rows by column elements in data frame

2015-11-09 Thread jim holtman
It is not entirely clear what you are asking for.  Can you provide a sample
of the output that you want from the data?  Here is the data split by
Measure_id, but I am not sure what to do with it:

> split(x, x$Measure_id)
$`1`
   Measure_id i j value  rank
1           1 2 1   1.5 0.750
6           1 2 3   2.0 1.000
11          1 2 4   1.0 0.500
16          1 4 3   2.0 1.000
21          1 5 1   2.0 1.000
26          1 5 2   1.5 0.750
31          1 7 3   1.5 1.000
36          1 7 5   1.0 0.667

$`2`
   Measure_id i j value rank
2           2 2 1   2.0 1.00
7           2 2 3   1.5 0.75
12          2 2 4   2.0 1.00
17          2 4 3   0.5 1.00
22          2 5 1   1.5 0.60
27          2 5 2   2.5 1.00
32          2 7 3   1.0 0.50
37          2 7 5   2.0 1.00

$`3`
   Measure_id i j value rank
3           3 2 1     1    1
8           3 2 3     0    0
13          3 2 4     0    0
18          3 4 3     1    1
23          3 5 1     0  NaN
28          3 5 2     0  NaN
33          3 7 3     0  NaN
38          3 7 5     0  NaN

$`4`
   Measure_id i j value rank
4           4 2 1     0    0
9           4 2 3     0    0
14          4 2 4     1    1
19          4 4 3     0  NaN
24          4 5 1     0    0
29          4 5 2     1    1
34          4 7 3     0  NaN
39          4 7 5     0  NaN

$`5`
   Measure_id i j value rank
5           5 2 1     2  1.0
10          5 2 3     1  0.5
15          5 2 4     2  1.0
20          5 4 3     2  1.0
25          5 5 1     1  0.5
30          5 5 2     2  1.0
35          5 7 3     1  1.0
40          5 7 5     1  1.0

>
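
One possible next step with the split pieces, assuming the goal is the three highest-ranked rows per Measure_id, is to apply a small function to each piece and bind the results. The post does not say whether rows tied on rank should all be kept, so the keep_ties switch below is a hypothetical option; the helper top_rows and the data frame x are also made up for this sketch. Unlike a plain head() on sorted data, this version drops rows whose rank is NaN.

```r
# Made-up data standing in for the poster's data frame
x <- data.frame(
  Measure_id = c(1, 1, 1, 1, 1, 2, 2, 2),
  rank       = c(1, 1, 1, 1, 0.5, 0.9, NaN, 0.1)
)

# Hypothetical helper: pick the top n rows of one piece by rank.
# With keep_ties = TRUE, every row tied with the n-th best rank is kept.
top_rows <- function(piece, n = 3, keep_ties = FALSE) {
  piece <- piece[!is.na(piece$rank), ]            # drop NaN/NA ranks
  piece <- piece[order(piece$rank, decreasing = TRUE), ]
  if (nrow(piece) <= n) return(piece)             # nothing to trim
  if (keep_ties) {
    cutoff <- piece$rank[n]                       # n-th largest rank
    piece[piece$rank >= cutoff, ]
  } else {
    head(piece, n)
  }
}

pieces <- split(x, x$Measure_id)
strict <- do.call(rbind, lapply(pieces, top_rows))
ties   <- do.call(rbind, lapply(pieces, top_rows, keep_ties = TRUE))
```

Here strict holds exactly three rows for Measure_id 1, while ties keeps all four rows that share the top rank.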



Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

