Re: [R] Create rows for columns in dataframe

2013-08-14 Thread Dark
Hi A.K,

Thanks for your great help.
I'm now running your first suggestion on a 600.000 row sample after
verifying it works on a smaller sample.
It's now been running for 40 minutes. 
Which method do you think will be faster?

Regards Derk



--
View this message in context: 
http://r.789695.n4.nabble.com/Create-rows-for-columns-in-dataframe-tp4673607p4673704.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Create rows for columns in dataframe

2013-08-14 Thread arun
Hi,
I tried the second method on a bigger dataset.  This is what I get:

indx <- rep(1:nrow(dat1), 6e4)
dat2 <- dat1[indx, ]

system.time({
vec1 <- paste(dat2[,1], dat2[,2], colnames(dat2)[2], sep=".")
res2 <- reshape(dat2, idvar="newCol", varying=list(2:26), direction="long")
res3 <- res2[order(res2[,4]), ]
res4 <- res3[res3[,3] != "", -4]
vec2 <- paste(res4[,1], res4[,3], paste0("C", res4[,2]), sep=".")
res4$PRIMAIRY <- vec2 %in% vec1
row.names(res4) <- 1:nrow(res4)
res4$ID <- row.names(res4)
res4[,c(1,3)] <- lapply(res4[,c(1,3)], as.character)
res5 <- res4[,c(5,1,3,4)]
colnames(res5)[3] <- "CODE"})
#   user  system elapsed
#144.672   2.072 147.034  #reshape() step is taking most of the time
dim(res5)
#[1] 2880000       4

#Comparing this to the first method on a smaller subset of dat2.
dat2New <- dat2[1:3e4, ]

system.time({
res1 <- do.call(rbind, lapply(seq_len(nrow(dat2New)), function(i) {
  x1 <- as.character(unlist(dat2New[i,-1]))
  CODE <- x1[x1 != ""]
  PRIMAIRY <- x1[x1 != ""] == head(x1, 1)
  DSYSRTKY <- as.numeric(as.character(dat2New[i,1]))
  data.frame(DSYSRTKY, CODE, PRIMAIRY, stringsAsFactors=FALSE)
}))
res1$ID <- row.names(res1)
res2 <- res1[,c(4,1:3)]
})
#   user  system elapsed
#166.452  15.752 182.643
nrow(dat2) - nrow(dat2New)  #rows of dat2 not covered by this test
#[1] 330000

You might also try library(data.table).  It should be faster.

A.K.
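
For anyone following up on the data.table suggestion, here is a minimal sketch of how the reshape might look with melt() from data.table. It assumes the data.table package is installed and that the code columns are character (not factor) with "" for empty fields. The toy input, the key values "A1"/"A2", and the helper columns ROW, SRC and POS are made up for illustration; whether this is actually faster or lighter on memory for the real files would need to be measured.

```r
library(data.table)

# Hypothetical toy input: one key column plus three code columns ("" = empty).
wide <- data.table(
  DSYSRTKY = c("A1", "A2"),
  C1 = c("71535", "45341"),
  C2 = c("78900", ""),
  C3 = c("V1251", "4019")
)

wide[, ROW := .I]                        # remember the original row order
long <- melt(wide,
             id.vars = c("ROW", "DSYSRTKY"),
             variable.name = "SRC",
             value.name = "CODE",
             variable.factor = FALSE)    # keep "C1", "C2", ... as character
long <- long[CODE != ""]                 # drop empty code fields
long[, PRIMAIRY := SRC == "C1"]          # TRUE only for codes that came from C1
long[, POS := as.integer(sub("^C", "", SRC))]
setorder(long, ROW, POS)                 # keep each input row's codes together, C1 first
long[, ID := .I]                         # running ID after filtering and sorting
out <- long[, .(ID, DSYSRTKY, CODE, PRIMAIRY)]
print(out)
```

Unlike the value-matching used in the reshape() version, PRIMAIRY here is derived from the source column name, so a code that happens to repeat the C1 value in a later column is not flagged.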










Re: [R] Create rows for columns in dataframe

2013-08-14 Thread arun
Hi,
This seemed to be faster than the other two methods:
vec1 <- as.character(rep(dat1[,1], each=(ncol(dat1)-1)))
vec2 <- as.character(unlist(t(dat1[,-1])))
vec3 <- rep(rep(c(TRUE,FALSE), c(1,(ncol(dat1)-2))), nrow(dat1))
dat2 <- data.frame(DSYSRTKY=vec1, CODE=vec2, PRIMAIRY=vec3, stringsAsFactors=FALSE)
dat3 <- dat2[dat2[,2] != "", ]
row.names(dat3) <- 1:nrow(dat3)
dat3New <- within(dat3, {ID <- row.names(dat3)})[,c(4,1:3)]

#Out1 is the output dataset
Out1$PRIMAIRY <- as.logical(Out1$PRIMAIRY)

identical(Out1, dat3New)
#[1] TRUE

#Speed test

indx <- rep(1:nrow(dat1), 6e4)
dat2 <- dat1[indx, ]

system.time({
vec1 <- as.character(rep(dat2[,1], each=(ncol(dat2)-1)))
vec2 <- as.character(unlist(t(dat2[,-1])))
vec3 <- rep(rep(c(TRUE,FALSE), c(1,(ncol(dat2)-2))), nrow(dat2))
dat4 <- data.frame(DSYSRTKY=vec1, CODE=vec2, PRIMAIRY=vec3, stringsAsFactors=FALSE)
dat5 <- dat4[dat4[,2] != "", ]
row.names(dat5) <- 1:nrow(dat5)
dat5New <- within(dat5, {ID <- row.names(dat5)})[,c(4,1:3)]
})
#   user  system elapsed
# 12.620   0.684  13.333
dim(dat5New)
#[1] 2880000       4

A.K.
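
To see why the three vectors line up, it may help to walk through the same idea on a tiny input. This is only a sketch on a hypothetical two-row data frame (the `toy` data and its key values are invented for illustration, and the code columns are assumed to be character with "" for empty fields). Transposing the code columns and flattening the result column by column yields the codes row by row, which matches the key repeated `each` times and a TRUE/FALSE pattern that is TRUE only in the C1 position.

```r
# Hypothetical 2-row input with three code columns ("" marks an empty field).
toy <- data.frame(DSYSRTKY = c("A1", "A2"),
                  C1 = c("71535", "45341"),
                  C2 = c("78900", ""),
                  C3 = c("V1251", "4019"),
                  stringsAsFactors = FALSE)
n_codes <- ncol(toy) - 1

# Key repeated once per code column: A1 A1 A1 A2 A2 A2
key <- rep(toy$DSYSRTKY, each = n_codes)

# t() puts each input row into a column; flattening a matrix goes column by
# column, so the codes come out row by row: 71535 78900 V1251 45341 "" 4019
code <- as.vector(t(as.matrix(toy[, -1])))

# TRUE for the C1 position of every input row, FALSE elsewhere
prim <- rep(c(TRUE, rep(FALSE, n_codes - 1)), times = nrow(toy))

long <- data.frame(DSYSRTKY = key, CODE = code, PRIMAIRY = prim,
                   stringsAsFactors = FALSE)
long <- long[long$CODE != "", ]                 # drop empty code fields
long <- cbind(ID = seq_len(nrow(long)), long)   # running ID after filtering
row.names(long) <- NULL
print(long)
```

Because everything is built from whole vectors, no per-row data frames are created, which is presumably where most of the speed-up over the lapply()/rbind() version comes from.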






Re: [R] Create rows for columns in dataframe

2013-08-14 Thread Dark
Hi Arun,

The second method is indeed much faster. It ran quickly on my
600.000 row sample.
Still, I have 2 bigger files where processing becomes an issue, even though I
have lots of memory (32 GB), at the second statement:
res2 <- reshape(dat2, idvar="newCol", varying=list(2:26), direction="long")

Would data.table also take less memory? Speeding things up further would be
good too. How would I do it?
Splitting the dataframe, processing the pieces, and combining them afterwards
might also be an option. Any ideas on that?

Regards Dirk
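
On the idea of splitting the data frame and combining the pieces afterwards, a rough sketch in base R could look like the following. The function names `reshape_chunk` and `reshape_in_chunks`, the `chunk_size` default, and the toy data are all hypothetical. It assumes the first column is the key, every other column is a code column (character or factor) with "" for empty fields, and the input has at least one row. Whether chunking actually lowers peak memory enough for the big files has not been tested here.

```r
# Vectorised reshape of one chunk: key repeated per code column, codes
# flattened row by row, PRIMAIRY TRUE only in the first code position.
reshape_chunk <- function(chunk) {
  n_codes <- ncol(chunk) - 1
  out <- data.frame(
    DSYSRTKY = rep(as.character(chunk[[1]]), each = n_codes),
    CODE     = as.vector(t(as.matrix(chunk[, -1, drop = FALSE]))),
    PRIMAIRY = rep(c(TRUE, rep(FALSE, n_codes - 1)), times = nrow(chunk)),
    stringsAsFactors = FALSE
  )
  out[out$CODE != "", ]   # keep only filled code fields
}

# Process `chunk_size` input rows at a time, then bind the pieces together.
reshape_in_chunks <- function(dat, chunk_size = 1e5) {
  starts <- seq(1, nrow(dat), by = chunk_size)
  pieces <- lapply(starts, function(s) {
    rows <- s:min(s + chunk_size - 1, nrow(dat))
    reshape_chunk(dat[rows, , drop = FALSE])
  })
  res <- do.call(rbind, pieces)
  row.names(res) <- NULL
  cbind(ID = seq_len(nrow(res)), res)   # running ID over the combined result
}

# Hypothetical toy input, processed two rows at a time.
toy <- data.frame(DSYSRTKY = c("A1", "A2", "A3"),
                  C1 = c("71535", "45341", "486"),
                  C2 = c("78900", "", "5119"),
                  C3 = c("V1251", "4019", ""),
                  stringsAsFactors = FALSE)
res_chunked <- reshape_in_chunks(toy, chunk_size = 2)
print(res_chunked)
```

Each chunk is reshaped independently, so the pieces could also be written to disk one at a time instead of being combined in memory, if the final rbind() turns out to be the bottleneck.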







Re: [R] Create rows for columns in dataframe

2013-08-13 Thread arun
Hi,

Your desired output is not clear.  Maybe this helps:
#dat1 is the dataset
dat1$ID <- 1:nrow(dat1)
library(reshape2)
res1 <- melt(dat1, id.vars=c("ID","DSYSRTKY"))
res1$value <- res1$value != ""
res1[,2] <- as.integer(as.character(res1[,2]))
res1[,3] <- as.character(res1[,3])
colnames(res1)[3:4] <- c("CODE","PRIMARY")
head(res1)
#  ID  DSYSRTKY CODE PRIMARY
#1  1 10005   C1    TRUE
#2  2 10203   C1    TRUE
#3  3 10315   C1    TRUE
#4  4 10315   C1    TRUE
#5  5 10327   C1    TRUE
#6  6 10327   C1    TRUE


A.K.



- Original Message -
From: Dark i...@software-solutions.nl
To: r-help@r-project.org
Cc: 
Sent: Tuesday, August 13, 2013 5:46 AM
Subject: [R] Create rows for columns in dataframe

Hi experts,

I have a dataframe with 100k+ records. It has a key/id column and 25 code
columns. I would like to restructure it so that there is one row per code.

I have a structure like this (used dput):
structure(list(DSYSRTKY = structure(c(1L, 2L, 3L, 3L, 4L, 4L), .Names =
c("1",
"2", "3", "4", "5", "6"), .Label = c("10005", "10203",
"10315", "10327"), class = "factor"), C1 = structure(c(6L,
3L, 2L, 5L, 1L, 4L), .Names = c("1", "2", "3", "4", "5", "6"), .Label =
c("41401",
"42831", "45341", "486", "5990", "71535"), class = "factor"),
    C2 = structure(c(5L, 1L, 3L, 6L, 4L, 2L), .Names = c("1",
    "2", "3", "4", "5", "6"), .Label = c("4019", "51881", "5990",
    "6826", "78900", "V4986"), class = "factor"), C3 = structure(c(6L,
    3L, 5L, 2L, 4L, 1L), .Names = c("1", "2", "3", "4", "5",
    "6"), .Label = c("5119", "5939", "72400", "7850", "8052",
    "V1251"), class = "factor"), C4 = structure(c(6L, 5L, 3L,
    1L, 2L, 4L), .Names = c("1", "2", "3", "4", "5", "6"), .Label =
c("3109",
    "4019", "4241", "42789", "V1011", "V454"), class = "factor"),
    C5 = structure(c(1L, 1L, 3L, 1L, 2L, 4L), .Names = c("1",
    "2", "3", "4", "5", "6"), .Label = c("", "2720", "4019",
    "7823"), class = "factor"), C6 = structure(c(1L, 1L, 2L,
    1L, 4L, 3L), .Names = c("1", "2", "3", "4", "5", "6"), .Label = c("",
    "311", "41400", "49390"), class = "factor"), C7 = structure(c(1L,
    1L, 2L, 1L, 3L, 4L), .Names = c("1", "2", "3", "4", "5",
    "6"), .Label = c("", "2724", "2859", "V4581"), class = "factor"),
    C8 = structure(c(1L, 1L, 3L, 1L, 4L, 2L), .Names = c("1",
    "2", "3", "4", "5", "6"), .Label = c("", "40390", "71680",
    "79029"), class = "factor"), C9 = structure(c(1L, 1L, 2L,
    1L, 4L, 3L), .Names = c("1", "2", "3", "4", "5", "6"), .Label = c("",
    "4168", "5859", "V1582"), class = "factor"), C10 = structure(c(1L,
    1L, 3L, 1L, 1L, 2L), .Names = c("1", "2", "3", "4", "5",
    "6"), .Label = c("", "49390", "7804"), class = "factor"),
    C11 = structure(c(1L, 1L, 3L, 1L, 1L, 2L), .Names = c("1",
    "2", "3", "4", "5", "6"), .Label = c("", "2724", "V066"), class =
"factor"),
    C12 = structure(c(1L, 1L, 2L, 1L, 1L, 1L), .Names = c("1",
    "2", "3", "4", "5", "6"), .Label = c("", "6930"), class = "factor"),
    C13 = structure(c(1L, 1L, 2L, 1L, 1L, 1L), .Names = c("1",
    "2", "3", "4", "5", "6"), .Label = c("", "41400"), class = "factor"),
    C14 = structure(c(1L, 1L, 2L, 1L, 1L, 1L), .Names = c("1",
    "2", "3", "4", "5", "6"), .Label = c("", "V4581"), class = "factor"),
    C15 = structure(c(1L, 1L, 2L, 1L, 1L, 1L), .Names = c("1",
    "2", "3", "4", "5", "6"), .Label = c("", "40291"), class = "factor"),
    C16 = structure(c(1L, 1L, 2L, 1L, 1L, 1L), .Names = c("1",
    "2", "3", "4", "5", "6"), .Label = c("", "4280"), class = "factor"),
    C17 = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Names = c("1",
    "2", "3", "4", "5", "6"), .Label = "", class = "factor"),
    C18 = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Names = c("1",
    "2", "3", "4", "5", "6"), .Label = "", class = "factor"),
    C19 = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Names = c("1",
    "2", "3", "4", "5", "6"), .Label = "", class = "factor"),
    C20 = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Names = c("1",
    "2", "3", "4", "5", "6"), .Label = "", class = "factor"),
    C21 = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Names = c("1",
    "2", "3", "4", "5", "6"), .Label = "", class = "factor"),
    C22 = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Names = c("1",
    "2", "3", "4", "5", "6"), .Label = "", class = "factor"),
    C23 = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Names = c("1",
    "2", "3", "4", "5", "6"), .Label = "", class = "factor"),
    C24 = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Names = c("1",
    "2", "3", "4", "5", "6"), .Label = "", class = "factor"),
    C25 = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Names = c("1",
    "2", "3", "4", "5", "6"), .Label = "", class = "factor")), .Names =
c("DSYSRTKY",
"C1", "C2", "C3", "C4", "C5", "C6", "C7", "C8", "C9", "C10",
"C11", "C12", "C13", "C14", "C15", "C16", "C17", "C18", "C19",
"C20", "C21", "C22", "C23", "C24", "C25"), row.names = c("1",
"2", "3", "4", "5", "6"), class = "data.frame")

Now I want to restructure this dataframe so that, instead of 25 code fields,
it has a row for each code, but only if the code has a value!

The new structure should look something like:
NewDataFrame <- data.frame(ID=integer(), DSYSRTKY=integer(),
CODE=character(),  PRIMAIRY=logical())

The ID column should just be an increment. PRIMAIRY is a boolean which
should be TRUE if the code originally was the first code (C1).

It has to be efficient since my real data has many more rows than my example
structure of only 6 rows.
I tried a looping mechanism and it worked, but it performed very poorly.


Re: [R] Create rows for columns in dataframe

2013-08-13 Thread Dark
Hi,

My desired output for my sample!! using dput():
structure(list(ID = c("1", "2", "3", "4", "5", "6", "7", "8",
"9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19",
"20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30",
"31", "32", "33", "34", "35", "36", "37", "38", "39", "40", "41",
"42", "43", "44", "45", "46", "47", "48"), DSYSRTKY = c("10005",
"10005", "10005", "10005", "10203", "10203",
"10203", "10203", "10315", "10315", "10315",
"10315", "10315", "10315", "10315", "10315",
"10315", "10315", "10315", "10315", "10315",
"10315", "10315", "10315", "10315", "10315",
"10315", "10315", "10327", "10327", "10327",
"10327", "10327", "10327", "10327", "10327",
"10327", "10327", "10327", "10327", "10327",
"10327", "10327", "10327", "10327", "10327",
"10327", "10327"), CODE = c("71535", "78900", "V1251",
"V454", "45341", "4019", "72400", "V1011", "42831", "5990", "8052",
"4241", "4019", "311", "2724", "71680", "4168", "7804", "V066",
"6930", "41400", "V4581", "40291", "4280", "5990", "V4986", "5939",
"3109", "41401", "6826", "7850", "4019", "2720", "49390", "2859",
"79029", "V1582", "486", "51881", "5119", "42789", "7823", "41400",
"V4581", "40390", "5859", "49390", "2724"), PRIMAIRY = c("TRUE",
"FALSE", "FALSE", "FALSE", "TRUE", "FALSE", "FALSE", "FALSE",
"TRUE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE",
"FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE",
"FALSE", "FALSE", "TRUE", "FALSE", "FALSE", "FALSE", "TRUE",
"FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE",
"FALSE", "TRUE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE",
"FALSE", "FALSE", "FALSE", "FALSE", "FALSE")), .Names = c("ID",
"DSYSRTKY", "CODE", "PRIMAIRY"), row.names = c(NA, 48L), class =
"data.frame")

So the 'DSYSRTKY' (10005) has 4 code fields filled so you get 4 rows.
The next one also 4, the third one 16. Anyway, just take a look at the
sample.

I think this will help trying to make clear what my desired result is!

Regards Derk
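
A quick way to check how many output rows to expect is to count the non-empty code fields per input row. The following is a minimal sketch on a hypothetical two-row data frame (the `toy` data and its key values are made up for illustration, and empty fields are assumed to be empty strings, as in the sample).

```r
# Hypothetical toy input: key column plus three code columns ("" = empty).
toy <- data.frame(DSYSRTKY = c("A1", "A2"),
                  C1 = c("71535", "45341"),
                  C2 = c("78900", ""),
                  C3 = c("V1251", "4019"),
                  stringsAsFactors = FALSE)

# Comparing the code columns with "" gives a logical matrix;
# rowSums() then counts the filled fields in each input row.
filled <- rowSums(toy[, -1] != "")
print(filled)        # output rows contributed by each input row
print(sum(filled))   # total rows in the restructured data frame
```

The same two lines applied to the sample input give the per-row counts described above, and their sum is the number of rows any correct solution should return.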







Re: [R] Create rows for columns in dataframe

2013-08-13 Thread arun
According to your first post,


NewDataFrame <- data.frame(ID=integer(), DSYSRTKY=integer(), CODE=character(),
PRIMAIRY=logical())


The new output dataset: Out1
str(Out1)
#'data.frame':    48 obs. of  4 variables:
# $ ID      : chr  "1" "2" "3" "4" ...
# $ DSYSRTKY: chr  "10005" "10005" "10005" "10005" ...
# $ CODE    : chr  "71535" "78900" "V1251" "V454" ...
# $ PRIMAIRY: chr  "TRUE" "FALSE" "FALSE" "FALSE" ...


I guess you wanted DSYSRTKY to be numeric and PRIMAIRY to be logical:
res1 <- do.call(rbind, lapply(seq_len(nrow(dat1)), function(i) {
  x1 <- as.character(unlist(dat1[i,-1]))
  CODE <- x1[x1 != ""]
  PRIMAIRY <- x1[x1 != ""] == head(x1, 1)
  DSYSRTKY <- as.numeric(as.character(dat1[i,1]))
  data.frame(DSYSRTKY, CODE, PRIMAIRY, stringsAsFactors=FALSE)
}))
res1$ID <- row.names(res1)
res2 <- res1[,c(4,1:3)]

str(res2)
#'data.frame':    48 obs. of  4 variables:
# $ ID      : chr  "1" "2" "3" "4" ...
# $ DSYSRTKY: num  1e+08 1e+08 1e+08 1e+08 1e+08 ...
# $ CODE    : chr  "71535" "78900" "V1251" "V454" ...
# $ PRIMAIRY: logi  TRUE FALSE FALSE FALSE TRUE FALSE ...
head(res2)
#  ID DSYSRTKY  CODE PRIMAIRY
#1  1    10005 71535     TRUE
#2  2    10005 78900    FALSE
#3  3    10005 V1251    FALSE
#4  4    10005  V454    FALSE
#5  5    10203 45341     TRUE
#6  6    10203  4019    FALSE
head(Out1)
#  ID DSYSRTKY  CODE PRIMAIRY
#1  1    10005 71535     TRUE
#2  2    10005 78900    FALSE
#3  3    10005 V1251    FALSE
#4  4    10005  V454    FALSE
#5  5    10203 45341     TRUE
#6  6    10203  4019    FALSE
A.K.








Re: [R] Create rows for columns in dataframe

2013-08-13 Thread arun
You could also try:
##Out1 is the output dataset
Out1$PRIMAIRY <- as.logical(Out1$PRIMAIRY) #changing the class
#dat1 input dataset

vec1 <- paste(dat1[,1], dat1[,2], colnames(dat1)[2], sep=".")
res2 <- reshape(dat1, idvar="newCol", varying=list(2:26), direction="long")
res3 <- res2[order(res2[,4]), ]
res4 <- res3[res3[,3] != "", -4]
vec2 <- paste(res4[,1], res4[,3], paste0("C", res4[,2]), sep=".")
res4$PRIMAIRY <- vec2 %in% vec1
row.names(res4) <- 1:nrow(res4)
res4$ID <- row.names(res4)
res4[,c(1,3)] <- lapply(res4[,c(1,3)], as.character)
res5 <- res4[,c(5,1,3,4)]
colnames(res5)[3] <- "CODE"
identical(res5, Out1)
#[1] TRUE
A.K.


