[R] Reshape large Data Frame to new format

2014-03-24 Thread Dark
Hi R-experts,

I have a data.frame that I want to reshape to a certain format so I can use
it in a tool for further analysis.
Basically, I have a very long list of person IDs and their codes.

I want to create a row for every person with 25 of their codes. If a person
has more than 25 codes, I want to add another row for that person. If a row
contains fewer than 25 codes, I want to fill the remaining cells with empty
string values.

I have manually created sample data (rawData) and the desired result
(resultData) and used dput() so you can see my starting data frame and the
result I want.

The sample is very small; the real data will contain a few million(!)
records.

rawData <- structure(list(PersonID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L), .Label = c("Person1", "Person2", "Person3", 
"Person4", "Person5"), class = "factor"), codes = c(34396L, 81878L, 
67829L, 13428L, 12992L, 63724L, 85930L, 78497L, 59578L, 50733L, 
26154L, 47205L, 74578L, 12204L, 42435L, 96643L, 35242L, 29836L, 
73031L, 11326L, 96686L, 55849L, 56415L, 11064L, 78509L, 55715L, 
75851L, 60682L, 16277L, 52763L, 23429L, 39723L, 95809L, 60081L, 
19618L, 46012L, 79188L, 54664L, 64420L, 72875L, 97428L, 74897L, 
75615L, 12023L, 21572L, 56177L, 61704L, 70879L, 69033L, 87224L, 
68670L, 65602L, 25476L, 81209L, 62086L, 35492L, 39771L, 14380L, 
43858L, 53679L, 78023L, 43785L, 69884L, 12840L, 54021L, 68002L, 
79249L, 61784L, 7L, 28935L, 91406L, 42045L, 97716L, 65690L, 
57310L, 57627L, 32227L, 43121L, 22251L, 31255L, 90660L, 89118L, 
14558L, 99824L, 25005L, 62186L, 10527L, 99438L, 85656L, 79465L, 
35357L, 41697L, 83084L, 83590L, 16234L, 32480L, 50991L, 79524L, 
93888L, 32637L, 13253L, 76576L, 48632L, 68014L, 24281L, 74320L, 
44601L, 36251L, 27825L, 85569L, 21634L, 50364L, 74436L, 73216L, 
89342L, 63562L, 88485L, 40552L, 49359L, 29636L, 26285L, 13263L, 
18106L, 78589L, 43479L, 12491L, 50840L, 77453L, 80578L, 43693L, 
89857L, 12837L, 55950L, 63049L, 84508L, 29736L, 88194L, 86849L, 
54274L, 38713L)), .Names = c("PersonID", "codes"), row.names = c(NA, 
-140L), class = "data.frame")


resultData = structure(list(PersonId = c("Person1", "Person1", "Person2", 
"Person3", "Person4", "Person5", "Person5", "Person5"), Code1 = c("34396", 
"55715", "97428", "56177", "68002", "90660", "74320", "89857"), Code2 = c("81878", 
"75851", "74897", "61704", "79249", "89118", "44601", "12837"), Code3 = c("67829", 
"60682", "75615", "70879", "61784", "14558", "36251", "55950"), Code4 = c("13428", 
"16277", "12023", "69033", "7", "99824", "27825", "63049"), Code5 = c("12992", 
"52763", "21572", "87224", "28935", "25005", "85569", "84508"), Code6 = c("63724", 
"23429", "", "68670", "91406", "62186", "21634", "29736"), Code7 = c("85930", 
"39723", "", "65602", "42045", "10527", "50364", "88194"), Code8 = c("78497", 
"95809", "", "25476", "97716", "99438", "74436", "86849"), Code9 = c("59578", 
"60081", "", "81209", "65690", "85656", "73216", "54274"), Code10 = c("50733", 
"19618", "", "62086", "57310", "79465", "89342", "38713"), Code11 = c("26154", 
"46012", "", "35492", "57627", "35357", "63562", ""), Code12 = c("47205", 
"79188", "", "39771", "32227", "41697", "88485", ""), Code13 = c("74578", 
"54664", "", "14380", "43121", "83084", "40552", ""), Code14 = c("12204", 
"64420", "", "43858", "22251", "83590", "49359", ""), Code15 = c("42435", 
"72875", "", "53679", "31255", "16234", "29636", ""), Code16 = c("96643", 
"", "", "", "78023", "32480", "26285", ""), Code17 = c("35242", 
"", "", "", "43785", "50991", "13263", ""), Code18 = c("29836", 
"", "", "", "69884", "79524", "18106", ""), Code19 = c("73031", 
"", "", "", "12840", "93888", "78589", ""), Code20 = c("11326", 
"", "", "", "54021", "32637", "43479", ""), Code21 = c("96686", 
"", "", "", "", "13253", "12491", ""), Code22 = c("55849", 
"", "", "", "", "76576", "50840", ""), Code23 = c("56415", 
"", "", "", "", "48632", "77453", ""), Code24 = c("11064", 
"", "", "", "", "68014", "80578", ""), Code25 = c("78509", 
"", "", "", "", "24281", "43693", "")), .Names = c("PersonId", "Code1", "Code2", 
"Code3", "Code4", "Code5", "Code6", "Code7", "Code8", "Code9", 
"Code10", "Code11", "Code12", "Code13", "Code14", "Code15", "Code16", 
"Code17", "Code18", "Code19", "Code20", "Code21", "Code22", "Code23", 
"Code24", "Code25"), row.names = c(NA, -8L), class = "data.frame")

This sample data explains very well what I'm trying to achieve. As you can
see, there are 2 rows for Person1 and 3 rows for Person5 because they have
40 and 60 codes, respectively.

I'm a big fan of the data.table package, so maybe someone has a solution
using that package?
But of course any solution is welcome :-)
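A minimal data.table sketch of the reshape described above, assuming the codes can be stored as character (so empty cells can hold "") and that the input order of codes within each person should be kept. The toy table rawDT and the names width, pos, block, col and codeCols are made up for illustration; on the real data you would start from something like as.data.table(rawData) with codes converted via as.character(). The idea: number each code within its person, split that number into an output row (block) and a column (Code1 to Code25), then cast long to wide with dcast().

```r
library(data.table)

# Hypothetical toy input with the same number of codes per person as the
# posted rawData (40, 5, 20, 15, 60); codes are stored as character
n <- c(Person1 = 40L, Person2 = 5L, Person3 = 20L, Person4 = 15L, Person5 = 60L)
rawDT <- data.table(PersonID = rep(names(n), n),
                    codes    = as.character(10000L + seq_len(sum(n))))

width    <- 25L                              # codes per output row
codeCols <- paste0("Code", seq_len(width))   # "Code1" ... "Code25"

# Position of each code within its person, keeping the input order
rawDT[, pos := seq_len(.N), by = PersonID]
# Output row within the person (block) and target column (col)
rawDT[, `:=`(block = (pos - 1L) %/% width + 1L,
             col   = factor(codeCols[(pos - 1L) %% width + 1L], levels = codeCols))]

# Long -> wide: one row per (PersonID, block); unused cells become ""
wide <- dcast(rawDT, PersonID + block ~ col, value.var = "codes", fill = "")

# Guarantee all 25 columns exist and appear in Code1..Code25 order
for (cc in setdiff(codeCols, names(wide))) wide[, (cc) := ""]
setcolorder(wide, c("PersonID", "block", codeCols))

wide[, .N, by = PersonID]   # rows per person: 2, 1, 1, 1, 3
```

The extra block column numbers the rows within a person; wide[, block := NULL] drops it if the tool doesn't want it. One thing to check: if I read the posted dput correctly, rawData has 20 codes for Person3 and 15 for Person4, while the hand-made resultData shows 15 and 20 (Person3's last five codes, 78023, 43785, 69884, 12840 and 54021, sit at the end of Person4's row), so output from this approach will differ from resultData in those two rows. It has not been benchmarked on millions of rows.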

Thanks for any help in advance,

Regards Dark




--
View this message in context: 
http://r.789695.n4.nabble.com/Reshape-large-Data-Frame-to-new-format-tp4687431.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman

Re: [R] Create rows for columns in dataframe

2013-08-14 Thread Dark
Hi A.K,

Thanks for your great help.
I'm now running your first suggestion on a 600,000-row sample, after
verifying that it works on a smaller sample.
It has now been running for 40 minutes.
Which method do you think will be faster?

Regards Derk



--
View this message in context: 
http://r.789695.n4.nabble.com/Create-rows-for-columns-in-dataframe-tp4673607p4673704.html


Re: [R] Create rows for columns in dataframe

2013-08-14 Thread Dark
Hi Arun,

The second method is indeed much faster. It ran quickly on my
600,000-row file.
However, I have 2 bigger files where the second statement becomes a problem,
even though I have plenty of memory (32 GB):
res2 <- reshape(dat2, idvar = "newCol", varying = list(2:26), direction = "long")

Would data.table also use less memory? A speed-up would be welcome too.
How would I do it?
I think splitting the data frame, processing the pieces separately and
then combining them might also be an option. Any ideas on that?
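A data.table version of that wide-to-long step might look like the sketch below. It uses melt() in place of stats::reshape(), drops empty code fields, and adds the PRIMAIRY flag and running ID from the original question. It assumes the code columns are character (as fread() would read them) with "" for an empty field; the toy table dat2 and the helper column srcRow are hypothetical, and srcRow is added to dat2 by reference so that rows with a repeated id keep their codes together. melt() is generally much faster and lighter on memory than reshape(), but this has not been measured on files of this size; melting chunks and combining them with rbindlist() would be the fallback.

```r
library(data.table)

# Hypothetical wide table: id column plus code columns, "" = empty field
dat2 <- data.table(DSYSRTKY = c("10005", "10203"),
                   C1 = c("71535", "45341"),
                   C2 = c("78900", ""),
                   C3 = c("",      "72400"))
codeCols <- grep("^C[0-9]+$", names(dat2), value = TRUE)

dat2[, srcRow := .I]   # remember the original row (ids may repeat)

# Wide -> long: one row per (source row, code column)
long <- melt(dat2, id.vars = c("srcRow", "DSYSRTKY"), measure.vars = codeCols,
             variable.name = "POS", value.name = "CODE")
long <- long[!is.na(CODE) & CODE != ""]   # drop empty code fields

# POS is a factor with levels C1, C2, ...; sort by source row, then position
setorder(long, srcRow, POS)
long[, PRIMAIRY := POS == codeCols[1L]]   # TRUE if the code came from C1
long[, ID := seq_len(.N)]
long[, c("srcRow", "POS") := NULL]
setcolorder(long, c("ID", "DSYSRTKY", "CODE", "PRIMAIRY"))
long
```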

Regards Dirk





--
View this message in context: 
http://r.789695.n4.nabble.com/Create-rows-for-columns-in-dataframe-tp4673607p4673750.html


[R] Create rows for columns in dataframe

2013-08-13 Thread Dark
Hi experts,

I have a data frame with 100k+ records. It has a key/id column and 25 code
columns. I would like to restructure it so that there is a row for each code
column.

I have a structure like this (used dput):
structure(list(DSYSRTKY = structure(c(1L, 2L, 3L, 3L, 4L, 4L), .Names = c("1", 
"2", "3", "4", "5", "6"), .Label = c("10005", "10203", 
"10315", "10327"), class = "factor"), C1 = structure(c(6L, 
3L, 2L, 5L, 1L, 4L), .Names = c("1", "2", "3", "4", "5", "6"), .Label = c("41401", 
"42831", "45341", "486", "5990", "71535"), class = "factor"), 
C2 = structure(c(5L, 1L, 3L, 6L, 4L, 2L), .Names = c("1", 
"2", "3", "4", "5", "6"), .Label = c("4019", "51881", "5990", 
"6826", "78900", "V4986"), class = "factor"), C3 = structure(c(6L, 
3L, 5L, 2L, 4L, 1L), .Names = c("1", "2", "3", "4", "5", 
"6"), .Label = c("5119", "5939", "72400", "7850", "8052", 
"V1251"), class = "factor"), C4 = structure(c(6L, 5L, 3L, 
1L, 2L, 4L), .Names = c("1", "2", "3", "4", "5", "6"), .Label = c("3109", 
"4019", "4241", "42789", "V1011", "V454"), class = "factor"), 
C5 = structure(c(1L, 1L, 3L, 1L, 2L, 4L), .Names = c("1", 
"2", "3", "4", "5", "6"), .Label = c("", "2720", "4019", 
"7823"), class = "factor"), C6 = structure(c(1L, 1L, 2L, 
1L, 4L, 3L), .Names = c("1", "2", "3", "4", "5", "6"), .Label = c("", 
"311", "41400", "49390"), class = "factor"), C7 = structure(c(1L, 
1L, 2L, 1L, 3L, 4L), .Names = c("1", "2", "3", "4", "5", 
"6"), .Label = c("", "2724", "2859", "V4581"), class = "factor"), 
C8 = structure(c(1L, 1L, 3L, 1L, 4L, 2L), .Names = c("1", 
"2", "3", "4", "5", "6"), .Label = c("", "40390", "71680", 
"79029"), class = "factor"), C9 = structure(c(1L, 1L, 2L, 
1L, 4L, 3L), .Names = c("1", "2", "3", "4", "5", "6"), .Label = c("", 
"4168", "5859", "V1582"), class = "factor"), C10 = structure(c(1L, 
1L, 3L, 1L, 1L, 2L), .Names = c("1", "2", "3", "4", "5", 
"6"), .Label = c("", "49390", "7804"), class = "factor"), 
C11 = structure(c(1L, 1L, 3L, 1L, 1L, 2L), .Names = c("1", 
"2", "3", "4", "5", "6"), .Label = c("", "2724", "V066"), class = "factor"), 
C12 = structure(c(1L, 1L, 2L, 1L, 1L, 1L), .Names = c("1", 
"2", "3", "4", "5", "6"), .Label = c("", "6930"), class = "factor"), 
C13 = structure(c(1L, 1L, 2L, 1L, 1L, 1L), .Names = c("1", 
"2", "3", "4", "5", "6"), .Label = c("", "41400"), class = "factor"), 
C14 = structure(c(1L, 1L, 2L, 1L, 1L, 1L), .Names = c("1", 
"2", "3", "4", "5", "6"), .Label = c("", "V4581"), class = "factor"), 
C15 = structure(c(1L, 1L, 2L, 1L, 1L, 1L), .Names = c("1", 
"2", "3", "4", "5", "6"), .Label = c("", "40291"), class = "factor"), 
C16 = structure(c(1L, 1L, 2L, 1L, 1L, 1L), .Names = c("1", 
"2", "3", "4", "5", "6"), .Label = c("", "4280"), class = "factor"), 
C17 = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Names = c("1", 
"2", "3", "4", "5", "6"), .Label = "", class = "factor"), 
C18 = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Names = c("1", 
"2", "3", "4", "5", "6"), .Label = "", class = "factor"), 
C19 = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Names = c("1", 
"2", "3", "4", "5", "6"), .Label = "", class = "factor"), 
C20 = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Names = c("1", 
"2", "3", "4", "5", "6"), .Label = "", class = "factor"), 
C21 = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Names = c("1", 
"2", "3", "4", "5", "6"), .Label = "", class = "factor"), 
C22 = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Names = c("1", 
"2", "3", "4", "5", "6"), .Label = "", class = "factor"), 
C23 = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Names = c("1", 
"2", "3", "4", "5", "6"), .Label = "", class = "factor"), 
C24 = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Names = c("1", 
"2", "3", "4", "5", "6"), .Label = "", class = "factor"), 
C25 = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Names = c("1", 
"2", "3", "4", "5", "6"), .Label = "", class = "factor")), .Names =
c("DSYSRTKY", 
"C1", "C2", "C3", "C4", "C5", "C6", "C7", "C8", "C9", "C10", 
"C11", "C12", "C13", "C14", "C15", "C16", "C17", "C18", "C19", 
"C20", "C21", "C22", "C23", "C24", "C25"), row.names = c("1", 
"2", "3", "4", "5", "6"), class = "data.frame")

Now I want to restructure this data frame so that, instead of 25 code
fields, there is one row per code, but only if the code has a value!

The new structure should look something like:
NewDataFrame <- data.frame(ID = integer(), DSYSRTKY = integer(),
CODE = character(), PRIMAIRY = logical())

The ID column should just be an increment. PRIMAIRY is a boolean which
should be TRUE if the code was originally the first code (C1).

It has to be efficient, since my real data has many more rows than my
example structure of only 6 rows.
I tried a looping mechanism and it worked, but its performance was very
poor.
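A loop-free base R sketch of that restructuring: stack all code columns into one long vector, drop the empty entries, and re-sort so that each record's codes stay together with C1 first. The small data frame dat is a made-up stand-in for the real data, and the code columns are assumed to be named C1, C2, ...; DSYSRTKY is kept as character here (wrap it in as.integer() to match the integer() in the template above).

```r
# Made-up stand-in for the real data: an id column plus code columns C1..C3
# stored as factors, with "" meaning "no code in this field"
dat <- data.frame(DSYSRTKY = c("10005", "10203", "10315"),
                  C1 = c("71535", "45341", "42831"),
                  C2 = c("78900", "4019",  ""),
                  C3 = c("",      "72400", ""),
                  stringsAsFactors = TRUE)

codeCols <- grep("^C[0-9]+$", names(dat), value = TRUE)
nr <- nrow(dat)

# Stack the code columns one after another (column-major order)
codes  <- unlist(lapply(dat[codeCols], as.character), use.names = FALSE)
rowIdx <- rep(seq_len(nr), times = length(codeCols))   # source row of each entry
colIdx <- rep(seq_along(codeCols), each = nr)          # source column (1 = C1)

# Keep filled fields only, then order by source row and code position
keep <- !is.na(codes) & codes != ""
ord  <- order(rowIdx[keep], colIdx[keep])
rowK <- rowIdx[keep][ord]
colK <- colIdx[keep][ord]

res <- data.frame(ID       = seq_along(rowK),
                  DSYSRTKY = as.character(dat$DSYSRTKY)[rowK],
                  CODE     = codes[keep][ord],
                  PRIMAIRY = colK == 1L,
                  stringsAsFactors = FALSE)
res
```

Everything here is vectorised, so the cost grows roughly linearly with rows times code columns; the main memory cost is the stacked vector of length nrow(dat) * 25.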

Hopefully I provided enough information using dput.

Regards Derk




--
View this message in context: 
http://r.789695.n4.nabble.com/Create-rows-for-columns-in-dataframe-tp4673607.html


Re: [R] Create rows for columns in dataframe

2013-08-13 Thread Dark
Hi,

My desired output for my sample, using dput():
structure(list(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 
9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 
20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 
31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 
42, 43, 44, 45, 46, 47, 48), DSYSRTKY = c(10005, 
10005, 10005, 10005, 10203, 10203, 
10203, 10203, 10315, 10315, 10315, 
10315, 10315, 10315, 10315, 10315, 
10315, 10315, 10315, 10315, 10315, 
10315, 10315, 10315, 10315, 10315, 
10315, 10315, 10327, 10327, 10327, 
10327, 10327, 10327, 10327, 10327, 
10327, 10327, 10327, 10327, 10327, 
10327, 10327, 10327, 10327, 10327, 
10327, 10327), CODE = c("71535", "78900", "V1251", 
"V454", "45341", "4019", "72400", "V1011", "42831", "5990", "8052", 
"4241", "4019", "311", "2724", "71680", "4168", "7804", "V066", 
"6930", "41400", "V4581", "40291", "4280", "5990", "V4986", "5939", 
"3109", "41401", "6826", "7850", "4019", "2720", "49390", "2859", 
"79029", "V1582", "486", "51881", "5119", "42789", "7823", "41400", 
"V4581", "40390", "5859", "49390", "2724"), PRIMAIRY = c(TRUE, 
FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, 
TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, 
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, 
FALSE, FALSE, FALSE, FALSE, FALSE)), .Names = c("ID", 
"DSYSRTKY", "CODE", "PRIMAIRY"), row.names = c(NA, 48L), class =
"data.frame")

So DSYSRTKY 10005 has 4 code fields filled, so you get 4 rows.
The next one also has 4, the third one 16. Anyway, just take a look at the
sample.

I hope this makes my desired result clear!

Regards Derk





--
View this message in context: 
http://r.789695.n4.nabble.com/Create-rows-for-columns-in-dataframe-tp4673607p4673646.html


[R] Add column to dataframe based on code in other column

2013-08-08 Thread Dark
Hi all,

I have a data frame of users that contains US state codes.
Now I want to add a column named REGION based on the state code. I have
already done a mapping:

NorthEast <- c("07", "20", "22", "30", "31", "33", "39", "41", "47")
MidWest <- c("14", "15", "16", "17", "23", "24", "26", "28", "35", "36", "43", "52")
South <- c("01", "04", "08", "09", "10", "11", "18", "19", "21", "25", "34", "37",
"42", "44", "45", "49", "51")
West <- c("02", "03", "05", "06", "12", "13", "27", "29", "32", "38", "46", "50", "53")
Other <- c("40", "48", "54", "55", "56", "57", "58", "59", "60", "61", "62", "63",
"64", "65", "66", "94", "98", "99")

So for example:
Name   State_Code
Tom    20
Harry  56
Ben    05
Sally  04

should become:
Name   State_Code  REGION
Tom    20          NorthEast
Harry  56          Other
Ben    05          West
Sally  04          South

Could anyone help me with a clever statement?
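One way to do this without a chain of ifelse() calls is a named lookup vector whose names are the state codes and whose values are the region names; indexing it by State_Code returns the region for every row at once. This sketch assumes State_Code is stored as two-character strings such as "05" (if it was read as a number, sprintf("%02d", x) restores the leading zero, and a factor should go through as.character() first, because indexing by a factor uses its integer codes). The users data frame and the names regions and lookup are illustrative.

```r
# The mapping from the post, as character codes so leading zeros are kept
NorthEast <- c("07", "20", "22", "30", "31", "33", "39", "41", "47")
MidWest   <- c("14", "15", "16", "17", "23", "24", "26", "28", "35", "36", "43", "52")
South     <- c("01", "04", "08", "09", "10", "11", "18", "19", "21", "25", "34",
               "37", "42", "44", "45", "49", "51")
West      <- c("02", "03", "05", "06", "12", "13", "27", "29", "32", "38", "46",
               "50", "53")
Other     <- c("40", "48", "54", "55", "56", "57", "58", "59", "60", "61", "62",
               "63", "64", "65", "66", "94", "98", "99")

# Named lookup vector: names are state codes, values are region names
regions <- list(NorthEast = NorthEast, MidWest = MidWest, South = South,
                West = West, Other = Other)
lookup <- setNames(rep(names(regions), lengths(regions)),
                   unlist(regions, use.names = FALSE))

users <- data.frame(Name = c("Tom", "Harry", "Ben", "Sally"),
                    State_Code = c("20", "56", "05", "04"),
                    stringsAsFactors = FALSE)

# Index the lookup by code; codes not in any region give NA
users$REGION <- unname(lookup[users$State_Code])
users
```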



--
View this message in context: 
http://r.789695.n4.nabble.com/Add-column-to-dataframe-based-on-code-in-other-column-tp4673335.html


[R] Retrieving correct data from combining two datasets

2013-08-05 Thread Dark
Hi all,

I have two datasets:

Dataset 1 - List of Users:
ID Name C1 C2 C3 ... C23 C24 C25

Dataset 2 - List of Codes:
Code Description Category

The code fields in the user dataset do not have to contain a value, and if
they do have a value, it doesn't have to correspond to an entry in the Codes
dataset.

Now I need 2 things:
- Per user record, the number of codes that occur in the Codes table; e.g. if
a user had C1, C8, C12 and C19 occurring in the Codes dataset, that would be 4.
- The top 20 most frequently occurring codes, with their descriptions.

I find this very challenging, but I'm sure there are some R gurus out there
who can help me with this :) Thanks in advance



--
View this message in context: 
http://r.789695.n4.nabble.com/Retreiving-correct-data-from-combining-two-datasets-tp4673098.html


[R] Retrieving correct data from combining two datasets

2013-08-05 Thread Dark
Hi all, 

I have two datasets: 

Dataset 1 - List of Users, the layout looks like this: 
ID Name C1 C2 C3 ... C23 C24 C25

Dataset 2 - List of Codes, the layout looks like this: 
Code Description Category 

The code fields in the user dataset do not have to contain a value, and if
they do have a value, it doesn't have to correspond to an entry in the Codes
dataset.

Now I need 2 things: 
- Per user record, the number of codes that occur in the Codes table; e.g. if
a user had C1, C8, C12 and C19 occurring in the Codes dataset, that would be
4.
- The top 20 most frequently occurring codes, with their descriptions.
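Since no sample data could be shared, here is a sketch on made-up toy data (users, codes, nKnown and top are hypothetical names), assuming the code columns are character and named C1, C2, .... The idea: %in% against the Codes table gives a match flag for every code field, rowSums() counts the matches per user, and table() on the matched values gives the frequencies for the top 20.

```r
# Made-up toy data (the real datasets are confidential)
users <- data.frame(ID = 1:3, Name = c("A", "B", "C"),
                    C1 = c("X1", "X2", ""),
                    C2 = c("X3", "Y9", "X1"),
                    C3 = c("",   "X1", "Z7"),
                    stringsAsFactors = FALSE)
codes <- data.frame(Code        = c("X1", "X2", "X3"),
                    Description = c("desc 1", "desc 2", "desc 3"),
                    Category    = c("a", "a", "b"),
                    stringsAsFactors = FALSE)

codeCols <- grep("^C[0-9]+$", names(users), value = TRUE)
m <- as.matrix(users[codeCols])            # character matrix, one row per user

# 1) Per user: how many of the code fields occur in the Codes table
hit <- matrix(m %in% codes$Code, nrow = nrow(m))
users$nKnown <- rowSums(hit)

# 2) Top 20 most frequent known codes, with their description
freq <- table(m[hit])
top  <- data.frame(Code = names(freq), N = as.vector(freq), stringsAsFactors = FALSE)
top  <- head(top[order(-top$N, top$Code), ], 20)
top$Description <- codes$Description[match(top$Code, codes$Code)]
rownames(top) <- NULL
top
```

Empty fields ("") are never counted unless "" happens to be a value in the Codes table; ties at the 20th place are broken alphabetically by code.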

I find this very challenging, but I'm sure there are some R gurus out there
who can help me with this :)
I cannot give any sample data because it's classified.

Thanks in advance 



--
View this message in context: 
http://r.789695.n4.nabble.com/Retreiving-correct-data-from-combining-two-datasets-tp4673131.html


[R] Add a column to a data frame with value based on the percentile of the row

2013-07-31 Thread Dark
Hi all,

I think this should be an easy question for the gurus out here.

I have a large data frame (2,500,000 rows, 15 columns) and I want to add
a column named SEGMENT to it.
The first 5% of the rows (the first 125,000 rows) should have the value
"Top 5%" in the SEGMENT column.
The rows from 5% to 20% should have the value "5 to 20".
The rows from 20% to 50% should have the value "20 to 50".
And the last 50% of the rows should have the value "Bottom 50".

What is the easiest way of doing this? I was thinking of using quantile(),
but then I would need a row-number column.
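Neither quantile() nor a stored row-number column is strictly needed: the relative position of each row can be computed on the fly with seq_len(nrow(...)) and binned with cut(). A sketch, assuming the rows are already in the desired order; the data frame dat is simulated. With right-closed intervals, a 2,500,000-row frame would get exactly its first 125,000 rows labelled "Top 5%".

```r
# Made-up data frame; its rows are assumed to already be in the desired order
set.seed(1)
dat <- data.frame(value = rnorm(2000))

n   <- nrow(dat)
pos <- seq_len(n) / n   # relative position of each row, in (0, 1]

# Right-closed intervals: (0, 0.05] -> "Top 5%", (0.05, 0.20] -> "5 to 20", ...
dat$SEGMENT <- cut(pos,
                   breaks = c(0, 0.05, 0.20, 0.50, 1),
                   labels = c("Top 5%", "5 to 20", "20 to 50", "Bottom 50"))
table(dat$SEGMENT)
```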

Regards Derk



--
View this message in context: 
http://r.789695.n4.nabble.com/Add-a-column-to-a-data-frame-with-value-based-on-the-percentile-of-the-row-tp4672711.html


Re: [R] Add a column to a data frame with value based on the percentile of the row

2013-07-31 Thread Dark
Works like a charm, thanks a lot!



--
View this message in context: 
http://r.789695.n4.nabble.com/Add-a-column-to-a-data-frame-with-value-based-on-the-percentile-of-the-row-tp4672711p4672728.html


Re: [R] Add a column to a data frame with value based on the percentile of the row

2013-07-31 Thread Dark
Hi Arun Kirshna,

I have tested your method and it works for me.
I only run into one problem: before doing this operation I have sorted my
data frame, so my row numbers are not consecutive.

You can see this if you first order your example data frame like this:
dat1 <- dat1[order(-dat1$value), ]

head(dat1)
     ID    value   SEGMENT
237 237 3.538552  20 to 50
21   21 3.376149    Top 5%
421 421 3.015634 Bottom 50
339 339 2.855991 Bottom 50
119 119 2.589574  20 to 50
12   12 2.512276    Top 5%

Do you have a solution for this?
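An approach that does not depend on row names or row order at all is to take each row's position from the rank of value rather than from its row index. A sketch, assuming the segments should follow value from largest to smallest (as in the sorted example above); dat1 here is simulated, not the data from the earlier reply. Resetting the row names with rownames(dat1) <- NULL after sorting is another option.

```r
set.seed(42)
dat1 <- data.frame(ID = 1:500, value = rnorm(500))
dat1 <- dat1[order(-dat1$value), ]   # row names are now out of sequence

# Rank by value (largest = 1); this ignores row names and current row order
pos <- rank(-dat1$value, ties.method = "first") / nrow(dat1)
dat1$SEGMENT <- cut(pos,
                    breaks = c(0, 0.05, 0.20, 0.50, 1),
                    labels = c("Top 5%", "5 to 20", "20 to 50", "Bottom 50"))
head(dat1)
```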





--
View this message in context: 
http://r.789695.n4.nabble.com/Add-a-column-to-a-data-frame-with-value-based-on-the-percentile-of-the-row-tp4672711p4672726.html


Re: [R] Saving multiple rda-files as one rda-file

2013-07-25 Thread Dark
Does no one really have any suggestions on this issue?



--
View this message in context: 
http://r.789695.n4.nabble.com/Saving-multiple-rda-files-as-one-rda-file-tp4672041p4672278.html


Re: [R] Saving multiple rda-files as one rda-file

2013-07-25 Thread Dark
Hi, 

Yes, maybe I should have been clearer about my problem.
I want to append the different data frames back into one object (rbind)
and save it as one R data file.

Regards Derk



--
View this message in context: 
http://r.789695.n4.nabble.com/Saving-multiple-rda-files-as-one-rda-file-tp4672041p4672313.html


[R] Saving multiple rda-files as one rda-file

2013-07-22 Thread Dark
Hi all,

For a project we have to process some very large CSV files (up to 40 GB).
To reduce their size and improve performance, I wanted to store them as
RData files.
Since they were too big, I decided to split the CSV files and save the parts
as separate .rda files.
So far so good. Now I want to bind them all together and save them as one
.rda file again, and this is surprisingly difficult.

First I load my .rda files into my environment:
load(paste(rdaoutputdir, "file1.rda", sep = ""))
load(paste(rdaoutputdir, "file2.rda", sep = ""))
load(paste(rdaoutputdir, "file3.rda", sep = ""))
# etc.

Then I try to combine them into one object.

Using rbind like this gives memory allocation problems ('Error: cannot
allocate vector of size'):
objectToSave <- rbind(object1, object2, object3)

Using pre-allocation gives me a factor level error. I used this code:
nextrow <- nrow(object1) + 1
object1[nextrow:(nextrow + nrow(object2) - 1), ] <- object2
# we need to ensure unique row names
row.names(object1) <- 1:nrow(object1)
rm(object2)
gc()

15(!) warning messages:
1: In `[<-.factor`(`*tmp*`, iseq, value = structure(c(1L,  ... :
  invalid factor level, NA generated
2: In `[<-.factor`(`*tmp*`, iseq, value = structure(c(1L,  ... :
  invalid factor level, NA generated

What can I do?
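One possible way around both problems is data.table::rbindlist(), which merges the factor levels of the pieces (instead of turning unknown levels into NA, as the `[<-.factor` warnings show) and typically needs less memory than rbind() on data frames. The sketch below first writes three small hypothetical parts to a temporary directory so it can run on its own; the helper loadOne and the toy objects are made up. It still needs enough RAM to hold the combined table, so it may not be sufficient for the largest files.

```r
library(data.table)

# Hypothetical setup: write three small parts to a temp dir. Each part has a
# factor column whose levels differ from file to file, as in the post.
rdaoutputdir <- file.path(tempdir(), "rda_parts")
dir.create(rdaoutputdir, showWarnings = FALSE)
object1 <- data.frame(id = 1:2, code = factor(c("a", "b")))
object2 <- data.frame(id = 3:4, code = factor(c("c", "a")))
object3 <- data.frame(id = 5L,  code = factor("d"))
save(object1, file = file.path(rdaoutputdir, "file1.rda"))
save(object2, file = file.path(rdaoutputdir, "file2.rda"))
save(object3, file = file.path(rdaoutputdir, "file3.rda"))
rm(object1, object2, object3)

# Load each file into its own environment and return the (first) object in it
loadOne <- function(f) {
  e <- new.env()
  nm <- load(f, envir = e)   # load() returns the names of the loaded objects
  e[[nm[1]]]
}
files <- list.files(rdaoutputdir, pattern = "\\.rda$", full.names = TRUE)
parts <- lapply(files, loadOne)

# rbindlist() combines factor levels instead of producing NAs
combined <- rbindlist(parts)
rm(parts); invisible(gc())

save(combined, file = file.path(tempdir(), "combined.rda"))
```

The result is a data.table; setDF(combined) turns it back into a data.frame in place. Note that list.files() sorts names alphabetically, so file10.rda would come before file2.rda; that only matters if row order matters.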

Regards Derk



--
View this message in context: 
http://r.789695.n4.nabble.com/Saving-multiple-rda-files-as-one-rda-file-tp4672041.html