[R] Contributed packages
Folks: If you wanted to find out what the contributed packages are and how they are classified, how would you go about it? For someone new like me, it would help to know what the possibilities are. When I click on "Install packages" in my Windows version of R, it gives me a list, but it is hard to figure out from that list what the purpose of each package is and what category it belongs to (for example, the category of regular expressions). What is the R equivalent of Perl's CPAN.org, where you can browse modules by category? Thanks.
Satish

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] dataframe question
David: Thanks for the idea. Both the one that you suggested and the one that Bill Venables suggested are very good. Unfortunately, this statement is creating out-of-memory issues like the ones below (system limitations). When I had padded white space before the numbers, read.csv.sql was correctly treating them as factors. I am going to take out the padding so that it treats them as numeric, and then I can proceed with further steps.
Satish

Out-of-memory warning:
Reached total allocation of 1535Mb: see help(memory.size)
34: In ans[[i]] <- tmp :
  Reached total allocation of 1535Mb: see help(memory.size)

Bill Venables's suggestion below:
week_list <- paste("wk", 1:209, sep="")  ### no need for c(...)
for(week in week_list) three_wk_out[[week]] <- as.numeric(three_wk_out[[week]])  ### no need for '{...}'
Bill Venables, CSIRO/CMIS Cleveland Laboratories

-Original Message-
From: David Winsemius [mailto:dwinsem...@comcast.net]
Sent: Sunday, February 07, 2010 8:51 PM
To: Vadlamani, Satish {FLNA}
Cc: r-help@r-project.org
Subject: Re: [R] dataframe question

On Feb 7, 2010, at 8:14 PM, David Winsemius wrote:
>
> On Feb 7, 2010, at 7:51 PM, Vadlamani, Satish {FLNA} wrote:
>
>> Folks:
>> Good day. Please see the code below. three_wk_out is a dataframe
>> with columns wk1 through wk209. I want to change the format of the
>> columns. I am trying the code below but it does not work. I need
>> $week in the for loop interpreted as wk1, wk2, etc. Could you
>> please help? Thanks.
>> Satish
>>
>> R code below
>> week_list <- paste("wk", c(1:209), sep="")
>
> Or more "functionally":
>
> three_wk_out <- as.data.frame( lapply(three_wk_out, some_function) )

Or if you wanted to change just the particular columns that match the "wk" pattern:

idx <- grep("wk", names(three_wk_out))
three_wk_out[, idx] <- apply(three_wk_out[, idx], 2, as.numeric)

(I probably should have used apply( ___ , 2, fn) in the prior effort rather than coercing a list back to a dataframe.)

> E.g.:
>
>   a  b  c   x
> 1 1  0  0   1
> 2 2  3  2   4
> 3 1  2  1   5
> 4 2  0  3   2
>
> df <- as.data.frame(lapply(df, "^", 2))
> df
>    a  b  c   x
> 1  1  0  0   1
> 2 16 81 16 256
> 3  1 16  1 625
> 4 16  0 81  16

>> for (week in week_list)
>> {
>> three_wk_out$week <- as.numeric(three_wk_out$week)
>> }
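The grep-plus-apply idiom suggested above can be checked on a toy data frame. A minimal sketch (the data frame and column names are invented for illustration; only the columns whose names match "wk" are converted):

```r
# Convert only the columns whose names match "wk" from character to numeric,
# leaving other columns untouched -- the idiom suggested above.
three_wk_out <- data.frame(id  = c("a", "b"),
                           wk1 = c("1", "2"),
                           wk2 = c("3", "4"),
                           stringsAsFactors = FALSE)

idx <- grep("wk", names(three_wk_out))
# apply() coerces the selected columns to a matrix, converts each column
# to numeric, and the result is assigned back into the data frame.
three_wk_out[, idx] <- apply(three_wk_out[, idx], 2, as.numeric)
```

Note that `apply()` goes through a character matrix, so this only works cleanly when every matched column really holds numbers; the `lapply()` form quoted above avoids the matrix coercion.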
[R] dataframe question
Folks: Good day. Please see the code below. three_wk_out is a dataframe with columns wk1 through wk209. I want to change the format of the columns. I am trying the code below, but it does not work. I need $week in the for loop interpreted as wk1, wk2, etc. Could you please help? Thanks.
Satish

R code below
week_list <- paste("wk", c(1:209), sep="")
for (week in week_list)
{
  three_wk_out$week <- as.numeric(three_wk_out$week)
}
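The loop above fails because three_wk_out$week looks up a column literally named "week", rather than the value of the loop variable; [[ ]] evaluates its argument, which is the fix the replies point to. A small sketch with invented two-column data:

```r
# $week is a literal column name; [[week]] uses the value of the variable.
three_wk_out <- data.frame(wk1 = c("1", "2"),
                           wk2 = c("3", "4"),
                           stringsAsFactors = FALSE)

for (week in c("wk1", "wk2")) {
  # three_wk_out$week would create/overwrite a column called "week";
  # [[week]] indexes by the string held in the loop variable.
  three_wk_out[[week]] <- as.numeric(three_wk_out[[week]])
}
```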
Re: [R] Reading large files
Gabor: It did suppress the message now and I was able to load the data. One question:

1. test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", filter="perl parse_3wkout.pl")

In the statement above, should the file name in file= and the file name that the perl script uses through the filter= command be the same? I would think not. I would say that if filter= is passed to the statement, then the file name should be ignored. Is this how it works? Thanks.
Satish

-Original Message-
From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
Sent: Saturday, February 06, 2010 4:58 PM
To: Vadlamani, Satish {FLNA}
Cc: r-help@r-project.org
Subject: Re: [R] Reading large files

I have uploaded another version which suppresses display of the error message but otherwise works the same. Omitting the redundant arguments we have:

library(sqldf)
# next line is only needed once per session to read in devel version
source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R")
test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", filter="perl parse_3wkout.pl")

On Sat, Feb 6, 2010 at 5:48 PM, Vadlamani, Satish {FLNA} wrote:
> Gabor:
> Please see the results below. Sourcing your new R script worked (although with the same error message). If I put the eol="\n" option, it adds a "\r" to the last column. I took out the eol option below. This is just some more feedback to you.
>
> I am thinking that I will just do an inline edit in Perl (that is, create the csv file through Perl by overwriting the current file) and then use read.csv.sql without the filter= option. This seems to be more tried and tested. If you have any suggestions, please let me know. Thanks.
> Satish
>
> BEFORE SOURCING YOUR NEW R SCRIPT
>> test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", sql = "select * from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl")
> Error in readRegistry(key, maxdepth = 3) :
>   Registry key 'SOFTWARE\R-core' not found
>> test_df
> Error: object 'test_df' not found
>
> AFTER SOURCING YOUR NEW R SCRIPT
>> source("f:/dp_modeling_team/downloads/R/sqldf.R")
>> test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", sql = "select * from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl")
> Error in readRegistry(key, maxdepth = 3) :
>   Registry key 'SOFTWARE\R-core' not found
> In addition: Warning messages:
> 1: closing unused connection 5 (3wkoutstatfcst_small.dat)
> 2: closing unused connection 4 (3wkoutstatfcst_small.dat)
> 3: closing unused connection 3 (3wkoutstatfcst_small.dat)
>> test_df
>   allgeo area1 zone dist ccust1 whse bindc ccust2 account area2 ccust3
> 1      A     4    1   37     99 4925  4925     99      99     4     99
> 2      A     4    1   37     99 4925  4925     99      99     4     99
>
> -Original Message-
> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
> Sent: Saturday, February 06, 2010 4:28 PM
> To: Vadlamani, Satish {FLNA}
> Cc: r-help@r-project.org
> Subject: Re: [R] Reading large files
>
> The software attempts to read the registry and temporarily augment the path in case you have Rtools installed, so that the filter can access all the tools that Rtools provides. I am not sure why it is failing on your system, but there are evidently some differences between systems here, and I have added some code to trap and bypass that portion in case it fails. I have added the new version to the svn repository, so try this:
>
> library(sqldf)
> # overwrite with development version
> source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R")
> # your code to call read.csv.sql
>
> On Sat, Feb 6, 2010 at 5:18 PM, Vadlamani, Satish {FLNA} wrote:
>>
>> Gabor:
>> Here is the update. As you can see, I got the same error as below in 1.
>>
>> 1. Error
>> test_df <- read.csv.sql(file="out_small.txt", sql = "select * from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl", eol="\n")
>> Error in readRegistry(key, maxdepth = 3) :
>>   Registry key 'SOFTWARE\R-core' not found
>>
>> 2. But the loading of the bigger file was successful, as you can see below: 857 MB, 333,250 rows, 227 columns. This is good.
>>
>> I will have to just do an inline edit in Perl and change the file to csv from within R and then call read.csv.sql.
>>
>> If you have any suggestions to fix 1, I would like to try them.
>>
>> system.time(test_df <- read.csv.sql(file="out.txt"))
>>    user  system elapsed
>>  192.53   15.50  213.68
Re: [R] Reading large files
Gabor: Please see the results below. Sourcing your new R script worked (although with the same error message). If I put the eol="\n" option, it adds a "\r" to the last column. I took out the eol option below. This is just some more feedback to you.

I am thinking that I will just do an inline edit in Perl (that is, create the csv file through Perl by overwriting the current file) and then use read.csv.sql without the filter= option. This seems to be more tried and tested. If you have any suggestions, please let me know. Thanks.
Satish

BEFORE SOURCING YOUR NEW R SCRIPT
> test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", sql = "select * from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl")
Error in readRegistry(key, maxdepth = 3) :
  Registry key 'SOFTWARE\R-core' not found
> test_df
Error: object 'test_df' not found

AFTER SOURCING YOUR NEW R SCRIPT
> source("f:/dp_modeling_team/downloads/R/sqldf.R")
> test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", sql = "select * from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl")
Error in readRegistry(key, maxdepth = 3) :
  Registry key 'SOFTWARE\R-core' not found
In addition: Warning messages:
1: closing unused connection 5 (3wkoutstatfcst_small.dat)
2: closing unused connection 4 (3wkoutstatfcst_small.dat)
3: closing unused connection 3 (3wkoutstatfcst_small.dat)
> test_df
  allgeo area1 zone dist ccust1 whse bindc ccust2 account area2 ccust3
1      A     4    1   37     99 4925  4925     99      99     4     99
2      A     4    1   37     99 4925  4925     99      99     4     99

-Original Message-
From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
Sent: Saturday, February 06, 2010 4:28 PM
To: Vadlamani, Satish {FLNA}
Cc: r-help@r-project.org
Subject: Re: [R] Reading large files

The software attempts to read the registry and temporarily augment the path in case you have Rtools installed, so that the filter can access all the tools that Rtools provides. I am not sure why it is failing on your system, but there are evidently some differences between systems here, and I have added some code to trap and bypass that portion in case it fails. I have added the new version to the svn repository, so try this:

library(sqldf)
# overwrite with development version
source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R")
# your code to call read.csv.sql

On Sat, Feb 6, 2010 at 5:18 PM, Vadlamani, Satish {FLNA} wrote:
>
> Gabor:
> Here is the update. As you can see, I got the same error as below in 1.
>
> 1. Error
> test_df <- read.csv.sql(file="out_small.txt", sql = "select * from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl", eol="\n")
> Error in readRegistry(key, maxdepth = 3) :
>   Registry key 'SOFTWARE\R-core' not found
>
> 2. But the loading of the bigger file was successful, as you can see below: 857 MB, 333,250 rows, 227 columns. This is good.
>
> I will have to just do an inline edit in Perl and change the file to csv from within R and then call read.csv.sql.
>
> If you have any suggestions to fix 1, I would like to try them.
>
> system.time(test_df <- read.csv.sql(file="out.txt"))
>    user  system elapsed
>  192.53   15.50  213.68
> Warning message:
> closing unused connection 3 (out.txt)
>
> Thanks again.
> Satish
>
> -Original Message-
> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
> Sent: Saturday, February 06, 2010 3:02 PM
> To: Vadlamani, Satish {FLNA}
> Cc: r-help@r-project.org
> Subject: Re: [R] Reading large files
>
> Note that you can shorten #1 to read.csv.sql("out.txt") since your other arguments are the default values.
>
> For the second one, use read.csv.sql, eliminate the arguments that are defaults anyway (they should not cause a problem, but it is error prone) and add an explicit eol= argument, since SQLite can have problems with end of line in some cases. Also test out your perl script separately from R first to ensure that it works:
>
> test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", filter="perl parse_3wkout.pl", eol = "\n")
>
> SQLite has some known problems with end of line, so try it with and without the eol= argument just in case. When I just made up the following gawk example I noticed that I did need to specify the eol= argument.
>
> Also I have added a complete example using gawk as Example 13c on the home page just now:
> http://code.google.com/p/sqldf/#Example_13._read.csv.sql_and_read.csv2.sql
>
> On Sat, Feb 6, 2010 at 3:52 PM, Vadlamani, Satish {FLNA} wrote:
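The stray "\r" on the last column reported in this exchange is the classic CRLF symptom: the filter emits Windows "\r\n" line endings while the lines are split on "\n" alone, so the carriage return sticks to the final field. A minimal base-R sketch of the cleanup step (the column name wk209 is hypothetical, chosen only to match the thread's naming scheme):

```r
# Simulate a data frame whose last column picked up a trailing "\r"
# because records ended in "\r\n" but were split on "\n" only.
df <- data.frame(wk208 = c("1.5", "2.5"),
                 wk209 = c("3.5\r", "4.5\r"),
                 stringsAsFactors = FALSE)

# Strip the stray carriage return before converting to numeric;
# sub("\r$", ...) only touches a "\r" at the very end of the string.
df$wk209 <- as.numeric(sub("\r$", "", df$wk209))
```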
Re: [R] Reading large files
Gabor: Here is the update. As you can see, I got the same error as below in 1.

1. Error
test_df <- read.csv.sql(file="out_small.txt", sql = "select * from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl", eol="\n")
Error in readRegistry(key, maxdepth = 3) :
  Registry key 'SOFTWARE\R-core' not found

2. But the loading of the bigger file was successful, as you can see below: 857 MB, 333,250 rows, 227 columns. This is good.

I will have to just do an inline edit in Perl and change the file to csv from within R and then call read.csv.sql.

If you have any suggestions to fix 1, I would like to try them.

system.time(test_df <- read.csv.sql(file="out.txt"))
   user  system elapsed
 192.53   15.50  213.68
Warning message:
closing unused connection 3 (out.txt)

Thanks again.
Satish

-Original Message-
From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
Sent: Saturday, February 06, 2010 3:02 PM
To: Vadlamani, Satish {FLNA}
Cc: r-help@r-project.org
Subject: Re: [R] Reading large files

Note that you can shorten #1 to read.csv.sql("out.txt") since your other arguments are the default values.

For the second one, use read.csv.sql, eliminate the arguments that are defaults anyway (they should not cause a problem, but it is error prone) and add an explicit eol= argument, since SQLite can have problems with end of line in some cases. Also test out your perl script separately from R first to ensure that it works:

test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", filter="perl parse_3wkout.pl", eol = "\n")

SQLite has some known problems with end of line, so try it with and without the eol= argument just in case. When I just made up the following gawk example I noticed that I did need to specify the eol= argument.

Also I have added a complete example using gawk as Example 13c on the home page just now:
http://code.google.com/p/sqldf/#Example_13._read.csv.sql_and_read.csv2.sql

On Sat, Feb 6, 2010 at 3:52 PM, Vadlamani, Satish {FLNA} wrote:
> Gabor:
>
> I had success with the following.
> 1. I created a csv file with a perl script called "out.txt". Then I ran the following successfully:
> library("sqldf")
> test_df <- read.csv.sql(file="out.txt", sql = "select * from file", header = TRUE, sep = ",", dbname = tempfile())
>
> 2. I did not have success with the following. Could you tell me what I may be doing wrong? I could paste the perl script if necessary. From the perl script, I am reading the file, creating the csv records, and printing each record one by one and then exiting.
>
> Thanks.
>
> No success with the below:
> #test_df <- read.csv2.sql(file="3wkoutstatfcst_small.dat", sql = "select * from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl", dbname = tempfile())
> test_df
>
> Error message below:
> test_df <- read.csv2.sql(file="3wkoutstatfcst_small.dat", sql = "select * from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl", dbname = tempfile())
> Error in readRegistry(key, maxdepth = 3) :
>   Registry key 'SOFTWARE\R-core' not found
> In addition: Warning messages:
> 1: closing unused connection 14 (3wkoutstatfcst_small.dat)
> 2: closing unused connection 13 (3wkoutstatfcst_small.dat)
> 3: closing unused connection 11 (3wkoutstatfcst_small.dat)
> 4: closing unused connection 9 (3wkoutstatfcst_small.dat)
> 5: closing unused connection 3 (3wkoutstatfcst_small.dat)
>> test_df <- read.csv2.sql(file="3wkoutstatfcst_small.dat", sql = "select * from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl", dbname = tempfile())
> Error in readRegistry(key, maxdepth = 3) :
>   Registry key 'SOFTWARE\R-core' not found
>
> -Original Message-
> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
> Sent: Saturday, February 06, 2010 12:14 PM
> To: Vadlamani, Satish {FLNA}
> Cc: r-help@r-project.org
> Subject: Re: [R] Reading large files
>
> No.
>
> On Sat, Feb 6, 2010 at 1:01 PM, Vadlamani, Satish {FLNA} wrote:
>> Gabor:
>> Can I pass colClasses as a vector to read.csv.sql? Thanks.
>> Satish
>>
>> -Original Message-
>> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
>> Sent: Saturday, February 06, 2010 9:41 AM
>> To: Vadlamani, Satish {FLNA}
>> Cc: r-help@r-project.org
>> Subject: Re: [R] Reading large files
>>
>> It's just any Windows batch command string that filters stdin to
>> stdout. What the command consists of should not be important.
Re: [R] Reading large files
Gabor: I had success with the following.

1. I created a csv file with a perl script called "out.txt". Then I ran the following successfully:
library("sqldf")
test_df <- read.csv.sql(file="out.txt", sql = "select * from file", header = TRUE, sep = ",", dbname = tempfile())

2. I did not have success with the following. Could you tell me what I may be doing wrong? I could paste the perl script if necessary. From the perl script, I am reading the file, creating the csv records, and printing each record one by one and then exiting.

Thanks.

No success with the below:
#test_df <- read.csv2.sql(file="3wkoutstatfcst_small.dat", sql = "select * from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl", dbname = tempfile())
test_df

Error message below:
test_df <- read.csv2.sql(file="3wkoutstatfcst_small.dat", sql = "select * from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl", dbname = tempfile())
Error in readRegistry(key, maxdepth = 3) :
  Registry key 'SOFTWARE\R-core' not found
In addition: Warning messages:
1: closing unused connection 14 (3wkoutstatfcst_small.dat)
2: closing unused connection 13 (3wkoutstatfcst_small.dat)
3: closing unused connection 11 (3wkoutstatfcst_small.dat)
4: closing unused connection 9 (3wkoutstatfcst_small.dat)
5: closing unused connection 3 (3wkoutstatfcst_small.dat)
> test_df <- read.csv2.sql(file="3wkoutstatfcst_small.dat", sql = "select * from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl", dbname = tempfile())
Error in readRegistry(key, maxdepth = 3) :
  Registry key 'SOFTWARE\R-core' not found

-Original Message-
From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
Sent: Saturday, February 06, 2010 12:14 PM
To: Vadlamani, Satish {FLNA}
Cc: r-help@r-project.org
Subject: Re: [R] Reading large files

No.

On Sat, Feb 6, 2010 at 1:01 PM, Vadlamani, Satish {FLNA} wrote:
> Gabor:
> Can I pass colClasses as a vector to read.csv.sql? Thanks.
> Satish
>
> -Original Message-
> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
> Sent: Saturday, February 06, 2010 9:41 AM
> To: Vadlamani, Satish {FLNA}
> Cc: r-help@r-project.org
> Subject: Re: [R] Reading large files
>
> It's just any Windows batch command string that filters stdin to stdout. What the command consists of should not be important. An invocation of perl that runs a perl script that filters stdin to stdout might look like this:
>
> read.csv.sql("myfile.dat", filter = "perl myprog.pl")
>
> For an actual example, see the source of read.csv2.sql, which defaults to using a Windows vbscript program as a filter.
>
> On Sat, Feb 6, 2010 at 10:16 AM, Vadlamani, Satish {FLNA} wrote:
>> Jim, Gabor:
>> Thanks so much for the suggestions where I can use read.csv.sql and embed Perl (or gawk). I just want to mention that I am running on Windows. I am going to read the documentation on the filter argument and see if it can take a decent sized Perl script and then use its output as input.
>>
>> Suppose that I write a Perl script that parses this fwf file and creates a CSV file. Can I embed this within the read.csv.sql call? Or can it only be a statement or something? If you know the answer, please let me know. Otherwise, I will try a few things and report back the results.
>>
>> Thanks again.
>> Satish
>>
>> -Original Message-
>> From: jim holtman [mailto:jholt...@gmail.com]
>> Sent: Saturday, February 06, 2010 6:16 AM
>> To: Gabor Grothendieck
>> Cc: Vadlamani, Satish {FLNA}; r-help@r-project.org
>> Subject: Re: [R] Reading large files
>>
>> In perl the 'unpack' command makes it very easy to parse fixed fielded data.
>>
>> On Fri, Feb 5, 2010 at 9:09 PM, Gabor Grothendieck wrote:
>>> Note that the filter= argument on read.csv.sql can be used to pass the input through a filter written in perl, [g]awk or other language.
>>> For example: read.csv.sql(..., filter = "gawk -f myfilter.awk")
>>>
>>> gawk has the FIELDWIDTHS variable for automatically parsing fixed width fields, e.g.
>>> http://www.delorie.com/gnu/docs/gawk/gawk_44.html
>>> making this very easy, but perl or whatever you are most used to would be fine too.
>>>
>>> On Fri, Feb 5, 2010 at 8:50 PM, Vadlamani, Satish {FLNA} wrote:
>>>> Hi Gabor:
>>>> Thanks. My files are all in fixed width format, and there are a lot of them. It would take me some effort to convert them to CSV.
Re: [R] Reading large files
Gabor: Can I pass colClasses as a vector to read.csv.sql? Thanks.
Satish

-Original Message-
From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
Sent: Saturday, February 06, 2010 9:41 AM
To: Vadlamani, Satish {FLNA}
Cc: r-help@r-project.org
Subject: Re: [R] Reading large files

It's just any Windows batch command string that filters stdin to stdout. What the command consists of should not be important. An invocation of perl that runs a perl script that filters stdin to stdout might look like this:

read.csv.sql("myfile.dat", filter = "perl myprog.pl")

For an actual example, see the source of read.csv2.sql, which defaults to using a Windows vbscript program as a filter.

On Sat, Feb 6, 2010 at 10:16 AM, Vadlamani, Satish {FLNA} wrote:
> Jim, Gabor:
> Thanks so much for the suggestions where I can use read.csv.sql and embed Perl (or gawk). I just want to mention that I am running on Windows. I am going to read the documentation on the filter argument and see if it can take a decent sized Perl script and then use its output as input.
>
> Suppose that I write a Perl script that parses this fwf file and creates a CSV file. Can I embed this within the read.csv.sql call? Or can it only be a statement or something? If you know the answer, please let me know. Otherwise, I will try a few things and report back the results.
>
> Thanks again.
> Satish
>
> -Original Message-
> From: jim holtman [mailto:jholt...@gmail.com]
> Sent: Saturday, February 06, 2010 6:16 AM
> To: Gabor Grothendieck
> Cc: Vadlamani, Satish {FLNA}; r-help@r-project.org
> Subject: Re: [R] Reading large files
>
> In perl the 'unpack' command makes it very easy to parse fixed fielded data.
>
> On Fri, Feb 5, 2010 at 9:09 PM, Gabor Grothendieck wrote:
>> Note that the filter= argument on read.csv.sql can be used to pass the input through a filter written in perl, [g]awk or other language.
>> For example: read.csv.sql(..., filter = "gawk -f myfilter.awk")
>>
>> gawk has the FIELDWIDTHS variable for automatically parsing fixed width fields, e.g.
>> http://www.delorie.com/gnu/docs/gawk/gawk_44.html
>> making this very easy, but perl or whatever you are most used to would be fine too.
>>
>> On Fri, Feb 5, 2010 at 8:50 PM, Vadlamani, Satish {FLNA} wrote:
>>> Hi Gabor:
>>> Thanks. My files are all in fixed width format, and there are a lot of them. It would take me some effort to convert them to CSV. I guess this cannot be avoided? I can write some Perl scripts to convert fixed width format to CSV format and then start with your suggestion. Could you let me know your thoughts on the approach?
>>> Satish
>>>
>>> -Original Message-
>>> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
>>> Sent: Friday, February 05, 2010 5:16 PM
>>> To: Vadlamani, Satish {FLNA}
>>> Cc: r-help@r-project.org
>>> Subject: Re: [R] Reading large files
>>>
>>> If your problem is just how long it takes to load the file into R, try read.csv.sql in the sqldf package. A single read.csv.sql call can create an SQLite database and table layout for you, read the file into the database (without going through R, so R can't slow this down), extract all or a portion into R based on the sql argument you give it, and then remove the database. See the examples on the home page:
>>> http://code.google.com/p/sqldf/#Example_13._read.csv.sql_and_read.csv2.sql
>>>
>>> On Fri, Feb 5, 2010 at 2:11 PM, Satish Vadlamani wrote:
>>>>
>>>> Matthew:
>>>> If it is going to help, here is the explanation. I have an end state in mind. It is given below under the "End State" header. In order to get there, I need to start somewhere, right? I started with an 850 MB file and could not load it in what I think is a reasonable time (I waited for an hour).
>>>>
>>>> There are references to 64 bit. How will that help? It is a 4GB RAM machine and there is no paging activity when loading the 850 MB file.
>>>>
>>>> I have seen other threads on the same types of questions. I did not see any clear cut answers or errors that I could have been making in the process. If I am missing something, please let me know. Thanks.
>>>> Satish
>>>>
>>>> End State
>>>>> Satish wrote: "at one time I will need to load say 15GB into R"
Re: [R] Reading large files
Jim, Gabor: Thanks so much for the suggestions where I can use read.csv.sql and embed Perl (or gawk). I just want to mention that I am running on Windows. I am going to read the documentation on the filter argument and see if it can take a decent sized Perl script and then use its output as input.

Suppose that I write a Perl script that parses this fwf file and creates a CSV file. Can I embed this within the read.csv.sql call? Or can it only be a statement or something? If you know the answer, please let me know. Otherwise, I will try a few things and report back the results.

Thanks again.
Satish

-Original Message-
From: jim holtman [mailto:jholt...@gmail.com]
Sent: Saturday, February 06, 2010 6:16 AM
To: Gabor Grothendieck
Cc: Vadlamani, Satish {FLNA}; r-help@r-project.org
Subject: Re: [R] Reading large files

In perl the 'unpack' command makes it very easy to parse fixed fielded data.

On Fri, Feb 5, 2010 at 9:09 PM, Gabor Grothendieck wrote:
> Note that the filter= argument on read.csv.sql can be used to pass the input through a filter written in perl, [g]awk or other language.
> For example: read.csv.sql(..., filter = "gawk -f myfilter.awk")
>
> gawk has the FIELDWIDTHS variable for automatically parsing fixed width fields, e.g.
> http://www.delorie.com/gnu/docs/gawk/gawk_44.html
> making this very easy, but perl or whatever you are most used to would be fine too.
>
> On Fri, Feb 5, 2010 at 8:50 PM, Vadlamani, Satish {FLNA} wrote:
>> Hi Gabor:
>> Thanks. My files are all in fixed width format, and there are a lot of them. It would take me some effort to convert them to CSV. I guess this cannot be avoided? I can write some Perl scripts to convert fixed width format to CSV format and then start with your suggestion. Could you let me know your thoughts on the approach?
>> Satish
>>
>> -Original Message-
>> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
>> Sent: Friday, February 05, 2010 5:16 PM
>> To: Vadlamani, Satish {FLNA}
>> Cc: r-help@r-project.org
>> Subject: Re: [R] Reading large files
>>
>> If your problem is just how long it takes to load the file into R, try read.csv.sql in the sqldf package. A single read.csv.sql call can create an SQLite database and table layout for you, read the file into the database (without going through R, so R can't slow this down), extract all or a portion into R based on the sql argument you give it, and then remove the database. See the examples on the home page:
>> http://code.google.com/p/sqldf/#Example_13._read.csv.sql_and_read.csv2.sql
>>
>> On Fri, Feb 5, 2010 at 2:11 PM, Satish Vadlamani wrote:
>>>
>>> Matthew:
>>> If it is going to help, here is the explanation. I have an end state in mind. It is given below under the "End State" header. In order to get there, I need to start somewhere, right? I started with an 850 MB file and could not load it in what I think is a reasonable time (I waited for an hour).
>>>
>>> There are references to 64 bit. How will that help? It is a 4GB RAM machine and there is no paging activity when loading the 850 MB file.
>>>
>>> I have seen other threads on the same types of questions. I did not see any clear cut answers or errors that I could have been making in the process. If I am missing something, please let me know. Thanks.
>>> Satish
>>>
>>> End State
>>>> Satish wrote: "at one time I will need to load say 15GB into R"
>>>
>>> -
>>> Satish Vadlamani
>>> --
>>> View this message in context:
>>> http://n4.nabble.com/Reading-large-files-tp1469691p1470667.html
>>> Sent from the R help mailing list archive at Nabble.com.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
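The fixed-width-to-CSV conversion discussed in this thread can also be prototyped entirely in base R, which is handy for checking a filter script's output on a small sample before wiring it into read.csv.sql. A minimal sketch (the field widths, column names, and sample records are made up for illustration, not taken from the thread):

```r
# Convert a fixed-width file to CSV in base R: read the records with
# read.fwf() using the known field widths, then write them back out as CSV.
widths <- c(1, 3, 2)              # hypothetical layout: key, area, zone
lines  <- c("A00412", "B01503")   # two sample fixed-width records

fwf <- tempfile(); writeLines(lines, fwf)
csv <- tempfile()

df <- read.fwf(fwf, widths = widths,
               col.names  = c("key", "area", "zone"),
               colClasses = c("character", "numeric", "numeric"))
write.csv(df, csv, row.names = FALSE)

# Round-trip check: the CSV reads back with the same values.
back <- read.csv(csv, colClasses = c("character", "numeric", "numeric"))
```

For many large files a streaming Perl or gawk filter will be faster, as the thread concludes; this is just the same transformation expressed in R for verification.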
Re: [R] Reading large files
Hi Gabor: Thanks. My files are all in fixed width format. They are a lot of them. It would take me some effort to convert them to CSV. I guess this cannot be avoided? I can write some Perl scripts to convert fixed width format to CSV format and then start with your suggestion. Could you let me know your thoughts on the approach? Satish -Original Message- From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] Sent: Friday, February 05, 2010 5:16 PM To: Vadlamani, Satish {FLNA} Cc: r-help@r-project.org Subject: Re: [R] Reading large files If your problem is just how long it takes to load the file into R try read.csv.sql in the sqldf package. A single read.csv.sql call can create an SQLite database and table layout for you, read the file into the database (without going through R so R can't slow this down), extract all or a portion into R based on the sql argument you give it and then remove the database. See the examples on the home page: http://code.google.com/p/sqldf/#Example_13._read.csv.sql_and_read.csv2.sql On Fri, Feb 5, 2010 at 2:11 PM, Satish Vadlamani wrote: > > Matthew: > If it is going to help, here is the explanation. I have an end state in > mind. It is given below under "End State" header. In order to get there, I > need to start somewhere right? I started with a 850 MB file and could not > load in what I think is reasonable time (I waited for an hour). > > There are references to 64 bit. How will that help? It is a 4GB RAM machine > and there is no paging activity when loading the 850 MB file. > > I have seen other threads on the same types of questions. I did not see any > clear cut answers or errors that I could have been making in the process. If > I am missing something, please let me know. Thanks. 
> Satish > > > End State >> Satish wrote: "at one time I will need to load say 15GB into R" > > > - > Satish Vadlamani > -- > View this message in context: > http://n4.nabble.com/Reading-large-files-tp1469691p1470667.html > Sent from the R help mailing list archive at Nabble.com.
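Gabor's read.csv.sql suggestion can be sketched in a self-contained way (the file and the filter are made up for illustration; note that read.csv.sql imports via SQLite, which does not unquote quoted CSV fields, so the sample file is written unquoted):

```r
library(sqldf)  # read.csv.sql lives in the sqldf package (pulls in RSQLite)

# A tiny unquoted CSV standing in for the converted fixed-width file
csv <- tempfile(fileext = ".csv")
write.table(data.frame(key = c("a", "b"), wk1 = c(0, 5)),
            csv, sep = ",", quote = FALSE, row.names = FALSE)

# The table inside the throwaway SQLite database is always called "file";
# only the rows selected by 'sql' ever reach R
df <- read.csv.sql(csv, sql = "select * from file where wk1 > 0")
```

The database is created and removed behind the scenes, so memory use in R is bounded by what the sql argument returns.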
[R] Reading large files
Folks: I am trying to read in a large file. Definition of large is: Number of lines: 333,250 Size: 850 MB The machine is a dual-core Intel with 4 GB RAM and nothing else running on it. I read the previous threads on read.fwf and did not see any conclusive statements on how to read fast. Example record and R code given below. I was hoping to purchase a better machine and do analysis with larger datasets - but these preliminary results do not look good. Does anyone have any experience with large files (> 1GB) and using them with Revolution-R? Thanks. Satish Example Code key_vec <- c(1,3,3,4,2,8,8,2,2,3,2,2,1,3,3,3,3,9) key_names <- c("allgeo","area1","zone","dist","ccust1","whse","bindc","ccust2","account","area2","ccust3","customer","allprod","cat","bu","class","size","bdc") key_info <- data.frame(key_vec,key_names) col_names <- c(key_names,sas_time$week) num_buckets <- rep(12,209) width_vec <- c(key_vec,num_buckets) col_classes <- c(rep("factor",18),rep("numeric",209)) #threewkoutstat <- read.fwf(file="3wkoutstatfcst_file02.dat",widths=width_vec,header=FALSE,colClasses=col_classes,n=100) threewkoutstat <- read.fwf(file="3wkoutstatfcst_file02.dat",widths=width_vec,header=FALSE,colClasses=col_classes) names(threewkoutstat) <- col_names Example record (only one record pasted below) A00400100379949254925004A0010020020150020150090.00 0.000.000.000.000.000.000.00 0.000.000.000.000.000.00 0.000.000.000.000.000.000.00 0.000.000.000.000.000.00 0.000.000.000.000.000.000.00 0.000.000.000.000.000.00 0.000.000.000.000.000.000.00 0.000.000.000.000.000.00 0.000.000.000.000.000.000.00 0.000.000.000.000.000.00 0.000.000.000.000.000.000.00 0.000.000.000.00 !
0.000.000.000.000.000.00 0.000.000.000.000.000.000.00 0.000.000.000.000.000.00 0.000.000.000.000.000.000.00 0.000.000.000.000.000.00 0.000.000.000.000.000.000.00 0.000.000.000.000.000.60 0.600.600.700.000.000.000.00 0.000.000.000.000.000.00 0.000.000.000.000.000.000.00 0.000.000.000.000.000.00 0.000.000.000.000.000.000.00 0.000.000.000.00 ! 0.000.000.000.000.000.00 0.000.000.000.000.000.000.00 0.000.000.000.000.000.00 0.000.000.000.000.000.000.00 0.000.000.000.000.000.00 0.000.000.000.000.000.000.00 0.000.000.000.000.000.00 0.000.000.000.000.00
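One reason read.fwf is slow on files this size is that it first rewrites the data to a temporary file before parsing. A common workaround, sketched here on toy data, is to pull the lines in with readLines() and cut the fields out by position with substring(); the two-field layout below is an assumption standing in for the real key_vec/num_buckets widths:

```r
widths <- c(2, 3)                          # toy layout; use c(key_vec, num_buckets) for the real file
starts <- cumsum(c(1, head(widths, -1)))   # first character of each field
ends   <- starts + widths - 1              # last character of each field

lines <- c("A1123", "B2456")               # stand-in for readLines("3wkoutstatfcst_file02.dat")
fields <- lapply(seq_along(widths),
                 function(i) substring(lines, starts[i], ends[i]))
df <- data.frame(key = fields[[1]], val = as.numeric(fields[[2]]))
```

Because substring() is vectorized over the whole character vector, this makes a single pass per field instead of a temporary-file round trip.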
[R] Installing R and modules on Unix OS
Hi: I have a question about installing R (and add-on packages) on a Unix system (AIX). Can I just gunzip (or the equivalent) the installation files into my home directory, or will I need someone with root access to install R? I am hoping that the answer is the former (I can unzip all the files to a directory R that I create under my home directory and start using it). Could you please help me with this and any other instructions to install R and packages when you do not have root access? Thanks. Satish
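R itself can generally be built without root by configuring it into a home directory (./configure --prefix=$HOME/R, then make and make install). For the "modules" (packages), a personal library works the same way; a sketch, where ~/Rlibs is an assumed path:

```r
lib <- path.expand("~/Rlibs")            # personal library location (an assumption)
dir.create(lib, showWarnings = FALSE)
.libPaths(c(lib, .libPaths()))           # search the personal library first
# install.packages("chron", lib = lib)   # installs then need no root access
# library(chron)                         # loaded from ~/Rlibs from here on
```

Setting the R_LIBS environment variable to the same directory makes the library available in every session without the .libPaths() call.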
[R] How to read numeric as text
Hi: I want to read a file like the one below with read.table, with x1 and x2 read as character and x3 as numeric. How do I do this? Thanks. Satish x1 ,x2,x3 10,20,30 11 ,22,35
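A sketch of one answer: read.table's colClasses argument fixes the class per column, and strip.white removes the stray padding around the sample values:

```r
txt <- c("x1,x2,x3", "10,20,30", "11 ,22,35")   # stand-in for the file
df <- read.table(textConnection(txt), header = TRUE, sep = ",",
                 colClasses = c("character", "character", "numeric"),
                 strip.white = TRUE)
# x1 and x2 come back as character, x3 as numeric
```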
[R] Merge records in the same dataframe
Hi: Suppose that I have a data frame as below x1 x2 x3 ... x10 wk1 wk2 ... wk208 (these are the column names) For each record, x1, x2, x3 ... x10 are attributes, and wk1, wk2, ..., wk208 are the sales recorded for this attribute combination. Suppose now that I want to do the following: merge the data frame so that I have a new data frame grouped by values of x2 and x3 (for example). That is, if two records have the same values of x2 and x3, they should be summed. I tried to look at merge, tapply, etc. but did not see a fit with what I want to do above. Thanks in advance. Satish
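What is described is aggregation rather than a merge, so aggregate() fits; a toy sketch with two wk columns and the grouping columns named as in the post:

```r
df <- data.frame(x1 = c("a", "b", "c"),
                 x2 = c("p", "p", "q"),
                 x3 = c(1, 1, 2),
                 wk1 = c(10, 20, 5),
                 wk2 = c(1, 2, 3))
# Sum every wk column over records sharing the same (x2, x3) pair
agg <- aggregate(df[, c("wk1", "wk2")],
                 by = list(x2 = df$x2, x3 = df$x3), FUN = sum)
```

Rows "a" and "b" share (p, 1), so their wk columns are summed into one row; for the full 208 wk columns, select them with grep("^wk", names(df)) instead of naming each.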
Re: [R] Help with creating some loops
Hi: In general, how do I transform a column and store the result in a new column? For example, suppose I have a data frame df1 with a column x, and I want df1$new to be a substring of x (say, the first two characters, as in the example below). I want to do something like df1$new = substring(of x) Example Data frame df1 x abcd efgh Now df1$new should be ab ef Thanks. Satish _ From: Vadlamani, Satish {FLNA} Sent: Friday, October 30, 2009 8:40 AM To: R-help@r-project.org Subject: Help with creating some loops Hi All: I have a data frame called all_corn. This has 31 columns. The first column is a character key. The next 15 columns (stat1,stat2,...,stat15) are the statistical forecast. The last 15 columns (sls1,sls2,...,sls15) are the actual sales. I want to calculate the textbook tracking signal and cumulative percent error. 1) I am showing some of the calculations below. How can I make a loop out of this instead of manually doing this 15 times? 2) Once all these calculations are done, how do I put all these columns (err1, err2, etc.) into the same data frame? Thanks. attach(all_corn) cum_sls1 <- sls1 err1 <- sls1-stat1 cum_err1 <- sls1-stat1 cum_abs_err1 <- abs(err1) mad1 <- abs(cum_err1)/1 cum_pct_err1 <- (ifelse(cum_sls1 > 0, cum_err1/cum_sls1, 1))*100 ts1 <- ifelse(mad1 > 0, cum_err1/mad1, 0) cum_sls2 <- cum_sls1 + sls2 err2 <- sls2-stat2 cum_err2 <- cum_err1 + sls2-stat2 cum_abs_err2 <- cum_abs_err1 + abs(err2) mad2 <- cum_abs_err2/2 cum_pct_err2 <- (ifelse(cum_sls2 > 0, cum_err2/cum_sls2, 1))*100 ts2 <- ifelse(mad2 > 0, cum_err2/mad2, 0)
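The substring step can be sketched directly with substr(), using the example values from the post:

```r
df1 <- data.frame(x = c("abcd", "efgh"), stringsAsFactors = FALSE)
df1$new <- substr(df1$x, 1, 2)   # first two characters of each value
# df1$new is now "ab", "ef"
```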
[R] Help with creating some loops
Hi All: I have a data frame called all_corn. This has 31 columns. The first column is a character key. The next 15 columns (stat1,stat2,...,stat15) are the statistical forecast. The last 15 columns (sls1,sls2,...,sls15) are the actual sales. I want to calculate the textbook tracking signal and cumulative percent error. 1) I am showing some of the calculations below. How can I make a loop out of this instead of manually doing this 15 times? 2) Once all these calculations are done, how do I put all these columns (err1, err2, etc.) into the same data frame? Thanks. attach(all_corn) cum_sls1 <- sls1 err1 <- sls1-stat1 cum_err1 <- sls1-stat1 cum_abs_err1 <- abs(err1) mad1 <- abs(cum_err1)/1 cum_pct_err1 <- (ifelse(cum_sls1 > 0, cum_err1/cum_sls1, 1))*100 ts1 <- ifelse(mad1 > 0, cum_err1/mad1, 0) cum_sls2 <- cum_sls1 + sls2 err2 <- sls2-stat2 cum_err2 <- cum_err1 + sls2-stat2 cum_abs_err2 <- cum_abs_err1 + abs(err2) mad2 <- cum_abs_err2/2 cum_pct_err2 <- (ifelse(cum_sls2 > 0, cum_err2/cum_sls2, 1))*100 ts2 <- ifelse(mad2 > 0, cum_err2/mad2, 0)
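The repeated blocks above can be collapsed into one loop that builds the column names with paste() and writes the results back into the data frame, avoiding attach(). A sketch with 3 periods instead of 15, where all_corn is toy data (the mad term follows the cum_abs_err/i pattern of the period-2 block, which for period 1 equals the abs(cum_err1)/1 written above):

```r
n <- 3
all_corn <- data.frame(key   = c("k1", "k2"),
                       stat1 = c(10, 8), stat2 = c(12, 9), stat3 = c(11, 10),
                       sls1  = c(9, 10), sls2  = c(13, 8), sls3  = c(12, 11))
cum_sls <- cum_err <- cum_abs_err <- 0
for (i in 1:n) {
  sls  <- all_corn[[paste("sls",  i, sep = "")]]
  stat <- all_corn[[paste("stat", i, sep = "")]]
  err  <- sls - stat
  cum_sls     <- cum_sls + sls
  cum_err     <- cum_err + err
  cum_abs_err <- cum_abs_err + abs(err)
  mad <- cum_abs_err / i                          # mean absolute deviation so far
  all_corn[[paste("err", i, sep = "")]] <- err
  all_corn[[paste("cum_pct_err", i, sep = "")]] <-
    ifelse(cum_sls > 0, cum_err / cum_sls, 1) * 100
  all_corn[[paste("ts", i, sep = "")]] <-
    ifelse(mad > 0, cum_err / mad, 0)             # tracking signal
}
```

Assigning with all_corn[[name]] <- value answers question 2 as well: each computed column lands directly in the same data frame.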
[R] Help with read.fwf
Hi All: I am trying to use read.fwf and am encountering the error below. Any ideas on what I can do? I tried read.table (whose default separator is whitespace) and it works; I am not sure why read.fwf is not working. test_data_frame = read.fwf(file="small.txt",widths=width_vec,header=FALSE) Error in file(FILENAME, "a") : cannot open the connection In addition: Warning message: In file(FILENAME, "a") : cannot open file 'C:\temp\RtmpLN6W00\Rfwf.2ea6bb3': No such file or directory > code below library(chron) # for chron() and seq.dates() setwd("d:/edump_data/11x4_2009") current = as.Date("2009/10/25") current = chron("10/25/2009", format="m/d/y") next_date = current + 7 prev_date = current - 7 last_date = current - 7*156 # 156 buckets in the past, one current bucket and 52 future buckets. Total is 209 buckets future_dates = seq.dates(next_date,by='weeks',length=52) past_dates = seq.dates(last_date,by='weeks',length=156) num_buckets = rep(14,209) width_vec = c(62,num_buckets) test_data_frame = read.fwf(file="small.txt",widths=width_vec,header=FALSE)
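The 'Rfwf...' path in the error points at read.fwf's intermediate file: read.fwf splits each record into a temporary file under tempdir() before parsing, so if the session's temp directory has disappeared (for example, cleaned up while R was running) every read.fwf call fails this way even though read.table still works. Restarting R recreates the temp directory; this diagnosis is an inference from the error text, not from the post. A minimal self-contained check:

```r
file.exists(tempdir())       # should be TRUE; if FALSE, restart R
tmp <- tempfile()
writeLines(c("ab123", "cd456"), tmp)
df <- read.fwf(tmp, widths = c(2, 3), header = FALSE)
# df$V1 is "ab", "cd" and df$V2 is 123, 456 if the temp directory is healthy
```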
[R] Question on Bias calculations and question on read.fwf
Hi All: Bear with me on this longer e-mail. Questions: 1) Can you share any example code that you may have that calculates the bias of a statistical forecast in a time series? 2) Suppose I have the file in fixed-width format (details below). 1-62 character key 63-76 sales data point 1 77-90 sales data point 2 91-104 sales data point 3 and so on (each data point is 14 characters in width) What is the read.fwf command that will extract these columns? Some more details below. If you have any thoughts, please share with me. Basically I want to do some analysis on how we are biased in our forecasts. I have several files as shown below. I have put one record each for the sales file and the forecast file. The file is fixed-width format. The first 62 characters are the key for the records. This should be further broken down into several column values. For example, A006004004016004016011 can be broken down as follows: Category = A006, BU = 004, Class = 004, Size = 016, BDC = 004016011 I then want to do cbind on both of these data frames and compare the statistical forecast and the actual sales for a given time window. EXAMPLE RECORD FROM THE Sales file (columns truncated) A0050010240032314231003030050303A00600400401600401601123.200 23.70022.80023.300 Example record from the Stat Forecast file (columns truncated) A0050010240032314231003030050303A00600400401600401605134.800 35.50034.20034.900
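For the layout described (a 62-character key followed by 14-character data points), the read.fwf call and the key breakdown can be sketched as below; the file name, the number of data columns, and the positions within the key are assumptions based on the sample record:

```r
widths <- c(62, rep(14, 4))   # 62-char key + four 14-char sales buckets (assumed count)
# sales <- read.fwf("sales.dat", widths = widths,
#                   col.names = c("key", paste("wk", 1:4, sep = "")),
#                   colClasses = c("character", rep("numeric", 4)))

# Breaking the trailing portion of the key into fields by position:
key2 <- "A006004004016004016011"   # tail of the 62-char key, per the post
category <- substr(key2,  1,  4)   # "A006"
bu       <- substr(key2,  5,  7)   # "004"
klass    <- substr(key2,  8, 10)   # "004" ('class' works too, but shadows base::class)
size     <- substr(key2, 11, 13)   # "016"
bdc      <- substr(key2, 14, 22)   # "004016011"
```

For the bias question, once both files are read, err <- forecast - sales per week and a mean of err/sales over the window gives a simple percent-bias measure.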
Re: [R] Generating sequence of dates
Thanks. Please expect more newbie questions!! Satish -Original Message- From: jim holtman [mailto:jholt...@gmail.com] Sent: Wednesday, October 28, 2009 7:05 AM To: Vadlamani, Satish {FLNA} Cc: R-help@r-project.org Subject: Re: [R] Generating sequence of dates try this: > current = as.Date("2009/10/25") > start <- seq(current, by='-1 week', length=2)[2] > seq(start, by='1 week', length=10) [1] "2009-10-18" "2009-10-25" "2009-11-01" "2009-11-08" "2009-11-15" "2009-11-22" "2009-11-29" "2009-12-06" "2009-12-13" [10] "2009-12-20" > On Wed, Oct 28, 2009 at 7:57 AM, Vadlamani, Satish {FLNA} wrote: > Hello All: > I have the following question > > # instantiate a date > current = as.Date("2009/10/25") > > #generate a sequence of dates in the future > future_dates = seq(current,by='1 week',length=53) > > Question: How to generate a sequence of past dates starting one week in the > past relative to the current date. Obviously, what I wrote below is not > correct. I think I can write a for loop and push each value into a vector. Is > this the best way? Thanks. > > Satish > > > past_dates = seq(current,by=-'1 week',length=156) > -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
[R] Generating sequence of dates
Hello All: I have the following question # instantiate a date current = as.Date("2009/10/25") #generate a sequence of dates in the future future_dates = seq(current,by='1 week',length=53) Question: How to generate a sequence of past dates starting one week in the past relative to the current date. Obviously, what I wrote below is not correct. I think I can write a for loop and push each value into a vector. Is this the best way? Thanks. Satish past_dates = seq(current,by=-'1 week',length=156)
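No loop is needed: seq for Date objects accepts a negative step written inside the string, so the attempted by=-'1 week' just needs to become by='-1 week'. A sketch:

```r
current <- as.Date("2009/10/25")
past_dates <- seq(current - 7, by = "-1 week", length = 156)  # newest first
past_dates <- rev(past_dates)                                 # oldest first, if preferred
```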
[R] Comparing R and SAS
Hi: For those of you who are adept at both SAS and R, I have the following questions: a) What are some reasons / tasks for which you would use R over SAS, and vice versa? b) What are some things for which R is a must-have that SAS cannot fulfill? I am ramping up on both of them. The general feeling that I am getting by following this group is that updates to R come at a much faster pace and, therefore, it would be better for someone who wants the bleeding edge (correct me if I am wrong). But I am also interested in what is inherently better in R that SAS cannot offer, perhaps because of its design. Thanks. Satish
[R] 64 bit compiled version of R on windows
Hi: 1) Does anyone have experience with a 64-bit compiled version of R on Windows? Is this available, or does one have to compile it oneself? 2) If we do compile the source in 64 bit, would we then need to compile any additional packages in 64 bit as well? I am just trying to prepare for the time when I will get larger datasets to analyze. Each of the datasets is about 1 GB in size and I will try to bring in about 16 of them in memory at the same time. At least that is the plan. I asked a related question in the past and someone recommended the product RevolutionR - I am looking into this also. If you can think of any other options, please mention them. I have not been doing low-level programming for a while now and, therefore, self-compilation on Windows would be the least preferable (and then I would have to worry about how to compile any packages that I need). Thanks. Satish
[R] Question about the use of large datasets in R
Hi: Sorry if this is a double post. I posted the same thing this morning and did not see it. I just started using R and am asking the following questions so that I can plan for the future, when I may have to analyze large volumes of data. 1) What are the limitations of R when it comes to handling large datasets? Say, for example, a 200M-row by 15-column data frame (between 1.5 and 2 GB in size)? Will the limitation be based on the specifications of the hardware or on R itself? 2) Is R compiled 32-bit or 64-bit (on, say, Windows and AIX)? 3) Are there any other points to note / things to keep in mind when handling large datasets? 4) Should I be looking at SAS also, only for this reason (we do have SAS in-house, but the problem is that I am still not sure what we have a license for, etc.)? Any pointers / thoughts will be appreciated. Satish
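On question 1, a back-of-the-envelope check (a sketch: numeric storage only, ignoring R's per-object overhead and the copies R makes during manipulation) suggests the in-memory footprint is far larger than the on-disk size quoted:

```r
rows <- 200e6
cols <- 15
bytes_per_numeric <- 8          # a double in R is 8 bytes
gb <- rows * cols * bytes_per_numeric / 2^30
gb   # roughly 22 GB -- beyond any 32-bit address space, so 64-bit R (and RAM to match) is a must
```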
[R] Test mail
Hi: This is a test mail. Thanks. Satish