Re: [R] csv file with two header rows
On Apr 26, 8:17 pm, David Winsemius dwinsem...@comcast.net wrote:
> On Apr 25, 2013, at 6:35 PM, analys...@hotmail.com wrote:
>> Is there a way to use read.csv() on such a file without deleting one of the header rows?
> What do you mean by one of the header rows?
> -- David Winsemius Alameda, CA, USA

The file is imported from an external source, and for some reason it has two header rows, each with a set of names for the columns. It gets refreshed from time to time, and I don't want to have to remember to remove one of them by hand each time before R processing (it's a huge file and it's not easy to get it into an editor). But the skip option suggested by the other posters did the job - thanks to all (and it turns out the second set of names is more English-like anyway).
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
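A minimal sketch of the skip approach mentioned above, assuming a small made-up file (the file contents and column names are illustrative, not the poster's data). It shows keeping the second header row and, alternatively, keeping the first one:

```r
# Hypothetical example file with two header rows (contents are made up).
path <- tempfile(fileext = ".csv")
writeLines(c("id_code,val_code",      # first header row
             "Identifier,Value",      # second header row
             "1,10",
             "2,20"), path)

# Keep the second header row: skip the first line, then read normally.
dat_second <- read.csv(path, skip = 1)

# Keep the first header row instead: read its names separately,
# then skip both header lines and attach the names.
first_names <- unname(unlist(read.csv(path, nrows = 1, header = FALSE,
                                      stringsAsFactors = FALSE)))
dat_first <- read.csv(path, skip = 2, header = FALSE,
                      col.names = first_names)
```

Either way the file on disk is left untouched, so nothing has to be edited by hand when it is refreshed.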
[R] csv file with two header rows
Is there a way to use read.csv() on such a file without deleting one of the header rows? Thanks.
[R] Generate and store multiple plots
I have a data set whose rows look like

    Item  date  variable_1  variable_2  variable_3  variable_4

Different items may occur over different dates. During any single study, I might select a subset of the four variables, or some function of them, to be plotted against time (date). For each item, I would select a date range, and I want a plot of the selected variables over that range for that item. I need a method that would do this in one shot and write the plot objects out to disk, one for each item. Thanks.
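One way this could be sketched in base R, assuming made-up data, hypothetical per-item date ranges (`ranges`), and PDF output (all names below are illustrative):

```r
# Made-up data in the layout described above (two of the four variables shown).
dat <- data.frame(
  Item = rep(c("A", "B"), each = 5),
  date = rep(as.Date("2013-01-01") + 0:4, times = 2),
  variable_1 = 1:10,
  variable_2 = (1:10)^2
)

# Hypothetical per-item date ranges and the variables chosen for this study.
ranges <- list(A = as.Date(c("2013-01-02", "2013-01-04")),
               B = as.Date(c("2013-01-01", "2013-01-03")))
vars <- c("variable_1", "variable_2")
out_dir <- tempfile("plots_")
dir.create(out_dir)

plot_files <- character(0)
for (item in names(ranges)) {
  keep <- dat$Item == item &
    dat$date >= ranges[[item]][1] & dat$date <= ranges[[item]][2]
  piece <- dat[keep, ]
  out_file <- file.path(out_dir, paste0("item_", item, ".pdf"))
  pdf(out_file)                                   # one file per item
  plot(piece$date, piece[[vars[1]]], type = "l",
       ylim = range(unlist(piece[vars])),
       xlab = "date", ylab = "value", main = item)
  for (v in vars[-1]) lines(piece$date, piece[[v]], lty = 2)
  dev.off()                                       # close the file
  plot_files <- c(plot_files, out_file)
}
```

Swapping `pdf()` for another file device such as `png()` would change only the output format; the loop itself stays the same.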
Re: [R] levels of comma separated data
On May 25, 4:46 am, Stefan ste...@inizio.se wrote:
> analyst41 at hotmail.com writes:
>> I have a data set that has some comma separated strings in each row. I'd like to create a vector consisting of all distinct strings that occur. The number of strings in each row may vary. Thanks for any help.
>
> ### Some data:
>     d <- data.frame(id = 1:5,
>                     text = c('one,two', 'two,three,three,four', 'one,three,three,five',
>                              'five,five,five,five', 'one,two,three'),
>                     stringsAsFactors = FALSE)
>
> ### A function. I'm not a black belt at this, so there
> ### is probably a more efficient way of writing this.
>     fcn <- function(x) {
>       a <- strsplit(x, ',')   # Split the string by comma
>       unique(a[[1]])          # Uniquify the vector
>     }
>
> ### Use the function with sapply.
>     sapply(d[,2], fcn)

Thanks - but this solves a slightly different problem - it outputs the unique values in each row. I want a list of the unique values in the whole data frame. In this case the output should be a single vector = c("one","two","three","four","five").
Re: [R] levels of comma separated data
On May 25, 7:23 am, analys...@hotmail.com wrote:
> On May 25, 4:46 am, Stefan ste...@inizio.se wrote:
>> analyst41 at hotmail.com writes:
>>> I have a data set that has some comma separated strings in each row. I'd like to create a vector consisting of all distinct strings that occur. The number of strings in each row may vary. Thanks for any help.
>>
>> ### Some data:
>>     d <- data.frame(id = 1:5,
>>                     text = c('one,two', 'two,three,three,four', 'one,three,three,five',
>>                              'five,five,five,five', 'one,two,three'),
>>                     stringsAsFactors = FALSE)
>>
>> ### A function. I'm not a black belt at this, so there
>> ### is probably a more efficient way of writing this.
>>     fcn <- function(x) {
>>       a <- strsplit(x, ',')   # Split the string by comma
>>       unique(a[[1]])          # Uniquify the vector
>>     }
>>
>> ### Use the function with sapply.
>>     sapply(d[,2], fcn)
>
> Thanks - but this solves a slightly different problem - it outputs the unique values in each row. I want a list of the unique values in the whole data frame. In this case the output should be a single vector = c("one","two","three","four","five").

Actually I figured it out after I posted this:

    levels(as.factor(unlist(strsplit(d$text, ','))))
    [1] "five"  "four"  "one"   "three" "two"

Thanks for pointing me the right way.
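The levels() route returns the distinct strings in sorted order. As a small sketch using the same example data, unique() on the flattened split gives them in order of first appearance instead, which matches the vector asked for earlier in the thread:

```r
d <- data.frame(id = 1:5,
                text = c("one,two", "two,three,three,four", "one,three,three,five",
                         "five,five,five,five", "one,two,three"),
                stringsAsFactors = FALSE)

# Split every row on commas and flatten into one character vector.
tokens <- unlist(strsplit(d$text, ",", fixed = TRUE))

# Distinct values in order of first appearance.
in_order <- unique(tokens)

# Distinct values sorted, as levels(as.factor(...)) would give.
sorted <- sort(in_order)
```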
[R] levels of comma separated data
I have a data set that has some comma separated strings in each row. I'd like to create a vector consisting of all distinct strings that occur. The number of strings in each row may vary. Thanks for any help.
[R] Need help with merge
I have

actualsdf

       ID      Name datadate val
    1  23 Acme Corp        1  23
    2  23 Acme Corp        2  43
    3  23 Acme Corp        3  54
    4  23 Acme Corp        4  65
    5  23 Acme Corp        5  23
    6  23 Acme Corp        6  43
    7  23 Acme Corp        7  NA
    8  23 Acme Corp        8  43
    9  23 Acme Corp        9  54
    10 23 Acme Corp       10  32

fcstdf

      fcstrundate fcstdate fcst ID      Name
    1           5        6   22 23 Acme Corp
    2           6        7   43 23 Acme Corp
    3           7        8   54 23 Acme Corp
    4           8        9   23 23 Acme Corp
    5           9       10   NA 23 Acme Corp
    6          10       11   13 23 Acme Corp

    mergeddf = merge(fcstdf, actualsdf, by.x = "fcstdate", by.y = "datadate", all = TRUE)

mergeddf

       fcstdate fcstrundate fcst ID.x    Name.x ID.y    Name.y val
    1         1          NA   NA   NA      <NA>   23 Acme Corp  23
    2         2          NA   NA   NA      <NA>   23 Acme Corp  43
    3         3          NA   NA   NA      <NA>   23 Acme Corp  54
    4         4          NA   NA   NA      <NA>   23 Acme Corp  65
    5         5          NA   NA   NA      <NA>   23 Acme Corp  23
    6         6           5   22   23 Acme Corp   23 Acme Corp  43
    7         7           6   43   23 Acme Corp   23 Acme Corp  NA
    8         8           7   54   23 Acme Corp   23 Acme Corp  43
    9         9           8   23   23 Acme Corp   23 Acme Corp  54
    10       10           9   NA   23 Acme Corp   23 Acme Corp  32
    11       11          10   13   23 Acme Corp   NA      <NA>  NA

I would like mergeddf to look like

cleanmergeddf

       fcstdate fcstrundate fcst val ID      Name
    1         1          NA   NA  23 23 Acme Corp
    2         2          NA   NA  43 23 Acme Corp
    3         3          NA   NA  54 23 Acme Corp
    4         4          NA   NA  65 23 Acme Corp
    5         5          NA   NA  23 23 Acme Corp
    6         6           5   22  43 23 Acme Corp
    7         7           6   43  NA 23 Acme Corp
    8         8           7   54  43 23 Acme Corp
    9         9           8   23  54 23 Acme Corp
    10       10           9   NA  32 23 Acme Corp
    11       11          10   13  NA 23 Acme Corp

I can think of an awkward way - but is there a direct merge command that would produce the final output? Thanks.
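A possible sketch, rebuilding the two data frames from the printed values above: including ID and Name in the merge keys keeps them from being duplicated as ID.x/ID.y, and a final column selection puts the columns in the requested order.

```r
actualsdf <- data.frame(ID = 23, Name = "Acme Corp", datadate = 1:10,
                        val = c(23, 43, 54, 65, 23, 43, NA, 43, 54, 32),
                        stringsAsFactors = FALSE)
fcstdf <- data.frame(fcstrundate = 5:10, fcstdate = 6:11,
                     fcst = c(22, 43, 54, 23, NA, 13),
                     ID = 23, Name = "Acme Corp", stringsAsFactors = FALSE)

# Merge on ID, Name and the date together, so each key appears only once.
merged <- merge(fcstdf, actualsdf,
                by.x = c("ID", "Name", "fcstdate"),
                by.y = c("ID", "Name", "datadate"),
                all = TRUE)

# Reorder the columns to the layout asked for.
cleanmergeddf <- merged[, c("fcstdate", "fcstrundate", "fcst", "val", "ID", "Name")]
```

This assumes ID and Name really do identify the same entity in both tables; if they can disagree, merging on them would split rows rather than combine them.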
[R] how to speed up aggregate
I have

df

       quantity branch client       date  name
    1        10      1      1 2010-01-01   one
    2        20      2      1 2010-01-01   one
    3        30      3      2 2010-01-01   two
    4        15      4      1 2010-01-01   one
    5        10      5      2 2010-01-01   two
    6        20      6      3 2010-01-01 three
    7      1000      1      1 2011-01-01   one
    8      2000      2      1 2011-01-01   one
    9      3000      3      2 2011-01-01   two
    10     1500      4      1 2011-01-01   one
    11     1000      5      2 2011-01-01   two
    12     2000      6      3 2011-01-01 three

I want to aggregate away the branch. I followed a suggestion by Gabor (thanks) and did

    aggregate(list(quantity=df$quantity), list(client=df$client, date=df$date), sum)

      client       date quantity
    1      1 2010-01-01       45
    2      2 2010-01-01       40
    3      3 2010-01-01       20
    4      1 2011-01-01     4500
    5      2 2011-01-01     4000
    6      3 2011-01-01     2000

I want df$name also in the output and did what looked obvious:

    aggregate(list(quantity=df$quantity), list(client=df$client, date=df$date, name=df$name), sum)

      client       date  name quantity
    1      1 2010-01-01   one       45
    2      1 2011-01-01   one     4500
    3      3 2010-01-01 three       20
    4      3 2011-01-01 three     2000
    5      2 2010-01-01   two       40
    6      2 2011-01-01   two     4000

It seems to work, but slows down tremendously for a data frame with around 1000 rows. Could anyone explain what is going on and suggest a way out? Thanks.
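The thread does not record the cause of the slowdown. As a hedged workaround sketch, assuming `name` is fully determined by `client` (as it is in the example), one could aggregate on client and date only and then attach the name from a small lookup table:

```r
df <- data.frame(
  quantity = c(10, 20, 30, 15, 10, 20, 1000, 2000, 3000, 1500, 1000, 2000),
  branch   = rep(1:6, times = 2),
  client   = rep(c(1, 1, 2, 1, 2, 3), times = 2),
  date     = rep(as.Date(c("2010-01-01", "2011-01-01")), each = 6),
  name     = rep(c("one", "one", "two", "one", "two", "three"), times = 2),
  stringsAsFactors = FALSE
)

# Aggregate on the two real grouping keys only.
totals <- aggregate(list(quantity = df$quantity),
                    list(client = df$client, date = df$date), sum)

# One row per client with its name (assumes name depends only on client).
lookup <- unique(df[, c("client", "name")])

# Attach the name after aggregating.
result <- merge(totals, lookup, by = "client")
```

This keeps the number of grouping combinations down to what the data actually need, whatever the underlying reason for the slowdown was.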
Re: [R] data and parameters
Thanks. I finally got around to implementing it and it works. But I think the steps to produce master_reduced can be compressed into

    master_reduced = merge(master, control)

master

      clientId date value
    1        1 1001 10001
    2        2 1002 10002
    3        3 1003 10003
    4        4 1004 10004
    5        2 1005 10005

control

      clientId mindate maxdate control.params
    1        2     100    1005              1
    2        3    1005    1005              2

merge(master, control)

      clientId date value mindate maxdate control.params
    1        2 1002 10002     100    1005              1
    2        2 1005 10005     100    1005              1
    3        3 1003 10003    1005    1005              2

with the added advantage that clientId doesn't occur twice. Is this just coincidence or can I use this technique reliably for merges of this sort?

master_reduced

      clientId date value clientId mindate maxdate control.params
    2        2 1002 10002        2     100    1005              1
    3        3 1003 10003        3    1005    1005              2
    5        2 1005 10005        2     100    1005              1

On Jan 21, 5:20 am, Moritz Grenke r-l...@360mix.de wrote:
>     #dummy data:
>     master=as.data.frame(list(clientId=c(1:4,2), date=1001:1005, value=10001:10005))
>     control=as.data.frame(list(clientId=c(2,3), mindate=c(100,1005), maxdate=c(1005,1005), control.params=c(1,2)))
>
>     #reducing master df:
>     #generating TRUE FALSE index:
>     idIndex=master$clientId %in% control$clientId
>     #choose only those lines where index==TRUE
>     master_reduced=master[idIndex,]
>     master_reduced
>
>     #merging dfs:
>     mergingIndex= match(master_reduced$clientId, control$clientId)
>     master_reduced=cbind(master_reduced, control[mergingIndex,])
>     master_reduced
>
>     #finally choose those lines where date is in range
>     dateIndex=master_reduced$date>=master_reduced$mindate & master_reduced$date<=master_reduced$maxdate
>     finalDF=master_reduced[dateIndex,]
>     finalDF
>
> Hope this helps
> Moritz
> _ Moritz Grenke http://www.360mix.de
>
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of analys...@hotmail.com
> Sent: Friday, 21 January 2011 03:02
> To: r-h...@r-project.org
> Subject: [R] data and parameters
>
> (1) I have a master data frame that reads
>     ClientID | date | value
> (2) I also have a control data frame that reads
>     Client ID | Min date | Max date | control parameters
> The control data set may not have all client IDs. I want to use the control data frame on the master data frame to remove client IDs that don't exist in the control data set and, for those that do, remove dates outside the required range.
> (3) We can either put the control parameters on all rows corresponding to a client ID or look them up from the control data frame.
> (4) The basic function call looks like do.something(df, control parameters), where df is the subset of the master data set that corresponds to a single client with unwanted dates removed and the control parameters pertain to that client.
> Any help would be appreciated.
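A compact sketch of the merge-based variant discussed above, using the same dummy data. It names the key column explicitly, which makes the intent visible instead of relying on merge() picking up the shared column name. The inclusive date comparison and the stand-in `do.something()` are assumptions for illustration:

```r
master <- data.frame(clientId = c(1:4, 2), date = 1001:1005, value = 10001:10005)
control <- data.frame(clientId = c(2, 3), mindate = c(100, 1005),
                      maxdate = c(1005, 1005), control.params = c(1, 2))

# Inner join: clients missing from control are dropped, key appears once.
merged <- merge(master, control, by = "clientId")

# Keep only rows whose date lies in the client's range (assumed inclusive).
in_range <- merged$date >= merged$mindate & merged$date <= merged$maxdate
final <- merged[in_range, ]

# Hypothetical stand-in for do.something(df, control parameters).
do.something <- function(piece, params) sum(piece$value) * params

# Apply it client by client, passing that client's parameters.
per_client <- lapply(split(final, final$clientId), function(piece) {
  do.something(piece, piece$control.params[1])
})
```

With this data, client 3's only date (1003) falls outside its range (1005 to 1005), so only client 2 remains in `final`.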
[R] two apparent anomalies
(1)

    > a = c("a","b")
    > mode(a)
    [1] "character"
    > b = c(1,2)
    > mode(b)
    [1] "numeric"
    > c = data.frame(a,b)
    > mode(c$a)
    [1] "numeric"

(2)

    > a = c("a","a","b","b","c")
    > levels(as.factor(a))
    [1] "a" "b" "c"
    > levels(as.factor(a[1:3]))
    [1] "a" "b"
    > a = as.factor(a)
    > levels(a)
    [1] "a" "b" "c"
    > levels(a[1:3])
    [1] "a" "b" "c"

Any explanation would be helpful. Thanks.
Re: [R] two apparent anomalies
On Jan 22, 9:50 am, Berwin A Turlach ber...@maths.uwa.edu.au wrote:
> On Sat, 22 Jan 2011 06:16:43 -0800 (PST) analys...@hotmail.com wrote:
>> (1)
>>     > a = c("a","b")
>>     > mode(a)
>>     [1] "character"
>>     > b = c(1,2)
>>     > mode(b)
>>     [1] "numeric"
>>     > c = data.frame(a,b)
>>     > mode(c$a)
>>     [1] "numeric"
>
>     R> str(c)
>     'data.frame': 2 obs. of 2 variables:
>      $ a: Factor w/ 2 levels "a","b": 1 2
>      $ b: num 1 2
>
> Character vectors are turned into factors by default by data.frame(). OTOH:
>
>     R> c = data.frame(a,b, stringsAsFactors=FALSE)
>     R> mode(c$a)
>     [1] "character"
>
>> (2)
>>     > a = c("a","a","b","b","c")
>>     > levels(as.factor(a))
>>     [1] "a" "b" "c"
>>     > levels(as.factor(a[1:3]))
>>     [1] "a" "b"
>>     > a = as.factor(a)
>>     > levels(a)
>>     [1] "a" "b" "c"
>>     > levels(a[1:3])
>>     [1] "a" "b" "c"
>
> Subsetting factors does not get rid of no-longer used levels by default. OTOH:
>
>     R> levels(a[1:3, drop=TRUE])
>     [1] "a" "b"
>
> or
>
>     R> levels(factor(a[1:3]))
>     [1] "a" "b"
>
> HTH.
> Cheers, Berwin

Thanks for both responses. Is there a difference between the as.factor and factor commands, and also between as.data.frame and data.frame?
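A small sketch bearing on the follow-up question. On a vector that is already a factor, as.factor() returns it with its levels unchanged, while factor() rebuilds the levels from the values present. Note that the 2011 behaviour shown in (1) depends on the data.frame() default of that time; since R 4.0.0 the default is stringsAsFactors = FALSE, so the example below sets it explicitly.

```r
a <- factor(c("a", "a", "b", "b", "c"))
first_three <- a[1:3]                      # values a, a, b; levels still a, b, c

kept    <- levels(as.factor(first_three))  # already a factor: levels unchanged
rebuilt <- levels(factor(first_three))     # levels recomputed from the values
dropped <- levels(droplevels(first_three)) # same effect, stated explicitly

# Reproducing anomaly (1): a factor column is stored as integer codes.
df <- data.frame(a = c("a", "b"), b = c(1, 2), stringsAsFactors = TRUE)
col_mode  <- mode(df$a)
col_class <- class(df$a)
```

As for the second pair: data.frame() assembles a data frame from one or more columns, while as.data.frame() coerces a single existing object (a list or matrix, say) into one; for simple inputs the results tend to coincide.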
[R] data and parameters
(1) I have a master data frame that reads

    ClientID | date | value

(2) I also have a control data frame that reads

    Client ID | Min date | Max date | control parameters

The control data set may not have all client IDs. I want to use the control data frame on the master data frame to remove client IDs that don't exist in the control data set and, for those that do, remove dates outside the required range.

(3) We can either put the control parameters on all rows corresponding to a client ID or look them up from the control data frame.

(4) The basic function call looks like do.something(df, control parameters), where df is the subset of the master data set that corresponds to a single client with unwanted dates removed and the control parameters pertain to that client.

Any help would be appreciated.
[R] Parameters/data that live globally
I am coming to R from Fortran and I used to use fixed size arrays in named common.

    common /name1/array(100)

The contents of array can be accessed/modified if and only if this line occurs in the function. Very helpful if different functions need different global data (can have name2, name3 etc. for common data blocks). Is there a way to do this in R? Thanks for any help.
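One approximation in R, offered as a sketch rather than an exact equivalent: an environment can serve as a named block of shared, modifiable data. The names `name1`, `fill_block` and `read_block` below are illustrative.

```r
# A named "common block": an environment holding a fixed-size array.
name1 <- new.env()
name1$array <- numeric(100)

# Functions that refer to name1 can read and modify its contents in place,
# because environments are passed by reference.
fill_block <- function() {
  name1$array[1] <- 42
  invisible(NULL)
}
read_block <- function() name1$array[1]

fill_block()
first_value <- read_block()
```

Unlike a Fortran common block, any function that can see `name1` may touch it. Restricting access to chosen functions would need something extra, for example defining those functions inside `local()` together with the environment.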
[R] interactive graphics from the O/S Shell prompt
I have a function called plotID(ID) that generates a plot for customerID = ID. I can run it repeatedly from within R without any problems. Would it be possible to run this function from the O/S command prompt, so that each time you enter an ID it opens a graphics window with the plot for that ID and prompts you for a new ID (and perhaps terminates the program if you type quit)? Thanks.
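A rough sketch of such a prompt loop, assuming it lives in a script started with Rscript. The function name `run_plot_loop` is made up, and the demonstration uses canned input with a stand-in for plotID() so that it runs without a terminal:

```r
# Read IDs one at a time from an open connection until "quit" or end of input,
# calling plot_fun on each.
run_plot_loop <- function(plot_fun, con) {
  repeat {
    cat("Enter ID (or quit): ")
    id <- readLines(con, n = 1)
    if (length(id) == 0 || identical(id, "quit")) break
    plot_fun(id)
  }
  invisible(NULL)
}

# Demonstration with canned input and a stand-in for plotID() that only
# records the IDs it was asked to plot.
seen <- character(0)
demo_input <- textConnection(c("17", "42", "quit"))
run_plot_loop(function(id) seen <<- c(seen, id), demo_input)
close(demo_input)
```

In a real script the connection would be something like `file("stdin", open = "r")` and `plot_fun` would be plotID. Because non-interactive sessions default to a file device, plotID would probably need to open an on-screen device itself (which one is platform dependent), so treat that part as untested.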
[R] Outputting csv file from dataframe with columns in a particular order
I have a dataframe with columns ID, date, estimate, actual (but not necessarily in that order - I do a merge somewhere and that somehow messes up the order of the columns). How can I output it to a csv file with the columns in the order that I want? Thanks.
[R] aggregating date data
I tried a date by date forecast of a time series and it seems to be too wild. How can I aggregate the dates into weeks or months as required? Thanks.

The input looks like

    ID | datadate (YYYY-MM-DD) | value_for_day

and I want to be able to change it to

    ID | dataweek | value_for_week

or

    ID | datamonth | value_for_month
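A minimal base-R sketch with made-up daily data, assuming the daily values should be summed and that weeks start on Monday (the default of cut() for Dates):

```r
# Two weeks of made-up daily values for one ID, starting on a Monday.
daily <- data.frame(ID = 1,
                    datadate = as.Date("2011-01-03") + 0:13,
                    value_for_day = 1:14)

# Derive the period labels from the date.
daily$dataweek  <- cut(daily$datadate, breaks = "week")   # labelled by week start
daily$datamonth <- format(daily$datadate, "%Y-%m")

# Sum the daily values within each ID and period.
weekly  <- aggregate(value_for_day ~ ID + dataweek,  data = daily, FUN = sum)
monthly <- aggregate(value_for_day ~ ID + datamonth, data = daily, FUN = sum)
names(weekly)[names(weekly) == "value_for_day"]   <- "value_for_week"
names(monthly)[names(monthly) == "value_for_day"] <- "value_for_month"
```

Replacing `sum` with `mean` (or another summary) would give a different notion of the weekly or monthly value.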
Re: [R] Outputting csv file from dataframe with columns in a particular order
Thanks to all who responded.

On Jan 12, 10:34 am, Peter Ehlers ehl...@ucalgary.ca wrote:
> On 2011-01-12 07:16, analys...@hotmail.com wrote:
>> I have a dataframe with columns ID, date, estimate, actual (but not necessarily in that order - I do a merge somewhere and that somehow messes up the order of the columns). How can I output it to a csv file with the columns in the order that I want?
>
> Let's say that your data.frame is DF.
>
>     mynames <- c("ID", "date", "estimate", "actual")
>     write.csv(DF[, mynames], )
>
> Peter Ehlers
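A runnable version of the suggestion above, with made-up data and a temporary output file (the file name and the `row.names = FALSE` choice are assumptions, not part of the original reply):

```r
# Made-up data frame whose columns are in a scrambled order.
DF <- data.frame(actual = c(1.5, 2.5), ID = c(1, 2),
                 estimate = c(1.4, 2.6),
                 date = c("2011-01-01", "2011-01-02"),
                 stringsAsFactors = FALSE)

# Select the columns by name in the order wanted, then write.
mynames <- c("ID", "date", "estimate", "actual")
out_file <- tempfile(fileext = ".csv")
write.csv(DF[, mynames], out_file, row.names = FALSE)

# Read it back to confirm the column order in the file.
written <- read.csv(out_file, stringsAsFactors = FALSE)
```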
[R] question on aggregate
An example available on the net goes like

df

      identifier quantity
    1          1       10
    2          1       20
    3          2       30
    4          1       15
    5          2       10
    6          3       20

    aggregate(df$quantity, by=list(df$identifier), sum)

      Group.1  x
    1       1 45
    2       2 40
    3       3 20

I'd like Group.1 to retain the name identifier and would like to control what x gets called in the output. Thanks.
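A short sketch of two ways to control the output names, using the example data above. Naming the list elements passed to aggregate() sets both names; the formula interface keeps the original column names. The name `total_quantity` is just an illustration.

```r
df <- data.frame(identifier = c(1, 1, 2, 1, 2, 3),
                 quantity   = c(10, 20, 30, 15, 10, 20))

# Named lists: the names become the output column names.
by_list <- aggregate(list(total_quantity = df$quantity),
                     by = list(identifier = df$identifier),
                     FUN = sum)

# Formula interface: output columns keep the names from df.
by_formula <- aggregate(quantity ~ identifier, data = df, FUN = sum)
```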
Re: [R] filling up holes
On Dec 28, 10:27 pm, bill.venab...@csiro.au wrote:
> Dear 'analyst41' (it would be a courtesy to know who you are)
>
> Here is a low-level way to do it. First create some dummy data
>
>     allDates <- seq(as.Date("2010-01-01"), by = 1, length.out = 50)
>     client_ID <- sample(LETTERS[1:5], 50, rep = TRUE)
>     value <- 1:50
>     date <- sample(allDates)
>     clientData <- data.frame(client_ID, date, value)
>
> At this point clientData has 50 rows, with 5 clients, each with a sample of dates. Everything is in random order except value.
>
> Now write a little function to fill out a subset of the data consisting of one client's data only:
>
>     fixClient <- function(cData) {
>       dateRange <- range(cData$date)
>       dates <- seq(dateRange[1], dateRange[2], by = 1)
>       fullSet <- data.frame(client_ID = as.character(cData$client_ID[1]),
>                             date = dates, value = NA)
>
>       fullSet$value[match(cData$date, dates)] <- cData$value
>       fullSet
>     }
>
> Now split up the data, apply the fixClient function to each section and re-combine them again:
>
>     allData <- do.call(rbind,
>                        lapply(split(clientData, clientData$client_ID), fixClient))
>
> Check:
>
>     head(allData)
>         client_ID       date value
>     A.1         A 2010-01-04    36
>     A.2         A 2010-01-05    18
>     A.3         A 2010-01-06    NA
>     A.4         A 2010-01-07    NA
>     A.5         A 2010-01-08    NA
>     A.6         A 2010-01-09    49
>
> Seems OK. At this point the data are in sorted order by client and date, but that should not matter.
>
> Bill Venables.

It is of course a great honor to receive a reply from you (but please allow me to continue to be an anonymous source of bits and bytes over the net). This is a neat solution, but please watch this space to see my dumber version (the code might need to be changed to a procedural language eventually). Thank you.

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of analys...@hotmail.com
> Sent: Wednesday, 29 December 2010 10:45 AM
> To: r-h...@r-project.org
> Subject: [R] filling up holes
>
> I have a data frame with three columns
>     client ID | date | value
> For each client ID I want to determine Min date and Max date, and for any dates in between that are missing I want to insert a row
>     Client ID | date | NA
> Any help would be appreciated.
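As an alternative sketch to the match()-based function above (not the approach from the reply), one can build the full client-by-date grid and left-merge the observed values onto it, so the missing dates come out as NA automatically. The small data set is made up:

```r
clientData <- data.frame(
  client_ID = c("A", "A", "B", "B"),
  date = as.Date(c("2010-01-01", "2010-01-04", "2010-01-02", "2010-01-03")),
  value = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# For each client, every date between that client's first and last date.
full_grid <- do.call(rbind, lapply(split(clientData, clientData$client_ID),
  function(cd) {
    data.frame(client_ID = cd$client_ID[1],
               date = seq(min(cd$date), max(cd$date), by = "day"),
               stringsAsFactors = FALSE)
  }))

# Left join: grid rows without an observation get value NA.
filled <- merge(full_grid, clientData, by = c("client_ID", "date"), all.x = TRUE)
```

Here client A gains two rows (2 and 3 January) with NA values, while client B has no gaps.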
[R] filling up holes
I have a data frame with three columns

    client ID | date | value

For each client ID I want to determine Min date and Max date, and for any dates in between that are missing I want to insert a row

    Client ID | date | NA

Any help would be appreciated.
Re: [R] need help with data management
On Dec 25, 1:36 pm, Gabor Grothendieck ggrothendi...@gmail.com wrote:
> On Sat, Dec 25, 2010 at 8:08 AM, analys...@hotmail.com wrote:
>> I have a data frame that reads
>>
>>     client ID  date        transactions
>>     323232     11/1/2010   22
>>     323232     11/2/2010   0
>>     323232     11/3/2010   missing
>>     121212     11/10/2010  32
>>     121212     11/11/2010  15
>>     .
>>
>> I want to order the rows by client ID and date and, using a black-box forecasting method, create the data fcst(client, date of forecast, date for which forecast applies). Assume that I have a function that, given a time series x(1), x(2), ..., x(k), will generate f(i,j), where f(i,j) = forecast j days ahead, given data till date i. How can the forecast data be best stored and how would I go about the task of processing all the clients and dates?
>
> This isn't quite what you asked but it seems more suitable to what you need. Instead of using long form data we transform it to wide form with one client per column. Try copying this from this post and pasting it into your R session:
>
>     Lines <- "323232 11/1/2010 22
>     323232 11/2/2010 0
>     323232 11/3/2010 missing
>     121212 11/10/2010 32
>     121212 11/11/2010 15"
>
>     library(zoo)
>     library(chron)
>
>     # read in. split = 1 converts to wide form
>     # can use "myfile.dat" in place of textConnection(Lines) for real data
>     z <- read.zoo(textConnection(Lines), split = 1, index = 2, FUN = chron, na.strings = "missing")
>
>     # d is matrix with one row per date and one col per client
>     d <- coredata(z)
>
>     # just use last point as our forecast for next 3 dates
>     naive.forecast <- function(x) rep(tail(x, 1), 3)
>     pred <- apply(d, 2, naive.forecast)
>
>     # put predictions together with the data
>     rbind(d, pred)
>
> For the data you showed this gives:
>
>     > rbind(d, pred)
>          121212 323232
>     [1,]     NA     22
>     [2,]     NA      0
>     [3,]     NA     NA
>     [4,]     32     NA
>     [5,]     15     NA
>     [6,]     15     NA
>     [7,]     15     NA
>     [8,]     15     NA
>
> -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com

Thank you. Everything works on my system (windows) except that I get the final output

         X121212 X323232
    [1,]      NA      22
    [2,]      NA       0
    [3,]      NA      NA
    [4,]      32      NA
    [5,]      15      NA
    [6,]      15      NA
    [7,]      15      NA
    [8,]      15      NA

i.e., an X gets attached to the client name. I'd also like to retain the dates in each row. I'll try to follow up along these lines.
[R] need help with data management
I have a data frame that reads

    client ID  date        transactions
    323232     11/1/2010   22
    323232     11/2/2010   0
    323232     11/3/2010   missing
    121212     11/10/2010  32
    121212     11/11/2010  15
    .

I want to order the rows by client ID and date and, using a black-box forecasting method, create the data fcst(client, date of forecast, date for which forecast applies). Assume that I have a function that, given a time series x(1), x(2), ..., x(k), will generate f(i,j), where f(i,j) = forecast j days ahead, given data till date i. How can the forecast data be best stored and how would I go about the task of processing all the clients and dates? Thanks.
Re: [R] need help with data management
On Dec 25, 10:17 am, David Winsemius dwinsem...@comcast.net wrote:
> On Dec 25, 2010, at 8:08 AM, analys...@hotmail.com wrote:
>> I have a data frame that reads
>>
>>     client ID  date        transactions
>>     323232     11/1/2010   22
>>     323232     11/2/2010   0
>>     323232     11/3/2010   missing
>>     121212     11/10/2010  32
>>     121212     11/11/2010  15
>>     .
>>
>> I want to order the rows by client ID and date and, using a black-box forecasting method, create the data fcst(client, date of forecast, date for which forecast applies). Assume that I have a function that, given a time series x(1), x(2), ..., x(k), will generate f(i,j), where f(i,j) = forecast j days ahead, given data till date i. How can the forecast data be best stored and how would I go about the task of processing all the clients and dates?
>
> http://lmgtfy.com/?q=forecast+r-project
> -- David Winsemius, MD West Hartford, CT

Thanks. I am planning to write my own univariate forecasting routine. My question is mostly concerned with separating out the time series by client, generating the forecasts and then putting everything back together into something like

    ClientID | forecast date | date forecast is for | forecast | actual
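A base-R sketch of the split, forecast and recombine pattern described in the reply above. The forecaster here is a placeholder (last non-missing value carried forward) standing in for the poster's own routine, and the column names and three-day horizon are assumptions:

```r
tx <- data.frame(
  client = c(323232, 323232, 323232, 121212, 121212),
  date = as.Date(c("2010-11-01", "2010-11-02", "2010-11-03",
                   "2010-11-10", "2010-11-11")),
  transactions = c(22, 0, NA, 32, 15)
)

# Placeholder forecaster: repeat the last non-missing observation.
naive_forecast <- function(x, horizon) rep(tail(x[!is.na(x)], 1), horizon)

# Forecast one client's series from its last date, in long form.
forecast_client <- function(piece, horizon = 3) {
  piece <- piece[order(piece$date), ]
  last_date <- max(piece$date)
  data.frame(client = piece$client[1],
             forecast_date = last_date,
             target_date = last_date + seq_len(horizon),
             forecast = naive_forecast(piece$transactions, horizon))
}

# Split by client, forecast each piece, and stack the results.
fcst <- do.call(rbind, lapply(split(tx, tx$client), forecast_client))

# Attach actuals where they exist for the target date.
out <- merge(fcst, tx, by.x = c("client", "target_date"),
             by.y = c("client", "date"), all.x = TRUE)
names(out)[names(out) == "transactions"] <- "actual"
```

Forecasting from every date i rather than only the last one would mean looping `forecast_client` over truncated copies of each series; the long output format stays the same.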
Re: [R] reading csv files
On Feb 5, 7:16 pm, Jim Lemon j...@bitwrit.com.au wrote:
> On 02/06/2010 09:05 AM, analys...@hotmail.com wrote:
>> On Feb 5, 8:57 am, Barry Rowlingson b.rowling...@lancaster.ac.uk wrote:
>>> On Fri, Feb 5, 2010 at 10:23 AM, analys...@hotmail.com wrote:
>>>> the csv files are downloaded from a database and it looks like some character fields contain the CR-LF sequence within them. This causes R to see a new record/row and the number of rows it sees is different (usually higher) from the number of rows actually extracted.
>>>
>>> Hard to tell without an example, but I just tried this in a file:
>>>
>>>     1,2,"this
>>>     is a test",99
>>>     2,3,"oneliner",45
>>>
>>> and:
>>>
>>>     > read.table("test.csv",sep=",")
>>>       V1 V2              V3 V4
>>>     1  1  2 this\nis a test 99
>>>     2  2  3        oneliner 45
>>>
>>> seemed to work. But if your strings aren't quoted (hard to tell without an example) then you might have to find another way. Hard to tell without an example.
>>> Barry
>>
>> Here is a Hex dump (please ignore the '>' at the start of each line) - of the file that results from extracting two rows.
>>
>>     EF BB BF 64 65 73 63 72-69 70 74 69 6F 6E 0D 0A   ...description..
>>     22 3C 73 74 72 6F 6E 67-3E 55 6E 6B 6E 6F 77 6E   "<strong>Unknown
>>     20 41 6E 79 74 69 6D 65-2C 20 41 6E 79 77 68 65    Anytime, Anywhe
>>     72 65 20 4C 65 61 72 6E-69 6E 67 3C 62 72 20 2F   re Learning<br /
>>     3E 0D 0A 3C 2F 73 74 72-6F 6E 67 3E 20 54 68 65   >..</strong> The
>>     20 61 6E 73 77 65 72 20-69 73 20 55 6E 6B 6E 6F    answer is Unkno
>>     77 6E 2E 20 3C 73 74 72-6F 6E 67 3E 20 79 6F 75   wn. <strong> you
>>     20 63 61 6E 20 73 74 61-72 74 20 61 6E 64 20 66    can start and f
>>     69 6E 69 73 68 20 69 6E-20 6C 65 73 73 20 74 68   inish in less th
>>     65 6E 20 31 37 20 6D 6F-6E 74 68 73 2E 3C 2F 73   en 17 months.</s
>>     74 72 6F 6E 67 3E 20 3C-62 72 20 2F 3E 0D 0A 3C   trong> <br />..<
>>     62 72 20 2F 3E 0D 0A 55-6E 6B 6E 6F 77 6E 20 61   br />..Unknown a
>>     62 6F 75 74 20 65 6E 73-75 72 69 6E 67 20 79 6F   bout ensuring yo
>>     75 20 6C 65 61 72 6E 20-2E 22 0D 0A 03 D8 26 8A   u learn ."...&.
>>
>> R, Fortran and Excel see five lines, but the database has only two lines.
>
> Okay, you have five CR-LF pairs with two being EORs. It looks like the <br />CR-LF is the EOR sequence, so it should be possible to preserve those while changing the others to something like ~ or deleting them. As I said previously, the regexperts can work out a way to distinguish the CR-LF pairs that are _not_ in an EOR sequence. You might want to think about dumping the control characters as well.
> Jim

I am sure other sequences cause a false EOR also. The false EORs are CR-LF sequences that occur within commas - I don't know if R can read a fixed number of bytes regardless of EOR markers. If it can, it should be possible to assemble the true database rows from the bytes read in.
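On the question of reading bytes regardless of end-of-record markers: readBin() can read a file as raw bytes. The sketch below uses a small made-up file (not the poster's extract) with a CR-LF inside a quoted field, counts the CR-LF pairs from the raw bytes, and compares the physical line count with what read.csv() makes of the quoted field:

```r
# Made-up file: a header line, then one record whose quoted field contains CR-LF.
path <- tempfile(fileext = ".csv")
writeBin(charToRaw('description\r\n"line one<br />\r\nline two"\r\n'), path)

# Read the whole file as raw bytes, ignoring any notion of lines.
raw_bytes <- readBin(path, what = "raw", n = file.info(path)$size)

# Count CR immediately followed by LF.
is_cr <- raw_bytes == as.raw(0x0d)
is_lf <- raw_bytes == as.raw(0x0a)
n_crlf <- sum(head(is_cr, -1) & tail(is_lf, -1))

# Physical lines versus parsed records.
n_physical_lines <- length(readLines(path))
dat <- read.csv(path, stringsAsFactors = FALSE)
```

Here the file has three CR-LF pairs and three physical lines but read.csv() returns a single data row, because the line break sits inside double quotes. Whether that carries over to the real extract depends on the fields being consistently quoted, which the thread does not settle.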
[R] reading csv files
The csv files are downloaded from a database and it looks like some character fields contain the CR-LF sequence within them. This causes R to see a new record/row, and the number of rows it sees is different (usually higher) from the number of rows actually extracted. Any suggestions? Thanks.
Re: [R] reading csv files
On Feb 5, 8:57 am, Barry Rowlingson b.rowling...@lancaster.ac.uk wrote:
> On Fri, Feb 5, 2010 at 10:23 AM, analys...@hotmail.com wrote:
>> the csv files are downloaded from a database and it looks like some character fields contain the CR-LF sequence within them. This causes R to see a new record/row and the number of rows it sees is different (usually higher) from the number of rows actually extracted.
>
> Hard to tell without an example, but I just tried this in a file:
>
>     1,2,"this
>     is a test",99
>     2,3,"oneliner",45
>
> and:
>
>     > read.table("test.csv",sep=",")
>       V1 V2              V3 V4
>     1  1  2 this\nis a test 99
>     2  2  3        oneliner 45
>
> seemed to work. But if your strings aren't quoted (hard to tell without an example) then you might have to find another way. Hard to tell without an example.
> Barry

Here is a Hex dump (please ignore the '>' at the start of each line) - of the file that results from extracting two rows.

    EF BB BF 64 65 73 63 72-69 70 74 69 6F 6E 0D 0A   ...description..
    22 3C 73 74 72 6F 6E 67-3E 55 6E 6B 6E 6F 77 6E   "<strong>Unknown
    20 41 6E 79 74 69 6D 65-2C 20 41 6E 79 77 68 65    Anytime, Anywhe
    72 65 20 4C 65 61 72 6E-69 6E 67 3C 62 72 20 2F   re Learning<br /
    3E 0D 0A 3C 2F 73 74 72-6F 6E 67 3E 20 54 68 65   >..</strong> The
    20 61 6E 73 77 65 72 20-69 73 20 55 6E 6B 6E 6F    answer is Unkno
    77 6E 2E 20 3C 73 74 72-6F 6E 67 3E 20 79 6F 75   wn. <strong> you
    20 63 61 6E 20 73 74 61-72 74 20 61 6E 64 20 66    can start and f
    69 6E 69 73 68 20 69 6E-20 6C 65 73 73 20 74 68   inish in less th
    65 6E 20 31 37 20 6D 6F-6E 74 68 73 2E 3C 2F 73   en 17 months.</s
    74 72 6F 6E 67 3E 20 3C-62 72 20 2F 3E 0D 0A 3C   trong> <br />..<
    62 72 20 2F 3E 0D 0A 55-6E 6B 6E 6F 77 6E 20 61   br />..Unknown a
    62 6F 75 74 20 65 6E 73-75 72 69 6E 67 20 79 6F   bout ensuring yo
    75 20 6C 65 61 72 6E 20-2E 22 0D 0A 03 D8 26 8A   u learn ."...&.

R, Fortran and Excel see five lines, but the database has only two lines.
[R] R splits character fields in a csv file
I download a csv extract from a database and use read.csv to read it from R, and when there are large character fields with embedded blanks, slashes etc., R often sees one line as two lines (or more). I verified with readLines that an embedded blank in a character field causes a spurious new line to be seen. I am sure this problem has been seen before - and would appreciate any help in reading such files.
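A hedged diagnostic sketch for this kind of file, using a made-up extract in which one quoted field contains a line break. readLines() reports physical lines, count.fields() reports how many fields each line appears to hold (useful for spotting lines that do not match the header), and read.csv() shows how many records actually parse:

```r
# Made-up extract: the second record's comment field spans two physical lines.
path <- tempfile(fileext = ".csv")
writeLines(c('id,comment',
             '1,"short text"',
             '2,"text with/slashes and',
             'an embedded line break"'), path)

# Physical lines in the file, as readLines sees them.
n_physical_lines <- length(readLines(path))

# Apparent number of fields on each line, using read.csv's quoting rules.
fields_per_line <- count.fields(path, sep = ",", quote = "\"", comment.char = "")

# Records as parsed by read.csv.
dat <- read.csv(path, stringsAsFactors = FALSE)
```

In this example the file has four physical lines but parses into two data rows. If the real extract still splits, the fields are probably not quoted consistently, and the line break would have to be dealt with before parsing.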