Re: [R] csv file with two header rows

2013-04-27 Thread analys...@hotmail.com


On Apr 26, 8:17 pm, David Winsemius dwinsem...@comcast.net wrote:
 On Apr 25, 2013, at 6:35 PM, analys...@hotmail.com wrote:

  Is there a way to use read.csv() on such a file without deleting one
  of the header rows?

 What do you mean by one of the header rows?
 --

 David Winsemius
 Alameda, CA, USA


The file is imported from an external source and for some reason there
are two header rows each with a set of names for the columns.  It
would get refreshed from time to time and I don't want to have to
remember to remove one of them by hand (it's a huge file and it's not
easy to get it into an editor) each time before R processing.

But the skip option suggested by the other posters did the job -
thanks to all (and it turns out the second set of names is more
English-like anyways).
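
For the archive, a minimal sketch of the skip approach. The file contents and column names below are made up for illustration; with a real file, the path would replace the `text =` argument.

```r
# Hypothetical file contents: two header rows followed by the data.
csv_text <- paste(
  "c1,c2,c3",
  "Client ID,Order Date,Amount",
  "1,2013-04-01,10.5",
  "2,2013-04-02,20.0",
  sep = "\n"
)

# Keep the second header row: skip the first line of the file.
df <- read.csv(text = csv_text, skip = 1, check.names = FALSE)

# Keep the first header row instead: read its names, then skip both rows.
first_names <- names(read.csv(text = csv_text, nrows = 1))
df_first <- read.csv(text = csv_text, skip = 2, header = FALSE,
                     col.names = first_names)
```

`check.names = FALSE` keeps names such as "Client ID" as they are instead of converting them to syntactic names.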

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] csv file with two header rows

2013-04-26 Thread analys...@hotmail.com
Is there a way to use read.csv() on such a file without deleting one
of the header rows?

Thanks.



[R] Generate and store multiple plots

2012-06-10 Thread analys...@hotmail.com
I have a data set whose rows look like

Item date variable_1 variable_2 variable_3 variable_4


Different items may occur over different dates.

During any single study, I might select a subset of the four variables
or some function of them to be plotted against time (date).

For each item, I would select a date range and I want a plot of the
selected variables over that range for that item.

I need a method that would do this at one shot and put the plot
objects out to disk, one for each item.
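
One way this could be sketched, assuming the layout above. The data, the helper name plot_items and the choice of PDF output are all assumptions made for illustration, not something from the thread.

```r
# Hypothetical data in the layout described above.
dat <- data.frame(
  Item = rep(c("A", "B"), each = 5),
  date = rep(seq(as.Date("2012-01-01"), by = "day", length.out = 5), 2),
  variable_1 = 1:10,
  variable_2 = (1:10)^2,
  stringsAsFactors = FALSE
)

# Assumed helper: writes one PDF per item, restricted to a date range,
# and returns the paths of the files it created.
plot_items <- function(dat, vars, from, to, outdir = tempdir()) {
  files <- character(0)
  for (item in unique(dat$Item)) {
    keep <- dat$Item == item & dat$date >= from & dat$date <= to
    sub <- dat[keep, ]
    if (nrow(sub) == 0) next
    f <- file.path(outdir, paste0("plot_", item, ".pdf"))
    pdf(f)
    plot(sub$date, sub[[vars[1]]], type = "l",
         ylim = range(unlist(sub[vars]), na.rm = TRUE),
         xlab = "date", ylab = "value", main = item)
    # Add any further variables as extra lines with different line types.
    for (i in seq_along(vars)[-1]) lines(sub$date, sub[[vars[i]]], lty = i)
    dev.off()
    files <- c(files, f)
  }
  files
}

files <- plot_items(dat, c("variable_1", "variable_2"),
                    as.Date("2012-01-02"), as.Date("2012-01-04"))
```

A per-item date range could be handled the same way by looking up `from` and `to` for each item inside the loop.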

Thanks.



Re: [R] levels of comma separated data

2012-05-25 Thread analys...@hotmail.com


On May 25, 4:46 am, Stefan ste...@inizio.se wrote:
 analyst41 at hotmail.com analyst41 at hotmail.com writes:



  I have a data set that has some comma separated strings in each row.
  I'd like to create a vector consisting of all distinct strings that
  occur.  The number of strings in each row may vary.

  Thanks for any help.

 #
 #
 # Some data:
 d <- data.frame(id = 1:5,
   text = c('one,two',
     'two,three,three,four',
     'one,three,three,five',
     'five,five,five,five',
     'one,two,three'),
   stringsAsFactors = FALSE
 )
 #
 #
 # A function. I'm not a black belt at this, so there
 # is probably a more efficient way of writing this.
 fcn <- function(x){
   a <- strsplit(x, ',') # Split the string by comma
   unique(a[[1]]) # Uniquify the vector
 }

 #
 #
 # Use the function with sapply.
 sapply(d[,2], fcn)



Thanks - but this solves a slightly different problem - it outputs the
unique values in each row.  I want a list of the unique values in the
whole data frame.

In this case the output should be a single vector =
 c("one","two","three","four","five").





Re: [R] levels of comma separated data

2012-05-25 Thread analys...@hotmail.com


On May 25, 7:23 am, analys...@hotmail.com analys...@hotmail.com
wrote:
 On May 25, 4:46 am, Stefan ste...@inizio.se wrote:





  analyst41 at hotmail.com analyst41 at hotmail.com writes:

   I have a data set that has some comma separated strings in each row.
   I'd like to create a vector consisting of all distinct strings that
   occur.  The number of strings in each row may vary.

   Thanks for any help.

  #
  #
  # Some data:
  d <- data.frame(id = 1:5,
    text = c('one,two',
      'two,three,three,four',
      'one,three,three,five',
      'five,five,five,five',
      'one,two,three'),
    stringsAsFactors = FALSE
  )
  #
  #
  # A function. I'm not a black belt at this, so there
  # is probably a more efficient way of writing this.
  fcn <- function(x){
    a <- strsplit(x, ',') # Split the string by comma
    unique(a[[1]]) # Uniquify the vector
  }

  #
  #
  # Use the function with sapply.
  sapply(d[,2], fcn)

 Thanks - but this solves a slightly different problem - it outputs the
 unique values in each row.  I want a list of the unique values in the
 whole data frame.

 In this case the output should be a single vector =
  c("one","two","three","four","five").


Actually I figured it out after I posted this:

> levels(as.factor(unlist(strsplit(d$text,','))))
[1] "five"  "four"  "one"   "three" "two"

Thanks for pointing me the right way.
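
A small sketch of the same idea without the detour through a factor, using the example data from the reply above. unique() keeps the values in order of first appearance, while levels() returns them sorted.

```r
d <- data.frame(id = 1:5,
  text = c('one,two',
    'two,three,three,four',
    'one,three,three,five',
    'five,five,five,five',
    'one,two,three'),
  stringsAsFactors = FALSE
)

# Split every row on commas and flatten the pieces into one vector.
all_values <- unlist(strsplit(d$text, ",", fixed = TRUE))

in_order <- unique(all_values)        # order of first appearance
sorted   <- sort(unique(all_values))  # alphabetical, as levels() would give
```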



[R] levels of comma separated data

2012-05-24 Thread analys...@hotmail.com
I have a data set that has some comma separated strings in each row.
I'd like to create a vector consisting of all distinct strings that
occur.  The number of strings in each row may vary.

Thanks for any help.



[R] Need help with merge

2011-02-09 Thread analys...@hotmail.com
Have

> actualsdf
   ID      Name datadate val
1  23 Acme Corp        1  23
2  23 Acme Corp        2  43
3  23 Acme Corp        3  54
4  23 Acme Corp        4  65
5  23 Acme Corp        5  23
6  23 Acme Corp        6  43
7  23 Acme Corp        7  NA
8  23 Acme Corp        8  43
9  23 Acme Corp        9  54
10 23 Acme Corp       10  32

> fcstdf
  fcstrundate fcstdate fcst ID      Name
1           5        6   22 23 Acme Corp
2           6        7   43 23 Acme Corp
3           7        8   54 23 Acme Corp
4           8        9   23 23 Acme Corp
5           9       10   NA 23 Acme Corp
6          10       11   13 23 Acme Corp

> mergeddf = merge(fcstdf, actualsdf, by.x = "fcstdate", by.y = "datadate", all = TRUE)

> mergeddf
   fcstdate fcstrundate fcst ID.x    Name.x ID.y    Name.y val
1         1          NA   NA   NA      <NA>   23 Acme Corp  23
2         2          NA   NA   NA      <NA>   23 Acme Corp  43
3         3          NA   NA   NA      <NA>   23 Acme Corp  54
4         4          NA   NA   NA      <NA>   23 Acme Corp  65
5         5          NA   NA   NA      <NA>   23 Acme Corp  23
6         6           5   22   23 Acme Corp   23 Acme Corp  43
7         7           6   43   23 Acme Corp   23 Acme Corp  NA
8         8           7   54   23 Acme Corp   23 Acme Corp  43
9         9           8   23   23 Acme Corp   23 Acme Corp  54
10       10           9   NA   23 Acme Corp   23 Acme Corp  32
11       11          10   13   23 Acme Corp   NA      <NA>  NA

I would like mergeddf to look like

> cleanmergeddf
   fcstdate fcstrundate fcst val ID      Name
1         1          NA   NA  23 23 Acme Corp
2         2          NA   NA  43 23 Acme Corp
3         3          NA   NA  54 23 Acme Corp
4         4          NA   NA  65 23 Acme Corp
5         5          NA   NA  23 23 Acme Corp
6         6           5   22  43 23 Acme Corp
7         7           6   43  NA 23 Acme Corp
8         8           7   54  43 23 Acme Corp
9         9           8   23  54 23 Acme Corp
10       10           9   NA  32 23 Acme Corp
11       11          10   13  NA 23 Acme Corp

I can think of an awkward way - but is there a direct merge command
that would produce the final output?
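
A possible sketch, assuming ID and Name should act as join keys together with the date: naming all three in by.x and by.y avoids the duplicated ID.x/ID.y and Name.x/Name.y columns. The data frames are rebuilt here from the printed output.

```r
actualsdf <- data.frame(ID = 23, Name = "Acme Corp", datadate = 1:10,
                        val = c(23, 43, 54, 65, 23, 43, NA, 43, 54, 32))
fcstdf <- data.frame(fcstrundate = 5:10, fcstdate = 6:11,
                     fcst = c(22, 43, 54, 23, NA, 13),
                     ID = 23, Name = "Acme Corp")

# Join on ID, Name and the date, keeping unmatched rows from both sides.
cleanmergeddf <- merge(fcstdf, actualsdf,
                       by.x = c("ID", "Name", "fcstdate"),
                       by.y = c("ID", "Name", "datadate"),
                       all = TRUE)

# Reorder the columns to match the desired layout.
cleanmergeddf <- cleanmergeddf[, c("fcstdate", "fcstrundate", "fcst",
                                   "val", "ID", "Name")]
```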

Thanks.



[R] hwo to speed up aggregate

2011-01-26 Thread analys...@hotmail.com
I have

> df
   quantity branch client       date  name
1        10      1      1 2010-01-01   one
2        20      2      1 2010-01-01   one
3        30      3      2 2010-01-01   two
4        15      4      1 2010-01-01   one
5        10      5      2 2010-01-01   two
6        20      6      3 2010-01-01 three
7      1000      1      1 2011-01-01   one
8      2000      2      1 2011-01-01   one
9      3000      3      2 2011-01-01   two
10     1500      4      1 2011-01-01   one
11     1000      5      2 2011-01-01   two
12     2000      6      3 2011-01-01 three

I want to aggregate away the branch. I followed a suggestion by Gabor
(thanks) and did

> aggregate(list(quantity=df$quantity),list(client=df$client,date=df$date),sum)
  client       date quantity
1      1 2010-01-01       45
2      2 2010-01-01       40
3      3 2010-01-01       20
4      1 2011-01-01     4500
5      2 2011-01-01     4000
6      3 2011-01-01     2000

I want df$name also in the output and did what looked obvious:

> aggregate(list(quantity=df$quantity),list(client=df$client,date=df$date,name=df$name),sum)
  client       date  name quantity
1      1 2010-01-01   one       45
2      1 2011-01-01   one     4500
3      3 2010-01-01 three       20
4      3 2011-01-01 three     2000
5      2 2010-01-01   two       40
6      2 2011-01-01   two     4000

It seems to work, but slows down tremendously for a dataframe with
around 1000 rows.

Could anyone explain what is going on and suggest a way out?
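
One possible workaround, sketched under the assumption that each client has exactly one name: aggregate on client and date only, then attach the name from a small lookup table. Whether this is faster on the real data has not been measured here; the data frame is rebuilt from the printout above.

```r
df <- data.frame(
  quantity = c(10, 20, 30, 15, 10, 20, 1000, 2000, 3000, 1500, 1000, 2000),
  branch = rep(1:6, 2),
  client = rep(c(1, 1, 2, 1, 2, 3), 2),
  date = rep(as.Date(c("2010-01-01", "2011-01-01")), each = 6),
  name = rep(c("one", "one", "two", "one", "two", "three"), 2),
  stringsAsFactors = FALSE
)

# Sum quantity over branches within each client/date combination.
totals <- aggregate(quantity ~ client + date, data = df, FUN = sum)

# One row per client with its name (assumes name is determined by client).
lookup <- unique(df[, c("client", "name")])

# Attach the name and restore a date/client ordering.
result <- merge(totals, lookup, by = "client")
result <- result[order(result$date, result$client), ]
```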

Thanks.



Re: [R] data and parameters

2011-01-24 Thread analys...@hotmail.com
Thanks.  I finally got around to implementing it and it works.

But I think the steps to produce master_reduced can be compressed into

master_reduced = merge(master,control)

> master
  clientId date value
1        1 1001 10001
2        2 1002 10002
3        3 1003 10003
4        4 1004 10004
5        2 1005 10005
> control
  clientId mindate maxdate control.params
1        2     100    1005              1
2        3    1005    1005              2


> merge(master,control)
  clientId date value mindate maxdate control.params
1        2 1002 10002     100    1005              1
2        2 1005 10005     100    1005              1
3        3 1003 10003    1005    1005              2

with the added advantage that clientId doesn't occur twice.  Is this
just coincidence or can I use this technique reliably for merges of
this sort?
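
For reference, a sketch of the whole pipeline with merge(), using the dummy data from the reply quoted below. By default merge() joins on the columns the two data frames have in common (here only clientId) and keeps only matching rows, which is why the key appears once. The do.something() defined here is only a placeholder for the real per-client function, and the date range is assumed to be inclusive.

```r
master <- data.frame(clientId = c(1:4, 2), date = 1001:1005,
                     value = 10001:10005)
control <- data.frame(clientId = c(2, 3), mindate = c(100, 1005),
                      maxdate = c(1005, 1005), control.params = c(1, 2))

# Inner join on the shared column clientId.
merged <- merge(master, control)

# Keep only rows whose date lies inside the client's range (inclusive).
finalDF <- merged[merged$date >= merged$mindate &
                  merged$date <= merged$maxdate, ]

# Placeholder for the real per-client function (an assumption).
do.something <- function(df, params) nrow(df) * params

# Apply it to each client's rows, passing that client's parameters.
results <- lapply(split(finalDF, finalDF$clientId),
                  function(p) do.something(p, p$control.params[1]))
```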

> master_reduced
  clientId date value clientId mindate maxdate control.params
2        2 1002 10002        2     100    1005              1
3        3 1003 10003        3    1005    1005              2
5        2 1005 10005        2     100    1005              1


On Jan 21, 5:20 am, Moritz Grenke r-l...@360mix.de wrote:
 #dummy data:
 master=as.data.frame(list(clientId=c(1:4,2), date=1001:1005,
 value=10001:10005))
 control=as.data.frame(list(clientId=c(2,3), mindate=c(100,1005),
 maxdate=c(1005,1005), control.params=c(1,2)))

 #reducing master df:
 #generating TRUE FALSE index:
 idIndex=master$clientId %in% control$clientId

 #choose only those lines where index==TRUE
 master_reduced=master[idIndex,]
 master_reduced

 #merging dfs:
 mergingIndex= match(master_reduced$clientId, control$clientId)
 master_reduced=cbind(master_reduced, control[mergingIndex,])
 master_reduced

 #finally choose those lines where date is in range
 dateIndex=master_reduced$date>=master_reduced$mindate &
 master_reduced$date<=master_reduced$maxdate
 finalDF=master_reduced[dateIndex,]
 finalDF

 Hope this helps
 Moritz
 _
 Moritz Grenkehttp://www.360mix.de

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
 Behalf Of analys...@hotmail.com
 Sent: Friday, 21 January 2011 03:02
 To: r-h...@r-project.org
 Subject: [R] data and parameters

 (1) I have a master data frame that reads

 ClientID |date |value

 (2) I also have a control data frame that reads

 Client ID| Min date| Max date| control parameters

 The control data set may not have all client IDs .

 I want to use the control data frame on the master data frame to
 remove client IDS that don't exist in the control data set and for
 those that do, remove dates outside the required range.

 (3) We can either put the control parameters on all rows corresponding
 to a client ID or look it up from the control data frame

 (4) The basic function call looks like

 do.something(df,control parameters)

 where df is the subset of the master data set that corresponds to a
 single client with unwanted dates removed and the control parameters
 pertain to that client.

 Any help would be appreciated.




[R] two apparent anomalies

2011-01-22 Thread analys...@hotmail.com
(1)

> a = c("a","b")
> mode(a)
[1] "character"
> b = c(1,2)
> mode(b)
[1] "numeric"
> c = data.frame(a,b)
> mode(c$a)
[1] "numeric"

(2)

> a = c("a","a","b","b","c")
> levels(as.factor(a))
[1] "a" "b" "c"
> levels(as.factor(a[1:3]))
[1] "a" "b"
> a = as.factor(a)
> levels(a)
[1] "a" "b" "c"
> levels(a[1:3])
[1] "a" "b" "c"

Any explanation would be helpful.  Thanks.



Re: [R] two apparent anomalies

2011-01-22 Thread analys...@hotmail.com


On Jan 22, 9:50 am, Berwin A Turlach ber...@maths.uwa.edu.au wrote:
 On Sat, 22 Jan 2011 06:16:43 -0800 (PST)

 analys...@hotmail.com analys...@hotmail.com wrote:
  (1)

  > a = c("a","b")
  > mode(a)
  [1] "character"
  > b = c(1,2)
  > mode(b)
  [1] "numeric"
  > c = data.frame(a,b)
  > mode(c$a)
  [1] "numeric"

 R> str(c)
 'data.frame':   2 obs. of  2 variables:
  $ a: Factor w/ 2 levels "a","b": 1 2
  $ b: num  1 2

 Character vectors are turned into factors by default by data.frame().

 OTOH:

 R> c = data.frame(a,b, stringsAsFactors=FALSE)
 R> mode(c$a)
 [1] "character"

  (2)

  > a = c("a","a","b","b","c")
  > levels(as.factor(a))
  [1] "a" "b" "c"
  > levels(as.factor(a[1:3]))
  [1] "a" "b"
  > a = as.factor(a)
  > levels(a)
  [1] "a" "b" "c"
  > levels(a[1:3])
  [1] "a" "b" "c"

 Subsetting factors does not get rid of no-longer used levels by default.

 OTOH:

 R> levels(a[1:3, drop=TRUE])
 [1] "a" "b"

 or

 R> levels(factor(a[1:3]))
 [1] "a" "b"

 HTH.

 Cheers,

         Berwin


Thanks for both responses.

Is there a difference between the as.factor and factor commands,
and also between as.data.frame and data.frame?
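
A small experiment that seems to answer part of this, offered as a sketch rather than a full account: as.factor() hands back an existing factor unchanged, while factor() rebuilds it from the values actually present. Similarly, data.frame() assembles a data frame from columns, whereas as.data.frame() coerces an existing object such as a matrix.

```r
f <- factor(c("a", "a", "b", "b", "c"))
sub <- f[1:3]

kept    <- levels(as.factor(sub))   # sub is already a factor: returned as is
rebuilt <- levels(factor(sub))      # levels recomputed from the values present
dropped <- levels(droplevels(sub))  # same effect for an existing factor

# data.frame() builds from columns; as.data.frame() converts an object.
m <- matrix(1:4, nrow = 2)
from_matrix <- as.data.frame(m)
built <- data.frame(x = 1:2, y = c("a", "b"))
```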



[R] data and parameters

2011-01-20 Thread analys...@hotmail.com
(1) I have a master data frame that reads

ClientID |date |value

(2) I also have a control data frame that reads

Client ID| Min date| Max date| control parameters

The control data set may not have all client IDs .

I want to use the control data frame on the master data frame to
remove client IDS that don't exist in the control data set and for
those that do, remove dates outside the required range.

(3) We can either put the control parameters on all rows corresponding
to a client ID or look it up from the control data frame

(4) The basic function call looks like

do.something(df,control parameters)

where df is the subset of the master data set that corresponds to a
single client with unwanted dates removed and the control parameters
pertain to that client.

Any help would be appreciated.



[R] Parameters/data that live globally

2011-01-18 Thread analys...@hotmail.com
I am coming to R from Fortran and I used to use fixed size arrays in
named common. common /name1/array(100)

The contents of array can be accessed/modified if and only if this
line occurs in the function.  Very helpful if different functions need
different global data (can have name2, name3 etc. for common data
blocks).

Is there a way to do this in R?
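
The closest analogue may be an environment used as a named container. A minimal sketch follows; the function names are made up for illustration. Unlike a Fortran named common block, R does not restrict access to functions that declare it: any function that can see name1 can read or modify its contents.

```r
# A named container for shared data, roughly like common /name1/array(100).
name1 <- new.env()
name1$array <- numeric(100)

# Functions that use the shared data refer to the environment by name.
# Environments are modified in place, so the change is visible everywhere.
set_first <- function(value) {
  name1$array[1] <- value
  invisible(NULL)
}
get_first <- function() name1$array[1]

set_first(42)
```

Separate blocks (name2, name3, ...) would simply be separate environments.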

Thanks for any help.



[R] interactive graphics from the O/S Shell prompt

2011-01-15 Thread analys...@hotmail.com
I have a function called plotID(ID) that would generate a plot for
customerID = ID.  I can run it repeatedly from within R without any
problems.

Would it be possible to run this function from the O/S command prompt;
each time you enter an ID , it would open a graphics window with the
plot for that ID and prompt you for a new ID (and perhaps if you type
quit the program terminates).
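
A rough sketch of such a loop, meant to be saved in a script and started with Rscript. The plotID() below is only a stand-in for the real function, and prompt_loop is a made-up name. Note that under Rscript the default graphics device is a PDF file, so an on-screen device (for example x11() or windows(), depending on the platform) would have to be opened explicitly inside plotID().

```r
# Stand-in for the real plotID(); replace with the actual function.
plotID <- function(ID) plot(1:10, main = paste("Customer", ID))

# Read IDs one at a time from an open connection until "quit" or end of input.
prompt_loop <- function(con, plot_fun = plotID) {
  repeat {
    cat("Enter ID (or quit): ")
    id <- readLines(con, n = 1)
    if (length(id) == 0 || identical(trimws(id), "quit")) break
    plot_fun(trimws(id))
  }
  invisible(NULL)
}

# In the script itself, something like:
#   con <- file("stdin", open = "r")
#   prompt_loop(con)
#   close(con)
```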

Thanks.



[R] Outputting csv file from dataframe with columns in a particular order

2011-01-12 Thread analys...@hotmail.com
I have a dataframe with columns "ID","date","estimate","actual" (but
not necessarily in that order - I do a merge somewhere and that
somehow messes up the order of the columns).

How can I output it to a csv file with the columns in the order that I
want?

Thanks.



[R] aggredating date data

2011-01-12 Thread analys...@hotmail.com
I tried a date by date forecast of a time series and it seems to be
too wild.  How can I aggregate the date into weeks or months as
required?

Thanks.

The input looks like

ID  datadate (YYYY-MM-DD)  value_for_day
--  ----------             --
--  ----------             --

and I want to be able to change it to

ID dataweek value_for_week

or

ID datamonth value_for_month
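
One way this could be done in base R, sketched on made-up data: derive a month label with format() and a week label with cut(), then sum within each ID and period. Summing is an assumption; another summary such as mean could be passed as FUN instead.

```r
# Hypothetical daily data: 40 days for each of two IDs, value 1 per day.
daily <- data.frame(
  ID = rep(c(1, 2), each = 40),
  datadate = rep(seq(as.Date("2011-01-01"), by = "day", length.out = 40), 2),
  value_for_day = rep(1, 80)
)

# Period labels: "YYYY-MM" for months, week start date for weeks.
daily$datamonth <- format(daily$datadate, "%Y-%m")
daily$dataweek  <- cut(daily$datadate, "week")

monthly <- aggregate(value_for_day ~ ID + datamonth, data = daily, FUN = sum)
names(monthly)[3] <- "value_for_month"

weekly <- aggregate(value_for_day ~ ID + dataweek, data = daily, FUN = sum)
names(weekly)[3] <- "value_for_week"
```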



Re: [R] Outputting csv file from dataframe with columns in a particular order

2011-01-12 Thread analys...@hotmail.com
Thanks to all who responded.

On Jan 12, 10:34 am, Peter Ehlers ehl...@ucalgary.ca wrote:
 On 2011-01-12 07:16, analys...@hotmail.com wrote:

  I have a dataframe with columns "ID","date","estimate","actual" (but
  not necessarily in that order - I do a merge somewhere and that
  somehow messes up the order of the columns).

  How can I output it to a csv file with the columns in the order that I
  want?

 Let's say that your data.frame is DF.
 mynames <- c("ID", "date", "estimate", "actual")
 write.csv(DF[, mynames], )

 Peter Ehlers
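
A complete version of this suggestion, as a sketch with made-up data and a temporary output file:

```r
# Hypothetical data frame whose columns are in an unwanted order.
DF <- data.frame(actual = c(3, 4), ID = c(1, 2),
                 estimate = c(2.5, 4.5),
                 date = c("2011-01-10", "2011-01-11"))

# Select the columns in the order wanted, then write them out.
mynames <- c("ID", "date", "estimate", "actual")
out_file <- tempfile(fileext = ".csv")
write.csv(DF[, mynames], out_file, row.names = FALSE)
```

Indexing by a character vector of names reorders the columns without changing DF itself.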




[R] question on aggregate

2011-01-10 Thread analys...@hotmail.com
An example available on the net goes like

> df
  identifier quantity
1          1       10
2          1       20
3          2       30
4          1       15
5          2       10
6          3       20
> aggregate(df$quantity, by=list(df$identifier), sum)
  Group.1  x
1       1 45
2       2 40
3       3 20


I'd like Group.1 to retain the name "identifier" and would like to
control what "x" gets called in the output.  Thanks.
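
For the record, naming the list elements passed to aggregate() controls both column names. A sketch on the example data; the name total_quantity is arbitrary.

```r
df <- data.frame(identifier = c(1, 1, 2, 1, 2, 3),
                 quantity = c(10, 20, 30, 15, 10, 20))

# Named lists give named output columns.
res <- aggregate(list(total_quantity = df$quantity),
                 by = list(identifier = df$identifier),
                 FUN = sum)

# The formula interface keeps the original column names.
res2 <- aggregate(quantity ~ identifier, data = df, FUN = sum)
```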



Re: [R] filling up holes

2010-12-29 Thread analys...@hotmail.com


On Dec 28, 10:27 pm, bill.venab...@csiro.au wrote:
 Dear 'analyst41' (it would be a courtesy to know who you are)

 Here is a low-level way to do it.  

 First create some dummy data

  > allDates <- seq(as.Date("2010-01-01"), by = 1, length.out = 50)
  > client_ID <- sample(LETTERS[1:5], 50, rep = TRUE)
  > value <- 1:50
  > date <- sample(allDates)
  > clientData <- data.frame(client_ID, date, value)

 At this point clientData has 50 rows, with 5 clients, each with a sample of
 dates.  Everything is in random order except value.

 Now write a little function to fill out a subset of the data consisting of
 one client's data only:

  > fixClient <- function(cData) {
  +   dateRange <- range(cData$date)
  +   dates <- seq(dateRange[1], dateRange[2], by = 1)
  +   fullSet <- data.frame(client_ID = as.character(cData$client_ID[1]),
  +                         date = dates, value = NA)
  +
  +   fullSet$value[match(cData$date, dates)] <- cData$value
  +   fullSet
  + }

 Now split up the data, apply the fixClient function to each section and
 re-combine them again:

  > allData <- do.call(rbind,
  +                    lapply(split(clientData, clientData$client_ID), fixClient))

 Check:

  > head(allData)

     client_ID       date value
 A.1         A 2010-01-04    36
 A.2         A 2010-01-05    18
 A.3         A 2010-01-06    NA
 A.4         A 2010-01-07    NA
 A.5         A 2010-01-08    NA
 A.6         A 2010-01-09    49



 Seems OK.  At this point the data are in sorted order by client and date, but 
 that should not matter.

 Bill Venables.



It is of course a great honor to receive a reply from you (but please
allow me to continue to be an anonymous source of bits and bytes over
the net).

This is a neat solution, but please watch this space to see my dumber
version (the code might need to be changed to a procedural language
eventually).

Thank you.

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf Of analys...@hotmail.com
 Sent: Wednesday, 29 December 2010 10:45 AM
 To: r-h...@r-project.org
 Subject: [R] filling up holes

 I have a data frame with three columns

 client ID | date | value

 For each client ID I want to determine Min date and Max date and for
 any dates in between that are missing I want to insert a row

 Client ID | date| NA

 Any help would be appreciated.




[R] filling up holes

2010-12-28 Thread analys...@hotmail.com
I have a data frame with three columns

client ID | date | value


For each client ID I want to determine Min date and Max date and for
any dates in between that are missing I want to insert a row

Client ID | date| NA

Any help would be appreciated.



Re: [R] need help with data management

2010-12-27 Thread analys...@hotmail.com


On Dec 25, 1:36 pm, Gabor Grothendieck ggrothendi...@gmail.com
wrote:
 On Sat, Dec 25, 2010 at 8:08 AM, analys...@hotmail.com





 analys...@hotmail.com wrote:
  I have a data frame that reads

  client ID date transactions

  323232   11/1/2010 22
  323232   11/2/2010 0
  323232   11/3/2010 missing
  121212   11/10/2010 32
  121212    11/11/2010 15
  .

  I want to order the rows by client ID and date and using a black-box
  forecasting method create the data fcst(client,date of forecast, date
  for which forecast applies).

   Assume that I have a function that given a time series
  x(1),x(2),x(k) will generate f(i,j) where f(i,j) = forecast j days
  ahead, given data till date i.

  How can the forecast data be best stored and how would I go about the
  task of processing all the clients and dates?

 This isn't quite what you asked but it seems more suitable to what you
 need.  Instead of using long form data we transform it to wide form
 with one client per column.  Try copying this from this post and
 pasting it into your R session:

 Lines <- "323232   11/1/2010 22
 323232   11/2/2010 0
 323232   11/3/2010 missing
 121212   11/10/2010 32
 121212    11/11/2010 15"

 library(zoo)
 library(chron)

 # read in. split = 1 converts to wide form
 # can use "myfile.dat" in place of textConnection(Lines) for real data
 z <- read.zoo(textConnection(Lines), split = 1, index = 2, FUN = chron,
       na.strings = "missing")
 # d is matrix with one row per date and one col per client
 d <- coredata(z)

 # just use last point as our forecast for next 3 dates
 naive.forecast <- function(x) rep(tail(x, 1), 3)
 pred <- apply(d, 2, naive.forecast)

 # put predictions together with the data
 rbind(d, pred)

 For the data you showed this gives:

  rbind(d, pred)

      121212 323232
 [1,]     NA     22
 [2,]     NA      0
 [3,]     NA     NA
 [4,]     32     NA
 [5,]     15     NA
 [6,]     15     NA
 [7,]     15     NA
 [8,]     15     NA

 --
 Statistics  Software Consulting
 GKX Group, GKX Associates Inc.
 tel: 1-877-GKX-GROUP
 email: ggrothendieck at gmail.com


Thank you.

Everything works on my system (windows) except that I get the final
output

 X121212 X323232
[1,]  NA  22
[2,]  NA   0
[3,]  NA  NA
[4,]  32  NA
[5,]  15  NA
[6,]  15  NA
[7,]  15  NA
[8,]  15  NA

i.e., an X gets attached to the client name.

I'd also like to retain the dates in each row.  I'll try to follow up
along these lines.



[R] need help with data management

2010-12-25 Thread analys...@hotmail.com
I have a data frame that reads

client ID date transactions

323232   11/1/2010 22
323232   11/2/2010 0
323232   11/3/2010 missing
121212   11/10/2010 32
121212    11/11/2010 15
.


I want to order the rows by client ID and date and using a black-box
forecasting method create the data fcst(client,date of forecast, date
for which forecast applies).

 Assume that I have a function that given a time series
x(1),x(2),x(k) will generate f(i,j) where f(i,j) = forecast j days
ahead, given data till date i.

How can the forecast data be best stored and how would I go about the
task of processing all the clients and dates?

Thanks.



Re: [R] need help with data management

2010-12-25 Thread analys...@hotmail.com


On Dec 25, 10:17 am, David Winsemius dwinsem...@comcast.net wrote:
 On Dec 25, 2010, at 8:08 AM, analys...@hotmail.com wrote:





  I have a data frame that reads

  client ID date transactions

  323232   11/1/2010 22
  323232   11/2/2010 0
  323232   11/3/2010 missing
  121212   11/10/2010 32
  121212    11/11/2010 15
  .

  I want to order the rows by client ID and date and using a black-box
  forecasting method create the data fcst(client,date of forecast, date
  for which forecast applies).

  Assume that I have a function that given a time series
  x(1),x(2),x(k) will generate f(i,j) where f(i,j) = forecast j days
  ahead, given data till date i.

  How can the forecast data be best stored and how would I go about the
  task of processing all the clients and dates?

 http://lmgtfy.com/?q=forecast+r-project

 --

 David Winsemius, MD
 West Hartford, CT


Thanks.  I am planning to write my own univariate forecasting routine.

My question is mostly concerned with separating out the time series by
client, generating the forecasts and then putting everything back
together into something like

ClientID | forecast date| date forecast is for |forecast| actual
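
A sketch of the split / forecast / recombine pattern in base R. The data, the column names and the naive forecaster (repeat the last observed value) are placeholders for the real routine.

```r
# Hypothetical history in long form.
history <- data.frame(
  ClientID = c(323232, 323232, 323232, 121212, 121212),
  date = as.Date(c("2010-11-01", "2010-11-02", "2010-11-03",
                   "2010-11-10", "2010-11-11")),
  actual = c(22, 0, 5, 32, 15)
)

# Placeholder forecaster (assumption): repeat the last observed value.
naive_forecast <- function(x, horizon) rep(tail(x, 1), horizon)

# Forecast one client's series and return rows in the target layout.
forecast_client <- function(cd, horizon = 3) {
  cd <- cd[order(cd$date), ]
  last_date <- max(cd$date)
  data.frame(ClientID = cd$ClientID[1],
             forecast_date = last_date,
             date_forecast_is_for = last_date + seq_len(horizon),
             forecast = naive_forecast(cd$actual, horizon))
}

# Split by client, forecast each piece, and stack the results.
fcst <- do.call(rbind, lapply(split(history, history$ClientID),
                              forecast_client))
rownames(fcst) <- NULL

# Attach actuals where they exist for the forecasted date.
combined <- merge(fcst, history,
                  by.x = c("ClientID", "date_forecast_is_for"),
                  by.y = c("ClientID", "date"),
                  all.x = TRUE)
```

Forecasts made from every date i rather than only the last one would add an outer loop over dates inside forecast_client().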





Re: [R] reading csv files

2010-02-06 Thread analys...@hotmail.com


On Feb 5, 7:16 pm, Jim Lemon j...@bitwrit.com.au wrote:
 On 02/06/2010 09:05 AM, analys...@hotmail.com wrote:





  On Feb 5, 8:57 am, Barry Rowlingson b.rowling...@lancaster.ac.uk
  wrote:
  On Fri, Feb 5, 2010 at 10:23 AM, analys...@hotmail.com

  analys...@hotmail.com  wrote:
  the csv files are downloaded from a database and it looks like some
  character fields contain the CR-LF sequence within them.

  This causes R to see a new record/row and the number of rows it sees
  is different (usually higher) from the number of rows actually
  extracted.

    Hard to tell without an example, but I just tried this in a file:

  1,2,"this
  is a test",99
  2,3,"oneliner",45

  and:

  > read.table("test.csv",sep=",")

     V1 V2              V3 V4
  1  1  2 this\nis a test 99
  2  2  3        oneliner 45

  seemed to work. But if your strings aren't quoted (hard to tell
  without an example) then you might have to find another way. Hard to
  tell without an example.

  Barry


  Here is a Hex dump (please ignore the '>' at the start of each line) -
  of the file that results from extracting two rows.

  EF BB BF 64 65 73 63 72-69 70 74 69 6F 6E 0D 0A   ...description..
  22 3C 73 74 72 6F 6E 67-3E 55 6E 6B 6E 6F 77 6E   "<strong>Unknown
  20 41 6E 79 74 69 6D 65-2C 20 41 6E 79 77 68 65    Anytime, Anywhe
  72 65 20 4C 65 61 72 6E-69 6E 67 3C 62 72 20 2F   re Learning<br /
  3E 0D 0A 3C 2F 73 74 72-6F 6E 67 3E 20 54 68 65   >..</strong> The
  20 61 6E 73 77 65 72 20-69 73 20 55 6E 6B 6E 6F    answer is Unkno
  77 6E 2E 20 3C 73 74 72-6F 6E 67 3E 20 79 6F 75   wn. <strong> you
  20 63 61 6E 20 73 74 61-72 74 20 61 6E 64 20 66    can start and f
  69 6E 69 73 68 20 69 6E-20 6C 65 73 73 20 74 68   inish in less th
  65 6E 20 31 37 20 6D 6F-6E 74 68 73 2E 3C 2F 73   en 17 months.</s
  74 72 6F 6E 67 3E 20 3C-62 72 20 2F 3E 0D 0A 3C   trong> <br />..<
  62 72 20 2F 3E 0D 0A 55-6E 6B 6E 6F 77 6E 20 61   br />..Unknown a
  62 6F 75 74 20 65 6E 73-75 72 69 6E 67 20 79 6F   bout ensuring yo
  75 20 6C 65 61 72 6E 20-2E 22 0D 0A 03 D8 26 8A   u learn ."....&.

  R, Fortran and Excel see five lines, but the database has only two
  lines.

 Okay, you have five CR-LF pairs with two being EORs. It looks like the
 <br />CR-LF is the EOR sequence, so it should be possible to preserve
 those while changing the others to something like ~ or deleting them.
 As I said previously, the regexperts can work out a way to distinguish
 the CR-LF pairs that are _not_ in an EOR sequence.

 You might want to think about dumping the control characters as well.

 Jim


I am sure other sequences cause a false EOR as well.  The false EORs are
CR-LF sequences that fall between commas, i.e. inside a field.  I don't
know if R can read a fixed number of bytes regardless of EOR markers.  If
it can, it should be possible to assemble the true database rows from the
bytes read in.
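
On reading bytes regardless of EOR markers: readBin() does read raw bytes
without paying any attention to line endings. What follows is only a minimal
sketch under assumptions, not a tested solution for the real extract. It
assumes the multi-line fields are enclosed in double quotes and that the file
contains no NUL bytes (rawToChar() would fail on those). The sample content,
the temporary file and the "~" marker (borrowed from Jim's suggestion) are
made up for illustration.

```r
# Hypothetical sample: a header line plus one record whose quoted field
# contains two embedded CR-LF pairs.
path <- tempfile(fileext = ".csv")
sample_text <- "description\r\n\"line one<br />\r\nline two<br />\r\nline three\"\r\n"
writeBin(charToRaw(sample_text), path)

# Read the whole file as raw bytes; line endings are just bytes here.
bytes <- readBin(path, what = "raw", n = file.size(path))
chars <- strsplit(rawToChar(bytes), "")[[1]]

# A character is inside a quoted field when an odd number of double quotes
# has been seen up to and including it.
in_quotes <- cumsum(chars == "\"") %% 2 == 1

# Mark line feeds inside quotes with "~", then drop every carriage return,
# so only the true end-of-record line feeds remain.
chars[in_quotes & chars == "\n"] <- "~"
chars <- chars[chars != "\r"]
cleaned <- paste(chars, collapse = "")

# Parse the repaired text; each database row is now one physical line.
records <- read.csv(text = cleaned, stringsAsFactors = FALSE)
nrow(records)
```

The quote-parity test is what separates a CR-LF inside a field from one that
ends a record, so the approach depends entirely on the fields being quoted.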


[R] reading csv files

2010-02-05 Thread analys...@hotmail.com
the csv files are downloaded from a database and it looks like some
character fields contain the CR-LF sequence within them.

This causes R to see a new record/row and the number of rows it sees
is different (usually higher) from the number of rows actually
extracted.

Any suggestions?

Thanks.



Re: [R] reading csv files

2010-02-05 Thread analys...@hotmail.com


On Feb 5, 8:57 am, Barry Rowlingson b.rowling...@lancaster.ac.uk
wrote:
 On Fri, Feb 5, 2010 at 10:23 AM, analys...@hotmail.com

 analys...@hotmail.com wrote:
  the csv files are downloaded from a database and it looks like some
  character fields contain the CR-LF sequence within them.

  This causes R to see a new record/row and the number of rows it sees
  is different (usually higher) from the number of rows actually
  extracted.

  Hard to tell without an example, but I just tried this in a file:

 1,2,this
 is a test,99
 2,3,oneliner,45

 and:

  read.table("test.csv",sep=",")

   V1 V2              V3 V4
 1  1  2 this\nis a test 99
 2  2  3        oneliner 45

 seemed to work. But if your strings aren't quoted (hard to tell
 without an example) then you might have to find another way. Hard to
 tell without an example.

 Barry



Here is a Hex dump (please ignore the '' at the start of each line) -
of the file that results from extracting two rows.


 EF BB BF 64 65 73 63 72-69 70 74 69 6F 6E 0D 0A   ...description..
 22 3C 73 74 72 6F 6E 67-3E 55 6E 6B 6E 6F 77 6E   "<strong>Unknown
 20 41 6E 79 74 69 6D 65-2C 20 41 6E 79 77 68 65    Anytime, Anywhe
 72 65 20 4C 65 61 72 6E-69 6E 67 3C 62 72 20 2F   re Learning<br /
 3E 0D 0A 3C 2F 73 74 72-6F 6E 67 3E 20 54 68 65   >..</strong> The
 20 61 6E 73 77 65 72 20-69 73 20 55 6E 6B 6E 6F    answer is Unkno
 77 6E 2E 20 3C 73 74 72-6F 6E 67 3E 20 79 6F 75   wn. <strong> you
 20 63 61 6E 20 73 74 61-72 74 20 61 6E 64 20 66    can start and f
 69 6E 69 73 68 20 69 6E-20 6C 65 73 73 20 74 68   inish in less th
 65 6E 20 31 37 20 6D 6F-6E 74 68 73 2E 3C 2F 73   en 17 months.</s
 74 72 6F 6E 67 3E 20 3C-62 72 20 2F 3E 0D 0A 3C   trong> <br />..<
 62 72 20 2F 3E 0D 0A 55-6E 6B 6E 6F 77 6E 20 61   br />..Unknown a
 62 6F 75 74 20 65 6E 73-75 72 69 6E 67 20 79 6F   bout ensuring yo
 75 20 6C 65 61 72 6E 20-2E 22 0D 0A 03 D8 26 8A   u learn ."...&.



R, Fortran and Excel see five lines, but the database has only two
lines.
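
A minimal sketch for checking this in R, assuming the multi-line field really
is wrapped in double quotes, as the dump suggests (0x22 at the start of the
field and again just before the final CR-LF). The sample text and temporary
file below are invented stand-ins, not the real extract. readLines() counts
physical lines, while read.csv() treats line breaks inside a quoted field as
part of the field. The dump also begins with EF BB BF, a UTF-8 byte-order
mark; in R 3.0.0 and later, fileEncoding = "UTF-8-BOM" should drop it so it
does not end up in the first column name.

```r
# Hypothetical stand-in for the extract: UTF-8 BOM, a header line, and one
# record whose quoted field contains three embedded CR-LF pairs.
path <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
body <- charToRaw(
  "description\r\n\"first<br />\r\nsecond<br />\r\n<br />\r\nlast\"\r\n"
)
writeBin(c(bom, body), path)

# Physical lines, as a line-oriented reader sees them.
physical <- readLines(path, warn = FALSE)
length(physical)

# Records, as a quote-aware CSV parser sees them.
records <- read.csv(path, fileEncoding = "UTF-8-BOM",
                    stringsAsFactors = FALSE)
nrow(records)
names(records)
```

If read.csv() still reports too many rows on the real file, the likely
causes are unquoted fields or stray quote characters inside the text.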



[R] R splits character fields in a csv file

2009-11-05 Thread analys...@hotmail.com
I download a csv extract from a database and use read.csv to read it
from R. When there are large character fields with embedded blanks,
slashes etc., R often sees one line as two lines (or more).

I verified with readLines that an embedded blank in a character field
causes a spurious new line to be seen.

I am sure this problem has been seen before, and I would appreciate any
help in reading such files.
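
One way to locate the offending lines is count.fields(), sketched below
under the assumption that the breaks are real CR/LF characters inside a
quoted field rather than blanks. The three-column sample file is
hypothetical. With quoting switched off, each physical line is counted on
its own, so the fragments of a split record stand out by having too few
fields.

```r
# Hypothetical three-column extract; the first record's text field spans
# two physical lines.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "id,description,score",
  "1,\"first line",
  "second line\",10",
  "2,oneliner,20"
), path)

# Field count per physical line, ignoring quotes entirely.
raw_counts <- count.fields(path, sep = ",", quote = "")
raw_counts

# Physical lines whose field count differs from the header's.
suspect <- which(raw_counts != raw_counts[1])
suspect

# With the default quoting rules the parser joins the fragments again.
records <- read.csv(path, stringsAsFactors = FALSE)
nrow(records)
records$description[1]
```

Comparing the suspect line numbers against readLines() output shows which
fields carry the embedded line breaks.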
