Re: [R] More efficient use of reshape?

2012-12-14 Thread Nathan Miller
Thanks John,

I appreciate your help.

With the help of Dennis Murphy, code along the lines of this

climData_melt <- melt(clim.data, id = 'date', measure = c("GISS", "HAD",
  "NOAA", "RSS", "UAH"))

actually gets the data into the correct form very simply, and from there it's
easy to plot with faceting in ggplot2.
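
For anyone following along, here is a minimal sketch of that melt-then-facet approach. The small data frame is made up for illustration (the real clim.data comes from the CSV in my original post), and the variable.name/value.name arguments and the plot layers are my own choices, not necessarily what Dennis suggested.

```r
library(reshape2)
library(ggplot2)

# Toy stand-in for clim.data: made-up anomalies for three months, five sources
sources <- c("GISS", "HAD", "NOAA", "RSS", "UAH")
toy <- data.frame(
  date = as.Date(c("2012-01-01", "2012-02-01", "2012-03-01")),
  GISS = c(0.10, 0.20, 0.30),
  HAD  = c(0.15, 0.25, 0.35),
  NOAA = c(0.12, 0.22, 0.32),
  RSS  = c(0.05, 0.10, 0.15),
  UAH  = c(0.02, 0.08, 0.11)
)

# Wide -> long: one row per (date, source) pair
toy_melt <- melt(toy, id.vars = "date", measure.vars = sources,
                 variable.name = "source", value.name = "anomaly")

# One panel per data source; call print(p) to draw it
p <- ggplot(toy_melt, aes(x = date, y = anomaly)) +
  geom_line() +
  facet_wrap(~ source)
```

Naming the variable and value columns inside melt() avoids a separate renaming step afterwards.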

Thanks for helping clear up the reshape and reshape2 issues. I try to make
sure I have those types of things figured out before posting to the list.
Didn't mean to confuse the issue by incorrectly referring to the
packages/functions or their locations.

Thanks,
Nate



On Fri, Dec 14, 2012 at 7:38 AM, John Kane  wrote:

> I think David was pointing out that reshape() is not a reshape2 function.
>  It is in the stats package.
>
> I am not sure exactly what you are doing but perhaps something along the
> lines of
>
> library(reshape2)
> mm <- melt(clim.data, id = c("yr_frac", "yr_mn", "AMO", "NINO34",
> "SSTA"))
>
> is a start?
>
> I also don't think that the more recent versions of ggplot2 automatically
> load reshape2 so it may be that you are working with a relatively old
> installation of ggplot and reshape?
>
> sessionInfo()
> R version 2.15.2 (2012-10-26)
> Platform: i686-pc-linux-gnu (32-bit)
>
> locale:
>  [1] LC_CTYPE=en_CA.UTF-8   LC_NUMERIC=C
> LC_TIME=en_CA.UTF-8
>  [4] LC_COLLATE=en_CA.UTF-8 LC_MONETARY=en_CA.UTF-8
>  LC_MESSAGES=en_CA.UTF-8
>  [7] LC_PAPER=C LC_NAME=C  LC_ADDRESS=C
> [10] LC_TELEPHONE=C LC_MEASUREMENT=en_CA.UTF-8
> LC_IDENTIFICATION=C
>
> attached base packages:
> [1] grid  stats graphics  grDevices utils datasets  methods
> base
>
> other attached packages:
> [1] lubridate_1.2.0    directlabels_2.9   RColorBrewer_1.0-5
> gridExtra_0.9.1    stringr_0.6.2
> [6] scales_0.2.3       plyr_1.8           reshape2_1.2.1     ggplot2_0.9.3
>
> loaded via a namespace (and not attached):
> [1] colorspace_1.2-0 dichromat_1.2-4  digest_0.6.0 gtable_0.1.2
> labeling_0.1
> [6] MASS_7.3-22  munsell_0.4  proto_0.3-9.2    tools_2.15.2
>
>
>
>
>
>
> John Kane
> Kingston ON Canada
>
>
> > -Original Message-
> > From: natemille...@gmail.com
> > Sent: Thu, 13 Dec 2012 09:58:34 -0800
> > To: dwinsem...@comcast.net
> > Subject: Re: [R] More efficient use of reshape?
> >
> > Sorry David,
> >
> > In my attempt to simplify the example and include only the code I felt was
> > necessary, I left out the loading of ggplot2, which then imports reshape2,
> > and which was actually used in the code I provided. Sorry for the mistake
> > and my misunderstanding of where the reshape function was coming from.
> > Should have checked that more carefully.
> >
> > Thanks,
> > Nate
> >
> >
> > On Thu, Dec 13, 2012 at 9:48 AM, David Winsemius
> > wrote:
> >
> >>
> >> On Dec 13, 2012, at 9:16 AM, Nathan Miller wrote:
> >>
> >>  Hi all,
> >>>
> >>> I have played a bit with the "reshape" package and function along with
> >>> "melt" and "cast", but I feel I still don't have a good handle on how
> >>> to
> >>> use them efficiently. Below I have included an application of "reshape"
> >>> that
> >>> is rather clunky and I'm hoping someone can offer advice on how to use
> >>> reshape (or melt/cast) more efficiently.
> >>>
> >>>
> >> You do realize that the 'reshape' function is _not_ in the reshape
> >> package, right? And also that the reshape package has been superseded by
> >> the reshape2 package?
> >>
> >> --
> >> David.
> >>
> >>
> >>> #For this example I am using climate change data available on-line
> >>>
> >>> file <- ("
> >>>
> >>> http://processtrends.com/Files/RClimate_consol_temp_anom_latest.csv
> >>> ")
> >>> clim.data <- read.csv(file, header=TRUE)
> >>>
> >>> library(lubridate)
> >>> library(reshape)
> >>>
> >>> #I've been playing with the lubridate package a bit to work with dates,
> >>> but
> >>> as the climate dataset only uses year and month I have
> >>> #added a "day" to each entry in the "yr_mn" column and then used "dym"

Re: [R] More efficient use of reshape?

2012-12-14 Thread John Kane
I think David was pointing out that reshape() is not a reshape2 function.  It 
is in the stats package.
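
A quick way to check this for yourself is to ask R where the function lives. This is only a sketch using base R introspection helpers, and it assumes a fresh session in which no other attached package masks reshape().

```r
# Which attached packages provide an object called "reshape"?
found <- find("reshape")
found

# Which namespace is the function that R actually resolves defined in?
where_defined <- environmentName(environment(reshape))
where_defined

# Is reshape2 loaded at all (attached, or pulled in by another package)?
"reshape2" %in% loadedNamespaces()
```

The last line is also a cheap way to see whether loading ggplot2 brought reshape2 along in a given installation.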

I am not sure exactly what you are doing but perhaps something along the lines 
of 

library(reshape2)  
mm <- melt(clim.data, id = c("yr_frac", "yr_mn", "AMO", "NINO34", "SSTA"))

is a start?  

I also don't think that the more recent versions of ggplot2 automatically load 
reshape2 so it may be that you are working with a relatively old installation 
of ggplot and reshape?

sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: i686-pc-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C               LC_TIME=en_CA.UTF-8
 [4] LC_COLLATE=en_CA.UTF-8     LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C                  LC_ADDRESS=C
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] lubridate_1.2.0    directlabels_2.9   RColorBrewer_1.0-5 gridExtra_0.9.1    stringr_0.6.2
[6] scales_0.2.3       plyr_1.8           reshape2_1.2.1     ggplot2_0.9.3

loaded via a namespace (and not attached):
[1] colorspace_1.2-0 dichromat_1.2-4  digest_0.6.0     gtable_0.1.2     labeling_0.1
[6] MASS_7.3-22      munsell_0.4      proto_0.3-9.2    tools_2.15.2






John Kane
Kingston ON Canada


> -Original Message-
> From: natemille...@gmail.com
> Sent: Thu, 13 Dec 2012 09:58:34 -0800
> To: dwinsem...@comcast.net
> Subject: Re: [R] More efficient use of reshape?
> 
> Sorry David,
> 
> In my attempt to simplify the example and include only the code I felt was
> necessary, I left out the loading of ggplot2, which then imports reshape2,
> and which was actually used in the code I provided. Sorry for the mistake
> and my misunderstanding of where the reshape function was coming from.
> Should have checked that more carefully.
> 
> Thanks,
> Nate
> 
> 
> On Thu, Dec 13, 2012 at 9:48 AM, David Winsemius
> wrote:
> 
>> 
>> On Dec 13, 2012, at 9:16 AM, Nathan Miller wrote:
>> 
>>  Hi all,
>>> 
>>> I have played a bit with the "reshape" package and function along with
>>> "melt" and "cast", but I feel I still don't have a good handle on how
>>> to
>>> use them efficiently. Below I have included an application of "reshape"
>>> that
>>> is rather clunky and I'm hoping someone can offer advice on how to use
>>> reshape (or melt/cast) more efficiently.
>>> 
>>> 
>> You do realize that the 'reshape' function is _not_ in the reshape
>> package, right? And also that the reshape package has been superseded by
>> the reshape2 package?
>> 
>> --
>> David.
>> 
>> 
>>> #For this example I am using climate change data available on-line
>>> 
>>> file <- ("
>>> http://processtrends.com/Files/RClimate_consol_temp_anom_latest.csv
>>> ")
>>> clim.data <- read.csv(file, header=TRUE)
>>> 
>>> library(lubridate)
>>> library(reshape)
>>> 
>>> #I've been playing with the lubridate package a bit to work with dates,
>>> but
>>> as the climate dataset only uses year and month I have
>>> #added a "day" to each entry in the "yr_mn" column and then used "dym"
>>> from
>>> lubridate to generate the POSIXlt formatted dates in
>>> #a new column clim.data$date
>>> 
>>> clim.data$yr_mn<-paste("01", clim.data$yr_mn, sep="")
>>> clim.data$date<-dym(clim.data$yr_mn)
>>> 
>>> #Now to the reshape. The dataframe is in a wide format. The columns
>>> GISS,
>>> HAD, NOAA, RSS, and UAH are all different sources
>>> #from which the global temperature anomaly has been calculated since
>>> 1880
>>> (actually only 1978 for RSS and UAH). What I would like to
>>> #do is plot the temperature anomaly vs date and use ggplot to facet by
>>> the
>>> different data source (GISS, HAD, etc.). Thus I need the
>>> #data in long format with a date column, a temperature anomaly column,
>>> and
>>> a data source column. The code below works, but it's
>>> #really very clunky and I'm sure I am not using these tools as
>>> efficiently
>>> as I can.
>>> 
>>> #The varying=list(

Re: [R] More efficient use of reshape?

2012-12-13 Thread Nathan Miller
Sorry David,

In my attempt to simplify the example and include only the code I felt was
necessary, I left out the loading of ggplot2, which then imports reshape2,
and which was actually used in the code I provided. Sorry for the mistake
and my misunderstanding of where the reshape function was coming from.
Should have checked that more carefully.

Thanks,
Nate


On Thu, Dec 13, 2012 at 9:48 AM, David Winsemius wrote:

>
> On Dec 13, 2012, at 9:16 AM, Nathan Miller wrote:
>
>  Hi all,
>>
>> I have played a bit with the "reshape" package and function along with
>> "melt" and "cast", but I feel I still don't have a good handle on how to
>> use them efficiently. Below I have included an application of "reshape"
>> that
>> is rather clunky and I'm hoping someone can offer advice on how to use
>> reshape (or melt/cast) more efficiently.
>>
>>
> You do realize that the 'reshape' function is _not_ in the reshape
> package, right? And also that the reshape package has been superseded by
> the reshape2 package?
>
> --
> David.
>
>
>> #For this example I am using climate change data available on-line
>>
>> file <- ("
>> http://processtrends.com/Files/RClimate_consol_temp_anom_latest.csv
>> ")
>> clim.data <- read.csv(file, header=TRUE)
>>
>> library(lubridate)
>> library(reshape)
>>
>> #I've been playing with the lubridate package a bit to work with dates,
>> but
>> as the climate dataset only uses year and month I have
>> #added a "day" to each entry in the "yr_mn" column and then used "dym"
>> from
>> lubridate to generate the POSIXlt formatted dates in
>> #a new column clim.data$date
>>
>> clim.data$yr_mn<-paste("01", clim.data$yr_mn, sep="")
>> clim.data$date<-dym(clim.data$yr_mn)
>>
>> #Now to the reshape. The dataframe is in a wide format. The columns GISS,
>> HAD, NOAA, RSS, and UAH are all different sources
>> #from which the global temperature anomaly has been calculated since 1880
>> (actually only 1978 for RSS and UAH). What I would like to
>> #do is plot the temperature anomaly vs date and use ggplot to facet by the
>> different data source (GISS, HAD, etc.). Thus I need the
>> #data in long format with a date column, a temperature anomaly column, and
>> a data source column. The code below works, but it's
>> #really very clunky and I'm sure I am not using these tools as efficiently
>> as I can.
>>
>> #The varying=list(3:7) specifies the columns in the dataframe that
>> corresponded to the sources (GISS, etc.), though then in the resulting
>> #reshaped dataframe the sources are numbered 1-5, so I have to reassign
>> their names. In addition, the original dataframe has
>> #additional data columns I do not want and so after reshaping I create
>> another! dataframe with just the columns I need, and
>> #then I have to rename them so that I can keep track of what everything
>> is.
>> Whew! Not the most elegant of code.
>>
>> d<-reshape(clim.data, varying=list(3:7),idvar="date",
>> v.names="anomaly",direction="long")
>>
>> d$time<-ifelse(d$time==1,"GISS",d$time)
>> d$time<-ifelse(d$time==2,"HAD",d$time)
>> d$time<-ifelse(d$time==3,"NOAA",d$time)
>> d$time<-ifelse(d$time==4,"RSS",d$time)
>> d$time<-ifelse(d$time==5,"UAH",d$time)
>>
>> new.data<-data.frame(d$date,d$time,d$anomaly)
>> names(new.data)<-c("date","source","anomaly")
>>
>> I realize this is a mess, though it works. I think with just some help on
>> how better to work this example I'll probably get over the learning hump
>> and actually figure out how to use these data manipulation functions more
>> cleanly.
>>
>> Any advice or assistance would be appreciated.
>> Thanks,
>> Nate
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> David Winsemius, MD
> Alameda, CA, USA
>
>



Re: [R] More efficient use of reshape?

2012-12-13 Thread David Winsemius


On Dec 13, 2012, at 9:16 AM, Nathan Miller wrote:


Hi all,

I have played a bit with the "reshape" package and function along with
"melt" and "cast", but I feel I still don't have a good handle on how to
use them efficiently. Below I have included an application of "reshape" that
is rather clunky and I'm hoping someone can offer advice on how to use
reshape (or melt/cast) more efficiently.



You do realize that the 'reshape' function is _not_ in the reshape  
package, right? And also that the reshape package has been superseded  
by the reshape2 package?


--
David.



#For this example I am using climate change data available on-line

file <- "http://processtrends.com/Files/RClimate_consol_temp_anom_latest.csv"
clim.data <- read.csv(file, header=TRUE)

library(lubridate)
library(reshape)

#I've been playing with the lubridate package a bit to work with dates, but
#as the climate dataset only uses year and month I have
#added a "day" to each entry in the "yr_mn" column and then used "dym" from
#lubridate to generate the POSIXlt formatted dates in
#a new column clim.data$date

clim.data$yr_mn<-paste("01", clim.data$yr_mn, sep="")
clim.data$date<-dym(clim.data$yr_mn)

#Now to the reshape. The dataframe is in a wide format. The columns GISS,
#HAD, NOAA, RSS, and UAH are all different sources
#from which the global temperature anomaly has been calculated since 1880
#(actually only 1978 for RSS and UAH). What I would like to
#do is plot the temperature anomaly vs date and use ggplot to facet by the
#different data source (GISS, HAD, etc.). Thus I need the
#data in long format with a date column, a temperature anomaly column, and
#a data source column. The code below works, but it's
#really very clunky and I'm sure I am not using these tools as efficiently
#as I can.

#The varying=list(3:7) specifies the columns in the dataframe that
#correspond to the sources (GISS, etc.), though then in the resulting
#reshaped dataframe the sources are numbered 1-5, so I have to reassign
#their names. In addition, the original dataframe has
#additional data columns I do not want and so after reshaping I create
#another dataframe with just the columns I need, and
#then I have to rename them so that I can keep track of what everything is.
#Whew! Not the most elegant of code.

d<-reshape(clim.data, varying=list(3:7),idvar="date",
v.names="anomaly",direction="long")

d$time<-ifelse(d$time==1,"GISS",d$time)
d$time<-ifelse(d$time==2,"HAD",d$time)
d$time<-ifelse(d$time==3,"NOAA",d$time)
d$time<-ifelse(d$time==4,"RSS",d$time)
d$time<-ifelse(d$time==5,"UAH",d$time)

new.data<-data.frame(d$date,d$time,d$anomaly)
names(new.data)<-c("date","source","anomaly")
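
For comparison, a hedged sketch of how base reshape() can label the sources itself, which would make the ifelse() chain and the final renaming unnecessary. The toy data frame and its values are made up, and the extra AMO column stands in for the unwanted columns in the real file. This is one possible tidy-up, not a tested replacement for the code above.

```r
# Toy stand-in for clim.data (made-up values); AMO is an unwanted extra column
sources <- c("GISS", "HAD", "NOAA", "RSS", "UAH")
toy <- data.frame(
  date = as.Date(c("2012-01-01", "2012-02-01", "2012-03-01")),
  AMO  = c(1.0, 1.1, 1.2),
  GISS = c(0.10, 0.20, 0.30),
  HAD  = c(0.15, 0.25, 0.35),
  NOAA = c(0.12, 0.22, 0.32),
  RSS  = c(0.05, 0.10, 0.15),
  UAH  = c(0.02, 0.08, 0.11)
)

# Keep only the id column and the measured columns before reshaping
keep <- toy[, c("date", sources)]

# timevar names the new label column; times supplies the labels directly,
# so the result holds "GISS", "HAD", ... instead of 1-5
long <- reshape(keep,
                varying   = list(sources),
                v.names   = "anomaly",
                timevar   = "source",
                times     = sources,
                idvar     = "date",
                direction = "long")
rownames(long) <- NULL
```

The result has the columns date, source and anomaly, one row per date and source.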

I realize this is a mess, though it works. I think with just some help on
how better to work this example I'll probably get over the learning hump
and actually figure out how to use these data manipulation functions more
cleanly.

Any advice or assistance would be appreciated.
Thanks,
Nate



David Winsemius, MD
Alameda, CA, USA
