Re: [R] flexible approach to subsetting data

Andrea Lamont Wed, 24 Jul 2013 11:39:32 -0700

This is all very helpful.  Thank you for your comments. I will try the
approaches suggested and let you know if I have any problems.


Thank you!!!


On Tue, Jul 23, 2013 at 5:59 PM, David Winsemius <dwinsem...@comcast.net> wrote:

>
> On Jul 23, 2013, at 2:00 PM, David Carlson wrote:
>
> > Actually the ".0" on the first variable is not needed.
> >
> > You could modify the reshape() call to search for the base
> > name of each variable so you would not need to change the code
> > if the number of replications changes:
> >
> > reshape(df5,  direction="long", v.names=c("dose", "resp"),
> >       varying=list(dose=grepl("dose", names(df5)),
> >       resp=grepl("resp", names(df5)) )
> >      )
> >
>
> That's really elegant and much more "elastic". (I hadn't realized that a
> logical vector would be accepted.) It is also possible to just use 'grep',
> which would instead construct vectors of column numbers as the list
> elements of 'varying'. I've wondered for years whether the help page
> description of 'varying' could be improved. It currently says:
>
> "varying :
> names of sets of variables in the wide format that correspond to single
> variables in long format (time-varying). This is canonically a list of
> vectors of variable names, but it can optionally be a matrix of names, or a
> single vector of names. In each case, the names can be replaced by indices
> which are interpreted as referring to names(data). See Details for more
> details and options."
>
> I wondered if it might say instead:
>
> "a list of sets of variables in the wide format that each correspond to
> single variables in long format (time-varying). This is canonically a
> list of vectors of column names or numbers, but it can optionally be a
> matrix of names, or a single vector of names. In each case, the names can
> be replaced by numeric or logical indices which are interpreted as
> extracting from names(data). See Details for more details and options."
>
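The 'grep' variant mentioned above can be sketched as follows, reusing the df5 data frame defined later in this thread. Here grep() returns integer column positions rather than a logical mask; the anchored patterns "^dose" and "^resp" are an assumption about how the columns are named. Because no 'times' are supplied, reshape() numbers the times 1 to 4.

```r
# df5 as defined in the thread: paired dose/resp columns for times 0-3
df5 <- data.frame(dose.0 = c(40, 50, 60, 50), resp.0 = c(40, 50, 60, 50),
                  dose.1 = c(1, 2, 1, 2),     resp.1 = c(1, 2, 1, 2) + 3,
                  dose.2 = c(2, 1, 2, 1),     resp.2 = c(1, 2, 1, 2) + 3,
                  dose.3 = c(3, 3, 3, 3),     resp.3 = c(1, 2, 1, 2) + 3)

# grep() returns column numbers (1, 3, 5, 7 and 2, 4, 6, 8), so each list
# element of 'varying' is a vector of indices into names(df5)
long <- reshape(df5, direction = "long", v.names = c("dose", "resp"),
                varying = list(dose = grep("^dose", names(df5)),
                               resp = grep("^resp", names(df5))))
head(long)
```

Adding more dose.k/resp.k pairs to df5 would not require any change to the call.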
> But it supposedly is the case that it can be a set of names, and in that
> case there is also a further promise of an effort to do the automagic
> splitting. Unfortunately, the magic is often unsuccessful:
>
> > reshape(df5,  direction="long",
> +       varying=c("dose", "resp")
> +      )
> Error in guess(varying) :
>   failed to guess time-varying variables from their names
>
> # Seems like it should have been possible:
> > df5
>   dose.0 resp.0 dose.1 resp.1 dose.2 resp.2 dose.3 resp.3
> 1     40     40      1      4      2      4      3      4
> 2     50     50      2      5      1      5      3      5
> 3     60     60      1      4      2      4      3      4
> 4     50     50      2      5      1      5      3      5
>
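For what it's worth, the automatic splitting appears to need the full column names (or all the column numbers) rather than just the stems. The guessing step splits each name at sep = "." into a stem and a time, and "dose" alone has nothing to split. A minimal sketch with the df5 data frame above:

```r
df5 <- data.frame(dose.0 = c(40, 50, 60, 50), resp.0 = c(40, 50, 60, 50),
                  dose.1 = c(1, 2, 1, 2),     resp.1 = c(1, 2, 1, 2) + 3,
                  dose.2 = c(2, 1, 2, 1),     resp.2 = c(1, 2, 1, 2) + 3,
                  dose.3 = c(3, 3, 3, 3),     resp.3 = c(1, 2, 1, 2) + 3)

# Supplying every time-varying column name lets reshape() guess the stems
# ("dose", "resp") and the times (0, 1, 2, 3) by splitting at sep = "."
long <- reshape(df5, direction = "long", varying = names(df5), sep = ".")
head(long)
```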
> --
> David.
>
> > -------------------------------------
> > David L Carlson
> > Associate Professor of Anthropology
> > Texas A&M University
> > College Station, TX 77840-4352
> >
> > -----Original Message-----
> > From: r-help-boun...@r-project.org
> > [mailto:r-help-boun...@r-project.org] On Behalf Of David
> > Winsemius
> > Sent: Tuesday, July 23, 2013 1:12 PM
> > To: David Winsemius
> > Cc: R help; Andrea Lamont
> > Subject: Re: [R] flexible approach to subsetting data
> >
> >
> > On Jul 23, 2013, at 10:49 AM, David Winsemius wrote:
> >
> >>
> >> On Jul 23, 2013, at 10:01 AM, Adams, Jean wrote:
> >>
> >>> Check out the reshape() function of the reshape package. Here's one
> >>> of the examples from ?reshape.
> >>>
> >>> Jean
> >>>
> >>>
> >>> library(reshape)   # No, at least not for the reshape-function
> >>
> >> The reshape function is from the 'stats' package (part of base R).
> >> The 'reshape' and 'reshape2' packages were written (at least in part)
> >> because the 'reshape'-function was so difficult to understand.
> >>
> >> If you do choose to use the reshape2 package, which is well-respected
> >> and often extremely helpful, the function you will want to start with
> >> is 'melt'.
> >>
> >>
> >>> long <- reshape(wide, direction="long")
> >>
> >> I don't think this example will be particularly helpful, since the
> >> requested direction is "long" (from "wide") and more input would be
> >> needed.
> >
> > Here's a dataset to experiment with:
> >
> > df5 <- data.frame(dose.0 = c(40, 50, 60, 50), resp.0 = c(40, 50, 60, 50),
> >                   dose.1 = c(1, 2, 1, 2),     resp.1 = c(1, 2, 1, 2) + 3,
> >                   dose.2 = c(2, 1, 2, 1),     resp.2 = c(1, 2, 1, 2) + 3,
> >                   dose.3 = c(3, 3, 3, 3),     resp.3 = c(1, 2, 1, 2) + 3)
> >
> > Notice that you would need to add the ".0" to the column names
> >
> > reshape(df5,  direction="long",
> >              v.names=c("dose", "resp"),
> >               varying=list(dose=c(1,3,5,7), resp=c(2,4,6,8) )
> >        )  # succeeds
> >
> >
> >
> > So perhaps you could use a similar call (after appending the ".0"s) with:
> >
> >  varying=list(sim=seq(1,810,by=4),
> >               X1= seq(2,810,by=4),
> >               X2= seq(3,810,by=4),
> >               X3= seq(4,810,by=4)
> >               )
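To avoid hard-coding the 810, the same list can be generated from the number of columns per imputation. This is a minimal sketch that assumes four columns per imputation, in the order sim, X1, X2, X3, with ".0" already appended to the first block. The tiny 'wide' data frame below is hypothetical and only for illustration.

```r
# Hypothetical stand-in for the imputation data: two imputations,
# four columns each, first block already suffixed with ".0"
wide <- data.frame(sim.0 = c(1, 1, 2), X1.0 = 1:3,   X2.0 = 4:6,   X3.0 = 7:9,
                   sim.1 = c(1, 1, 2), X1.1 = 11:13, X2.1 = 14:16, X3.1 = 17:19)

stems <- c("sim", "X1", "X2", "X3")   # assumed column order within each block
per_block <- length(stems)

# Column j of every block sits at positions j, j + 4, j + 8, ...
varying <- lapply(seq_len(per_block),
                  function(j) seq(j, ncol(wide), by = per_block))
names(varying) <- stems

# timevar = "m" names the imputation column; times default to 1, 2, ...
long <- reshape(wide, direction = "long", v.names = stems,
                varying = varying, timevar = "m")
long$diff <- long$X3 - long$X1
```

Only 'stems' has to change if the block layout changes. The number of imputations is picked up from ncol(wide).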
> >
> >>
> >>
> >>> wide
> >>> long
> >>>
> >>>
> >>>
> >>> On Tue, Jul 23, 2013 at 9:35 AM, Andrea Lamont <alamont...@gmail.com> wrote:
> >>>
> >>>> Hello:
> >>>>
> >>>> I am running a simulation study and am stuck with a subsetting problem.
> >>>>
> >>>> Here is the basic issue:
> >>>> I generated data and am running a simulation that uses multiple
> >>>> imputation. For each generated dataset, I used multiple imputation.
> >>>> The resultant dataset is in wide format, where each imputation is
> >>>> recorded as a separate column (though the different simulations are
> >>>> stacked). Here is an example of what it looks like:
> >>>>
> >>>> sim   X1   X2   X3   sim.1   X1.1   X2.1   X3.1
> >>>> 1     #    #    #    #       #      #      #
> >>>> 1     #    #    #    #       #      #      #
> >>>> 1     #    #    #    #       #      #      #
> >>>> 2     #    #    #    #       #      #      #
> >>>> 2     #    #    #    #       #      #      #
> >>>> 2     #    #    #    #       #      #      #
> >>>>
> >>>> sim refers to the simulated/generated dataset. X1-X3 are the values
> >>>> for the first imputed dataset; X1.1-X3.1 are the values for the
> >>>> second imputed dataset.
> >>>>
> >>>> The problem is that I want the data to be in long format, like this:
> >>>>
> >>>> sim m X1 X2 X3
> >>>> 1  1   #   #    #
> >>>> 1  2   #   #    #
> >>>> 2  1   #   #    #
> >>>> 2  2   #   #    #
> >>>>
> >>>> where m is the imputation number.
> >>>> This will allow me to do cleaner calculations (e.g. X3-X1).
> >>>>
> >>>> I know I can subset the data manually (e.g. [,1:10]), save each
> >>>> subset to a separate dataset, and then rbind them; however, I'm
> >>>> looking for a more flexible approach. This manual approach would
> >>>> become quite tedious as the number of imputations (and therefore the
> >>>> number of columns) increased (with only 10 imputations, there are
> >>>> roughly 810 columns). Also, I would like to avoid having to recode
> >>>> each time I change the number of imputations.
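The manual subset-then-rbind idea can be generalised without hand-typed column ranges by reading the imputation number off each column name. This is a minimal sketch, not a tested solution for the real data. It assumes, as in the example layout, that the first imputation's columns are unsuffixed, that later ones end in ".1", ".2", and so on, and that a 'sim' column appears in every block. The helper name stack_imputations() and the toy data are hypothetical.

```r
# Hypothetical helper: stack blocks of columns that share a ".k" suffix
stack_imputations <- function(wide) {
  nm <- names(wide)
  has_suffix <- grepl("\\.[0-9]+$", nm)
  # Imputation number m: unsuffixed columns are m = 1, ".1" is m = 2, ...
  m <- rep(1L, length(nm))
  m[has_suffix] <- as.integer(sub("^.*\\.", "", nm[has_suffix])) + 1L
  stem <- sub("\\.[0-9]+$", "", nm)      # "X1.1" -> "X1", "sim" -> "sim"
  pieces <- lapply(sort(unique(m)), function(k) {
    block <- wide[m == k]                # list-style column subset
    names(block) <- stem[m == k]
    data.frame(m = k, block)
  })
  long <- do.call(rbind, pieces)         # rbind matches columns by name
  # Order rows by simulation, then imputation; put sim and m first
  long[order(long$sim, long$m),
       c("sim", "m", setdiff(names(long), c("sim", "m")))]
}

# Toy data in the layout described above: two imputations, stacked sims
wide <- data.frame(sim = c(1, 1, 2, 2), X1 = 1:4, X2 = 5:8, X3 = 9:12,
                   sim.1 = c(1, 1, 2, 2), X1.1 = 11:14, X2.1 = 15:18,
                   X3.1 = 19:22)
long <- stack_imputations(wide)
long$X3 - long$X1
```

Because the blocks are found from the names, the same call should work unchanged if the number of imputations changes.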
> >>>>
> >>>> The same is true for the reshape function, which would require naming
> >>>> a huge number of columns and edits each time 'm' changes.
> >>
> >> If the columns are named regularly, then 'reshape' will attempt to
> >> split them properly without explicit naming. Details and a better
> >> description of the problem might allow more specific answers to
> >> emerge. The fact that the first instances have no numeric indicators
> >> may be a problem for the algorithm.
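On the point about the unsuffixed first block, one possible workaround is to append ".0" to any name that lacks a numeric suffix. reshape() can then guess the stems and times from all the names. This is a minimal sketch on hypothetical toy data, and the column layout is assumed rather than taken from the real dataset. The guessed times start at 0, so m here runs 0, 1, ... rather than 1, 2, ....

```r
wide <- data.frame(sim = c(1, 1, 2), X1 = 1:3, X2 = 4:6, X3 = 7:9,
                   sim.1 = c(1, 1, 2), X1.1 = 11:13, X2.1 = 14:16,
                   X3.1 = 17:19)

# Give the first (unsuffixed) block an explicit ".0" suffix
nm <- names(wide)
unsuffixed <- !grepl("\\.[0-9]+$", nm)
names(wide)[unsuffixed] <- paste0(nm[unsuffixed], ".0")

# With every name now of the form stem.k, reshape() can split them itself;
# the guessed times (0, 1, ...) end up in the column named by timevar
long <- reshape(wide, direction = "long", varying = names(wide),
                sep = ".", timevar = "m")
```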
> >
> >>
> >> Why not post dput(head(dfrm[ , 1:12]))?
> >>
> >> --
> >> David.
> >>
> >>>>
> >>>>
> >>>> Is there a flexible way to approach this? I'm inclined to use a for
> >>>> loop, but I know that 1) this is generally inefficient and 2) I am
> >>>> having trouble with the coding regardless.
> >>>>
> >>>> Any suggestions are appreciated.
> >>>>
> >>>> Thanks,
> >>>> Andrea
> >>>>
> >
> >
> > David Winsemius
> > Alameda, CA, USA
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible
> > code.
> >
>
> David Winsemius
> Alameda, CA, USA
>
>


-- 
Andrea Lamont, MA
Clinical-Community Psychology
University of South Carolina
Barnwell College
Columbia, SC 29208
