[R] Odp: How to see the previous commands after save workspace/load workspace ?

2011-08-04 Thread Petr PIKAL
Hi

> 
> I did save workspace and when I load it, I can see the variables,
> using ls().
> But I cannot see the commands from the program I saved. How to do
> that?

Perhaps you can check .Rhistory file.

Regards
Petr

> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sweave - landscape figure

2011-08-04 Thread Duncan Mackay

Hi Eduardo

in the preamble put

\usepackage[figureright]{rotating}

see manual for figureright if you do not like it

and then some graphics with options where needed

\begin{sidewaysfigure}
\centering
\includegraphics[width=,%
 clip=true,%
 trim=0in 0in 0in 0in,% LBRT
 keepaspectratio=true]%
 {filename}
\end{sidewaysfigure}

otherwise \usepackage landscape (check spelling) for a full page

HTH

Duncan

Duncan Mackay
Department of Agronomy and Soil Science
University of New England
ARMIDALE NSW 2351
Email: home mac...@northnet.com.au



At 05:58 05/08/2011, you wrote:

On 04/08/2011 3:40 PM, Eduardo M. A. M. Mendes wrote:

Dear R-users

I am trying to understand how Sweave works by 
running some simple examples.  In the example I 
am working with there is a chunk where the 
R-commands related to plotting a figure are 
placed.  When running R CMD Sweave … , pdflatex 
the output is a portrait figure.  I wonder 
whether it would be possible to change the 
orientation to landscape (not in the latex file but in Rnw file).


Sweave can change the height and width of the 
figure so it is more landscape-shaped (width > 
height) using options at the start of the chunk.


Rotating a figure is something LaTeX needs to 
do:  you would tell Sweave to produce the figure 
but not include it, then use \includegraphics{} 
with the right option to rotate it.


For example:

<>=
plot(rnorm(100))
@

\includegraphics[angle=90,width=0.8\textheight]{Myfig}

This is untested, and you'll need to consult a 
LaTeX reference for rotating the figure caption, etc.


Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Latent Class with covariates

2011-08-04 Thread Iasonas Lamprianou
Dear all 

Last week, I posted a question and some of you spent their valuable time to 
respond. Thank you (once again). This time, after doing a lot of homework, I 
return with  a second question on the issue.

I was advised to use poLCA for a Latent Class Analysis with covariates. The 
software works well, although I feel a bit uncomfortable with the fact that you 
need to run many times the algorithm to reach the global maxima (so how do we 
know that we replicated the analysis enough times???). This software solved one 
of my problems, but now I need a software to be able to compute probability of 
laten group membersips using ordinal "response" variables and at the same time 
fit covariates.

I have seven ordinal variables of Political Trust (scale from 0-10 where 0 
means no trust and 10 means full trust). Around 50,000 persons completed the 
questionnaire. I would like to compute the probability of membership for 4 (it 
seems that this is about the right number) of clusters. Then, I will need to 
use a small number of covariates to predict membership. Unfortunately, poLCA 
cannot compute probability of latent group membership for ordinal categorical 
"response" variables. Can aynone suggest another R package (or any other free 
package)? 

Or maybe is there  an R package to use my ordinal "response" variables to 
compute membership probabilities and then (as a second independent analysis) 
use another software to compute regression coefficients where the membership 
probabilities will be coninuous(???) dependent variables?

Thnk you for the help

Jason 


Dr. Iasonas Lamprianou
Department of Social and Political Sciences
University of Cyprus



>
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] functions on rows or columns of two (or more) arrays

2011-08-04 Thread Dennis Murphy
Hi:

Here's one approach:

a=matrix(1:50,nrow=10)
a2=floor(jitter(a,amount=50))

# Write a function to combine the columns of interest
# into a data frame and fit a linear model
regfn <- function(k) {
 rdf <- data.frame(x = a[k, ], y = a2[k, ])
 lm(y ~ x, data = rdf)
   }

# Use lapply() to run regfn() recursively along
# the rows of a and a2:
modlist <- lapply(seq_len(nrow(a)), regfn)

# I prefer plyr for extraction of output from a list of models.
# Here are a few examples:

library('plyr')
# Extract the R^2 values
ldply(modlist, function(m) summary(m)$r.squared)
# Extract the residuals
laply(modlist, function(m) resid(m))
# Extract the estimated model coefficients
ldply(modlist, function(m) coef(m))
# Extract the coefficient summary tables as a list
llply(modlist, function(m) summary(m)$coefficients)

In the anonymous functions, the argument m refers to an arbitrary lm
object, so you can do to it what you would with any given lm object;
all you're doing is abstracting the process.

HTH,
Dennis

On Thu, Aug 4, 2011 at 2:17 PM, Jim Bouldin  wrote:
> I realize this should be simple, but even after reading over the several
> help pages several times, I still cannot decide between the myriad "apply"
> functions to address it.  I simply want to apply a function to all the rows
> (or columns) of the same index from two (or more) identically sized arrays
> (or data frames).
>
> For example:
>
>> a=matrix(1:50,nrow=10)
>> a2=floor(jitter(a,amount=50))
>> a
>      [,1] [,2] [,3] [,4] [,5]
>  [1,]    1   11   21   31   41
>  [2,]    2   12   22   32   42
>  [3,]    3   13   23   33   43
>  [4,]    4   14   24   34   44
>  [5,]    5   15   25   35   45
>  [6,]    6   16   26   36   46
>  [7,]    7   17   27   37   47
>  [8,]    8   18   28   38   48
>  [9,]    9   19   29   39   49
> [10,]   10   20   30   40   50
>> a2
>      [,1] [,2] [,3] [,4] [,5]
>  [1,]   31 56 -29 -13 10
>  [2,]   38   61   71   55    9
>  [3,]  -29   38   47   12   38
>  [4,]   12    2   43   39   93
>  [5,]  -43   23  -23   62    1
>  [6,]  -13   61   55   11    2
>  [7,]  -42    1   38   12    8
>  [8,]  -13   -6  -18   16   95
>  [9,]  -19   -2   78   33    1
> [10,]   20 -16 -11 19 17
>
> if I try the following for example:
> apply(a,1,function(x) lm(a~a2))
>
> I get 10 identical repeats (except for the list indexer) of the following:
>
> [[1]]
>
> Call:
> lm(formula = a ~ a2)
>
> Coefficients:
>             [,1]       [,2]       [,3]       [,4]       [,5]
> (Intercept)   8.372135  18.372135  28.372135  38.372135  48.372135
> a21          -0.006163  -0.006163  -0.006163  -0.006163  -0.006163
> a22          -0.093390  -0.093390  -0.093390  -0.093390  -0.093390
> a23           0.009315   0.009315   0.009315   0.009315   0.009315
> a24          -0.015143  -0.015143  -0.015143  -0.015143  -0.015143
> a25          -0.026761  -0.026761  -0.026761  -0.026761  -0.026761
>
> ...Which is clearly very wrong, in a number of ways.  If I try by columns:
> apply(a,2,function(x) lm(a~a2))
> ...I get exactly the same result.
>
> So, which is the appropriate apply-type function when two arrays (or
> d.f.'s?) are involved like this? Or none of them and some other approach
> (other than looping which I can do but which I assume is not optimal)?
> Thanks for any help.
> --
> Jim Bouldin, PhD
> Research Ecologist
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Translate Sine Function in R?

2011-08-04 Thread Katrina Bennett
Hello, I'm trying to generate a sine wave in R to fit my observations using
the general formula:

y=a*sin(b[x+h*pi)]+k

where a = amplitude, b=period, h=phase shift, and k=vertical shift

I want to use following translation to bring the sine function up onto the
y-axis to range from 0-1, and this will place the wave on the x-axis from
0-pi/2.

y=1/2sin(2[x+ 1/4*pi]) + 1/2

Additionally, I need to spread this along a x-axis that spans 1-153 (days).

Can anyone help with this? I seem to be able to use the curve function fine,
but entering the translations doesn't seem to provide an answer.

Here is an example of the data set I am trying to 'match' using this
function.

dat <-
c(75.44855206,NA,NA,NA,82.70745342,82.5335019,88.56617647,80.00128866,94.15418227,86.63987539,93.91052952,74.10612245,86.62289562,90.47961047,NA,NA,82.45320197,72.14371257,NA,71.44104803,72.59742896,68.36363636,NA,NA,61,NA,NA,71.26502909,NA,85.9333,84.34248284,79.00522193,79.64223058,97.2074017,88.43700548,96.40413877,95.13511869,92.57379057,93.97498475,NA,97.55995131,89.53321146,97.21728545,93.21980198,77.54054054,95.85392575,86.25684723,97.55325624,80.03950617,NA,91.34023128,92.42906574,88.59433962,65.77272727,89.63772455,NA,NA,NA,NA,74.86344239,83.57594937,70.22516556,65.30543319,NA,NA,67.84852294,60.90909091,54.79303797,NA,52.18735363,33.47003155,NA,41.34693878,24.5047043,NA,NA,NA,NA,9.9,13.6875,NA,11.90267176,84.14285714,3.781456954,NA,1.432926829,4.26557377,1.823529412,0.444620253,4.711155378,NA,6.320284698,0.581632653,0.144578313,3.7,0,0,0,0,0,NA,0.032947462,0,0,10.54545455,0,NA,0.561007958,0.75,NA,0.048780488,0.74137931,NA,2.023339318,0,0,0,NA,NA,0.156950673,NA,0.283769634,32.81818182,NA,NA,0,NA,0,0,0,NA,0.212454212,3.120181406,NA,0.011811024,NA,0,0.120430108,5.928571429,1.75,0.679292929,0.97,NA,0,NA,NA,1,0.38547486,NA,1.460732984,0.007795889,0.05465288,0.004341534)
plot(dat/100)
par(new=F)
x.seq <- seq(100, 0, , 153)
y <- ??? #*y = 2 sin 2π (x - 1/4)* or y ~ a + c*sin(x+b)

However, I can't find a reference for the no place for k. Also, I've tried a
lot of different iterations, but can't seem to figure out how to do this in
R.

Any thoughts or ideas on this?

Thank you,

Katrina

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] source() or OS X Lion?

2011-08-04 Thread Mark Ebbert
For pete's sake, I knew that. I apologize for wasting everyone's time. I tell 
ya, this has been an off week for me.

Thank you for your kind responses.

On Aug 4, 2011, at 9:35 PM, Rolf Turner wrote:

> 
> Note that ***histogram()*** (as opposed to "hist()") is a function from the
> "lattice" package. So at some stage you must have issued the command
> "require(lattice)" or equivalently "library(lattice)".  (So your ``exact 
> code''
> is a little misleading.)
> 
> You are thus getting bitten by the fact that the output of lattice
> plot functions must be explicitly *printed* when called from within
> a function such as source().  See fortune("line 800").  They needn't
> be explicitly printed when called from the command line, which is
> why you are getting the figure you expect by typing the code at the
> R prompt.
> 
> So put
> 
> print(histogram(~Total.Change,data=x,xlab="Percent Change"))
> 
> in the file that you are source()-ing and all will be in harmony in the
> universe.
> 
> Nothing really to do either with source() or with OS X Lion.
> 
> cheers,
> 
> Rolf Turner
> 
> P. S. Apropos of nothing (but speaking of lions) everyone should read
> ``The Lion of Boaz-Jachin and Jachin-Boaz'' by Russell Hoban. :-)
> 
> R. T.
> 
> On 05/08/11 09:24, Mark Ebbert wrote:
>> Dear R Gurus,
>> 
>> I'm seeing some strange behavior that I can't explain. I'm generating a 
>> figure for a paper and I like to save the script (no matter how simple) for 
>> future reference. My practice is to write the script and run it using the 
>> 'source()' function. What's weird is that the resultant figure is not 
>> readable by OS X 10.7.0 (Lion). While trying to figure out what I did wrong, 
>> I discovered that typing the exact same code into the R prompt (running in 
>> Terminal) will produce the figure as I would expect it. The only idea I have 
>> is that something has changed in Lion that doesn't allow 'source()' to 
>> interpret it properly.
>> 
>> Any ideas? Here is the exact code I'm using:
>>   1 x<-read.delim("path/data.txt")
>>   2
>>   3 pdf("path/PaperFig-hist_of_perc_change-individ_samps-by_subtype.pdf")
>>   4 histogram(~Total.Change,data=x,xlab="Percent Change")
>>   5 dev.off()
>> 
>> I appreciate any help. I'm especially curious if there's a Lion user who 
>> could give this a try.
>> 
>> OS X 10.7.0 (Lion)
>> R version 2.13.0 (2011-04-13)
>> Copyright (C) 2011 The R Foundation for Statistical Computing
>> ISBN 3-900051-07-0
>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
> 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Not really off-topic ...

2011-08-04 Thread John Kane
 You clearly have some strange ... err, whatever.

--- On Thu, 8/4/11, Ted Harding  wrote:

> From: Ted Harding 
> Subject: [R] Not really off-topic ...
> To: r-h...@stat.math.ethz.ch
> Received: Thursday, August 4, 2011, 8:44 PM
> Greetings all!
> Just arrived via my daily digest of the ALLSTAT list
> is the following:
> 
> http://www.youtube.com/StatisticalSongs#p/u/4/JEYLfIVvR9I
> 
> If you watch/listen, you will see/hear that it is not
> at all off-topic.
> 
> And now to bed, before my correspondence is analysed.
> 
> Best wishes,
> Ted.
> 
> 
> E-Mail: (Ted Harding) 
> Fax-to-email: +44 (0)870 094 0861
> Date: 05-Aug-11           
>                
>            Time:
> 01:44:07
>
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] boolean SEXP interpretation upon function return

2011-08-04 Thread Alexandre Aguiar
Hi,

When a function returns a SEXP of type LGLSXP (logical) to signal whether 
it succeeded or failed, how is it intrepreted? Is it like C where SUCCESS 
= 0 or other value?

Thanks.


-- 


Alexandre

--
Alexandre Santos Aguiar, MD, SCT


signature.asc
Description: This is a digitally signed message part.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Which is more efficient?

2011-08-04 Thread R. Michael Weylandt
You can study this yourself using the System.time() utility: just write
System.time() around any block of code and R will time it for you.

Offhand, I'd guess example2 may be ever so slightly quicker since it doesn't
have to create colA and colB, but not to a degree that would be noticeable
for reasonably sized data. More importantly, you should probably notice that
the examples give different output: one puts just the p.value of the t.test
in tt_pvalue while the other puts the entire t.test object. You probably
meant

Example2:
tt_pvalue [ i ] <- t.test ( temp[ , j ], temp[ , k ],
var.equal=TRUE)$p.value

If you are a beginner, I'd strongly suggest you wait the extra 3.2
milliseconds and use code like example one: it will be easier to debug.

In your second block of code, you wind up t-testing a column against itself
many times and you wind up deleting many of the p.values you store. Is this
actual code or are you more interested in how something would be vectorized?
If the first, write back and I'll talk to you about storing the results and
doing the tests in a logical manner.

If you are only interested from a coding efficiency point of view, the first
for loop over all the files is probably best replaced by

L =  lapply(files_to_test, read.table, header=TRUE, sep="\t")

This will create a list object L containing all of the file information:
List objects are basically R's way of sticking any combination of objects
together in one big "super-object" that can contain anything. (I'm sure the
code experts will want to correct me, but for a beginner I think that gives
sufficient intuition.)

Once you have everything in R you have a wealth of opportunities depending
on what you want to do: there's an open thread started by J. Bouldin on how
to do things columnwise over different objects most efficiently in R right
now that will hopefully get some good answers. Let me know if there's a
specific thing you want to wind up doing and I'll try to give you a hand: if
it's just a theoretical interest, keep an eye on the other thread.

Hope this helps,

Michael Weylandt


On Thu, Aug 4, 2011 at 11:19 PM, Matt Curcio wrote:

> Greetings all,
> I am curious to know if either of these two sets of code is more efficient?
>
> Example1:
>  ## t-test ##
> colA <- temp [ , j ]
> colB <- temp [ , k ]
> ttr <- t.test ( colA, colB, var.equal=TRUE)
> tt_pvalue [ i ] <- ttr$p.value
>
> or
> Example2:
> tt_pvalue [ i ] <- t.test ( temp[ , j ], temp[ , k ], var.equal=TRUE)
> -
> I have three loops, i, j, k.
> One to test the all of  files in a directory.  One to tease out
> column  and compare it by means of t-test to column  in each of
> the files.
> ---
> for ( i in 1:num_files ) {
>   temp <- read.table ( files_to_test [ i ], header=TRUE, sep="\t")
>   num_cols <- ncol ( temp )
>   ## Define Columns To Compare ##
>   for ( j in 2 : num_cols ) {
>  for ( k in 3 : num_cols ) {
>  ## t-test ##
>  colA <- temp [ , j ]
>  colB <- temp [ , k ]
>  ttr <- t.test ( colA, colB, var.equal=TRUE)
>  tt_pvalue [ i ] <- ttr$p.value
>  }
>   }
> }
> 
> I am a novice writer of code and am interested to hear if there are
> any (dis)advantages to one way or the other.
> M
>
>
> Matt Curcio
> M: 401-316-5358
> E: matt.curcio...@gmail.com
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Efficient way of creating a shifted (lagged) variable?

2011-08-04 Thread R. Michael Weylandt
Yes, the stats package has a lag function, but it's not really appropriate
for the sample data Dimitri gave us: to wit, it's not of "ts" (time series)
class so lag doesn't know what to do with it and gives an error message.
Perhaps it's just that I never really took the time to get used to it, but
I'm not a fan of R's "ts" class

And while I do endorse the use of time series objects over data frames when
appropriate, I'd suggest going for the xts class which has greater
functionality and it's default lag seems to have a more logical default:
lagging the series in xts moves the first data point back, while it moves it
forward in the ts class.

I guess my point is: if Dimitri is working with the dates directly instead
of time series (as some of his other posts have suggested), he should stay
in the data frame type and use the lag I, Daniel, or Josh wrote: if he has
proper time series, he might as well jump right into xts.

Michael Weylandt

On Thu, Aug 4, 2011 at 7:57 PM, Ken H  wrote:

> Hey all,
>   Correct me if I'm wrong but, the 'stats' package has a lag() function
> like so
>lagged.series=lag(series,number of lags wanted)
>Furthermore, I am pretty sure that lag( ) accepts negative lags:=>
> leads.
>   lag(x,1)=> object of one lag, lag(x,-1) object with one
> lead.
>   Hope this answers your question,
>Ken
>
> On Thu, Aug 4, 2011 at 4:19 PM, Dimitri Liakhovitski <
> dimitri.liakhovit...@gmail.com> wrote:
>
> > Thanks a lot for the recommendations - some of them I am implementing
> > already.
> >
> > Just a clarification:
> > the only reason I try to compare things to SPSS is that I am the only
> > person in my office using R. Whenever I work on an R code my goal is
> > not just to make it work, but also to "boast" to the SPSS users that
> > it's much easier/faster/niftier in R. So, you are preaching to the
> > choir here.
> >
> > Dimitri
> >
> >
> > On Thu, Aug 4, 2011 at 4:02 PM, Joshua Wiley 
> > wrote:
> > >
> > >
> > > On Aug 4, 2011, at 11:46, Dimitri Liakhovitski <
> > dimitri.liakhovit...@gmail.com> wrote:
> > >
> > >> Thanks a lot, guys!
> > >> It's really helpful. But - to be objective- it's still quite a few
> > >> lines longer than in SPSS.
> > >
> > > Not once you've sources the function!  For the simple case of a vector,
> > try:
> > >
> > > X <- 1:10
> > > mylag2 <- function(X, lag) {
> > >  c(rep(NA, length(seq(lag))), X[-seq(lag)])
> > > }
> > >
> > > Though this does not work for lead, it is fairly short. Then you could
> > use the *apply family if you needed it on multiple columns or vectors.
> > >
> > > Cheers,
> > >
> > > Josh
> > >
> > >> Dimitri
> > >>
> > >> On Thu, Aug 4, 2011 at 2:36 PM, Daniel Nordlund <
> > djnordl...@frontier.com> wrote:
> > >>>
> > >>>
> >  -Original Message-
> >  From: r-help-boun...@r-project.org [mailto:
> > r-help-boun...@r-project.org]
> >  On Behalf Of Dimitri Liakhovitski
> >  Sent: Thursday, August 04, 2011 8:24 AM
> >  To: r-help
> >  Subject: [R] Efficient way of creating a shifted (lagged) variable?
> > 
> >  Hello!
> > 
> >  I have a data set:
> >  set.seed(123)
> >  y<-data.frame(week=seq(as.Date("2010-01-03"), as.Date("2011-01-
> >  31"),by="week"))
> >  y$var1<-c(1,2,3,round(rnorm(54),1))
> >  y$var2<-c(10,20,30,round(rnorm(54),1))
> > 
> >  # All I need is to create lagged variables for var1 and var2. I
> looked
> >  around a bit and found several ways of doing it. They all seem quite
> >  complicated - while in SPSS it's just a few letters (like LAG()).
> Here
> >  is what I've written but I wonder. It works - but maybe there is a
> >  very simple way of doing it in R that I could not find?
> >  I need the same for "lead" (opposite of lag).
> >  Any hint is greatly appreciated!
> > 
> >  ### The function I created:
> >  mylag <- function(x,max.lag=1){   # x has to be a 1-column data
> frame
> > temp<-
> > 
> > as.data.frame(embed(c(rep(NA,max.lag),x[[1]]),max.lag+1))[2:(max.lag+1)]
> > for(i in 1:length(temp)){
> >   names(temp)[i]<-paste(names(x),".lag",i,sep="")
> >  }
> >    return(temp)
> >  }
> > 
> >  ### Running mylag to get my result:
> >  myvars<-c("var1","var2")
> >  for(i in myvars) {
> >    y<-cbind(y,mylag(y[i]),max.lag=2)
> >  }
> >  (y)
> > 
> >  --
> >  Dimitri Liakhovitski
> >  marketfusionanalytics.com
> > 
> > >>>
> > >>> Dimitri,
> > >>>
> > >>> I would first look into the zoo package as has already been
> suggested.
> >  However, if you haven't already got your solution then here are a couple
> of
> > functions that might help you get started.  I won't vouch for efficiency.
> > >>>
> > >>>
> > >>> lag.fun <- function(df, x, max.lag=1) {
> > >>>  for(i in x) {
> > >>>for(j in 1:max.lag){
> > >>>  lagx <- paste(i,'.lag',j,sep='')
> > >>>  df[,lagx] <- 

Re: [R] source() or OS X Lion?

2011-08-04 Thread Rolf Turner


Note that ***histogram()*** (as opposed to "hist()") is a function from the
"lattice" package. So at some stage you must have issued the command
"require(lattice)" or equivalently "library(lattice)".  (So your ``exact 
code''

is a little misleading.)

You are thus getting bitten by the fact that the output of lattice
plot functions must be explicitly *printed* when called from within
a function such as source().  See fortune("line 800").  They needn't
be explicitly printed when called from the command line, which is
why you are getting the figure you expect by typing the code at the
R prompt.

So put

print(histogram(~Total.Change,data=x,xlab="Percent Change"))

in the file that you are source()-ing and all will be in harmony in the
universe.

Nothing really to do either with source() or with OS X Lion.

cheers,

Rolf Turner

P. S. Apropos of nothing (but speaking of lions) everyone should read
``The Lion of Boaz-Jachin and Jachin-Boaz'' by Russell Hoban. :-)

R. T.

On 05/08/11 09:24, Mark Ebbert wrote:

Dear R Gurus,

I'm seeing some strange behavior that I can't explain. I'm generating a figure 
for a paper and I like to save the script (no matter how simple) for future 
reference. My practice is to write the script and run it using the 'source()' 
function. What's weird is that the resultant figure is not readable by OS X 
10.7.0 (Lion). While trying to figure out what I did wrong, I discovered that 
typing the exact same code into the R prompt (running in Terminal) will produce 
the figure as I would expect it. The only idea I have is that something has 
changed in Lion that doesn't allow 'source()' to interpret it properly.

Any ideas? Here is the exact code I'm using:
   1 x<-read.delim("path/data.txt")
   2
   3 pdf("path/PaperFig-hist_of_perc_change-individ_samps-by_subtype.pdf")
   4 histogram(~Total.Change,data=x,xlab="Percent Change")
   5 dev.off()

I appreciate any help. I'm especially curious if there's a Lion user who could 
give this a try.

OS X 10.7.0 (Lion)
R version 2.13.0 (2011-04-13)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Which is more efficient?

2011-08-04 Thread Matt Curcio
Greetings all,
I am curious to know if either of these two sets of code is more efficient?

Example1:
 ## t-test ##
colA <- temp [ , j ]
colB <- temp [ , k ]
ttr <- t.test ( colA, colB, var.equal=TRUE)
tt_pvalue [ i ] <- ttr$p.value

or
Example2:
tt_pvalue [ i ] <- t.test ( temp[ , j ], temp[ , k ], var.equal=TRUE)
-
I have three loops, i, j, k.
One to test the all of  files in a directory.  One to tease out
column  and compare it by means of t-test to column  in each of
the files.
---
for ( i in 1:num_files ) {
   temp <- read.table ( files_to_test [ i ], header=TRUE, sep="\t")
   num_cols <- ncol ( temp )
   ## Define Columns To Compare ##
   for ( j in 2 : num_cols ) {
  for ( k in 3 : num_cols ) {
  ## t-test ##
  colA <- temp [ , j ]
  colB <- temp [ , k ]
  ttr <- t.test ( colA, colB, var.equal=TRUE)
  tt_pvalue [ i ] <- ttr$p.value
  }
   }
}

I am a novice writer of code and am interested to hear if there are
any (dis)advantages to one way or the other.
M


Matt Curcio
M: 401-316-5358
E: matt.curcio...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] source() or OS X Lion?

2011-08-04 Thread Dylan Beaudette
Save the result of histogram(), then print() the saved object.

On Aug 4, 2011 5:13 PM, "Mark Ebbert"  wrote:

Dear R Gurus,

I'm seeing some strange behavior that I can't explain. I'm generating a
figure for a paper and I like to save the script (no matter how simple) for
future reference. My practice is to write the script and run it using the
'source()' function. What's weird is that the resultant figure is not
readable by OS X 10.7.0 (Lion). While trying to figure out what I did wrong,
I discovered that typing the exact same code into the R prompt (running in
Terminal) will produce the figure as I would expect it. The only idea I have
is that something has changed in Lion that doesn't allow 'source()' to
interpret it properly.

Any ideas? Here is the exact code I'm using:
 1 x<-read.delim("path/data.txt")
 2
 3 pdf("path/PaperFig-hist_of_perc_change-individ_samps-by_subtype.pdf")
 4 histogram(~Total.Change,data=x,xlab="Percent Change")
 5 dev.off()

I appreciate any help. I'm especially curious if there's a Lion user who
could give this a try.

OS X 10.7.0 (Lion)
R version 2.13.0 (2011-04-13)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] source() or OS X Lion?

2011-08-04 Thread Duncan Murdoch

On 11-08-04 5:24 PM, Mark Ebbert wrote:

Dear R Gurus,

I'm seeing some strange behavior that I can't explain. I'm generating a figure 
for a paper and I like to save the script (no matter how simple) for future 
reference. My practice is to write the script and run it using the 'source()' 
function. What's weird is that the resultant figure is not readable by OS X 
10.7.0 (Lion). While trying to figure out what I did wrong, I discovered that 
typing the exact same code into the R prompt (running in Terminal) will produce 
the figure as I would expect it. The only idea I have is that something has 
changed in Lion that doesn't allow 'source()' to interpret it properly.

Any ideas? Here is the exact code I'm using:
   1 x<-read.delim("path/data.txt")
   2
   3 pdf("path/PaperFig-hist_of_perc_change-individ_samps-by_subtype.pdf")
   4 histogram(~Total.Change,data=x,xlab="Percent Change")
   5 dev.off()

I appreciate any help. I'm especially curious if there's a Lion user who could 
give this a try.


I don't have Lion, but it would be helpful to know what "not readable" 
means.  If you try to open the file in Preview, what happens?


Duncan Murdoch



OS X 10.7.0 (Lion)
R version 2.13.0 (2011-04-13)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Not really off-topic ...

2011-08-04 Thread Ted Harding
Greetings all!
Just arrived via my daily digest of the ALLSTAT list
is the following:

http://www.youtube.com/StatisticalSongs#p/u/4/JEYLfIVvR9I

If you watch/listen, you will see/hear that it is not
at all off-topic.

And now to bed, before my correspondence is analysed.

Best wishes,
Ted.


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 05-Aug-11   Time: 01:44:07
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Running a column loop through the Moran.I function.

2011-08-04 Thread R. Michael Weylandt
I'm on my phone so I can't verify this but have you tried the apply function?

apply(attri,2,Moran.I,Locate.dists.inv,NA.rm=T)

Michael Weylandt

On Aug 4, 2011, at 4:54 PM, Scott Seely  wrote:

> Dear R users,
> 
> I have two data frames that consist of statistical information for most
> countries around the world. One dataframe consists of the latitude and
> longitude ("coord.csv") of each country, while the other consists of 100's
> of different attributes ("countryattri.csv") for each country (like, GDP,
> Population, etc.). The data is organized with a header and then countries
> down the first column. I'm trying to run a spatial autocorrelation for each
> column or attribute in the "attri" dataframe. The process for running this
> on a single column is as follows:
> 
> example of "coord.csv" dataframe:
> 
> country, lat, long
> Albania, 41.00, 20.00
> Algeria, 28.00, 3.00
> Angola, -12.30, 18.30
> 
> example of "countryattri.csv" dataframe:
> 
> country, GDP, Population
> Albania, 10, 2
> Algeria, 120, 30
> Angola, 130, 30
> 
>>Locate<-read.csv("coord.csv", header = TRUE, sep = ",")
>>Locate.dists<-as.matrix(dist(cbind(Locate$long, Locate$lat)))
>>Locate.dists.inv<-1/((Locate.dists)^2)
>>diag(Locate.dists.inv)<-0
>>attri<-read.csv("countryattri.csv", header = TRUE, sep = ",")
>>Moran.I(attri$GDP, Locate.dists.inv, na.rm = TRUE)
> 
> This gives you Moran's I (correlation coefficient) for the GDP column in the
> "attri" dataframe. I am trying to run the Moran.I function as a loop for all
> columns in the "attri" dataframe so that I can obtain Moran coefficients for
> every attribute.
> 
> I've tried a number of different approaches but cannot get a column loop to
> run through this function. I would be very grateful if someone could help me
> with this problem.
> 
> Thank you,
> 
> *Scott Seely*
> p.s. the Moran.I function is a part of the "ape" package.
> 
>[[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] General indexing in multidimensional arrays

2011-08-04 Thread Jannis
Your function only works for the first dimensions (e.g. indices 
indicating the positions in the first two dimensions in datacube), correct?


Otherwise it looks very handy! And certainly more elegent than my 
function monster!


Jannis

On 08/04/2011 09:58 PM, R. Michael Weylandt wrote:

Hi Jannis,

Like Gene, I'm not entirely sure what your code is intended to do, but it
wasn't hard to adapt mine to at least print out the desired slice through
the higher-dimensional array:

cubeValues<- function(Insert, Destination, Location) {
 Location = as.matrix(Location)
 Location = array(Location,dim(Destination))
 if (is.null(Insert)) {
 Destination = round(Destination,3)
 Destination[!Location] = NA
 print(Destination)
 return(invisible())
 }
 Destination[Location]<- Insert
 return(Destination)
}

If Insert = NULL, it adopts a printing rather than value assigning behavior.


If you could indicate how you want the values when they come out, it's
pretty easy to adapt this to do whatever, but I can't just pull out
subarrays of arbitrary shape while keeping shape: e.g.,

x = matrix(1:4,2,2,byrow=T)
y = rbind(c(T,F),c(F,T))


is(x[y])

"vector"

If you just want the values in a vector, take this version of my code:

cubeValues<- function(Insert, Destination, Location) {
 Location = as.matrix(Location)
 Location = array(Location,dim(Destination))
 if (is.null(Insert)) {
 return(Destination[Location])
 }
 Destination[Location]<- Insert
 return(Destination)
}

It sounds like you've got what you want, but hopefully this will be of some
use to anyone who stumbles across this and, like Gene&  myself, doesn't
really get your code.

NB: I have not tested the provided code very much -- it relies on the
array() function to repeat Location as appropriate. If you know how R
repeats smaller arrays to make them fit big arrays, this should be fine for
you -- caveat code-or.

Michael Weylandt

PS -- The combinations() function of the gtools package might be of help to
you as well. We could get the entire example Gene got by

ans =  combinations(1:4,2,repeats.allowed=T)
rbind(cbind(ans,4),cbind(ans,1))

and it's probably not hard to simplify the entire code as desired.

On Thu, Aug 4, 2011 at 6:33 AM, Jannis  wrote:


Thanks, Michael. I was, however, after a function I coul use for both
extracting and replacing subarrays. In case anybody else stumbles over this
problem, here is my solution. Its programming is most probably horribly
clumsy:


ind.datacube = function(
##title<<  create logical index matrices for multidimensional datacubes
  datacube   ##<<  array: datacube from which to extract the subparts
  , logical.ind  ##<<  logical array: TRUE/FALSE index matrix for a subset of
the dimensions
 ##   of datacube. The size of logical.ind`s dimesnions has
to match the
 ##   sizes of the corresponding dimesnions in datacube.
  , dims='auto'  ##<<  integer vector or 'auto' : indices of the dimensions
in datacube corresponding
 ##   to the dimensions of logical.ind. If set to 'auto'
this matching is tried to
 ##   be accomplished by comparing the sizes of the
dimensions of the two objects.
)
{
if (sum(logical.ind) == 0) {
stop('No TRUE value in index matrix!')
} else {
if (dims == 'auto')
{
if (is.null(dim(logical.ind)[1])) {
size.ind = length(logical.ind)
logical.ind  = matrix(logical.ind,ncol=1)
} else {
size.ind = dim(logical.ind)
}
dims = match(size.ind, dim(datacube))
if (sum(duplicated(size.ind))>  0 || sum(duplicated(dims))>  0 )
stop('dimensions do not match unambigously. Supply dims
manually!')
}
dims.nonapply<- setdiff(1:length(dim(datacube)**),dims)
ind.matrix<- which(logical.ind, arr.ind = TRUE)

args.expand.grid<- list()
counter = 1
for (i in 1: length(dim(datacube)))
{
if (is.element(i,dims.nonapply)) {
args.expand.grid[[i]] = 1:dim(datacube)[dims.nonapply[**i]]
} else {
args.expand.grid[[i]] = ind.matrix[,counter]
counter = counter + 1
}
}

ind.all<- as.matrix(do.call(expand.grid, args.expand.grid))
ind.matrix<- ind.all[,order(c(dims.**nonapply,dims))]

}
##value<<  integer index matrix which can be used to index datacube
ind.matrix

}


On 08/04/2011 12:12 AM, R. Michael Weylandt
wrote:


  This might be a little late: but how about this (slightly clumsy)

function:

putValues<- function(Insert, Destination, Location) {
 Location = as.matrix(Location)
 Location = array(Location,dim(**Destination))
 Destination[Location]<- Insert
 return(Destination)
}

It currently assumes that the location array lines up in dimension order

[R] Running a column loop through the Moran.I function.

2011-08-04 Thread Scott Seely
Dear R users,

I have two data frames that consist of statistical information for most
countries around the world. One dataframe consists of the latitude and
longitude ("coord.csv") of each country, while the other consists of 100's
of different attributes ("countryattri.csv") for each country (like, GDP,
Population, etc.). The data is organized with a header and then countries
down the first column. I'm trying to run a spatial autocorrelation for each
column or attribute in the "attri" dataframe. The process for running this
on a single column is as follows:

example of "coord.csv" dataframe:

country, lat, long
Albania, 41.00, 20.00
Algeria, 28.00, 3.00
Angola, -12.30, 18.30

example of "countryattri.csv" dataframe:

country, GDP, Population
Albania, 10, 2
Algeria, 120, 30
Angola, 130, 30

> Locate<-read.csv("coord.csv", header = TRUE, sep = ",")
> Locate.dists<-as.matrix(dist(cbind(Locate$long, Locate$lat)))
> Locate.dists.inv<-1/((Locate.dists)^2)
> diag(Locate.dists.inv)<-0
> attri<-read.csv("countryattri.csv", header = TRUE, sep = ",")
> Moran.I(attri$GDP, Locate.dists.inv, na.rm = TRUE)

This gives you Moran's I (correlation coefficient) for the GDP column in the
"attri" dataframe. I am trying to run the Moran.I function as a loop for all
columns in the "attri" dataframe so that I can obtain Moran coefficients for
every attribute.

I've tried a number of different approaches but cannot get a column loop to
run through this function. I would be very grateful if someone could help me
with this problem.

Thank you,

*Scott Seely*
p.s. the Moran.I function is a part of the "ape" package.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] source() or OS X Lion?

2011-08-04 Thread Mark Ebbert
Dear R Gurus,

I'm seeing some strange behavior that I can't explain. I'm generating a figure 
for a paper and I like to save the script (no matter how simple) for future 
reference. My practice is to write the script and run it using the 'source()' 
function. What's weird is that the resultant figure is not readable by OS X 
10.7.0 (Lion). While trying to figure out what I did wrong, I discovered that 
typing the exact same code into the R prompt (running in Terminal) will produce 
the figure as I would expect it. The only idea I have is that something has 
changed in Lion that doesn't allow 'source()' to interpret it properly.

Any ideas? Here is the exact code I'm using:
  1 x<-read.delim("path/data.txt")
  2
  3 pdf("path/PaperFig-hist_of_perc_change-individ_samps-by_subtype.pdf")
  4 histogram(~Total.Change,data=x,xlab="Percent Change")
  5 dev.off()

I appreciate any help. I'm especially curious if there's a Lion user who could 
give this a try.

OS X 10.7.0 (Lion)
R version 2.13.0 (2011-04-13)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Efficient way of creating a shifted (lagged) variable?

2011-08-04 Thread Ken H
Hey all,
   Correct me if I'm wrong but, the 'stats' package has a lag() function
like so
lagged.series=lag(series,number of lags wanted)
Furthermore, I am pretty sure that lag( ) accepts negative lags:=>
leads.
   lag(x,1)=> object of one lag, lag(x,-1) object with one
lead.
   Hope this answers your question,
Ken

On Thu, Aug 4, 2011 at 4:19 PM, Dimitri Liakhovitski <
dimitri.liakhovit...@gmail.com> wrote:

> Thanks a lot for the recommendations - some of them I am implementing
> already.
>
> Just a clarification:
> the only reason I try to compare things to SPSS is that I am the only
> person in my office using R. Whenever I work on an R code my goal is
> not just to make it work, but also to "boast" to the SPSS users that
> it's much easier/faster/niftier in R. So, you are preaching to the
> choir here.
>
> Dimitri
>
>
> On Thu, Aug 4, 2011 at 4:02 PM, Joshua Wiley 
> wrote:
> >
> >
> > On Aug 4, 2011, at 11:46, Dimitri Liakhovitski <
> dimitri.liakhovit...@gmail.com> wrote:
> >
> >> Thanks a lot, guys!
> >> It's really helpful. But - to be objective- it's still quite a few
> >> lines longer than in SPSS.
> >
> > Not once you've sources the function!  For the simple case of a vector,
> try:
> >
> > X <- 1:10
> > mylag2 <- function(X, lag) {
> >  c(rep(NA, length(seq(lag))), X[-seq(lag)])
> > }
> >
> > Though this does not work for lead, it is fairly short. Then you could
> use the *apply family if you needed it on multiple columns or vectors.
> >
> > Cheers,
> >
> > Josh
> >
> >> Dimitri
> >>
> >> On Thu, Aug 4, 2011 at 2:36 PM, Daniel Nordlund <
> djnordl...@frontier.com> wrote:
> >>>
> >>>
>  -Original Message-
>  From: r-help-boun...@r-project.org [mailto:
> r-help-boun...@r-project.org]
>  On Behalf Of Dimitri Liakhovitski
>  Sent: Thursday, August 04, 2011 8:24 AM
>  To: r-help
>  Subject: [R] Efficient way of creating a shifted (lagged) variable?
> 
>  Hello!
> 
>  I have a data set:
>  set.seed(123)
>  y<-data.frame(week=seq(as.Date("2010-01-03"), as.Date("2011-01-
>  31"),by="week"))
>  y$var1<-c(1,2,3,round(rnorm(54),1))
>  y$var2<-c(10,20,30,round(rnorm(54),1))
> 
>  # All I need is to create lagged variables for var1 and var2. I looked
>  around a bit and found several ways of doing it. They all seem quite
>  complicated - while in SPSS it's just a few letters (like LAG()). Here
>  is what I've written but I wonder. It works - but maybe there is a
>  very simple way of doing it in R that I could not find?
>  I need the same for "lead" (opposite of lag).
>  Any hint is greatly appreciated!
> 
>  ### The function I created:
>  mylag <- function(x,max.lag=1){   # x has to be a 1-column data frame
> temp<-
> 
> as.data.frame(embed(c(rep(NA,max.lag),x[[1]]),max.lag+1))[2:(max.lag+1)]
> for(i in 1:length(temp)){
>   names(temp)[i]<-paste(names(x),".lag",i,sep="")
>  }
>    return(temp)
>  }
> 
>  ### Running mylag to get my result:
>  myvars<-c("var1","var2")
>  for(i in myvars) {
>    y<-cbind(y,mylag(y[i]),max.lag=2)
>  }
>  (y)
> 
>  --
>  Dimitri Liakhovitski
>  marketfusionanalytics.com
> 
> >>>
> >>> Dimitri,
> >>>
> >>> I would first look into the zoo package as has already been suggested.
>  However, if you haven't already got your solution then here are a couple of
> functions that might help you get started.  I won't vouch for efficiency.
> >>>
> >>>
> >>> lag.fun <- function(df, x, max.lag=1) {
> >>>  for(i in x) {
> >>>for(j in 1:max.lag){
> >>>  lagx <- paste(i,'.lag',j,sep='')
> >>>  df[,lagx] <- c(rep(NA,j),df[1:(nrow(df)-j),i])
> >>>}
> >>>  }
> >>>  df
> >>> }
> >>>
> >>> lead.fun <- function(df, x, max.lead=1) {
> >>>  for(i in x) {
> >>>for(j in 1:max.lead){
> >>>  leadx <- paste(i,'.lead',j,sep='')
> >>>  df[,leadx] <- c(df[(j+1):(nrow(df)),i],rep(NA,j))
> >>>}
> >>>  }
> >>>  df
> >>> }
> >>>
> >>> y <- lag.fun(y,myvars,2)
> >>> y <- lead.fun(y,myvars,2)
> >>>
> >>>
> >>> Hope this is helpful,
> >>>
> >>> Dan
> >>>
> >>> Daniel Nordlund
> >>> Bothell, WA USA
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >> --
> >> Dimitri Liakhovitski
> >> marketfusionanalytics.com
> >>
> >> __
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
>
>
>
> --
> Dimitri Liakhovitski
> marketfusionanalytics.com
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducibl

Re: [R] functions on rows or columns of two (or more) arrays

2011-08-04 Thread Florent D.
The apply function also works with multi-dimensional arrays, I think
this is what you want to achieve using a 3d array:

aaa <- array(NA, dim = c(2, dim(a)))
aaa[1,,] <- a
aaa[2,,] <- a2
apply(aaa, 3, function(x)lm(x[1,]~x[2,]))

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Can glmnet handle models with numeric and categorical data?

2011-08-04 Thread Paul Smith
On Fri, Aug 5, 2011 at 12:02 AM, Marc Schwartz  wrote:
>> Can the x matrix in the glmnet() function of glmnet package be a
>> data.frame with numeric columns and factor columns? I am asking this
>> because I have a model with both numeric and categorical predictors,
>> which I would like to study with glmnet. I have already tried to use a
>> data.frame, but with no success -- as far as I know, the matrix object
>> can only have data of a single type. Is there some way of
>> circumventing this problem?
>
> My recollection is that you would use ?model.matrix on the data frame to 
> create the requisite matrix input for glmnet().
>
> The caution however, is that glmnet() standardizes the input covariates, 
> which is not appropriate for factors. Thus, you would want to set 
> 'standardize = FALSE' and use appropriate methods in pre-processing 
> continuous variables.

Again, Mark, thanks a lot for your so helpful answer -- I completely
ignored model.matrix().

Paul

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Can glmnet handle models with numeric and categorical data?

2011-08-04 Thread Marc Schwartz
On Aug 4, 2011, at 5:41 PM, Paul Smith wrote:

> Dear All,
> 
> Can the x matrix in the glmnet() function of glmnet package be a
> data.frame with numeric columns and factor columns? I am asking this
> because I have a model with both numeric and categorical predictors,
> which I would like to study with glmnet. I have already tried to use a
> data.frame, but with no success -- as far as I know, the matrix object
> can only have data of a single type. Is there some way of
> circumventing this problem?
> 
> Thanks in advance,
> 
> Paul

Hi Paul,

My recollection is that you would use ?model.matrix on the data frame to create 
the requisite matrix input for glmnet().

The caution however, is that glmnet() standardizes the input covariates, which 
is not appropriate for factors. Thus, you would want to set 'standardize = 
FALSE' and use appropriate methods in pre-processing continuous variables.

HTH,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Can glmnet handle models with numeric and categorical data?

2011-08-04 Thread Paul Smith
Dear All,

Can the x matrix in the glmnet() function of glmnet package be a
data.frame with numeric columns and factor columns? I am asking this
because I have a model with both numeric and categorical predictors,
which I would like to study with glmnet. I have already tried to use a
data.frame, but with no success -- as far as I know, the matrix object
can only have data of a single type. Is there some way of
circumventing this problem?

Thanks in advance,

Paul

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Automatic creation of binary logistic models

2011-08-04 Thread Paul Smith
On Thu, Aug 4, 2011 at 9:35 PM, Marc Schwartz  wrote:
>> Suppose that you are trying to create a binary logistic model by
>> trying different combinations of predictors. Has R got an automatic
>> way of doing this, i.e., is there some way of automatically generating
>> different tentative models and checking their corresponding AIC value?
>> If so, could you please direct me to an example?
>
> Hi Paul,
>
> If it were not for JSS going on at the moment, you would likely get a reply 
> from Frank Harrell telling you why using this approach is not a good idea. 
> This is tantamount to using a stepwise approach with variables going in and 
> out of the model, based upon either AIC or perhaps Wald p values.
>
> If you search the R list archives using rseek.org with keywords such as 
> "stepwise regression Harrell", you will see a plethora of discussions on this 
> over the years.
>
> You might want to obtain a copy of Frank's book Regression Modeling 
> Strategies along with Ewout Steyerberg's book Clinical Prediction Models, 
> which cover this topic and offer alternative solutions to model development. 
> These generally include the pre-specification of full models, considering how 
> many covariate degrees of freedom you can reasonably include in the model and 
> applying shrinkage/penalization.
>
> If you need to engage in data reduction, you might want to consider using the 
> LASSO, as implemented in the glmnet package on CRAN. More information on this 
> method is available at: http://www-stat.stanford.edu/~tibs/lasso.html. An 
> alternative might be backward elimination, which Frank does touch on and 
> covers in:
>
>  http://biostat.mc.vanderbilt.edu/wiki/pub/Main/RmS/rms.pdf
>
> which is a supplement to his course.
>
> Automated creation of models ignores the expertise of both the statistician 
> and subject matter experts, to the detriment of inference.

Thanks, Marc, for your very useful reply.

Paul

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] General indexing in multidimensional arrays

2011-08-04 Thread R. Michael Weylandt
As currently implemented it works for both "Location" and "Destination" (I
really should think of some better variable names) of any dimensionality,
but it does assume that if say, you give a 2d location, those correspond to
the first two dimensions of destination.

But if you give 1d or 3d, e.g., for a 5d destination, it can fill out as
appropriate.

If you want to say give it the 2nd and 3rd dimensions, I'd suggest you put
an optional argument and command using the aperm() function immediately
after Location = array(Location,dim(Destination)).

e.g.,

cubeValue <- function(Insert, Destination, Location, permutation = NULL) {

if (!is.null(permutation)) {Location = aperm(Location, permutation)}

}

You might need to play with aperm to get a sense of how it "thinks" but that
should do it.

Alternatively, you can put in something like theseDims = c(2,3) and then
write a line or two that figures out the appropriate permutation from there.
In 3D, it's not hard but I'm not sure if there's a single obvious way to
generalize to higher dimensions: if I have some time to think about it I'll
get back to you.

# for 3D
permutation = c(theseDims, setdiff(1:3,theseDims))

Michael Weylandt


On Thu, Aug 4, 2011 at 5:54 PM, Jannis  wrote:

> Your function only works for the first dimensions (e.g. indices indicating
> the positions in the first two dimensions in datacube), correct?
>
> Otherwise it looks very handy! And certainly more elegent than my function
> monster!
>
> Jannis
>
>
> On 08/04/2011 09:58 PM, R. Michael Weylandt wrote:
>
>> Hi Jannis,
>>
>> Like Gene, I'm not entirely sure what your code is intended to do, but it
>> wasn't hard to adapt mine to at least print out the desired slice through
>> the higher-dimensional array:
>>
>> cubeValues<- function(Insert, Destination, Location) {
>> Location = as.matrix(Location)
>> Location = array(Location,dim(**Destination))
>> if (is.null(Insert)) {
>> Destination = round(Destination,3)
>> Destination[!Location] = NA
>> print(Destination)
>> return(invisible())
>> }
>> Destination[Location]<- Insert
>> return(Destination)
>> }
>>
>> If Insert = NULL, it adopts a printing rather than value assigning
>> behavior.
>>
>>
>> If you could indicate how you want the values when they come out, it's
>> pretty easy to adapt this to do whatever, but I can't just pull out
>> subarrays of arbitrary shape while keeping shape: e.g.,
>>
>> x = matrix(1:4,2,2,byrow=T)
>> y = rbind(c(T,F),c(F,T))
>>
>>  is(x[y])
>>>
>> "vector"
>>
>> If you just want the values in a vector, take this version of my code:
>>
>> cubeValues<- function(Insert, Destination, Location) {
>> Location = as.matrix(Location)
>> Location = array(Location,dim(**Destination))
>> if (is.null(Insert)) {
>> return(Destination[Location])
>> }
>> Destination[Location]<- Insert
>> return(Destination)
>> }
>>
>> It sounds like you've got what you want, but hopefully this will be of
>> some
>> use to anyone who stumbles across this and, like Gene&  myself, doesn't
>> really get your code.
>>
>> NB: I have not tested the provided code very much -- it relies on the
>> array() function to repeat Location as appropriate. If you know how R
>> repeats smaller arrays to make them fit big arrays, this should be fine
>> for
>> you -- caveat code-or.
>>
>> Michael Weylandt
>>
>> PS -- The combinations() function of the gtools package might be of help
>> to
>> you as well. We could get the entire example Gene got by
>>
>> ans =  combinations(1:4,2,repeats.**allowed=T)
>> rbind(cbind(ans,4),cbind(ans,**1))
>>
>> and it's probably not hard to simplify the entire code as desired.
>>
>> On Thu, Aug 4, 2011 at 6:33 AM, Jannis  wrote:
>>
>>  Thanks, Michael. I was, however, after a function I coul use for both
>>> extracting and replacing subarrays. In case anybody else stumbles over
>>> this
>>> problem, here is my solution. Its programming is most probably horribly
>>> clumsy:
>>>
>>>
>>> ind.datacube = function(
>>> ##title<<  create logical index matrices for multidimensional datacubes
>>>  datacube   ##<<  array: datacube from which to extract the subparts
>>>  , logical.ind  ##<<  logical array: TRUE/FALSE index matrix for a subset
>>> of
>>> the dimensions
>>> ##   of datacube. The size of logical.ind`s dimesnions
>>> has
>>> to match the
>>> ##   sizes of the corresponding dimesnions in datacube.
>>>  , dims='auto'  ##<<  integer vector or 'auto' : indices of the
>>> dimensions
>>> in datacube corresponding
>>> ##   to the dimensions of logical.ind. If set to 'auto'
>>> this matching is tried to
>>> ##   be accomplished by comparing the sizes of the
>>> dimensions of the two objects.
>>> )
>>> {
>>>if (sum(logical.ind) == 0) {
>>>stop('No TRUE value in index matrix!')
>>>} else {
>>>if (dims == 'auto')
>>>{
>>>if (i

Re: [R] matrix rows to single numeric element

2011-08-04 Thread R. Michael Weylandt
> This seems to work:
>
> apply(mat1,1,function(x){paste(x,collapse="")})
>
> The collapse command inside of paste is (I think) the easiest way to
> combine strings.
>
> Michael Weylandt
>
>
> On Thu, Aug 4, 2011 at 5:45 PM, Wegan, Michael (DNRE)  > wrote:
>
>> I have a matrix of 5 columns and 64 rows, let's call it "mat1".  All
>> values are 1 or 0.  I need to take the values of the elements by row and
>> create a single numeric element that can be placed its respective slot in a
>> 64-element list, named "list1".  For example, mat1[11,1:5] = 0,1,1,0,1 and I
>> must put 01101, with a length of 1, into the 11th element of list1.  I can
>> create the code for an iterative process, but am having a great deal of
>> difficulty in figuring out how to create a single element from the row
>> values, especially when the first value is 0, as in my example.
>>
>>
>> Thanks in advance.
>>
>>[[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] survival probability estimate method

2011-08-04 Thread array chip
Dear Peter,

Thanks very much for the references. It seems the method is based on parametric 
proportional hazard models by incorporating cubic spline of the baseline 
hazards, not sure if tweaking survreg() would do?

Best,

John


- Original Message -
From: Peter Jepsen 
To: 'array chip' ; r-help 
Cc: 
Sent: Thursday, August 4, 2011 1:09 PM
Subject: SV: [R] survival probability estimate method

Dear John

I am not aware of an R package that does this, but I believe that Patrick 
Royston's -stpm- function for Stata does. Here's two references found in 
http://www.stata-journal.com/sjpdf.html?articlenum=st0001_2: 
Royston, P. 2001. Flexible parametric alternatives to the Cox model. Stata 
Journal 1(1): 1-28.
Royston, P. and M. K. B. Parmar. 2002. Flexible parametric-hazards and 
proportional odds models for censored survival data, with application to 
prognostic modelling and estimation of treatment effects. Statistics in 
Medicine 21: 2175-2197.

Best regards,
Peter.

-Oprindelig meddelelse-
Fra: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] På 
vegne af array chip
Sendt: 4. august 2011 20:45
Til: r-help
Emne: [R] survival probability estimate method

Hi, I was reading a paper published in JCO "Prediction of risk of distant 
recurrence using 21-gene recurrence score in node-negative and node-positive 
postmenopausal patients with breast cancer treated with anastrozole or 
tamoxifen: a TransATAC study" (ICO 2010 28: 1829). The author uses a method to 
estimate the 9-year risk of distant recurrence as a function of continuous 
recurrence score (RS). The method is special as author states:
 
"To define the continuous relation between RS, as a linear covariate, and 
9-year risk of distant recurrence, the logarithm of the baseline cumulative 
hazard function was fitted by constrained cubic splines with 3 df. These models 
tend to be more robust for prediction of survival probabilities and 
corresponding confidence limits at late follow-up time as a result of the 
modeling of the baseline cumulative hazard function by natural cubic splines 
(in contrast to using the crude hazard function itself)."
 
Does R provide a package/function to do this particular method for estimating 
survival probability as a function of a continuous variable? Is the 
survest.cph() in rms package doing estimation with just the crude hazard 
function?
 
Thanks very much!
 
John

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] matrix rows to single numeric element

2011-08-04 Thread Wegan, Michael (DNRE)
I have a matrix of 5 columns and 64 rows, let's call it "mat1".  All values are 
1 or 0.  I need to take the values of the elements by row and create a single 
numeric element that can be placed its respective slot in a 64-element list, 
named "list1".  For example, mat1[11,1:5] = 0,1,1,0,1 and I must put 01101, 
with a length of 1, into the 11th element of list1.  I can create the code for 
an iterative process, but am having a great deal of difficulty in figuring out 
how to create a single element from the row values, especially when the first 
value is 0, as in my example.


Thanks in advance.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] multi-dimensional Gaussian quadrature

2011-08-04 Thread Lei Liu

Hi there,

Does anyone know if there is a package in R for multi-demensional 
Gaussian quadrature? I checked out the package "gaussquad" but it can 
only do 1D. Thanks!


Lei Liu
Associate Professor
Division of Biostatistics
Department of Public Health Sciences
University of Virginia School of Medicine

http://people.virginia.edu/~ll9f/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] functions on rows or columns of two (or more) arrays

2011-08-04 Thread R. Michael Weylandt
I hope someone experience with plyr package comes and helps because this
sounds like what it does well, but for your specific example something like
this works:

A = rbind(a,a2)
q = apply(A,2,function(x) {lm(x[1:nrow(a)] ~ x[-(1:nrow(a))])})

but yeah, that's pretty rough so I hope someone can come up with something
more elegant.

If nothing else, I think that idea can be made to work in most
circumstances: put it together, then break it apart inside the function
passed to apply.

Michael Weylandt

On Thu, Aug 4, 2011 at 5:17 PM, Jim Bouldin  wrote:

> I realize this should be simple, but even after reading over the several
> help pages several times, I still cannot decide between the myriad "apply"
> functions to address it.  I simply want to apply a function to all the rows
> (or columns) of the same index from two (or more) identically sized arrays
> (or data frames).
>
> For example:
>
> > a=matrix(1:50,nrow=10)
> > a2=floor(jitter(a,amount=50))
> > a
>  [,1] [,2] [,3] [,4] [,5]
>  [1,]1   11   21   31   41
>  [2,]2   12   22   32   42
>  [3,]3   13   23   33   43
>  [4,]4   14   24   34   44
>  [5,]5   15   25   35   45
>  [6,]6   16   26   36   46
>  [7,]7   17   27   37   47
>  [8,]8   18   28   38   48
>  [9,]9   19   29   39   49
> [10,]   10   20   30   40   50
> > a2
>  [,1] [,2] [,3] [,4] [,5]
>  [1,]   31 56 -29 -13 10
>  [2,]   38   61   71   559
>  [3,]  -29   38   47   12   38
>  [4,]   122   43   39   93
>  [5,]  -43   23  -23   621
>  [6,]  -13   61   55   112
>  [7,]  -421   38   128
>  [8,]  -13   -6  -18   16   95
>  [9,]  -19   -2   78   331
> [10,]   20 -16 -11 19 17
>
> if I try the following for example:
> apply(a,1,function(x) lm(a~a2))
>
> I get 10 identical repeats (except for the list indexer) of the following:
>
> [[1]]
>
> Call:
> lm(formula = a ~ a2)
>
> Coefficients:
> [,1]   [,2]   [,3]   [,4]   [,5]
> (Intercept)   8.372135  18.372135  28.372135  38.372135  48.372135
> a21  -0.006163  -0.006163  -0.006163  -0.006163  -0.006163
> a22  -0.093390  -0.093390  -0.093390  -0.093390  -0.093390
> a23   0.009315   0.009315   0.009315   0.009315   0.009315
> a24  -0.015143  -0.015143  -0.015143  -0.015143  -0.015143
> a25  -0.026761  -0.026761  -0.026761  -0.026761  -0.026761
>
> ...Which is clearly very wrong, in a number of ways.  If I try by columns:
> apply(a,2,function(x) lm(a~a2))
> ...I get exactly the same result.
>
> So, which is the appropriate apply-type function when two arrays (or
> d.f.'s?) are involved like this? Or none of them and some other approach
> (other than looping which I can do but which I assume is not optimal)?
> Thanks for any help.
> --
> Jim Bouldin, PhD
> Research Ecologist
>
>[[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] survival probability estimate method

2011-08-04 Thread array chip
Hi all, the reference for this method was:

“Flexible parametric proportional-hazards and proportional-odds models for 
censored survival data, with application to prognostic modeling and estimation 
of treatment effects” published in Stat Med (2002) 21: 2175
 
The abstract is:
 
Modelling of censored survival data is almost always done by Cox 
proportional-hazards regression. However, use of parametric models for such 
data may have some advantages. For example, non-proportional hazards, a 
potential difficulty with Cox models, may sometimes be handled in a simple way, 
and visualization of the hazard function is much easier. Extensions of the 
Weibull and log-logistic models are proposed in which natural cubic splines are 
used to smooth the baseline log cumulative hazard and log cumulative odds of 
failure functions. Further extensions to allow non-proportional effects of some 
or all of the covariates are introduced. A hypothesis test of the 
appropriateness of the scale chosen for covariate effects (such as of 
treatment) is proposed. The new models are applied to two data sets in cancer. 
The results throw interesting light on the behaviour of both the hazard 
function and the hazard ratio over time. The tools described here may be a
 step towards providing greater insight into the natural history of the disease 
and into possible underlying causes of clinical events. We illustrate these 
aspects by using the two examples in cancer.
 
Hope this helps someone give me some hints how to do this in R.
 
Thanks
 
John
 
- Original Message -
From: array chip 
To: r-help 
Cc: 
Sent: Thursday, August 4, 2011 11:44 AM
Subject: [R] survival probability estimate method

Hi, I was reading a paper published in JCO "Prediction of risk of distant 
recurrence using 21-gene recurrence score in node-negative and node-positive 
postmenopausal patients with breast cancer treated with anastrozole or 
tamoxifen: a TransATAC study" (ICO 2010 28: 1829). The author uses a method to 
estimate the 9-year risk of distant recurrence as a function of continuous 
recurrence score (RS). The method is special as author states:
 
"To define the continuous relation between RS, as a linear covariate, and 
9-year risk of distant recurrence, the logarithm of the baseline cumulative 
hazard function was fitted by constrained cubic splines with 3 df. These models 
tend to be more robust for prediction of survival probabilities and 
corresponding confidence limits at late follow-up time as a result of the 
modeling of the baseline cumulative hazard function by natural cubic splines 
(in contrast to using the crude hazard function itself)."
 
Does R provide a package/function to do this particular method for estimating 
survival probability as a function of a continuous variable? Is the 
survest.cph() in rms package doing estimation with just the crude hazard 
function?
 
Thanks very much!
 
John

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] functions on rows or columns of two (or more) arrays

2011-08-04 Thread Jim Bouldin
I realize this should be simple, but even after reading over the several
help pages several times, I still cannot decide between the myriad "apply"
functions to address it.  I simply want to apply a function to all the rows
(or columns) of the same index from two (or more) identically sized arrays
(or data frames).

For example:

> a=matrix(1:50,nrow=10)
> a2=floor(jitter(a,amount=50))
> a
  [,1] [,2] [,3] [,4] [,5]
 [1,]1   11   21   31   41
 [2,]2   12   22   32   42
 [3,]3   13   23   33   43
 [4,]4   14   24   34   44
 [5,]5   15   25   35   45
 [6,]6   16   26   36   46
 [7,]7   17   27   37   47
 [8,]8   18   28   38   48
 [9,]9   19   29   39   49
[10,]   10   20   30   40   50
> a2
  [,1] [,2] [,3] [,4] [,5]
 [1,]   31   56  -29  -13   10
 [2,]   38   61   71   559
 [3,]  -29   38   47   12   38
 [4,]   122   43   39   93
 [5,]  -43   23  -23   621
 [6,]  -13   61   55   112
 [7,]  -421   38   128
 [8,]  -13   -6  -18   16   95
 [9,]  -19   -2   78   331
[10,]   20  -16  -11   19   17

if I try the following for example:
apply(a,1,function(x) lm(a~a2))

I get 10 identical repeats (except for the list indexer) of the following:

[[1]]

Call:
lm(formula = a ~ a2)

Coefficients:
 [,1]   [,2]   [,3]   [,4]   [,5]
(Intercept)   8.372135  18.372135  28.372135  38.372135  48.372135
a21  -0.006163  -0.006163  -0.006163  -0.006163  -0.006163
a22  -0.093390  -0.093390  -0.093390  -0.093390  -0.093390
a23   0.009315   0.009315   0.009315   0.009315   0.009315
a24  -0.015143  -0.015143  -0.015143  -0.015143  -0.015143
a25  -0.026761  -0.026761  -0.026761  -0.026761  -0.026761

...Which is clearly very wrong, in a number of ways.  If I try by columns:
apply(a,2,function(x) lm(a~a2))
...I get exactly the same result.

So, which is the appropriate apply-type function when two arrays (or
d.f.'s?) are involved like this? Or none of them and some other approach
(other than looping which I can do but which I assume is not optimal)?
Thanks for any help.
-- 
Jim Bouldin, PhD
Research Ecologist

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] General indexing in multidimensional arrays

2011-08-04 Thread Gene Leynes
This function you wrote is interesting, but I almost missed it since I
didn't see a direct example!

cubeValues(NULL, data, indices)
cubeValues(1, data, indices)
cubeValues(c(1,2), data, indices)

(sorry if I'm beating a dead horse here)

On Thu, Aug 4, 2011 at 2:58 PM, R. Michael Weylandt <
michael.weyla...@gmail.com> wrote:

> Hi Jannis,
>
> Like Gene, I'm not entirely sure what your code is intended to do, but it
> wasn't hard to adapt mine to at least print out the desired slice through
> the higher-dimensional array:
>
> cubeValues <- function(Insert, Destination, Location) {
> Location = as.matrix(Location)
>Location = array(Location,dim(Destination))
> if (is.null(Insert)) {
>Destination = round(Destination,3)
>Destination[!Location] = NA
>print(Destination)
>return(invisible())
> }
>Destination[Location] <- Insert
>return(Destination)
> }
>
> If Insert = NULL, it adopts a printing rather than value assigning
> behavior.
>
>
> If you could indicate how you want the values when they come out, it's
> pretty easy to adapt this to do whatever, but I can't just pull out
> subarrays of arbitrary shape while keeping shape: e.g.,
>
> x = matrix(1:4,2,2,byrow=T)
> y = rbind(c(T,F),c(F,T))
>
> > is(x[y])
> "vector"
>
> If you just want the values in a vector, take this version of my code:
>
> cubeValues <- function(Insert, Destination, Location) {
> Location = as.matrix(Location)
>Location = array(Location,dim(Destination))
> if (is.null(Insert)) {
>return(Destination[Location])
> }
>Destination[Location] <- Insert
>return(Destination)
> }
>
> It sounds like you've got what you want, but hopefully this will be of some
> use to anyone who stumbles across this and, like Gene & myself, doesn't
> really get your code.
>
> NB: I have not tested the provided code very much -- it relies on the
> array() function to repeat Location as appropriate. If you know how R
> repeats smaller arrays to make them fit big arrays, this should be fine for
> you -- caveat code-or.
>
> Michael Weylandt
>
> PS -- The combinations() function of the gtools package might be of help to
> you as well. We could get the entire example Gene got by
>
> ans =  combinations(1:4,2,repeats.allowed=T)
> rbind(cbind(ans,4),cbind(ans,1))
>
> and it's probably not hard to simplify the entire code as desired.
>
> On Thu, Aug 4, 2011 at 6:33 AM, Jannis  wrote:
>
> > Thanks, Michael. I was, however, after a function I coul use for both
> > extracting and replacing subarrays. In case anybody else stumbles over
> this
> > problem, here is my solution. Its programming is most probably horribly
> > clumsy:
> >
> >
> > ind.datacube = function(
> > ##title<< create logical index matrices for multidimensional datacubes
> >  datacube   ##<< array: datacube from which to extract the subparts
> >  , logical.ind  ##<< logical array: TRUE/FALSE index matrix for a subset
> of
> > the dimensions
> > ##   of datacube. The size of logical.ind`s dimesnions
> has
> > to match the
> > ##   sizes of the corresponding dimesnions in datacube.
> >  , dims='auto'  ##<< integer vector or 'auto' : indices of the dimensions
> > in datacube corresponding
> > ##   to the dimensions of logical.ind. If set to 'auto'
> > this matching is tried to
> > ##   be accomplished by comparing the sizes of the
> > dimensions of the two objects.
> > )
> > {
> >if (sum(logical.ind) == 0) {
> >stop('No TRUE value in index matrix!')
> >} else {
> >if (dims == 'auto')
> >{
> >if (is.null(dim(logical.ind)[1])) {
> >size.ind = length(logical.ind)
> >logical.ind  = matrix(logical.ind,ncol=1)
> >} else {
> >size.ind = dim(logical.ind)
> >}
> >dims = match(size.ind, dim(datacube))
> >if (sum(duplicated(size.ind)) > 0 || sum(duplicated(dims)) > 0
> )
> >stop('dimensions do not match unambigously. Supply dims
> > manually!')
> >}
> >dims.nonapply <- setdiff(1:length(dim(datacube)**),dims)
> >ind.matrix <- which(logical.ind, arr.ind = TRUE)
> >
> >args.expand.grid <- list()
> >counter = 1
> >for (i in 1: length(dim(datacube)))
> >{
> >if (is.element(i,dims.nonapply)) {
> >args.expand.grid[[i]] =
> 1:dim(datacube)[dims.nonapply[**i]]
> >} else {
> >args.expand.grid[[i]] = ind.matrix[,counter]
> >counter = counter + 1
> >}
> >}
> >
> >ind.all <- as.matrix(do.call(expand.grid, args.expand.grid))
> >ind.matrix <- ind.all[,order(c(dims.**nonapply,dims))]
> >
> >}
> >##value<< integer index matrix which can be used to index datacube
> >ind.matrix
> >
> > }
> >
> >
> > On 08/04/2011 12:12 AM, R. Michael Weylandt 
> > wrote:
> >
> >

[R] Multi-task SVM in R

2011-08-04 Thread Vishal Thapar
Hi,

Does anyone know of a package that implements Multi-task SVM in R? I found
one in Matlab :
http://www.ece.umn.edu/users/cherkass/predictive_learning/Resources/MTL_Software_Description.pdf

I was wondering if there is something similar in R?

Thank you for your help.

Sincerely,

Vishal

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Multiple endpoint (possibly group sequential) sample size calculation

2011-08-04 Thread Marc Schwartz
On Aug 4, 2011, at 2:03 PM, Paul Miller wrote:

> 
> Hello everyone,
> 
> I need to do a sample size calculation. The study two arms and two endpoints. 
> The two arms are two different cancer drugs and the two endpoints reflect 
> efficacy (based on progression free survival) and toxicity. 
> 
> Until now, I have been trying to understand this in terms of a one-arm 
> design, where the acceptable rate of efficacy might be 0.40, the unacceptable 
> rate of efficacy might be 0.20, the acceptable rate of non-toxicity might be 
> 0.85, and the unacceptable rate of non-toxicity might be 0.65. Then one would 
> pick an alpha for the probability of accepting a poor response, an alpha for 
> the probability of accepting a toxic drug, and a beta for the probability of 
> rejecting a good drug.
> 
> I'm not really sure how that sort of thinking translates into a two-arm 
> design though.
> 
> Ideally, I'd like the calculation to be based on a group sequential design 
> with two stages, but that's certainly not necessary, and I'd be very happy to 
> learn how to do things both with and without this extra element.
> 
> Any help with this would be greatly appreciated.
> 
> Thanks,
> 
> Paul

Hi Paul,

I am not clear that there is a current R package that will handle both of these 
considerations, though will stand to be corrected if wrong.

A Google search using "sample size calculation multiple endpoints" yields some 
possible theoretical papers that might be helpful, such as:

  http://www.ncbi.nlm.nih.gov/pubmed/20687162
  http://www.ncbi.nlm.nih.gov/pubmed/17674404

I have used the gsDesign package on CRAN to assist with group sequential 
designs with a single primary endpoint. There was just a review in this month's 
The American Statistician which covered several software implementations, 
including gsDesign:

  http://pubs.amstat.org/doi/abs/10.1198/tast.2011.10213

There is also a CRAN Task View that might be helpful:

  http://cran.r-project.org/web/views/ClinicalTrials.html

My various books on group sequential designs are at home, so am unable to check 
them at the moment, but pretty sure that at least Jennison and Turnbull (1999) 
have a chapter on multiple endpoints.

HTH,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to see the previous commands after save workspace/load workspace ?

2011-08-04 Thread R. Michael Weylandt
Check out ?savehistory in utils.

Also, I think most GUI's do this - I can certainly attest that RStudio does

Michael Weylandt

On Thu, Aug 4, 2011 at 11:55 AM, Bogdan Lataianu wrote:

> I did save workspace and when I load it, I can see the variables,
> using ls().
> But I cannot see the commands from the program I saved. How to do
> that?
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] General indexing in multidimensional arrays

2011-08-04 Thread Gene Leynes
(I was just about to send this before your last reply)

As a more general tip, I find that using %in% is easier to understand than
"is.element" and "setdiff" because it can be more universal, especially when
combined with "all" and the "!".
To me the functions like setdiff are easier to read, but harder to
interpret

Example:
x = letters[1:5]
y = letters[3:14]

is.element(x,y)  ## Similar to is.element(i,dims.nonapply)
x %in% y ## Same as i %in% dims.nonapply

all(x %in% y[1:2])
all(y[1:2] %in% x)

## Similar to setdiff(1:length(dim(datacube)),dims)
setdiff(x,y)
x[!x %in% y]


On Thu, Aug 4, 2011 at 5:33 AM, Jannis  wrote:

> Thanks, Michael. I was, however, after a function I coul use for both
> extracting and replacing subarrays. In case anybody else stumbles over this
> problem, here is my solution. Its programming is most probably horribly
> clumsy:
>
>
> ind.datacube = function(
> ##title<< create logical index matrices for multidimensional datacubes
>  datacube   ##<< array: datacube from which to extract the subparts
>  , logical.ind  ##<< logical array: TRUE/FALSE index matrix for a subset of
> the dimensions
> ##   of datacube. The size of logical.ind`s dimesnions has
> to match the
> ##   sizes of the corresponding dimesnions in datacube.
>  , dims='auto'  ##<< integer vector or 'auto' : indices of the dimensions
> in datacube corresponding
> ##   to the dimensions of logical.ind. If set to 'auto'
> this matching is tried to
> ##   be accomplished by comparing the sizes of the
> dimensions of the two objects.
> )
> {
>if (sum(logical.ind) == 0) {
>stop('No TRUE value in index matrix!')
>} else {
>if (dims == 'auto')
>{
>if (is.null(dim(logical.ind)[1])) {
>size.ind = length(logical.ind)
>logical.ind  = matrix(logical.ind,ncol=1)
>} else {
>size.ind = dim(logical.ind)
>}
>dims = match(size.ind, dim(datacube))
>if (sum(duplicated(size.ind)) > 0 || sum(duplicated(dims)) > 0 )
>stop('dimensions do not match unambigously. Supply dims
> manually!')
>}
>dims.nonapply <- setdiff(1:length(dim(datacube)**),dims)
>ind.matrix <- which(logical.ind, arr.ind = TRUE)
>
>args.expand.grid <- list()
>counter = 1
>for (i in 1: length(dim(datacube)))
>{
>if (is.element(i,dims.nonapply)) {
>args.expand.grid[[i]] = 1:dim(datacube)[dims.nonapply[**i]]
>} else {
>args.expand.grid[[i]] = ind.matrix[,counter]
>counter = counter + 1
>}
>}
>
>ind.all <- as.matrix(do.call(expand.grid, args.expand.grid))
>ind.matrix <- ind.all[,order(c(dims.**nonapply,dims))]
>
>}
>##value<< integer index matrix which can be used to index datacube
>ind.matrix
>
> }
>
>
> On 08/04/2011 12:12 AM, R. Michael Weylandt 
> wrote:
>
>> This might be a little late: but how about this (slightly clumsy)
>>> function:
>>>
>>> putValues<- function(Insert, Destination, Location) {
>>> Location = as.matrix(Location)
>>> Location = array(Location,dim(**Destination))
>>> Destination[Location]<- Insert
>>> return(Destination)
>>> }
>>>
>>> It currently assumes that the location array lines up in dimension order,
>>> but other than that seems to work pretty well. If you want, it shouldn't
>>> be
>>> hard to change it to take in a set of dimensions to arrange Location
>>> along.
>>> If you like any of the other suggested behaviors, you could put in a
>>> is.null(Insert) option that returns the desired subset of values. I
>>> haven't
>>> tested it completely, but for a few sample inputs, it seems be do as
>>> desired.
>>>
>>> Michael
>>>
>>>
>>> On Wed, Aug 3, 2011 at 5:00 PM, Jannis  wrote:
>>>
>>>  Thanks for all the replies!Unfortunately the solutions only work for
 extracting subsets of the data (which was exactly what I was asking for)
 and
 not to replace subsets with other values. I used them, however, to
 program a
 rather akward function to do that. Seems I found one of the few aspects
 where Matlab actually is slightly easier to use than R.


 Thanks for your help!
 Jannis

 On 08/01/2011 05:50 PM, Gene Leynes wrote:

  What do you think about this?
>
> apply(data, 3, '[', indices)
>
>
> On Mon, Aug 1, 2011 at 4:38 AM, Jannis   wrote:
>
>  Dear R community,
>
>>
>> I have a general question regarding indexing in multidiemensional
>> arrays.
>>
>> Imagine I have a three dimensional array and I only want to extract on
>> vector along a single dimension from it:
>>
>>
>> data<- array(rnorm(64),dim=c(4,4,4))
>>
>> result<- data[1,1,]
>>
>> If I want to extract more than one of these vectors, it w

[R] randomForest partial dependence plot variable names

2011-08-04 Thread Katharine Miller
Hello,

I am running randomForest models on a number of species.  I would like to be
able to automate the printing of dependence plots for the most important
variables in each model, but I am unable to figure out how to enter the
variable names into my code.  I had originally thought to extract them from
the $importance matrix after sorting by metric (e.g. %IncMSE), but the
importance matrix is n by 2 - containing only the data for each metric
(%IncMSE and IncNodePurity).  It is clearly linked to the variable names,
but I am unsure how to extract those names for use in scripting.  Any
assistance would be greatly appreciated as I am currently typing the
variable names into each partialPlot call for every model I run.and that
is taking a LONG time.

Thanks!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Text annotation of a graph

2011-08-04 Thread Lisa
Dear All, 

I am trying to add some text annotation to a graph using matplot() as
follows:

  vars <- c("v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "v8", "v10")
  id <- seq(5.000, 0.001, by = -0.001)
  sid <- c(4.997, 3.901, 2.339, 0.176, 0.151, 0.101, 0.076, 0.051, 0.026,
0.001)
  rn <- sample(seq(0, 0.6, by = 0.001), 5000, replace = T)
  matplot(rbind(rn, rep(0, length(rn))), rbind(id, id), xlim = c(0, 1),
  type = "l", lty = 1, lwd = 1, col = 1, xlab = "",
  ylab = "", axes = F)
  abline(v = 0, lty = 2)
  axis(1)
  mtext(side = 2, text = c(vars), at = sid, las = 2, line = 0.8)
  axis(3) 

But the text annotation can not be displayed correctly, i.e., some of them
stick together. 
Can anybody help me with this particular problem? Thanks in advance.

Lisa


--
View this message in context: 
http://r.789695.n4.nabble.com/Text-annotation-of-a-graph-tp3719775p3719775.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] General indexing in multidimensional arrays

2011-08-04 Thread Jannis
Thanks, Gene, for your hint! I indeed did not check any possible 
situation and my function was not returning what I intened it to return. 
This updated version, however, should. I am sure there are much easier 
ways (or ready made functions) to do the same.




ind.datacube = function(
##title<< create logical index matrices for multidimensional datacubes
  datacube   ##<< array: datacube from which to extract the subparts
  , logical.ind  ##<< logical array: TRUE/FALSE index matrix for a 
subset of the dimensions
 ##   of datacube. The size of logical.ind`s dimesnions 
has to match the

 ##   sizes of the corresponding dimensions in datacube.
  , dims='auto'  ##<< integer vector or 'auto' : indices of the 
dimensions in datacube corresponding
 ##   to the dimensions of logical.ind. If set to 
'auto' this matching is tried to
 ##   be accomplished by comparing the sizes of the 
dimensions of the two objects.

)
{
if (sum(logical.ind) == 0) {
stop('No TRUE value in index matrix!')
} else {
if (dims[1] == 'auto')
{
if (is.null(dim(logical.ind)[1])) {
size.ind = length(logical.ind)
logical.ind  = matrix(logical.ind,ncol=1)
} else {
size.ind = dim(logical.ind)
}
dims = match(size.ind, dim(datacube))
if (sum(duplicated(size.ind)) > 0 || sum(duplicated(dims)) 
> 0 )
stop('dimensions do not match unambigously. Supply dims 
manually!')

}
dims.all <- setdiff(1:length(dim(datacube)),dims)
ind.matrix.choice <- which(logical.ind, arr.ind = TRUE)

dims.all.expand <- list()
for (i in 1:length(dims.all))
   dims.all.expand[[i]] <- 1:dim(datacube)[dims.all[i]]
dims.all.grid <-  as.matrix(do.call(expand.grid,   
dims.all.expand))


expgrid.dims.all <- as.matrix(do.call(expand.grid, 
dims.all.expand))
dims.all.mat <-  
matrix(rep(dims.all.grid,times=2),ncol=length(dims.all))
ind.matrix.all <- 
cbind(ind.matrix.choice[rep(1:dim(ind.matrix.choice)[1],each=dim(dims.all.grid)[1]),] 
,

dims.all.mat)
ind.matrix.ord <- ind.matrix.all[,order(c(dims,dims.all))]
}
colnames(ind.matrix.ord) <- paste('dim',1:length(dim(datacube)),sep='')
##value<< integer index matrix which can be used to index datacube
ind.matrix.ord
}


data <- array(rnorm(64),dim=c(4,4,4))
indices <- matrix(FALSE,ncol=4,nrow=4)
indices[1,3] <- TRUE
indices[4,1] <- TRUE
#result <- data[indices,]
ind.datacube(data, indices, dims=c(1,2))




On 08/04/2011 09:20 PM, Gene Leynes wrote:

data<- array(rnorm(64),dim=c(4,4,4))


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Automatic creation of binary logistic models

2011-08-04 Thread Marc Schwartz
On Aug 4, 2011, at 2:23 PM, Paul Smith wrote:

> Dear All,
> 
> Suppose that you are trying to create a binary logistic model by
> trying different combinations of predictors. Has R got an automatic
> way of doing this, i.e., is there some way of automatically generating
> different tentative models and checking their corresponding AIC value?
> If so, could you please direct me to an example?
> 
> Thanks in advance,
> 
> Paul

Hi Paul,

If it were not for JSS going on at the moment, you would likely get a reply 
from Frank Harrell telling you why using this approach is not a good idea. This 
is tantamount to using a stepwise approach with variables going in and out of 
the model, based upon either AIC or perhaps Wald p values.

If you search the R list archives using rseek.org with keywords such as 
"stepwise regression Harrell", you will see a plethora of discussions on this 
over the years.

You might want to obtain a copy of Frank's book Regression Modeling Strategies 
along with Ewout Steyerberg's book Clinical Prediction Models, which cover this 
topic and offer alternative solutions to model development. These generally 
include the pre-specification of full models, considering how many covariate 
degrees of freedom you can reasonably include in the model and applying 
shrinkage/penalization.

If you need to engage in data reduction, you might want to consider using the 
LASSO, as implemented in the glmnet package on CRAN. More information on this 
method is available at: http://www-stat.stanford.edu/~tibs/lasso.html. An 
alternative might be backward elimination, which Frank does touch on and covers 
in:

  http://biostat.mc.vanderbilt.edu/wiki/pub/Main/RmS/rms.pdf

which is a supplement to his course.

Automated creation of models ignores the expertise of both the statistician and 
subject matter experts, to the detriment of inference.

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] random value generation with added constraint.

2011-08-04 Thread Duncan Murdoch

On 04/08/2011 12:03 PM, Vijayan Padmanabhan wrote:

Hi
I am looking at generating a random dataset of say 100 values fitting in a
normal distribution of a given mean and SD, I am aware of rnorm
function. However i am trying to build into this function one added
constraint that all the random value generated should also obey the
constraint that they only take values between say X to X+25
How do i do this in R?


The easiest way is to use the inverse-CDF method to generate values.  
For example:


mu <- 50
sd <- 10
X <- 30
lower <- pnorm(X, mean=mu, sd=sd)
upper <- pnorm(X+25, mean=mu, sd=sd)
U <- runif(1000, lower, upper)
Y <- qnorm(U, mean=mu, sd=sd)


This will fail if you go too far out in the tail (e.g. trying mu=0, 
sd=1, X=30); for that you need to be more careful, and work with log 
probabilities, etc.


Duncan Murdoch



Any help would be highly appreciated,.
Thanks
Vijayan Padmanabhan

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Efficient way of creating a shifted (lagged) variable?

2011-08-04 Thread Dimitri Liakhovitski
Thanks a lot for the recommendations - some of them I am implementing already.

Just a clarification:
the only reason I try to compare things to SPSS is that I am the only
person in my office using R. Whenever I work on an R code my goal is
not just to make it work, but also to "boast" to the SPSS users that
it's much easier/faster/niftier in R. So, you are preaching to the
choir here.

Dimitri


On Thu, Aug 4, 2011 at 4:02 PM, Joshua Wiley  wrote:
>
>
> On Aug 4, 2011, at 11:46, Dimitri Liakhovitski 
>  wrote:
>
>> Thanks a lot, guys!
>> It's really helpful. But - to be objective- it's still quite a few
>> lines longer than in SPSS.
>
> Not once you've sources the function!  For the simple case of a vector, try:
>
> X <- 1:10
> mylag2 <- function(X, lag) {
>  c(rep(NA, length(seq(lag))), X[-seq(lag)])
> }
>
> Though this does not work for lead, it is fairly short. Then you could use 
> the *apply family if you needed it on multiple columns or vectors.
>
> Cheers,
>
> Josh
>
>> Dimitri
>>
>> On Thu, Aug 4, 2011 at 2:36 PM, Daniel Nordlund  
>> wrote:
>>>
>>>
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On Behalf Of Dimitri Liakhovitski
 Sent: Thursday, August 04, 2011 8:24 AM
 To: r-help
 Subject: [R] Efficient way of creating a shifted (lagged) variable?

 Hello!

 I have a data set:
 set.seed(123)
 y<-data.frame(week=seq(as.Date("2010-01-03"), as.Date("2011-01-
 31"),by="week"))
 y$var1<-c(1,2,3,round(rnorm(54),1))
 y$var2<-c(10,20,30,round(rnorm(54),1))

 # All I need is to create lagged variables for var1 and var2. I looked
 around a bit and found several ways of doing it. They all seem quite
 complicated - while in SPSS it's just a few letters (like LAG()). Here
 is what I've written but I wonder. It works - but maybe there is a
 very simple way of doing it in R that I could not find?
 I need the same for "lead" (opposite of lag).
 Any hint is greatly appreciated!

 ### The function I created:
 mylag <- function(x,max.lag=1){   # x has to be a 1-column data frame
    temp<-
 as.data.frame(embed(c(rep(NA,max.lag),x[[1]]),max.lag+1))[2:(max.lag+1)]
    for(i in 1:length(temp)){
      names(temp)[i]<-paste(names(x),".lag",i,sep="")
     }
   return(temp)
 }

 ### Running mylag to get my result:
 myvars<-c("var1","var2")
 for(i in myvars) {
   y<-cbind(y,mylag(y[i]),max.lag=2)
 }
 (y)

 --
 Dimitri Liakhovitski
 marketfusionanalytics.com

>>>
>>> Dimitri,
>>>
>>> I would first look into the zoo package as has already been suggested.  
>>> However, if you haven't already got your solution then here are a couple of 
>>> functions that might help you get started.  I won't vouch for efficiency.
>>>
>>>
>>> lag.fun <- function(df, x, max.lag=1) {
>>>  for(i in x) {
>>>    for(j in 1:max.lag){
>>>      lagx <- paste(i,'.lag',j,sep='')
>>>      df[,lagx] <- c(rep(NA,j),df[1:(nrow(df)-j),i])
>>>    }
>>>  }
>>>  df
>>> }
>>>
>>> lead.fun <- function(df, x, max.lead=1) {
>>>  for(i in x) {
>>>    for(j in 1:max.lead){
>>>      leadx <- paste(i,'.lead',j,sep='')
>>>      df[,leadx] <- c(df[(j+1):(nrow(df)),i],rep(NA,j))
>>>    }
>>>  }
>>>  df
>>> }
>>>
>>> y <- lag.fun(y,myvars,2)
>>> y <- lead.fun(y,myvars,2)
>>>
>>>
>>> Hope this is helpful,
>>>
>>> Dan
>>>
>>> Daniel Nordlund
>>> Bothell, WA USA
>>>
>>>
>>>
>>
>>
>>
>> --
>> Dimitri Liakhovitski
>> marketfusionanalytics.com
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Dimitri Liakhovitski
marketfusionanalytics.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] random value generation with added constraint.

2011-08-04 Thread Vijayan Padmanabhan
Hi
I am looking at generating a random dataset of say 100 values fitting in a
normal distribution of a given mean and SD, I am aware of rnorm
function. However i am trying to build into this function one added
constraint that all the random value generated should also obey the
constraint that they only take values between say X to X+25
How do i do this in R?
Any help would be highly appreciated,.
Thanks
Vijayan Padmanabhan

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Automatic creation of binary logistic models

2011-08-04 Thread Paul Smith
Wonderful! Thanks, Jeremy.

Is bestglm() also able of trying nonlinear transformations of the
variables, say log(X1) for instance?

Paul



On Thu, Aug 4, 2011 at 8:28 PM, Jeremy Miles  wrote:
>
> Sounds like you want a best subsets regression, the bestglm() function,
> found in the bestglm() package will do the trick.
> Jeremy
>
> On 4 August 2011 12:23, Paul Smith  wrote:
>>
>> Dear All,
>>
>> Suppose that you are trying to create a binary logistic model by
>> trying different combinations of predictors. Has R got an automatic
>> way of doing this, i.e., is there some way of automatically generating
>> different tentative models and checking their corresponding AIC value?
>> If so, could you please direct me to an example?
>>
>> Thanks in advance,
>>
>> Paul
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Apply sensitivity library to dummy variables

2011-08-04 Thread Chih-Lin Chi
I am using SRC function in the sensitivity library. The independent
variables (X) are a few dummy variables and the dependent variable (y)
is numeric, ranged from 0,18 to 0.84. Example of independent variables
are red coded as [0,0,0,1], green [0,0,1,0], blue [0,1,0,0], and
yellow [1,0,0,0]. Error message occurs "Error in if (const(t,
min(1e-08, mean(t, na.rm = TRUE)/1e+06))) { : missing value where
TRUE/FALSE needed". Any suggestion?

If I choose not to code red [0,0,0], and code only three groups, green
[0,0,1], blue [0,1,0], and yellow [1,0,0]. SRC works. Do I interpret
the result like "compared to red, the SRC score for green is 0.8"?
Thanks for your help!

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to see the previous commands after save workspace/load workspace ?

2011-08-04 Thread Bogdan Lataianu
I did save workspace and when I load it, I can see the variables,
using ls().
But I cannot see the commands from the program I saved. How to do
that?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sweave - landscape figure

2011-08-04 Thread Marc Schwartz

On Aug 4, 2011, at 2:40 PM, Eduardo M. A. M. Mendes wrote:

> Dear R-users
> 
> I am trying to understand how Sweave works by running some simple examples.  
> In the example I am working with there is a chunk where the R-commands 
> related to plotting a figure are placed.  When running R CMD Sweave … , 
> pdflatex the output is a portrait figure.  I wonder whether it would be 
> possible to change the orientation to landscape (not in the latex file but in 
> Rnw file).
> 
> Many thanks
> 
> Ed


You can use the lscape package by placing:

  \usepackage{lscape}

in your LaTeX preamble in the .Rnw file. Then use:

\begin{landscape}

  other code here

\end{landscape}

That way you can create a landscape oriented page within a document that might 
otherwise contain portrait orientation pages.

HTH,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Limited number of principal components in PCA

2011-08-04 Thread William Armstrong
David and Josh,

Thank you for the suggestions.  I have attached a file ('q_values.txt') that
contains the values of the 'Q' variable.

David -- I am attempting an 'S' mode PCA, where the columns are actually the
cases (different stream gaging stations) and the rows are the variables (the
maximum flow at each station for a given year).  I think the format you are
referring to is 'R' mode, but I was under the impression that R (the
program, not the PCA mode) could handle the analyses in either format.  Am I
mistaken?

My first eigenvalue is:

> unrotated_pca_q$sdev[1]^2
[1] 17.77812

Does that value seem large enough to explain the reduction in principal
components from 65 to 54?

Also, the loadings on the first PC are not particularly high:

 > max(abs(unrotated_pca_q$rotation[1:84]))
[1] 0.1794776

Does that suggest that maybe the data are not very highly correlated?

Thank you both very much for your help.

Billy

http://r.789695.n4.nabble.com/file/n3719440/q_values.txt q_values.txt 

--
View this message in context: 
http://r.789695.n4.nabble.com/Limited-number-of-principal-components-in-PCA-tp3704956p3719440.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] survival probability estimate method

2011-08-04 Thread Peter Jepsen
Dear John

I am not aware of an R package that does this, but I believe that Patrick 
Royston's -stpm- function for Stata does. Here's two references found in 
http://www.stata-journal.com/sjpdf.html?articlenum=st0001_2: 
Royston, P. 2001. Flexible parametric alternatives to the Cox model. Stata 
Journal 1(1): 1-28.
Royston, P. and M. K. B. Parmar. 2002. Flexible parametric-hazards and 
proportional odds models for censored survival data, with application to 
prognostic modelling and estimation of treatment effects. Statistics in 
Medicine 21: 2175-2197.

Best regards,
Peter.

-Oprindelig meddelelse-
Fra: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] På 
vegne af array chip
Sendt: 4. august 2011 20:45
Til: r-help
Emne: [R] survival probability estimate method

Hi, I was reading a paper published in JCO "Prediction of risk of distant 
recurrence using 21-gene recurrence score in node-negative and node-positive 
postmenopausal patients with breast cancer treated with anastrozole or 
tamoxifen: a TransATAC study" (ICO 2010 28: 1829). The author uses a method to 
estimate the 9-year risk of distant recurrence as a function of continuous 
recurrence score (RS). The method is special as author states:
 
"To define the continuous relation between RS, as a linear covariate, and 
9-year risk of distant recurrence, the logarithm of the baseline cumulative 
hazard function was fitted by constrained cubic splines with 3 df. These models 
tend to be more robust for prediction of survival probabilities and 
corresponding confidence limits at late follow-up time as a result of the 
modeling of the baseline cumulative hazard function by natural cubic splines 
(in contrast to using the crude hazard function itself)."
 
Does R provide a package/function to do this particular method for estimating 
survival probability as a function of a continuous variable? Is the 
survest.cph() in rms package doing estimation with just the crude hazard 
function?
 
Thanks very much!
 
John

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] input equivalent of capture.output

2011-08-04 Thread Antonio Piccolboni
Hi,
I am looking for the equivalent of capture.output for stdin. Imagine I am
writing a test for a function that reads from stdin, say scan.

>1: foobar
2: Read 1 item
[1] "foobar"

(please don't reply that scan can also read from a file, that's not what I
am trying to test; let's say I found a bug that occurs only when reading
from stdin, fixed it and I want to add a test specifically for that case; in
reality I am dealing with a function that only reads from stdin)
Looking good, but we need to automate that. So what I would like to write is
an assertion such as

"foobar" == feed.input("foobar", scan, what = "character")

feed.input would evaluate scan(what="character") while redirecting stdin so
that it consists of the characters "foobar" followed by EOF.

I googled this to exhaustion, but couldn't find a solution. I also looked at
testing frameworks, while I am not familiar with any of them, I didn't see
anything obviously related to this problem. Any suggestions? Thanks


Antonio

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sweave - landscape figure

2011-08-04 Thread Duncan Murdoch

On 04/08/2011 3:40 PM, Eduardo M. A. M. Mendes wrote:

Dear R-users

I am trying to understand how Sweave works by running some simple examples.  In 
the example I am working with there is a chunk where the R-commands related to 
plotting a figure are placed.  When running R CMD Sweave … , pdflatex the 
output is a portrait figure.  I wonder whether it would be possible to change 
the orientation to landscape (not in the latex file but in Rnw file).



Sweave can change the height and width of the figure so it is more 
landscape-shaped (width > height) using options at the start of the chunk.


Rotating a figure is something LaTeX needs to do:  you would tell Sweave 
to produce the figure but not include it, then use \includegraphics{} 
with the right option to rotate it.


For example:

<>=
plot(rnorm(100))
@

\includegraphics[angle=90,width=0.8\textheight]{Myfig}

This is untested, and you'll need to consult a LaTeX reference for 
rotating the figure caption, etc.


Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Efficient way of creating a shifted (lagged) variable?

2011-08-04 Thread Joshua Wiley


On Aug 4, 2011, at 11:46, Dimitri Liakhovitski  
wrote:

> Thanks a lot, guys!
> It's really helpful. But - to be objective- it's still quite a few
> lines longer than in SPSS.

Not once you've sources the function!  For the simple case of a vector, try:

X <- 1:10
mylag2 <- function(X, lag) {
  c(rep(NA, length(seq(lag))), X[-seq(lag)])
}

Though this does not work for lead, it is fairly short. Then you could use the 
*apply family if you needed it on multiple columns or vectors.

Cheers,

Josh

> Dimitri
> 
> On Thu, Aug 4, 2011 at 2:36 PM, Daniel Nordlund  
> wrote:
>> 
>> 
>>> -Original Message-
>>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
>>> On Behalf Of Dimitri Liakhovitski
>>> Sent: Thursday, August 04, 2011 8:24 AM
>>> To: r-help
>>> Subject: [R] Efficient way of creating a shifted (lagged) variable?
>>> 
>>> Hello!
>>> 
>>> I have a data set:
>>> set.seed(123)
>>> y<-data.frame(week=seq(as.Date("2010-01-03"), as.Date("2011-01-
>>> 31"),by="week"))
>>> y$var1<-c(1,2,3,round(rnorm(54),1))
>>> y$var2<-c(10,20,30,round(rnorm(54),1))
>>> 
>>> # All I need is to create lagged variables for var1 and var2. I looked
>>> around a bit and found several ways of doing it. They all seem quite
>>> complicated - while in SPSS it's just a few letters (like LAG()). Here
>>> is what I've written but I wonder. It works - but maybe there is a
>>> very simple way of doing it in R that I could not find?
>>> I need the same for "lead" (opposite of lag).
>>> Any hint is greatly appreciated!
>>> 
>>> ### The function I created:
>>> mylag <- function(x,max.lag=1){   # x has to be a 1-column data frame
>>>temp<-
>>> as.data.frame(embed(c(rep(NA,max.lag),x[[1]]),max.lag+1))[2:(max.lag+1)]
>>>for(i in 1:length(temp)){
>>>  names(temp)[i]<-paste(names(x),".lag",i,sep="")
>>> }
>>>   return(temp)
>>> }
>>> 
>>> ### Running mylag to get my result:
>>> myvars<-c("var1","var2")
>>> for(i in myvars) {
>>>   y<-cbind(y,mylag(y[i]),max.lag=2)
>>> }
>>> (y)
>>> 
>>> --
>>> Dimitri Liakhovitski
>>> marketfusionanalytics.com
>>> 
>> 
>> Dimitri,
>> 
>> I would first look into the zoo package as has already been suggested.  
>> However, if you haven't already got your solution then here are a couple of 
>> functions that might help you get started.  I won't vouch for efficiency.
>> 
>> 
>> lag.fun <- function(df, x, max.lag=1) {
>>  for(i in x) {
>>for(j in 1:max.lag){
>>  lagx <- paste(i,'.lag',j,sep='')
>>  df[,lagx] <- c(rep(NA,j),df[1:(nrow(df)-j),i])
>>}
>>  }
>>  df
>> }
>> 
>> lead.fun <- function(df, x, max.lead=1) {
>>  for(i in x) {
>>for(j in 1:max.lead){
>>  leadx <- paste(i,'.lead',j,sep='')
>>  df[,leadx] <- c(df[(j+1):(nrow(df)),i],rep(NA,j))
>>}
>>  }
>>  df
>> }
>> 
>> y <- lag.fun(y,myvars,2)
>> y <- lead.fun(y,myvars,2)
>> 
>> 
>> Hope this is helpful,
>> 
>> Dan
>> 
>> Daniel Nordlund
>> Bothell, WA USA
>> 
>> 
>> 
> 
> 
> 
> -- 
> Dimitri Liakhovitski
> marketfusionanalytics.com
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] General indexing in multidimensional arrays

2011-08-04 Thread R. Michael Weylandt
Hi Jannis,

Like Gene, I'm not entirely sure what your code is intended to do, but it
wasn't hard to adapt mine to at least print out the desired slice through
the higher-dimensional array:

cubeValues <- function(Insert, Destination, Location) {
Location = as.matrix(Location)
Location = array(Location,dim(Destination))
if (is.null(Insert)) {
Destination = round(Destination,3)
Destination[!Location] = NA
print(Destination)
return(invisible())
}
Destination[Location] <- Insert
return(Destination)
}

If Insert = NULL, it adopts a printing rather than value assigning behavior.


If you could indicate how you want the values when they come out, it's
pretty easy to adapt this to do whatever, but I can't just pull out
subarrays of arbitrary shape while keeping shape: e.g.,

x = matrix(1:4,2,2,byrow=T)
y = rbind(c(T,F),c(F,T))

> is(x[y])
"vector"

If you just want the values in a vector, take this version of my code:

cubeValues <- function(Insert, Destination, Location) {
Location = as.matrix(Location)
Location = array(Location,dim(Destination))
if (is.null(Insert)) {
return(Destination[Location])
}
Destination[Location] <- Insert
return(Destination)
}

It sounds like you've got what you want, but hopefully this will be of some
use to anyone who stumbles across this and, like Gene & myself, doesn't
really get your code.

NB: I have not tested the provided code very much -- it relies on the
array() function to repeat Location as appropriate. If you know how R
repeats smaller arrays to make them fit big arrays, this should be fine for
you -- caveat code-or.

Michael Weylandt

PS -- The combinations() function of the gtools package might be of help to
you as well. We could get the entire example Gene got by

ans =  combinations(1:4,2,repeats.allowed=T)
rbind(cbind(ans,4),cbind(ans,1))

and it's probably not hard to simplify the entire code as desired.

On Thu, Aug 4, 2011 at 6:33 AM, Jannis  wrote:

> Thanks, Michael. I was, however, after a function I coul use for both
> extracting and replacing subarrays. In case anybody else stumbles over this
> problem, here is my solution. Its programming is most probably horribly
> clumsy:
>
>
> ind.datacube = function(
> ##title<< create logical index matrices for multidimensional datacubes
>  datacube   ##<< array: datacube from which to extract the subparts
>  , logical.ind  ##<< logical array: TRUE/FALSE index matrix for a subset of
> the dimensions
> ##   of datacube. The size of logical.ind`s dimesnions has
> to match the
> ##   sizes of the corresponding dimesnions in datacube.
>  , dims='auto'  ##<< integer vector or 'auto' : indices of the dimensions
> in datacube corresponding
> ##   to the dimensions of logical.ind. If set to 'auto'
> this matching is tried to
> ##   be accomplished by comparing the sizes of the
> dimensions of the two objects.
> )
> {
>if (sum(logical.ind) == 0) {
>stop('No TRUE value in index matrix!')
>} else {
>if (dims == 'auto')
>{
>if (is.null(dim(logical.ind)[1])) {
>size.ind = length(logical.ind)
>logical.ind  = matrix(logical.ind,ncol=1)
>} else {
>size.ind = dim(logical.ind)
>}
>dims = match(size.ind, dim(datacube))
>if (sum(duplicated(size.ind)) > 0 || sum(duplicated(dims)) > 0 )
>stop('dimensions do not match unambigously. Supply dims
> manually!')
>}
>dims.nonapply <- setdiff(1:length(dim(datacube)**),dims)
>ind.matrix <- which(logical.ind, arr.ind = TRUE)
>
>args.expand.grid <- list()
>counter = 1
>for (i in 1: length(dim(datacube)))
>{
>if (is.element(i,dims.nonapply)) {
>args.expand.grid[[i]] = 1:dim(datacube)[dims.nonapply[**i]]
>} else {
>args.expand.grid[[i]] = ind.matrix[,counter]
>counter = counter + 1
>}
>}
>
>ind.all <- as.matrix(do.call(expand.grid, args.expand.grid))
>ind.matrix <- ind.all[,order(c(dims.**nonapply,dims))]
>
>}
>##value<< integer index matrix which can be used to index datacube
>ind.matrix
>
> }
>
>
> On 08/04/2011 12:12 AM, R. Michael Weylandt 
> wrote:
>
>>  This might be a little late: but how about this (slightly clumsy)
>>> function:
>>>
>>> putValues<- function(Insert, Destination, Location) {
>>> Location = as.matrix(Location)
>>> Location = array(Location,dim(**Destination))
>>> Destination[Location]<- Insert
>>> return(Destination)
>>> }
>>>
>>> It currently assumes that the location array lines up in dimension order,
>>> but other than that seems to work pretty well. If you want, it shouldn't
>>> be
>>> hard to change it to take in a set of dimensions to arrange Location
>>> along.
>>> If you like 

[R] Sweave - landscape figure

2011-08-04 Thread Eduardo M. A. M. Mendes
Dear R-users

I am trying to understand how Sweave works by running some simple examples.  In 
the example I am working with there is a chunk where the R-commands related to 
plotting a figure are placed.  When running R CMD Sweave … , pdflatex the 
output is a portrait figure.  I wonder whether it would be possible to change 
the orientation to landscape (not in the latex file but in Rnw file).

Many thanks

Ed

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Automatic creation of binary logistic models

2011-08-04 Thread Jeremy Miles
Sounds like you want a best subsets regression, the bestglm() function,
found in the bestglm() package will do the trick.

Jeremy

On 4 August 2011 12:23, Paul Smith  wrote:

> Dear All,
>
> Suppose that you are trying to create a binary logistic model by
> trying different combinations of predictors. Has R got an automatic
> way of doing this, i.e., is there some way of automatically generating
> different tentative models and checking their corresponding AIC value?
> If so, could you please direct me to an example?
>
> Thanks in advance,
>
> Paul
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] labelling a stacked barchart (lattice)

2011-08-04 Thread Weidong Gu
Marc,

Try this one

barchart(data=dta, ~x, group=y,
stack=T,col=sort(brewer.pal(7,"Purples")), xlab="Percent",
box.width=.5, scales=list(tick.number=10),
panel=function(x,y,...){
panel.barchart(x,y,...)
panel.text(cumsum(x)-dta$x/2,y,labels=dta$x)
panel.text(cumsum(x)-dta$x/2,1.3,labels=as.character(dta$y))
})

Weidong Gu

On Wed, Aug 3, 2011 at 9:01 PM, M/K Zodet  wrote:
> All:
>
> Below is my code for creating a basic horizontal, stacked barchart.  I'd like 
> to label the plot in two ways:  1) place the x values in each piece and 2) 
> place the y values above each piece (angled).  I'm currently using lattice, 
> but I'm open to suggestions using ggplot2.
>
> Questions:
>
> 1.  Can this be done?...I assume yes.  So, what are the best 
> options/functions for doing this.
> 2.  Is there a way to alter the transparency of the bar fill with the brewer 
> palette?  I know I can alter this w/ heat.~, topo.~, cm.colors, etc.
>
> Thanks in advance.
>
> Marc
> Using R for Mac OS X GUI 1.40-devel Leopard build 64-bit
>
>
> dta <- data.frame(x=c(46.0, 14.7, 16.4, 15.8, 7.0), y=c("Back", "Neck", 
> "Extrem", "MuscSkel", "Oth"))
> dta
> barchart(data=dta, ~x, group=y, stack=T, col=sort(brewer.pal(7,"Purples")),
>         xlab="Percent", box.width=.5, scales=list(tick.number=10))
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Automatic creation of binary logistic models

2011-08-04 Thread Paul Smith
Dear All,

Suppose that you are trying to create a binary logistic model by
trying different combinations of predictors. Has R got an automatic
way of doing this, i.e., is there some way of automatically generating
different tentative models and checking their corresponding AIC value?
If so, could you please direct me to an example?

Thanks in advance,

Paul

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Efficient way of creating a shifted (lagged) variable?

2011-08-04 Thread Daniel Nordlund


> -Original Message-
> From: Dimitri Liakhovitski [mailto:dimitri.liakhovit...@gmail.com]
> Sent: Thursday, August 04, 2011 11:47 AM
> To: Daniel Nordlund; r-help
> Subject: Re: [R] Efficient way of creating a shifted (lagged) variable?
> 
> Thanks a lot, guys!
> It's really helpful. But - to be objective- it's still quite a few
> lines longer than in SPSS.
> Dimitri
> 
> On Thu, Aug 4, 2011 at 2:36 PM, Daniel Nordlund 
> wrote:
> >
> >
> >> -Original Message-
> >> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
> project.org]
> >> On Behalf Of Dimitri Liakhovitski
> >> Sent: Thursday, August 04, 2011 8:24 AM
> >> To: r-help
> >> Subject: [R] Efficient way of creating a shifted (lagged) variable?
> >>
> >> Hello!
> >>
> >> I have a data set:
> >> set.seed(123)
> >> y<-data.frame(week=seq(as.Date("2010-01-03"), as.Date("2011-01-
> >> 31"),by="week"))
> >> y$var1<-c(1,2,3,round(rnorm(54),1))
> >> y$var2<-c(10,20,30,round(rnorm(54),1))
> >>
> >> # All I need is to create lagged variables for var1 and var2. I looked
> >> around a bit and found several ways of doing it. They all seem quite
> >> complicated - while in SPSS it's just a few letters (like LAG()). Here
> >> is what I've written but I wonder. It works - but maybe there is a
> >> very simple way of doing it in R that I could not find?
> >> I need the same for "lead" (opposite of lag).
> >> Any hint is greatly appreciated!
> >>
> >> ### The function I created:
> >> mylag <- function(x,max.lag=1){   # x has to be a 1-column data frame
> >>temp<-
> >>
> as.data.frame(embed(c(rep(NA,max.lag),x[[1]]),max.lag+1))[2:(max.lag+1)]
> >>for(i in 1:length(temp)){
> >>  names(temp)[i]<-paste(names(x),".lag",i,sep="")
> >> }
> >>   return(temp)
> >> }
> >>
> >> ### Running mylag to get my result:
> >> myvars<-c("var1","var2")
> >> for(i in myvars) {
> >>   y<-cbind(y,mylag(y[i]),max.lag=2)
> >> }
> >> (y)
> >>
> >> --
> >> Dimitri Liakhovitski
> >> marketfusionanalytics.com
> >>
> >
> > Dimitri,
> >
> > I would first look into the zoo package as has already been suggested.
>  However, if you haven't already got your solution then here are a couple
> of functions that might help you get started.  I won't vouch for
> efficiency.
> >
> >
> > lag.fun <- function(df, x, max.lag=1) {
> >  for(i in x) {
> >for(j in 1:max.lag){
> >  lagx <- paste(i,'.lag',j,sep='')
> >  df[,lagx] <- c(rep(NA,j),df[1:(nrow(df)-j),i])
> >}
> >  }
> >  df
> > }
> >
> > lead.fun <- function(df, x, max.lead=1) {
> >  for(i in x) {
> >for(j in 1:max.lead){
> >  leadx <- paste(i,'.lead',j,sep='')
> >  df[,leadx] <- c(df[(j+1):(nrow(df)),i],rep(NA,j))
> >}
> >  }
> >  df
> > }
> >
> > y <- lag.fun(y,myvars,2)
> > y <- lead.fun(y,myvars,2)
> >
> >
> >
> >
> >
> 
> 

Dimitri,

I (and probably a lot of others on the list) don't know SPSS anymore.  I 
haven't used it in 30 years.  So, I don't know how you would use LAG() in SPSS 
to achieve what you want, and you didn't give us any example of how you would 
like to be able to use a lag function in your code.  Without at least some 
pseudo code demonstrating the simple usage you are looking for, it is hard to 
give you code that works the way you want.  That being said, you can always use 
SPSS.

Dan

Daniel Nordlund
Bothell, WA USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] General indexing in multidimensional arrays

2011-08-04 Thread Gene Leynes
I'm just wondering
When I use your function on your data I get a result that is not very
intuitive.  Is this what you were trying to do?
(I took the  liberty of setting dims to 3 since 'auto', 1, and 2 didn't
work.


> data<- array(rnorm(64),dim=c(4,4,4))
> indices<- matrix(FALSE,ncol=4,nrow=4)
> indices[1,3]   <- TRUE
> indices[4,1]   <- TRUE
> #result <- data[indices,]
> ind.datacube(data, indices, dims=3)
  Var1 Var2 Var3
 [1,]114
 [2,]214
 [3,]314
 [4,]414
 [5,]124
 [6,]224
 [7,]324
 [8,]424
 [9,]134
[10,]234
[11,]334
[12,]434
[13,]144
[14,]244
[15,]344
[16,]444
[17,]111
[18,]211
[19,]311
[20,]411
[21,]121
[22,]221
[23,]321
[24,]421
[25,]131
[26,]231
[27,]331
[28,]431
[29,]141
[30,]241
[31,]341
[32,]441


On Thu, Aug 4, 2011 at 5:33 AM, Jannis  wrote:

> Thanks, Michael. I was, however, after a function I coul use for both
> extracting and replacing subarrays. In case anybody else stumbles over this
> problem, here is my solution. Its programming is most probably horribly
> clumsy:
>
>
> ind.datacube = function(
> ##title<< create logical index matrices for multidimensional datacubes
>  datacube   ##<< array: datacube from which to extract the subparts
>  , logical.ind  ##<< logical array: TRUE/FALSE index matrix for a subset of
> the dimensions
> ##   of datacube. The size of logical.ind`s dimesnions has
> to match the
> ##   sizes of the corresponding dimesnions in datacube.
>  , dims='auto'  ##<< integer vector or 'auto' : indices of the dimensions
> in datacube corresponding
> ##   to the dimensions of logical.ind. If set to 'auto'
> this matching is tried to
> ##   be accomplished by comparing the sizes of the
> dimensions of the two objects.
> )
> {
>if (sum(logical.ind) == 0) {
>stop('No TRUE value in index matrix!')
>} else {
>if (dims == 'auto')
>{
>if (is.null(dim(logical.ind)[1])) {
>size.ind = length(logical.ind)
>logical.ind  = matrix(logical.ind,ncol=1)
>} else {
>size.ind = dim(logical.ind)
>}
>dims = match(size.ind, dim(datacube))
>if (sum(duplicated(size.ind)) > 0 || sum(duplicated(dims)) > 0 )
>stop('dimensions do not match unambigously. Supply dims
> manually!')
>}
>dims.nonapply <- setdiff(1:length(dim(datacube)**),dims)
>ind.matrix <- which(logical.ind, arr.ind = TRUE)
>
>args.expand.grid <- list()
>counter = 1
>for (i in 1: length(dim(datacube)))
>{
>if (is.element(i,dims.nonapply)) {
>args.expand.grid[[i]] = 1:dim(datacube)[dims.nonapply[**i]]
>} else {
>args.expand.grid[[i]] = ind.matrix[,counter]
>counter = counter + 1
>}
>}
>
>ind.all <- as.matrix(do.call(expand.grid, args.expand.grid))
>ind.matrix <- ind.all[,order(c(dims.**nonapply,dims))]
>
>}
>##value<< integer index matrix which can be used to index datacube
>ind.matrix
>
> }
>
>
> On 08/04/2011 12:12 AM, R. Michael Weylandt 
> wrote:
>
>> This might be a little late: but how about this (slightly clumsy)
>>> function:
>>>
>>> putValues<- function(Insert, Destination, Location) {
>>> Location = as.matrix(Location)
>>> Location = array(Location,dim(**Destination))
>>> Destination[Location]<- Insert
>>> return(Destination)
>>> }
>>>
>>> It currently assumes that the location array lines up in dimension order,
>>> but other than that seems to work pretty well. If you want, it shouldn't
>>> be
>>> hard to change it to take in a set of dimensions to arrange Location
>>> along.
>>> If you like any of the other suggested behaviors, you could put in a
>>> is.null(Insert) option that returns the desired subset of values. I
>>> haven't
>>> tested it completely, but for a few sample inputs, it seems be do as
>>> desired.
>>>
>>> Michael
>>>
>>>
>>> On Wed, Aug 3, 2011 at 5:00 PM, Jannis  wrote:
>>>
>>>  Thanks for all the replies!Unfortunately the solutions only work for
 extracting subsets of the data (which was exactly what I was asking for)
 and
 not to replace subsets with other values. I used them, however, to
 program a
 rather akward function to do that. Seems I found one of the few aspects
 where Matlab actually is slightly easier to use than R.


 Thanks for your help!
 Jannis

 On 08/01/2011 05:50 PM, Gene Leynes wrote:

  What do you think about this?
>
>>

[R] Multiple endpoint (possibly group sequential) sample size calculation

2011-08-04 Thread Paul Miller

Hello everyone,

I need to do a sample size calculation. The study two arms and two endpoints. 
The two arms are two different cancer drugs and the two endpoints reflect 
efficacy (based on progression free survival) and toxicity. 

Until now, I have been trying to understand this in terms of a one-arm design, 
where the acceptable rate of efficacy might be 0.40, the unacceptable rate of 
efficacy might be 0.20, the acceptable rate of non-toxicity might be 0.85, and 
the unacceptable rate of non-toxicity might be 0.65. Then one would pick an 
alpha for the probability of accepting a poor response, an alpha for the 
probability of accepting a toxic drug, and a beta for the probability of 
rejecting a good drug.

I'm not really sure how that sort of thinking translates into a two-arm design 
though.

Ideally, I'd like the calculation to be based on a group sequential design with 
two stages, but that's certainly not necessary, and I'd be very happy to learn 
how to do things both with and without this extra element.

Any help with this would be greatly appreciated.

Thanks,

Paul

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Efficient way of creating a shifted (lagged) variable?

2011-08-04 Thread R. Michael Weylandt
But quite a few thousand dollars cheaper - seems a fair trade to me!

Seriously though: your last few requests have been focused on time objects,
so I strongly suggest you take a look at xts/zoo (
http://cran.r-project.org/web/packages/xts/index.html) -- it will make time
objects much easier for you and lagging will be as simple as lag(x). They're
very good for things that are time indexed: for things where the dates
themselves are data points, you'll need to stick with the timeDate package
and data frames. In that case, if you load up your preferred flavour of the
provided lag functions, they will be just as brief as SPSS.

If you're not doing so already, I suggest you start putting together a .R
file of these sorts of little helper functions and source() it (from a fixed
location; not your wd) at the beginning of all your scripts. For all intents
and purposes, it becomes your own little starter package.

MW

PS -- Our "tech support" is also probably sassier than theirs too. :-)

On Thu, Aug 4, 2011 at 2:46 PM, Dimitri Liakhovitski <
dimitri.liakhovit...@gmail.com> wrote:

> Thanks a lot, guys!
> It's really helpful. But - to be objective- it's still quite a few
> lines longer than in SPSS.
> Dimitri
>
> On Thu, Aug 4, 2011 at 2:36 PM, Daniel Nordlund 
> wrote:
> >
> >
> >> -Original Message-
> >> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org
> ]
> >> On Behalf Of Dimitri Liakhovitski
> >> Sent: Thursday, August 04, 2011 8:24 AM
> >> To: r-help
> >> Subject: [R] Efficient way of creating a shifted (lagged) variable?
> >>
> >> Hello!
> >>
> >> I have a data set:
> >> set.seed(123)
> >> y<-data.frame(week=seq(as.Date("2010-01-03"), as.Date("2011-01-
> >> 31"),by="week"))
> >> y$var1<-c(1,2,3,round(rnorm(54),1))
> >> y$var2<-c(10,20,30,round(rnorm(54),1))
> >>
> >> # All I need is to create lagged variables for var1 and var2. I looked
> >> around a bit and found several ways of doing it. They all seem quite
> >> complicated - while in SPSS it's just a few letters (like LAG()). Here
> >> is what I've written but I wonder. It works - but maybe there is a
> >> very simple way of doing it in R that I could not find?
> >> I need the same for "lead" (opposite of lag).
> >> Any hint is greatly appreciated!
> >>
> >> ### The function I created:
> >> mylag <- function(x,max.lag=1){   # x has to be a 1-column data frame
> >>temp<-
> >> as.data.frame(embed(c(rep(NA,max.lag),x[[1]]),max.lag+1))[2:(max.lag+1)]
> >>for(i in 1:length(temp)){
> >>  names(temp)[i]<-paste(names(x),".lag",i,sep="")
> >> }
> >>   return(temp)
> >> }
> >>
> >> ### Running mylag to get my result:
> >> myvars<-c("var1","var2")
> >> for(i in myvars) {
> >>   y<-cbind(y,mylag(y[i]),max.lag=2)
> >> }
> >> (y)
> >>
> >> --
> >> Dimitri Liakhovitski
> >> marketfusionanalytics.com
> >>
> >
> > Dimitri,
> >
> > I would first look into the zoo package as has already been suggested.
>  However, if you haven't already got your solution then here are a couple of
> functions that might help you get started.  I won't vouch for efficiency.
> >
> >
> > lag.fun <- function(df, x, max.lag=1) {
> >  for(i in x) {
> >for(j in 1:max.lag){
> >  lagx <- paste(i,'.lag',j,sep='')
> >  df[,lagx] <- c(rep(NA,j),df[1:(nrow(df)-j),i])
> >}
> >  }
> >  df
> > }
> >
> > lead.fun <- function(df, x, max.lead=1) {
> >  for(i in x) {
> >for(j in 1:max.lead){
> >  leadx <- paste(i,'.lead',j,sep='')
> >  df[,leadx] <- c(df[(j+1):(nrow(df)),i],rep(NA,j))
> >}
> >  }
> >  df
> > }
> >
> > y <- lag.fun(y,myvars,2)
> > y <- lead.fun(y,myvars,2)
> >
> >
> > Hope this is helpful,
> >
> > Dan
> >
> > Daniel Nordlund
> > Bothell, WA USA
> >
> >
> >
>
>
>
> --
> Dimitri Liakhovitski
> marketfusionanalytics.com
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Efficient way of creating a shifted (lagged) variable?

2011-08-04 Thread Dimitri Liakhovitski
Thanks a lot, guys!
It's really helpful. But - to be objective- it's still quite a few
lines longer than in SPSS.
Dimitri

On Thu, Aug 4, 2011 at 2:36 PM, Daniel Nordlund  wrote:
>
>
>> -Original Message-
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
>> On Behalf Of Dimitri Liakhovitski
>> Sent: Thursday, August 04, 2011 8:24 AM
>> To: r-help
>> Subject: [R] Efficient way of creating a shifted (lagged) variable?
>>
>> Hello!
>>
>> I have a data set:
>> set.seed(123)
>> y<-data.frame(week=seq(as.Date("2010-01-03"), as.Date("2011-01-
>> 31"),by="week"))
>> y$var1<-c(1,2,3,round(rnorm(54),1))
>> y$var2<-c(10,20,30,round(rnorm(54),1))
>>
>> # All I need is to create lagged variables for var1 and var2. I looked
>> around a bit and found several ways of doing it. They all seem quite
>> complicated - while in SPSS it's just a few letters (like LAG()). Here
>> is what I've written but I wonder. It works - but maybe there is a
>> very simple way of doing it in R that I could not find?
>> I need the same for "lead" (opposite of lag).
>> Any hint is greatly appreciated!
>>
>> ### The function I created:
>> mylag <- function(x,max.lag=1){   # x has to be a 1-column data frame
>>    temp<-
>> as.data.frame(embed(c(rep(NA,max.lag),x[[1]]),max.lag+1))[2:(max.lag+1)]
>>    for(i in 1:length(temp)){
>>      names(temp)[i]<-paste(names(x),".lag",i,sep="")
>>     }
>>   return(temp)
>> }
>>
>> ### Running mylag to get my result:
>> myvars<-c("var1","var2")
>> for(i in myvars) {
>>   y<-cbind(y,mylag(y[i]),max.lag=2)
>> }
>> (y)
>>
>> --
>> Dimitri Liakhovitski
>> marketfusionanalytics.com
>>
>
> Dimitri,
>
> I would first look into the zoo package as has already been suggested.  
> However, if you haven't already got your solution then here are a couple of 
> functions that might help you get started.  I won't vouch for efficiency.
>
>
> lag.fun <- function(df, x, max.lag=1) {
>  for(i in x) {
>    for(j in 1:max.lag){
>      lagx <- paste(i,'.lag',j,sep='')
>      df[,lagx] <- c(rep(NA,j),df[1:(nrow(df)-j),i])
>    }
>  }
>  df
> }
>
> lead.fun <- function(df, x, max.lead=1) {
>  for(i in x) {
>    for(j in 1:max.lead){
>      leadx <- paste(i,'.lead',j,sep='')
>      df[,leadx] <- c(df[(j+1):(nrow(df)),i],rep(NA,j))
>    }
>  }
>  df
> }
>
> y <- lag.fun(y,myvars,2)
> y <- lead.fun(y,myvars,2)
>
>
> Hope this is helpful,
>
> Dan
>
> Daniel Nordlund
> Bothell, WA USA
>
>
>



-- 
Dimitri Liakhovitski
marketfusionanalytics.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] survival probability estimate method

2011-08-04 Thread array chip
Hi, I was reading a paper published in JCO "Prediction of risk of distant 
recurrence using 21-gene recurrence score in node-negative and node-positive 
postmenopausal patients with breast cancer treated with anastrozole or 
tamoxifen: a TransATAC study" (ICO 2010 28: 1829). The author uses a method to 
estimate the 9-year risk of distant recurrence as a function of continuous 
recurrence score (RS). The method is special as author states:
 
"To define the continuous relation between RS, as a linear covariate, and 
9-year risk of distant recurrence, the logarithm of the baseline cumulative 
hazard function was fitted by constrained cubic splines with 3 df. These models 
tend to be more robust for prediction of survival probabilities and 
corresponding confidence limits at late follow-up time as a result of the 
modeling of the baseline cumulative hazard function by natural cubic splines 
(in contrast to using the crude hazard function itself)."
 
Does R provide a package/function to do this particular method for estimating 
survival probability as a function of a continuous variable? Is the 
survest.cph() in rms package doing estimation with just the crude hazard 
function?
 
Thanks very much!
 
John

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] identifying weeks (dates) that certain days (dates) fall into

2011-08-04 Thread Dimitri Liakhovitski
Michael, thanks a lot!
Really appreciate it - what wasn't hard for you would be for me!
Dimitri

On Thu, Aug 4, 2011 at 2:20 PM, R. Michael Weylandt
 wrote:
> You are getting 105 because the default behavior of findInterval  is such
> that v[N+1] := + Inf (as noted in ? findInterval); that is, the last
> interval is actually taken to stretch from the final element of sbwl.dates
> unto eternity. It shouldn't be hard to write a catch line to set that to
> whatever you want for it to go to.
>
> E.g.,
>
> myFindInterval <- function(x, vec, rightmost.closed = FALSE, all.inside =
> FALSE) {
>     FI = findInterval(x,vec,rightmost.closed, all.inside)
>     FI[FI==length(vec)] <- 0 # or Inf or whatever
>     return(FI)
> }
>
> Michael Weylandt
>
>
> On Thu, Aug 4, 2011 at 9:26 AM, Dimitri Liakhovitski
>  wrote:
>>
>> Sorry for renewing the topoic. I thought it worked but now I've run
>> into a little problem:
>>
>>  # My data frame with dates for week starts (Mondays)
>> y<-data.frame(week=seq(as.Date("2009-12-28"),
>> as.Date("2011-12-26"),by="week") )
>>
>> # I have a vector of super bowl dates (including the future one for 2012):
>>
>> sbwl.dates<-as.Date(c("2005-02-06","2006-02-05","2007-02-04","2008-02-03","2009-02-01","2010-02-07","2011-02-06","2012-02-05"))
>> I want to find the weeks in y that contain super bowl dates for
>> applicable years. I am trying:
>> sbwl.weeks<-findInterval(sbwl.dates, y$week)
>> sbwl.weeks<-sbwl.weeks[sbwl.weeks>0]
>> (sbwl.weeks)
>> > 6 58 105
>> y$flag<-0
>> y$flag[sbwl.weeks]<-1
>>
>> 6 and 58 are correct. But why am I getting 105 (the last row)?
>> Any way to fix it?
>> Thanks a lot!
>> Dimitri
>>
>>
>>
>> On Tue, Aug 2, 2011 at 12:57 PM, Dimitri Liakhovitski
>>  wrote:
>> > Thanks a lot, everyone!
>> > Dimitri
>> >
>> > On Tue, Aug 2, 2011 at 12:34 PM, Dennis Murphy 
>> > wrote:
>> >> Hi:
>> >>
>> >> You could try the lubridate package:
>> >>
>> >> library(lubridate)
>> >> week(weekly$week)
>> >> week(july4)
>> >> [1] 27 27
>> >>
>> >>> week
>> >> function (x)
>> >> yday(x)%/%7 + 1
>> >> 
>> >>
>> >> which is essentially Gabor's code :)
>> >>
>> >> HTH,
>> >> Dennis
>> >>
>> >> On Tue, Aug 2, 2011 at 7:36 AM, Dimitri Liakhovitski
>> >>  wrote:
>> >>> Hello!
>> >>>
>> >>> I have dates for the beginning of each week, e.g.:
>> >>> weekly<-data.frame(week=seq(as.Date("2010-04-01"),
>> >>> as.Date("2011-12-26"),by="week"))
>> >>> week  # each week starts on a Monday
>> >>>
>> >>> I also have a vector of dates I am interested in, e.g.:
>> >>> july4<-as.Date(c("2010-07-04","2011-07-04"))
>> >>>
>> >>> I would like to flag the weeks in my weekly$week that contain those 2
>> >>> individual dates.
>> >>> I can only think of a very clumsy way of doing it:
>> >>>
>> >>> myrows<-c(which(weekly$week==weekly$week[weekly$week>july4[1]][1]-7),
>> >>>        which(weekly$week==weekly$week[weekly$week>july4[2]][1]-7))
>> >>> weekly$flag<-0
>> >>> weekly$flag[myrows]<-1
>> >>>
>> >>> It's clumsy - because actually, my vector of dates of interest (july4
>> >>> above) is much longer.
>> >>> Is there maybe a more elegant way of doing it?
>> >>> Thank you!
>> >>> --
>> >>> Dimitri Liakhovitski
>> >>> marketfusionanalytics.com
>> >>>
>> >>> __
>> >>> R-help@r-project.org mailing list
>> >>> https://stat.ethz.ch/mailman/listinfo/r-help
>> >>> PLEASE do read the posting guide
>> >>> http://www.R-project.org/posting-guide.html
>> >>> and provide commented, minimal, self-contained, reproducible code.
>> >>>
>> >>
>> >
>> >
>> >
>> > --
>> > Dimitri Liakhovitski
>> > marketfusionanalytics.com
>> >
>>
>>
>>
>> --
>> Dimitri Liakhovitski
>> marketfusionanalytics.com
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>



-- 
Dimitri Liakhovitski
marketfusionanalytics.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] identifying weeks (dates) that certain days (dates) fall into

2011-08-04 Thread R. Michael Weylandt
You are getting 105 because the default behavior of findInterval  is such
that v[N+1] := + Inf (as noted in ? findInterval); that is, the last
interval is actually taken to stretch from the final element of sbwl.dates
unto eternity. It shouldn't be hard to write a catch line to set that to
whatever you want for it to go to.

E.g.,

myFindInterval <- function(x, vec, rightmost.closed = FALSE, all.inside =
FALSE) {
FI = findInterval(x,vec,rightmost.closed, all.inside)
FI[FI==length(vec)] <- 0 # or Inf or whatever
return(FI)
}

Michael Weylandt


On Thu, Aug 4, 2011 at 9:26 AM, Dimitri Liakhovitski <
dimitri.liakhovit...@gmail.com> wrote:

> Sorry for renewing the topoic. I thought it worked but now I've run
> into a little problem:
>
>  # My data frame with dates for week starts (Mondays)
> y<-data.frame(week=seq(as.Date("2009-12-28"),
> as.Date("2011-12-26"),by="week") )
>
> # I have a vector of super bowl dates (including the future one for 2012):
>
> sbwl.dates<-as.Date(c("2005-02-06","2006-02-05","2007-02-04","2008-02-03","2009-02-01","2010-02-07","2011-02-06","2012-02-05"))
> I want to find the weeks in y that contain super bowl dates for
> applicable years. I am trying:
> sbwl.weeks<-findInterval(sbwl.dates, y$week)
> sbwl.weeks<-sbwl.weeks[sbwl.weeks>0]
> (sbwl.weeks)
> > 6 58 105
> y$flag<-0
> y$flag[sbwl.weeks]<-1
>
> 6 and 58 are correct. But why am I getting 105 (the last row)?
> Any way to fix it?
> Thanks a lot!
> Dimitri
>
>
>
> On Tue, Aug 2, 2011 at 12:57 PM, Dimitri Liakhovitski
>  wrote:
> > Thanks a lot, everyone!
> > Dimitri
> >
> > On Tue, Aug 2, 2011 at 12:34 PM, Dennis Murphy 
> wrote:
> >> Hi:
> >>
> >> You could try the lubridate package:
> >>
> >> library(lubridate)
> >> week(weekly$week)
> >> week(july4)
> >> [1] 27 27
> >>
> >>> week
> >> function (x)
> >> yday(x)%/%7 + 1
> >> 
> >>
> >> which is essentially Gabor's code :)
> >>
> >> HTH,
> >> Dennis
> >>
> >> On Tue, Aug 2, 2011 at 7:36 AM, Dimitri Liakhovitski
> >>  wrote:
> >>> Hello!
> >>>
> >>> I have dates for the beginning of each week, e.g.:
> >>> weekly<-data.frame(week=seq(as.Date("2010-04-01"),
> >>> as.Date("2011-12-26"),by="week"))
> >>> week  # each week starts on a Monday
> >>>
> >>> I also have a vector of dates I am interested in, e.g.:
> >>> july4<-as.Date(c("2010-07-04","2011-07-04"))
> >>>
> >>> I would like to flag the weeks in my weekly$week that contain those 2
> >>> individual dates.
> >>> I can only think of a very clumsy way of doing it:
> >>>
> >>> myrows<-c(which(weekly$week==weekly$week[weekly$week>july4[1]][1]-7),
> >>>which(weekly$week==weekly$week[weekly$week>july4[2]][1]-7))
> >>> weekly$flag<-0
> >>> weekly$flag[myrows]<-1
> >>>
> >>> It's clumsy - because actually, my vector of dates of interest (july4
> >>> above) is much longer.
> >>> Is there maybe a more elegant way of doing it?
> >>> Thank you!
> >>> --
> >>> Dimitri Liakhovitski
> >>> marketfusionanalytics.com
> >>>
> >>> __
> >>> R-help@r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>>
> >>
> >
> >
> >
> > --
> > Dimitri Liakhovitski
> > marketfusionanalytics.com
> >
>
>
>
> --
> Dimitri Liakhovitski
> marketfusionanalytics.com
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] special recursive filter

2011-08-04 Thread R. Michael Weylandt
More than happy to help.

I think the term is "sparse" matrix.

I haven't totally wrapped my mind around why your version seems to work (and
testing it, it does seem to work for non-sparse matrices as well), but
still, if it makes sense to you, why bother with adding one to the sum and
the if statement? This should do exactly the same, only slightly quicker:

positions <- which(x == 1) # so I can skip the zeros in the loop
result <- numeric(length(x) + wait)
for(m in positions){ result[m + wait] <-
(-1)*(sum(result[m:(m+wait-1)])==0)  }
# This could probably be vectorized as well, but this is pretty fast already
result <- -result[-(1:wait)]

Michael Weylandt

PS -- It's good form to cc the entire list at each step so that it all gets
wisked away to the R archives for google-ability.

On Mon, Aug 1, 2011 at 8:06 AM, Konrad Schubert  wrote:

> Hi Michael,
>
> thank you very much for your quick responses!
>
> Your second answer is right - and much more faster and even simpler (Why I
> didn't catch the same way?) then my own one!
>
> Over the weekend I thought still about improvements. A slightly better one
> I would like to present. It is only better for spare vectors (Is it the
> right name for vectors with much more '0' then  other numeric entries?).
> Have a look:
>
>  positions <- which(x == 1) # so I can skip the zeros in the loop
>
>  result <- numeric(length(x) + wait)
>
>  for(m in positions){
>
>   ans <- sum(1, result[m:(m + wait - 1)])
>
>   if( ans == 1)
> result[m + wait] <- -1
>  }
>
>  result <- -result[-(1:wait)]
>
> In the end I'm happy with your solution!
> Thank you for your help!
> Thomas
>
>  Original-Nachricht 
> > Datum: Fri, 29 Jul 2011 12:25:10 -0400
> > Von: "R. Michael Weylandt " <
> michael.weyla...@gmail.com>
> > An: Konrad Schubert 
> > CC: R-help@r-project.org
> > Betreff: Re: [R] special recursive filter
>
> > Oh darn, I missed the recursive-ness entirely. You condition on the
> > filtered
> > series, not the signal itself.
> >
> > In that case, I have a solution which is pretty fast, but not
> particularly
> > R-esque.
> >
> > In effect your filter just says, take x but if you see a 1, sit out for
> > the
> > next wait periods. This seems prone to a repeat or while loop, and I
> don't
> > think can be much improved since you'll have to run the signal "in real
> > time" unless I'm missing a trick
> >
> > res = numeric(length(x))
> > i=1
> > while (i <= length(x) ) {
> >if (x[i] == 1) {res[i] =1; i = i+wait} # this improves speed somewhat
> > by
> > jumping over those spots you are going to keep = 0
> >i = i + 1 # no need to re-assign the default value of zero
> > }
> >
> > Again, unless there's a trick I'm missing, this seems about optimal since
> > it
> > runs slightly better than "real-time" through the signal.
> >
> > Sorry for my initial (wrong) remarks,
> >
> > Michael Weylandt
> >
> > On Fri, Jul 29, 2011 at 11:42 AM, R. Michael Weylandt <
> > michael.weyla...@gmail.com>  wrote:
> >
> > > I'm not sure I understand what your filter intends to do, but could
> this
> > > not be done more efficiently with logicals and the which? You also
> might
> > > need the cumsum() function and diff() with the optional lag argument
>  if
> > > I've misunderstood your filter.
> > >
> > > Specifically try this:
> > >
> > > res = c(x,rep(NA,wait)) # make a copy to work on and include the extra
> > NA
> > > which we might turn into zeros, but will drop later
> > > for (i in 1:wait) {res[which(x == 1) + i] <- 0}
> > > res = res[1:length(x)] # drop the extra added length.
> > >
> > > Michael Weylandt
> > >
> > >
> > >
> > > On Fri, Jul 29, 2011 at 11:16 AM, Konrad Schubert
> > wrote:
> > >
> > >> Hi,
> > >> I have a question about a special recursive filter problem.
> > >>
> > >> What I have:
> > >>
> > >> - given variables:
> > >>  x: time series with rather randomly occuring '0' and '1'
> > >>  wait: non negative integer
> > >>
> > >> - a working but ineffectiv implementation (see below)
> > >>
> > >> How the implementation works (what I want):
> > >>
> > >> The filter should drill holes of distance 'wait' between the '1' in x,
> > >> e.g.
> > >>
> > >> x = 1 0 1 1 0 1 0 1 0 1 0 1 1 1 1
> > >> wait = 2
> > >>
> > >> desired result:
> > >>
> > >> result = 1 0 0 1 0 0 0 1 0 0 0 1 0 0 1
> > >>
> > >> working implementation:
> > >>
> > >>
> > >>
> >
> #*
> > >>  # basic informations
> > >>
> > >>
> >
> #*
> > >>
> > >>  # length of input vector
> > >>  lengthX <- length(x)
> > >>
> > >>  # stop times for the recursive filter indices
> > >>  stopS <- 1:lengthX + wait - 1
> > >>
> > >>  # initialize the result and the intermediate result vector
> > >>  # with additional length for recursive filtering
> > >>  result <- y <- numeric(lengthX + wait)
> > >>
> > >>
> > >>
> >
> #

[R] use of modMCMC

2011-08-04 Thread Paola Lecca
Dear all,



I used modFit of the package FME to fit a set of ODE to a ste of
experimental data.

The summary of this fit give me the following error

> summary(Fit)
Residual standard error: 984.1 on 452 degrees of freedom
Error in cov2cor(x$cov.unscaled) : 'V' is not a square numeric matrix
In addition: Warning message:
In summary.modFit(Fit) : Cannot estimate covariance; system is singular


This is due becasue the Hessian matrix has all the entries equal to 0 in my
system.

In these cases, on the help page of modFit, it is suggested to use modMCMC
to generate new sets of parameters. modMCMC performs a Markov Chain Monte
Carlo simulation.

I do not understand very well how modMCMC can be used in a context of
parameter estimation. Could someone help me in understanding the use of this
function and its utility for parameter fitting?



Thank you very much in advance,

Paola.
-- 
*Paola Lecca, PhD*
*The Microsoft Research - University of Trento*
*Centre for Computational and Systems Biology*
*Piazza Manci 17 38123 Povo/Trento, Italy*
*Phome: +39 0461282843*
*Fax: +39 0461282814*

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Efficient way of creating a shifted (lagged) variable?

2011-08-04 Thread R. Michael Weylandt
If you start looking at the time series classes (xts, zoo, etc) they have
very quick and flexible lag functions built in.

Might this be a slightly more efficient solution for the homebrew
implementation?

OurLag <- function(y, k=1) {
t = y[,1,drop=F]; d = y[,-1,drop=F];
if (is.matrix(y)) {rn = rownames(y); cn = colnames(y)} else {n =
names(y)}
if (k >= 1) {
 d = rbind(matrix(NA,ncol=ncol(d), nrow = k), d)
 d = d[1:nrow(t),]
} else {d = rbind(d[1:(nrow(d)+k),],matrix(NA,ncol=ncol(d), nrow = -k))}
ans = data.frame(t,d)
   if (is.matrix(y)) {rownames(ans) = rn; colnames(ans) = cn} else
{names(ans) = n}
   return(ans)
}

Set k<0 for lead instead of lag.

Michael Weylandt

On Thu, Aug 4, 2011 at 11:24 AM, Dimitri Liakhovitski <
dimitri.liakhovit...@gmail.com> wrote:

> Hello!
>
> I have a data set:
> set.seed(123)
> y<-data.frame(week=seq(as.Date("2010-01-03"),
> as.Date("2011-01-31"),by="week"))
> y$var1<-c(1,2,3,round(rnorm(54),1))
> y$var2<-c(10,20,30,round(rnorm(54),1))
>
> # All I need is to create lagged variables for var1 and var2. I looked
> around a bit and found several ways of doing it. They all seem quite
> complicated - while in SPSS it's just a few letters (like LAG()). Here
> is what I've written but I wonder. It works - but maybe there is a
> very simple way of doing it in R that I could not find?
> I need the same for "lead" (opposite of lag).
> Any hint is greatly appreciated!
>
> ### The function I created:
> mylag <- function(x,max.lag=1){   # x has to be a 1-column data frame
>
> temp<-as.data.frame(embed(c(rep(NA,max.lag),x[[1]]),max.lag+1))[2:(max.lag+1)]
>   for(i in 1:length(temp)){
> names(temp)[i]<-paste(names(x),".lag",i,sep="")
>}
>  return(temp)
> }
>
> ### Running mylag to get my result:
> myvars<-c("var1","var2")
> for(i in myvars) {
>  y<-cbind(y,mylag(y[i]),max.lag=2)
> }
> (y)
>
> --
> Dimitri Liakhovitski
> marketfusionanalytics.com
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] error bar plot with log scale in lattice

2011-08-04 Thread Xiao Hu
Hi, all,
I'm trying to modify the code to a log scale for y-axis from the post
http://tolstoy.newcastle.edu.au/R/help/06/06/28612.html

However, the error bar did not change accordingly.  The following is the code I 
used based on the singer.ucl data.

Thanks in advance!

prepanel.ci <- function(x, y, ly, uy, subscripts, ...)
{
y <- as.numeric(y)
ly <- as.numeric(ly[subscripts])
uy <- as.numeric(uy[subscripts])
list(ylim = range(y, uy, ly, finite = TRUE),y=list(log=10))
}

panel.ci <- function(x, y, ly, uy, subscripts, pch = 16, col.line = 'black', 
...)
{
x <- as.numeric(x)
y <- as.numeric(y)
ly <- as.numeric(ly[subscripts])
uy <- as.numeric(uy[subscripts])
 list(ylim = range(y, uy, ly, finite = TRUE),y=list(log=10))

panel.arrows(x, ly, x, uy, col = col.line,
 length = 0.25, unit = "native",
 angle = 90, code = 3,log="y")
panel.xyplot(x, y, pch = pch, col.line = col.line, log="y",...) }


xyplot(median ~ voice,
groups=range,
data=singer.ucl,
ly = singer.ucl$lower,
uy = singer.ucl$upper,
prepanel = prepanel.ci,
panel = panel.superpose,
panel.groups = panel.ci,
type="b",
scales=list(y=list(log=10)))

Best regards,
Shelley

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Efficient way of creating a shifted (lagged) variable?

2011-08-04 Thread Dimitri Liakhovitski
Hello!

I have a data set:
set.seed(123)
y<-data.frame(week=seq(as.Date("2010-01-03"), as.Date("2011-01-31"),by="week"))
y$var1<-c(1,2,3,round(rnorm(54),1))
y$var2<-c(10,20,30,round(rnorm(54),1))

# All I need is to create lagged variables for var1 and var2. I looked
around a bit and found several ways of doing it. They all seem quite
complicated - while in SPSS it's just a few letters (like LAG()). Here
is what I've written but I wonder. It works - but maybe there is a
very simple way of doing it in R that I could not find?
I need the same for "lead" (opposite of lag).
Any hint is greatly appreciated!

### The function I created:
mylag <- function(x,max.lag=1){   # x has to be a 1-column data frame
   
temp<-as.data.frame(embed(c(rep(NA,max.lag),x[[1]]),max.lag+1))[2:(max.lag+1)]
   for(i in 1:length(temp)){
 names(temp)[i]<-paste(names(x),".lag",i,sep="")
}
  return(temp)
}

### Running mylag to get my result:
myvars<-c("var1","var2")
for(i in myvars) {
  y<-cbind(y,mylag(y[i]),max.lag=2)
}
(y)

-- 
Dimitri Liakhovitski
marketfusionanalytics.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] use of modMCMC

2011-08-04 Thread Paola Lecca
Dear all,

I used modFit of the package FME to fit a set of ODE to a ste of eperiemntal
data.

The summary of this fit give me the following error

> summary(Fit)

Residual standard error: 984.1 on 452 degrees of freedom
Error in cov2cor(x$cov.unscaled) : 'V' is not a square numeric matrix
In addition: Warning message:
In summary.modFit(Fit) : Cannot estimate covariance; system is singular

This is due becasue the Hessian matrix has all the entries equal to 0.

In these cases, on the help page of modFit, it is suggested to use modMCMC
to generate new sets of parameters. modMCMC performs a Markov Chain Monte
Carlo simulation.

I do not understand very well how modMCMC can be used in a context of
parameter estimation. Could someone help me in understanding the use of this
function and its utility for parameter fiting?

Thank you very much in advance,
Paola.
-- 
*Paola Lecca, PhD*
*The Microsoft Research - University of Trento*
*Centre for Computational and Systems Biology*
*Piazza Manci 17 38123 Povo/Trento, Italy*
*Phome: +39 0461282843*
*Fax: +39 0461282814*

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Possible bug of QR decomposition in package Matrix

2011-08-04 Thread C6H5NO2
Thank you, Martin.

Yes, on my computer (48Gb memory) it also gives a message "caught
segfault" as you got. So it is because the matrix size is too large.

I apologize for the offence I've caused.

2011/8/4 Martin Maechler :
>> C6H5NO2  
>>     on Thu, 4 Aug 2011 16:03:54 +0800 writes:
>
>    > Thank you very much, Josh!
>    > As you suggested, I will contact the developers of "Matrix".
>
>    > PS, C6 are just initial characters of my email account :-)
>
>    > Best wishes,
>    > C6
>
> well, as the posting guide (http://www.R-project.org/posting-guide.html)
> says, this is regarded as impolite by many,
> and if I wasn't one of the Matrix package authors,
> I would not spend time helping  'C6'  either.
>
>    > 2011/8/4 Joshua Wiley :
>    >> Hi C6 (were C1 - 5 already taken in your family?),
>    >>
>    >> I downloaded your data and can replicate your problem.  R
>    >> ceases responding and terminates.  This does not occur with all
>    >> uses of qr on a dgCMatrix object.  I know nothing about sparse
>    >> matrices, but if you believe this should not be occurring, you
>    >> should contact the package maintainers.  Here is my
>    >> sessionInfo() (FYI, it would probably be helpful to report
>    >> yours also in case the issue is version dependent):
>    >>
>    >> R Under development (unstable) (2011-07-30 r56564) Platform:
>    >> x86_64-pc-mingw32/x64 (64-bit)
>    >>
>    >> locale: [1] LC_COLLATE=English_United States.1252 [2]
>    >> LC_CTYPE=English_United States.1252 [3]
>    >> LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C [5]
>    >> LC_TIME=English_United States.1252
>    >>
>    >> attached base packages: [1] stats     graphics  grDevices utils
>    >>     datasets  methods   base
>    >>
>    >> other attached packages: [1] Matrix_0.999375-50 lattice_0.19-30
>    >>
>    >> loaded via a namespace (and not attached): [1] grid_2.14.0
>    >>  tools_2.14.0
>    >>
>    >> Cheers,
>    >>
>    >> Josh
>    >>
>    >> On Wed, Aug 3, 2011 at 4:26 PM, C6H5NO2 
>    >> wrote:
>    >>> Hello R users,
>    >>>
>    >>> I am trying to give the QR decomposition for a large sparse
>    >>> matrix in the format of dgCMatrix. When I run qr function for
>    >>> this matrix, the R session simply stops and exits to the
>    >>> shell.  The matrix is of size 108595x108595, and it has
>    >>> 4866885 non-zeros. I did the experiment on windows 7 and linux
>    >>> mint 11 (both 64 bit), and the results are the same.
>    >>>
>    >>> I have uploaded my data file to
>    >>> http://ifile.it/elf2p6z/A.RData . The file is 10.681 MB and I
>    >>> hope someone could kindly download it.  The code to see my
>    >>> problem is:
>
>    >>>  library(Matrix)
>    >>>  load("A.RData")
>    >>>  B <- qr(A)
>
>
>    >>> Best wishes, C6
>
> And what's the size of RAM your two computers have ??
> The answer is of quite some importance.
>
> Short answer: If you have a large very sparse matrix,
> you don't know if the QR decomposition of that matrix is also
> very sparse... and if it ain't it will blow up memory,
> and that's what I'm pretty sure happened with you.
>
> What I don't see is why R "simply stops" for you and does not
> through a an error message about insufficient memory.
> As I show below, I do get a seg.fault
> --- which may be considered a bug ---
> *BUT* I do get the message about memory problems.
> Did you really *not* get any such message?
> Is it because you've used a GUI that hides such valuable
> information from the user?
>
> Here's the more detailed reason / analysis about why the above
> "kills R". This is commented R code,
> you can cut paste after you've got 'A' :
>
> str(A)
> ## Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
> ##   ..@ i       : int [1:4866885] 0 1 2 16 32 33 2392 2417 0 1 ...
> ##   ..@ p       : int [1:108596] 0 8 21 35 44 51 59 63 69 78 ...
> ##   ..@ Dim     : int [1:2] 108595 108595
> ##   ..@ Dimnames:List of 2
> ##   .. ..$ : NULL
> ##   .. ..$ : NULL
> ##   ..@ x       : num [1:4866885] 140.03 14.79 14.79 1.78 1.78 ...
> ##   ..@ factors : list()
>
> system.time(# the following is still not as fast as it could be):
> isSymmetric(A) # yes !
> )# 1.13 {the *2nd* time; 1.9 the 1st time !}
>
> ## First work with a  submatrix:
> n <- 1
> A1 <- A[1:n, 1:n]
> system.time(
>            qr1 <- qr(A1))# on cmath-8, machine with 48 GB  RAM memory
> ##   user  system elapsed
> ## 59.884   0.316  60.240  !!
>
> str(qr1)
> ## Formal class 'sparseQR' [package "Matrix"] with 6 slots
> ##   ..@ V   :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
> ##   .. .. ..@ i       : int [1:7692948] 0 1 2 3 4 3 4 5 4 5 ...
> ##   .. .. ..@ p       : int [1:10001] 0 1 2 5 8 10 11 12 18 26 ...
> ##   .. .. ..@ Dim     : int [1:2] 1 1
> ##   .. .. ..@ Dimnames:List of 2
> ##   .. .. .. ..$ : NULL
> ##   .. .. .. ..$ : NULL
> ##   .. .. ..@ x       : num [1:7692948] 1 1 -3.71 8.68 -8.68 ...
> ##   .. .. ..@ factors : list()
> ##   ..@ beta: num [1

Re: [R] Plotting just a portion of a smoother graph in ggplot2

2011-08-04 Thread Timothy Bates
It's not immediately obvious

You need to look at coord_cartesian() and its ylim argument. 

Best, t

Sent from my iPhone

On 4 Aug 2011, at 02:38 PM, Christopher Desjardins  
wrote:

> Hi,
> I am using ggplot2 to with the following code:
> 
> gmathk2 <-
> qplot(time,math,colour=Kids,data=kids.ach.lm.k5,geom="smooth",method="lm",formula=y~ns(x,1))
> + opts(title="Smoother Plot: Math K-5") + xlab("Time") + ylab("Math") +
> scale_colour_brewer(pal="Set1"); gmathk2
> 
> This plots all the smoother for all the x values. What I'd like to do is
> plot the smoother for the x values that are only greater than or equal to 0.
> I don't want this:
> 
> gmathk2 <-
> qplot(time,math,colour=Kids,data=kids.ach.lm.k5,geom="smooth",method="lm",formula=y~ns(x,1))
> + opts(title="Smoother Plot: Math K-5") + xlab("Time") + ylab("Math")  +
> scale_colour_brewer(pal="Set1") + xlim(0,50); gmathk2
> 
> Because adding xlim seems to throw away the data below 0 when calculating
> the smoother. What I want it to do is have ggplot2 give me the same graph as
> the first command but just not plot the part of the smoother that is below
> 0.
> 
> Thanks,
> Chris
> 
>[[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] limma contrast matrix

2011-08-04 Thread Patrick Breheny

On 08/04/2011 08:53 AM, Belmont, John W wrote:

I am trying to correct for the effect of 2 covariates in a gene
expression data set.

design<-model.matrix(~0 + Factor + cov1 + cov2)


QUESTION: How to set up the contrast matrix?

The usual commands

fit <- lmFit(selDataMatrix, design)
cont.matrix <- makeContrasts(FacCont=Group1-Group2, levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)

does not work because the original design matrix includes the
covariates.

I think I don't really understand how the contrast matrix works.


1) Unless you are sure you know what you're doing and have given this a 
lot of thought, I doubt you want to remove the intercept in this model.


2) A contrast specifies the coefficients of the linear combination of 
parameters that you wish to estimate/test.  If you don't know what this 
means, then I would advise learning more about regression or consulting 
a statistician before proceeding.


3) If the first two comments haven't scared you off, then you can 
specify contrasts via:


cont.matrix <- matrix(c(-1,1,0,0),ncol=1)

under the original parameterization, or

cont.matrix <- matrix(c(0,1,0,0),ncol=1)

if you take the advice of point 1) and put an intercept in your model.

--
Patrick Breheny
Assistant Professor
Department of Biostatistics
Department of Statistics
University of Kentucky

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Rdconv LaTeX files

2011-08-04 Thread Michael O'Keeffe
Thanks!

I found (by properly reading docs) to use the inputenc package. The Rd
package solve most problems except for where I has a multiple line
code element in my Rd file. I tried to convert it to a LaTeX Rd 'code'
element and use an 'alltt' environment to hold the multi-line text. I
worked around this by using the verbatim element.

Thank you, your help was invaluable!

Regards,

Michael

On Thu, Aug 4, 2011 at 10:41 PM, Duncan Murdoch
 wrote:
>>
>> Hi,
>> I've written a package and converted my Rd files into LaTeX using
>> Rdconv. When I copy and paste these files in to my Sweave document I
>> get the error message when compiling the Sweave file:
>>
>> ! Undefined control sequence.
>> l.32 \inputencoding
>>                    {utf8}
>>
>> I also get the error message for the "\HeaderA", "\keyword".
>> Additionally I get an environment undefined error for some of the
>> "\begin" control sequences. Do I need to specifically "include"
>> certain LaTeX libraries in my Sweave document?
>
> Yes, Rd files use the Rd.sty style.  So \usepackage{Rd} would be necessary.
>  It's not designed for this kind of use, so you might find
> incompatibilities; I've never tried it.
>
> Duncan Murdoch
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] slow computation of functions over large datasets

2011-08-04 Thread Paul Hiemstra
 Hi all,

After reading this interesting discussion I delved a bit deeper into the
subject matter. The following snippet of code (see the end of my mail)
compares three ways of performing this task, using ddply, ave and one
yet unmentioned option: data.table (a package). The piece of code
generates mock datasets which vary in size and number of factor levels
for the factor. The results look like this (there is also a ggplot plot
in the script that summarise the table):

> res
   datsize noClasses   tave  tddply tdata.table
...note that I cut out part of the table for readability...
17   1e+0710  9.160   3.500   1.064
18   1e+0750 10.126   4.483   1.364
19   1e+07   100 10.485   5.016   1.407
20   1e+07   200 10.680   6.901   1.435
21   1e+07   500 10.801  12.569   1.474
22   1e+07  1000 10.923  21.001   1.540
23   1e+07  2500 11.514  51.020   1.622
24   1e+07 1 12.158 182.752   1.737

It is clear that the option of using data.table is by far the fastest of
the three and scales quite nicely with the number of factor levels, in
contrast to ddply. It is also faster than ave by up to a factor of 10.

cheers,
Paul

library(ggplot2)
library(data.table)
theme_set(theme_bw())
datsize = c(10e4, 10e5, 10e6)
noClasses = c(10, 50, 100, 200, 500, 1000, 2500, 10e3)
comb = expand.grid(datsize = datsize, noClasses = noClasses)
res = ddply(comb, .(datsize, noClasses), function(x) {
  expdata = data.frame(value = runif(x$datsize),
  cat = round(runif(x$datsize, min = 0, max = x$noClasses)))
  expdataDT = data.table(expdata)
  t1 = system.time(res1 <- with(expdata, ave(value, cat, FUN = sum)))
  t2 = system.time(res2 <- ddply(expdata, .(cat), summarise, val =
sum(value)))
  t3 = system.time(res3 <- expdataDT[, sum(value), by = cat])
  return(data.frame(tave = t1[3], tddply = t2[3], tdata.table = t3[3]))
}, .progress = 'text')
res

ggplot(aes(x = noClasses, y = log(value), color = variable),
data = melt(res, id.vars = c("datsize","noClasses"))) +
facet_wrap(~ datsize) + geom_line()

> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: i686-pc-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C 
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8   
 [5] LC_MONETARY=C  LC_MESSAGES=en_US.UTF-8  
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C   
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C  

attached base packages:
[1] grid  stats graphics  grDevices utils datasets  methods 
[8] base

other attached packages:
[1] data.table_1.6.3 ggplot2_0.8.9proto_0.3-8  reshape_0.8.4  
[5] plyr_1.5.2   fortunes_1.4-1 

loaded via a namespace (and not attached):
[1] digest_0.4.2 tcltk_2.13.0 tools_2.13.0


On 08/03/2011 01:25 PM, Caroline Faisst wrote:
> Hello there,
>
>
> I'm computing the total value of an order from the price of the order items
> using a "for" loop and the "ifelse" function. I do this on a large dataframe
> (close to 1m lines). The computation of this function is painfully slow: in
> 1min only about 90 rows are calculated.
>
>
> The computation time taken for a given number of rows increases with the
> size of the dataset, see the example with my function below:
>
>
> # small dataset: function performs well
>
> exampledata<-data.frame(orderID=c(1,1,1,2,2,3,3,3,4),itemPrice=c(10,17,9,12,25,10,1,9,7))
>
> exampledata[1,"orderAmount"]<-exampledata[1,"itemPrice"]
>
> system.time(for (i in 2:length(exampledata[,1]))
> {exampledata[i,"orderAmount"]<-ifelse(exampledata[i,"orderID"]==exampledata[i-1,"orderID"],exampledata[i-1,"orderAmount"]+exampledata[i,"itemPrice"],exampledata[i,"itemPrice"])})
>
>
> # large dataset: the very same computational task takes much longer
>
> exampledata2<-data.frame(orderID=c(1,1,1,2,2,3,3,3,4,5:200),itemPrice=c(10,17,9,12,25,10,1,9,7,25:220))
>
> exampledata2[1,"orderAmount"]<-exampledata2[1,"itemPrice"]
>
> system.time(for (i in 2:9)
> {exampledata2[i,"orderAmount"]<-ifelse(exampledata2[i,"orderID"]==exampledata2[i-1,"orderID"],exampledata2[i-1,"orderAmount"]+exampledata2[i,"itemPrice"],exampledata2[i,"itemPrice"])})
>
>
>
> Does someone know a way to increase the speed?
>
>
> Thank you very much!
>
> Caroline
>
>   [[alternative HTML version deleted]]
>
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


-- 
Paul Hiemstra, Ph.D.
Global Climate Division
Royal Netherlands Meteorological Institute (KNMI)
Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
P.O. Box 201 | 3730 AE | De Bilt
tel: +31 30 2206 494

http://intamap.geo.uu.nl/~paul
http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770


[[alte

Re: [R] conditional data replace (recode, change or whatsoever)

2011-08-04 Thread R. Michael Weylandt
Best I know, they are effectively the same** within R but I prefer "[" for a
couple of reasons.

1) "[" works for all data types, while $ is sometimes picky: e.g.,

x = matrix(rnorm(9),3); colnames(x) <- c("a","b","c")
x.df = as.data.frame(x)
x$a gives "Error in x$a : $ operator is invalid for atomic vectors"

while

x.df$a works as desired.

2) You can do computations inside of [] that you can't with $: this is a
little contrived, but consider

x.n = names(x.df)
for (i in 1:3)  {
 print(x.df[x.n[i]])
}

works as expected while

x.n = names(x.df)
for (i in 1:3)  {
 print(x.df$(x.n[i]))
}

does not.

3) "[" gives you access to the super-useful "drop" option.

For more info see ? "["

In short, "[" is more flexible and powerful (at least in my experience)
while $ saves the occasional keystroke here and there.

Michael Weylandt

** Caveat that you probably don't need at this point in the learning curve:
the documentation says $ is actually closer to "[[" which can be subtly
different than "[". See ?"[" for the differences.

On Wed, Aug 3, 2011 at 11:43 AM, Justin  wrote:

> zcatav  gmail.com> writes:
>
> >
> > Your suggestion works perfect as i pointed previous message. Now have
> another
> > question about data editing. I try this code:
> > X[X[,"c"]==1,"b"]<-X[,"d"]
> > and results with error: `[<-.data.frame`(`*tmp*`, X[, "c"] == 1, "b",
> value
> > = c(NA,  :
> >   replacement has 9 rows, data has 2
> >
>
> is this equivalent and/or preferred to:
>
> X$b[X$c==1]<-X$d[X$c==1] ??
>
> I assume this goes back to the various indexing methods for a dataframe, an
> object vector that is a column of a data frame vs. an object data frame
> that
> happens to be one column of a larger data frame.
>
> on a very large data set is one preferable for speed?  one for memory use?
>
> I tend to index using $ operators often and if I should quit let me know!!
>
>
> Thanks,
>
> Justin
>
> > Logically i selected 2 rows with X[,"c"]==1. Than i want to replace in
> that
> > rows its own data from "d" to "b" with X[,"b"]<-X[,"d"]. What is wrong?
> >
> > --
> > View this message in context:
>
> http://r.789695.n4.nabble.com/conditional-data-replace-recode-change-or-whatsoever-tp3714715p3715218.html
> > Sent from the R help mailing list archive at Nabble.com.
> >
> >
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Plotting multiple age distributions with histogram()

2011-08-04 Thread Juliane Struve
Dear list,
 
I would like to plot several age distributions that are influenced by a factor 
(called group in the data example below). I would like to use historam() in the 
lattice package, but I can't figure out how to set up the data. Could someone 
give me a hint how to do it  for the example below? 
 
Thanks a lot,
 
Juliane 
 
data <- structure(list(Age = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), No = 
c(1084609L, 
1076523L, 1066171L, 887874L, 881239L, 872763L, 726800L, 720632L, 
712984L), Group = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L)), .Names = c("Age", 
"No", "Group"), class = "data.frame", row.names = c(NA, -9L))


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R loop problem

2011-08-04 Thread testrider
Worked like a charm! 
Thanks a lot Jean V Adams

--
View this message in context: 
http://r.789695.n4.nabble.com/R-loop-problem-tp3718449p3718670.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Plotting just a portion of a smoother graph in ggplot2

2011-08-04 Thread Christopher Desjardins
Hi,
I am using ggplot2 to with the following code:

gmathk2 <-
qplot(time,math,colour=Kids,data=kids.ach.lm.k5,geom="smooth",method="lm",formula=y~ns(x,1))
+ opts(title="Smoother Plot: Math K-5") + xlab("Time") + ylab("Math") +
scale_colour_brewer(pal="Set1"); gmathk2

This plots all the smoother for all the x values. What I'd like to do is
plot the smoother for the x values that are only greater than or equal to 0.
I don't want this:

gmathk2 <-
qplot(time,math,colour=Kids,data=kids.ach.lm.k5,geom="smooth",method="lm",formula=y~ns(x,1))
+ opts(title="Smoother Plot: Math K-5") + xlab("Time") + ylab("Math")  +
scale_colour_brewer(pal="Set1") + xlim(0,50); gmathk2

Because adding xlim seems to throw away the data below 0 when calculating
the smoother. What I want it to do is have ggplot2 give me the same graph as
the first command but just not plot the part of the smoother that is below
0.

Thanks,
Chris

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to extract sublist from a list?

2011-08-04 Thread R. Michael Weylandt
I'm not Josh Wiley, but I think I can answer this:

When dealing with lists, its often more convenient to use "[[" instead of
"[" because "[" returns a single element list while "[[" returns the list
element itself. (i.e., if I set x = list(Q = rnorm(64), V =
matrix(rnorm(64),8)) then x["Q"] is a list that has one element which is a
vector while x[["Q"]] is just that vector). With this in mind, we can parse
Josh's code:

lapply goes through each element of the list given in the first argument and
applies the function in the second argument with the third argument as any
necessary argument for the function (that's not totally clear, but I don't
really know how to state it more directly). So for your case,

lapply(list, "[[","y") looks at each element of list: sub1, sub2, sub3, etc.
and does the following: [["y"]]-- thus it returns exactly, list$sub1$y,
list$sub2$y, list$sub3$y as desired. Or more precisely, list$sub1[["y"]],
etc. which is effectively the same thing -- see Josh's original post for
details on the subtle differences.

sapply is just a version of lapply that plays a little nicer. For example,
with x as above: lapply(x,mean) gives me a list of two elements which are
the means, while sapply(x,mean) gives me a vector of the same information.
Sometimes it's a little nicer.

Hopefully this helps, but for more information, try these:

? lapply
? "[["

at  your command prompt.

Michael Weylandt

On Thu, Aug 4, 2011 at 6:57 AM, Richard Ma  wrote:

> Hi Joshua,
>
> Really helpful that you posted so many useful solutions for my problem.
>
> I can understand all your codes except these:
>
> *[code]
> lapply(lst, `[[`, "y")
> ## or if you are only retrieving a single value
> sapply(lst, `[[`, "y")
> [/code]*
>
> Can you explain these a little bit? What does '[[' means?
>
> Thanks a lot,
> Richard
>
>
> Joshua Wiley-2 wrote:
> >
> > On Thu, Aug 4, 2011 at 12:12 AM, Ashim Kapoor
> >  wrote:
> >>> How would we do this problem looping over seq(1:2) ?
> >
> > Because this goes to an email list serv, it is good practice to quote
> > the original problem.  I have no idea what "this" is.
> >
> >>>
> >>>
> >> To extend the example in the corresponding nabble post : -
> >>  sub1<-list(x="a",y="ab")
> >>  sub2<-list(x="c",y="ad")
> >>  lst<-list(sub1=sub1,sub2=sub2)
> >>  for ( t in seq(1:2) )  print(lst[[t]]$y)
> >>
> >> So I can print out the sub1$y/sub2$y but it's not clear how to extract
> >> them.
> >
> > Well, to extract them, just drop the call to print. You could use them
> > directly in the loop or could store them in new variables.
> >
> > ## note seq(1:2) is redundant with simply 1:2
> > or (t in 1:2) print(nchar(lst[[t]]$y))
> >
> > I am guess, though, that what you might be hoping to do is extract
> > specific elements from a list and store the extract elements in a new
> > list.
> >
> > lapply(1:2, function(i) lst[[i]]["y"])
> > ## or compare
> > lapply(1:2, function(i) lst[[i]][["y"]])
> >
> >>
> >> My original was different though.
> >>
> >> How would  say:-
> >>
> >> for ( t in seq(1:2) ) sub"t"$y
> >>
> >> Where sub"t" evaluates to sub1 or sub 2?
> >
> > if you actually want "sub1", or "sub2":
> >
> > ## note that I am wrapping in print() not so that it works
> > ## but so that you can see it at the console
> > for (t in 1:2) print(paste("sub", t, sep = ''))
> >
> > from which we can surmise that the following should work:
> >
> > for (t in 1:2) print(lst[[paste("sub", t, sep = '')]])
> >
> > which trivially extends to:
> >
> > for (t in 1:2) print(lst[[paste("sub", t, sep = '')]]$y)
> >
> > or perhaps more appropriately
> >
> > for (t in 1:2) print(lst[[paste("sub", t, sep = '')]][["y"]])
> >
> > If you just need to go one level down for *all* elements of your list
> >
> > lapply(lst, `[[`, "y")
> > ## or if you are only retrieving a single value
> > sapply(lst, `[[`, "y")
> >
> > Hope this helps,
> >
> > Josh
> >
> >>
> >> Many thanks.
> >> Ashim
> >>
> >>
> >>> On Thu, Aug 4, 2011 at 10:59 AM, Richard Ma
> >>> wrote:
> >>>
>  Thank you so much GlenB!
> 
>  I got it done using your method.
> 
>  I'm just curious how did you get this idea? Cause for me, this looks
> so
>  tricky
> 
>  Cheers,
>  Richard
> 
>  -
>  I'm a PhD student interested in Remote Sensing and R Programming.
>  --
>  View this message in context:
> 
> http://r.789695.n4.nabble.com/How-to-extract-sublist-from-a-list-tp3717451p3717713.html
>  Sent from the R help mailing list archive at Nabble.com.
> 
>  __
>  R-help@r-project.org mailing list
>  https://stat.ethz.ch/mailman/listinfo/r-help
>  PLEASE do read the posting guide
>  http://www.R-project.org/posting-guide.html
>  and provide commented, minimal, self-contained, reproducible code.
> 
> >>>
> >>>
> >>
> >>[[alternative HTML version deleted]]
> >>
> >> 

Re: [R] limma contrast matrix

2011-08-04 Thread James W. MacDonald

Hi John,

The limma package is part of Bioconductor, so you should be asking this 
question on the BioC listserv  rather than 
R-help.


In addition, please see the posting guide. The example you give is not 
self-contained, and the term 'does not work' is so unspecific as to be 
useless.


On 8/4/2011 8:53 AM, Belmont, John W wrote:

I am trying to correct for the effect of 2 covariates in a gene expression data 
set.


design<-model.matrix(~0 + Factor + cov1 + cov2)


QUESTION:
How to set up the contrast matrix?

The usual commands

fit<- lmFit(selDataMatrix, design)
cont.matrix<- makeContrasts(FacCont=Group1-Group2, levels=design)


Why would your design matrix have columns labeled 'Group1' and 'Group2'? 
Your code doesn't indicate that you did any such naming.



fit2<- contrasts.fit(fit, cont.matrix)

does not work because the original design matrix includes the covariates.

I think I don't really understand how the contrast matrix works.


I agree. So let's make some wild assumptions, and maybe I can help.

Factor <- factor(rep(1:2, each=6))
cov1 <- factor(rep(1:2, times=2, each=3))
## let's say you had two batches.
cov2 <- rnorm(12)
## and the other covariate is continuous

mat <- model.matrix(~ 0 + Factor + cov1 + cov2)

> mat
   Factor1 Factor2 cov12cov2
11   0 0  0.75675940
21   0 0 -0.80761696
31   0 0  0.61228480
41   0 1 -1.13920820
51   0 1 -0.24367358
61   0 1  0.32244694
70   1 0  0.51438468
80   1 0 -2.23587057
90   1 0 -0.06560733
10   0   1 1 -0.11273432
11   0   1 1  0.33398074
12   0   1 1  0.70900581

Now we have a model that controls for the fact that you did things in 
two batches (naughty boy) and that you have some random continuous 
covariate you want to control for. So Factor1 is the mean of the first 
group after controlling for the other two covariates, and Factor2 is the 
same, for the other group. Can you see that a contrast comparing these 
two factors is what you are after?


So something like

makeContrasts(whatIwant=Factor1-Factor2, groups=design)

should get you where you want to go.

Best,

Jim





John W. Belmont, M.D.,Ph.D.
Professor
Department of Molecular and Human Genetics
Baylor College of Medicine
1100 Bates, Room 8070
Houston, TX 77030
713-798-4634
fax: 713-798-7187

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
**
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues 


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] identifying weeks (dates) that certain days (dates) fall into

2011-08-04 Thread Dimitri Liakhovitski
Sorry for renewing the topoic. I thought it worked but now I've run
into a little problem:

 # My data frame with dates for week starts (Mondays)
y<-data.frame(week=seq(as.Date("2009-12-28"),
as.Date("2011-12-26"),by="week") )

# I have a vector of super bowl dates (including the future one for 2012):
sbwl.dates<-as.Date(c("2005-02-06","2006-02-05","2007-02-04","2008-02-03","2009-02-01","2010-02-07","2011-02-06","2012-02-05"))
I want to find the weeks in y that contain super bowl dates for
applicable years. I am trying:
sbwl.weeks<-findInterval(sbwl.dates, y$week)
sbwl.weeks<-sbwl.weeks[sbwl.weeks>0]
(sbwl.weeks)
> 6 58 105
y$flag<-0
y$flag[sbwl.weeks]<-1

6 and 58 are correct. But why am I getting 105 (the last row)?
Any way to fix it?
Thanks a lot!
Dimitri



On Tue, Aug 2, 2011 at 12:57 PM, Dimitri Liakhovitski
 wrote:
> Thanks a lot, everyone!
> Dimitri
>
> On Tue, Aug 2, 2011 at 12:34 PM, Dennis Murphy  wrote:
>> Hi:
>>
>> You could try the lubridate package:
>>
>> library(lubridate)
>> week(weekly$week)
>> week(july4)
>> [1] 27 27
>>
>>> week
>> function (x)
>> yday(x)%/%7 + 1
>> 
>>
>> which is essentially Gabor's code :)
>>
>> HTH,
>> Dennis
>>
>> On Tue, Aug 2, 2011 at 7:36 AM, Dimitri Liakhovitski
>>  wrote:
>>> Hello!
>>>
>>> I have dates for the beginning of each week, e.g.:
>>> weekly<-data.frame(week=seq(as.Date("2010-04-01"),
>>> as.Date("2011-12-26"),by="week"))
>>> week  # each week starts on a Monday
>>>
>>> I also have a vector of dates I am interested in, e.g.:
>>> july4<-as.Date(c("2010-07-04","2011-07-04"))
>>>
>>> I would like to flag the weeks in my weekly$week that contain those 2
>>> individual dates.
>>> I can only think of a very clumsy way of doing it:
>>>
>>> myrows<-c(which(weekly$week==weekly$week[weekly$week>july4[1]][1]-7),
>>>        which(weekly$week==weekly$week[weekly$week>july4[2]][1]-7))
>>> weekly$flag<-0
>>> weekly$flag[myrows]<-1
>>>
>>> It's clumsy - because actually, my vector of dates of interest (july4
>>> above) is much longer.
>>> Is there maybe a more elegant way of doing it?
>>> Thank you!
>>> --
>>> Dimitri Liakhovitski
>>> marketfusionanalytics.com
>>>
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>
>
>
> --
> Dimitri Liakhovitski
> marketfusionanalytics.com
>



-- 
Dimitri Liakhovitski
marketfusionanalytics.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Odp: Counting rows given conditional

2011-08-04 Thread Petr PIKAL
Hi

r-help-boun...@r-project.org napsal dne 04.08.2011 14:56:09:

> a217  
> Odeslal: r-help-boun...@r-project.org
> 
> 04.08.2011 14:56
> 
> Komu
> 
> r-help@r-project.org
> 
> Kopie
> 
> Předmět
> 
> [R] Counting rows given conditional
> 
> Hello,
> 
> I have an input file that contains multiple columns, but the column I'm
> concerned about looks like:
> 
> "TR"
> 5
> 0
> 4
> 1
> 0
> 2
> 0
> 
> To count all of the rows in the column I know how to do NROW(x$TR) which
> gives 7.
> 
> However, I would also like to count only the number of rows with values 
>=1
> (i.e. not 0). I've tried NROW(x$TR>=1) which did not give the intended
> output.

You are quite close. 
x$TR>=1
gives you logical vector TRUE/FALSE. You can compute count of TRUE values 
by
sum(logical.vector) e.g.
sum(x$TR>=1)

Regards
Petr



> 
> Do any of you have any suggestions as to where I'm going wrong?
> 
> 
> 
> --
> View this message in context: 
http://r.789695.n4.nabble.com/Counting-rows-
> given-conditional-tp3718541p3718541.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Possible bug of QR decomposition in package Matrix

2011-08-04 Thread Martin Maechler
> C6H5NO2  
> on Thu, 4 Aug 2011 16:03:54 +0800 writes:

> Thank you very much, Josh!
> As you suggested, I will contact the developers of "Matrix".

> PS, C6 are just initial characters of my email account :-)

> Best wishes,
> C6

well, as the posting guide (http://www.R-project.org/posting-guide.html)
says, this is regarded as impolite by many,
and if I wasn't one of the Matrix package authors,
I would not spend time helping  'C6'  either.

> 2011/8/4 Joshua Wiley :
>> Hi C6 (were C1 - 5 already taken in your family?),
>> 
>> I downloaded your data and can replicate your problem.  R
>> ceases responding and terminates.  This does not occur with all
>> uses of qr on a dgCMatrix object.  I know nothing about sparse
>> matrices, but if you believe this should not be occurring, you
>> should contact the package maintainers.  Here is my
>> sessionInfo() (FYI, it would probably be helpful to report
>> yours also in case the issue is version dependent):
>> 
>> R Under development (unstable) (2011-07-30 r56564) Platform:
>> x86_64-pc-mingw32/x64 (64-bit)
>> 
>> locale: [1] LC_COLLATE=English_United States.1252 [2]
>> LC_CTYPE=English_United States.1252 [3]
>> LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C [5]
>> LC_TIME=English_United States.1252
>> 
>> attached base packages: [1] stats     graphics  grDevices utils
>>     datasets  methods   base
>> 
>> other attached packages: [1] Matrix_0.999375-50 lattice_0.19-30
>> 
>> loaded via a namespace (and not attached): [1] grid_2.14.0
>>  tools_2.14.0
>> 
>> Cheers,
>> 
>> Josh
>> 
>> On Wed, Aug 3, 2011 at 4:26 PM, C6H5NO2 
>> wrote:
>>> Hello R users,
>>> 
>>> I am trying to give the QR decomposition for a large sparse
>>> matrix in the format of dgCMatrix. When I run qr function for
>>> this matrix, the R session simply stops and exits to the
>>> shell.  The matrix is of size 108595x108595, and it has
>>> 4866885 non-zeros. I did the experiment on windows 7 and linux
>>> mint 11 (both 64 bit), and the results are the same.
>>> 
>>> I have uploaded my data file to
>>> http://ifile.it/elf2p6z/A.RData . The file is 10.681 MB and I
>>> hope someone could kindly download it.  The code to see my
>>> problem is: 

>>>  library(Matrix) 
>>>  load("A.RData") 
>>>  B <- qr(A)


>>> Best wishes, C6

And what's the size of RAM your two computers have ??
The answer is of quite some importance.

Short answer: If you have a large very sparse matrix, 
you don't know if the QR decomposition of that matrix is also 
very sparse... and if it ain't it will blow up memory,
and that's what I'm pretty sure happened with you.

What I don't see is why R "simply stops" for you and does not
through a an error message about insufficient memory.
As I show below, I do get a seg.fault 
--- which may be considered a bug ---
*BUT* I do get the message about memory problems.
Did you really *not* get any such message?
Is it because you've used a GUI that hides such valuable
information from the user?

Here's the more detailed reason / analysis about why the above
"kills R". This is commented R code, 
you can cut paste after you've got 'A' : 

str(A)
## Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
##   ..@ i   : int [1:4866885] 0 1 2 16 32 33 2392 2417 0 1 ...
##   ..@ p   : int [1:108596] 0 8 21 35 44 51 59 63 69 78 ...
##   ..@ Dim : int [1:2] 108595 108595
##   ..@ Dimnames:List of 2
##   .. ..$ : NULL
##   .. ..$ : NULL
##   ..@ x   : num [1:4866885] 140.03 14.79 14.79 1.78 1.78 ...
##   ..@ factors : list()

system.time(# the following is still not as fast as it could be):
isSymmetric(A) # yes !
)# 1.13 {the *2nd* time; 1.9 the 1st time !}

## First work with a  submatrix:
n <- 1
A1 <- A[1:n, 1:n]
system.time(
qr1 <- qr(A1))# on cmath-8, machine with 48 GB  RAM memory
##   user  system elapsed
## 59.884   0.316  60.240  !!

str(qr1)
## Formal class 'sparseQR' [package "Matrix"] with 6 slots
##   ..@ V   :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
##   .. .. ..@ i   : int [1:7692948] 0 1 2 3 4 3 4 5 4 5 ...
##   .. .. ..@ p   : int [1:10001] 0 1 2 5 8 10 11 12 18 26 ...
##   .. .. ..@ Dim : int [1:2] 1 1
##   .. .. ..@ Dimnames:List of 2
##   .. .. .. ..$ : NULL
##   .. .. .. ..$ : NULL
##   .. .. ..@ x   : num [1:7692948] 1 1 -3.71 8.68 -8.68 ...
##   .. .. ..@ factors : list()
##   ..@ beta: num [1:1] 0 0 0.01216 0.00743 0.01758 ...
##   ..@ p   : int [1:1] 61 62 63 64 94 80 161 162 163 164 ...
##   ..@ R   :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
##   .. .. ..@ i   : int [1:13581659] 0 1 2 2 3 3 4 2 3 4 ...
##   .. .. ..@ p   : int [1:10001] 0 1 2 3 5 7 11 12 13 15 ...
##   .. .. ..@ Dim : int [1:2] 1 1
##   .. .. ..@ Dimnames:Li

[R] Counting rows given conditional

2011-08-04 Thread a217
Hello,

I have an input file that contains multiple columns, but the column I'm
concerned about looks like:

"TR"
5
0
4
1
0
2
0

To count all of the rows in the column I know how to do NROW(x$TR) which
gives 7.

However, I would also like to count only the number of rows with values >=1
(i.e. not 0). I've tried NROW(x$TR>=1) which did not give the intended
output.

Do any of you have any suggestions as to where I'm going wrong?



--
View this message in context: 
http://r.789695.n4.nabble.com/Counting-rows-given-conditional-tp3718541p3718541.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] limma contrast matrix

2011-08-04 Thread Belmont, John W
I am trying to correct for the effect of 2 covariates in a gene expression data 
set.


design<-model.matrix(~0 + Factor + cov1 + cov2)


QUESTION:
How to set up the contrast matrix? 

The usual commands

fit <- lmFit(selDataMatrix, design) 
cont.matrix <- makeContrasts(FacCont=Group1-Group2, levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)

does not work because the original design matrix includes the covariates.

I think I don't really understand how the contrast matrix works.


John W. Belmont, M.D.,Ph.D.
Professor
Department of Molecular and Human Genetics
Baylor College of Medicine
1100 Bates, Room 8070
Houston, TX 77030
713-798-4634
fax: 713-798-7187

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] 3D Bar Graphs in ggplot2?

2011-08-04 Thread Duncan Murdoch

On 11-08-04 8:20 AM, Martin Maechler wrote:

Duncan Murdoch
 on Tue, 2 Aug 2011 16:51:38 -0400 writes:


 >  On 11-08-02 2:39 PM, wwreith wrote:
 >>  Does anyone know how to create a 3D Bargraph using
 >>  ggplot2/qplot. I don't mean 3D as in x,y,z coordinates. Just a
 >>  2D bar graph with a 3D shaped bard.  See attached excel file
 >>  for an example.
 >>
 >>  Before anyone asks I know that 3D looking bars don't add
 >>  anything except "prettiness".

 >  If you want graphs like that, you should be using Excel, not R.

Yes!!

10th commandment of R: You shall not misuse R!


I admit that the commandment partly motivated my message, but this was 
also advice to William:  Excel is designed to produce those things, R is 
not, so it is an inefficient use of his time to try to produce them in R.


Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R loop problem

2011-08-04 Thread Jean V Adams
Try this:

q <- z[match(p, y)]

Jean


`·.,,  ><(((º>   `·.,,  ><(((º>   `·.,,  ><(((º>

Jean V. Adams
Statistician
U.S. Geological Survey
Great Lakes Science Center
223 East Steinfest Road
Antigo, WI 54409  USA
715-627-4317, ext. 3125  (Office)
715-216-8014  (Cell)
715-623-6773  (FAX)
http://www.glsc.usgs.gov  (GLSC web site)
http://profile.usgs.gov/jvadams  (My homepage)
jvad...@usgs.gov  (E-mail)




From:
testrider 
To:
r-help@r-project.org
Date:
08/04/2011 07:21 AM
Subject:
[R] R loop problem
Sent by:
r-help-boun...@r-project.org



I have run into a speed issue, and given the size of the problem it feels
like there should be an easy solution. Here is the problem statement with
some arbitrary numbers added.

#p,q: vector with length(q)==length(p)==1 and length(levels(p))==3000
#y,z: vectors with length(levels(y))=length(y)==length(z)==5000

for (i in levels(p)){
q[i==p]<-z[i==y]}

At first i used two for loops which was horrible, now i got rid of one but 
i
don't know how to lose the second one.


PS. I expect the solution to be available through google etc and I have
searched for a solution but i did not find any usefull websites, probably
because i cannot pinpoint the best search words.

--
View this message in context: 
http://r.789695.n4.nabble.com/R-loop-problem-tp3718103p3718103.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Rdconv LaTeX files

2011-08-04 Thread Duncan Murdoch

On 11-08-04 1:42 AM, Michael O'Keeffe wrote:

Hi,
I've written a package and converted my Rd files into LaTeX using
Rdconv. When I copy and paste these files in to my Sweave document I
get the error message when compiling the Sweave file:

! Undefined control sequence.
l.32 \inputencoding
{utf8}

I also get the error message for the "\HeaderA", "\keyword".
Additionally I get an environment undefined error for some of the
"\begin" control sequences. Do I need to specifically "include"
certain LaTeX libraries in my Sweave document?


Yes, Rd files use the Rd.sty style.  So \usepackage{Rd} would be 
necessary.  It's not designed for this kind of use, so you might find 
incompatibilities; I've never tried it.


Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] persp()

2011-08-04 Thread Jean V Adams
Rosario,

I don't think persp() is the function you need.

You may find that the maps in R's historical world map would work for you. 
 In that case you can try something like this (an example mapping France 
and Germany):

require(maps)
map("world", c("France", "Germany"), xlim=c(-6.61, 17.21), ylim=c(40.73, 
57.92))

If that doesn't meet your needs, you can use your own longitudes and 
latitudes to draw the country borders, for example:

require(maps)
long <- c(-1.15, 1.45, 2.49, -0.37, -0.11, -2.56, -2.87, -5.52, -3.81, 
-3.81, -1.15)
lat <- c(50.50, 52.35, 54.12, 54.91, 53.17, 52.21, 54.06, 53.61, 50.71, 
50.71, 50.50)
# use the map() function to set up the long/lat projection without drawing 
anything
map("world", xlim=range(long), ylim=range(lat), type="n")
lines(long, lat)

Jean


`·.,,  ><(((º>   `·.,,  ><(((º>   `·.,,  ><(((º>

Jean V. Adams
Statistician
U.S. Geological Survey
Great Lakes Science Center
223 East Steinfest Road
Antigo, WI 54409  USA



From:
Rosario Garcia Gil 
To:
"r-help@r-project.org" 
Date:
08/04/2011 01:05 AM
Subject:
[R] persp()
Sent by:
r-help-boun...@r-project.org



Hello

I am trying to draw a basic black and white map of two European countries.

After searching some key words in google and reading many pages I arrived 
to the conclusion that persp() could be used to draw that map.

I have prepared three small example files, which are supposed to be the 
files required for running that function.

xvector is a vector with the longitudes
yvector is a vector with the latitudes
zmatrix is supposed to the height, but since I only need a flat map I just 
gave the value 1 to each of the entries of the matrix (I am not sure this 
is correct though).

The first question for me when using persp() is that x and y values should 
be in increasing values (following the instructions), but I understand 
that the coordinates x and y are actually pairs of values 
(longitude/latitude pairs of values) and if I order them in ascending 
order both then the pairing is gone. I guess I am totally lost!

Still even if I try to run persp() by ordering in ascending value x and y 
values (even if it does not make sense for me) I still get this message:

<-  persp(xvector,yvector,zmatrix,theta=-40,phi=30)
Error in persp.default(xvector, yvector, zmatrix, theta = -40, phi = 30) : 

  increasing 'x' and 'y' values expected

Any help is wellcome. Is there any other better function to draw a flat 
map (2D), also example of the imput files is wellcome. Thanks in advance.
Rosario

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] persp()

2011-08-04 Thread Eik Vettorazzi
Hi Rosario,
you might have a look at the  "maps" and "maptools" (for reading
shape-files) packages.

#e.g.
library(maps)
map("world",c("sweden","germany"))

Cheers


Am 04.08.2011 02:58, schrieb Rosario Garcia Gil:
> Hello
> 
> I am trying to draw a basic black and white map of two European countries.
> 
> After searching some key words in google and reading many pages I arrived to 
> the conclusion that persp() could be used to draw that map.
> 
> I have prepared three small example files, which are supposed to be the files 
> required for running that function.
> 
> xvector is a vector with the longitudes
> yvector is a vector with the latitudes
> zmatrix is supposed to the height, but since I only need a flat map I just 
> gave the value 1 to each of the entries of the matrix (I am not sure this is 
> correct though).
> 
> The first question for me when using persp() is that x and y values should be 
> in increasing values (following the instructions), but I understand that the 
> coordinates x and y are actually pairs of values (longitude/latitude pairs of 
> values) and if I order them in ascending order both then the pairing is gone. 
> I guess I am totally lost!
> 
> Still even if I try to run persp() by ordering in ascending value x and y 
> values (even if it does not make sense for me) I still get this message:
> 
> <-  persp(xvector,yvector,zmatrix,theta=-40,phi=30)
> Error in persp.default(xvector, yvector, zmatrix, theta = -40, phi = 30) : 
>   increasing 'x' and 'y' values expected
> 
> Any help is wellcome. Is there any other better function to draw a flat map 
> (2D), also example of the imput files is wellcome. Thanks in advance.
> Rosario
> 
> 
> 
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Eik Vettorazzi

Department of Medical Biometry and Epidemiology
University Medical Center Hamburg-Eppendorf

Martinistr. 52
20246 Hamburg

T ++49/40/7410-58243
F ++49/40/7410-57790

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] 3D Bar Graphs in ggplot2?

2011-08-04 Thread Martin Maechler
> Duncan Murdoch 
> on Tue, 2 Aug 2011 16:51:38 -0400 writes:

> On 11-08-02 2:39 PM, wwreith wrote:
>> Does anyone know how to create a 3D Bargraph using
>> ggplot2/qplot. I don't mean 3D as in x,y,z coordinates. Just a
>> 2D bar graph with a 3D shaped bard.  See attached excel file
>> for an example.
>> 
>> Before anyone asks I know that 3D looking bars don't add
>> anything except "prettiness".

> If you want graphs like that, you should be using Excel, not R.

Yes!!

10th commandment of R: You shall not misuse R!

Martin Maechler

> Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R loop problem

2011-08-04 Thread jim holtman
A subset of actual data and what you would expect as a result would be
very helpful.   All you say is that p.q are vectors, but it would
appear that they are character vectors, but the content is unknown.
Also will the expression "q[i==p]<-z[i==y]" have the same length on
each side; the vectors appear to be of different lengths -- what
happens if recycling kicks in?  Or it is always a match of length 1"

So a little more definition, and data and example, would help.

On Thu, Aug 4, 2011 at 5:27 AM, testrider  wrote:
> I have run into a speed issue, and given the size of the problem it feels
> like there should be an easy solution. Here is the problem statement with
> some arbitrary numbers added.
>
> #p,q: vector with length(q)==length(p)==1 and length(levels(p))==3000
> #y,z: vectors with length(levels(y))=length(y)==length(z)==5000
>
> for (i in levels(p)){
> q[i==p]<-z[i==y]}
>
> At first i used two for loops which was horrible, now i got rid of one but i
> don't know how to lose the second one.
>
>
> PS. I expect the solution to be available through google etc and I have
> searched for a solution but i did not find any usefull websites, probably
> because i cannot pinpoint the best search words.
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/R-loop-problem-tp3718103p3718103.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Sweave: pdf-graphics got three times lager with R 2.13.1

2011-08-04 Thread eriksengewald
Dear R-Users,

I am using R for years now but recently I encounter a problem with the
pdf-size of graphics generated with sweave. However, I am new in this forum.
Hence, please don't hesitate if I am wrong here.

I use a script which runs perfectly in R 2.11.1 and the pdf-size of the
graphs is about 3 KB. Running the same script with R 2.13.1 the file size
increases to 12 KB. I am using about 300 pdf-pictures in my tex-file,
generated by sweave. So this change in file size is dramatic for the finally
pdf-file.

Has somebody an explanation for this phenomenon?

Thanks a lot for your help.

Erik

Here some more facts:
- I am running a windows system (same problem with linux system)
- This is an example of my code chunk:
<>=
PlotFunctionRef2(daten,daten_ref1,daten_ref2,g=1) //plotting function
@


--
View this message in context: 
http://r.789695.n4.nabble.com/Sweave-pdf-graphics-got-three-times-lager-with-R-2-13-1-tp3718378p3718378.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] New multilevel modelling course practicals for the lmer and glmer functions - now online

2011-08-04 Thread Hilary Browne
The Centre for Multilevel Modelling is very pleased to announce the 
addition of R practicals to our free on-line multilevel modelling course. 
These give  detailed instructions of how to carry out a range of analyses 
in R, starting from multiple regression and progressing through to 
multilevel modelling of continuous and binary data using the lmer and glmer 
functions.


MLwiN and Stata versions of these practicals are already available.
You will need to log on or register onto the course to view these
practicals.

Read More...
http://www.cmm.bris.ac.uk/lemma/course/view.php?id=13

With best regards,

Hilary Browne
Technical and Business Manager
(Part-time: working days Tuesdays, Thursdays and Fridays)
Centre for Multilevel Modelling
University of Bristol
2 Priory Road
Bristol   BS8 1TX

Tel: +44 (0)117 331 0846
Web:  

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to extract sublist from a list?

2011-08-04 Thread Richard Ma
Hi Joshua,

Really helpful that you posted so many useful solutions for my problem.

I can understand all your codes except these:

*[code]
lapply(lst, `[[`, "y")
## or if you are only retrieving a single value
sapply(lst, `[[`, "y")
[/code]*

Can you explain these a little bit? What does '[[' means?

Thanks a lot,
Richard


Joshua Wiley-2 wrote:
> 
> On Thu, Aug 4, 2011 at 12:12 AM, Ashim Kapoor
>  wrote:
>>> How would we do this problem looping over seq(1:2) ?
> 
> Because this goes to an email list serv, it is good practice to quote
> the original problem.  I have no idea what "this" is.
> 
>>>
>>>
>> To extend the example in the corresponding nabble post : -
>>  sub1<-list(x="a",y="ab")
>>  sub2<-list(x="c",y="ad")
>>  lst<-list(sub1=sub1,sub2=sub2)
>>  for ( t in seq(1:2) )  print(lst[[t]]$y)
>>
>> So I can print out the sub1$y/sub2$y but it's not clear how to extract
>> them.
> 
> Well, to extract them, just drop the call to print. You could use them
> directly in the loop or could store them in new variables.
> 
> ## note seq(1:2) is redundant with simply 1:2
> or (t in 1:2) print(nchar(lst[[t]]$y))
> 
> I am guess, though, that what you might be hoping to do is extract
> specific elements from a list and store the extract elements in a new
> list.
> 
> lapply(1:2, function(i) lst[[i]]["y"])
> ## or compare
> lapply(1:2, function(i) lst[[i]][["y"]])
> 
>>
>> My original was different though.
>>
>> How would  say:-
>>
>> for ( t in seq(1:2) ) sub"t"$y
>>
>> Where sub"t" evaluates to sub1 or sub 2?
> 
> if you actually want "sub1", or "sub2":
> 
> ## note that I am wrapping in print() not so that it works
> ## but so that you can see it at the console
> for (t in 1:2) print(paste("sub", t, sep = ''))
> 
> from which we can surmise that the following should work:
> 
> for (t in 1:2) print(lst[[paste("sub", t, sep = '')]])
> 
> which trivially extends to:
> 
> for (t in 1:2) print(lst[[paste("sub", t, sep = '')]]$y)
> 
> or perhaps more appropriately
> 
> for (t in 1:2) print(lst[[paste("sub", t, sep = '')]][["y"]])
> 
> If you just need to go one level down for *all* elements of your list
> 
> lapply(lst, `[[`, "y")
> ## or if you are only retrieving a single value
> sapply(lst, `[[`, "y")
> 
> Hope this helps,
> 
> Josh
> 
>>
>> Many thanks.
>> Ashim
>>
>>
>>> On Thu, Aug 4, 2011 at 10:59 AM, Richard Ma
>>> wrote:
>>>
 Thank you so much GlenB!

 I got it done using your method.

 I'm just curious how did you get this idea? Cause for me, this looks so
 tricky

 Cheers,
 Richard

 -
 I'm a PhD student interested in Remote Sensing and R Programming.
 --
 View this message in context:
 http://r.789695.n4.nabble.com/How-to-extract-sublist-from-a-list-tp3717451p3717713.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

>>>
>>>
>>
>>        [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
> 
> 
> 
> -- 
> Joshua Wiley
> Ph.D. Student, Health Psychology
> Programmer Analyst II, ATS Statistical Consulting Group
> University of California, Los Angeles
> https://joshuawiley.com/
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 


-
I'm a PhD student interested in Remote Sensing and R Programming.
--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-extract-sublist-from-a-list-tp3717451p3718282.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Coefficient names when using lm() with contrasts

2011-08-04 Thread Peter Morgan
Many thanks, Greg. It works like a dream.

Best wishes,

Pete



From:   Greg Snow 
To: Peter Morgan , "r-help@r-project.org" 

Date:   03/08/2011 19:06
Subject:Re: [R] Coefficient names when using lm() with contrasts
Sent by:r-help-boun...@r-project.org



If you add column names to your contrast matrix (treat3) then those names 
will be used in the coefficient names.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
> project.org] On Behalf Of Peter Morgan
> Sent: Wednesday, August 03, 2011 6:12 AM
> To: r-help@r-project.org
> Subject: [R] Coefficient names when using lm() with contrasts
> 
> Dear R Users,
> 
> Am using lm() with contrasts as below. If I skip the contrasts()
> statement, I get the coefficient names to be
> > names(results$coef)
> [1] "(Intercept)" "VarAcat" "VarArat" "VarB"
> 
> which are much more meaningful than ones based on integers.
> 
> Can anyone tell me how to get R to keep the coefficient names based on
> the
> factor levels whilst using contrasts rather than labelling them with
> integers?
> 
> Many thanks in advance,
> 
> Pete
> 
> Cardiff, UK
> 
> > dt=read.table("testreg.txt",sep=",",header=T)
> > dt
>ID VarA VarB VarC
> 1   1  cat2   23
> 2   2  dog3   56
> 3   3  rat5   35
> 4   4  cat2   43
> 5   5  cat7   51
> 6   6  dog3   31
> 7   7  dog4   65
> 8   8  rat1   18
> 9   9  rat6   49
> 10 10  dog3   28
> > dt$VarA=relevel(dt$VarA,ref="dog")
> > treat3=matrix(-1/3,ncol=2,nrow=3); for (i in 1:2) {treat3[i+1,i]=2/3}
> > contrasts(dt$VarA)=treat3
> > levels(dt$VarA)
> [1] "dog" "cat" "rat"
> > results=lm(formula=VarC~VarA+VarB, data=dt)
> > names(results$coef)
> [1] "(Intercept)" "VarA1"   "VarA2"   "VarB"
>[[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


  1   2   >