date:20120203

[R] Indistinguishable balls into distinguishable boxes

2012-02-03 Thread Marc Girondot


Hi "the list" !

I would like to create a data frame that lists all the different ways 
to distribute n indistinguishable balls into k distinguishable boxes. 
Let us take an example: 2 balls in 3 boxes:

Box1 Box2 Box3
2   0   0
1   1   0
1   0   1
0   2   0
0   1   1
0   0   2

I have made a script (see below) using expand.grid, but it generates a 
huge number of unnecessary solutions that I must filter out. And when 
the number of balls or boxes is large, the size of the data frame can 
be huge.


Does someone have a solution for this problem?

Thanks a lot

(I don't play with balls and boxes... it is a way to manage uncertainty. I 
know that 2 events have occurred during a period of 3 days but I don't 
know exactly when)


Marc




Nballs<-2
nbbox<-3


# The number of different ways to distribute n indistinguishable balls
# into k distinguishable boxes is C(n+k-1,k-1).
# nb<-choose(Nballs+nbbox-1,nbbox-1)=dim(tb)[1]

if (Nballs==0) {
tb<-as.data.frame(matrix(rep(0, nbbox), ncol=nbbox))

} else {

es<-list(0:(Nballs-1))
es<-rep(es, nbbox)

tb<-expand.grid(es)
tb<-tb[apply(tb, 1, sum)==Nballs,1:nbbox]

# trick to have smaller tb after expand.grid
tbfull<- as.data.frame(matrix(rep(0, nbbox*nbbox), ncol=nbbox, 
dimnames=list(NULL, names(tb))))

for(i in 1:nbbox) {tbfull[i, i]<-Nballs}

tb<-rbind(tb, tbfull)

}

Result:

> tb
   Var1 Var2 Var3
4     1    1    0
6     1    0    1
7     0    1    1
41    2    0    0
5     0    2    0
61    0    0    2
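
One way to avoid the filtering step is to enumerate the distributions directly, using the "stars and bars" construction behind the C(n+k-1,k-1) count: choose the k-1 bar positions among n+k-1 slots with combn() and read the box counts off the gaps between the bars. The following is only a minimal sketch under that assumption; the function name compositions() and the Box1..Boxk column names are made up for illustration and are not part of the script above.

```r
# Enumerate all ways to put n indistinguishable balls into k distinguishable
# boxes without building and filtering the full expand.grid().
# compositions() is a hypothetical helper name, not from the original script.
compositions <- function(n, k) {
  if (k == 1) return(data.frame(Box1 = n))
  # Each column of 'bars' holds k-1 increasing bar positions in 1..(n+k-1)
  bars <- combn(n + k - 1, k - 1)
  # Box counts are the gaps between consecutive bars (sentinels 0 and n+k)
  counts <- apply(bars, 2, function(b) diff(c(0, b, n + k)) - 1)
  out <- as.data.frame(t(counts))
  names(out) <- paste("Box", seq_len(k), sep = "")
  out
}

tb <- compositions(2, 3)
```

For 2 balls and 3 boxes this gives choose(4, 2) rows, each summing to 2, so the size of the result is exactly the number of solutions rather than the size of the grid.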

--
__
Marc Girondot, Pr

Laboratoire Ecologie, Systématique et Evolution
Equipe de Conservation des Populations et des Communautés
CNRS, AgroParisTech et Université Paris-Sud 11 , UMR 8079
Bâtiment 362
91405 Orsay Cedex, France

Tel:  33 1 (0)1.69.15.72.30   Fax: 33 1 (0)1.69.15.73.53
e-mail: marc.giron...@u-psud.fr
Web: http://www.ese.u-psud.fr/epc/conservation/Marc.html
Skype: girondot

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] effect function (effects package)

2012-02-03 Thread Hlavka, Eileen

Dear all,

How does the effect() function in the effects package calculate effects and 
standard errors for glm quasipoisson models?  I was using effect() to calculate 
the impact of increasing x to x + epsilon, and then finding the expected 
percent change.  I thought that this effect (as a percentage) should be 
exp(beta*epsilon), where beta is the appropriate coefficient from the model, 
but that's not what I'm getting using the effect() output.

Sorry for the lack of an example; it would require toy data etc. and seems 
unnecessary since my question is more conceptual.

Thanks,

Eileen
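
The exp(beta*epsilon) reasoning can be checked without the effects package by comparing fitted means directly. What follows is a minimal sketch on simulated data; the data, the variable names and the values of x0 and eps are assumptions made up for illustration. With the default log link, the ratio of the fitted means at x0 + eps and at x0 equals exp(beta*eps). It does not show how effect() itself computes its output.

```r
# Simulated toy data (an assumption; not from the original question)
set.seed(1)
toy <- data.frame(x = runif(200, 0, 2))
toy$y <- rpois(200, lambda = exp(0.5 + 0.8 * toy$x))

# Quasi-Poisson GLM with the default log link
fit <- glm(y ~ x, family = quasipoisson, data = toy)

eps <- 0.1   # size of the increase in x (hypothetical value)
x0  <- 1     # starting value of x (hypothetical value)

# Fitted means on the response scale at x0 and x0 + eps
mu <- predict(fit, newdata = data.frame(x = c(x0, x0 + eps)),
              type = "response")

ratio_pred <- unname(mu[2] / mu[1])               # ratio of fitted means
ratio_coef <- unname(exp(coef(fit)["x"] * eps))   # exp(beta * eps)
```

The two ratios agree for any x0 when x enters the model only as a main effect, so a mismatch with effect() output would suggest the comparison is being made on a different scale or at different values of the other predictors.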




Re: [R] Binding matrices of different sizes

2012-02-03 Thread chuck.01

http://r.789695.n4.nabble.com/template/NamlServlet.jtp?macro=search_page&node=789695&query=cbind+with+different+rows



Rambler1 wrote
> 
> I am trying to bind two matrices of different length. They both are one
> column and many rows but the rows differ by a few. How am I able to cbind
> them together inserting NA's to fill the missing rows and make them the
> same length? Thanks in advance for any help!
> 
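
For reference, one way to do what the question asks is to pad the shorter matrix with NA rows before calling cbind(). This is a minimal sketch only; cbind_fill() is a hypothetical helper name, the example matrices are made up, and it is not necessarily the approach described in the linked search results.

```r
# Bind two matrices side by side, padding the shorter one with NA rows.
# cbind_fill() is a hypothetical helper, not a base R function.
cbind_fill <- function(a, b) {
  n <- max(nrow(a), nrow(b))
  # Append as many all-NA rows as needed to reach n rows
  pad <- function(m) rbind(m, matrix(NA, nrow = n - nrow(m), ncol = ncol(m)))
  cbind(pad(a), pad(b))
}

a <- matrix(1:5, ncol = 1)
b <- matrix(c(10, 20, 30), ncol = 1)
ab <- cbind_fill(a, b)
```

The missing rows are always added at the bottom; if the NA values belong elsewhere, the rows would have to be matched on some key instead.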


--
View this message in context: 
http://r.789695.n4.nabble.com/Binding-matrices-of-different-sizes-tp4356060p4356349.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] rpart: plot without scientific notation

2012-02-03 Thread Adalberto Pineda

Stephen:  I tried running your script but couldn't make it work (any feedback
is welcome) so I changed it a little bit. I found you still need to set the
# of digits as the max # of possible digits of your yval response or
greater.

node.fun <- function(x) 
{ 
x$frame$yval<-as.integer(sprintf("%0.f", x$frame$yval))
x 
} 

plot(node.fun(tree), compress=TRUE)
text(node.fun(tree), digits = 15) 


--
View this message in context: 
http://r.789695.n4.nabble.com/rpart-plot-without-scientific-notation-tp3767975p4356308.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] combining data structures

2012-02-03 Thread Pete Brecknock

Not entirely sure why you would want a data.frame that has multiple entries
in one of the columns (Connect.down) but leaving that aside is the following
of any use?

nn=list()

nn[[1]] =  list(Node = "1", Connect.up = c(NULL), Connect.down = c(2,3))
nn[[2]] =  list(Node = "2", Connect.up = c(1), Connect.down = c(4,5))
nn[[3]] =  list(Node = "3", Connect.up = c(NULL), Connect.down = c(2,3))
nn[[4]] =  list(Node = "4", Connect.up = c(1), Connect.down = c(4,5))

Output = as.data.frame(do.call(rbind, nn))

# Output
  Node Connect.up Connect.down
1    1       NULL         2, 3
2    2          1         4, 5
3    3       NULL         2, 3
4    4          1         4, 5

HTH

Pete



dkStevens wrote
> 
> Group
> 
> It's unlikely I'm trying this the best way, but I'm trying to create a 
> data structure from elements like
> 
> nNode = 2
> nn = vector("list",nNode)
> 
> nn[[1]] =  list(Node = "1", Connect.up = c(NULL), Connect.down = c(2,3))
> nn[[2]] =  list(Node = "2", Connect.up = c(1), Connect.down = c(4,5))
>   #( and eventually many more nodes)
> 
> NodeList = as.data.frame(nn[[1]])
> for(i in 2:nNode) {
>NodeList = rbind(NodeList,as.data.frame(nn[[i]]))
> }
> 
> and am trying to create a data frame with many rows and three columns: 
> Node, Connect.up,Connect.down
> in which the Connect.up and Connect.down columns may be single numbers 
> or vectors of numbers.  The above approach gives an error:
> 
> Error in data.frame(Node = "1", Connect.up = NULL, Connect.down = c(2,  :
>arguments imply differing number of rows: 1, 0, 2
> 
> My earlier try by brute force worked fine:
> 
> NodeList = as.data.frame(rbind(nn[[1]],nn[[2]]))
> 
>  > NodeList
>    Node Connect.up Connect.down
> 1    1       NULL         2, 3
> 2    2          1         4, 5
> 
> and gives me what I want (many more rows eventually). But I want to do 
> this generically from the problem context in a procedure so I won't know 
> up front how many nodes I'll have.
> 
> Clearly I'm not understanding how referencing works for lists like I've 
> created.  Can anyone shed light on this?
> 
> -- 
> David K Stevens, P.E., Ph.D., Professor
> Civil and Environmental Engineering
> Utah Water Research Laboratory
> 8200 Old Main Hill
> Logan, UT  84322-8200
> 435 797 3229 - voice
> 435 797 1363 - fax
> david.stevens@
> 
> 
> 
> 
> 


--
View this message in context: 
http://r.789695.n4.nabble.com/combining-data-structures-tp4356288p4356547.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] X11 error while plotting in R on OSX

2012-02-03 Thread MacQueen, Don

I can't reproduce your error in R 2.14.1 on an OSX 10.6.8 box.

However, your example doesn't start with a graphics device specification.
Did you start the graphics device with x11() before your first plot()
command? 

If by "R on a terminal" you mean in Terminal.app, then you wouldn't
normally get an X11() window:

> plot( c(.1,.3,.4), ylim=c(0,1))
> dev.list()
quartz 
 2 

Did you start R, for example, in an xterm shell within an X windows
context?
If so, you could try starting the graphics device with  x11(type='Xlib')
to see if it behaves better.

Finally, why are you using MacPorts? The binary download from CRAN should
work fine.

You should ask your question on r-sig-mac, although you might not get much
support for problems with the MacPorts version of R. I don't know.

-Don

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062

On 2/1/12 2:14 PM, "Yasir Suhail"  wrote:

>I installed R 2.14.1 on a OSX 10.7.2 computer using MacPorts. If I fire up
>R on a terminal, this very simple session gives me errors. I attach
>sessionInfo() details as well. In addition to the warnings, I also do not
>get any plots on the X11 window after the warnings. So one cannot just
>ignore the warnings since the plots don't work anymore after this. I guess
>one could try to avoid this by not overwriting the plots, but it should not
>lead to errors fatal to the X11 graphics output of the R session. Is this
>reproducible on other machines and is this a bug?
>
>Thanks!
>
>> plot(c(.1,.3,.4),ylim=c(0,1))
>> dev.off()
>null device
>  1
>> plot(c(.1,.3,.4),ylim=c(0,1))
>> plot(c(.1,.3,.4),ylim=c(0,1))
>> dev.off()
>null device
>  1
>> plot(c(.1,.3,.4),ylim=c(0,1))
>Warning messages:
>1: In dev.flush() :
>  X11 protocol error: BadRequest (invalid request code or no such operation)
>2: In dev.flush() : X11 protocol error: BadGC (invalid GC parameter)
>3: In dev.flush() :
>  X11 protocol error: BadRequest (invalid request code or no such operation)
>4: In dev.flush() :
>  X11 protocol error: BadRequest (invalid request code or no such operation)
>> sessionInfo()
>R version 2.14.1 (2011-12-22)
>Platform: x86_64-apple-darwin11.0.0/x86_64 (64-bit)
>
>locale:
>[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/C
>
>attached base packages:
>[1] stats graphics  grDevices utils datasets  methods   base
>Warning message:
>X11 protocol error: BadRequest (invalid request code or no such operation)
>>
>

[R] R: sample size package

2012-02-03 Thread IOANNA

Hello, 

 

Let's assume I have an ordinal response variable
D<-c(D0,D1,D2,D3,D4), where D0 is no damage and D4 is collapse, which I want
to correlate with a continuous predictor variable, wind speed at the
location of each building.

Is there a function in R which I can use to estimate the sample size of
buildings needed for a given power if I want to perform an ordinal logistic
regression? Is this sample size the same if I want to use kernel smoothing? Is
it possible to estimate the sample size in each of D0,...,D4? Collapse is rather
rare.

 

Regards, 

Yanna 

 



[R] combining data structures

2012-02-03 Thread David Stevens

Group

It's unlikely I'm trying this the best way, but I'm trying to create a 
data structure from elements like

nNode = 2
nn = vector("list",nNode)

nn[[1]] =  list(Node = "1", Connect.up = c(NULL), Connect.down = c(2,3))
nn[[2]] =  list(Node = "2", Connect.up = c(1), Connect.down = c(4,5))
  #( and eventually many more nodes)

NodeList = as.data.frame(nn[[1]])
for(i in 2:nNode) {
   NodeList = rbind(NodeList,as.data.frame(nn[[i]]))
}

and am trying to create a data frame with many rows and three columns: 
Node, Connect.up,Connect.down
in which the Connect.up and Connect.down columns may be single numbers 
or vectors of numbers.  The above approach gives an error:

Error in data.frame(Node = "1", Connect.up = NULL, Connect.down = c(2,  :
   arguments imply differing number of rows: 1, 0, 2

My earlier try by brute force worked fine:

NodeList = as.data.frame(rbind(nn[[1]],nn[[2]]))

 > NodeList
   Node Connect.up Connect.down
1    1       NULL         2, 3
2    2          1         4, 5

and gives me what I want (many more rows eventually). But I want to do 
this generically from the problem context in a procedure so I won't know 
up front how many nodes I'll have.

Clearly I'm not understanding how referencing works for lists like I've 
created.  Can anyone shed light on this?
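
The brute-force call that worked can be generalised to any number of nodes with do.call(), which passes every element of the list to rbind() at once. A minimal sketch, assuming the nodes are collected in a list as above (the third node is made up for illustration):

```r
# Nodes collected in a list; the third node is a made-up example
nn <- list(
  list(Node = "1", Connect.up = NULL, Connect.down = c(2, 3)),
  list(Node = "2", Connect.up = 1,    Connect.down = c(4, 5)),
  list(Node = "3", Connect.up = 1,    Connect.down = c(6, 7))
)

# Same as rbind(nn[[1]], nn[[2]], nn[[3]]), for however many nodes there are.
# The result is a matrix of mode "list", so each cell can hold a vector or NULL.
NodeList <- as.data.frame(do.call(rbind, nn))

# Columns are lists, so individual cells are extracted with [[ ]]
down_of_node_2 <- NodeList$Connect.down[[2]]
```

The loop fails because as.data.frame(nn[[i]]) treats each element as a column and needs them all to have the same number of rows, whereas rbind() on the lists keeps each element as a single cell.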

-- 
David K Stevens, P.E., Ph.D., Professor
Civil and Environmental Engineering
Utah Water Research Laboratory
8200 Old Main Hill
Logan, UT  84322-8200
435 797 3229 - voice
435 797 1363 - fax
david.stev...@usu.edu





Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-03 Thread jim holtman

Exactly what does "crashed" mean?  What was the error message?  Have
you tried putting:

rm(Lines)
gc()

at the end of the loop to free up and compact memory?  If you watch
the performance, does the R process seem to be growing in terms of the
amount of memory that is being used?  You can add:

memory.size()

before the above statements to see how much memory is being used.
This is just some more elementary debugging that you will have to
learn when using any system.
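
The suggestion above can be sketched as a loop that frees each chunk before reading the next one. This is a sketch only: split_file() and its arguments are hypothetical names, writeLines() is swapped in for write.table() since the lines are written back unchanged, and there is no claim that this removes the crash reported below.

```r
# Split a large text file into pieces of 'lines_per_file' lines each,
# releasing each chunk before the next one is read.
# split_file() is a hypothetical helper, not part of any package.
split_file <- function(in_file, out_prefix, lines_per_file) {
  con <- file(in_file, "r")
  on.exit(close(con))
  i <- 0
  repeat {
    chunk <- readLines(con, n = lines_per_file)
    if (length(chunk) == 0) break
    i <- i + 1
    writeLines(chunk, paste(out_prefix, i, ".txt", sep = ""))
    rm(chunk)   # drop the reference to the chunk ...
    gc()        # ... and ask R to reclaim the memory
  }
  i             # number of files written
}
```

A call such as split_file("Test.txt", "Split_", 1250000) would correspond to the loop in the quoted code, without the cap on the number of files.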

On Fri, Feb 3, 2012 at 3:22 PM, HC  wrote:
> Bad news!
>
> The readLines command works fine upto a certain limit. Once a few files have
> been written the R program crashes.
>
> I used the following code:
> *
> iFile<-"Test.txt"
> con <- file(iFile, "r")
>
> N<-1250000;
> iLoop<-1
>
> while(length(Lines <- readLines(con, n = N)) > 0 & iLoop<41) {
> oFile<-paste("Split_",iLoop,".txt",sep="")
>  write.table(Lines, oFile, sep = "\t", quote = FALSE, col.names= FALSE,
> row.names = FALSE)
>  iLoop<-iLoop+1
> }
> close(con)
> 
>
> With above N=1.25 million, it wrote 28 files of about 57 mb each. That is a
> total of about 1.6 GB and then crashed.
> I tried with other values on N and it crashes at about the same place in
> terms of total size output, i.e., about 1.6 GB.
>
> Is this due to any limitation of Windows 7, in terms of not having the
> pointer after this size?
>
> Your insight would be very helpful.
>
> Thank you.
> HC
>
>
>
>
>
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/sqldf-for-Very-Large-Tab-Delimited-Files-tp4350555p4355679.html
> Sent from the R help mailing list archive at Nabble.com.
>

-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


Re: [R] SPATIAL QUESTION: HOW TO MAKE POLYGONS AROUND CLUSTERS OF POINTS AND EXTRACT AREAS AND COORDINATES OF THESE POLYGONS?

2012-02-03 Thread MacQueen, Don

I would suggest asking this question on r-sig-geo.
-Don

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 2/3/12 6:59 AM, "Bjørn Økland"  wrote:

>Imagine that I have a large number of points (given by coordinates x and
>y) that vary in density per space. For the purpose of demonstration it
>could be generated like this: s <-
>data.frame(x=runif(1,0,900),y=runif(1,0,900)); plot(s)
>
>I want to create polygons around the points where point density is
>greater than a selected threshold (for example, by using kriging or
>equivalent method). For these polygons, I want to have the centre
>coordinates and the size of the area for further use in analyses.
>
>I would be very grateful if I could be shown the R packages and functions
>I should use to accomplish this, and even some outline of the code. Is it
>possible?
>
>Best regards
>Bjørn
>
>

Re: [R] how can i calculate the mean of my data which is only bigger than 75?

2012-02-03 Thread R. Michael Weylandt

If I understand you correctly, this is what I would do:

rowsWeWant <- apply(dataframename[, c("u", "v", "w1", "w2")], 1,
                    function(x) any(x > 75))
newDataFrame <- dataframename[rowsWeWant, ]
colMeans(abs(newDataFrame))

I don't really use the subset function, but I imagine you could
replace the first two lines with subset(dataframename, u > 75 | v > 75 |
w1 > 75 | w2 > 75, x:Z2)

For your other question, I'd do something like subset(dataframename, x
== max(x)) to select the rows with the max values for x. If you want
to do it with the apply() function family, it's not much harder.
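
As a concrete illustration of the filtering step, here is a minimal sketch using the first four rows of the data quoted below. The lower-case column names and the object names df, keep, kept and means are assumptions made for the example.

```r
# First four rows of the quoted data; column names lower-cased for simplicity
df <- data.frame(
  x  = c(-0.0077, -0.0028, -0.0123, 0.0232),
  y  = c(-0.4665, -0.0055,  0.0060, 0.0367),
  z1 = c(-0.0048,  0.0026, -0.0030, 0.0028),
  z2 = c(-0.1302, -0.0010,  0.0029, 0.0027),
  u  = c(70, 62, 74, 65),
  v  = c(26, 42, 18, 34),
  w1 = c(59, 82, 83, 74),
  w2 = c(54, 62, 78, 78)
)

# Keep a row if at least one of u, v, w1, w2 exceeds 75
keep <- rowSums(df[, c("u", "v", "w1", "w2")] > 75) > 0
kept <- df[keep, ]

# Means of the absolute values of x, y, z1 and z2, all at once
means <- colMeans(abs(kept[, c("x", "y", "z1", "z2")]))
```

Here the first row is dropped because none of its u, v, w1, w2 values exceeds 75, and the means are taken over the remaining three rows.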

Michael

On Thu, Feb 2, 2012 at 4:19 PM, Yakamu Yakamu  wrote:
>
> Hi michael, thanks, but here is more explanations of my questions to have 
> more help, (also pls have a look at the data below):
>
>
>
> Three questions to give more concrete help:
>
> i) Is your data set stored as a matrix or a data.frame
>
> My data is in a data frame
> ii) What are you trying to get the mean of -- all variables pooled or of each 
> variable independently?
>
> Actually I would like to have all the data for which either u, v, w1 or w2 is 
> bigger than 75, which gives a new data frame with either u, v, w1 
> or w2 bigger than 75. It doesn’t matter what x, y, z1 and z2 are; they will just follow 
> whatever the results are (in this new data frame we should still have 
> x, y, z1, z2, u, v, w1 and w2, but only the rows with values of u or v or w1 or 
> w2 that are bigger than 75).
>
>
> iii) When you say >=75 for all variables, do you mean only use a row
> if it's >=75 for each element or just only use the >=75 elements for
> each calculation independently.
>
> After we have the new data frame, I would like to have the mean of x, 
> y, z1 and z2 (the absolute values, ignoring the negative 
> signs). If possible, it should give all the results together (mean of x=.., 
> y=… z1=.. and z2= …) and not one by one.
>
>
>
> Another question: I would like to create a new data frame with only the 
> maximum values of x (for example, if I have 0.456, -0.456 and many more such 
> values as the maximum values of x). How can I do it? (Again ignoring 
> the negative signs.)
>
>  I hope my questions are clear now.
>
> Thanks in advance,
>
> Yakamu
> Michael
>
>
>
>       x       y      Z1      Z2   u   v  W1  W2
> -0.0077 -0.4665 -0.0048 -0.1302  70  26  59  54
> -0.0028 -0.0055  0.0026  -0.001  62  42  82  62
> -0.0123   0.006  -0.003  0.0029  74  18  83  78
>  0.0232  0.0367  0.0028  0.0027  65  34  74  78
> -0.0075  0.1141 -0.0018  0.0363  63   0  77  69
>   0.004 -0.0032  0.0036 -0.0156  14  40  70  64
>  -0.003 -0.0392  -0.006 -0.0212  55  42  63  69
> -0.0116 -0.0028  0.0031  0.0209  59  23  69  35
>  0.0171 -0.0496 -0.0055  0.0118  35  57  73  42
> -0.0135 -0.0324  0.0001  0.0004  55  45  57  55
>  0.0345   0.004  0.0041  0.0079  77  38  57  71
> -0.0206 -0.0152   0.003  0.0104  55  30  56  81
> -0.0044  0.0343  0.0059  0.0105  74  52  58  75
>  0.0138  -0.065  0.0016 -0.0064  68  64  70  56
> -0.0303  0.0012  -0.009  0.0025  66  32  42  52
> -0.0231  0.0379 -0.0006  0.0116  70  49  61  34
>  0.0305   0.078 -0.0081 -0.0082  83  45  22  18
>   -0.03  0.0978  0.0118  0.0103  88  25  31  68
>  0.0072 -0.0019  0.0049  0.0055  79  50  67  71
>
>
>
> --- On Wed, 2/1/12, R. Michael Weylandt  wrote:
>
>
> From: R. Michael Weylandt 
> Subject: Re: [R] how can i calculate the mean of my data which is only bigger 
> than 75?
> To: "Yakamu Yakamu" 
> Cc: "r-help@r-project.org" 
> Date: Wednesday, February 1, 2012, 12:47 PM
>
> I'm not entirely sure what you mean, but it's likely one of these:
>
> apply(data, 2, function(x) mean(x[x>75]))
> mean(data[ apply(data, 1, function(x) all(x > 75)), ])
> mean(data[data>75])
>
> Three questions to give more concrete help:
>
> i) Is your data set stored as a matrix or a data.frame
> ii) What are you trying to get the mean of -- all variables pooled or
> of each variable independently?
> iii) When you say >=75 for all variables, do you mean only use a row
> if it's >=75 for each element or just only use the >=75 elements for
> each calculation independently.
>
> Michael
>


Re: [R] Reading table data from PDF files

2012-02-03 Thread jim holtman

I think a lot would depend on exactly how the data is formatted.  I
have used 'pdf2text' converters (many freely available on the web) to
convert to text and then use R to read-in/preprocess the data to get
it into a format to process.

You can invoke these converters with the 'system' function and then
read the output file that is generated.  I would think that you would
have to have some custom code to then interpret the data in the text
file depending on how it was created.

So I am sure you can do it within R, with some auxiliary functions
that are called with 'system', without much trouble.
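
The workflow described above might look like the sketch below. It assumes a command-line converter named pdftotext (from xpdf or poppler) is installed and on the PATH, called as "pdftotext input.pdf output.txt"; the helper name pdf_to_lines() is hypothetical, and system2() is used here in place of system(). Turning the resulting lines into a table would still need custom parsing for each PDF layout.

```r
# Convert one PDF to text with an external converter and read the result.
# pdf_to_lines() is a hypothetical helper; "pdftotext" is an assumed tool.
pdf_to_lines <- function(pdf, converter = "pdftotext") {
  exe <- Sys.which(converter)
  if (!nzchar(exe)) stop("converter not found on PATH: ", converter)
  txt <- tempfile(fileext = ".txt")
  # Assumed calling convention: <converter> input.pdf output.txt
  status <- system2(exe, args = c(shQuote(pdf), shQuote(txt)))
  if (status != 0) stop("conversion failed for ", pdf)
  readLines(txt)
}
```

For a batch of files, something like lapply(list.files("pdfs", pattern = "\\.pdf$", full.names = TRUE), pdf_to_lines) would return one character vector per PDF (the "pdfs" directory name is made up).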

On Fri, Feb 3, 2012 at 4:11 PM, Bryan McCloskey  wrote:
> All,
>
> Is anyone familiar with a way to use R to read table data from a large 
> collection of PDF files? I'm aware there are various command lines and 
> desktop utilities that might be able to (e.g.,) dump PDFs to text, which 
> could then be parsed for table data. But I'm hoping there is something more 
> integrated that could be incorporated into R functions and scripts to handle 
> large batches of PDFs in a more automated fashion.
>
> Has anyone used R to extract large amounts of tabular data from PDF documents?
>
> -bryan
>
> --
> Bryan McCloskey, Ph.D.
> IT Specialist (Data Management/Internet)
> U.S. Geological Survey
> St. Petersburg Coastal & Marine Science Center
> 600 Fourth St. South
> St. Petersburg, FL 33701
>
> South Florida Information Access: http://sofia.usgs.gov
> Everglades Depth Estimation Network: http://sofia.usgs.gov/eden
> Phone: 727.803.8747x3017 * Fax: 727.803.2032
> --
>

-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


Re: [R] Having trouble controlling plot() output (e.g., color)

2012-02-03 Thread Sarah Goslee

Hi David,

You need to do some detective work.

The data that gives an odd result:
> str(bs_mean)
'data.frame':   13 obs. of  2 variables:
 $ month   : Factor w/ 13 levels "2011_01","2011_02",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ bs$log_t: num  1.2 1.28 1.31 1.35 1.37 ...

A factor: no wonder it doesn't plot points as you requested.

?plot suggests that you need to look at methods(plot) to find out which
plot method is being used, and yes, there's a plot.factor

?plot.factor says:
 If ‘y’ is missing ‘barplot’ is produced.  For numeric ‘y’ a
 ‘boxplot’ is used, and for a factor ‘y’ a ‘spineplot’ is shown.
 For any other type of ‘y’ the next ‘plot’ method is called,
 normally ‘plot.default’.

You have a numeric y, but a factor x, so boxplot is being used.

?boxplot lists two relevant arguments:
  border: an optional vector of colors for the outlines of the
  boxplots.  The values in ‘border’ are recycled if the length
  of ‘border’ is less than the number of plots.

 col: if ‘col’ is non-null it is assumed to contain colors to be
  used to colour the bodies of the box plots. By default they
  are in the background colour.

You're trying to set col, but you only have one y value for each
value of x, so there's no body to the boxes. (Also why it looks like
neat horizontal lines rather than a box plot.)

But:
plot(bs_mean, border="green")
will make the outside lines green. Ta-dah!

So no, not glaringly obvious. But the difference in the two plots should
definitely have made you suspicious that you were missing something.
If you want to get the kind of plot you expected, this will work:
plot(as.numeric(bs_mean[,1]), bs_mean[,2], col="green")

But then you'll need to mess with the axis to get the labels you want, using
xaxt="n" in plot(), followed by axis().
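
The axis step could be sketched as follows, on the first three months of the data with rounded values. The object names bs_small and plot_by_month() and the column name log_t are made up for the example.

```r
# First three months of the posted data, values rounded; names are assumptions
bs_small <- data.frame(
  month = factor(c("2011_01", "2011_02", "2011_03")),
  log_t = c(1.2026, 1.2775, 1.3091)
)

# Plot the factor as numeric positions, then label the axis with the levels
plot_by_month <- function(df) {
  x <- as.numeric(df$month)               # factor codes 1, 2, 3, ...
  plot(x, df$log_t, type = "p", col = "green",
       xaxt = "n", xlab = "month", ylab = "log_t")
  axis(1, at = x, labels = as.character(df$month))
  invisible(x)
}
```

Calling plot_by_month(bs_small) draws green points at positions 1 to 3 with the month labels on the x axis; adding las = 2 to the axis() call rotates the labels if they overlap.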

Thanks for providing a clear problem statement and reproducible example.

Sarah

On Fri, Feb 3, 2012 at 5:28 PM, David Wolfskill  wrote:
> I expect that there's something glaringly obvious that I'm overlooking,
> as I'm just getting back involved in using R after a several-month
> hiatus (from R).  So I welcome clues.
>
> When I invoke plot(), merely specifying a data.frame with 2 columns,
> specify the plot type ("type") of "p" ("points"), and that I want the
> point to be green ('col = "green"'), sometimes I get the expected
> result; other times I get horizontal black lines instead -- and the
> behavior appears to be consistent for a given data.frame, but I don't
> seem to be able to predict (for a new data.frame) which behavior I'll
> get ... and I'm beginning to wonder about what's left of my sanity. :-}
>
>> R.Version()
> $platform
> [1] "i386-portbld-freebsd8.2"
>
> $arch
> [1] "i386"
>
> $os
> [1] "freebsd8.2"
>
> $system
> [1] "i386, freebsd8.2"
>
> $status
> [1] ""
>
> $major
> [1] "2"
>
> $minor
> [1] "14.1"
>
> $year
> [1] "2011"
>
> $month
> [1] "12"
>
> $day
> [1] "22"
>
> $`svn rev`
> [1] "57956"
>
> $language
> [1] "R"
>
> $version.string
> [1] "R version 2.14.1 (2011-12-22)"
>
>> dump("foo", file = "")
> foo <-
> structure(list(X1.5 = 1:5, X6.10 = 6:10), .Names = c("X1.5",
> "X6.10"), row.names = c(NA, -5L), class = "data.frame")
>> dump("bs_mean", file = "")
> bs_mean <-
> structure(list(month = structure(1:13, .Label = c("2011_01",
> "2011_02", "2011_03", "2011_04", "2011_05", "2011_06", "2011_07",
> "2011_08", "2011_09", "2011_10", "2011_11", "2011_12", "2012_01"
> ), class = "factor"), `bs$log_t` = c(1.2026062533015, 1.27747221429551,
> 1.30908704746547, 1.35386015390552, 1.36891795176966, 1.50313159806506,
> 1.41951401509, 1.2575359904, 1.21365151487245, 1.33079015825995,
> 1.50334927085608, 1.39072924382553, 1.44966367892355)), .Names = c("month",
> "bs$log_t"), row.names = c(NA, -13L), class = "data.frame")
>> attributes(foo)
> $names
> [1] "X1.5"  "X6.10"
>
> $row.names
> [1] 1 2 3 4 5
>
> $class
> [1] "data.frame"
>
>> attributes(bs_mean)
> $names
> [1] "month"    "bs$log_t"
>
> $row.names
>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13
>
> $class
> [1] "data.frame"
>
>> plot( bs_mean, type = "p", col = "green" )
>> plot( foo, type = "p", col = "green" )
>
>
> The first plot() invocation -- the one that has the data I actually
> care about, of course -- displays a set of 13 horizontal bars, each
> of which appears to be black.  [I actually *like* the horizontal
> bars; I'd like to be able to control the color, though.]
>
> The second invocation draws a set of 5 green "points" (as I would
> expect).
>
> How may I plot "bs_mean" in a non-black color?
>
> Thanks.
>
> Peace,
> david
> --

-- 
Sarah Goslee
http://www.functionaldiversity.org


Re: [R] If and apply?

2012-02-03 Thread R. Michael Weylandt

Try this:

apply(gc2, 2, function(x) ifelse(x < 0.1*median(x), NA, x))

ifelse is the vectorized operation you are probably looking for.
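
A small worked example of the ifelse() approach, on a made-up matrix standing in for gc2 (gc2_demo and its values are assumptions):

```r
# Toy stand-in for gc2: two columns with one unusually small value each
gc2_demo <- cbind(a = c(500, 40, 600, 450),
                  b = c(10, 12, 0.5, 11))

# Column by column: values below 10% of that column's median become NA
test <- apply(gc2_demo, 2, function(x) ifelse(x < 0.1 * median(x), NA, x))
```

The median of column a is 475, so only 40 falls below the 47.5 threshold; the median of column b is 10.5, so only 0.5 falls below 1.05. Everything else is returned unchanged.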

Michael

On Fri, Feb 3, 2012 at 6:17 PM,   wrote:
> Hello,
>
> I'm trying to replace any value within a column where the value is less than 
> 10% of the median of the column with NA.  In other words, if the median of 
> one column is 500, any value in that column that is less than 50 should 
> become NA.
>
> Doing a lot of searches, it seems like I should be using apply.  But when I 
> put the if statement inside like this, I get serious errors:
>
>> test<-apply(gc2,2,function(x){if(x<(0.1*median(x))), NA})
> Error: unexpected ',' in 
> "test<-apply(gc2,2,function(x){if(x<(0.1*median(x))),"
>> test<-apply(gc2,2,function(x){if(x<(0.1*median(x))) NA})
> There were 50 or more warnings (use warnings() to see the first 50)
> In if (x < (0.1 * median(x))) x <- NA :
>  the condition has length > 1 and only the first element will be used
>
> Trying
>> test<-apply(gc2,2,function(x){x[x<(0.1*median(x))]<- NA})
>> head(test)
> NA.01.N NA.01.T NA.02.N NA.02.T NA.03.N NA.03.T
>     NA      NA      NA      NA      NA      NA
>
>
> I'm sure that I could get it to work if I read each column individually into 
> a vector, calculate the median, replace the values if less than 0.1*median, 
> then rebind it into a new matrix.  However, I'm hoping there is a more 
> elegant way.
>
> Do you have any suggestions?
>
>
> Thank you for your help.
> Rose
>
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] If and apply?

2012-02-03 Thread David Winsemius



On Feb 3, 2012, at 6:17 PM, brann...@mskcc.org wrote:


Hello,

I'm trying to replace any value within a column where the value is  
less than 10% of the median of the column with NA.  In other words,  
if the median of one column is 500, any value in that column that is  
less than 50 should become NA.


Doing a lot of searches, it seems like I should be using apply.  But  
when I put the if statement inside like this, I get serious errors:


You didn't show str(gc2) or show us how it might be built.




test<-apply(gc2,2,function(x){if(x<(0.1*median(x))), NA})
Error: unexpected ',' in "test<-apply(gc2,2,function(x) 
{if(x<(0.1*median(x))),"


You want ifelse() rather than if(). You also will need to look more  
carefully at how 'ifelse' is designed.




test<-apply(gc2,2,function(x){if(x<(0.1*median(x))) NA})

There were 50 or more warnings (use warnings() to see the first 50)
In if (x < (0.1 * median(x))) x <- NA :
 the condition has length > 1 and only the first element will be used

Trying

test<-apply(gc2,2,function(x){x[x<(0.1*median(x))]<- NA})
head(test)

NA.01.N NA.01.T NA.02.N NA.02.T NA.03.N NA.03.T
NA  NA  NA  NA  NA  NA


That probably indicates that you need to look more carefully at ?median
because it appears you have at least one NA in each of your gc2 columns.





I'm sure that I could get it to work if I read each column  
individually into a vector, calculate the median, replace the values  
if less than 0.1*median, then rebind it into a new matrix.  However,  
I'm hoping there is a more elegant way.


Perhaps the sweep function?
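
A minimal sketch of one way to do the replacement with apply(), offered as an illustration and not as code from either poster. It assumes gc2 is a numeric matrix; the toy values and the helper name na_below_tenth_median are made up for the example. Two details seem relevant to the all-NA result above: a function whose last expression is an assignment such as x[...] <- NA returns the assigned value (NA) and not x, so x has to be returned explicitly; and median() returns NA for a column containing NAs unless na.rm = TRUE is set.

```r
# Toy numeric matrix standing in for gc2 (made-up values, not the poster's data)
gc2 <- cbind(a = c(500, 40, 600, 450, 550),
             b = c(10, 1000, 900, 1100, NA))

# Hypothetical helper: set values below 10% of the column median to NA
na_below_tenth_median <- function(x) {
  cutoff <- 0.1 * median(x, na.rm = TRUE)  # ignore existing NAs in the median
  x[!is.na(x) & x < cutoff] <- NA          # vectorised; no if() needed
  x                                        # return the column, not the assigned NA
}

test <- apply(gc2, 2, na_below_tenth_median)
print(test)
```

Because the helper returns a vector of the same length as its input, apply() reassembles the columns into a matrix with the original dimensions and column names.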


[[alternative HTML version deleted]]


You should read the Posting Guide and learn how to post plain text.


David Winsemius, MD
West Hartford, CT


[R] Simulating from "matrix variate normal distribution"

2012-02-03 Thread Shantanu MULLICK

Hello everyone

Is there a function/command to simulate from the "matrix variate normal
distribution" in R?

A follow-up question: is there a function/command to obtain the
density, distribution and quantile functions of the "matrix variate normal
distribution" in R?

Wikipedia has a good description of "matrix variate normal distribution"
which is also alternatively called "matrix normal distribution".
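
For reference, a base-R sketch of one standard construction, not a pointer to an existing function: if Z has independent N(0,1) entries, A A' = U and B' B = V, then M + A Z B is matrix normal with mean M, row covariance U and column covariance V. The name rmatnorm_sketch and the example matrices are hypothetical, and U and V are assumed to be positive definite so that chol() succeeds.

```r
# Hypothetical helper: one draw from a matrix normal MN(M, U, V)
rmatnorm_sketch <- function(M, U, V) {
  n <- nrow(M)
  p <- ncol(M)
  Z <- matrix(rnorm(n * p), n, p)   # iid standard normal entries
  A <- t(chol(U))                   # lower triangular, A %*% t(A) equals U
  B <- chol(V)                      # upper triangular, t(B) %*% B equals V
  M + A %*% Z %*% B
}

set.seed(1)
M <- matrix(0, 3, 2)                      # 3 x 2 mean matrix
U <- diag(3)                              # row covariance (3 x 3)
V <- matrix(c(1, 0.5, 0.5, 1), 2, 2)      # column covariance (2 x 2)
X <- rmatnorm_sketch(M, U, V)
print(X)
```

The density, if needed, can be obtained from the equivalent multivariate normal on vec(X) with covariance kronecker(V, U).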

Thanks a lot !

Best
Shantanu


[R] Having trouble controlling plot() output (e.g., color)

2012-02-03 Thread David Wolfskill

I expect that there's something glaringly obvious that I'm overlooking,
as I'm just getting back to using R after a several-month
hiatus (from R).  So I welcome clues.

When I invoke plot(), merely specifying a data.frame with 2 columns,
a plot type ("type") of "p" ("points"), and that I want the
points to be green ('col = "green"'), sometimes I get the expected
result; other times I get horizontal black lines instead -- and the
behavior appears to be consistent for a given data.frame, but I don't
seem to be able to predict (for a new data.frame) which behavior I'll
get ... and I'm beginning to wonder about what's left of my sanity. :-}

> R.Version()
$platform
[1] "i386-portbld-freebsd8.2"

$arch
[1] "i386"

$os
[1] "freebsd8.2"

$system
[1] "i386, freebsd8.2"

$status
[1] ""

$major
[1] "2"

$minor
[1] "14.1"

$year
[1] "2011"

$month
[1] "12"

$day
[1] "22"

$`svn rev`
[1] "57956"

$language
[1] "R"

$version.string
[1] "R version 2.14.1 (2011-12-22)"

> dump("foo", file = "")
foo <-
structure(list(X1.5 = 1:5, X6.10 = 6:10), .Names = c("X1.5", 
"X6.10"), row.names = c(NA, -5L), class = "data.frame")
> dump("bs_mean", file = "")
bs_mean <-
structure(list(month = structure(1:13, .Label = c("2011_01", 
"2011_02", "2011_03", "2011_04", "2011_05", "2011_06", "2011_07", 
"2011_08", "2011_09", "2011_10", "2011_11", "2011_12", "2012_01"
), class = "factor"), `bs$log_t` = c(1.2026062533015, 1.27747221429551, 
1.30908704746547, 1.35386015390552, 1.36891795176966, 1.50313159806506, 
1.41951401509, 1.2575359904, 1.21365151487245, 1.33079015825995, 
1.50334927085608, 1.39072924382553, 1.44966367892355)), .Names = c("month", 
"bs$log_t"), row.names = c(NA, -13L), class = "data.frame")
> attributes(foo)
$names
[1] "X1.5"  "X6.10"

$row.names
[1] 1 2 3 4 5

$class
[1] "data.frame"

> attributes(bs_mean)
$names
[1] "month"    "bs$log_t"

$row.names
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13

$class
[1] "data.frame"

> plot( bs_mean, type = "p", col = "green" )
> plot( foo, type = "p", col = "green" )


The first plot() invocation -- the one that has the data I actually
care about, of course -- displays a set of 13 horizontal bars, each
of which appears to be black.  [I actually *like* the horizontal
bars; I'd like to be able to control the color, though.]

The second invocation draws a set of 5 green "points" (as I would
expect).

How may I plot "bs_mean" in a non-black color?
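
A possible explanation, offered as a sketch and not a confirmed diagnosis: the month column of bs_mean is a factor, and plot() with a factor x and a numeric y draws one boxplot per level. With a single value per level each box collapses to its median line, and col= sets the box fill, not the line colour. The cut-down data frame below is made up for illustration (three rows, column renamed to log_t).

```r
# Cut-down stand-in for bs_mean (made-up values; 'log_t' is a renamed column)
bs_mean <- data.frame(
  month = factor(c("2011_01", "2011_02", "2011_03")),
  log_t = c(1.20, 1.28, 1.31)
)

pdf(NULL)  # null device, so the sketch runs without opening a window

# (a) Keep the horizontal bars: draw the boxplot explicitly and colour
#     the outlines (including the median line) with 'border'
boxplot(log_t ~ month, data = bs_mean, border = "green")

# (b) Get points instead: plot against the integer codes of the factor
#     and label the x axis by hand
month_code <- as.integer(bs_mean$month)
plot(month_code, bs_mean$log_t, type = "p", col = "green",
     xaxt = "n", xlab = "month", ylab = "log_t")
axis(1, at = seq_along(levels(bs_mean$month)), labels = levels(bs_mean$month))

invisible(dev.off())
```

Variant (a) keeps the bar-like display the poster said he liked; variant (b) matches the behaviour seen with the all-numeric data frame foo.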

Thanks.

Peace,
david
-- 
David H. Wolfskill  da...@catwhisker.org
Depriving a girl or boy of an opportunity for education is evil.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.



[R] If and apply?

2012-02-03 Thread brannona

Hello,

I'm trying to replace any value within a column where the value is less than 
10% of the median of the column with NA.  In other words, if the median of one 
column is 500, any value in that column that is less than 50 should become NA.

Doing a lot of searches, it seems like I should be using apply.  But when I put 
the if statement inside like this, I get serious errors:

> test<-apply(gc2,2,function(x){if(x<(0.1*median(x))), NA})
Error: unexpected ',' in "test<-apply(gc2,2,function(x){if(x<(0.1*median(x))),"
> test<-apply(gc2,2,function(x){if(x<(0.1*median(x))) NA})
There were 50 or more warnings (use warnings() to see the first 50)
In if (x < (0.1 * median(x))) x <- NA :
  the condition has length > 1 and only the first element will be used

Trying
> test<-apply(gc2,2,function(x){x[x<(0.1*median(x))]<- NA})
> head(test)
NA.01.N NA.01.T NA.02.N NA.02.T NA.03.N NA.03.T
 NA  NA  NA  NA  NA  NA


I'm sure that I could get it to work if I read each column individually into a 
vector, calculate the median, replace the values if less than 0.1*median, then 
rebind it into a new matrix.  However, I'm hoping there is a more elegant way.

Do you have any suggestions?


Thank you for your help.
Rose

 

Re: [R] Cannot get "==" operator to return TRUE

2012-02-03 Thread peter dalgaard

On Feb 3, 2012, at 18:03 , G See wrote:

> On Fri, Feb 3, 2012 at 10:39 AM, peter dalgaard  wrote:
>> 
>> So that's a nonbreak space alright. Next question: How did it get there? I'm 
>> mildly surprised that it crept into the data frame, I would expect it to 
>> happen much easier with things typed on the keyboard (Alt-Spc on my Mac 
>> keyboard, e.g.).
>> 
> 
> Peter,
> I won't venture to guess how, but this will do it.
> 
>> library(XML)
>> x <- readHTMLTable("http://earnings.com/company.asp?client=cb&ticker=GOOG",
>>                    stringsAsFactors=FALSE)[[21]]
>> charToRaw(x[28, 4])
> [1] 6e 2f 61 c2 a0
> 
> Garrett

OK, if you look at the source for that page, it actually contains stuff like

n/a&nbsp;

and &nbsp; is the infamous \uA0, alias nonbreak space. So the odd thing might 
actually be that the Mac manages to lose the trailing nonbreak space, whereas 
other systems do not. AFAICS, this boils down to the matching of [[:space:]] 
inside

> XML:::trim
function (x) 
gsub("(^[[:space:]]+|[[:space:]]+$)", "", x)

A locale dependency, perhaps?
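
Whatever the cause, the comparison can be made independent of how [[:space:]] is matched by mapping U+00A0 to an ordinary space before trimming. The following is a minimal sketch under that assumption; trim_nbsp is a hypothetical helper, not part of the XML package, and the string x only mimics the scraped cell.

```r
nbsp <- intToUtf8(160)          # U+00A0, the non-breaking space
x <- paste0("n/a", nbsp)        # mimics the value read from the web page

x == "n/a"                      # FALSE: the trailing U+00A0 is still there

# Hypothetical helper: map U+00A0 to a plain space, then trim with the
# same regular expression that XML:::trim uses
trim_nbsp <- function(s) {
  s <- gsub(nbsp, " ", s, fixed = TRUE)
  gsub("(^[[:space:]]+|[[:space:]]+$)", "", s)
}

trim_nbsp(x) == "n/a"           # TRUE
```

Using fixed = TRUE for the first substitution avoids any reliance on locale-specific character classes.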

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com


Re: [R] ordering of factor levels in regression changes result

2012-02-03 Thread David Winsemius



On Feb 3, 2012, at 4:16 PM, Tulinsky, Thomas wrote:

I was surprised to find that just changing the base level of a  
factor variable changed the number of significant coefficients in  
the solution.


I was surprised at this and want to know how I should choose the  
order of the factors, if the order affects the result.


In the first model you are getting R's default treatment contrasts between
the "control" level and each of the other levels, while in the second you
are getting contrasts between n25 and the others. I would think that
the most interest would be in the first set of results, but it could
also be that you are not testing for what you really want. Is it
scientifically interesting to consider the ordinal scale of effects?
Perhaps you should be looking at a linear or quadratic fit?


Looking at the text you cite, it becomes clear that you need to read  
the rest of the chapter before submitting questions to R-help.





Here is the small example. It is taken from 'The R Book', Crawley   
p. 365.


The data is at
http://www.bio.ic.ac.uk/research/mjcraw/therbook/data/competition.txt

In R


comp<-read.table("C:\\Temp\\competition.txt", header=T)



attach(comp)


Data has dependent variable 'biomass' and different types of  
'clipping' that were done:

Control (none), n25, n50, r10, r5:


summary(comp)

      biomass         clipping
   Min.   :415.0   control:6
   1st Qu.:508.8   n25    :6
   Median :568.0   n50    :6
   Mean   :561.8   r10    :6
   3rd Qu.:631.8   r5     :6
   Max.   :731.0

List mean Biomass of each type of Clipping:


aggregate (comp$biomass, list (comp$clipping) , mean)

    Group.1        x
  1 control 465.1667
  2     n25 553.3333
  3     n50 569.3333
  4     r10 610.6667
  5      r5 610.5000

do regression - get same result as book p. 365
Clipping type 'control' is not in list of coefficients because it is  
first alphabetically so it is folded into Intercept


In this case there are no other covariates,  so it is not so much  
folded into the intercept as it really IS the "Intercept".
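
A small sketch of that point, using the built-in PlantGrowth data as a stand-in because the competition data are not bundled with R. It illustrates that the intercept is the mean of the reference level, and that releveling changes which comparisons the coefficient table reports without changing the fitted model.

```r
# Default reference level is the first one, "ctrl"
fit_ctrl <- lm(weight ~ group, data = PlantGrowth)

# Same model with "trt1" as the reference level
pg <- transform(PlantGrowth, group = relevel(group, ref = "trt1"))
fit_trt1 <- lm(weight ~ group, data = pg)

# The intercept equals the mean of the reference group
coef(fit_ctrl)[["(Intercept)"]]
mean(PlantGrowth$weight[PlantGrowth$group == "ctrl"])

# Fitted values (and hence residuals, R-squared, F test) are identical
all.equal(fitted(fit_ctrl), fitted(fit_trt1))

# The ctrl-vs-trt1 comparison appears in both fits with opposite sign
coef(fit_ctrl)[["grouptrt1"]]
coef(fit_trt1)[["groupctrl"]]
```

If every pairwise comparison is of interest, TukeyHSD(aov(weight ~ group, data = PlantGrowth)) is one standard way to get them all at once, with a multiplicity adjustment.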





model<-lm(biomass ~ clipping)
summary(model)


  Call:
  lm(formula = biomass ~ clipping)

  Residuals:
       Min       1Q   Median       3Q      Max
  -103.333  -49.667    3.417   43.375  177.667

  Coefficients:
              Estimate Std. Error t value Pr(>|t|)
  (Intercept)   465.17      28.75  16.177  9.4e-15 ***
  clippingn25    88.17      40.66   2.168  0.03987 *
  clippingn50   104.17      40.66   2.562  0.01683 *
  clippingr10   145.50      40.66   3.578  0.00145 **
  clippingr5    145.33      40.66   3.574  0.00147 **
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

  Residual standard error: 70.43 on 25 degrees of freedom
  Multiple R-squared: 0.4077, Adjusted R-squared: 0.3129
  F-statistic: 4.302 on 4 and 25 DF,  p-value: 0.008752


Relevel - make 'n25' the base level of Clipping:


comp$clipping <- relevel (comp$clipping, ref="n25")



summary(comp)

      biomass         clipping
   Min.   :415.0   n25    :6
   1st Qu.:508.8   control:6
   Median :568.0   n50    :6
   Mean   :561.8   r10    :6
   3rd Qu.:631.8   r5     :6
   Max.   :731.0

Redo LM with releveled data


modelRelev<-lm(biomass~clipping, data=comp)


Different results. (Some parts, Residuals and Std Errors, are the same)

Especially note the Pr and Significance columns are different.


summary(modelRelev)


  Call:
  lm(formula = biomass ~ clipping, data = comp)

  Residuals:
       Min       1Q   Median       3Q      Max
  -103.333  -49.667    3.417   43.375  177.667

  Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
  (Intercept)       553.33      28.75  19.244   <2e-16 ***
  clippingcontrol   -88.17      40.66  -2.168   0.0399 *
  clippingn50        16.00      40.66   0.393   0.6973
  clippingr10        57.33      40.66   1.410   0.1709
  clippingr5         57.17      40.66   1.406   0.1721
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

  Residual standard error: 70.43 on 25 degrees of freedom
  Multiple R-squared: 0.4077, Adjusted R-squared: 0.3129
  F-statistic: 4.302 on 4 and 25 DF,  p-value: 0.008752



David Winsemius, MD
West Hartford, CT


[R] ordering of factor levels in regression changes result

2012-02-03 Thread Tulinsky, Thomas

I was surprised to find that just changing the base level of a factor variable 
changed the number of significant coefficients in the solution.

I was surprised at this and want to know how I should choose the order of the 
factors, if the order affects the result.

Here is the small example. It is taken from 'The R Book', Crawley  p. 365. 

The data is at 
http://www.bio.ic.ac.uk/research/mjcraw/therbook/data/competition.txt

In R

   > comp<-read.table("C:\\Temp\\competition.txt", header=T)

   > attach(comp)

Data has dependent variable 'biomass' and different types of 'clipping' that 
were done:
Control (none), n25, n50, r10, r5:

   >  summary(comp)
       biomass         clipping
    Min.   :415.0   control:6
    1st Qu.:508.8   n25    :6
    Median :568.0   n50    :6
    Mean   :561.8   r10    :6
    3rd Qu.:631.8   r5     :6
    Max.   :731.0

List mean Biomass of each type of Clipping:

   > aggregate (comp$biomass, list (comp$clipping) , mean)
     Group.1        x
   1 control 465.1667
   2     n25 553.3333
   3     n50 569.3333
   4     r10 610.6667
   5      r5 610.5000

do regression - get same result as book p. 365
Clipping type 'control' is not in list of coefficients because it is first 
alphabetically so it is folded into Intercept

   > model<-lm(biomass ~ clipping)
   > summary(model)

   Call:
   lm(formula = biomass ~ clipping)

   Residuals:
        Min       1Q   Median       3Q      Max
   -103.333  -49.667    3.417   43.375  177.667

   Coefficients:
               Estimate Std. Error t value Pr(>|t|)
   (Intercept)   465.17      28.75  16.177  9.4e-15 ***
   clippingn25    88.17      40.66   2.168  0.03987 *
   clippingn50   104.17      40.66   2.562  0.01683 *
   clippingr10   145.50      40.66   3.578  0.00145 **
   clippingr5    145.33      40.66   3.574  0.00147 **
   ---
   Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

   Residual standard error: 70.43 on 25 degrees of freedom
   Multiple R-squared: 0.4077, Adjusted R-squared: 0.3129 
   F-statistic: 4.302 on 4 and 25 DF,  p-value: 0.008752 


Relevel - make 'n25' the base level of Clipping:

   > comp$clipping <- relevel (comp$clipping, ref="n25")

   > summary(comp)
       biomass         clipping
    Min.   :415.0   n25    :6
    1st Qu.:508.8   control:6
    Median :568.0   n50    :6
    Mean   :561.8   r10    :6
    3rd Qu.:631.8   r5     :6
    Max.   :731.0

Redo LM with releveled data

   > modelRelev<-lm(biomass~clipping, data=comp)

Different results. (Some parts, Residuals and Std Errors, are the same)
Especially note the Pr and Significance columns are different.

   > summary(modelRelev)

   Call:
   lm(formula = biomass ~ clipping, data = comp)

   Residuals:
        Min       1Q   Median       3Q      Max
   -103.333  -49.667    3.417   43.375  177.667

   Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
   (Intercept)       553.33      28.75  19.244   <2e-16 ***
   clippingcontrol   -88.17      40.66  -2.168   0.0399 *
   clippingn50        16.00      40.66   0.393   0.6973
   clippingr10        57.33      40.66   1.410   0.1709
   clippingr5         57.17      40.66   1.406   0.1721
   ---
   Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

   Residual standard error: 70.43 on 25 degrees of freedom
   Multiple R-squared: 0.4077, Adjusted R-squared: 0.3129 
   F-statistic: 4.302 on 4 and 25 DF,  p-value: 0.008752


Re: [R] [fields] image.plot abends with NAs in image.plot.info

2012-02-03 Thread Doug Nychka

Tom, 

Before we go into this in detail, can you double-check that your second data
image is OK?

Just try something like image(z),
where z is the matrix of values for the second image (the $z component of
an image list).
My quick guess is that the second image is all NAs due to a few missing in the 
first image. 
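
A base-R sketch of the kind of check meant here, to run on each layer before handing it to image.plot. The helper name layer_report and the two toy matrices are hypothetical; with the real data one would pass target.datavar[,,i.layer] instead.

```r
# Hypothetical helper: summarise how much usable data a layer contains
layer_report <- function(z) {
  finite <- is.finite(z)
  list(
    n_cells     = length(z),
    n_finite    = sum(finite),
    all_missing = !any(finite),
    zlim        = if (any(finite)) range(z[finite]) else c(NA_real_, NA_real_)
  )
}

source_layer <- matrix(c(1, 3, NA, 6), 2, 2)   # toy layer with one NA
target_layer <- matrix(NA_real_, 2, 2)         # toy layer that is all NA

str(layer_report(source_layer))
str(layer_report(target_layer))
```

A layer that reports all_missing = TRUE, or a zlim of NA, would be consistent with the NA limits seen in the debugger output.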


Doug

- 
Doug Nychka,

Institute for Mathematics Applied to Geosciences
National Center for Atmospheric Research 
Boulder, CO 
Email: nychka "AT" ucar "DOT" edu  Web: www.image.ucar.edu/~nychka
Voice: 303-497-1711 FAX: 303-497-1298   Business Cell: 303-725-3199


On Feb 3, 2012, at 9:16 AM, Tom Roche wrote:

> 
> summary: image.plot-ing two sets of netCDF data, with the second
> derived from the first. First plots to PDF as expected (title, data,
> legend). Second plots the data and title, but abends before drawing
> the legend, with
> 
>> Error in if (del == 0 && to == 0) return(to) : 
>>  missing value where TRUE/FALSE needed
>> Calls: plot.layers.for.timestep -> image.plot -> seq -> seq.default
> 
> Debugging shows image.plot.info(...) is returning
> 
>> Browse[2]> info
>> $xlim
>> [1] NA
> 
>> $ylim
>> [1] NA
> 
>> $zlim
>> [1] NA
> 
>> $poly.grid
>> [1] FALSE
> 
> details:
> 
> (Hopefully the following is not too long-winded; I'm just trying to be
> complete.) I'm running on a cluster (where I don't have root) with
> 
> me@it4:~ $ lsb_release -a
> LSB Version:  
> :core-3.1-amd64:core-3.1-ia32:core-3.1-noarch:graphics-3.1-amd64:graphics-3.1-ia32:graphics-3.1-noarch
> Distributor ID:   RedHatEnterpriseServer
> Description:  Red Hat Enterprise Linux Server release 5.4 (Tikanga)
> Release:  5.4
> Codename: Tikanga
> me@it4:~ $ uname -a
> Linux it4 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64 x86_64 
> x86_64 GNU/Linux
> me@it4:~ $ R
> R version 2.14.0 (2011-10-31)
> ...
>> library(help = fields)
> ...
> Package:fields
> Version:6.6.2
> Date:   November 16, 2011
> ...
> Maintainer: Doug Nychka 
> 
> I have an IOAPI (netCDF classic plus spatial metadata) dataset with
> structure {cols, rows, layers, timestep=1}, where {cols,rows}
> represent a land-cover grid. The layers are sparse, in that all have
> some NAs; some have all NAs--call those "trivial"). The non-trivial
> layers have a problem: data was logged so that values sum. (I.e.,
> instead of logging value=a in gridcell [i,j] and value=b in the "next"
> non-NA gridcell[i+m,j+n], value(gridcell[i+m,j+n]) = a+b.) I wrote an
> R routine that "demonotonicizes" (since all data >= 0) the source
> data, writing to a new "fixed" or "target" file with the same
> structure as the source. As part of the routine I check that each
> target layer contains values s.t.
> 
> * ∀ target values v: (v > 0) || is.na(v)
> * ∀ i,j: is.na(value(source[i,j])) ⇒ is.na(value(target[i,j]))
> 
> I also want, for each layer, to plot both the source and the target.
> My plot code is like
> 
> plot.layers.for.timestep <- function(source.file, source.datavar,
>     target.datavar, datavar.name, datavar.n.layers, colors, map) {
>   # Get the grid origin, cell sizes, cell centers, etc
>   # ...
>
>   pdf("compare.source.target.pdf", height=3.5, width=5, pointsize=1)
>   for (i.layer in 1:datavar.n.layers) {
>     # plot the source data
>     # debugging
>     print(paste('Non-null image.plot for source layer=', i.layer))
>     # for non-trivial layers
>     if (sum(!is.na(source.datavar[,,i.layer]))) {
>       image.plot(x.cell.centers.km, y.cell.centers.km,
>         source.datavar[,,i.layer],
>         xlab="", ylab="", axes=F, col=colors(100),
>         main=paste("Source: ", datavar.name, ", Layer: ", i.layer, sep="")
>       )
>       lines(map)
>     } else { # trivial layers
>       ...
>     }
>     # plot the fixed data
>     # debugging
>     print(paste('Non-null image.plot for target layer=', i.layer))
>     debug(image.plot)
>     # for non-trivial layers
>     if (sum(!is.na(target.datavar[,,i.layer]))) {
>       image.plot(x.cell.centers.km, y.cell.centers.km, xlab="", ylab="",
>         target.datavar[,,i.layer], axes=F, col=colors(100),
>         main=paste("Target: ", datavar.name, ", Layer: ", i.layer, sep=""))
>       lines(map)
>     } else { # trivial layers
>       ...
>     }
>   }
>   dev.off()
> } # end function plot.layers.for.timestep.fun
> 
> The first layer is non-trivial, and the source layer plots to
> ./compare.source.target.pdf as expected: data, title, legend. 
> Then the target title and data plot, but abends before drawing
> the legend, with
> 
>> Error in if (del == 0 && to == 0) return(to) : 
>>  missing value where TRUE/FALSE needed
>> Calls: plot.layers.for.timestep -> image.plot -> seq -> seq.default
> 
> Being a relatively new R user, I read Peng's "Introduction to the
> Interactive Debugging Tools in R" (though 10 years old, everything

Re: [R] Hanging -- please help decipher event report

2012-02-03 Thread jeremyd

Thanks for your reply. At line 144 I call R_CheckUserInterrupt(). I added
this because otherwise R would not respond to the user for things like
switching over to the window until the C code exited back to R. I then moved
that call into a loop where it should occur quite frequently, because I
thought this would fix my hanging problem. I admit that I may be using this
call in an inefficient or flat-out wrong way, as this is my first foray
into R/C cooperation. 

I should add that I cannot get the C code to hang or produce any errors when
I run it on its own.

I would appreciate any additional thoughts or questions.

--
View this message in context: 
http://r.789695.n4.nabble.com/Hanging-please-help-decipher-event-report-tp4355128p4355709.html
Sent from the R help mailing list archive at Nabble.com.


[R] zeroinfl package - Year effect and precision

2012-02-03 Thread Lee, Laura

Hello. I am using zeroinfl() (from the pscl package) to fit a zero-inflated
negative binomial model. The explanatory variables are Year and the
Depth x STemp interaction. I need guidance on extracting the Year effect
and its associated precision.

Thank you for your time.

Cheers,

Laura
 


Re: [R] incomplete final line found on

2012-02-03 Thread Dimitri Liakhovitski

Thank you, Michael - it worked - it was exactly what I was looking for.
Thank you, David - I added the link to my toolbar - and sorry, you are
right, I should have searched more.
Dimitri

On Fri, Feb 3, 2012 at 4:52 PM, David Winsemius  wrote:
> Dimitri.
>
> This has been asked a whole bunch of times on this list. Do a search on the
> text in the error message if you doubt me. I have this link on my toolbar:
>
> R-search:
> http://search.r-project.org/cgi-bin/namazu.cgi?query=&max=100&result=normal&sort=score&idxname=Rhelp10&idxname=Rhelp08&idxname=Rhelp02&idxname=functions
>
> --
> David.
>
>
> On Feb 3, 2012, at 4:28 PM, R. Michael Weylandt wrote:
>
>> Try opening the file up in a text editor and inserting a blank line or
>> two on the end. (There's either an EOL or EOF character missing and
>> this trick usually works for me -- never sure why/when it happens
>> though)
>>
>> Michael
>>
>> On Fri, Feb 3, 2012 at 4:23 PM, Dimitri Liakhovitski
>>  wrote:
>>>
>>> Dear R-ers,
>>>
>>> I hope there is a really simple solution to my problem.
>>> I've written a function that I saved in an .r file. I source this file
>>> in my code. For a while it worked fine. But then when I run the line:
>>>
>>> source("F mylineplot.r")
>>>
>>> I started getting a warning:
>>> In readLines(file) : incomplete final line found on 'F mylineplot.r'
>>>
>>> I have no idea why - I tried to check and to recheck what's going on,
>>> but am not finding anything.
>>> The code works both when I try to run it NOT as a function and when I
>>> run it AS a function. So why the warning message?
>>>
>>> Just in case - the text of my function inside my file that I source. I
>>> really don't expect anyone to dig into it - but maybe something will
>>> jump at you?
>>> Thanks a lot!
>>> Dimitri
>>>
>>>
>>> ### Creating a plot with (aggregated) several lines:
>>> # indata - my data frame
>>> # datesvar - name of the variable that contains dates
>>> # invars - names of the variables to be graphed
>>> # myfunction - function to be used (mean or sum)
>>> # mymetric - string for the metric
>>> # mytitle - title of the graph
>>> # fixedy - if 1, range on y axis starts with zero
>>> # indata=en;datesvar="Week";invars=seas[5];myfunction=mean
>>> # mymetric="TEST";fixedy=0;title="BLA"
>>>
>>> mylines =
>>> function(indata,datesvar,invars,myfunction,mymetric,mytitle,fixedy=0)
>>> {
>>>
>>>
>>>  all.colors<-c("#E0","#CD","#D4D4D4","#FFC1C1","#FFDEAD","#9ACD32",
>>>       "#99CCFF","#6495ED","#66CDAA","#EEC900","#BC8F8F",
>>>  "#C0","#696969","#473C8B","#8B4500",     "#FF7F00","#9370DB",
>>>       "#80","#104E8B","#228B22")[20:1]
>>>
>>>
>>>  myagg<-aggregate(indata[invars],by=indata[datesvar],FUN=myfunction)
>>>  yrange=range(pretty(as.matrix(myagg[2:length(myagg)])))
>>>  if(fixedy==0){
>>>   ymin<-yrange[1]
>>>   ymax<-yrange[2]} else {
>>>   ymin<-0
>>>   ymax<-yrange[2]}
>>>  ydistance<-ymax-ymin
>>>  if(ydistance>0.1 & ydistance<=1){mystep<-0.1} else {
>>>   if(ydistance>1 & ydistance<=10){mystep<-1/2} else {
>>>       if(ydistance>10 & ydistance<=100) {mystep<-10/5} else {
>>>       if(ydistance>100 & ydistance<=1000) {mystep<-100/5} else {
>>>         if(ydistance>1000 & ydistance<=1) {mystep<-1000/2} else {
>>>           if(ydistance>1 & ydistance<=10) {mystep<-1/5} else
>>> {
>>>             mystep<-10/5
>>>           }
>>>         }
>>>       }
>>>     }
>>>   }
>>>  }
>>>  nr.of.dates<-length(myagg[[datesvar]]); index<-seq(1,nr.of.dates,2)
>>>  par(bg = "white")
>>>
>>>  plot(x=myagg[[datesvar]],y=myagg[,2],ylim=c(ymin,ymax),col=all.colors[1],type='l',
>>>         ylab=mymetric,xlab="",lwd=2,xaxt='n',yaxt='n',main=mytitle)
>>>  mycolors<-1
>>>  for(i in 2:length(invars)){
>>>   mycolors<-c(mycolors,(i))
>>>  }
>>>
>>>  axis(1, labels =format(as.Date(myagg[[datesvar]][index],
>>> origin="1970-01-01"), "%Y-%m-%d"),
>>>               at=myagg[[datesvar]][index], las=2,cex.axis=0.8)
>>>  axis(2,
>>> labels=seq(ymin,ymax,by=mystep),at=seq(ymin,ymax,by=mystep),las=1,cex.axis=0.9)
>>>  abline(v=myagg[[datesvar]][index],lty="dotted",col = "lightgray")
>>> #  abline(h=seq(ymin,ymax,by=mystep), lty="dotted",col = "lightgray")
>>>
>>>  legend("topleft",inset=0,legend=invars,fill=all.colors[mycolors],horiz=T,bg="white",cex=1)
>>> # ?plot
>>>
>>>
>>>  points(myagg[[datesvar]],myagg[[invars[1]]],type="l",lwd=3,lty=i,col=all.colors[1])
>>>  for(i in 2:length(invars)){
>>>
>>> points(myagg[[datesvar]],myagg[[invars[i]]],type="l",lwd=2,lty=1,col=all.colors[i])
>>> # or lty=i
>>>   mycolors<-c(mycolors,(i))
>>>  }
>>>  return(myagg)
>>> }
>>>
>>>
>>> --
>>> Dimitri Liakhovitski
>>> marketfusionanalytics.com
>>>

Re: [R] incomplete final line found on

2012-02-03 Thread David Winsemius


Dimitri.

This has been asked a whole bunch of times on this list. Do a search  
on the text in the error message if you doubt me. I have this link on  
my toolbar:


R-search:
http://search.r-project.org/cgi-bin/namazu.cgi?query=&max=100&result=normal&sort=score&idxname=Rhelp10&idxname=Rhelp08&idxname=Rhelp02&idxname=functions

--
David.

On Feb 3, 2012, at 4:28 PM, R. Michael Weylandt wrote:


Try opening the file up in a text editor and inserting a blank line or
two on the end. (There's either an EOL or EOF character missing and
this trick usually works for me -- never sure why/when it happens
though)
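
The warning and the suggested fix can be reproduced in a few lines. This is a sketch using a temporary file in place of 'F mylineplot.r'; the helper ends_with_newline is a hypothetical name introduced only for the example.

```r
# Write a one-line script WITHOUT a trailing newline
f <- tempfile(fileext = ".r")
writeBin(charToRaw("x <- 1"), f)

# Hypothetical helper: does the file end with a newline byte?
ends_with_newline <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.info(path)$size)
  length(bytes) > 0 && bytes[length(bytes)] == as.raw(0x0a)
}

ends_with_newline(f)   # FALSE

# readLines() warns about the incomplete final line; capture the message
msg <- tryCatch(readLines(f), warning = function(w) conditionMessage(w))
print(msg)

# Fix: append a newline, as suggested above
cat("\n", file = f, append = TRUE)
ends_with_newline(f)   # TRUE
ok <- readLines(f)     # no warning now
print(ok)
```

The warning is harmless for the code itself, which explains why the function ran correctly either way.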

Michael

On Fri, Feb 3, 2012 at 4:23 PM, Dimitri Liakhovitski
 wrote:

Dear R-ers,

I hope there is a really simple solution to my problem.
I've written a function that I saved in an .r file. I source this file
in my code. For a while it worked fine. But then when I run the line:

source("F mylineplot.r")

I started getting a warning:
In readLines(file) : incomplete final line found on 'F mylineplot.r'

I have no idea why - I tried to check and to recheck what's going on,
but am not finding anything.
The code works both when I try to run it NOT as a function and when I
run it AS a function. So why the warning message?

Just in case - the text of my function inside my file that I source. I
really don't expect anyone to dig into it - but maybe something will
jump at you?
Thanks a lot!
Dimitri


### Creating a plot with (aggregated) several lines:
# indata - my data frame
# datesvar - name of the variable that contains dates
# invars - names of the variables to be graphed
# myfunction - function to be used (mean or sum)
# mymetric - string for the metric
# mytitle - title of the graph
# fixedy - if 1, range on y axis starts with zero
# indata=en;datesvar="Week";invars=seas[5];myfunction=mean
# mymetric="TEST";fixedy=0;title="BLA"

mylines =
function(indata,datesvar,invars,myfunction,mymetric,mytitle,fixedy=0)
{

 all.colors<-c("#E0","#CD","#D4D4D4","#FFC1C1","#FFDEAD","#9ACD32",
   "#99CCFF","#6495ED","#66CDAA","#EEC900","#BC8F8F",
   "#C0","#696969","#473C8B","#8B4500","#FF7F00","#9370DB",
   "#80","#104E8B","#228B22")[20:1]


 myagg<-aggregate(indata[invars],by=indata[datesvar],FUN=myfunction)
 yrange=range(pretty(as.matrix(myagg[2:length(myagg)])))
 if(fixedy==0){
   ymin<-yrange[1]
   ymax<-yrange[2]} else {
   ymin<-0
   ymax<-yrange[2]}
 ydistance<-ymax-ymin
 if(ydistance>0.1 & ydistance<=1){mystep<-0.1} else {
   if(ydistance>1 & ydistance<=10){mystep<-1/2} else {
   if(ydistance>10 & ydistance<=100) {mystep<-10/5} else {
   if(ydistance>100 & ydistance<=1000) {mystep<-100/5} else {
 if(ydistance>1000 & ydistance<=1) {mystep<-1000/2}  
else {
   if(ydistance>1 & ydistance<=10)  
{mystep<-1/5} else {

 mystep<-10/5
   }
 }
   }
 }
   }
 }
 nr.of.dates<-length(myagg[[datesvar]]); index<-seq(1,nr.of.dates,2)
 par(bg = "white")
 plot(x=myagg[[datesvar]],y=myagg[, 
2],ylim=c(ymin,ymax),col=all.colors[1],type='l',

 ylab=mymetric,xlab="",lwd=2,xaxt='n',yaxt='n',main=mytitle)
 mycolors<-1
 for(i in 2:length(invars)){
   mycolors<-c(mycolors,(i))
 }

 axis(1, labels =format(as.Date(myagg[[datesvar]][index],
origin="1970-01-01"), "%Y-%m-%d"),
   at=myagg[[datesvar]][index], las=2,cex.axis=0.8)
 axis(2,  
labels 
= 
seq 
(ymin,ymax,by=mystep),at=seq(ymin,ymax,by=mystep),las=1,cex.axis=0.9)

 abline(v=myagg[[datesvar]][index],lty="dotted",col = "lightgray")
#  abline(h=seq(ymin,ymax,by=mystep), lty="dotted",col = "lightgray")
  
legend 
("topleft 
",inset 
=0,legend=invars,fill=all.colors[mycolors],horiz=T,bg="white",cex=1)

# ?plot

  
points 
(myagg 
[[datesvar 
]],myagg[[invars[1]]],type="l",lwd=3,lty=i,col=all.colors[1])

 for(i in 2:length(invars)){

points 
(myagg 
[[datesvar 
]],myagg[[invars[i]]],type="l",lwd=2,lty=1,col=all.colors[i])

# or lty=i
   mycolors<-c(mycolors,(i))
 }
 return(myagg)
}


--
Dimitri Liakhovitski
marketfusionanalytics.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




David Winsemius, MD
West Hartford, CT


Re: [R] GAM (mgcv) warning: matrix not positive definite

2012-02-03 Thread Simon Wood


It is completely safe to ignore this. Here is what is going on...

mgcv routine 'mroot' is calling R routine 'chol' to find the *pivoted* 
Choleski factor of a positive semi definite matrix. This is deliberate, 
and completely ok to do, but 'chol' issues a warning when a matrix is 
only positive semi-definite (as opposed to strictly +ve def), even if 
pivoting has been requested. 'mroot' therefore suppresses the warning.
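
The behaviour described above can be reproduced outside mgcv. The following is only a sketch in base R, not the code inside 'mroot': the matrix A and the variable names are made up for illustration, and the exact wording of the warning differs between R versions. It builds a rank-deficient positive semi-definite matrix, asks 'chol' for the pivoted factor, and collects any warning instead of printing it.

```r
# A rank-1 positive semi-definite matrix (illustrative data only):
# the second column of x is an exact multiple of the first.
x <- cbind(1:3, 2 * (1:3))
A <- crossprod(x)

# Collect warnings rather than printing them, similar in spirit to the
# wrapper function in the original question.
msgs <- character(0)
R <- withCallingHandlers(
  chol(A, pivot = TRUE),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

attr(R, "rank")   # numerical rank found by the pivoted factorisation
msgs              # any warning text that was intercepted
```

The pivoted factor is still usable when a warning appears; the warning only reports that the matrix is not strictly positive definite.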


best,
Simon

On 03/02/12 20:38, Arnaud Mosnier wrote:

Dear list,


I fitted the same GAM model using directly the function gam(mgcv) ...
then as a parameter of another function that capture the warnings
messages (see below).
In the first case, there is no warning message printed, but in the last
one, the function find two warning messages stating "matrix not positive
definite"

So my question is: Do I have to worry about those warnings and then why
are they not printed in the simple use of the gam function.

#

Here is some further description:

## Simple use of gam

gam(USE ~ X1 + s(X2) + s(X3), family = binomial, data = data,
method="REML") # print no warning message.

## Using a function that capture warnings

Model_n_Warnings <- function(expr) {
  localWarnings <- list()
  outModel <- withCallingHandlers(expr,
    warning = function(w) {
      # store warning message
      localWarnings[[length(localWarnings)+1]] <<- w$message
      # avoid printing warning message to console
      invokeRestart("muffleWarning")
    })
  list(outModel=outModel, warnings=localWarnings)
}

out <- Model_n_Warnings (gam(USE ~ X1 + s(X2) + s(X3), family =
binomial, data = data, method="REML"))

out$warnings

[[1]]
[1] "matrix not positive definite"

[[2]]
[1] "matrix not positive definite"


Thanks,

Arnaud



--
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603   http://people.bath.ac.uk/sw283


Re: [R] incomplete final line found on

2012-02-03 Thread R. Michael Weylandt

Try opening the file up in a text editor and inserting a blank line or
two on the end. (There's either an EOL or EOF character missing and
this trick usually works for me -- never sure why/when it happens
though)
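
For what it is worth, the warning can be reproduced and cleared from within R. This is a sketch under assumptions: the temporary file stands in for 'F mylineplot.r', and ends_with_newline is a hypothetical helper, not a base R function.

```r
# Hypothetical stand-in for the sourced file: the last line has no newline.
f <- tempfile(fileext = ".r")
writeBin(charToRaw("x <- 1"), f)

# Hypothetical helper: TRUE if the last byte of the file is a newline.
ends_with_newline <- function(path) {
  bytes <- readBin(path, "raw", n = file.size(path))
  length(bytes) > 0 && bytes[length(bytes)] == as.raw(0x0a)
}

before <- ends_with_newline(f)

# readLines() warns about the incomplete final line; capture the message.
w <- tryCatch(readLines(f), warning = function(w) conditionMessage(w))

# Append the missing newline; readLines() is then silent.
if (!before) cat("\n", file = f, append = TRUE)
after <- ends_with_newline(f)
lines <- readLines(f)
```

The warning is harmless for source(); it only says that the last line of the file is not terminated by a newline.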

Michael

On Fri, Feb 3, 2012 at 4:23 PM, Dimitri Liakhovitski
 wrote:
> Dear R-ers,
>
> I hope there is a really simple solution to my problem.
> I've written a function that I saved in an .r file. I source this file
> in my code. For a while it worked fine. But then when I run the line:
>
> source("F mylineplot.r")
>
> I started getting a warning:
> In readLines(file) : incomplete final line found on 'F mylineplot.r'
>
> I have no idea why - I tried to check and to recheck what's going on,
> but am not finding anything.
> The code works both when I try to run it NOT as a function and when I
> run it AS a function. So why the warning message?
>
> Just in case - the text of my function inside my file that I source. I
> really don't expect anyone to dig into it - but maybe something will
> jump at you?
> Thanks a lot!
> Dimitri
>
>
> ### Creating a plot with (aggregated) several lines:
> # indata - my data frame
> # datesvar - name of the variable that contains dates
> # inars - names of the variables to be graphed
> # myfunction - function to be used (mean or sum)
> # my metric - string for the metric
> # mytitle - title of the graph
> # fixedy - if 1, range on y axis starts with zero
> # indata=en;datesvar="Week";invars=seas[5];myfunction=mean
> # mymetric="TEST";fixedy=0;title="BLA"
>
> mylines = 
> function(indata,datesvar,invars,myfunction,mymetric,mytitle,fixedy=0)
> {
>
>  all.colors<-c("#E0","#CD","#D4D4D4","#FFC1C1","#FFDEAD","#9ACD32",
>        "#99CCFF","#6495ED","#66CDAA","#EEC900","#BC8F8F",
>   "#C0","#696969","#473C8B","#8B4500",     "#FF7F00","#9370DB",
>        "#80","#104E8B","#228B22")[20:1]
>
>
>  myagg<-aggregate(indata[invars],by=indata[datesvar],FUN=myfunction)
>  yrange=range(pretty(as.matrix(myagg[2:length(myagg)])))
>  if(fixedy==0){
>    ymin<-yrange[1]
>    ymax<-yrange[2]} else {
>    ymin<-0
>    ymax<-yrange[2]}
>  ydistance<-ymax-ymin
>  if(ydistance>0.1 & ydistance<=1){mystep<-0.1} else {
>    if(ydistance>1 & ydistance<=10){mystep<-1/2} else {
>        if(ydistance>10 & ydistance<=100) {mystep<-10/5} else {
>        if(ydistance>100 & ydistance<=1000) {mystep<-100/5} else {
>          if(ydistance>1000 & ydistance<=1) {mystep<-1000/2} else {
>            if(ydistance>1 & ydistance<=10) {mystep<-1/5} else {
>              mystep<-10/5
>            }
>          }
>        }
>      }
>    }
>  }
>  nr.of.dates<-length(myagg[[datesvar]]); index<-seq(1,nr.of.dates,2)
>  par(bg = "white")
>  plot(x=myagg[[datesvar]],y=myagg[,2],ylim=c(ymin,ymax),col=all.colors[1],type='l',
>          ylab=mymetric,xlab="",lwd=2,xaxt='n',yaxt='n',main=mytitle)
>  mycolors<-1
>  for(i in 2:length(invars)){
>    mycolors<-c(mycolors,(i))
>  }
>
>  axis(1, labels =format(as.Date(myagg[[datesvar]][index],
> origin="1970-01-01"), "%Y-%m-%d"),
>                at=myagg[[datesvar]][index], las=2,cex.axis=0.8)
>  axis(2, 
> labels=seq(ymin,ymax,by=mystep),at=seq(ymin,ymax,by=mystep),las=1,cex.axis=0.9)
>  abline(v=myagg[[datesvar]][index],lty="dotted",col = "lightgray")
> #  abline(h=seq(ymin,ymax,by=mystep), lty="dotted",col = "lightgray")
>  legend("topleft",inset=0,legend=invars,fill=all.colors[mycolors],horiz=T,bg="white",cex=1)
> # ?plot
>
>  points(myagg[[datesvar]],myagg[[invars[1]]],type="l",lwd=3,lty=i,col=all.colors[1])
>  for(i in 2:length(invars)){
>    
> points(myagg[[datesvar]],myagg[[invars[i]]],type="l",lwd=2,lty=1,col=all.colors[i])
> # or lty=i
>    mycolors<-c(mycolors,(i))
>  }
>  return(myagg)
> }
>
>
> --
> Dimitri Liakhovitski
> marketfusionanalytics.com
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


[R] incomplete final line found on

2012-02-03 Thread Dimitri Liakhovitski

Dear R-ers,

I hope there is a really simple solution to my problem.
I've written a function that I saved in an .r file. I source this file
in my code. For a while it worked fine. But then when I run the line:

source("F mylineplot.r")

I started getting a warning:
In readLines(file) : incomplete final line found on 'F mylineplot.r'

I have no idea why - I tried to check and to recheck what's going on,
but am not finding anything.
The code works both when I try to run it NOT as a function and when I
run it AS a function. So why the warning message?

Just in case - the text of my function inside my file that I source. I
really don't expect anyone to dig into it - but maybe something will
jump at you?
Thanks a lot!
Dimitri


### Creating a plot with (aggregated) several lines:
# indata - my data frame
# datesvar - name of the variable that contains dates
# invars - names of the variables to be graphed
# myfunction - function to be used (mean or sum)
# mymetric - string for the metric
# mytitle - title of the graph
# fixedy - if 1, range on y axis starts with zero
# indata=en;datesvar="Week";invars=seas[5];myfunction=mean
# mymetric="TEST";fixedy=0;title="BLA"

mylines = function(indata,datesvar,invars,myfunction,mymetric,mytitle,fixedy=0)
{

  all.colors<-c("#E0","#CD","#D4D4D4","#FFC1C1","#FFDEAD","#9ACD32",
"#99CCFF","#6495ED","#66CDAA","#EEC900","#BC8F8F",
   "#C0","#696969","#473C8B","#8B4500", "#FF7F00","#9370DB",
"#80","#104E8B","#228B22")[20:1]


  myagg<-aggregate(indata[invars],by=indata[datesvar],FUN=myfunction)
  yrange=range(pretty(as.matrix(myagg[2:length(myagg)])))
  if(fixedy==0){
ymin<-yrange[1]
ymax<-yrange[2]} else {
ymin<-0
ymax<-yrange[2]}
  ydistance<-ymax-ymin
  if(ydistance>0.1 & ydistance<=1){mystep<-0.1} else {
if(ydistance>1 & ydistance<=10){mystep<-1/2} else {
if(ydistance>10 & ydistance<=100) {mystep<-10/5} else {
if(ydistance>100 & ydistance<=1000) {mystep<-100/5} else {
  if(ydistance>1000 & ydistance<=1) {mystep<-1000/2} else {
if(ydistance>1 & ydistance<=10) {mystep<-1/5} else {
  mystep<-10/5
}
  }
}
  }
}
  }
  nr.of.dates<-length(myagg[[datesvar]]); index<-seq(1,nr.of.dates,2)
  par(bg = "white")
  
plot(x=myagg[[datesvar]],y=myagg[,2],ylim=c(ymin,ymax),col=all.colors[1],type='l',
  ylab=mymetric,xlab="",lwd=2,xaxt='n',yaxt='n',main=mytitle)
  mycolors<-1
  for(i in 2:length(invars)){
mycolors<-c(mycolors,(i))
  }

  axis(1, labels =format(as.Date(myagg[[datesvar]][index],
origin="1970-01-01"), "%Y-%m-%d"),
at=myagg[[datesvar]][index], las=2,cex.axis=0.8)
  axis(2, 
labels=seq(ymin,ymax,by=mystep),at=seq(ymin,ymax,by=mystep),las=1,cex.axis=0.9)
  abline(v=myagg[[datesvar]][index],lty="dotted",col = "lightgray")
#  abline(h=seq(ymin,ymax,by=mystep), lty="dotted",col = "lightgray")
  
legend("topleft",inset=0,legend=invars,fill=all.colors[mycolors],horiz=T,bg="white",cex=1)
# ?plot

  
points(myagg[[datesvar]],myagg[[invars[1]]],type="l",lwd=3,lty=i,col=all.colors[1])
  for(i in 2:length(invars)){

points(myagg[[datesvar]],myagg[[invars[i]]],type="l",lwd=2,lty=1,col=all.colors[i])
# or lty=i
mycolors<-c(mycolors,(i))
  }
  return(myagg)
}


-- 
Dimitri Liakhovitski
marketfusionanalytics.com


Re: [R] Fiedler

2012-02-03 Thread Kjetil Halvorsen

Hola!

This can be done with the CRAN package igraph, which contains (part
of) the arpack
library for computing only some eigenvalues/eigenvectors of sparse
matrices. arpack gives you the option of computing a few of the
smallest or a few of the largest eigenvalues/vectors.

Do
library(igraph)
?arpack
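
To make the definition concrete, here is a small dense sketch in base R. It does not use arpack: it calls eigen() on the full Laplacian, which is only reasonable for small graphs, and the 4-vertex path graph is an assumed toy example. For large sparse graphs the arpack route above is the one to follow.

```r
# Toy example (assumption): path graph 1 - 2 - 3 - 4.
edges <- rbind(c(1, 2), c(2, 3), c(3, 4))
A <- matrix(0, 4, 4)
A[edges] <- 1
A <- A + t(A)                     # symmetric adjacency matrix

L <- diag(rowSums(A)) - A         # graph Laplacian: degree matrix minus adjacency

e <- eigen(L, symmetric = TRUE)   # eigenvalues are returned in decreasing order
n <- nrow(L)
fiedler_value  <- e$values[n - 1]     # second smallest eigenvalue
fiedler_vector <- e$vectors[, n - 1]  # its eigenvector (sign is arbitrary)
```

The signs of the entries of fiedler_vector split the path into its two halves, which is the usual use of the Fiedler vector for bisection.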

Kjetil

On Thu, Feb 2, 2012 at 12:29 PM, Massimo Franceschet
 wrote:
> Hi.
>
> I am looking for a function in R for computing the Fiedler vector of a graph 
> (the eigenvector associated with the second smallest eigenvalue of the 
> Laplacian of the graph). Alternatively, I am searching for an efficient 
> method to compute just few eigenvalues/vectors of a matrix (the smallest).
>
> Many thanks.
>
> Massimo Franceschet
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


Re: [R] Clear last x entries of R console

2012-02-03 Thread 538280

You may want to look at functions like: txtProgressBar, winProgressBar
(Windows only), or tkProgressBar (tcltk package), rather than
reinventing the wheel.
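
A minimal sketch with txtProgressBar from the utils package; the loop length n and the Sys.sleep call are placeholders for the real work.

```r
n <- 20                                   # placeholder number of iterations
pb <- txtProgressBar(min = 0, max = n, style = 3)
for (i in seq_len(n)) {
  Sys.sleep(0.01)                         # stands in for the real computation
  setTxtProgressBar(pb, i)                # redraws the bar on the same console line
}
final <- getTxtProgressBar(pb)            # current value of the bar
close(pb)
```

The bar overwrites its own line, so nothing has to be erased from the console by hand.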

On Fri, Feb 3, 2012 at 7:00 AM, angliski_jigit
 wrote:
> Hi All,
>
> I am trying to build in a progress-tracker into my loops that let me have a
> sense of their progress. I'd like to be able to output to screen a series of
> periods "" etc. for each completion of the loop, but I  want to
> build a pyramid, e.g.
> .
> ..
> ...
> 
> etc. So I need to be able to delete  of the console entry to
> accomplish this. There are commands to erase the whole console, but that's
> not what I want either; ideally, the command would allow me to erase the
> last line or the last x lines.
>
> Thanks
> Angliski
>
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Clear-last-x-entries-of-R-console-tp4354669p4354669.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
--
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com


[R] Reading table data from PDF files

2012-02-03 Thread Bryan McCloskey

All,

Is anyone familiar with a way to use R to read table data from a large 
collection of PDF files? I'm aware there are various command-line and desktop 
utilities that might be able to (e.g.) dump PDFs to text, which could then be 
parsed for table data. But I'm hoping there is something more integrated that 
could be incorporated into R functions and scripts to handle large batches of 
PDFs in a more automated fashion.

Has anyone used R to extract large amounts of tabular data from PDF documents?

-bryan

--
Bryan McCloskey, Ph.D.
IT Specialist (Data Management/Internet)
U.S. Geological Survey
St. Petersburg Coastal & Marine Science Center
600 Fourth St. South
St. Petersburg, FL 33701

South Florida Information Access: http://sofia.usgs.gov
Everglades Depth Estimation Network: http://sofia.usgs.gov/eden
Phone: 727.803.8747x3017 * Fax: 727.803.2032
--


[R] GAM (mgcv) warning: matrix not positive definite

2012-02-03 Thread Arnaud Mosnier

Dear list,


I fitted the same GAM model, first by calling the function gam (mgcv) directly,
then as an argument to another function that captures warning messages (see
below).
In the first case no warning message is printed, but in the second the
function finds two warning messages stating "matrix not positive
definite".

So my question is: do I have to worry about those warnings, and why are
they not printed in the simple use of the gam function?

#

Here is some further description:

## Simple use of gam

gam(USE ~ X1 + s(X2) + s(X3), family = binomial, data = data,
method="REML") # print no warning message.

## Using a function that capture warnings

Model_n_Warnings <- function(expr) {
  localWarnings <- list()
  outModel <- withCallingHandlers(expr,
    warning = function(w) {
      # store warning message
      localWarnings[[length(localWarnings)+1]] <<- w$message
      # avoid printing warning message to console
      invokeRestart("muffleWarning")
    })
  list(outModel=outModel, warnings=localWarnings)
}

out <- Model_n_Warnings (gam(USE ~ X1 + s(X2) + s(X3), family = binomial,
data = data, method="REML"))

out$warnings

[[1]]
[1] "matrix not positive definite"

[[2]]
[1] "matrix not positive definite"


Thanks,

Arnaud

[[alternative HTML version deleted]]


Re: [R] Fail to install odfWeave

2012-02-03 Thread David Winsemius



On Feb 3, 2012, at 3:28 PM, istarninwa wrote:


Hi all,

I am trying to install odfWeave package and get the following error:

###

install.packages('odfWeave')

Warning message:
In getDependencies(pkgs, dependencies, available, lib) :
 package 'odfWeave' is not available (for R version 2.14.1)
###

The Google and R-list searches for this error didn't show related posts.


I am running R through Eclipse on Windows 7, 64bit.


http://cran.r-project.org/web/checks/check_results_odfWeave.html

--

David Winsemius, MD
West Hartford, CT


Re: [R] Using {tabularx} latex package with the {xtable} package?

2012-02-03 Thread David Winsemius



On Feb 3, 2012, at 2:54 PM, Tal Galili wrote:

I am trying to solve the problem of having a latex table (produced using
the xtable package, then inserted to a latex file using Sweave), exceeding
the margins of my LaTeX document.

I found that one such solution can be based on the tabularx package,
and I am wondering what would be the best way to implement it (or if there
is a better solution I am overlooking).

Right now the only way I am thinking of is to edit print.xtable so it would
work with the tabularx LaTeX package. Any other suggestions would be most
welcomed.


It would be more courteous to the rhelp readers if you let them know  
you cross-posted this question on SO:


http://tex.stackexchange.com/questions/43328/using-tabularx-latex-package-with-the-xtable-package-in-r



[[alternative HTML version deleted]]


--

David Winsemius, MD
West Hartford, CT


[R] Fail to install odfWeave

2012-02-03 Thread istarninwa

Hi all,

I am trying to install odfWeave package and get the following error:

###
> install.packages('odfWeave')
Warning message:
In getDependencies(pkgs, dependencies, available, lib) :
  package 'odfWeave' is not available (for R version 2.14.1)
###

The Google and R-list searches for this error didn't show related posts.

I am running R through Eclipse on Windows 7, 64bit.

Thanks in advance,
--Egor

--
View this message in context: 
http://r.789695.n4.nabble.com/Fail-to-install-odfWeave-tp4355696p4355696.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-03 Thread HC

Bad news!

The readLines command works fine up to a certain limit. Once a few files have
been written the R program crashes.

I used the following code:

iFile<-"Test.txt"
con <- file(iFile, "r")

N<-1250000
iLoop<-1

while(length(Lines <- readLines(con, n = N)) > 0 && iLoop<41) {
  oFile<-paste("Split_",iLoop,".txt",sep="")
  write.table(Lines, oFile, sep = "\t", quote = FALSE, col.names= FALSE,
              row.names = FALSE)
  iLoop<-iLoop+1
}
close(con)


With above N=1.25 million, it wrote 28 files of about 57 mb each. That is a
total of about 1.6 GB and then crashed.
I tried with other values on N and it crashes at about the same place in
terms of total size output, i.e., about 1.6 GB.
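
For comparison, here is a sketch of the same chunked split using writeLines instead of write.table, since the chunks are already plain lines of text. It is not claimed to avoid the crash; split_file is a hypothetical helper and the small temporary file stands in for Test.txt.

```r
# Hypothetical helper: copy in_path into files of at most chunk_lines lines
# each, and return the paths of the files written.
split_file <- function(in_path, chunk_lines, out_prefix) {
  con <- file(in_path, "r")
  on.exit(close(con))            # close the connection even if an error occurs
  i <- 0L
  out <- character(0)
  while (length(lines <- readLines(con, n = chunk_lines)) > 0) {
    i <- i + 1L
    out_path <- paste0(out_prefix, i, ".txt")
    writeLines(lines, out_path)  # lines are written back unchanged
    out <- c(out, out_path)
  }
  out
}

# Small stand-in for the large tab-delimited file.
src <- tempfile(fileext = ".txt")
writeLines(sprintf("row\t%d", 1:10), src)
parts <- split_file(src, chunk_lines = 4,
                    out_prefix = file.path(tempdir(), "Split_"))
```

Only one chunk is held in memory at a time, so memory use is governed by chunk_lines rather than by the size of the input file.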

Is this due to any limitation of Windows 7, in terms of not having the
pointer after this size?

Your insight would be very helpful.

Thank you.
HC






--
View this message in context: 
http://r.789695.n4.nabble.com/sqldf-for-Very-Large-Tab-Delimited-Files-tp4350555p4355679.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Hanging -- please help decipher event report

2012-02-03 Thread Duncan Murdoch


On 03/02/2012 11:53 AM, jeremyd wrote:

I'm running some code in R64 on a Mac OS 10.6.8 that calls a C program
through the dyn.load() function. The code hangs after several days of
computation, and I've having trouble locating the problem. Can anyone
decipher this info from the error report, and tell me if this is a problem
in R64, or in the C code? Thanks very much in advance.

A few hints: "pa" is the name of the C function in R, and "polio_3A.so" is
the compiled code. I can post the R code or the C code if anyone needs to
see it, but my first priority is to figure out what this stack trace is
telling me.

Date/Time:   2012-02-03 10:35:57 -0500
OS Version:  10.6.8 (Build 10K549)
Architecture:x86_64
Report Version:  7

Command: R
Path:/Applications/R64.app/Contents/MacOS/R
Version: R 2.11.1 GUI 1.34 Leopard build 64-bit (5589)
Parent:  launchd [95]

PID: 37306
Event:   hang
Duration:5.96s (sampling started after 2 seconds)
Steps:   19 (100ms sampling interval)

Pageins: 88
Pageouts:0


Process: R [37306]
Path:/Applications/R64.app/Contents/MacOS/R
UID: 501

   Thread eef8d9 DispatchQueue 1
   User stack:
 19 start + 52 (in R) [0x11a74]
   19 main + 844 (in R) [0x11dec]
 19 -[REngine runREPL] + 102 (in R) [0x100010f86]
   19 run_REngineRmainloop + 192 (in R) [0x1000194a0]
 19 R_ReplDLLdo1 + 462 (in libR.dylib) [0x10016ce4e]
   19 Rf_eval + 1196 (in libR.dylib) [0x10013ca1c]
 19 do_for + 678 (in libR.dylib) [0x100140f66]
   19 Rf_eval + 1196 (in libR.dylib) [0x10013ca1c]
 19 do_begin + 308 (in libR.dylib) [0x100141724]
   19 Rf_eval + 1196 (in libR.dylib) [0x10013ca1c]
 19 Rf_eval + 1196 (in libR.dylib) [0x10013ca1c]
   19 do_begin + 308 (in libR.dylib) [0x100141724]
 19 Rf_eval + 1196 (in libR.dylib) [0x10013ca1c]
   19 do_while + 614 (in libR.dylib) [0x10013dca6]
 19 Rf_eval + 1196 (in libR.dylib) [0x10013ca1c]
   19 do_begin + 308 (in libR.dylib) [0x100141724]
 19 Rf_eval + 1196 (in libR.dylib) [0x10013ca1c]
   19 do_set + 709 (in libR.dylib) [0x10013ea55]
 19 Rf_eval + 962 (in libR.dylib) [0x10013c932]
   19 Rf_applyClosure + 724 (in libR.dylib) [0x10013f364]
 19 Rf_eval + 1196 (in libR.dylib) [0x10013ca1c]
   19 do_begin + 308 (in libR.dylib) [0x100141724]
 19 Rf_eval + 1196 (in libR.dylib) [0x10013ca1c]
   19 do_set + 709 (in libR.dylib) [0x10013ea55]
 19 Rf_eval + 1676 (in libR.dylib) [0x10013cbfc]


Down to here is normal stuff R will show when evaluating a complex 
expression.

   19 do_dotCode + 6074 (in libR.dylib) [0x100111e9a]


This is where we leave R and enter your code.

 6 pa + 2960 (polio_3A.c:258 in polio_3A.so) [0x101707b70]


Presumably polio_3A.c line 258 is meaningful to you.  It's where your 
code called some other function in your code,

and there were some nested calls...

 5 pa + 2989 (polio_3A.c:256 in polio_3A.so) [0x101707b8d]
 2 pa + 2964 (polio_3A.c:258 in polio_3A.so) [0x101707b74]
 2 pa + 2969 (polio_3A.c:259 in polio_3A.so) [0x101707b79]
 1 pa + 2966 (polio_3A.c:258 in polio_3A.so) [0x101707b76]
 1 pa + 1035 (polio_3A.c:144 in polio_3A.so) [0x1017073eb]


... down to here, on line 144 of  the same file, where you decided to 
let R process some events.



   1 R_ProcessEvents + 30 (in libR.dylib) [0x10021dc1e]
 1 Re_ProcessEvents + 29 (in R) [0x1f46d]
   1 gettimeofday + 43 (in libSystem.B.dylib) [0x7fff82a0814f]
 1 __gettimeofday + 80 (in commpage [libSystem.B.dylib]) [0x7fe00330]
 1 pa + 1262 (polio_3A.c:165 in polio_3A.so) [0x1017074ce]

[R] Using {tabularx} latex package with the {xtable} package?

2012-02-03 Thread Tal Galili

I am trying to solve the problem of having a latex table (produced using
the xtable package, then inserted to a latex file using Sweave), exceeding
the margins of my LaTeX document.

I found that one such solution can be based on the tabularx package,
and I am wondering what would be the best way to implement it (or if there
is a better solution I am overlooking).

Right now the only way I am thinking of is to edit print.xtable so it would
work with the tabularx LaTeX package. Any other suggestions would be most
welcomed.

Thanks.

Contact Details:
Contact me: tal.gal...@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
--

[[alternative HTML version deleted]]


Re: [R] ggplot theme_update

2012-02-03 Thread vd3000

Hi, David,

Thanks a lot.

I don't know how, but it works after adding this line right after
library(ggplot2):

base_theme <- theme_update(axis.text.x = theme_text(angle = 0, hjust = 0.5),
    axis.text.y = theme_text(angle = 0, hjust = 0.5),
    panel.grid.major = theme_line(colour = "grey90"),
    panel.grid.minor = theme_blank(), panel.background = theme_blank(),
    axis.ticks = theme_blank(), legend.position = "none")

However, when I added the same line after loading all the packages, it
failed... lol
I was freaking out that I had put it in the wrong position or made a coding
mistake in my profile...
Anyway, you really saved me from this deadlock. Thanks.

--
View this message in context: 
http://r.789695.n4.nabble.com/ggplot-theme-update-tp4354015p4355528.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] sapply help

2012-02-03 Thread Filoche

Thank you sir.

You explained it very well. This gives me a good starting point for using
sapply more frequently.

Cordially,
Phil

--
View this message in context: 
http://r.789695.n4.nabble.com/sapply-help-tp4355092p4355376.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-03 Thread HC

Thank you.

The readLines command is working fine and I am able to read 10^6 lines in
one go and write them using the write.table command.

Does readLines read in blocks to optimize, or does it go line
by line?

Steve mentioned *nix and the split command. Would there be any speed
benefit compared to readLines?

Thank you.
HC

--
View this message in context: 
http://r.789695.n4.nabble.com/sqldf-for-Very-Large-Tab-Delimited-Files-tp4350555p4355362.html
Sent from the R help mailing list archive at Nabble.com.


[R] Hanging -- please help decipher event report

2012-02-03 Thread jeremyd

I'm running some code in R64 on a Mac OS 10.6.8 that calls a C program
through the dyn.load() function. The code hangs after several days of
computation, and I've having trouble locating the problem. Can anyone
decipher this info from the error report, and tell me if this is a problem
in R64, or in the C code? Thanks very much in advance.

A few hints: "pa" is the name of the C function in R, and "polio_3A.so" is
the compiled code. I can post the R code or the C code if anyone needs to
see it, but my first priority is to figure out what this stack trace is
telling me.

Date/Time:   2012-02-03 10:35:57 -0500
OS Version:  10.6.8 (Build 10K549)
Architecture:x86_64
Report Version:  7

Command: R
Path:/Applications/R64.app/Contents/MacOS/R
Version: R 2.11.1 GUI 1.34 Leopard build 64-bit (5589)
Parent:  launchd [95]

PID: 37306
Event:   hang
Duration:5.96s (sampling started after 2 seconds)
Steps:   19 (100ms sampling interval)

Pageins: 88
Pageouts:0


Process: R [37306]
Path:/Applications/R64.app/Contents/MacOS/R
UID: 501

  Thread eef8d9 DispatchQueue 1
  User stack:
19 start + 52 (in R) [0x11a74]
  19 main + 844 (in R) [0x11dec]
19 -[REngine runREPL] + 102 (in R) [0x100010f86]
  19 run_REngineRmainloop + 192 (in R) [0x1000194a0]
19 R_ReplDLLdo1 + 462 (in libR.dylib) [0x10016ce4e]
  19 Rf_eval + 1196 (in libR.dylib) [0x10013ca1c]
19 do_for + 678 (in libR.dylib) [0x100140f66]
  19 Rf_eval + 1196 (in libR.dylib) [0x10013ca1c]
19 do_begin + 308 (in libR.dylib) [0x100141724]
  19 Rf_eval + 1196 (in libR.dylib) [0x10013ca1c]
19 Rf_eval + 1196 (in libR.dylib) [0x10013ca1c]
  19 do_begin + 308 (in libR.dylib) [0x100141724]
19 Rf_eval + 1196 (in libR.dylib) [0x10013ca1c]
  19 do_while + 614 (in libR.dylib) [0x10013dca6]
19 Rf_eval + 1196 (in libR.dylib) [0x10013ca1c]
  19 do_begin + 308 (in libR.dylib) [0x100141724]
19 Rf_eval + 1196 (in libR.dylib) [0x10013ca1c]
  19 do_set + 709 (in libR.dylib) [0x10013ea55]
19 Rf_eval + 962 (in libR.dylib) [0x10013c932]
  19 Rf_applyClosure + 724 (in libR.dylib) [0x10013f364]
19 Rf_eval + 1196 (in libR.dylib) [0x10013ca1c]
  19 do_begin + 308 (in libR.dylib) [0x100141724]
19 Rf_eval + 1196 (in libR.dylib) [0x10013ca1c]
  19 do_set + 709 (in libR.dylib) [0x10013ea55]
19 Rf_eval + 1676 (in libR.dylib) [0x10013cbfc]
  19 do_dotCode + 6074 (in libR.dylib) [0x100111e9a]
6 pa + 2960 (polio_3A.c:258 in polio_3A.so) [0x101707b70]
5 pa + 2989 (polio_3A.c:256 in polio_3A.so) [0x101707b8d]
2 pa + 2964 (polio_3A.c:258 in polio_3A.so) [0x101707b74]
2 pa + 2969 (polio_3A.c:259 in polio_3A.so) [0x101707b79]
1 pa + 2966 (polio_3A.c:258 in polio_3A.so) [0x101707b76]
1 pa + 1035 (polio_3A.c:144 in polio_3A.so) [0x1017073eb]
  1 R_ProcessEvents + 30 (in libR.dylib) [0x10021dc1e]
1 Re_ProcessEvents + 29 (in R) [0x1f46d]
  1 gettimeofday + 43 (in libSystem.B.dylib) [0x7fff82a0814f]
1 __gettimeofday + 80 (in commpage [libSystem.B.dylib]) [0x7fe00330]
1 pa + 1262 (polio_3A.c:165 in polio_3A.so) [0x1017074ce]
1 pa + 2975 (polio_3A.c:260 in polio_3A.so) [0x101707b7f]


--
View this message in context: 
http://r.789695.n4.nabble.com/Hanging-please-help-decipher-event-report-tp4355128p4355128.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] creating R package

2012-02-03 Thread J Toll

On Fri, Feb 3, 2012 at 11:25 AM, ql16717  wrote:
> Hi,
>
> I have never actually made a package before. I have a folder, say called
> "john", that has everything it needs to be an R package. Some
> instructions say I need Rtools from an R mirror site. I installed
> Rtools, but under DOS the command "Rcmd" is still not recognized.
>
> Any suggestions? Thanks
> John

I don't have an answer to your question, but have you read through:

http://cran.r-project.org/doc/contrib/Leisch-CreatingPackages.pdf

 or

?package.skeleton

Those resources might be able to help.
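
As a rough illustration of what ?package.skeleton sets up, here is a minimal sketch. The function `hello` and the temporary output directory are assumptions made up for the example; only the package name "john" comes from the question.

```r
# Sketch: create a package skeleton from an object held in an environment.
# `hello` and `out_dir` are illustrative; "john" is the name from the question.
env <- new.env()
assign("hello", function(name = "world") paste("Hello,", name), envir = env)

out_dir <- file.path(tempdir(), "skeleton-demo")
dir.create(out_dir, showWarnings = FALSE)

# Writes DESCRIPTION, NAMESPACE, R/ and man/ under out_dir/john
package.skeleton(name = "john", list = "hello", environment = env,
                 path = out_dir, force = TRUE)

pkg_dir <- file.path(out_dir, "john")
list.files(pkg_dir)
```

The generated directory would then normally be built and installed from a command prompt with R CMD build and R CMD INSTALL. If those commands are not recognized on Windows, one possible cause (an assumption here, not something established in this thread) is that R's bin directory is not on the PATH.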

James


Re: [R] sapply help

2012-02-03 Thread William Dunlap

Instead of colSums(t(aMatrix)), why not the more
direct rowSums(aMatrix)?

If time is an issue (which it won't be unless
the number of columns of M is big), compare:
  > M <- matrix(2e5:1, nrow=2)
  > v <- 1:ncol(M)
  > system.time(z1 <- sapply(seq_along(v), function(i) sum(M[,i] < v[i])))
 user  system elapsed
0.532   0.000   0.532
  > system.time(z2 <- colSums(t(apply(M, 1, "<", v))))
 user  system elapsed
0.004   0.000   0.006
  > system.time(z3 <- rowSums(apply(M, 1, "<", v)))
 user  system elapsed
0.008   0.000   0.005
  > system.time(z4 <- colSums(M < matrix(v, nrow=nrow(M), ncol=ncol(M), 
byrow=TRUE)))
 user  system elapsed
0.000   0.000   0.002
  > isTRUE(all.equal(z1, z2)) && isTRUE(all.equal(z1,z3)) && 
isTRUE(all.equal(z1,z4))
  [1] TRUE
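
For comparison, a small sketch of two more fully vectorized spellings of the same count, run on the toy data from the original question. The use of sweep() and rep(..., each = ) is a suggestion added here, not something from the thread.

```r
# Toy data from the original question
M <- matrix(3, 2, 5)
v <- 1:5

# Reference: one sapply() call per column
counts_loop <- sapply(seq_along(v), function(i) sum(M[, i] < v[i]))

# sweep() compares column j of M with v[j]
counts_sweep <- colSums(sweep(M, 2, v, FUN = "<"))

# rep(..., each = nrow(M)) lays v out in the same column-major order as M
counts_rep <- colSums(M < rep(v, each = nrow(M)))

counts_sweep
```

All three should agree; which one is fastest on large matrices is best checked with system.time() as above.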

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
> Behalf Of Milan Bouchet-
> Valat
> Sent: Friday, February 03, 2012 10:17 AM
> To: Ernest Adrogué
> Cc: r-help@r-project.org
> Subject: Re: [R] sapply help
> 
> On Friday 03 February 2012 at 18:27 +0100, Ernest Adrogué wrote:
> > 3-02-2012, 08:37 (-0800); Filoche wrote:
> > > Hi every one.
> > >
> > > I'm learning how to use sapply (and other function of this family).
> > >
> > > Here's what I'm trying to do.
> > >
> > > I have a vector of lets say 5 elements. I also have a matrix of nX5. I 
> > > would
> > > like to know how many element by column are inferior to each element of my
> > > vector.
> > >
> > > On this example:
> > > v = c(1:5)
> > > M = matrix(3,2,5)
> > >
> > > I would like to have a vector at the end which give me
> > >
> > > 0 0 0 2 2
> > >
> >
> > This does that:
> >
> > > sapply(1:5, function(i) sum(M[,i] < v[i]))
> > [1] 0 0 0 2 2
> >
> > Basically, it's like a loop where at each iteration the function is
> > called with one element of the vector 1:5 as argument, so what this
> > really does is
> >
> > sum(M[,1] < v[1])
> > sum(M[,2] < v[2])
> > ...
> >
> > and then the results are put all together in a vector.
> Though in your case, I think there are shorter solutions. For example:
> > colSums(t(apply(M, 1, "<", v)))
> [1] 0 0 0 2 2
> 
> apply() is more suited to matrices. Here, it takes each row separately,
> and compares it with v. Then, you can just sum the result to count the
> number of cases that fulfill the condition.
> 
> 
> Cheers
> 

Re: [R] an unusual use for R

2012-02-03 Thread 538280

Nice,

Last year I found that my office needed some decoration and my wife
has some fancy sewing machines that can be programmed to do embroidery
and cross-stitch.  So I designed some cross stitches (using R and the
program for the machines) that show distribution functions and
equations for the Central Limit Theorem, Bayes Theorem, and the Mean
Value Theorem of Integration and my wife stitched them out for me.  I
certainly get varied comments when people see them.

Nice to know there are others who mix R, Statistics, and Textile crafts.

On Thu, Feb 2, 2012 at 3:54 PM, Sarah Goslee  wrote:
> I thought some of you might be amused by this.
>
> In my non-work time, I'm an avid weaver and teacher of weaving. I'm
> working on a project involving creating many detailed weaving
> patterns, so I wrote R code to automate it.
>
> Details here:
> http://stringpage.com/blog/?p=822
>
> If the overlap between R users and avid tablet weavers turns out to be
> >> 1, I'll polish it up and turn it into a package.
>
> Sarah
>
> --
> Sarah Goslee
> http://www.functionaldiversity.org
>

-- 
--
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com


Re: [R] aggregate arrays

2012-02-03 Thread David Winsemius

On Feb 3, 2012, at 11:45 AM, Leuzinger Sebastian wrote:

> Dear list
>
> after quite a bit of research in the archive, I gave up. This seems
> to be a simple problem:
>
> I would like to aggregate a (3-dimensional) array, either by another
> array, or by a vector, indicating the dimension which should be
> aggregated.
>
> I don't think I have to provide an example, it's really the
> 3-dimensional equivalent for the standard aggregate command. I am sure
> aggregate() itself can do it, but how do I need to specify the 'by'
> argument?

Why wouldn't `apply` do what you want?  (Which is perhaps my implicit  
rebuttal to your suggestion that you don't need to produce an example).

> arr <- array(1:27, c(3,3,3))
> apply(arr, 2:3, mean)
     [,1] [,2] [,3]
[1,]    2   11   20
[2,]    5   14   23
[3,]    8   17   26
> apply(arr, 3, mean)
[1]  5 14 23
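
If the aim is to collapse one dimension by a grouping vector (closer to what aggregate() does for data frames), apply() can be combined with tapply(). This is only a sketch: the grouping vector `g` is a made-up assumption, since no example data were given.

```r
arr <- array(1:27, c(3, 3, 3))

# Hypothetical grouping of the three slices along the third dimension
g <- c("a", "a", "b")

# For every (row, column) cell, average the values within each group
agg <- apply(arr, c(1, 2), function(cell) tapply(cell, g, mean))

# apply() returns the group dimension first; move it to the end
agg <- aperm(agg, c(2, 3, 1))

dim(agg)       # 3 3 2
agg[, , 1]     # means over the slices in the first group ("a")
```

Groups come out in the sorted order of their labels, so slice 1 of the result corresponds to "a" and slice 2 to "b".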

> Any hint (also to an earlier post) appreciated

--
David Winsemius, MD
West Hartford, CT


Re: [R] ggplot theme_update

2012-02-03 Thread David Winsemius



On Feb 3, 2012, at 10:14 AM, ONKELINX, Thierry wrote:


Hi Nameless,

You would have to add your code to the source of the ggplot2 package  
to make it permanent. But that is not a very good idea.


Just add the line to your script and rerun it when you restart your  
analysis.




Or add it (after a call to library(ggplot2)) to .Rprofile, which gets  
executed at R startup:


?Startup
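
A possible shape for such a startup file is sketched below. It uses a package hook instead of loading ggplot2 on every startup, which is a variation on the suggestion above and not something proposed in this thread. The single theme element is a placeholder: the full theme_update(...) call from the script would go in its place.

```r
# ~/.Rprofile (sketch): run theme tweaks each time ggplot2 is attached.
# legend.position = "none" is only a placeholder for the real settings.
setHook(packageEvent("ggplot2", "attach"), function(...) {
  ggplot2::theme_update(legend.position = "none")
})
```

With this in place nothing happens at startup itself; the hook fires the first time library(ggplot2) is called in a session.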

--
David.

Thierry

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for  
Nature and Forest

team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no  
more than asking him to perform a post-mortem examination: he may be  
able to say what the experiment died of.

~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does  
not ensure that a reasonable answer can be extracted from a given  
body of data.

~ John Tukey

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of vd3000

Sent: Friday, 3 February 2012 9:31
To: r-help@r-project.org
Subject: [R] ggplot theme_update

Hi, all,

I am new to R.

I am currently trying to learn this example.
http://learnr.wordpress.com/2009/03/17/ggplot2-barplots/

I know that to show the graph properly I need to update the  
theme with this command:


> immigration_theme <- theme_update(axis.text.x = theme_text(angle = 0,
    hjust = 0.5, size = 20), axis.text.y = theme_text(angle = 0,
    hjust = 0.5, size = 20), panel.grid.major = theme_line(colour = "grey90"),
    panel.grid.minor = theme_blank(), panel.background = theme_blank(),
    axis.ticks = theme_blank(), legend.position = "none")


However, every time I close R and start it again, I  
need to rerun theme_update in order to show the picture  
properly. That is, theme_update does not keep the updated  
theme after I close R.


So, how can I change the parameters permanently using theme_update?

I have been googling theme_update for 3 days...

Hope some genius could help.

Thanks.


--
View this message in context: 
http://r.789695.n4.nabble.com/ggplot-theme-update-tp4354015p4354015.html
Sent from the R help mailing list archive at Nabble.com.



David Winsemius, MD
West Hartford, CT


Re: [R] sapply help

2012-02-03 Thread Milan Bouchet-Valat

On Friday 03 February 2012 at 18:27 +0100, Ernest Adrogué wrote:
> 3-02-2012, 08:37 (-0800); Filoche wrote:
> > Hi every one.
> > 
> > I'm learning how to use sapply (and other function of this family).
> > 
> > Here's what I'm trying to do.
> > 
> > I have a vector of lets say 5 elements. I also have a matrix of nX5. I would
> > like to know how many element by column are inferior to each element of my
> > vector.
> > 
> > On this example:
> > v = c(1:5)
> > M = matrix(3,2,5)
> > 
> > I would like to have a vector at the end which give me
> > 
> > 0 0 0 2 2
> > 
> 
> This does that:
> 
> > sapply(1:5, function(i) sum(M[,i] < v[i]))
> [1] 0 0 0 2 2
> 
> Basically, it's like a loop where at each iteration the function is
> called with one element of the vector 1:5 as argument, so what this
> really does is
> 
> sum(M[,1] < v[1])
> sum(M[,2] < v[2])
> ...
> 
> and then the results are put all together in a vector.
Though in your case, I think there are shorter solutions. For example:
> colSums(t(apply(M, 1, "<", v)))
[1] 0 0 0 2 2

apply() is more suited to matrices. Here, it takes each row separately,
and compares it with v. Then, you can just sum the result to count the
number of cases that fulfill the condition.


Cheers


Re: [R] sapply help

2012-02-03 Thread Ernest Adrogué

 3-02-2012, 08:37 (-0800); Filoche wrote:
> Hi every one.
> 
> I'm learning how to use sapply (and other function of this family).
> 
> Here's what I'm trying to do.
> 
> I have a vector of lets say 5 elements. I also have a matrix of nX5. I would
> like to know how many element by column are inferior to each element of my
> vector.
> 
> On this example:
> v = c(1:5)
> M = matrix(3,2,5)
> 
> I would like to have a vector at the end which give me
> 
> 0 0 0 2 2
> 

This does that:

> sapply(1:5, function(i) sum(M[,i] < v[i]))
[1] 0 0 0 2 2

Basically, it's like a loop where at each iteration the function is
called with one element of the vector 1:5 as argument, so what this
really does is

sum(M[,1] < v[1])
sum(M[,2] < v[2])
...

and then the results are put all together in a vector.
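
To make the "it's like a loop" description concrete, here is a small sketch that writes the explicit loop next to the sapply() call on the data from the question. The variable names `res_loop` and `res_sapply` are introduced only for this illustration.

```r
v <- 1:5
M <- matrix(3, 2, 5)

# Explicit loop: one sum per column, stored by position
res_loop <- integer(length(v))
for (i in seq_along(v)) {
  res_loop[i] <- sum(M[, i] < v[i])
}

# sapply() runs the same function for each i and collects the results
res_sapply <- sapply(seq_along(v), function(i) sum(M[, i] < v[i]))

identical(res_loop, res_sapply)
```

Using seq_along(v) rather than a hard-coded 1:5 keeps the call correct if the vector length changes.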

-- 
Cheers,
Ernest


[R] creating R package

2012-02-03 Thread ql16717

Hi,

I have never actually made a package before. I have a folder, say called
"john", that has everything it needs to be an R package. Some
instructions say I need Rtools from an R mirror site. I installed
Rtools, but under DOS the command "Rcmd" is still not recognized.

Any suggestions? Thanks
John


[R] Scaling in optimization

2012-02-03 Thread John C Nash

Ben Bolker pointed out, in a response about maximum likelihood estimation,
that parameter scaling is not available in nlminb:

On 02/03/2012 06:00 AM, r-help-requ...@r-project.org wrote:
>  * if you were using one of the optimizing methods from optim() (rather
> than nlminb), e.g. L-BFGS-B, I would suggest you try using parscale to
> rescale the parameters to have approximately equal magnitudes near the
> solution.  This apparently isn't possible with nlminb, but you could try
> optimizer="optim" (the default), method="L-BFGS-B" and see how you do
> (although L-BFGS-B is often a bit finicky).  Alternatively, you can try
> optimizer="optimx", in which case you have a larger variety of
> unconstrained optimizers to choose from (you have to install the optimx
> package and take a look at its documentation).  Alternatively, you can
> scale your input variables (e.g. use scale() on your input matrix to get
> zero-centered, sd 1 variables), although you would then have to adjust
> your lower and upper bounds accordingly.
> 


This note is to mention that the R-forge version of optimx(), which I caution
is still being developed, has introduced parameter scaling for all 15
optimizers currently included in the optimx() wrapper. Feedback and comments
are welcome for this experimental version. It is still far from perfect, but
progress is being made and will be accelerated by input from users.
Developers of optimizers cannot anticipate all the obstacles users will
create, so we need our programs put to hard tests. See

https://r-forge.r-project.org/R/?group_id=395

One goal of optimx is to provide a single syntax for calling all the optimizers.
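
For readers who have not used the parscale control mentioned in the quoted advice, a minimal sketch with base optim() follows. The objective function, starting values and bounds are invented for illustration; they are not the poster's likelihood.

```r
# Toy objective whose two parameters differ in magnitude by about 1e4
# (an illustrative assumption). The minimum is at c(1, 2e4).
fn <- function(p) (p[1] - 1)^2 + (p[2] / 1e4 - 2)^2

# parscale gives the typical size of each parameter; optim() then works
# internally on par / parscale, so both directions look similar in scale.
fit <- optim(par = c(0, 0), fn = fn, method = "L-BFGS-B",
             lower = c(-10, -1e5), upper = c(10, 1e5),
             control = list(parscale = c(1, 1e4)))

fit$par
```

The bounds are given on the original parameter scale; only the internal search is rescaled.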

Cheers,

JN


Re: [R] Cannot get "==" operator to return TRUE

2012-02-03 Thread G See

On Fri, Feb 3, 2012 at 10:39 AM, peter dalgaard  wrote:
>
> So that's a nonbreak space alright. Next question: How did it get there? I'm 
> mildly surprised that it crept into the data frame, I would expect it to 
> happen much easier with things typed on the keyboard (Alt-Spc on my Mac 
> keyboard, e.g.).
>

Peter,
I won't venture to guess how, but this will do it.

> library(XML)
> x <- readHTMLTable("http://earnings.com/company.asp?client=cb&ticker=GOOG",
+   stringsAsFactors=FALSE)[[21]]
> charToRaw(x[28, 4])
[1] 6e 2f 61 c2 a0

Garrett


[R] aggregate arrays

2012-02-03 Thread Leuzinger Sebastian

Dear list

after quite a bit of research in the archive, I gave up. This seems to be a 
simple problem:

I would like to aggregate a (3-dimensional) array, either by another array, or 
by a vector, indicating the dimension which should be aggregated. 

I don't think I have to provide an example, it's really the 3-dimensional 
equivalent for the standard aggregate command. I am sure aggregate() itself can 
do it, but how do I need to specify the 'by' argument?

Any hint (also to an earlier post) appreciated

Sebastian

[R] sapply help

2012-02-03 Thread Filoche

Hi every one.

I'm learning how to use sapply (and the other functions of this family).

Here's what I'm trying to do.

I have a vector of, let's say, 5 elements. I also have a matrix of n x 5. I
would like to know how many elements in each column are smaller than the
corresponding element of my vector.

On this example:
v = c(1:5)
M = matrix(3,2,5)

I would like to have a vector at the end which give me

0 0 0 2 2

because in my matrix M, both rows in columns 4 and 5 hold numbers
smaller than 4 and 5 respectively.

This is pretty simple to do with a loop, but I would like to know how to do
it with sapply.

I hope I have been clear enough.
Tx in advance.

Phil

 



--
View this message in context: 
http://r.789695.n4.nabble.com/sapply-help-tp4355092p4355092.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Uploading into R

2012-02-03 Thread carlb1

Thank you very much, it's appreciated.

--
View this message in context: 
http://r.789695.n4.nabble.com/Uploading-into-R-tp4354538p4354960.html
Sent from the R help mailing list archive at Nabble.com.


[R] Flexmix new data classification

2012-02-03 Thread loyolite270

hi

I built a flexmix GLM binomial model with 200 observations, and the model
gave me 2 clusters. If the model is named newModel, I get the cluster
index for each row using newModel@clusters. Is there any way to use this
fitted model (newModel) to predict which cluster a new observation, say
observation 201, belongs to? That is, observation 201 should be assigned
to either cluster 1 or cluster 2.

Thanks

--
View this message in context: 
http://r.789695.n4.nabble.com/Flexmix-new-data-classification-tp4354912p4354912.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Clear last x entries of R console

2012-02-03 Thread angliski_jigit

Thanks! I had a quick play with cat() in the command line (e.g. typing
cat(".")) and didn't seem to help because it just sent me to a new line; I
see now that when put into a script though cat() is all you need.
Thanks, AJ 

--
View this message in context: 
http://r.789695.n4.nabble.com/Clear-last-x-entries-of-R-console-tp4354669p4354949.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Cannot get "==" operator to return TRUE

2012-02-03 Thread peter dalgaard

On Feb 3, 2012, at 17:23 , G See wrote:

> Thank you Duncan, that is very helpful.
> 
> Although I think we've got it sorted out now, to answer your previous
> questions,  it is repeatable in a new R session, and the output of
> charToRaw is below.
> 
> On Ubuntu, I get the following:
>> charToRaw(x)
> [1] 6e 2f 61 c2 a0

So that's a nonbreak space alright. Next question: How did it get there? I'm 
mildly surprised that it crept into the data frame, I would expect it to happen 
much easier with things typed on the keyboard (Alt-Spc on my Mac keyboard, 
e.g.).

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com


Re: [R] Cannot get "==" operator to return TRUE

2012-02-03 Thread Petr Savicky

On Fri, Feb 03, 2012 at 10:10:56AM -0600, G See wrote:
> Sorry, I meant
> Do you know of a way to print a string such that I can see whether it
> contains a *space* or a no-break space?

Hi.

For unknown characters, the following may be useful

  x <- "n/a "

  library(Unicode)
  u_char_inspect(as.u_char_seq(x, ""))

    Code   Name                  Char
  1 U+006E LATIN SMALL LETTER N  n
  2 U+002F SOLIDUS               /
  3 U+0061 LATIN SMALL LETTER A  a
  4 U+00A0 NO-BREAK SPACE
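
If the Unicode package is not installed, base R can show the same thing. A minimal sketch, with the string written out using an explicit \u00a0 escape so the no-break space is visible in the source:

```r
x <- "n/a\u00a0"          # "n/a" followed by a no-break space

utf8ToInt(x)              # code points; 160 is U+00A0
charToRaw(enc2utf8(x))    # UTF-8 bytes; c2 a0 encodes U+00A0

x == "n/a "               # FALSE: an ordinary space is code point 32
```

utf8ToInt() and charToRaw() are both in base, so nothing needs to be attached.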

Petr Savicky.


Re: [R] iterating through for loop

2012-02-03 Thread William Dunlap

Use a common subscript to go through two or more objects in
parallel:
  > x<-c(1,2,4,7,34,6)
  > y<-c(3,5,6,9,34,7)
  > stopifnot(length(x)==length(y))
  > for(i in seq_along(x)) {
  +    print(paste(x[i], y[i]))
  + }
  [1] "1 3"
  [1] "2 5"
  [1] "4 6"
  [1] "7 9"
  [1] "34 34"
  [1] "6 7"

For this toy example it is easier to just compute
  paste(x, y)
but I assume you plan on doing something more
substantial that isn't already vectorized.
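
When the per-pair work is a function of the two elements, mapply() is another way to walk two vectors in parallel without managing the index. A short sketch on the same data; the anonymous function stands in for whatever the real computation is:

```r
x <- c(1, 2, 4, 7, 34, 6)
y <- c(3, 5, 6, 9, 34, 7)
stopifnot(length(x) == length(y))

# mapply() calls the function with x[1], y[1], then x[2], y[2], and so on
pasted <- mapply(function(a, b) paste(a, b), x, y)
pasted
```

The explicit length check matters because mapply() recycles shorter arguments rather than stopping.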

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
> Behalf Of sagarnikam123
> Sent: Friday, February 03, 2012 1:32 AM
> To: r-help@r-project.org
> Subject: [R] iterating through for loop
> 
> how to iterate two elements each through for loop?
> e.g. x<-c(1,2,4,7,34,6)
> y<-c(3,5,6,9,34,7)
> 
> for(z in x){
> print(paste(z,y))  }
> 
> 
> i want both element of vector iterate serially with same position
> 
> 
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/iterating-through-for-loop-tp4354101p4354101.html
> Sent from the R help mailing list archive at Nabble.com.
> 


Re: [R] Cannot get "==" operator to return TRUE

2012-02-03 Thread Duncan Murdoch


On 12-02-03 11:10 AM, G See wrote:

> Sorry, I meant
> Do you know of a way to print a string such that I can see whether it
> contains a *space* or a no-break space?


Use tools::showNonASCII(x).  On Petr's example, it gives

1: n/a<c2><a0>

Duncan Murdoch




> On Fri, Feb 3, 2012 at 10:10 AM, G See  wrote:
>>
>> Petr,
>>
>> Thank you!  That is great.
>>
>> Do you know of a way to print a string such that I can see whether it
>> contains a string or a no-break space?
>>
>> Thanks,
>> Garrett
>>
>> On Fri, Feb 3, 2012 at 10:01 AM, Petr Savicky  wrote:
>>>
>>> On Fri, Feb 03, 2012 at 09:25:10AM -0600, G See wrote:
>>>>
>>>> I have a data.frame named "df". The dput of df is at the bottom of this e-mail.
>>>> What I'd like to do is replace the "n/a " values with NA.  On Mac OSX, it works
>>>> to do this:
>>>> df[df == "n/a"]<- NA
>>>>
>>>> However, it does not work on Ubuntu.  See below.
>>>>
>>>> Thanks in advance,
>>>> Garrett
>>>>
>>>>> x<- df[27, 4] # complete data.frame dput is below
>>>>> dput(x)
>>>>
>>>> "n/a "
>>>
>>> Hi.
>>>
>>> This string contains a no-break space, not a space.
>>>
>>>   "n/a " == "n/a\uA0"
>>>
>>>   [1] TRUE
>>>
>>>   "n/a\uA0"
>>>
>>>   [1] "n/a "
>>>
>>> Hope this helps.
>>>
>>> Petr Savicky.


Re: [R] Cannot get "==" operator to return TRUE

2012-02-03 Thread G See

Thank you Duncan, that is very helpful.

Although I think we've got it sorted out now, to answer your previous
questions,  it is repeatable in a new R session, and the output of
charToRaw is below.

On Ubuntu, I get the following:
> charToRaw(x)
[1] 6e 2f 61 c2 a0

On Mac, I get:
> charToRaw(x)
[1] 6e 2f 61
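
Given that difference, one way to make the replacement behave the same on both platforms is to normalize no-break spaces before comparing. The sketch below uses a small made-up data frame (the column name `eps` and its values are assumptions), not the full table from the original post.

```r
# Hypothetical column mixing an ordinary trailing space and a no-break space
df <- data.frame(eps = c("$ 1.97 ", "n/a\u00a0", "n/a "),
                 stringsAsFactors = FALSE)

# Turn U+00A0 into a plain space, then strip leading/trailing whitespace
clean <- function(s) trimws(gsub("\u00a0", " ", s, fixed = TRUE))

df[] <- lapply(df, clean)
df[df == "n/a"] <- NA

df$eps
```

This assumes every column is character; numeric columns would need to be skipped before applying clean().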

Thanks to all for the help,
Garrett

On Fri, Feb 3, 2012 at 10:19 AM, Duncan Murdoch
 wrote:
> On 12-02-03 11:10 AM, G See wrote:
>>
>> Sorry, I meant
>> Do you know of a way to print a string such that I can see whether it
>> contains a *space* or a no-break space?
>
>
> Use tools::showNonASCII(x).  On Petr's example, it gives
>
> 1: n/a<c2><a0>
>
> Duncan Murdoch
>
>
>>
>>
>> On Fri, Feb 3, 2012 at 10:10 AM, G See  wrote:
>>>
>>> Petr,
>>>
>>> Thank you!  That is great.
>>>
>>> Do you know of a way to print a string such that I can see whether it
>>> contains a string or a no-break space?
>>>
>>> Thanks,
>>> Garrett
>>>
>>> On Fri, Feb 3, 2012 at 10:01 AM, Petr Savicky  wrote:

>>>> On Fri, Feb 03, 2012 at 09:25:10AM -0600, G See wrote:
>>>>>
>>>>> I have a data.frame named "df". The dput of df is at the bottom of this
>>>>> e-mail.
>>>>> What I'd like to do is replace the "n/a " values with NA.  On Mac OSX,
>>>>> it works
>>>>> to do this:
>>>>> df[df == "n/a"]<- NA
>>>>>
>>>>> However, it does not work on Ubuntu.  See below.
>>>>>
>>>>> Thanks in advance,
>>>>> Garrett
>>>>>
>>>>>> x<- df[27, 4] # complete data.frame dput is below
>>>>>> dput(x)
>>>>>
>>>>> "n/a "
>>>>
>>>>
>>>> Hi.
>>>>
>>>> This string contains a no-break space, not a space.
>>>>
>>>>   "n/a " == "n/a\uA0"
>>>>
>>>>   [1] TRUE
>>>>
>>>>   "n/a\uA0"
>>>>
>>>>   [1] "n/a "
>>>>
>>>> Hope this helps.
>>>>
>>>> Petr Savicky.

>>
>>
>
>


Re: [R] Cannot get "==" operator to return TRUE

2012-02-03 Thread Duncan Murdoch


On 12-02-03 10:25 AM, G See wrote:

I have a data.frame named "df". The dput of df is at the bottom of this e-mail.
What I'd like to do is replace the "n/a " values with NA.  On Mac OSX, it works
to do this:
df[df == "n/a"]<- NA

However, it does not work on Ubuntu.  See below.

Thanks in advance,
Garrett


x<- df[27, 4] # complete data.frame dput is below
dput(x)

"n/a "

x == "n/a "

[1] FALSE

x == "n/a"

[1] FALSE


One would expect the first of these to be TRUE, but the second 
shouldn't.  On my system that's what happens.


Is this still repeatable in a new session?  If so, can you show us what 
you get from charToRaw?  I get


> charToRaw(x)
[1] 6e 2f 61 20

but perhaps you have some different character in the fourth position, 
one which just happens to display as a space.


If it is not repeatable in a new session, then it's hard to guess what 
went wrong, but conceivably memory corruption somewhere could have 
caused this.  It would be worthwhile keeping track of what you were 
doing if it ever happens again.


Duncan Murdoch



str(x)

  chr "n/a "

is.na(x)

[1] FALSE

grep("n/a ", x)

integer(0)

grep("n/a", x)

[1] 1



sessionInfo()

R version 2.14.1 (2011-12-22)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] XML_3.4-3                  qmao_1.1.10
[3] FinancialInstrument_0.10.9 quantmod_0.3-17
[5] TTR_0.21-0                 Defaults_1.1-1
[7] xts_0.8-3                  zoo_1.7-6

loaded via a namespace (and not attached):
[1] grid_2.14.1    lattice_0.20-0 tools_2.14.1





### More detail ###
## Here is the complete data.frame

dput(df)

structure(list(SYMBOL = c("GOOG ", "GOOG ", "GOOG ", "GOOG ",
"GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ",
"GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ",
"GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ",
"GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG "), PERIOD = c("Q4 2011",
"Q3 2011", "Q2 2011", "Q1 2011", "Q4 2010", "Q3 2010", "Q2 2010",
"Q1 2010", "Q4 2009", "Q3 2009", "Q2 2009", "Q1 2009", "Q4 2008",
"Q3 2008", "Q2 2008", "Q1 2008", "Q4 2007", "Q3 2007", "Q2 2007",
"Q1 2007", "Q4 2006", "Q3 2006", "Q2 2006", "Q1 2006", "Q4 2005",
"Q3 2005", "Q2 2005", "Q1 2005", "Q4 2004", "Q3 2004"),
 `EVENT TITLE` = c("Q4 2011 Google Earnings Release",
 "Q3 2011 Google Inc Earnings Release",
 "Q2 2011 Google Inc Earnings Release",
 "Q1 2011 Google Inc Earnings Release",
 "Q4 2010 Google Earnings Release", "Q3 2010 Google Earnings Release",
 "Q2 2010 Google Earnings Release", "Q1 2010 Google Earnings Release",
 "Q4 2009 Google Earnings Release", "Q3 2009 Google Earnings Release",
 "Q2 2009 Google Earnings Release", "Q1 2009 Google Earnings Release",
 "Q4 2008 Google Earnings Release", "Q3 2008 Google Earnings Release",
 "Q2 2008 Google Earnings Release", "Q1 2008 Google Earnings Release",
 "Q4 2007 Google Earnings Release", "Q3 2007 Google Earnings Release",
 "Q2 2007 Google Earnings Release", "Q1 2007 Google Earnings Release",
 "Q4 2006 Google Earnings Release", "Q3 2006 Google Earnings Release",
 "Q2 2006 Google Earnings Release", "Q1 2006 Google Earnings Release",
 "Q4 2005 Google Earnings Release", "Q3 2005 Google Earnings Release",
 "Q2 2005 Google Earnings Release", "Q1 2005 Google Earnings Release",
 "Q4 2004 Google Earnings Release", "Q3 2004 Google Earnings Release"
 ), `EPS ESTIMATE` = c("$ 10.49 ", "$ 8.74 ", "$ 7.85 ",
 "$ 8.10 ", "$ 8.09 ", "$ 6.68 ", "$ 6.52 ", "$ 6.60 ",
 "$ 6.50 ", "$ 5.42 ", "$ 5.09 ", "$ 4.93 ", "$ 4.95 ",
 "$ 4.76 ", "$ 4.74 ", "$ 4.52 ", "$ 4.44 ", "$ 3.78 ",
 "$ 3.59 ", "$ 3.30 ", "$ 2.92 ", "$ 2.42 ", "$ 2.22 ",
 "$ 1.97 ", "n/a ", "n/a ", "n/a ", "n/a ", "n/a ",
 "n/a "), `EPS ACTUAL` = c("$ 9.50 ", "$ 9.72 ", "$ 8.74 ",
 "$ 8.08 ", "$ 8.75 ", "$ 7.64 ", "$ 6.45 ", "$ 6.76 ",
 "$ 6.79 ", "$ 5.89 ", "$ 5.36 ", "$ 5.16 ", "$ 5.10 ",
 "$ 4.92 ", "$ 4.63 ", "$ 4.84 ", "$ 4.43 ", "$ 3.91 ",
 "$ 3.56 ", "$ 3.68 ", "$ 3.18 ", "$ 2.62 ", "$ 2.49 ",
 "$ 2.29 ", "n/a ", "n/a ", "n/a ", "n/a ", "n/a ",
 "n/a "), `PREV. YEAR ACTUAL` = c("$ 8.75 ", "$ 7.64 ",
 "$ 6.45 ", "$ 6.76 ", "$ 6.79 ", "$ 5.89 ", "$ 5.36 ",
 "$ 5.16 ", "$ 5.10 ", "$ 4.92 ", "$ 4.63 ", "$ 4.84 ",
 "$ 4.43 ", "$ 3.91 ", "$ 3.56 ", "$ 3.68 ", "$ 3.18 ",
 "$ 2.62 ", "$ 2.49 ", "$ 2.29 ", "n/a ", "n/a ", "n/a ",
 "n/a ", "n/a ", "n/a ", "n/a ", "n/a ", "n/a ", "n/a "
 ), TIME = c("2012-01-19 15:15:00 CST", "2011-10-13 15:15:00 CDT",
 "2011-07-14 15:15:00 CDT", "2011-04-14 15:15:

[R] [fields] image.plot abends with NAs in image.plot.info

2012-02-03 Thread Tom Roche


summary: image.plot-ing two sets of netCDF data, with the second
derived from the first. First plots to PDF as expected (title, data,
legend). Second plots the data and title, but abends before drawing
the legend, with

> Error in if (del == 0 && to == 0) return(to) : 
>   missing value where TRUE/FALSE needed
> Calls: plot.layers.for.timestep -> image.plot -> seq -> seq.default

Debugging shows image.plot.info(...) is returning

> Browse[2]> info
> $xlim
> [1] NA

> $ylim
> [1] NA

> $zlim
> [1] NA

> $poly.grid
> [1] FALSE

details:

(Hopefully the following is not too long-winded; I'm just trying to be
complete.) I'm running on a cluster (where I don't have root) with

me@it4:~ $ lsb_release -a
LSB Version:
:core-3.1-amd64:core-3.1-ia32:core-3.1-noarch:graphics-3.1-amd64:graphics-3.1-ia32:graphics-3.1-noarch
Distributor ID: RedHatEnterpriseServer
Description:Red Hat Enterprise Linux Server release 5.4 (Tikanga)
Release:5.4
Codename:   Tikanga
me@it4:~ $ uname -a
Linux it4 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64 x86_64 
x86_64 GNU/Linux
me@it4:~ $ R
R version 2.14.0 (2011-10-31)
...
> library(help = fields)
...
Package:fields
Version:6.6.2
Date:   November 16, 2011
...
Maintainer: Doug Nychka 

I have an IOAPI (netCDF classic plus spatial metadata) dataset with
structure {cols, rows, layers, timestep=1}, where {cols,rows}
represent a land-cover grid. The layers are sparse, in that all have
some NAs; some have all NAs (call those "trivial"). The non-trivial
layers have a problem: data was logged so that values sum. (I.e.,
instead of logging value=a in gridcell [i,j] and value=b in the "next"
non-NA gridcell[i+m,j+n], value(gridcell[i+m,j+n]) = a+b.) I wrote an
R routine that "demonotonicizes" (since all data >= 0) the source
data, writing to a new "fixed" or "target" file with the same
structure as the source. As part of the routine I check that each
target layer contains values s.t.

* ∀ target values v: (v > 0) || is.na(v)
* ∀i,j: is.na(value(source[i,j])) ⇔ is.na(value(target[i,j]))

I also want, for each layer, to plot both the source and the target.
My plot code is like

plot.layers.for.timestep <- function(source.file, source.datavar,
  target.datavar, datavar.name, datavar.n.layers, colors, map) {
  # Get the grid origin, cell sizes, cell centers, etc
  # ...

  pdf("compare.source.target.pdf", height=3.5, width=5, pointsize=1)
  for (i.layer in 1:datavar.n.layers) {
    # plot the source data
    # debugging
    print(paste('Non-null image.plot for source layer=', i.layer))
    # for non-trivial layers
    if (sum(!is.na(source.datavar[,,i.layer]))) {
      image.plot(x.cell.centers.km, y.cell.centers.km,
        source.datavar[,,i.layer],
        xlab="", ylab="", axes=F, col=colors(100),
        main=paste("Source: ", datavar.name, ", Layer: ", i.layer, sep="")
      )
      lines(map)
    } else { # trivial layers
      ...
    }
    # plot the fixed data
    # debugging
    print(paste('Non-null image.plot for target layer=', i.layer))
    debug(image.plot)
    # for non-trivial layers
    if (sum(!is.na(target.datavar[,,i.layer]))) {
      image.plot(x.cell.centers.km, y.cell.centers.km, xlab="", ylab="",
        target.datavar[,,i.layer], axes=F, col=colors(100),
        main=paste("Target: ", datavar.name, ", Layer: ", i.layer, sep=""))
      lines(map)
    } else { # trivial layers
      ...
    }
  }
  dev.off()
} # end function plot.layers.for.timestep.fun

The first layer is non-trivial, and the source layer plots to
./compare.source.target.pdf as expected: data, title, legend. 
Then the target title and data plot, but abends before drawing
the legend, with

> Error in if (del == 0 && to == 0) return(to) : 
>   missing value where TRUE/FALSE needed
> Calls: plot.layers.for.timestep -> image.plot -> seq -> seq.default

Being a relatively new R user, I read Peng's "Introduction to the
Interactive Debugging Tools in R" (though 10 years old, everything
worked as advertised) and instrumented as above. During the debug
session, I did

> debug(image.plot.info)

and stepped in. image.plot.info(...) tried/failed several ways to set
values, before exiting with

> Browse[2]> info
> $xlim
> [1] NA

> $ylim
> [1] NA

> $zlim
> [1] NA

> $poly.grid
> [1] FALSE

which then causes the exception above.
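
One difference between the two calls in the posted code is the argument order: the source call passes the data matrix third, while the target call passes xlab and ylab before it. If image.plot.info() locates the matrix by its position among the ... arguments (an assumption here, not checked against the fields 6.6.2 source), that difference alone could leave the limits at NA. The sketch below uses a hypothetical stand-in function, third.is.matrix(), only to show that list(...) keeps named arguments in call order:

```r
# Hypothetical stand-in, NOT the fields source: a function that looks for
# the data matrix in the third slot of its ... arguments.
third.is.matrix <- function(...) {
  args <- list(...)
  length(args) >= 3 && is.matrix(args[[3]])
}

x <- 1:3
y <- 1:4
z <- matrix(runif(12), nrow=3)

# Order used by the source plot: the matrix is the third argument.
source.order <- third.is.matrix(x, y, z, xlab="", ylab="")

# Order used by the target plot: xlab="" occupies the third slot.
target.order <- third.is.matrix(x, y, xlab="", ylab="", z)

print(c(source=source.order, target=target.order))
```

If that is what happens, moving target.datavar[,,i.layer] ahead of xlab and ylab in the target call would be a cheap thing to try before digging further.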

How to proceed? If there's a better way to report the bug, please let
me know (and feel free to forward). I can debug further if
instructions are provided. I can provide the offending dataset, but
it's fairly large (638 MB).

TIA, Tom Roche 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Cannot get "==" operator to return TRUE

2012-02-03 Thread G See

Sorry, I meant
Do you know of a way to print a string such that I can see whether it
contains a *space* or a no-break space?
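
One possible way to see the difference, sketched with base R only: print the Unicode code points instead of the characters. The strings x and y and the helper has.nbsp() are made up for illustration.

```r
x <- "n/a\u00A0"   # ends with a no-break space (U+00A0)
y <- "n/a "        # ends with an ordinary space (U+0020)

# Code points make the invisible difference visible:
# 160 is the no-break space, 32 the ordinary space.
print(utf8ToInt(x))
print(utf8ToInt(y))

# Hypothetical helper: TRUE for each string containing a no-break space
has.nbsp <- function(s) {
  vapply(s, function(one) 160L %in% utf8ToInt(one), logical(1),
         USE.NAMES=FALSE)
}
flags <- has.nbsp(c(x, y))
print(flags)
```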


On Fri, Feb 3, 2012 at 10:10 AM, G See  wrote:
> Petr,
>
> Thank you!  That is great.
>
> Do you know of a way to print a string such that I can see whether it
> contains a string or a no-break space?
>
> Thanks,
> Garrett
>
> On Fri, Feb 3, 2012 at 10:01 AM, Petr Savicky  wrote:
>> On Fri, Feb 03, 2012 at 09:25:10AM -0600, G See wrote:
>>> I have a data.frame named "df". The dput of df is at the bottom of this 
>>> e-mail.
>>> What I'd like to do is replace the "n/a " values with NA.  On Mac OSX, it 
>>> works
>>> to do this:
>>> df[df == "n/a"] <- NA
>>>
>>> However, it does not work on Ubuntu.  See below.
>>>
>>> Thanks in advance,
>>> Garrett
>>>
>>> > x <- df[27, 4] # complete data.frame dput is below
>>> > dput(x)
>>> "n/a "
>>
>> Hi.
>>
>> This string contains a no-break space, not a space.
>>
>>  "n/a " == "n/a\uA0"
>>
>>  [1] TRUE
>>
>>  "n/a\uA0"
>>
>>  [1] "n/a "
>>
>> Hope this helps.
>>
>> Petr Savicky.
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.


Re: [R] Cannot get "==" operator to return TRUE

2012-02-03 Thread G See

Petr,

Thank you!  That is great.

Do you know of a way to print a string such that I can see whether it
contains a string or a no-break space?

Thanks,
Garrett

On Fri, Feb 3, 2012 at 10:01 AM, Petr Savicky  wrote:
> On Fri, Feb 03, 2012 at 09:25:10AM -0600, G See wrote:
>> I have a data.frame named "df". The dput of df is at the bottom of this 
>> e-mail.
>> What I'd like to do is replace the "n/a " values with NA.  On Mac OSX, it 
>> works
>> to do this:
>> df[df == "n/a"] <- NA
>>
>> However, it does not work on Ubuntu.  See below.
>>
>> Thanks in advance,
>> Garrett
>>
>> > x <- df[27, 4] # complete data.frame dput is below
>> > dput(x)
>> "n/a "
>
> Hi.
>
> This string contains a no-break space, not a space.
>
>  "n/a " == "n/a\uA0"
>
>  [1] TRUE
>
>  "n/a\uA0"
>
>  [1] "n/a "
>
> Hope this helps.
>
> Petr Savicky.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to use a sequence of covariates in linear model (lm)?

2012-02-03 Thread Joshua Wiley

Or pass the covariates as a matrix.  See ?lm for details.
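
A minimal sketch of the matrix approach, using simulated data (the object names X, y and fit are made up for illustration):

```r
set.seed(42)
n <- 50

# Simulated covariates held in one matrix with columns X1..X5
X <- matrix(rnorm(n * 5), ncol=5,
            dimnames=list(NULL, paste("X", 1:5, sep="")))
y <- drop(X %*% c(1, 0, -2, 0, 0.5)) + rnorm(n)

# A matrix on the right-hand side contributes one coefficient per column
fit <- lm(y ~ X)
print(coef(fit))

# A run of columns can be selected by index, e.g. the first three only
fit3 <- lm(y ~ X[, 1:3])
```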

On Feb 3, 2012, at 7:51, "R. Michael Weylandt"  
wrote:

> Usually that's what the dot in a formula is used for.
> 
> E.g.,
> 
> data(iris)
> str(iris)
> lm(Petal.Width ~ ., data = iris)
> 
> Michael
> 
> On Fri, Feb 3, 2012 at 10:45 AM, michael  wrote:
>> I have a high dimension linear model:
>> 
>> y~ x1 + x2 + ... + x_n.  n is very large.
>> 
>> For linear model fit, I wish to use a sequence of covariates, say X1 to
>> X200, without typing every single covariate in the function (my variable
>> names are coded in the pattern of X1, X2, ...). I think all.var or
>> all.names might have worked but I can't figure out how to do it. Please
>> help.
>> 
>> Thanks,
>> 
>> Michael
>> 
>>[[alternative HTML version deleted]]
>> 
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


Re: [R] Cannot get "==" operator to return TRUE

2012-02-03 Thread G See

Hi Sarah,

Thank you very much for the response.

In fact, it does work on Mac even without including the space:

> Symbol <- "GOOG"
> require(XML)
Loading required package: XML
> URL <- paste("http://earnings.com/company.asp?client=cb&ticker=", Symbol, sep="")
> x <- readHTMLTable(URL, stringsAsFactors=FALSE)
> table.loc <- tail(grep("Earnings Releases", x), 1) + 1
> if (identical(numeric(0), table.loc)) return(NULL)
> rdata <- x[[table.loc]]
> header <- rdata[1, ]
> rdata <- rdata[-1, ]
> colnames(rdata) <- header
> #format ticker column
> rdata[, 1] <- gsub("\r\n\t\t\t", "", rdata[, 1])
> rdata <- na.omit(rdata)
>
> any(is.na(rdata))
[1] FALSE
> rdata[rdata == "n/a"] <- NA
> any(is.na(rdata))
[1] TRUE

Garrett

On Fri, Feb 3, 2012 at 9:57 AM, Sarah Goslee  wrote:
> Is that exactly what you're doing, in a clean session?
>
> x <- rdata[27, 4]
>
>> x == "n/a "
> [1] TRUE
>> x == "n/a"
> [1] FALSE
>
> Because as long as the space is included, the test should be TRUE.
>
> (I renamed the dput object rdata, because df() is a base function.)
>
> df[df == "n/a"] <- NA
> shouldn't work on Mac, or any other system, because no elements of
> your data frame are "n/a", but are instead "n/a "
>
> If it were my data, I'd get rid of the spaces at the end of the values before
> trying to do anything, either before reading it into R, or with gsub() after.
>
> Sarah
>
> On Fri, Feb 3, 2012 at 10:25 AM, G See  wrote:
>> I have a data.frame named "df". The dput of df is at the bottom of this 
>> e-mail.
>> What I'd like to do is replace the "n/a " values with NA.  On Mac OSX, it 
>> works
>> to do this:
>> df[df == "n/a"] <- NA
>>
>> However, it does not work on Ubuntu.  See below.
>>
>> Thanks in advance,
>> Garrett
>>
>>> x <- df[27, 4] # complete data.frame dput is below
>>> dput(x)
>> "n/a "
>>> x == "n/a "
>> [1] FALSE
>>> x == "n/a"
>> [1] FALSE
>>> str(x)
>>  chr "n/a "
>>> is.na(x)
>> [1] FALSE
>>> grep("n/a ", x)
>> integer(0)
>>> grep("n/a", x)
>> [1] 1
>>
>>
>>> sessionInfo()
>> R version 2.14.1 (2011-12-22)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>>
>> locale:
>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>  [7] LC_PAPER=C                 LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] XML_3.4-3                  qmao_1.1.10
>> [3] FinancialInstrument_0.10.9 quantmod_0.3-17
>> [5] TTR_0.21-0                 Defaults_1.1-1
>> [7] xts_0.8-3                  zoo_1.7-6
>>
>> loaded via a namespace (and not attached):
>> [1] grid_2.14.1    lattice_0.20-0 tools_2.14.1
>>>
>>
>>
>> ### More detail ###
>> ## Here is the complete data.frame
>>> dput(df)
>> structure(list(SYMBOL = c("GOOG ", "GOOG ", "GOOG ", "GOOG ",
>> "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ",
>> "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ",
>> "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ",
>> "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG "), PERIOD = c("Q4 2011",
>> "Q3 2011", "Q2 2011", "Q1 2011", "Q4 2010", "Q3 2010", "Q2 2010",
>> "Q1 2010", "Q4 2009", "Q3 2009", "Q2 2009", "Q1 2009", "Q4 2008",
>> "Q3 2008", "Q2 2008", "Q1 2008", "Q4 2007", "Q3 2007", "Q2 2007",
>> "Q1 2007", "Q4 2006", "Q3 2006", "Q2 2006", "Q1 2006", "Q4 2005",
>> "Q3 2005", "Q2 2005", "Q1 2005", "Q4 2004", "Q3 2004"),
>>    `EVENT TITLE` = c("Q4 2011 Google Earnings Release", "Q3 2011
>> Google Inc Earnings Release",
>>    "Q2 2011 Google Inc Earnings Release", "Q1 2011 Google Inc
>> Earnings Release",
>>    "Q4 2010 Google Earnings Release", "Q3 2010 Google Earnings Release",
>>    "Q2 2010 Google Earnings Release", "Q1 2010 Google Earnings Release",
>>    "Q4 2009 Google Earnings Release", "Q3 2009 Google Earnings Release",
>>    "Q2 2009 Google Earnings Release", "Q1 2009 Google Earnings Release",
>>    "Q4 2008 Google Earnings Release", "Q3 2008 Google Earnings Release",
>>    "Q2 2008 Google Earnings Release", "Q1 2008 Google Earnings Release",
>>    "Q4 2007 Google Earnings Release", "Q3 2007 Google Earnings Release",
>>    "Q2 2007 Google Earnings Release", "Q1 2007 Google Earnings Release",
>>    "Q4 2006 Google Earnings Release", "Q3 2006 Google Earnings Release",
>>    "Q2 2006 Google Earnings Release", "Q1 2006 Google Earnings Release",
>>    "Q4 2005 Google Earnings Release", "Q3 2005 Google Earnings Release",
>>    "Q2 2005 Google Earnings Release", "Q1 2005 Google Earnings Release",
>>    "Q4 2004 Google Earnings Release", "Q3 2004 Google Earnings Release"
>>    ), `EPS ESTIMATE` = c("$ 10.49 ", "$ 8.74 ", "$ 7.85 ",
>>    "$ 8.10 ", "$ 8.09 ", "$ 6.68 ", "$ 6.52 ", "$ 6.60 ",
>>    "$ 6.50 ", "$ 5.42 ", "$ 5.09 ", "$ 4.93 ", "$ 4.95 ",
>>    "$ 4.76 ", "$ 4.74 ", "$ 4.52 ", "$ 4.44

Re: [R] Cannot get "==" operator to return TRUE

2012-02-03 Thread Petr Savicky

On Fri, Feb 03, 2012 at 09:25:10AM -0600, G See wrote:
> I have a data.frame named "df". The dput of df is at the bottom of this 
> e-mail.
> What I'd like to do is replace the "n/a " values with NA.  On Mac OSX, it 
> works
> to do this:
> df[df == "n/a"] <- NA
> 
> However, it does not work on Ubuntu.  See below.
> 
> Thanks in advance,
> Garrett
> 
> > x <- df[27, 4] # complete data.frame dput is below
> > dput(x)
> "n/a "

Hi.

This string contains a no-break space, not a space.

  "n/a " == "n/a\uA0"

  [1] TRUE

  "n/a\uA0"

  [1] "n/a "

Hope this helps.

Petr Savicky.


Re: [R] Cannot get "==" operator to return TRUE

2012-02-03 Thread Sarah Goslee

Is that exactly what you're doing, in a clean session?

x <- rdata[27, 4]

> x == "n/a "
[1] TRUE
> x == "n/a"
[1] FALSE

Because as long as the space is included, the test should be TRUE.

(I renamed the dput object rdata, because df() is a base function.)

df[df == "n/a"] <- NA
shouldn't work on Mac, or any other system, because no elements of
your data frame are "n/a", but are instead "n/a "

If it were my data, I'd get rid of the spaces at the end of the values before
trying to do anything, either before reading it into R, or with gsub() after.

Sarah
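
A sketch of that clean-up with gsub(), assuming the trailing character may be either an ordinary space or a no-break space. The two-row rdata below is a made-up miniature, not the posted data frame.

```r
# Hypothetical miniature of the data: values end with a no-break space
rdata <- data.frame(
  PERIOD = c("Q1 2006", "Q4 2005"),
  EPS    = c("$ 2.29\u00A0", "n/a\u00A0"),
  stringsAsFactors = FALSE
)

# Strip trailing ordinary spaces and no-break spaces from every column
rdata[] <- lapply(rdata, function(col) gsub("[ \u00A0]+$", "", col))

# The comparison no longer depends on which kind of space was there
rdata[rdata == "n/a"] <- NA
print(rdata)
```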

On Fri, Feb 3, 2012 at 10:25 AM, G See  wrote:
> I have a data.frame named "df". The dput of df is at the bottom of this 
> e-mail.
> What I'd like to do is replace the "n/a " values with NA.  On Mac OSX, it 
> works
> to do this:
> df[df == "n/a"] <- NA
>
> However, it does not work on Ubuntu.  See below.
>
> Thanks in advance,
> Garrett
>
>> x <- df[27, 4] # complete data.frame dput is below
>> dput(x)
> "n/a "
>> x == "n/a "
> [1] FALSE
>> x == "n/a"
> [1] FALSE
>> str(x)
>  chr "n/a "
>> is.na(x)
> [1] FALSE
>> grep("n/a ", x)
> integer(0)
>> grep("n/a", x)
> [1] 1
>
>
>> sessionInfo()
> R version 2.14.1 (2011-12-22)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=C                 LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] XML_3.4-3                  qmao_1.1.10
> [3] FinancialInstrument_0.10.9 quantmod_0.3-17
> [5] TTR_0.21-0                 Defaults_1.1-1
> [7] xts_0.8-3                  zoo_1.7-6
>
> loaded via a namespace (and not attached):
> [1] grid_2.14.1    lattice_0.20-0 tools_2.14.1
>>
>
>
> ### More detail ###
> ## Here is the complete data.frame
>> dput(df)
> structure(list(SYMBOL = c("GOOG ", "GOOG ", "GOOG ", "GOOG ",
> "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ",
> "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ",
> "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ",
> "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG "), PERIOD = c("Q4 2011",
> "Q3 2011", "Q2 2011", "Q1 2011", "Q4 2010", "Q3 2010", "Q2 2010",
> "Q1 2010", "Q4 2009", "Q3 2009", "Q2 2009", "Q1 2009", "Q4 2008",
> "Q3 2008", "Q2 2008", "Q1 2008", "Q4 2007", "Q3 2007", "Q2 2007",
> "Q1 2007", "Q4 2006", "Q3 2006", "Q2 2006", "Q1 2006", "Q4 2005",
> "Q3 2005", "Q2 2005", "Q1 2005", "Q4 2004", "Q3 2004"),
>    `EVENT TITLE` = c("Q4 2011 Google Earnings Release", "Q3 2011
> Google Inc Earnings Release",
>    "Q2 2011 Google Inc Earnings Release", "Q1 2011 Google Inc
> Earnings Release",
>    "Q4 2010 Google Earnings Release", "Q3 2010 Google Earnings Release",
>    "Q2 2010 Google Earnings Release", "Q1 2010 Google Earnings Release",
>    "Q4 2009 Google Earnings Release", "Q3 2009 Google Earnings Release",
>    "Q2 2009 Google Earnings Release", "Q1 2009 Google Earnings Release",
>    "Q4 2008 Google Earnings Release", "Q3 2008 Google Earnings Release",
>    "Q2 2008 Google Earnings Release", "Q1 2008 Google Earnings Release",
>    "Q4 2007 Google Earnings Release", "Q3 2007 Google Earnings Release",
>    "Q2 2007 Google Earnings Release", "Q1 2007 Google Earnings Release",
>    "Q4 2006 Google Earnings Release", "Q3 2006 Google Earnings Release",
>    "Q2 2006 Google Earnings Release", "Q1 2006 Google Earnings Release",
>    "Q4 2005 Google Earnings Release", "Q3 2005 Google Earnings Release",
>    "Q2 2005 Google Earnings Release", "Q1 2005 Google Earnings Release",
>    "Q4 2004 Google Earnings Release", "Q3 2004 Google Earnings Release"
>    ), `EPS ESTIMATE` = c("$ 10.49 ", "$ 8.74 ", "$ 7.85 ",
>    "$ 8.10 ", "$ 8.09 ", "$ 6.68 ", "$ 6.52 ", "$ 6.60 ",
>    "$ 6.50 ", "$ 5.42 ", "$ 5.09 ", "$ 4.93 ", "$ 4.95 ",
>    "$ 4.76 ", "$ 4.74 ", "$ 4.52 ", "$ 4.44 ", "$ 3.78 ",
>    "$ 3.59 ", "$ 3.30 ", "$ 2.92 ", "$ 2.42 ", "$ 2.22 ",
>    "$ 1.97 ", "n/a ", "n/a ", "n/a ", "n/a ", "n/a ",
>    "n/a "), `EPS ACTUAL` = c("$ 9.50 ", "$ 9.72 ", "$ 8.74 ",
>    "$ 8.08 ", "$ 8.75 ", "$ 7.64 ", "$ 6.45 ", "$ 6.76 ",
>    "$ 6.79 ", "$ 5.89 ", "$ 5.36 ", "$ 5.16 ", "$ 5.10 ",
>    "$ 4.92 ", "$ 4.63 ", "$ 4.84 ", "$ 4.43 ", "$ 3.91 ",
>    "$ 3.56 ", "$ 3.68 ", "$ 3.18 ", "$ 2.62 ", "$ 2.49 ",
>    "$ 2.29 ", "n/a ", "n/a ", "n/a ", "n/a ", "n/a ",
>    "n/a "), `PREV. YEAR ACTUAL` = c("$ 8.75 ", "$ 7.64 ",
>    "$ 6.45 ", "$ 6.76 ", "$ 6.79 ", "$ 5.89 ", "$ 5.36 ",
>    "$ 5.16 ", "$ 5.10 ", "$ 4.92 ", "$ 4.63 ", "$ 4.84 ",
>    "$ 4.43 ", "$ 3.91 ", "$ 3.56 ", "$ 3.68 ", "$ 3.18 ",
>    "$ 2.62 ", "$ 2.49 ", "$ 2.29 ", "n/a ", "n/a ", "n/a ",
>    "n/a ", "n/a ", "n/a ", "n/a ", "n/a ", "n/a ", "n/a "
>    ), TIME = c("2012-01-19 15:15:00 CST", "2011-10-13 15:15:

Re: [R] How to use a sequence of covariates in linear model (lm)?

2012-02-03 Thread R. Michael Weylandt

Usually that's what the dot in a formula is used for.

E.g.,

data(iris)
str(iris)
lm(Petal.Width ~ ., data = iris)

Michael
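
The dot picks up every other column in the data frame. When only a run of columns such as X1 to X200 is wanted, one option is to build the formula from the names with reformulate(). A small sketch on simulated data (d, f and fit are illustrative names):

```r
set.seed(42)

# Simulated data frame: response y, covariates X1..X5, plus an extra column
d <- as.data.frame(matrix(rnorm(50 * 6), ncol=6))
names(d) <- c("y", paste("X", 1:5, sep=""))
d$other <- rnorm(50)   # should stay out of the model

# Build y ~ X1 + X2 + X3 from the variable names
f <- reformulate(paste("X", 1:3, sep=""), response="y")
print(f)

fit <- lm(f, data=d)
print(names(coef(fit)))
```

For the case in the question, paste("X", 1:200, sep="") would generate the 200 names in the same way.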

On Fri, Feb 3, 2012 at 10:45 AM, michael  wrote:
> I have a high dimension linear model:
>
> y~ x1 + x2 + ... + x_n.  n is very large.
>
> For linear model fit, I wish to use a sequence of covariates, say X1 to
> X200, without typing every single covariate in the function (my variable
> names are coded in the pattern of X1, X2, ...). I think all.var or
> all.names might have worked but I can't figure out how to do it. Please
> help.
>
> Thanks,
>
> Michael
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


[R] How to use a sequence of covariates in linear model (lm)?

2012-02-03 Thread michael

I have a high dimension linear model:

y~ x1 + x2 + ... + x_n.  n is very large.

For linear model fit, I wish to use a sequence of covariates, say X1 to
X200, without typing every single covariate in the function (my variable
names are coded in the pattern of X1, X2, ...). I think all.var or
all.names might have worked but I can't figure out how to do it. Please
help.

Thanks,

Michael

[[alternative HTML version deleted]]


Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-03 Thread Gabor Grothendieck

On Fri, Feb 3, 2012 at 8:08 AM, HC  wrote:
> This is a 160 GB tab-separated .txt file. It has 9 columns and 3.25x10^9
> rows.
>
> Can R handle it?
>

You can process a file N lines at time like this:

con <- file("myfile.dat", "r")
while(length(Lines <- readLines(con, n = N)) > 0) {
  ... whatever...
}
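
A self-contained version of the same loop, as a sketch: the small temporary file stands in for the real 160 GB file, and summing the first column stands in for the real per-chunk work.

```r
# Hypothetical demo file standing in for the large tab-delimited file
path <- tempfile(fileext=".txt")
writeLines(paste(1:25, letters[1:25], sep="\t"), path)

N <- 10        # lines per chunk (tiny here; far larger in practice)
total <- 0     # running result accumulated across chunks

con <- file(path, "r")
while (length(Lines <- readLines(con, n=N)) > 0) {
  # Parse only the current chunk, so memory use stays bounded by N
  chunk <- read.table(text=Lines, sep="\t", stringsAsFactors=FALSE)
  total <- total + sum(chunk$V1)
}
close(con)     # release the connection when done
unlink(path)

print(total)
```

Only quantities that can be accumulated chunk by chunk (sums, counts, filtered subsets) fit this pattern; anything needing all rows at once does not.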

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


[R] Cannot get "==" operator to return TRUE

2012-02-03 Thread G See

I have a data.frame named "df". The dput of df is at the bottom of this e-mail.
What I'd like to do is replace the "n/a " values with NA.  On Mac OSX, it works
to do this:
df[df == "n/a"] <- NA

However, it does not work on Ubuntu.  See below.

Thanks in advance,
Garrett

> x <- df[27, 4] # complete data.frame dput is below
> dput(x)
"n/a "
> x == "n/a "
[1] FALSE
> x == "n/a"
[1] FALSE
> str(x)
 chr "n/a "
> is.na(x)
[1] FALSE
> grep("n/a ", x)
integer(0)
> grep("n/a", x)
[1] 1


> sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] XML_3.4-3                  qmao_1.1.10
[3] FinancialInstrument_0.10.9 quantmod_0.3-17
[5] TTR_0.21-0                 Defaults_1.1-1
[7] xts_0.8-3                  zoo_1.7-6

loaded via a namespace (and not attached):
[1] grid_2.14.1    lattice_0.20-0 tools_2.14.1
>


### More detail ###
## Here is the complete data.frame
> dput(df)
structure(list(SYMBOL = c("GOOG ", "GOOG ", "GOOG ", "GOOG ",
"GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ",
"GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ",
"GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG ",
"GOOG ", "GOOG ", "GOOG ", "GOOG ", "GOOG "), PERIOD = c("Q4 2011",
"Q3 2011", "Q2 2011", "Q1 2011", "Q4 2010", "Q3 2010", "Q2 2010",
"Q1 2010", "Q4 2009", "Q3 2009", "Q2 2009", "Q1 2009", "Q4 2008",
"Q3 2008", "Q2 2008", "Q1 2008", "Q4 2007", "Q3 2007", "Q2 2007",
"Q1 2007", "Q4 2006", "Q3 2006", "Q2 2006", "Q1 2006", "Q4 2005",
"Q3 2005", "Q2 2005", "Q1 2005", "Q4 2004", "Q3 2004"),
`EVENT TITLE` = c("Q4 2011 Google Earnings Release", "Q3 2011
Google Inc Earnings Release",
"Q2 2011 Google Inc Earnings Release", "Q1 2011 Google Inc
Earnings Release",
"Q4 2010 Google Earnings Release", "Q3 2010 Google Earnings Release",
"Q2 2010 Google Earnings Release", "Q1 2010 Google Earnings Release",
"Q4 2009 Google Earnings Release", "Q3 2009 Google Earnings Release",
"Q2 2009 Google Earnings Release", "Q1 2009 Google Earnings Release",
"Q4 2008 Google Earnings Release", "Q3 2008 Google Earnings Release",
"Q2 2008 Google Earnings Release", "Q1 2008 Google Earnings Release",
"Q4 2007 Google Earnings Release", "Q3 2007 Google Earnings Release",
"Q2 2007 Google Earnings Release", "Q1 2007 Google Earnings Release",
"Q4 2006 Google Earnings Release", "Q3 2006 Google Earnings Release",
"Q2 2006 Google Earnings Release", "Q1 2006 Google Earnings Release",
"Q4 2005 Google Earnings Release", "Q3 2005 Google Earnings Release",
"Q2 2005 Google Earnings Release", "Q1 2005 Google Earnings Release",
"Q4 2004 Google Earnings Release", "Q3 2004 Google Earnings Release"
), `EPS ESTIMATE` = c("$ 10.49 ", "$ 8.74 ", "$ 7.85 ",
"$ 8.10 ", "$ 8.09 ", "$ 6.68 ", "$ 6.52 ", "$ 6.60 ",
"$ 6.50 ", "$ 5.42 ", "$ 5.09 ", "$ 4.93 ", "$ 4.95 ",
"$ 4.76 ", "$ 4.74 ", "$ 4.52 ", "$ 4.44 ", "$ 3.78 ",
"$ 3.59 ", "$ 3.30 ", "$ 2.92 ", "$ 2.42 ", "$ 2.22 ",
"$ 1.97 ", "n/a ", "n/a ", "n/a ", "n/a ", "n/a ",
"n/a "), `EPS ACTUAL` = c("$ 9.50 ", "$ 9.72 ", "$ 8.74 ",
"$ 8.08 ", "$ 8.75 ", "$ 7.64 ", "$ 6.45 ", "$ 6.76 ",
"$ 6.79 ", "$ 5.89 ", "$ 5.36 ", "$ 5.16 ", "$ 5.10 ",
"$ 4.92 ", "$ 4.63 ", "$ 4.84 ", "$ 4.43 ", "$ 3.91 ",
"$ 3.56 ", "$ 3.68 ", "$ 3.18 ", "$ 2.62 ", "$ 2.49 ",
"$ 2.29 ", "n/a ", "n/a ", "n/a ", "n/a ", "n/a ",
"n/a "), `PREV. YEAR ACTUAL` = c("$ 8.75 ", "$ 7.64 ",
"$ 6.45 ", "$ 6.76 ", "$ 6.79 ", "$ 5.89 ", "$ 5.36 ",
"$ 5.16 ", "$ 5.10 ", "$ 4.92 ", "$ 4.63 ", "$ 4.84 ",
"$ 4.43 ", "$ 3.91 ", "$ 3.56 ", "$ 3.68 ", "$ 3.18 ",
"$ 2.62 ", "$ 2.49 ", "$ 2.29 ", "n/a ", "n/a ", "n/a ",
"n/a ", "n/a ", "n/a ", "n/a ", "n/a ", "n/a ", "n/a "
), TIME = c("2012-01-19 15:15:00 CST", "2011-10-13 15:15:00 CDT",
"2011-07-14 15:15:00 CDT", "2011-04-14 15:15:00 CDT", "2011-01-20
15:15:00 CST",
"2010-10-14 15:15:00 CDT", "2010-07-15 15:15:00 CDT", "2010-04-15
15:15:00 CDT",
"2010-01-21 15:15:00 CST", "2009-10-15 15:15:00 CDT", "2009-07-16
15:15:00 CDT",
"2009-04-16 15:15:00 CDT", "2009-01-22 15:15:00 CST", "2008-10-16
15:15:00 CDT",
"2008-07-17 15:15:00 CDT", "2008-04-17 15:15:00 CDT", "2008-01-31
15:15:00 CST",
"2007-10-18 15:15:00 CDT", "2007-07-19 15:15:00 CDT", "2007-04-19
15:15:00 CDT",
"2007-01-31 15:15:00 CST", "2006-10-19 15:15:00 CDT", "2006-07-20
15:15:00 CDT",
"2006-04-20 15:15:00 CDT", "2006-01-31 15:15:00 CST", "2005-10-20
15:15:00 CDT",
"2005-07-21 15:15:00 CDT", "2005-04-21 15:15:00 CDT", "2005-02-01
15:15

Re: [R] Uploading into R

2012-02-03 Thread Duncan Murdoch


On 12-02-03 8:00 AM, carlb1 wrote:

> Hi
>
> I am just starting out in R and I'm trying to upload some data into it. I
> have saved a small file from Excel as a .txt file in the working directory I
> am using, in a folder of the same name as the .txt file. When I write the
> function to open it I keep getting this message
>
> snow<-read.table("working directory")
>
> Error in file(file, "rt") : cannot open the connection
> In addition: Warning message:
> In file(file, "rt") :
>    cannot open file 'working directory': Permission denied
>
> I can't seem to get any help on the internet as to what I am doing wrong.
> What does this mean?


If you are really using the working directory in your read.table call, 
that's wrong.  Use the file path.


The easiest way to specify a file in Windows is file.choose().

Duncan Murdoch
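
A small sketch of the difference, using a made-up tab-delimited file in the session's temporary directory (the file name snow.txt and its columns are assumptions for illustration):

```r
# Hypothetical stand-in for the file exported from Excel as tab-delimited text
path <- file.path(tempdir(), "snow.txt")
writeLines(c("day\tdepth", "1\t12.5", "2\t15.0"), path)

# Pass the path of the file itself, not the directory that contains it
snow <- read.table(path, header=TRUE, sep="\t")
str(snow)

# In an interactive session, file.choose() opens a dialog and returns
# the path of the file that was picked:
# snow <- read.table(file.choose(), header=TRUE, sep="\t")
```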



> any help appreciated
>
> C
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Uploading-into-R-tp4354538p4354538.html
> Sent from the R help mailing list archive at Nabble.com.


Re: [R] Memory leaks in .C-interface

2012-02-03 Thread Duncan Murdoch


On 12-02-03 9:26 AM, Grigory Alexandrovich wrote:

> Hi,
>
> I wrote a C function which I call with the .C interface (something
> like .C("foo", x, y)).
> The function does a lot of things. Amongst other things it allocates
> much memory (stack and heap).
> Every heap allocation (with malloc) has a corresponding free call.
>
> My problem is that if I call this function many times in a for-loop,
> the amount of memory used by R converges to 100% and the process is
> eventually killed.
>
> Clearing the workspace doesn't help, since the memory is not occupied by
> R objects.
>
> Is this a known problem with the .C interface? How can I find out what
> kind of data fills the memory, and especially how can I clean it up?


The .C() function will create duplicate copies of its arguments, but 
they aren't saved, so garbage collection should reclaim them.  So it's 
most likely a problem in your own C code.  Debugging those is hard.


Duncan Murdoch



> I work on Linux (OpenSuse 12.1)
>
> Thanks
>
> Grigory Alexandrovich





[[alternative HTML version deleted]]


Re: [R] Clear last x entries of R console

2012-02-03 Thread Duncan Murdoch


On 12-02-03 9:00 AM, angliski_jigit wrote:

> Hi All,
>
> I am trying to build a progress tracker into my loops that lets me have a
> sense of their progress. I'd like to be able to output to screen a series of
> periods "" etc. for each completion of the loop, but I  want to
> build a pyramid, e.g.
> .
> ..
> ...
>
> etc. So I need to be able to delete  of the console entry to
> accomplish this. There are commands to erase the whole console, but that's
> not what I want either; ideally, the command would allow me to erase the
> last line or the last x lines.



Just don't write out a newline.  E.g.

for (i in 1:10) {
  cat(".")
  flush.console()
  Sys.sleep(1)
}

You can write out a CR using \r if you want to overwrite the previous 
line, e.g.


for (i in 10:0) {
  cat(i, " \r")
  flush.console()
  Sys.sleep(1)
}
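
For the growing row of dots asked about above, a minimal sketch that combines
both ideas: redraw the whole line after a carriage return on each pass. The
helper name progress_dots is made up for this illustration; txtProgressBar()
and setTxtProgressBar() are from the utils package that ships with R.

```r
# Hypothetical helper: a carriage return followed by i dots, so each call
# overwrites the previous, shorter row of dots on the same console line.
progress_dots <- function(i) paste0("\r", strrep(".", i))

n <- 5
for (i in seq_len(n)) {
  Sys.sleep(0.05)              # stand-in for the real work of the loop
  cat(progress_dots(i))
  flush.console()
}
cat("\n")                      # finish the line once the loop is done

# Ready-made alternative: a text progress bar from utils.
pb <- txtProgressBar(min = 0, max = n, style = 3)
for (i in seq_len(n)) {
  Sys.sleep(0.05)
  setTxtProgressBar(pb, i)
}
close(pb)
```

How \r is rendered depends on the console or GUI, so the redraw may look
different in some front ends.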


Duncan Murdoch


[R] SPATIAL QUESTION: HOW TO MAKE POLYGONS AROUND CLUSTERS OF POINTS AND EXTRACT AREAS AND COORDINATES OF THESE POLYGONS?

2012-02-03 Thread Bjørn Økland

Imagine that I have a large number of points (given by coordinates x and y) 
that vary in density per space. For the purpose of demonstration it could be 
generated like this: s <- 
data.frame(x=runif(1,0,900),y=runif(1,0,900)); plot(s)

I want to create polygons around the points where point density is greater than 
a selected threshold (for example, by using kriging or an equivalent method). For 
these polygons, I want to have the centre coordinates and the size of the area 
for further use in analyses.

I would be very grateful if I could be shown the R packages and functions I 
should use to accomplish this, and even some outline of the code. Is it 
possible?

Best regards
Bjørn



Re: [R] ggplot theme_update

2012-02-03 Thread ONKELINX, Thierry

Hi Nameless,

You would have to add your code to the source of the ggplot2 package to make it 
permanent. But that is not a very good idea. 

Just add the line to your script and rerun it when you restart your analysis.

Thierry

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and 
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than 
asking him to perform a post-mortem examination: he may be able to say what the 
experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure 
that a reasonable answer can be extracted from a given body of data.
~ John Tukey

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of 
vd3000
Sent: Friday 3 February 2012 9:31
To: r-help@r-project.org
Subject: [R] ggplot theme_update

Hi, all,

I am a newbie to R. 

I am currently trying to learn from this example: 
http://learnr.wordpress.com/2009/03/17/ggplot2-barplots/ 

I know that to show the graph properly, I need to update the theme with this 
command:

> immigration_theme <- theme_update(axis.text.x = theme_text(angle = 0, hjust 
= 0.5, size=20), axis.text.y = theme_text(angle = 0, hjust = 0.5, size=20), 
panel.grid.major = theme_line(colour = "grey90"), panel.grid.minor = 
theme_blank(), panel.background = theme_blank(), axis.ticks = theme_blank(), 
legend.position = "none")

However, every time I close R and run it again, I need to 
rerun theme_update in order to show the picture properly. That is, 
the effect of theme_update does not persist after I close R.

So, how can I change the parameters permanently using theme_update?

I have been googling theme_update for 3 days...

Hope some genius could help.

Thanks.
 

--
View this message in context: 
http://r.789695.n4.nabble.com/ggplot-theme-update-tp4354015p4354015.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-03 Thread HC

This is a 160 GB tab-separated .txt file. It has 9 columns and 3.25x10^9
rows.

Can R handle it?  

Thank you.
HC



--
View this message in context: 
http://r.789695.n4.nabble.com/sqldf-for-Very-Large-Tab-Delimited-Files-tp4350555p4354556.html
Sent from the R help mailing list archive at Nabble.com.


[R] Uploading into R

2012-02-03 Thread carlb1

Hi 

I am just starting out in R and I'm trying to upload some data into it. I
have saved a small file from Excel as a .txt file in the working directory I
am using, in a folder of the same name as the .txt file. When I call the
function to open it I keep getting this message 

> snow<-read.table("working directory")
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file 'working directory': Permission denied

I can't seem to find any help on the internet about what I am doing wrong.
What does this mean? 

any help appreciated 

C

--
View this message in context: 
http://r.789695.n4.nabble.com/Uploading-into-R-tp4354538p4354538.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] how to plot several curves in the same frame

2012-02-03 Thread ikuzar

Ok. I really have 15 days to plot,  the x-axis is the date, and it is going
to the MINUTES in a day. I have to plot a curve per day (so 15 plots)

The real data is like this:

2012-02-01 00:01:00; 2100 
2012-02-01 02:02:00; 2200 
...
2012-02-15 23:58:00; 2400 
2012-02-15 23:59:00; 2400



--
View this message in context: 
http://r.789695.n4.nabble.com/how-to-plot-several-curves-in-the-same-frame-tp4354165p4354523.html
Sent from the R help mailing list archive at Nabble.com.


[R] Memory leaks in .C-interface

2012-02-03 Thread Grigory Alexandrovich

Hi,

I wrote a C function which I call through the .C interface (something 
like .C("foo", x, y)).
The function does a lot of things. Among other things it allocates
much memory (stack and heap).
Every heap allocation (with malloc) has a corresponding free call.

My problem is that if I call this function many times in a for-loop, 
the amount of memory used by R converges to 100% and the process is 
killed at the end.

Clearing the workspace doesn't help, since the memory is not occupied by 
R objects.

Is this a known problem with the .C interface? How can I find out what 
kind of data fills the memory and especially how can I clean it?

I work on Linux (OpenSuse 12.1)

Thanks

Grigory Alexandrovich






[R] Clear last x entries of R console

2012-02-03 Thread angliski_jigit

Hi All,

I am trying to build a progress tracker into my loops that lets me have a
sense of their progress. I'd like to be able to output to screen a series of
periods "" etc. for each completion of the loop, but I want to
build a pyramid, e.g.
.
..
...

etc. So I need to be able to delete  of the console entry to
accomplish this. There are commands to erase the whole console, but that's
not what I want either; ideally, the command would allow me to erase the
last line or the last x lines.

Thanks
Angliski


--
View this message in context: 
http://r.789695.n4.nabble.com/Clear-last-x-entries-of-R-console-tp4354669p4354669.html
Sent from the R help mailing list archive at Nabble.com.


[R] ode() tries to allocate an absurd amount of memory

2012-02-03 Thread Thomas Brown

Hi there R-helpers:

I'm having problems with the function ode() found in the package deSolve.
It seems that when my state variables are too numerous (>33000 elements),
the function throws the following error:

Error in vode(y, times, func, parms, ...) :
  cannot allocate memory block of size 137438953456.0 Gb
In addition: Warning message:
In vode(y, times, func, parms, ...) : NAs introduced by coercion

This appears to be the case regardless of the computer I use; that is, whether
it's a laptop or server with 24Gb of RAM. Why is ode() trying to allocate
137 billion gigabytes of memory?! (I receive exactly the same error message
whether I have, for example, 34000 or 8 state variables: the amount of
memory trying to be allocated is exactly the same.) I have included a
trivial example below that uses a function that returns a rate of change of
zero for all state variables.

> require(deSolve)
Loading required package: deSolve
> C<-rep(0,34000)
> TestFunc<-function(t,C,para){
+ return(list(rep(0,length(C))))
+ }
> soln<-ode(y=C,times=seq(0,1,0.1),func=TestFunc,parms=c(0),method="vode")
Error in vode(y, times, func, parms, ...) :
  cannot allocate memory block of size 137438953456.0 Gb
In addition: Warning message:
In vode(y, times, func, parms, ...) : NAs introduced by coercion
>

Am I making a foolish mistake somewhere or is this simply a limitation of
the function?

Thanks in advance!


Re: [R] strftime - Dates from Excel files

2012-02-03 Thread Mikko Korpela

On 02/03/2012 03:34 PM, Ana wrote:
> Hi
> 
> I have many excel files were the Date field was not declared as date,
> so the dates look like this: 1/2/1978
> I know that the format is day/month/year
> 
> How can I make R change this to Date format?
> 
> If I use strftime, I get wrong dates:
> 
> dataset=c("1/2/1978")
> 
> strftime(dataset,"%d/%m/%Y")
> "19/02/0001"

Hi!

Prof. Ripley already provided a nice, concise answer, but here's a more
verbose one.

The function strftime() is used for output formatting. In your example,
"%d/%m/%Y" is the chosen output format. Use strptime() for converting
character vectors (i.e. text input) to class "POSIXlt". For converting
to class "Date", use as.Date(). These are alternative classes for
representing dates in R.

> strptime(dataset, format="%d/%m/%Y")
[1] "1978-02-01"

> as.Date(dataset, format="%d/%m/%Y")
[1] "1978-02-01"

For converting "POSIXlt" or "Date" back to a character representation,
use as.character() or, for a customizable style, format().

> format(strptime(dataset, format="%d/%m/%Y"), "%a %b %d, %Y")
[1] "Wed Feb 01, 1978"
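
The same conversion applied to a whole column of day/month/year strings, as a
minimal sketch. The sample values beyond the first are made up for
illustration; as.Date() and format() are base R.

```r
# Day/month/year strings as they might come out of the Excel files
# (only the first value is from the original post).
dataset <- c("1/2/1978", "15/11/1980", "31/12/1999")

# Input: tell as.Date() how the text is laid out.
dates <- as.Date(dataset, format = "%d/%m/%Y")

# Output: format() turns Date objects back into text in any layout.
iso  <- format(dates, "%Y-%m-%d")
back <- format(dates, "%d/%m/%Y")   # zero-padded: "01/02/1978", ...
```

Once the values are of class "Date", arithmetic such as diff(dates) works
directly, which is usually the reason for converting in the first place.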

-- 
Mikko Korpela
Aalto University School of Science
Department of Information and Computer Science


Re: [R] Assigning objects to variable and variable to list in a for loop

2012-02-03 Thread Johannes Radinger

hi,

-------- Original Message --------
> Date: Fri, 3 Feb 2012 09:04:19 -0500
> From: Steve Lianoglou 
> To: Johannes Radinger 
> CC: Joshua Wiley , r-help@r-project.org
> Subject: Re: [R] Assigning objects to variable and variable to list in a for 
> loop

> Hi,
> 
> On Fri, Feb 3, 2012 at 8:00 AM, Johannes Radinger 
> wrote:
> > Hello,
> >
> > I tried to use the lapply approach, but I am not sure how to
> > se it correctly for my task. Here I just want to give an short
> > script which explains how my data structure looks like. It also
> > contains the second approach with a for loop which is working but
> > there is the question of how assining the result to a list.
> >
> > I think the script is somehow self explaining. Anyway what is called
> > res is in "reality" rather an object (maxent) than a single value (thats
> > why I need the list)
> >
> > #create data
> > cat <- c("A","A","B","C","C")
> > value <- runif(5)
> > df <- data.frame(cat,value)
> >
> > # get names of cat with more than 1 entries
> > select.cat <- names(table(df$cat)[table(df$cat)>1])
> > cat.list <- as.list(select.cat)
> >
> > ## lapply approach 
> > fun = function(x){
> >        sub.df<- subset(df,cat ==  x)
> >        # here are other operations, result is an object
> >        res <- sub.df
> >        res #here just a single value but in the long script it is an object
> >        }
> > reslist <- lapply(cat.list, fun(unlist(cat.list)))
> 
> I think you may need to thumb through the ?lapply documentation a bit
> more. lapply will feed every element in the list you are iterating over
> into the first argument of the function you have in lapply's second
> argument, so you would rather have something like:
> 
> reslist <- lapply(cat.list, fun)

Thank you! That's it... fun doesn't need any argument to be specified, as it is 
automatically fed in, as you said.

/johannes

> 
> Assuming that `fun` only needs one element from cat.list to do its duty
> ...
> 
> -steve
> 
> -- 
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>  | Memorial Sloan-Kettering Cancer Center
>  | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact

--


Re: [R] Assigning objects to variable and variable to list in a for loop

2012-02-03 Thread Steve Lianoglou

Hi,

On Fri, Feb 3, 2012 at 8:00 AM, Johannes Radinger  wrote:
> Hello,
>
> I tried to use the lapply approach, but I am not sure how to
> se it correctly for my task. Here I just want to give an short
> script which explains how my data structure looks like. It also
> contains the second approach with a for loop which is working but
> there is the question of how assining the result to a list.
>
> I think the script is somehow self explaining. Anyway what is called
> res is in "reality" rather an object (maxent) than a single value (thats why 
> I need the list)
>
> #create data
> cat <- c("A","A","B","C","C")
> value <- runif(5)
> df <- data.frame(cat,value)
>
> # get names of cat with more than 1 entries
> select.cat <- names(table(df$cat)[table(df$cat)>1])
> cat.list <- as.list(select.cat)
>
> ## lapply approach 
> fun = function(x){
>        sub.df<- subset(df,cat ==  x)
>        # here are other operations, result is an object
>        res <- sub.df
>        res #here just a single value but in the long script it is an object
>        }
> reslist <- lapply(cat.list, fun(unlist(cat.list)))

I think you may need to thumb through the ?lapply documentation a bit
more. lapply will feed every element in the list you are iterating over
into the first argument of the function you have in lapply's second
argument, so you would rather have something like:

reslist <- lapply(cat.list, fun)

Assuming that `fun` only needs one element from cat.list to do its duty ...
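
A minimal sketch of that pattern on the poster's toy data, with fixed values
instead of runif() so the result is repeatable. The body of fun is only a
stand-in (an assumption) for the real model fit, and naming the list elements
is an optional extra.

```r
# Toy data from the question, with fixed values for repeatability.
df <- data.frame(cat   = c("A", "A", "B", "C", "C"),
                 value = c(0.1, 0.2, 0.3, 0.4, 0.5))

# Categories that occur more than once.
counts     <- table(df$cat)
select.cat <- names(counts)[counts > 1]

# fun receives ONE category at a time; lapply supplies it as the first argument.
fun <- function(x) {
  sub.df <- df[df$cat == x, ]
  sub.df            # stand-in for the object the real script would build
}

# Pass the function itself, not the result of calling it.
reslist <- lapply(select.cat, fun)
names(reslist) <- select.cat
```

With names set, a single result can be picked out as reslist[["A"]] or
reslist$C.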

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


Re: [R] strftime - Dates from Excel files

2012-02-03 Thread Prof Brian Ripley


On Fri, 3 Feb 2012, Ana wrote:

> Hi
>
> I have many Excel files where the Date field was not declared as date,
> so the dates look like this: 1/2/1978
> I know that the format is day/month/year
>
> How can I make R change this to Date format?
>
> If I use strftime, I get wrong dates:

So use as.Date to convert to the Date class.

as.Date(dataset,"%d/%m/%Y")
[1] "1978-02-01"

> dataset=c("1/2/1978")
>
> strftime(dataset,"%d/%m/%Y")
> "19/02/0001"

On some unstated OS (how year 1 is represented is OS-dependent). 
The format in strftime applies to output: the default one is used for 
input.


--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-03 Thread Steve Lianoglou

On Fri, Feb 3, 2012 at 7:37 AM, Gabor Grothendieck
 wrote:
> On Fri, Feb 3, 2012 at 6:03 AM, HC  wrote:
>> Thank you for indicating that SQLite may not handle a file as big as 160 GB.
>>
>> Would you know of any utility for *physically splitting *the 160 GB text
>> file into pieces. And if one can control the splitting at the  end of a
>> record.
>>
>
> If they are csv files or similar data files then you could use R or
> any scripting language to do that.

Or even the *nix `split` command ...

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


[R] strftime - Dates from Excel files

2012-02-03 Thread Ana

Hi

I have many Excel files where the Date field was not declared as date,
so the dates look like this: 1/2/1978
I know that the format is day/month/year

How can I make R change this to Date format?

If I use strftime, I get wrong dates:

dataset=c("1/2/1978")

strftime(dataset,"%d/%m/%Y")
"19/02/0001"


Thanks in advance.


Re: [R] possibly Error in R version 2.12.1 (2010-12-16)

2012-02-03 Thread peter dalgaard


On Feb 2, 2012, at 21:24 , Frank Schwidom wrote:

> Hi, 
> 
> the following Code demonstrates an possibly Error in R
> (or you can explain me, why this happens, thanks in advance)

Looks like an effect of lazy evaluation: The value of i is not evaluated until 
after the loop has ended, at which point it will be 2. This is a feature, not 
an error, even if it confuses people at times...

-pd
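
A stripped-down illustration of the effect, assuming nothing beyond base R.
The names make_lazy and make_forced are made up for this sketch; force() is
the base function that evaluates an argument immediately.

```r
# The argument n is a promise: it is not evaluated when make_lazy() is
# called, only when the returned function first uses it.
make_lazy <- function(n) function(x) x + n

# force(n) evaluates the promise straight away, fixing the current value.
make_forced <- function(n) {
  force(n)
  function(x) x + n
}

lazy   <- list()
forced <- list()
for (i in 1:3) {
  lazy[[i]]   <- make_lazy(i)
  forced[[i]] <- make_forced(i)
}

# By now the loop has finished and i is 3.
lazy_result   <- lazy[[1]](10)    # n is looked up only now, so i is 3: 13
forced_result <- forced[[1]](10)  # n was fixed at 1 inside the loop: 11
```

This mirrors the difference in the quoted code: stackIt2 assigns to tmp,
which evaluates the arguments at once, while stackIt1 passes an unevaluated
expression on to testClass().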


> 
> Code:
> 
> #
> 
> testClass <- function( stackData= c())
> {
> 
> list(
> 
>  write= function( ...)
>  {
>   sChain= ""
>   for( s in c( stackData, ...))
>   {
>sChain= paste( sChain, '"', sub( '"', '"', s), '"', sep, sep='')
>   }
>   write( sChain, fHandle, append=TRUE)
>  },
> 
>  stackIt1 = function( ...)
>  {
>   testClass( stackData= c( stackData, ...))
>  },
> 
>  stackIt2 = function( ...)
>  {
>   tmp= c( stackData, ...)
>   testClass( stackData= tmp)
>  },
> 
>  getStack = function()
>  {
>   stackData
>  },
> 
>  NULL
> )
> }
> 
> to1= testClass()
> 
> for( i in 4:2)
> {
> to1= to1$stackIt1( i)
> }
> 
> print( all( rep( 2, 3) == to1$getStack())) # error!
> 
> to2= testClass()
> 
> for( i in 4:2)
> {
> to2= to2$stackIt2( i)
> }
> 
> print( all( 4:2 == to2$getStack())) # correct!
> 
> # what ist the difference between stackIt1 and stackIt2?
> # (error appears only by using an for loop)
> 
> "
>> version
> _
> platform   i486-pc-linux-gnu
> arch   i486
> os linux-gnu
> system i486, linux-gnu
> status
> major  2
> minor  12.1
> year   2010
> month  12
> day16
> svn rev53855
> language   R
> version.string R version 2.12.1 (2010-12-16)
> 
> Regards
> "
> 
> # End of Code
> 
> written in an R-File and called per source( '.R')
> shows 2 subsequent outputs of 'TRUE', which is not ok
> in my mind
> 
> Thanks for your attention
> 

-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com


Re: [R] Assigning objects to variable and variable to list in a for loop

2012-02-03 Thread Johannes Radinger

Hello,

I tried to use the lapply approach, but I am not sure how to 
use it correctly for my task. Here I just want to give a short
script which shows what my data structure looks like. It also
contains the second approach with a for loop, which works, but
there is the question of how to assign the results to a list.

I think the script is fairly self-explanatory. Anyway, what is called
res is in "reality" an object (maxent) rather than a single value (that's why I 
need the list)

#create data
cat <- c("A","A","B","C","C")
value <- runif(5)
df <- data.frame(cat,value)

# get names of cat with more than 1 entries
select.cat <- names(table(df$cat)[table(df$cat)>1])
cat.list <- as.list(select.cat)

## lapply approach 
fun = function(x){
sub.df<- subset(df,cat ==  x)
# here are other operations, result is an object
res <- sub.df
res #here just a single value but in the long script it is an object
}   
reslist <- lapply(cat.list, fun(unlist(cat.list)))

## For loop approach 
for(i in select.cat){
sub.df<- subset(df,cat ==  i)
res <- sub.df
print(res) #here just a single value but in the long script it is an object
#Now I have to collect the results in a list
}

# My task is to run a function on different parts
# of a dataframe. This dataframe is subdivided with subset on
# one variable.


Thank you very much,

best regards,

Johannes

-------- Original Message --------
> Date: Fri, 3 Feb 2012 02:22:30 -0800
> From: Joshua Wiley 
> To: Johannes Radinger 
> CC: r-help@r-project.org
> Subject: Re: [R] Assigning objects to variable and variable to list in a for 
> loop

> Hi Johannes,
> 
> There is a relatively elegant solution if you assign in a list:
> 
> reslist <- lapply(1:3, function(x) runif(5))
> names(reslist) <- paste("result", LETTERS[1:3], sep = "_")
> 
> Cheers,
> 
> Josh
> 
> On Fri, Feb 3, 2012 at 2:07 AM, Johannes Radinger 
> wrote:
> > Hello,
> >
> > I want to use a for loop for repeadely calculating
> > a maxent model (package dismo, function maxent()) which
> > creates an object of the class maxent (S4).
> > I want to collect all the resulting object in a list.
> >
> > I tried to simplify my for loop to explain what I want.
> > There are two problems/questions:
> > 1) How can I create the new variables in the loop (using paste) and
> > assign the objects
> > 2) How can I collect the results (objects) in a list
> >
> > X <- factor(c("A","B","C"))
> >
> > for(in in X){
> >        as.name(paste("result","X",sep="_")) <- runif(5) #any object
> >        # create list of objects with names
> >        }
> >
> > I read something about assign(), but that assigns a value and not an
> > object to a variable. Some time ago I did something similar but with a matrix:
> > Thus I created an empty matrix before the loop and indexed the matrix
> > inside the loop to assign values. But here it is about assigning objects to
> > variables and coercing these to a list.
> >
> > Any suggestions are most welcome.
> >
> > Thank you,
> >
> > best regards,
> > Johannes Radinger
> > --
> >
> 
> 
> 
> -- 
> Joshua Wiley
> Ph.D. Student, Health Psychology
> Programmer Analyst II, Statistical Consulting Group
> University of California, Los Angeles
> https://joshuawiley.com/

--


Re: [R] iterating through for loop

2012-02-03 Thread Petr PIKAL

Hi

> [R] iterating through for loop
> 
> how to iterate two elements each through for loop?
> e.g. x<-c(1,2,4,7,34,6)
> y<-c(3,5,6,9,34,7)
> 
> for(z in x){
> print(paste(z,y))  }
> 
> 
> i want both element of vector iterate serially with same position

Not sure what the result should be, but

paste(x,y)

and

do.call(paste, lapply(expand.grid(x,y), paste))

is what comes to my mind.
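
If the aim is to walk through both vectors position by position, a minimal
sketch of three equivalent ways, using the vectors from the question. Which
one fits depends on what should happen with each pair, which the original
post leaves open.

```r
x <- c(1, 2, 4, 7, 34, 6)
y <- c(3, 5, 6, 9, 34, 7)

# 1. Vectorised: paste() already pairs elements by position.
pairs <- paste(x, y)

# 2. Explicit loop over the shared index.
for (i in seq_along(x)) {
  print(paste(x[i], y[i]))
}

# 3. mapply() calls a function on x[i] and y[i] for each i.
sums <- mapply(function(a, b) a + b, x, y)
```

The expand.grid() variant above differs from these: it builds every
combination of an x value with a y value, 36 pairs here instead of 6.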

If you want something else please follow rules suggested in posting guide.

Regards
Petr


> 
> 
> --
> View this message in context: http://r.789695.n4.nabble.com/iterating-through-for-loop-tp4354101p4354101.html
> Sent from the R help mailing list archive at Nabble.com.
> 


Re: [R] how to plot several curves in the same frame

2012-02-03 Thread Gabor Grothendieck

On Fri, Feb 3, 2012 at 5:05 AM, ikuzar  wrote:
> Hello,
> I'd like to know how to plot several curves in the same frame (1curve =
> 1line=1day).
> For instance (csv file):
>
> 2012-02-01 01:00:00; 2100
> 2012-02-01 02:00:00; 2200
> ...
> 2012-02-01 23:00:00; 2500
> 2012-02-02 01:00:00; 1000
> 2012-02-02 02:00:00; 1500
> ...
> 2012-02-02 23:00:00; 1700
>
> Here, I have to plot 2 curves in the same frame: 1 for 2012-02-01 (on the
> first line) and 1 for 2012-02-02 (on the second)
>

Assuming you want to plot the numbers against time for each day:

Lines <- "2012-02-01 01:00:00; 2100
2012-02-01 02:00:00; 2200
2012-02-01 23:00:00; 2500
2012-02-02 01:00:00; 1000
2012-02-02 02:00:00; 1500
2012-02-02 23:00:00; 1700"

library(zoo)
library(chron)

procTime <- function(x) times(sub(";", "", x))
z <- read.zoo(text = Lines, split = 1, index = 2, FUN = procTime)

cols <- rainbow(ncol(z))
plot(z, screen = 1, col = cols)
legend("bottomright", leg = colnames(z), col = cols, lty = 1)
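
For comparison, a rough base-R sketch of the same idea without zoo or chron:
split the rows by calendar day and draw one line per day against the hour of
day. The column names stamp and value and the helper plot_days are made up
for this illustration.

```r
Lines <- "2012-02-01 01:00:00; 2100
2012-02-01 02:00:00; 2200
2012-02-01 23:00:00; 2500
2012-02-02 01:00:00; 1000
2012-02-02 02:00:00; 1500
2012-02-02 23:00:00; 1700"

# Read the two ';'-separated fields; strip.white drops the space after ';'.
dat <- read.table(text = Lines, sep = ";", strip.white = TRUE,
                  col.names = c("stamp", "value"), stringsAsFactors = FALSE)

# Separate each timestamp into its day and its (fractional) hour of day.
stamp    <- as.POSIXct(dat$stamp, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
dat$day  <- format(stamp, "%Y-%m-%d")
dat$hour <- as.numeric(format(stamp, "%H")) + as.numeric(format(stamp, "%M")) / 60

# One data frame per day.
by_day <- split(dat, dat$day)

# Hypothetical helper: empty frame first, then one coloured line per day.
plot_days <- function(by_day) {
  all_rows <- do.call(rbind, by_day)
  cols <- rainbow(length(by_day))
  plot(range(all_rows$hour), range(all_rows$value), type = "n",
       xlab = "Hour of day", ylab = "Value")
  for (k in seq_along(by_day)) {
    lines(by_day[[k]]$hour, by_day[[k]]$value, col = cols[k])
  }
  legend("bottomright", legend = names(by_day), col = cols, lty = 1)
}
```

Calling plot_days(by_day) draws the figure. Because the hour is computed from
both %H and %M, the same code should also cope with minute-level data.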

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-03 Thread Gabor Grothendieck

On Fri, Feb 3, 2012 at 6:03 AM, HC  wrote:
> Thank you for indicating that SQLite may not handle a file as big as 160 GB.
>
> Would you know of any utility for *physically splitting *the 160 GB text
> file into pieces. And if one can control the splitting at the  end of a
> record.
>

If they are csv files or similar data files then you could use R or
any scripting language to do that.
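
One way this could look in R, as a sketch only: read a fixed number of lines
at a time from an open connection and write each batch to its own file, so
every piece ends at the end of a record. The function name split_file, its
arguments and the output naming scheme are assumptions made for this
illustration, and a header line, if there is one, is not repeated in the
later pieces.

```r
# Split a text file into pieces of at most lines_per_chunk lines each.
# Returns the names of the files that were written.
split_file <- function(path, lines_per_chunk, out_prefix) {
  con <- file(path, open = "r")
  on.exit(close(con))
  part <- 0L
  out_files <- character(0)
  repeat {
    # readLines() continues where the previous call on this connection stopped.
    chunk <- readLines(con, n = lines_per_chunk)
    if (length(chunk) == 0L) break
    part <- part + 1L
    out <- sprintf("%s_%04d.txt", out_prefix, part)
    writeLines(chunk, out)
    out_files <- c(out_files, out)
  }
  out_files
}

# Small demonstration on a temporary 10-line file.
demo_in <- tempfile(fileext = ".txt")
writeLines(paste("record", 1:10, sep = "\t"), demo_in)
pieces <- split_file(demo_in, lines_per_chunk = 4,
                     out_prefix = file.path(tempdir(), "piece"))
```

Only one chunk is held in memory at a time, so the chunk size rather than
the total file size determines the memory needed.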

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


[R] newly install old and present R versions

2012-02-03 Thread Thomas Mang


Hi,

I am moving to a new computer (Windows 7), but for reasons of reproducibility 
I would seriously like to also install my present R 2.8.1 along with all 
extension packages on that machine as well (that is besides R 2.14.1).


What's the best way of doing so?

My idea is:
Get the setup for 2.8.1 and install it; during install neither select 
'save version number in registry' nor 'associate Rdata files' to keep 
the registry clean.
Manually copy the extension packages from my present machine to the lib 
folder of 2.8.1

That should mean 2.8.1 should run.

Then install 2.14.1 and download most up-to-date versions of packages.
How do I detect if packages were e.g. renamed or merged? Also manually 
copying my old packages to the lib dir and then running 
update.packages(checkBuilt=TRUE, ask=FALSE)? Will that work under these 
circumstances as well?


To make things more complicated I am also using Rtools. Is it possible 
to have two parallel versions of Rtools? Sure I can install it, but I 
suppose that, as it sets some PATH variables, only one can be 
active at a time. Hence it might be smarter to avoid installing Rtools 
for 2.8.1 altogether, that is live without it (I have enough trouble 
running Rtools for one version alongside MinGW, as the 
path to g++ can also conflict ...)


thanks,
Thomas


Re: [R] replicate rows

2012-02-03 Thread Petr PIKAL

Hi

> Hello,
> 
> I have a matrix of 17 rows and 20 columns. I want to replicate this matrix
> 20 times, but I only want to replicate the rows. How do I do that?

Replicate index.
x<-matrix(1:4, 2,2)
x[rep(1:2, 20),]
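
The same indexing trick at the size mentioned in the question, as a minimal
sketch. The matrix contents are made up, and the each = variant is shown only
as a contrast in case the rows should be repeated in place.

```r
# A 17 x 20 matrix with made-up contents.
x <- matrix(seq_len(17 * 20), nrow = 17, ncol = 20)

# Repeat the row index 1..17 twenty times: the whole block is stacked 20 times.
stacked <- x[rep(seq_len(nrow(x)), times = 20), ]

# Contrast: each = 20 repeats every row 20 times before moving to the next one.
in_place <- x[rep(seq_len(nrow(x)), each = 20), ]

dim(stacked)   # 340 rows, 20 columns
```

Either way the number of columns stays at 20; only the row order differs
between the two results.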

Regards
Petr

> 
> Kind regards / Met vriendelijke groet / Med venlig hilsen,
> 
> Dr. Gijs Schumacher
> Postdoctoral Researcher
> Department of Political Science and Public Management, University of 
> Southern Denmark &
> Department of Political Science, VU University Amsterdam
> 
> Email: g...@sam.sdu.dk; g.schumac...@vu.nl<
> mailto:g.schumac...@fsw.vu.nl>
> Web: http://www.gijsschumacher.nl
> 
> 
>[[alternative HTML version deleted]]
> 


Re: [R] how to plot several curves in the same frame

2012-02-03 Thread jim holtman

Your data appears to have two different dates.  What is the x-axis of
the data you want plotted?  Is it just going to be the hours in a
day?

On Fri, Feb 3, 2012 at 5:05 AM, ikuzar  wrote:
> Hello,
> I'd like to know how to plot several curves in the same frame (1curve =
> 1line=1day).
> For instance (csv file):
>
> 2012-02-01 01:00:00; 2100
> 2012-02-01 02:00:00; 2200
> ...
> 2012-02-01 23:00:00; 2500
> 2012-02-02 01:00:00; 1000
> 2012-02-02 02:00:00; 1500
> ...
> 2012-02-02 23:00:00; 1700
>
> Here, I have to plot 2 curves in the same frame: 1 for 2012-02-01 (on the
> first line) and 1 for 2012-02-02 (on the second)
>
> Thanks for your help
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/how-to-plot-several-curves-in-the-same-frame-tp4354165p4354165.html
> Sent from the R help mailing list archive at Nabble.com.
>

-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

