[R] relative L1 bound

2012-03-25 Thread yx78
In the package lasso2, there is a Prostate dataset. To find coefficients in the
prostate cancer example we can impose an L1 constraint on the parameters.

The code is:
data(Prostate)
 p.mean <- apply(Prostate, 2, mean)           # column means
 pros <- sweep(Prostate, 2, p.mean, "-")      # centre each column
 p.std <- apply(pros, 2, var)
 pros <- sweep(pros, 2, sqrt(p.std), "/")     # scale to unit variance
 pros[, "lpsa"] <- Prostate[, "lpsa"]
l1ce(lpsa ~  . , pros, bound = 0.44)

I can't figure out where the 0.44 comes from. In the paper it says it was
obtained from generalized cross-validation.
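
A hedged sketch (not from the original post): one way a value like 0.44 can arise is to fit l1ce() over a grid of relative bounds and pick the bound whose generalized cross-validation score is smallest. This assumes lasso2's gcv() method for "l1celist" objects and a "gcv" column in its output; check ?gcv in lasso2 before relying on it.

library(lasso2)
fits   <- l1ce(lpsa ~ ., pros, bound = seq(0.05, 0.95, by = 0.01))
scores <- gcv(fits)                    # one row per relative bound
scores[which.min(scores[, "gcv"]), ]   # bound with the smallest GCV score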

paper name: Regression Shrinkage and Selection via the Lasso

author: Robert Tibshirani



--
View this message in context: 
http://r.789695.n4.nabble.com/relative-L1-bound-tp4504620p4504620.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Simple question regarding domain restrictions/piecewise functions in R

2012-03-25 Thread chad.mills
I am a novice R user.

I would like to be able to graph some simple piecewise functions/functions
with domain restrictions in R, but I'm having trouble defining such
functions.  For example, I would like to define the following function:

f(x) = { x^2   if -1 < x [...]

--
View this message in context: 
http://r.789695.n4.nabble.com/Simple-question-regarding-domain-restrictions-piecewise-functions-in-R-tp4504199p4504199.html
Sent from the R help mailing list archive at Nabble.com.
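
A hedged sketch (not from the original message): one common way to define and plot a piecewise function in R is to build it from ifelse() and return NA outside its domain. The pieces below are purely illustrative, since the original definition was truncated in the archive.

f <- function(x) {
  ifelse(x > -1 & x < 1, x^2,                 # x^2 on (-1, 1)
         ifelse(x >= 1 & x <= 2, 2 - x, NA))  # second, purely illustrative piece
}
curve(f, from = -2, to = 3, n = 501)          # NA outside the domain leaves gaps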

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] The problem of using library(bigmemory)

2012-03-25 Thread yeheng...@gmail.com
To my understanding, library(bigmemory) should allow us to create a matrix
larger than 2GB (say 17,000 by 17,000) by temporarily storing the data on
hard disk.  But when I try the following code to create such a matrix:

library(bigmemory)
A <- big.matrix(17000, 17000, type = "double", init = 0)

But I got error message: 
Error: memory could not be allocated for instance of type big.matrix

I wonder if there is anything wrong in my code.  Thank you!
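
A hedged sketch (not from the original message): big.matrix() allocates the matrix in memory, and a 17,000 x 17,000 double matrix needs roughly 2.3 GB, which can fail on a memory-limited machine. Backing the matrix by a file on disk is the usual workaround; the backing-file names below are illustrative.

library(bigmemory)
A <- filebacked.big.matrix(17000, 17000, type = "double", init = 0,
                           backingfile = "A.bin",        # illustrative name
                           descriptorfile = "A.desc")    # illustrative name
dim(A)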


--
View this message in context: 
http://r.789695.n4.nabble.com/The-problem-of-using-library-bigmemory-tp4504295p4504295.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] row, col function but for a list (probably very easy question, cannot seem to find it though)

2012-03-25 Thread MBoersma
Hi guys,

I'm quite new to R but quite enthusiastic. I'm trying to rewrite a bit of
code in order to make it faster, so instead of nesting for loops I'm trying
some apply functions. Since it's a combination of lists of matrices and
other matrices, I would like to use lapply() on the list and still be able
to use the list index to index into the matrices.

So, in "code format":
x <- list()
for (i in 1:10){
  x[[i]] <- c(1:i)
}

lapply(x,length)

Is there a way to say something like "lapply(x, function(x) length(x)/index(x) )"
and have it return 1's? I am looking for the element index. I know that row() and
col() work in the matrix case; it would also be interesting to use this with
arrays.

Many thanks in advance!

Regards,
Mark
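
A hedged sketch (not from the original message): lapply() does not expose the element index, but iterating over seq_along(x), or using mapply()/Map(), does.

x <- lapply(1:10, seq_len)                            # x[[i]] is 1:i
sapply(seq_along(x), function(i) length(x[[i]]) / i)  # all 1, as requested
# equivalently, pass the index as a second argument:
mapply(function(xi, i) length(xi) / i, x, seq_along(x))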


--
View this message in context: 
http://r.789695.n4.nabble.com/row-col-function-but-for-a-list-probably-very-easy-question-cannot-seem-to-find-it-though-tp4504216p4504216.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Re : ROC Analysis

2012-03-25 Thread Pascal Oettli
Hi Camille,

You probably have to check whether there is any infinite value in x.

Or calculate something like this for your x-axis:
x[1:(ll-1)]+diff(x)/2
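
A hedged sketch (not from the original message), assuming x, ll and p as defined in the quoted code below: the slope vector p has one element fewer than x, so plot it against the midpoints of consecutive cutoffs rather than against x itself.

x_mid <- x[1:(ll - 1)] + diff(x) / 2   # midpoint between consecutive cutoffs
plot(x_mid, p,
     xlab = "cutoff (interval midpoint)",
     ylab = "slope of ROC segment")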


Regards,
Pascal


- Original message -
From: Camille Leclerc 
To: r-help@r-project.org
Cc: 
Sent: Monday, 26 March 2012, 0:32
Subject: Re: [R] ROC Analysis

Hi everybody,

Pascal, your code works, but when I want to do the graph I have an error
message. 

here is my code :
x<-rev(unlist(pred@cutoffs))
tpf<-unlist(performance(pred, "tpr")@y.values)
fpf<-unlist(performance(pred,"fpr")@y.values)
ll<-length(x)
p<-(tpf[1:(ll-1)]-tpf[2:ll])/(fpf[1:(ll-1)]-fpf[2:ll])
plot(x,p)

Erreur dans xy.coords(x, y, xlabel, ylabel, log) : 
  'x' and 'y' lengths differ

So, when I look at the lengths of x and p, I have this:
x : numeric[1735]
p : numeric[1734]

On the other hand, that is expected, since p holds the slope between consecutive
points on the ROC curve, so there are x points but only x-1 slope values. How
do I get the plot?!

All the best,
Camille

-
--
Camille Leclerc, Master student
Lab ESE, UMR CNRS 8079
Univ Paris-Sud
Bat 362
F-91405  Orsay Cedex FRANCE
--
View this message in context: 
http://r.789695.n4.nabble.com/ROC-Analysis-tp4469203p4503354.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] sm.density kernel estimation for points

2012-03-25 Thread Matěj Plch

Hi!
I have a two-dimensional dataset and I need to decide whether a point 
lies within some "confidence level". If a point has low confidence/density 
it may be an anomaly, which I need to find.


For example:
#load library
library(sm)

#get some data
x.locs = c(74, 74.5, 75, 77,74.5)
y.locs = c(64, 63.5, 63, 61,61.5)
points = cbind(x.locs, y.locs)

#plot it
plot(points)

#get points density
abc=sm.density(points, eval.points=points, eval.grid=FALSE)
#make a matrix from points and estimation
fc=cbind(abc$eval.points,abc$estimate)
#select points upper and lower "confidence" level
fca=subset(fc,fc[,3]>0.08)
fcb=subset(fc,fc[,3]<=0.08)

#show what the contours look like with sm's native contour function
dens.2 =sm.density(points,display="slice")

#finally plot contour- green points are ok, red are some anomalies
contour(dens.2$eval.points[,1],dens.2$eval.points[,2],dens.2$estimate)
points(fca,col="green")
points(fcb,col="red")

This is OK.

For some reason I'm not able to use sm.density with display="slice" together 
with estimation at points. This is the first problem.



The second problem is that when I use it with my dataset, which has around 46000 
points, the contours are too rounded: they form a circle, but the data has 
more of an ellipse shape. I have tried to work with the h 
parameter, but without success.
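
A hedged sketch (not from the original message): with two-dimensional data sm.density() accepts a length-2 smoothing parameter, one bandwidth per coordinate, so an elongated cloud need not produce circular contours. This assumes h.select() in sm gives a sensible data-driven starting bandwidth here; the 0.5 scaling of the second coordinate is illustrative only.

library(sm)
h0 <- h.select(points)                 # data-driven bandwidths, one per column
dens.e <- sm.density(points, h = h0 * c(1, 0.5), display = "slice")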


My best result looks: 
https://github.com/matejuh/doschecker_wiki_images/raw/master/linear_regression/density/sm_density.png


Btw, if you have advice on how to write the example above better, or a 
suggestion of another library I could use, I will appreciate it.


Thanks in advance
Matej Plch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Multivariate function from univariate functions

2012-03-25 Thread physicistintheory
Petr, thanks!  I've now got a function I can work with.  Regards!

--
View this message in context: 
http://r.789695.n4.nabble.com/Multivariate-function-from-univariate-functions-tp4502670p4504599.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to test omitted level from a multiple level factor against overall mean in regression models?

2012-03-25 Thread Rolf Turner


The test you are requesting is ***MEANINGLESS***.  The ``effect value'' of a
single level is ill-defined (or in the more usual parlance, "not estimable").
The dummy.coef() procedure suggested by Gabor gives you point estimates
*subject to the constraints* imposed by the contrasts used.  The choice of
contrasts is arbitrary, essentially a matter of aesthetics/taste/convenience.
The values returned by dummy.coef() have, in and of themselves, no meaning
at all.

You can meaningfully estimate, and test for the "significance" of,
*differences* between the "effect values" of factor levels.  For the
individual levels, no can do.

E.g.  Y = mu + alpha_i + E when the observation is at level i of the factor
(and "E" means "random error").  In this setting mu = 0, alpha_1 = 1,
alpha_2 = 2 and alpha_3 = 3 is ***EXACTLY THE SAME MODEL*** as mu = 1,
alpha_1 = 0, alpha_2 = 1 and alpha_3 = 2.

It makes no sense to ask (or to test) whether alpha_1 differs from 0.
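
A hedged sketch (not from the original message): two different constraint choices give identical fitted values, illustrating that the individual alpha_i are not separately estimable; only their differences (and hence the fit) are.

x <- factor(c(1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3))
y <- c(1.1,1.15,1.2,1.1,1.1,1.1,1.2,1.2,1.2,
       2.1,2.2,2.3,2.4,2.5,2.6,2.7,2.8,2.9,3,3.1)
fit_treat <- lm(y ~ x, contrasts = list(x = "contr.treatment"))
fit_sum   <- lm(y ~ x, contrasts = list(x = "contr.sum"))
all.equal(fitted(fit_treat), fitted(fit_sum))  # TRUE: same model, different coding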

cheers,

Rolf Turner

On 26/03/12 02:08, "Biedermann, Jürgen" wrote:

Hi Gabor,

Thanks a lot for the answer.
However, I'm not so much focusing on the pure effect value of the omitted 
factor level, but more on the statistical test if it
differs significantly from 0.
Do you know a way for this purpose too?

Greetings Jürgen

From: Gabor Grothendieck [ggrothendi...@gmail.com]
Sent: Sunday, 25 March 2012 14:11
To: Biedermann, Jürgen
Cc: r-help@R-project.org
Subject: Re: [R] How to test omitted level from a multiple level factor against 
overall mean in regression models?

2012/3/25 "Biedermann, Jürgen":

Hi there,

I have a linear model with one factor having three levels.
I want to check if the different levels significantly differ from the overall 
mean (using contr.sum).
However one level (the last) is omitted in the standard procedure.

To illustrate this:

x<- as.factor(c(1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3))
y<- 
c(1.1,1.15,1.2,1.1,1.1,1.1,1.2,1.2,1.2,2.1,2.2,2.3,2.4,2.5,2.6,2.7,2.8,2.9,3,3.1)
test<- data.frame(x,y)
reg1<- lm(y~C(x,contr.sum),data=test)
summary(reg1)

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)       1.63333    0.06577  24.834 8.48e-15 ***
C(x, contr.sum)1 -0.48333    0.10792  -4.479  0.00033 ***
C(x, contr.sum)2 -0.48333    0.08936  -5.409 4.70e-05 ***

Is it possible to get the effect for the third level (against the overall mean) 
in the table too.

I figured out:

reg2<- lm(y~C(relevel(x,3),contr.sum),data=test)
summary(reg2)

C(relevel(x, 3), contr.sum)1  0.96667    0.07951  12.158 8.24e-10 ***
C(relevel(x, 3), contr.sum)2 -0.48333    0.10792  -4.479  0.00033 ***


The first row now tests the third level against the overall mean, but I find 
this approach not so convenient.
Moreover, I wonder if it is meaningful at all given the accumulation of alpha 
error. Would a Bonferroni correction be sensible?


Try this:


options(contrasts = c("contr.sum", "contr.poly"))
reg1<- lm(y~x,data=test)
dummy.coef(reg1)

Full coefficients are

(Intercept):  1.63
x:   1  2  3
 -0.483 -0.483  0.967

--
Statistics&  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Updating a Markov Chain

2012-03-25 Thread stivi
Hello,

my question is whether anyone has any good ideas on how to create a Markov chain
from ordered data. So, I have some sort of time series, and if value1
happens at time1 and value2 happens at time2 I record this as an update to
the transition probability matrix. The problem is that I cannot predefine
the size of the matrix (as I don't know how many states (values) I will have
in the end) and don't really know how to update the probability distribution
in the rows.
If anyone has any thoughts, I would be grateful if you shared them.
Stivi
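
A hedged sketch (not from the original message): the states need not be known in advance; tabulate consecutive pairs and normalise the rows afterwards. The series below is illustrative.

x <- c("a", "b", "a", "c", "b", "b", "a")
trans_counts <- table(from = head(x, -1), to = tail(x, -1))
trans_prob   <- prop.table(trans_counts, margin = 1)   # each row sums to 1
trans_prob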

--
View this message in context: 
http://r.789695.n4.nabble.com/Updating-a-Markov-Chain-tp4504156p4504156.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] What does "package 'RDCOMClient' is not installed for 'arch=x64' " exactly mean?

2012-03-25 Thread dthomas
Hi,

I'm trying to use the excel.link package to write data to excel
spreadsheets. I've installed the RDCOMClient package as required but get the
error:

package 'RDCOMClient' is not installed for 'arch=x64'

I'm on Rx64 2.13.0.

I assume it means the RDCOMClient package does not work on the x64 version of R.
I've tried to find out whether an x64 version of the RDCOMClient package exists,
but to no avail. Any suggestions?
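
A hedged sketch (not from the original message): the error means no x64 build of RDCOMClient is installed. Listing the compiled-code directories of the installed package shows which architectures it was built for; "i386" without "x64" confirms a 32-bit-only install, in which case running 32-bit R (or rebuilding the package from source for x64) is needed.

list.files(system.file("libs", package = "RDCOMClient"))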

Cheers
D

--
View this message in context: 
http://r.789695.n4.nabble.com/What-does-package-RDCOMClient-is-not-installed-for-arch-x64-exactly-mean-tp4504469p4504469.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R Error : DATA to MATRIX

2012-03-25 Thread David Winsemius


On Mar 25, 2012, at 2:43 PM, ritwi...@isical.ac.in wrote:


Thanks David, your suggestion works fine. Btw, I have another
question: if I set (n,m) a little bit larger, say (n=20, m=10), R cannot
handle the large data frame generated through "expand.grid". Is there
any way to increase R's memory so that I can tackle a large data.frame
in R?


Please read the Posting Guide. After doing so, you may understand why  
you have gotten no responses.


--
David.


regards
ritwik



On Mar 23, 2012, at 2:53 AM, ritwi...@isical.ac.in wrote:


Dear Sir/Madam,

I'm getting a problem with R code which converts a data frame to a matrix.

It first generates a (m^(n-m) * m) matrix A and then generates another
matrix B, of smaller dimension than A, which satisfies some condition.
Now I wish to assign each row of B to a vector individually.

My problem is that for any choice of (n,m) except m=1 it works fine, but
setting m=1 I get the error: Error in B[i, ] : incorrect number of
dimensions.

Moreover, if (n,m) is large (say, (20,8)) I get the error: Error: cannot
allocate vector of size 3.0 Gb. I know this is due to the large dimension
of matrix A. How can I solve this problem?

My code is given below:

**

n=5
m=3
R=numeric(0)
# Generate all possible m-tuples ( variables having range 0 to n ) in a
# ( m^(n-m) * m ) matrix

r = expand.grid(rep(list(0:(n-m)), m))

write.table(r,file="test.txt",row.names=FALSE,col.names=FALSE)

a= read.table(file="test.txt",sep="",header=FALSE)

A= data.matrix(a)

#.

# Generate matrix whose rowsum = n-m

meet.crit = apply(A, 1, function(.row) any((sum(.row)) == n-m))  # criterion for being rowsum = n-m

cbind(A, meet.crit)  # Checking rowsum = n-m for each row

B=A[meet.crit,]


At this point the default behavior of the "[" function is to return a
vector rather than a matrix. You need to add drop=FALSE as an
additional argument. Read the help page for ?"[".
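
A hedged sketch (not from the original message): with m = 1 the logical index picks out a single row, and "[" then drops the result to a plain vector unless drop = FALSE is supplied, which is why B[i, ] later fails with "incorrect number of dimensions".

A <- matrix(1:6, nrow = 3)          # small illustrative matrix
keep <- c(TRUE, FALSE, FALSE)       # selects exactly one row
dim(A[keep, ])                      # NULL: collapsed to a vector
dim(A[keep, , drop = FALSE])        # 1 2: still a matrix, so B[i, ] keeps working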

# Generate matrix
#.


for(i in 1:choose(n-1,m-1)){
R=B[i,]
}

***

Can you please help me how to get rid of these errors. Thanking  
you in

advance.

Regards

Ritwik Bhattacharya


Senior Research Fellow
SQC & OR UNIT, KOLKATA
INDIAN STATISTICAL INSTITUTE


--

David Winsemius, MD
West Hartford, CT




David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] reading header in txt file and making histogram

2012-03-25 Thread chuck.01
dbh is the column name.  What is the name of your data?
Lets assume your data is called "YOUR_DATA"

then try:

with(YOUR_DATA, hist(dbh))

OR

hist(YOUR_DATA$dbh)

OR

hist(YOUR_DATA[, "dbh"])

etc...
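
A hedged sketch (not from the original message): dbh only exists as a column of the data frame returned by read.table(), so the result has to be assigned to a name first; "trees.txt" is an illustrative file name.

YOUR_DATA <- read.table("trees.txt", header = TRUE)
hist(YOUR_DATA$dbh)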






bamboohydraulics wrote
> 
> Dear all
> 
> I am a BEGINNER and have R on my Mac. I saved my excel file as .txt file,
> I have just one column with first row as the column name. My file when
> read by R looks like this. After reading the table I try to make a
> histogram by 
> hist(dbh), it says  object dbh not found. What am I doing wrong? thanks
>   dbh
> 1   11.53
> 2   16.05
> 37.36
> 4   16.05
> 58.66
> 6   12.74
> 7   22.93
> 87.55
> 95.10
> 10  21.34
> 11   0.32
> 12   9.71
> 13   0.00
> 14  15.92
> 15  11.85
> 16  14.43
> 17   4.59
> 18  14.43
> 19   7.99
> 20   2.61
> 21  23.69
> 22   6.05
> 23   9.87
> 24   4.90
> 25   3.18
> 26   3.82
> 27   0.00
> 28  12.10
> 29   5.22
> 30   6.37
> 31   6.50
> 32  10.83
> 33   3.63
> 34   2.93
> 35   3.60
> 36   5.89
> 37   3.63
> 38   2.55
> 39   2.96
> 40   3.69
> 41   5.57
> 42   0.00
> 43   7.07
> 44   5.48
> 45   9.55
> 46   3.44
> 47   0.00
> 48   0.00
> 49   3.69
> 50   2.48
> 51   3.38
> 52   2.64
> 53  12.10
> 54   3.34
> 55   0.00
> 56   4.52
> 57   6.59
> 58   9.65
> 59   9.01
> 60  12.93
> 61   0.00
> 62   4.24
> 63   9.75
> 64   4.39
> 65   2.90
> 66   6.85
> 67   9.08
> 68   5.54
> 69   6.02
> 70   7.64
> 71   6.85
> 72   0.00
> 73   3.12
> 74   6.75
> 75   4.24
> 76   2.32
> 77   2.58
> 78   2.55
> 79   8.60
> 80   0.00
> 81   0.32
> 82   2.23
> 83   3.98
> 84   6.05
> 85   7.64
> 86   3.03
> 87   0.00
> 88   9.27
> 89   0.00
> 90   0.00
> 91   3.92
> 92   0.00
> 93  20.54
> 94   8.41
> 95   0.00
> 96   2.10
> 97   0.00
> 98   0.00
> 99  13.79
> 100  5.22
> 101  9.43
> 102  6.31
> 103  4.11
> 104  0.00
> 105  0.00
> 106  6.15
> 107  3.41
> 108 14.01
> 109 15.67
> 110 17.20
> 111 20.00
> 112 11.43
> 113 10.35
> 114 10.29
> 115 13.18
> 116 10.61
> 117  0.00
> 118  0.32
> 119  0.00
> 120  7.01
> 121  1.97
> 122  4.52
> 123 19.14
> 124  3.50
> 125  2.99
> 126  3.18
> 127  1.82
> 128  3.18
> 129  6.02
> 130  4.04
> 131  2.61
> 132  2.48
> 133  2.71
> 134  2.68
> 135  1.82
> 136  2.52
> 137  6.72
> 138  2.61
> 139  2.74
> 140  3.15
> 141  3.25
> 142  3.22
> 143  3.44
> 144  4.01
> 145  4.97
> 146  6.15
> 147  9.62
> 148  2.87
> 149  5.83
> 150 17.90
> 151  3.18
> 152  3.54
> 153  2.99
> 154  4.62
> 155  6.85
> 156  0.00
> 157  5.83
> 158  4.49
> 159  2.74
> 160  6.82
> 161  6.82
> 162  4.94
> 163  5.83
> 164  6.02
> 165  0.96
> 


--
View this message in context: 
http://r.789695.n4.nabble.com/reading-header-in-txt-file-and-making-histogram-tp4504813p4504848.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem reading mixed CSV file

2012-03-25 Thread Ashish Agarwal
thanks. works very well.

On Mon, Mar 26, 2012 at 12:12 PM, Berend Hasselman  wrote:

>
> On 26-03-2012, at 08:40, Ashish Agarwal wrote:
>
> > comment.char = NULL does not work.
> > Is there any way to make it NULL rather than having a specific character
> like '%'?
> >
> Why don't you try something?
>
> comment.char=""
>
> looks quite obvious and worth a try.
>
> Berend
>
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem reading mixed CSV file

2012-03-25 Thread Berend Hasselman

On 26-03-2012, at 08:33, Ashish Agarwal wrote:

> OMG.
>  
> I think it uses comment character # as default in the argument.
>  
> comment.char = "#"
>  
> How do I turn it off?
???

How about comment.char="%" for example?

Berend

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem reading mixed CSV file

2012-03-25 Thread Berend Hasselman

On 26-03-2012, at 08:40, Ashish Agarwal wrote:

> comment.char = NULL does not work.
> Is there any way to make it NULL rather than having a specific character like 
> '%'?
> 
Why don't you try something?

comment.char=""

looks quite obvious and worth a try.
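
A hedged sketch (not from the original message): with the comment character disabled, count.fields() sees all six comma-separated fields, because the "#" inside the text no longer truncates each line.

txt <- "LL1532Ap,ABC# Depot-A+,,1971,8,2
LL1532Ap,ABC# Depot-A+,Bhutan,1971,6,1"
count.fields(textConnection(txt), sep = ",", comment.char = "")
# [1] 6 6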

Berend

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem reading mixed CSV file

2012-03-25 Thread Ashish Agarwal
comment.char = NULL does not work.
Is there any way to make it NULL rather than having a specific character
like '%'?


On Mon, Mar 26, 2012 at 12:06 PM, Berend Hasselman  wrote:

>
> > comment.char = "#"
> >
> > How do I turn it off?
> ???
>
> How about comment.char="%" for example?
>
> Berend
>
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem reading mixed CSV file

2012-03-25 Thread Berend Hasselman

On 26-03-2012, at 08:26, Berend Hasselman wrote:

> 
> On 26-03-2012, at 08:16, Ashish Agarwal wrote:
> 
>> Why does the output in the following say 2 and not 6?
>> 
>>> count.fields(textConnection("LL1532Ap,ABC# Depot-A+,,1971,8,2
>> + LL1532Ap,ABC# Depot-A+,Bhutan,1971,6,1
>> + LL1532Ap,ABC# Depot-A+,China,1971,17,1
>> + LL1532Ap,ABC# Depot-A+,China,1971,33,1
>> + LL1532Ap,ABC# Depot-A+,HongKong,1971,16,2
>> + LL1532Ap,ABC# Depot-A+,HongKong,1971,17,1
>> + LL1532Ap,ABC# Depot-A+,HongKong,1971,22,1
>> + LL1532Ap,ABC# Depot-A+,HongKong,1971,49,1
>> + LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,20,1
>> + LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,27,1
>> + LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,33,1
>> + LL1532Ap,ABC# Depot-A+,Kazakhstan,1973,15,1
>> + LL1532Ap,ABC# Depot-A+,Romania-Europe,1971,10,1
>> + LL1532Ap,ABC# Depot-A+,Romania-Europe,1973,4,1
>> + LL1532Ap,ABC# Depot-A+,Sanchez-America,1973,9,1
>> + LL1532An,ABC# Depot-A-,,1971,8,2"),sep=",")
>> [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>>> 
>> 
> 
> Have you done
> 
> ?count.fields
> 
> and read what it says about the argument "sep" and the default?
> 
> So if a comma is the separator what value would you give sep?
Sorry I should have had a closer look at what you had done.

But still ?count.fields should have given you a pointer.

Look at what it says in the entry for argument "comment.char".

You have a character # in your text.
Set comment.char to something other than # .

Berend

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem reading mixed CSV file

2012-03-25 Thread Ashish Agarwal
OMG.

I think it uses comment character # as default in the argument.

comment.char = "#"

How do I turn it off?

On Mon, Mar 26, 2012 at 11:56 AM, Berend Hasselman  wrote:

>
> On 26-03-2012, at 08:16, Ashish Agarwal wrote:
>
> > Why does the output in the following say 2 and not 6?
> >
> >> count.fields(textConnection("LL1532Ap,ABC# Depot-A+,,1971,8,2
> > + LL1532Ap,ABC# Depot-A+,Bhutan,1971,6,1
> > + LL1532Ap,ABC# Depot-A+,China,1971,17,1
> > + LL1532Ap,ABC# Depot-A+,China,1971,33,1
> > + LL1532Ap,ABC# Depot-A+,HongKong,1971,16,2
> > + LL1532Ap,ABC# Depot-A+,HongKong,1971,17,1
> > + LL1532Ap,ABC# Depot-A+,HongKong,1971,22,1
> > + LL1532Ap,ABC# Depot-A+,HongKong,1971,49,1
> > + LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,20,1
> > + LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,27,1
> > + LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,33,1
> > + LL1532Ap,ABC# Depot-A+,Kazakhstan,1973,15,1
> > + LL1532Ap,ABC# Depot-A+,Romania-Europe,1971,10,1
> > + LL1532Ap,ABC# Depot-A+,Romania-Europe,1973,4,1
> > + LL1532Ap,ABC# Depot-A+,Sanchez-America,1973,9,1
> > + LL1532An,ABC# Depot-A-,,1971,8,2"),sep=",")
> > [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
> >>
> >
>
>
> Have you done
>
> ?count.fields
>
> and read what it says about the argument "sep" and the default?
>
> So if a comma is the separator what value would you give sep?
>
> Berend
>
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem reading mixed CSV file

2012-03-25 Thread Berend Hasselman

On 26-03-2012, at 08:16, Ashish Agarwal wrote:

> Why does the output in the following say 2 and not 6?
> 
>> count.fields(textConnection("LL1532Ap,ABC# Depot-A+,,1971,8,2
> + LL1532Ap,ABC# Depot-A+,Bhutan,1971,6,1
> + LL1532Ap,ABC# Depot-A+,China,1971,17,1
> + LL1532Ap,ABC# Depot-A+,China,1971,33,1
> + LL1532Ap,ABC# Depot-A+,HongKong,1971,16,2
> + LL1532Ap,ABC# Depot-A+,HongKong,1971,17,1
> + LL1532Ap,ABC# Depot-A+,HongKong,1971,22,1
> + LL1532Ap,ABC# Depot-A+,HongKong,1971,49,1
> + LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,20,1
> + LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,27,1
> + LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,33,1
> + LL1532Ap,ABC# Depot-A+,Kazakhstan,1973,15,1
> + LL1532Ap,ABC# Depot-A+,Romania-Europe,1971,10,1
> + LL1532Ap,ABC# Depot-A+,Romania-Europe,1973,4,1
> + LL1532Ap,ABC# Depot-A+,Sanchez-America,1973,9,1
> + LL1532An,ABC# Depot-A-,,1971,8,2"),sep=",")
> [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>> 
> 


Have you done

?count.fields

and read what it says about the argument "sep" and the default?

So if a comma is the separator what value would you give sep?

Berend

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem reading mixed CSV file

2012-03-25 Thread Ashish Agarwal
Why does the output in the following say 2 and not 6?

> count.fields(textConnection("LL1532Ap,ABC# Depot-A+,,1971,8,2
+ LL1532Ap,ABC# Depot-A+,Bhutan,1971,6,1
+ LL1532Ap,ABC# Depot-A+,China,1971,17,1
+ LL1532Ap,ABC# Depot-A+,China,1971,33,1
+ LL1532Ap,ABC# Depot-A+,HongKong,1971,16,2
+ LL1532Ap,ABC# Depot-A+,HongKong,1971,17,1
+ LL1532Ap,ABC# Depot-A+,HongKong,1971,22,1
+ LL1532Ap,ABC# Depot-A+,HongKong,1971,49,1
+ LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,20,1
+ LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,27,1
+ LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,33,1
+ LL1532Ap,ABC# Depot-A+,Kazakhstan,1973,15,1
+ LL1532Ap,ABC# Depot-A+,Romania-Europe,1971,10,1
+ LL1532Ap,ABC# Depot-A+,Romania-Europe,1973,4,1
+ LL1532Ap,ABC# Depot-A+,Sanchez-America,1973,9,1
+ LL1532An,ABC# Depot-A-,,1971,8,2"),sep=",")
 [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>




On Fri, Mar 16, 2012 at 10:59 PM, David Winsemius wrote:
Looks like an encoding mismatch. You have not offered the requested
information about your setup so further comment would all be guesswork. But
you can perhaps educate yourself by reading:

?Encoding

And line ten has 7 elements.

> count.fields(textConnection(",,,1968,21,0
+ ,,Boston,1968,13,0
+ ,,Boston,1968,18,0
+ ,,Chicago,1967,44,0
+ ,,Providence,1968,17,0
+ ,,Providence,1969,48,0
+ ,,Binky,1968,24,0
+ ,,Chicago,1968,23,0
+ ,,Dally,1968,7,0
+ ,,Raleigh, North Carol,1968,25,0
+ Addy ABC-Dogs Stars-W8.1,,Providence,1968,38,0
+ DEF_REQPRF/,,Dartmouth,1967,31,1
+ PL,,,1967,38,1
+ XY,PopatLal,,1967,5,1
+ XY,PopatLal,,1967,6,8
+ XY,PopatLal,,1967,7,7
+ XY,PopatLal,,1967,9,1
+ XY,PopatLal,,1967,10,1
+ XY,PopatLal,,1967,13,1
+ XY,PopatLal,Boston,1967,6,1
+ XY,PopatLal,Boston,1967,7,11
+ XY,PopatLal,Boston,1967,9,2
+ XY,PopatLal,Boston,1967,10,3
+ XY,PopatLal,Boston,1967,7,2"),**sep=",")
 [1] 6 6 6 6 6 6 6 6 6 7 6 6 6 6 6 6 6 6 6 6 6 6 6 6
>
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] trellis plot

2012-03-25 Thread Sebastián Daza
Hi everyone,
I am just trying to figure out how to do an xyplot where, in addition to
dots and lines, I can change the dots' colors according to an individual-level
variable (e.g., marital disruption across time, a dummy 0/1). When I
use the "groups" argument (see below), I get two different lines for
each individual based on the groups; what I want is one line
connecting the dots, with different dot colors according to marital
disruption.

Any ideas about how to do that?

person  <- rep(1:2, each=4)
income  <- c(100, 120, 150, 200, 90, 100,120, 150)
disruption  <- rep(c(0,1), 4)
time  <- rep(c(1:4),2)
dat  <- as.data.frame(cbind(person,time, income, disruption))


library(lattice)
xyplot(income~time|as.factor(person),data=dat,
   type=c("p","g","o"), col.line="black",
   xlab="Time",
   ylab="Familiar Income")

# I just want to change dots' colors according to disruption, not to
get two different lines:

xyplot(income~time|as.factor(person),data=dat,
   type=c("p","g","o"), col.line="black", groups=disruption,
   xlab="Time",
   ylab="Familiar Income")



Thank you in advance!

-- 
Sebastián Daza

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Seeming failure of options(width=60)

2012-03-25 Thread Yihui Xie
I got an error message:

> options(width=60)
> pastured <- data.frame(
+ time=c(9, 14, 21, 28, 42, 57, 63, 70, 79),
+ yield= c(8.93, 10.8, 18.59, 22.33, 39.35,
+ 56.11, 61.73, 64.62, 67.08))
> regmod<-"yield ~ t1 - t2*exp(-exp(t3+t4*log(time)))"
> huetstart<-c(t1=70, t2=60, t3=0, t4=1)
> anlsx<-try(nls(regmod, start=huetstart, trace=TRUE, data=pastured))
13386.91 :  70 60  0  1
Error in nls(regmod, start = huetstart, trace = TRUE, data = pastured) :
  singular gradient
> print(anlsx)
[1] "Error in nls(regmod, start = huetstart, trace = TRUE, data =
pastured) : \n  singular gradient\n"
attr(,"class")
[1] "try-error"
attr(,"condition")



and I see the number of characters of the message string is 94:

> nchar(anlsx)
[1] 94

options(width) is not supposed to _wrap_ your character scalars; it
cannot break a single string into several lines (use strwrap() instead
in that case). See ?options for the exact meaning of the width option.

# this may help you understand 'width'
options(width = 50)
rep(1, 100)

options(width = 90)
rep(1, 100)
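
A hedged sketch (not from the original message): strwrap() is what actually breaks a long string at a given width; options(width) only controls how vectors are laid out across lines.

msg <- paste(rep("singular gradient", 10), collapse = " ")   # a long string
cat(strwrap(msg, width = 60), sep = "\n")                     # wrapped at 60 chars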


Regards,
Yihui
--
Yihui Xie 
Phone: 515-294-2465 Web: http://yihui.name
Department of Statistics, Iowa State University
2215 Snedecor Hall, Ames, IA



On Sun, Mar 25, 2012 at 9:17 PM, John C Nash  wrote:
> The following example gives output with a line length of 103 on my system. It 
> is causing a
> nuisance in creating a vignette. Is there something other than e.g., 
> options(width=60) I
> need to set? The Sweave FAQ suggests this should work.
>
> options(width=60)
> pastured <- data.frame(
> time=c(9, 14, 21, 28, 42, 57, 63, 70, 79),
> yield= c(8.93, 10.8, 18.59, 22.33, 39.35,
>         56.11, 61.73, 64.62, 67.08))
> regmod<-"yield ~ t1 - t2*exp(-exp(t3+t4*log(time)))"
> huetstart<-c(t1=70, t2=60, t3=0, t4=1)
> anlsx<-try(nls(regmod, start=huetstart, trace=TRUE, data=pastured))
> print(anlsx)
>
>
> John Nash
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Struggling with zoo and aggregate

2012-03-25 Thread Thomas Adams
Gabor,

Thank you for your help -- it did help me a lot. However, with my data:

   lead_time cycle    r_squared  fcst_date
1          6     0 5.405095e-02 07/31/2010
2         12     0 5.521620e-06 07/31/2010
3         18     0 1.565910e-04 07/31/2010
4         24     0 8.646822e-02 07/31/2010
5         30     0 1.719604e-02 07/31/2010
6         36     0 5.768113e-04 07/31/2010
7         42     0 2.501269e-06 07/31/2010
8         48     0 6.451727e-02 07/31/2010
9          6    12 2.857931e-01 07/31/2010
10        12    12 1.138635e-01 07/31/2010
11        18    12 2.225503e-02 07/31/2010
12        24    12 1.182031e-03 07/31/2010
13        30    12 8.841142e-04 07/31/2010
14        36    12 1.082490e-01 07/31/2010
15        42    12 1.502887e-05 07/31/2010
17         6     0 8.689588e-02 08/01/2010
18        12     0 5.884336e-04 08/01/2010
19        18     0 2.219316e-07 08/01/2010
20        24     0 3.960752e-02 08/01/2010
21        30     0 1.087413e-04 08/01/2010
23        42     0 3.583030e-07 08/01/2010
24        48     0 2.907109e-05 08/01/2010
25         6    12 8.693451e-02 08/01/2010
26        12    12 3.208215e-02 08/01/2010
27        18    12 0.000000e+00 08/01/2010
28         6     0 2.962669e-02 08/02/2010
29         6    12 2.363506e-05 08/02/2010
30        12    12 9.050178e-03 08/02/2010

from:

> z <- read.zoo(q,index = 4, FUN = as.yearmon, format =
"%m/%d/%Y",aggregate = mean)

I get:
> z
         lead_time    cycle  r_squared
Jul 2010  25.60000 5.600000 0.05034771
Aug 2010  18.46154 4.615385 0.02191903

what I need is to NOT have lead_time and cycle averaged, but only to have
the r_squared values averaged by lead_time and cycle. I cannot seem to
figure out the correct syntax to do this. I assume I use something like:

q_agg<-aggregate(q,by=list(q$lead_time,q$cycle),index = 4, FUN =
as.yearmon, format = "%m/%d/%Y")

but I get errors or nonsense when I follow with...

z <- read.zoo(q_agg,index = 4, FUN = as.yearmon, format =
"%m/%d/%Y",aggregate = mean)

or some variation of this.
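
A hedged sketch (not from the original message): aggregate r_squared alone by month, lead_time and cycle (rather than averaging every column by month), using as.yearmon() only to build the grouping variable.

library(zoo)
q$month <- as.yearmon(q$fcst_date, format = "%m/%d/%Y")
monthly <- aggregate(r_squared ~ month + lead_time + cycle, data = q, FUN = mean)
head(monthly)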

Regards,
Tom


On Sat, Mar 24, 2012 at 10:58 PM, Gabor Grothendieck <
ggrothendi...@gmail.com> wrote:

> On Sat, Mar 24, 2012 at 10:44 PM, Thomas Adams 
> wrote:
> > All:
> >
> > I have a SQlite database where I have stored some verification data by
> date
> > & time (cycle Z/UTC), lead_time as well as type, duration, etc. I would
> > like to analyze & plot the data as monthly averages. I have looked at a
> > bunch of examples which use some combination of zoo and aggregate, but I
> > have not been able to successfully apply bits and pieces from the
> examples
> > I have found. Any help is appreciated. BTW, I calculate mae (mean
> absolute
> > error), mse (mean squared error), me (mean error), and other measures
> > obtained by using the R verification package.
> >
> > The example below is limited to 20 records and shows lead_time,
> r_squared,
> > (forecast) cycle, fcst_date (forecast date) -- the full data set is just
> > over 2 years of daily data with 3 forecast cycles (00Z, 12Z, and 18Z)
> daily.
> >
> > >From my query, below) how do I construct an appropriate data structure
> to
> > analyze & plot the data as monthly averages?
> >
> > Regards,
> > Tom
> >
> >> q<-dbGetQuery(con,"select lead_time,r_squared,cycle,fcst_date from
> > verify_table where duration=6 limit 20")
> >> q
> >    lead_time    r_squared cycle  fcst_date
> > 1          6 5.405095e-02    00 07/31/2010
> > 2         12 5.521620e-06    00 07/31/2010
> > 3         18 1.565910e-04    00 07/31/2010
> > 4         24 8.646822e-02    00 07/31/2010
> > 5         30 1.719604e-02    00 07/31/2010
> > 6         36 5.768113e-04    00 07/31/2010
> > 7         42 2.501269e-06    00 07/31/2010
> > 8         48 6.451727e-02    00 07/31/2010
> > 9          6 2.857931e-01    12 07/31/2010
> > 10        12 1.138635e-01    12 07/31/2010
> > 11        18 2.225503e-02    12 07/31/2010
> > 12        24 1.182031e-03    12 07/31/2010
> > 13        30 8.841142e-04    12 07/31/2010
> > 14        36 1.082490e-01    12 07/31/2010
> > 15        42 1.502887e-05    12 07/31/2010
> > 16        48           NA    12 07/31/2010
> > 17         6 8.689588e-02    00 08/01/2010
> > 18        12 5.884336e-04    00 08/01/2010
> > 19        18 2.219316e-07    00 08/01/2010
> > 20        24 3.960752e-02    00 08/01/2010
> >
>
> Try this:
>
> Lines <- "lead_timer_squared cycle  fcst_date
> 1  6 5.405095e-0200 07/31/2010
> 2 12 5.521620e-0600 07/31/2010
> 3 18 1.565910e-0400 07/31/2010
> 4 24 8.646822e-0200 07/31/2010
> 5 30 1.719604e-0200 07/31/2010
> 6 36 5.768113e-0400 07/31/2010
> 7 42 2.501269e-0600 07/31/2010
> 8 48 6.451727e-0200 07/31/2010
> 9  6 2.857931e-0112 07/31/2010
> 1012 1.138635e-0112 07/31/2010
> 1118 2.225503e-0212 07/31/2010
> 1224 1.182031e-0312 07/31/2010
> 1330 8.841142e-0412 07/31/2010
> 143

[R] Seeming failure of options(width=60)

2012-03-25 Thread John C Nash
The following example gives output with a line length of 103 on my system. It 
is causing a
nuisance in creating a vignette. Is there something other than e.g., 
options(width=60) I
need to set? The Sweave FAQ suggests this should work.

options(width=60)
pastured <- data.frame(
time=c(9, 14, 21, 28, 42, 57, 63, 70, 79),
yield= c(8.93, 10.8, 18.59, 22.33, 39.35,
 56.11, 61.73, 64.62, 67.08))
regmod<-"yield ~ t1 - t2*exp(-exp(t3+t4*log(time)))"
huetstart<-c(t1=70, t2=60, t3=0, t4=1)
anlsx<-try(nls(regmod, start=huetstart, trace=TRUE, data=pastured))
print(anlsx)


John Nash

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] (no subject)

2012-03-25 Thread Rolf Turner

On 26/03/12 00:18, Anjana Thampi wrote:

How do you decompose inequality in R, say by gender?



This has to be one of the most meaningless and ill-expressed
questions I've ever seen on this list.  And that's a high hurdle
to clear.

cheers,

Rolf Turner

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] string substitution for argument in function

2012-03-25 Thread Henrique Dallazuanna
try this:

`names<-`(rep(2, 3), a)
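
A hedged sketch (not from the original message): setNames() expresses the same idea and reproduces the dict-like structure described in the question.

a <- c('one', 'two', 'three')
data <- setNames(rep(2, length(a)), a)
data
#   one   two three 
#     2     2     2 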


On Sun, Mar 25, 2012 at 6:22 PM, Pedro Martinez
 wrote:
>
> hello,
> I want to iterate through a list of names and use each element as an
> argument in a function. For instance:
>
> > a = c('one','two','three')
> > data= c()
> > for(elem in a){data=cbind(elem = 2,data)}
> > data
>     elem elem elem
> [1,]    2    2    2
>
> instead I want 'elem' to be substituted by the string in the list. Doing
> it by hand would be:
> > data = cbind('one'=2,data)
> > data = cbind('two'=2,data)
> > data = cbind('three'=2,data)
> > data
>     'one' 'two' 'three'
> [1,]    2    2    2
>
> I guess that the clue would be in sub(), gsub(), paste() or similar but I
> didn't get it to work.
>
> I am coming from Python where we would do something like:
> > a = ['one','two','three']
> > data = {}
>
> > for elem in a:
> >       data[elem] = 2
>
> > data
> {'three': 2, 'two': 2, 'one': 2}
>
> Thanks, Pedro
>
>
> -
> Prof. Dr. P. Martinez Arbizu
> DZMB-Forschungsinstitut Senckenberg
>
> Suedstrand 44
> D-26382 Wilhelmshaven
> Germany
>
> Tel: +49 (0)4421 9475-100
> Fax: +49 (0)4421 9475-111
>
> Email: pmarti...@senckenberg.de
>
> Senckenberg Gesellschaft für Naturforschung
> Rechtsfähiger Verein gemäß § 22 BGB
> Senckenberganlage 25
> 60325 Frankfurt
> Direktorium: Prof. Dr. Dr. h.c. Volker Mosbrugger, Prof. Dr. Michael
> Türkay, Dr. Johannes Heilmann, Prof. Dr. Pedro Martinez Arbizu, Prof.
> Dr. Georg Zizka, Prof. Dr. Uwe Fritz
> Vorsitzender des Präsidiums: Dietmar Schmid
> Aufsichtsbehörde: Magistrat der Stadt Frankfurt am Main (Ordnungsamt)
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.




--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] string substitution for argument in function

2012-03-25 Thread Weidong Gu
Hi,

This may help

 a = c('one','two','three')
data.frame(eval(substitute(rbind(var, 2), list(var = a))))

?substitute
?eval

Weidong Gu


On Sun, Mar 25, 2012 at 5:22 PM, Pedro Martinez
 wrote:
> hello,
> I want to iterate through a list of names and use each element as an
> argument in a function. For instance:
>
>> a = c('one','two','three')
>> data= c()
>> for(elem in a){data=cbind(elem = 2,data)}
>> data
>     elem elem elem
> [1,]    2    2    2
>
> instead I want 'elem' to be substituted by the string in the list. Doing
> it by hand would be:
>> data = cbind('one'=2,data)
>> data = cbind('two'=2,data)
>> data = cbind('three'=2,data)
>> data
>     'one' 'two' 'three'
> [1,]    2    2    2
>
> I guess that the clue would be in sub(), gsub(), paste() or similar but I
> didn't get it to work.
>
> I am coming from Python where we would do something like:
>> a = ['one','two','three']
>> data = {}
>
>> for elem in a:
>>       data[elem] = 2
>
>> data
> {'three': 2, 'two': 2, 'one': 2}
>
> Thanks, Pedro
>
>
> -
> Prof. Dr. P. Martinez Arbizu
> DZMB-Forschungsinstitut Senckenberg
>
> Suedstrand 44
> D-26382 Wilhelmshaven
> Germany
>
> Tel: +49 (0)4421 9475-100
> Fax: +49 (0)4421 9475-111
>
> Email: pmarti...@senckenberg.de
>
> Senckenberg Gesellschaft für Naturforschung
> Rechtsfähiger Verein gemäß § 22 BGB
> Senckenberganlage 25
> 60325 Frankfurt
> Direktorium: Prof. Dr. Dr. h.c. Volker Mosbrugger, Prof. Dr. Michael
> Türkay, Dr. Johannes Heilmann, Prof. Dr. Pedro Martinez Arbizu, Prof.
> Dr. Georg Zizka, Prof. Dr. Uwe Fritz
> Vorsitzender des Präsidiums: Dietmar Schmid
> Aufsichtsbehörde: Magistrat der Stadt Frankfurt am Main (Ordnungsamt)
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] svycoxph and test statistics

2012-03-25 Thread Terry Therneau

On 03/24/2012 06:00 AM, r-help-requ...@r-project.org wrote:

I have been using the function 'svycoxph' in the Dr. Lumley's survey package 
(version 3.26) to compute coefficient estimates for Cox regression.

I have noticed the p-values output are based on normal distribution (like in 
coxph); however in svyglm (and in other software, such as Stata or SAS) the 
p-values are computed via the t distribution with degrees of freedom equal to 
the number of PSUs minus number of strata.

I am wondering why there is a difference here?
I'm not aware of any theory papers that back up the use of a 
t-distribution.  This is a Cox model, and "do what my Gaussian package 
does" is not usually the best approach.  I'm far from an expert in 
survey work though, so I'll yield to Thomas L for a definitive answer.
In the case of mixed effects models I see the exact same leaning 
towards (approximate) REML vs ML; this is an area that I do know deeply, 
and the "REML better than ML" arguments from linear mixed effects 
models do NOT transfer over.


Terry Therneau

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Format wanted...

2012-03-25 Thread Hasan Diwan
Duncan,

On 25 March 2012 15:28, Duncan Murdoch  wrote:
> In case anyone is interested, I want to output code in a language (GLSL)
> that sees 1 and 1. as different types.  I want a floating point value, so I
> need the decimal point.

GLSL, assuming it's the one that I'm looking at[1], supports implicit
conversion from integer to float by appending ".0" to the end.

-- 
Sent from my mobile device
Envoyait de mon portable
1. http://www.opengl.org/wiki/GLSL_Types#Implicit_conversion

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Accessing more than two coefficients in a plot

2012-03-25 Thread Bert Gunter
Well, as a line in the plane is determined by 2 coefficients only, I'd
guess that trying to find an R function that plots a line defined by 4
coefficients has about the same chance of success as finding a unicorn
with 3 horns.

You do understand that your linear model defines a hyperplane in your
three covariates, do you not? Or do I misunderstand what you have
requested?

Cheers,
Bert

On Sun, Mar 25, 2012 at 2:32 PM, FJ M  wrote:
>
> I've successfully plotted (in the plot and abline code below) a simple 
> regression of Lambda1_2 on VV1_2. I then successfully regressed Lambda1_2 on 
> VV1_2, VV1_22 and VV1_212 producing lm2.l. When I go to plot lm2.l using 
> abline I get the warning:
>
> "1: In abline(lm2.l, col = "brown", lty = "dotted", lwd = 2) : only using the 
> first two of 4 regression coefficients"
>
> Is there another function like abline that will produce a line using the 
> constant and three coefficients from the lm2.l regression?
>
>
> lm.l <- lm(Lambda1_2 ~ VV1_2, method = "qr", model = TRUE, x = FALSE, y = 
> FALSE, qr = TRUE) # unweighted regression
>
> lm2.l <- lm(Lambda1_2 ~ VV1_2 + VV1_22 + VV1_212, method = "qr", model = 
> TRUE, x = FALSE, y = FALSE, qr = TRUE) # unweighted regression
>
> plot(VV1_2, Lambda1_2, ylim=yrange, tck=1, main="V(1) Parameters (V, V^2 & 
> V^0.5)", xlab="VV1_2", ylab="Lambda & Beta1_2",pch=19,col="red")
> {abline(lm2.l, col="brown", lty="dotted", lwd=2)
> abline(wlm2.l, col="gold",lty="longdash", lwd=2)
> points(VV1_2, Beta1_2, pch=19, col="blue")
> abline(lm2.b, col="black",lty="dotted", lwd=2)
> abline(wlm2.b, col="blue", lty="longdash", lwd=2)
> legend("topright", inset=.05, title="Parameters",
> labels, lwd=2, lty=c(1, 1, 1, 1, 2), col=colors)
> }
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Format wanted...

2012-03-25 Thread Duncan Murdoch

On 12-03-25 10:45 AM, Marc Schwartz wrote:


On Mar 25, 2012, at 7:14 AM, Duncan Murdoch wrote:


On 12-03-24 10:47 PM, J Toll wrote:

On Sat, Mar 24, 2012 at 7:30 PM, Duncan Murdoch
   wrote:

Do we have a format that always includes a decimal point and a given number
of significant digits, but otherwise drops unnecessary characters?  For
example, if I wanted 5 digits, I'd want the following:

Round to 5 digits:
1.234567  ->   "1.2346"

Drop unnecessary zeros:
1.23  ->   "1.23"

Force inclusion of a decimal point:
1 ->   "1."



Duncan,

Maybe sprintf() will work for you.  As it's a wrapper for C sprintf,
it should have its functionality.


Maybe, but with which format string?

Duncan Murdoch



I don't believe (though could be wrong), that you can do it all with one format string, 
but can do it conditionally based upon the input. According to the C printf 
documentation, the use of "#" forces a decimal point to be present, even if 
there are no trailing digits. Thus:


sprintf("%#.f", 1)

[1] "1."

The other two values seem to be handled by signif() when applied to each value 
individually:


signif(1.234567, 5)

[1] 1.2346


signif(1.23, 5)

[1] 1.23

But, not when a vector:


signif(c(1.234567, 1.23), 5)

[1] 1.2346 1.2300


So, wrapping that inside a function, using ifelse() to test for an integer 
value:

signif.d <- function(x, digits)
{
   ifelse(x == round(x),
          sprintf("%#.f", x),
          signif(x, digits))
}


x<- c(1.234567, 1.23, 1)


signif.d(x, 5)

[1] "1.2346" "1.23"   "1."


signif.d(x, 6)

[1] "1.23457" "1.23""1."


signif.d(x, 7)

[1] "1.234567" "1.23" "1."


Not extensively tested of course, but hopefully that might work for your needs 
Duncan.


Thanks.  I had put together a different conditional (just do the 
conversion, then add a decimal point at the end if none is seen), but I 
was surprised that there was no standard format for this.


In case anyone is interested, I want to output code in a language (GLSL) 
that sees 1 and 1. as different types.  I want a floating point value, 
so I need the decimal point.


Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Accessing more than two coefficients in a plot

2012-03-25 Thread FJ M

I've successfully plotted (in the plot and abline code below) a simple 
regression of Lambda1_2 on VV1_2. I then successfully regressed Lambda1_2 on 
VV1_2, VV1_22 and VV1_212 producing lm2.l. When I go to plot lm2.l using abline 
I get the warning:

"1: In abline(lm2.l, col = "brown", lty = "dotted", lwd = 2) : only using the 
first two of 4 regression coefficients"

Is there another function like abline that will produce a line using the 
constant and three coefficients from the lm2.l regression?


lm.l <- lm(Lambda1_2 ~ VV1_2, method = "qr", model = TRUE, x = FALSE, y = 
FALSE, qr = TRUE) # unweighted regression

lm2.l <- lm(Lambda1_2 ~ VV1_2 + VV1_22 + VV1_212, method = "qr", model = TRUE, 
x = FALSE, y = FALSE, qr = TRUE) # unweighted regression

plot(VV1_2, Lambda1_2, ylim=yrange, tck=1, main="V(1) Parameters (V, V^2 & 
V^0.5)", xlab="VV1_2", ylab="Lambda & Beta1_2",pch=19,col="red") 
{abline(lm2.l, col="brown", lty="dotted", lwd=2)
abline(wlm2.l, col="gold",lty="longdash", lwd=2)
points(VV1_2, Beta1_2, pch=19, col="blue")
abline(lm2.b, col="black",lty="dotted", lwd=2)
abline(wlm2.b, col="blue", lty="longdash", lwd=2)
legend("topright", inset=.05, title="Parameters",
labels, lwd=2, lty=c(1, 1, 1, 1, 2), col=colors)
} 
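
A hedged sketch (not from the original message): with three covariates the fit is a (hyper)plane, so there is no single line for abline() to draw. If, as the plot title "V, V^2 & V^0.5" suggests, VV1_22 and VV1_212 are just VV1_2^2 and sqrt(VV1_2) (an assumption), a fitted curve can be added with predict() over a grid of VV1_2 values:

newd <- data.frame(VV1_2 = seq(min(VV1_2), max(VV1_2), length.out = 200))
newd$VV1_22  <- newd$VV1_2^2        # assumed relationship
newd$VV1_212 <- sqrt(newd$VV1_2)    # assumed relationship
lines(newd$VV1_2, predict(lm2.l, newdata = newd),
      col = "brown", lty = "dotted", lwd = 2)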
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] string substitution for argument in function

2012-03-25 Thread Pedro Martinez
hello,
I want to iterate through a list of names and use each element as an
argument in a function. For instance:

> a = c('one','two','three')
> data= c()
> for(elem in a){data=cbind(elem = 2,data)}
> data
 elem elem elem
[1,]222

instead I want 'elem' to be substituted by the string in the list. Doing
it by hand would be:
> data = cbind('one'=2,data)
> data = cbind('two'=2,data)
> data = cbind('three'=2,data)
> data
 'one' 'two' 'three'
[1,]222

I guess that the clue would be in sub(), gsub(), paste() or similar but I
didn't get it to work.

I am coming from Python where we would do something like:
> a = ['one','two','three']
> data = {}

> for elem in a:
>   data[elem] = 2
 
> data
{'three': 2, 'two': 2, 'one': 2}

Thanks, Pedro


-
Prof. Dr. P. Martinez Arbizu
DZMB-Forschungsinstitut Senckenberg

Suedstrand 44
D-26382 Wilhelmshaven
Germany

Tel: +49 (0)4421 9475-100
Fax: +49 (0)4421 9475-111

Email: pmarti...@senckenberg.de

Senckenberg Gesellschaft für Naturforschung
Rechtsfähiger Verein gemäß § 22 BGB
Senckenberganlage 25
60325 Frankfurt
Direktorium: Prof. Dr. Dr. h.c. Volker Mosbrugger, Prof. Dr. Michael
Türkay, Dr. Johannes Heilmann, Prof. Dr. Pedro Martinez Arbizu, Prof.
Dr. Georg Zizka, Prof. Dr. Uwe Fritz
Vorsitzender des Präsidiums: Dietmar Schmid
Aufsichtsbehörde: Magistrat der Stadt Frankfurt am Main (Ordnungsamt)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R Error : DATA to MATRIX

2012-03-25 Thread ritwik_r
Thanks David, your suggestion works fine. Btw, I have another
question: if I set (n,m) a little bit larger, say (n=20, m=10), R cannot
handle the large data frame generated through "expand.grid". Is there
any way to increase R's memory so that I can tackle a large data.frame
in R?

regards
ritwik

>
> On Mar 23, 2012, at 2:53 AM, ritwi...@isical.ac.in wrote:
>
>> Dear Sir/Madam,
>>
>> I'm getting a problem with a R-code which converts a data frame to a
>> matrix.
>>
>> It first generate a (m^(n-m) * m) matrix A and then regenerate another
>> matrix B having less dimension than A which satisfy some condition.
>> Now I
>> wish to assign each row of B to a vector as individual.
>>
>> My problem is when I set any choice of (n,m) except m=1 it works
>> fine but
>> setting m=1 I got the error : Error in B[i, ] : incorrect number of
>> dimensions.
>>
>> Moreover if (n,m) is large (say, (20,8)) I got the error : Error:
>> cannot
>> allocate vector of size 3.0 Gb. I know this is due to large
>> dimension of
>> matrix A. How to solve this problem.
>>
>> My code is given below:
>>
>> **
>>
>> n=5
>> m=3
>> R=numeric(0)
>> # Generate all possible m-tuple ( variables having range 0 to n  )
>> in a (
>> m^(n-m) * m ) matrix
>>
>> r = expand.grid(rep(list(0:(n-m)), m))
>>
>> write.table(r,file="test.txt",row.names=FALSE,col.names=FALSE)
>>
>> a= read.table(file="test.txt",sep="",header=FALSE)
>>
>> A= data.matrix(a)
>>
>> #.
>>
>> # Generate matrix whose rowsum = n-m
>>
>> meet.crit = apply(A, 1, function(.row) any((sum(.row)) == n-m))  #
>> criterion for being rowsum = n-m
>>
>> cbind(A, meet.crit)  # Checking rowsum = n-m for each row
>>
>> B=A[meet.crit,]
>
> At this point the default behavior of the "[" function is to return a
> vector rather than a matrix. You need to add drop=FALSE as an
> additional argument. Read the help page for ?"[".
>
>   #
>> Generate matrix
>>
>> #.
>>
>>
>> for(i in 1:choose(n-1,m-1)){
>> R=B[i,]
>> }
>>
>> ***
>>
>> Can you please help me how to get rid of these errors. Thanking you in
>> advance.
>>
>> Regards
>>
>> Ritwik Bhattacharya
>>
>>
>> Senior Research Fellow
>> SQC & OR UNIT, KOLKATA
>> INDIAN STATISTICAL INSTITUTE
>
> --
>
> David Winsemius, MD
> West Hartford, CT
>
>
>



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R numerical integration

2012-03-25 Thread casperyc
The quadinf command in library  pracma still fails when mu=-2.986731 with
sigma=53415.18.
While Maple gives me an estimate of 0.5001701024.

Maple: (for those who are interested)
myf:=(mu,sigma)->
evalf(Int(exp(-(x-mu)^2/2/sigma^2)/sigma/sqrt(2*Pi)/(1+exp(-x)),
x=-infinity..infinity));
myf(-2.986731, 53415.18 );
0.5001701024


These 'mu's and 'sigma's are now random starting points I generated for an
optimization problem like I have mentioned.

I should really investigate the behavior of this function before I ask R
to do the integration. As I mentioned, I had already realized the
integral is between 0 and 1, and I have had a look at the contour plots for
different mu and sigma. I am going to 'restrict' mu and sigma to certain
(small) values, and still get the integral to produce a value between 0 and
1.
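
A hedged sketch (not from the original post): standardising the normal kernel first (z = (x - mu)/sigma) keeps the integrand O(1) on the z scale, which often helps adaptive quadrature when sigma is huge; the result is not guaranteed for every (mu, sigma), so compare against a reference value.

mu <- -2.986731; sigma <- 53415.18
f_z <- function(z) dnorm(z) * plogis(mu + sigma * z)  # plogis(x) = 1/(1+exp(-x))
integrate(f_z, -Inf, Inf, rel.tol = 1e-10)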

All of your help is much appreciated.

casper

-
###
PhD candidate in Statistics
School of Mathematics, Statistics and Actuarial Science, University of Kent
###

--
View this message in context: 
http://r.789695.n4.nabble.com/R-numerical-integration-tp4500095p4503766.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] 'names' attribute must be the same length as the vector

2012-03-25 Thread jiefan
I have met into this problem when I tried to run panel regression by plm.
My code:

library(plm)
indus <- read.csv(file="full.csv",header=TRUE)
industry<-as.data.frame(indus)
reg<-lm(LnTSO2 ~ LnPGDP + LnPGDP2 + LnSOES + LnCOES + LnLIMD + 
LnSHOLD + LnPRIV + LnFIEs + LnEXP + LnIMP + LnLEXRE + LnVALTAX + 
LnIND1 + LnIND2 + LnIND3 + LnIND4 + LnIND5 + LnIND6 + LnIND7 + 
LnIND8 + LnIND9, data=industry, model = "pooling")
*Error in names(y) <- namesy : 
  'names' attribute [1144] must be the same length as the vector [0]*

My data is attached showing as a screenshot.
http://r.789695.n4.nabble.com/file/n4503946/%E6%9C%AA%E5%91%BD%E5%90%8D.jpg 

Does anyone know why this is happening and how to fix it?
Thanks very much.
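
One possible cause (a guess, not a confirmed diagnosis): model = "pooling" is an argument of plm(), not of lm(), and plm() also needs to know the panel index. A minimal sketch of that call, where "id" and "year" stand in for whatever columns identify the panel units and time periods:

library(plm)
industry <- read.csv(file = "full.csv", header = TRUE)
reg <- plm(LnTSO2 ~ LnPGDP + LnPGDP2 + LnSOES + LnCOES,   # remaining regressors as in the original formula
           data = industry, index = c("id", "year"), model = "pooling")
summary(reg)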



--
View this message in context: 
http://r.789695.n4.nabble.com/names-attribute-must-be-the-same-length-as-the-vector-tp4503946p4503946.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] limma design matrix

2012-03-25 Thread statfan
I am trying to construct my design matrix needed in the {limma} function
"lmfit" but am having trouble with the formula I am to specify in the
function "model.matrix".  Namely when to I use ~0 + factors (ex 1) vs ~-1 +
factors (ex 2).  Any clarification on this would be greatly appreciated. 
Thanks in advanced.  

Ex 1:
  f <- factor(targets$Target, levels=c("RNA1","RNA2","RNA3"))
design <- model.matrix(~0+f)
[page 409, of Bioinformatics and computational biology solutions using R and
Bioconductor]

and 

Ex 2:
library(ALL)
pdat <- pData(ALL)
design <- model.matrix(~ -1 + factor(pdat$type))
[page 237, of Bioinformatics and computational biology solutions using R and
Bioconductor]
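
A quick way to check that the two forms do the same thing (a small sketch, not from the book): both ~ 0 + f and ~ -1 + f suppress the intercept, so model.matrix() returns one indicator column per factor level in either case:

f <- factor(c("RNA1", "RNA2", "RNA3", "RNA1"))
colnames(model.matrix(~ 0 + f))    # "fRNA1" "fRNA2" "fRNA3"
colnames(model.matrix(~ -1 + f))   # the same three columns, no intercept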



--
View this message in context: 
http://r.789695.n4.nabble.com/limma-design-matrix-tp4503651p4503651.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] avoiding for loops

2012-03-25 Thread Richard M. Heiberger
> df1 <- data.frame(group=c("red", "red", "red", "blue", "blue",
"blue"), id=c("A", "B", "C", "D", "E", "F"))
df1 <- data.frame(group=c("red", "red", "red", "blue", "blue",
+ "blue"), id=c("A", "B", "C", "D", "E", "F"))
> df1
  group id
1   red  A
2   red  B
3   red  C
4  blue  D
5  blue  E
6  blue  F
> with(df1, split(id, group))
$blue
[1] D E F
Levels: A B C D E F

$red
[1] A B C
Levels: A B C D E F
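
If an explicit *apply call is preferred (the original question asked for one), the same result can be obtained with lapply() over the group levels; a small sketch:

grps <- levels(df1$group)
setNames(lapply(grps, function(g) df1$id[df1$group == g]), grps)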




On Sun, Mar 25, 2012 at 4:46 PM, Ed Siefker  wrote:

> I have data that looks like this:
>
> > df1
>  group id
> 1   red  A
> 2   red  B
> 3   red  C
> 4  blue  D
> 5  blue  E
> 6  blue  F
>
>
> I want a list of the groups containing vectors with the ids.I am
> avoiding subset(), as it is
> only recommended for interactive use.  Here's what I have so far:
>
> df1 <- data.frame(group=c("red", "red", "red", "blue", "blue",
> "blue"), id=c("A", "B", "C", "D", "E", "F"))
>
> groups <- levels(df1$group)
> byid <- lapply(groups, "==", df1$group)
> groupIDX <- lapply(byid, which)
>
> > groupIDX
> [[1]]
> [1] 4 5 6
>
> [[2]]
> [1] 1 2 3
>
>
>
> This gives me a list of the indices for each group.  I want to subset
> df1 based on this list.
> If I want just one group I can do this:
>
> > df1[groupIDX[[1]],]$id
> [1] D E F
>
>
> What sort of statement should I use if I want a result like:
> [[1]]
> [1] D E F
> Levels: A B C D E F
>
> [[2]]
> [1] A B C
> Levels: A B C D E F
>
>
> So far, I've used a for loop.  Can I express this with apply statements?
>
> groupIDs <- list(1:length(groupIDX))
> groupData<-
> for (i in 1:length(groupIDX)) {
>groupIDs[[i]] <- df1[groupIDX[[i]],]$id
>}
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] avoiding for loops

2012-03-25 Thread Ed Siefker
I have data that looks like this:

> df1
  group id
1   red  A
2   red  B
3   red  C
4  blue  D
5  blue  E
6  blue  F


I want a list of the groups containing vectors with the ids. I am
avoiding subset(), as it is
only recommended for interactive use.  Here's what I have so far:

df1 <- data.frame(group=c("red", "red", "red", "blue", "blue",
"blue"), id=c("A", "B", "C", "D", "E", "F"))

groups <- levels(df1$group)
byid <- lapply(groups, "==", df1$group)
groupIDX <- lapply(byid, which)

> groupIDX
[[1]]
[1] 4 5 6

[[2]]
[1] 1 2 3



This gives me a list of the indices for each group.  I want to subset
df1 based on this list.
If I want just one group I can do this:

> df1[groupIDX[[1]],]$id
[1] D E F


What sort of statement should I use if I want a result like:
[[1]]
[1] D E F
Levels: A B C D E F

[[2]]
[1] A B C
Levels: A B C D E F


So far, I've used a for loop.  Can I express this with apply statements?

groupIDs <- vector("list", length(groupIDX))
for (i in 1:length(groupIDX)) {
    groupIDs[[i]] <- df1[groupIDX[[i]], ]$id
}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Weird POSIXct behaviour

2012-03-25 Thread Joshua Ulrich
On Sun, Mar 25, 2012 at 3:01 PM, Worik R  wrote:
> My bad.  I should be clearer about the source of my confusion.
>
> Given two identical string representations of POSIXct objects, can the two
> objects represent different times?
>
Yes.  Here's an example (from my Ubuntu machine) of one way:

> (t1 <- Sys.time()); (t2 <- Sys.time())+0.001; t1 == t2
[1] "2012-03-25 15:13:48 CDT"
[1] "2012-03-25 15:13:48 CDT"
[1] FALSE
> options(digits.secs=3)
> (t1 <- Sys.time()); (t2 <- Sys.time())+0.001; t1 == t2
[1] "2012-03-25 15:17:36.520 CDT"
[1] "2012-03-25 15:17:36.523 CDT"
[1] FALSE
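
If times that differ by less than some tolerance should compare as equal, the comparison has to allow for that explicitly; two possibilities (a sketch, not part of the original reply):

abs(as.numeric(t1) - as.numeric(t2)) < 1e-3      # compare the underlying seconds
isTRUE(all.equal(t1, t2, tolerance = 1e-3))      # all.equal() takes a tolerance for date-times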

>
>> > Where has it gone?
>> >
>> It's hard to say, especially since you give no indication how you
>> assigned the value to Time.  A reproducible example, as requested in
>> the posting guide, would be helpful.  Also, what version of R, xts,
>> and zoo are you using?
>>
>>
> Browse[2]> Time
> [1] "2012-03-20 00:59:57 NZDT"
>
> Browse[2]> index(DATA.ba[[p]]["2012-03-20 00:59:57","bid"])
> [1] "2012-03-20 00:59:57 NZDT"
>
> A reproducible example would be huge at this point.  WHat I need is an
> answer to that simple question.  If the answer is  "No" then it is worth
> doing the work for a reproducible example.  If "Yes" I need to learn why
> and better ways of pasing the objects in and out of matrices and vectors.
>
No need for it to be huge.  dput(DATA.ba[[p]]["2012-03-20
00:59:57","bid"]) would be a sufficient start.

> Using R version  2.12.1
>
> The zoo documentation (?zoo) does not include a version.  Where can I find
> it?
>
From the output of sessionInfo(), or packageDescription("zoo").  The
output from sessionInfo() would be more helpful because it provides
more information about your installation.

> Worik
>
>

--
Joshua Ulrich  |  FOSS Trading: www.fosstrading.com

R/Finance 2012: Applied Finance with R
www.RinFinance.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Weird POSIXct behaviour

2012-03-25 Thread Worik R
My bad.  I should be clearer about the source of my confusion.

Given two identical string representations of POSIXct objects, can the two
objects represent different times?


> > Where has it gone?
> >
> It's hard to say, especially since you give no indication how you
> assigned the value to Time.  A reproducible example, as requested in
> the posting guide, would be helpful.  Also, what version of R, xts,
> and zoo are you using?
>
>
Browse[2]> Time
[1] "2012-03-20 00:59:57 NZDT"

Browse[2]> index(DATA.ba[[p]]["2012-03-20 00:59:57","bid"])
[1] "2012-03-20 00:59:57 NZDT"

A reproducible example would be huge at this point.  What I need is an
answer to that simple question.  If the answer is "No" then it is worth
doing the work for a reproducible example.  If "Yes" I need to learn why
and better ways of passing the objects in and out of matrices and vectors.

Using R version  2.12.1

The zoo documentation (?zoo) does not include a version.  Where can I find
it?

Worik


>
> > Looking closer
> >
> >
> > Browse[2]> index(DATA.ba[[p]]["2012-03-20 00:59:57","bid"])
> > [1] "2012-03-20 00:59:57 NZDT"
> > Browse[2]>
> >
> > Browse[2]> Time
> > [1] "2012-03-20 00:59:57 NZDT"
> > Browse[2]>
> >
> > Browse[2]> index(DATA.ba[[p]]["2012-03-20 00:59:57","bid"]) == Time
> > [1] FALSE
> > Browse[2]>
> >
> > Browse[2]> class(Time)
> > [1] "POSIXct" "POSIXt"
> >
> > Browse[2]> class(index(DATA.ba[[p]]["2012-03-20 00:59:57","bid"]))
> > [1] "POSIXct" "POSIXt"
> >
> > So the variable 'Time' should be good to index DATA.ba[[p]].  I am
> > hopelessly confused.
> >
> The printed representation of the index value and class cannot be used
> to accurately determine equality.
>
> > cheers
> > Worik
> >
>
> --
> Joshua Ulrich  |  FOSS Trading: www.fosstrading.com
>
> R/Finance 2012: Applied Finance with R
> www.RinFinance.com
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Nonparmetric statistics with weighted data set?

2012-03-25 Thread Thomas Lumley
On Mon, Mar 26, 2012 at 8:45 AM, Barnet Wagman  wrote:
> I'm doing some work with a weighted data set (the CPS) and I'd like to use
> nonparametric techniques. Is there an R package that supports this?  The np
> package doesn't appear to have provisions for weighted data. (Or have I
> missed something?)  I need to perform both density estimation and
> regressions.

You may need to be more precise about what you mean by 'nonparametric
techniques', but if you want smoothing you can probably do what you
want with the survey package.

svysmooth() does densities and both mean and quantile scatterplot smoothing.

For regression models with smooth terms you would have to use
regression splines in svyglm(), but they are almost indistinguishable
from smoothing splines in practice.

For other sorts of non-parametric techniques there are two-sample rank
tests in the survey package.

   -thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland
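
For concreteness, a minimal sketch of the survey-package route (the data frame cps, the weight column wt, and the variables income and age are all made-up names, and the bandwidths are arbitrary):

library(survey)
library(splines)
des  <- svydesign(ids = ~1, weights = ~wt, data = cps)
dens <- svysmooth(~income, design = des, bandwidth = 5000)   # weighted density estimate
plot(dens)
sm   <- svysmooth(income ~ age, design = des, bandwidth = 5) # weighted mean smoother
plot(sm)
fit  <- svyglm(income ~ ns(age, df = 4), design = des)       # regression spline term
summary(fit)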

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] multiple hexbin plots with varying greatest densities

2012-03-25 Thread Thomas Lumley
On Mon, Mar 26, 2012 at 4:59 AM, Wayne Gray  wrote:
> Greetings:
>
> I have multiple hexbin plots with varying greatest densities.
>
> Right now, the data on each plot varies from 1-256 levels of density. The 
> problem with that is that in Plot A the data are more scattered across a 2 x 
> 2 grid whereas in Plot B the data are more concentrated in fewer cells. 
> Hence, Plot A has a few very dense grid cells and plot B has nothing as dense 
> as plot A.
>
> (In Plot A the legend shows that the range of densities per cell is 1-700 
> whereas for Plot B it is 1-453. Both plots use the full range of available 
> colors with 256 gradients.)
>
> I want people to be able to compare the two plots and being able to see the 
> differences in variability of the densities is one of the things I want them 
> to see. AT PRESENT, each plot uses the full range of densities; hence, this 
> makes it seem as if each plot has the maximum highest and minimum lowest 
> densities.
>
> How can I do this?

With the 'lattice' or 'centroids' styles you can use the maxarea=
argument to plot.hexbin or to grid.hexagons.

survey::svycoplot() is an example of this: you can either have the
same scale across panels or have each panel use the full range of
sizes.

   -thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland
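
A rough sketch of putting two hexbin plots on a common density scale (xA, yA, xB, yB are made-up data vectors; the exact argument list should be checked against ?gplot.hexbin and ?grid.hexagons):

library(hexbin)
hbA <- hexbin(xA, yA)
hbB <- hexbin(xB, yB)
cnt.max <- max(hbA@count, hbB@count)   # largest cell count across both plots
plot(hbA, style = "centroids", maxcnt = cnt.max, maxarea = 0.8)
plot(hbB, style = "centroids", maxcnt = cnt.max, maxarea = 0.8)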

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Nonparmetric statistics with weighted data set?

2012-03-25 Thread Barnet Wagman
I'm doing some work with a weighted data set (the CPS) and I'd like to 
use nonparametric techniques. Is there an R package that supports this?  
The np package doesn't appear to have provisions for weighted data. (Or 
have I missed something?)  I need to perform both density estimation and 
regressions.


thanks,

bw

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Handling 8GB .txt file in R?

2012-03-25 Thread steven mosher
As the other poster noted, you can just skip lines.

Big matrix should work just fine, except I am not sure how the dates will
be handled

Here is some sample code from my stuff
txtName     is the name of the file you are reading
Directory   is the path where you want to write the file-backed matrix
filename    is the file-backed big matrix
dname       is a filename for describing the data
sep         is your separator (comma or space?); below I use tab,
            because my file is tab delimited

replace my column names with yours

PERMNO  DATE  TICKER  PERMCO  PRC
VOL  NUMTRD  vwretd  ewretd

Your dates may be coerced in factors. Not sure how that will work.
You can also try ff

options(bigmemory.allow.dimnames = TRUE)
D <- read.big.matrix(txtName, skip = 5,
                     backingpath = Directory,
                     backingfile = filename,
                     descriptorfile = dname,
                     sep = "\t",
                     type = "double",
                     col.names = c("Id", "SeriesNo", "Date", "Temp",
                                   "Unc", "Obs", "Tobs"))



On Sat, Mar 24, 2012 at 9:20 PM, iliketurtles  wrote:

> Thanks to all the suggestions. To the first individual that replied, I
> can't
> do any stuff with unix or perl. All I know is R.
>
> @KEN:
> I'm using Windows 7, 64 bit.
>
> @Steve:
> Here's the readLines output.. As we can see, lines 1-3 are empty and line 5
> is empty, and there's also empty elements after line 5!.
>
>  [1] " "
>  [2] "
> "
>  [3] " "
>  [4] "  PERMNO  DATETICKERPERMCO PRC
> VOLNUMTRDvwretdewretd"
>  [5] ""
>  [6] "   106/01/19867952  .
> . . -0.000138  0.001926"
>  [7] "   107/01/1986OMFGA   7952-2.56250
> 1000 .  0.013809  0.011061"
>  [8] "   108/01/1986OMFGA   7952-2.5
> 12800 . -0.020744 -0.005117"
>  [9] "   109/01/1986OMFGA   7952-2.5
> 1400 . -0.011219 -0.011588"
>  [10] "   110/01/1986OMFGA   7952-2.5
> 8500 .  0.83  0.003651"
>  [11] "   113/01/1986OMFGA   7952-2.62500
> 5450 .  0.002749  0.002433"
>
> -
> 
>
> Isaac
> Research Assistant
> Quantitative Finance Faculty, UTS
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Handling-8GB-txt-file-in-R-tp4500971p4502706.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reading big files in chunks-ff package

2012-03-25 Thread Jan van der Laan
The 'normal' way of doing that with ff is to first convert your csv  
file completely to a
ffdf object (which stores its data on disk so shouldn't give any  
memory problems). You
can then use the chunk routine (see ?chunk) to divide your data in the  
required chunks.


Untested so may contain errors:

ffdf <- read.table.ffdf(...)

chnks <- chunk(from=1, to=nrow(ffdf), by=5E6, method='seq')

for (chnk in chnks) {
  # read data
  data <- ffdf[chnk, ]
  # do your thing with the data
  # clean up
  rm(data)
  gc()
}


If you want to process your csv file directly in chunks, you could  
also have a look at
the LaF package. Especially the process_blocks routine which does  
exactly that. The
manual vignette  
(http://cran.r-project.org/web/packages/LaF/vignettes/LaF-manual.pdf)

contains some examples how to do that.

Jan



Quoting Mav :


Thank you Jan

My problem is the following:
For instance, I have 2 files with different number of rows (15 million and 8
million of rows each).
I would like to read the first one in chunks of 5 million each. However
between the first and second chunk, I would like to analyze those first 5
million of rows, write the analysis in a new csv and then proceed to read
and analyze the second chunk and so on until the third chunk. With the
second file, I would like to do the same...read the first chunk, analyze it
and continue to read the second and analyze it.

Basically my problem is that I manage to read the filesbut with so many
rows...I cannot do any analyses (even filtering the rows) because of the RAM
restrictions.

Sorry if is still not clear.

Thank you

--
View this message in context:   
http://r.789695.n4.nabble.com/Reading-big-files-in-chunks-ff-package-tp4502070p4503642.html

Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Multivariate function from univariate functions

2012-03-25 Thread Petr Savicky
On Sat, Mar 24, 2012 at 08:04:49PM -0700, physicistintheory wrote:
> I'm relatively new to R and I'm stuck.
> 
> I'm trying to construct a surface to optimize from a multivariate dataset. 
> The dataset contains the response of a system to various stimuli.  I am
> trying to optimize the mix of stimuli to maximize the response.  To do so,
> I've interpolated the various datasets of type response vs. stimuli and I
> now have an array of functions "interps" whose length is the length of the
> array of the names of the various stimuli.  I've also created a vector
> containing names for the stimuli, vars = x.1, x.2, x.3...
> 
> Anyway, each of the functions in interps depends only on one variable
> (obviously).  I would like to construct a function, call it, "surface" which
> is essentially: surface(vars) = interps[[1]]vars[[1]] +
> interps[[2]]vars[[2]]+...
> 
> I've tried constructing surface recursively in a for loop:  
> surface <- function(vars){0}
> for(i in 1:length(vars)){
> surface <- function(vars){surface(vars) + interps[[i]](vars[[i]])}
> }

Hi.

Is the following close to what you are asking for?

  f1 <- function(x) 3*x^2 + x
  f2 <- function(x) 2*x^2 + 2*x
  f3 <- function(x) x^2 + 3*x
  lstf <- list(f1, f2, f3)
  args <- c("x2", "x4", "x5")

  # input as a matrix or data frame
  surface <- function(dat)
  {
  y <- rep(0, times=nrow(dat))
  for (i in seq.int(along=lstf)) {
  y <- y + lstf[[i]](dat[, args[i]])
  }
  y
  }

  dat <- matrix(1:35, ncol=5)
  colnames(dat) <- paste("x", 1:5, sep="")

  surface(dat)

  [1] 2140 2346 2564 2794 3036 3290 3556

  # compare with an explicit formula
  f1(dat[, "x2"]) + f2(dat[, "x4"]) + f3(dat[, "x5"])

  [1] 2140 2346 2564 2794 3036 3290 3556

  # input as a named vector
  surface1 <- function(x)
  {
  y <- 0
  for (i in seq.int(along=lstf)) {
  y <- y + lstf[[i]](x[args[i]])
  }
  unname(y)
  }

  vdat <- 1:5
  names(vdat) <- paste("x", 1:5, sep="")

  surface1(vdat) 

  [1] 94

  # compare again
  unname(f1(vdat["x2"]) + f2(vdat["x4"]) + f3(vdat["x5"]))

  [1] 94

Hope this helps.

Petr Savicky.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reading big files in chunks-ff package

2012-03-25 Thread Mav
Thank you Jan

My problem is the following:
For instance, I have 2 files with different number of rows (15 million and 8
million of rows each).
I would like to read the first one in chunks of 5 million each. However
between the first and second chunk, I would like to analyze those first 5
million of rows, write the analysis in a new csv and then proceed to read
and analyze the second chunk and so on until the third chunk. With the
second file, I would like to do the same...read the first chunk, analyze it
and continue to read the second and analyze it.

Basically my problem is that I manage to read the files, but with so many
rows I cannot do any analyses (even filtering the rows) because of the RAM
restrictions.

Sorry if it is still not clear.

Thank you

--
View this message in context: 
http://r.789695.n4.nabble.com/Reading-big-files-in-chunks-ff-package-tp4502070p4503642.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] multiple hexbin plots with varying greatest densities

2012-03-25 Thread Wayne Gray
Greetings:

I have multiple hexbin plots with varying greatest densities. 

Right now, the data on each plot varies from 1-256 levels of density. The 
problem with that is that in Plot A the data are more scattered across a 2 x 2 
grid whereas in Plot B the data are more concentrated in fewer cells. Hence, 
Plot A has a few very dense grid cells and plot B has nothing as dense as plot 
A. 

(In Plot A the legend shows that the range of densities per cell is 1-700 
whereas for Plot B it is 1-453. Both plots use the full range of available 
colors with 256 gradients.)

I want people to be able to compare the two plots and being able to see the 
differences in variability of the densities is one of the things I want them to 
see. AT PRESENT, each plot uses the full range of densities; hence, this makes 
it seem as if each plot has the maximum highest and minimum lowest densities. 

How can I do this?

Thanks,

Wayne Gray

--
View this message in context: 
http://r.789695.n4.nabble.com/multiple-hexbin-plots-with-varying-greatest-densities-tp4503405p4503405.html
Sent from the R help mailing list archive at Nabble.com.
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ROC Analysis

2012-03-25 Thread Camille Leclerc
Hi everybody,

Pascal, your code works, but when I want to do the graph I have an error
message. 

here is my code :
x<-rev(unlist(pred@cutoffs))
tpf<-unlist(performance(pred, "tpr")@y.values)
fpf<-unlist(performance(pred,"fpr")@y.values)
ll<-length(x)
p<-(tpf[1:(ll-1)]-tpf[2:ll])/(fpf[1:(ll-1)]-fpf[2:ll])
plot(x,p)

Error in xy.coords(x, y, xlabel, ylabel, log) : 
  'x' and 'y' lengths differ

So, when I look at the lengths of x and p, I have this:
x : numeric[1735]
p : numeric[1734]

On the other hand, it's normal since I have the slope between two points on
the ROC curve and so I have x points and x-1 slope values. How to get the
graph?!
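
One simple way around the length mismatch (a guess at what is wanted, not a confirmed fix): each slope belongs to the interval between two consecutive cutoffs, so it can be plotted against the interval midpoints, which also have length ll - 1:

xm <- (x[-1] + x[-ll]) / 2   # midpoints between consecutive cutoffs
plot(xm, p)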

All the best,
Camille

-
--
Camille Leclerc, Master student
Lab ESE, UMR CNRS 8079
Univ Paris-Sud
Bat 362
F-91405  Orsay Cedex FRANCE
--
View this message in context: 
http://r.789695.n4.nabble.com/ROC-Analysis-tp4469203p4503354.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to get the input of a function right?

2012-03-25 Thread Robert Baer
Your definition of x1, x2 and x3 requires the c() function.  Try leaving 
some space around operators for increased readability.
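
For example, the definitions quoted below would then read (a guess at what was intended):

x1 <- c(1, 2, 3, 4)
x2 <- c(2, 3, 4, 5)
x3 <- c(3, 4, 5, 6)
x  <- cbind(x1, x2, x3)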


Remember that the result of operations on sets containing NA is 
often NA as well.  My guess is that if you print out x it contains 
NA and that your function does operations on x that are included in the 
function result.


Rob

-Original Message- 
From: stella

Sent: Thursday, March 22, 2012 10:43 AM
To: r-help@r-project.org
Subject: [R] How to get the input of a function right?

Hi,

I wrote a function with three inputs fun(x,y,z).

x is a matrix of three vectors combined with cbind. e.g.

x1<-(1,2,3,4)
x2<-(2,3,4,5)
x3<-(3,4,5,6)

x<-cbind(x1,x2,x3)

y is a vector e.g
y<-c(7,8,9)

z is a real number e.g.
z<-2.5

If a give the function an input like this, I get 'NA' in return. If I give
the function a vector e.g c(1,2,3) instead of 'x' the function works just
fine.
Does anyone has an idea why the function would not except 'x' as an input?

Thank you very much!
Stella

--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-get-the-input-of-a-function-right-tp4495879p4495879.html

Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
Robert W. Baer, Ph.D.
Professor of Physiology
Kirksville College of Osteopathic Medicine
A. T. Still University of Health Sciences
800 W. Jefferson St.
Kirksville, MO 63501
660-626-2322
FAX 660-626-2965 


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Weird POSIXct behaviour

2012-03-25 Thread Joshua Ulrich
On Sun, Mar 25, 2012 at 3:15 AM, Worik R  wrote:
> Friends
>
> I have an xts that I wish to access.
>
> Browse[2]> DATA.ba[[p]]["2012-03-20 00:59:57","bid"]
>                       bid
> 2012-03-20 00:59:57 1.4993
>
> So far so good.
>
> Now putting the index into a variable:
>
> Browse[2]> Time
> [1] "2012-03-20 00:59:57 NZDT"
> Browse[2]> DATA.ba[[p]][Time, "bid"]
>     bid
>
> Where has it gone?
>
It's hard to say, especially since you give no indication how you
assigned the value to Time.  A reproducible example, as requested in
the posting guide, would be helpful.  Also, what version of R, xts,
and zoo are you using?

>
> Looking closer
>
>
> Browse[2]> index(DATA.ba[[p]]["2012-03-20 00:59:57","bid"])
> [1] "2012-03-20 00:59:57 NZDT"
> Browse[2]>
>
> Browse[2]> Time
> [1] "2012-03-20 00:59:57 NZDT"
> Browse[2]>
>
> Browse[2]> index(DATA.ba[[p]]["2012-03-20 00:59:57","bid"]) == Time
> [1] FALSE
> Browse[2]>
>
> Browse[2]> class(Time)
> [1] "POSIXct" "POSIXt"
>
> Browse[2]> class(index(DATA.ba[[p]]["2012-03-20 00:59:57","bid"]))
> [1] "POSIXct" "POSIXt"
>
> So the variable 'Time' should be good to index DATA.ba[[p]].  I am
> hopelessly confused.
>
The printed representation of the index value and class cannot be used
to accurately determine equality.

> cheers
> Worik
>

--
Joshua Ulrich  |  FOSS Trading: www.fosstrading.com

R/Finance 2012: Applied Finance with R
www.RinFinance.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Nonparametric bivariate distribution estimation and sampling

2012-03-25 Thread heyi xiao
David,
Great idea! It is simple yet very effective. I didn’t know that sample can be 
used with probability/weight. I feel that I learned a lot. Thanks a lot for the 
help!
Heyi


--- On Fri, 3/23/12, David Winsemius  wrote:

> From: David Winsemius 
> Subject: Re: [R] Nonparametric bivariate distribution estimation and sampling
> To: "heyi xiao" 
> Cc: "Sarah Goslee" , r-help@r-project.org
> Date: Friday, March 23, 2012, 5:29 PM
> 
> On Mar 23, 2012, at 3:55 PM, heyi xiao wrote:
> 
> > David,
> > Thanks a lot for the specific suggestions. That’s
> very helpful. My question 1 is fully answered now. I guess I
> am not clear enough for my question 2. I would like to
> generate a random sample using the estimated probability
> density (as a result of my question 1) as the reference
> distribution.
> 
> > Say, I get a matrix of the estimated density (at some
> grid points) using MASS::kde2d. How can I use that result as
> a reference distribution to sample data from? I know it is a
> trivial issue for parametric distributions like bivariate
> normal, but what about such a nonparametric bivariate
> reference distribution? Any particular procedures or
> functions I can use?
> 
> See if this works:
> 
> data(geyser, package="MASS")
> x <- cbind(geyser$duration, geyser$waiting)
> est <- bkde2D(x, bandwidth=c(0.7, 7))
> 
> # Heh, I realized after I did this that I started with
> KernSmooth::bkde2D
> # and checked the results with MASS::kde2d
> # only difference appears to be name of density matrix
> # Construct a dataframe with X.Y information and the data
> from the bivariate density.
> # The output of bkde2D with n=50 is:
> 
> #List of 3
>  #$ x1  : num [1:51] -0.2167 -0.0823 0.052 0.1863
> 0.3207 ...
> # $ x2  : num [1:51] 32.5 34.2 35.9 37.7 39.4 ...
> # $ fhat: num [1:51, 1:51] 3.05e-19 2.17e-19 3.25e-19
> 2.17e-19 0.00 ...
> 
> # The index X.Y could be X + 51*Y and there would be a 1:1
> mapping from (X,Y) to X.Y
> # and the fhat values would be properly arranged
> 
> dfrm <- expand.grid(X=1:51, Y=1:51)
> dfrm$fhat <- c(est$fhat)
> 
> #Sample randomly from X.Y with length=51*51 using the fhat
> values for prob.
> 
> # The X.Y "index" never actually gets computed
> # but is implicit in the order of the data.frame
>  sampfrm <- dfrm[sample(51*51, 300, prob=est$fhat) , ]
>  f2 <- with(sampfrm,  MASS::kde2d(X, Y, n = 50, lims
> = c(0, 51, 0, 51)) )
>  persp(f2)
> 
> # Looks reasonable to my eye anyway.
> 
> --David.
> 
> > The reason I don’t want to use sampling (with
> replacement, I can sample more data than I have without
> replacement), as this will generate lots of duplicate data
> points, if I want to generated bigger dataset yet my raw
> data do not have a big sample size. The scatter plot of the
> sampled data doesn’t look good this way.
> > Heyi
> > 
> > 
> > --- On Fri, 3/23/12, David Winsemius 
> wrote:
> > 
> >> From: David Winsemius 
> >> Subject: Re: [R] Nonparametric bivariate
> distribution estimation and sampling
> >> To: "heyi xiao" 
> >> Cc: "Sarah Goslee" ,
> r-help@r-project.org
> >> Date: Friday, March 23, 2012, 2:20 PM
> >> 
> >> On Mar 23, 2012, at 1:53 PM, heyi xiao wrote:
> >> 
> >>> Sarah,
> >>> Thanks for the response. I actually have
> several years
> >> of working experience with R and statistics,
> although may
> >> not be as good as you. that’s why I am here ;) I
> dug
> >> deeper into R documentations and previous R-help
> posts, and
> >> couldn’t found anything particular help. Again, I
> want to
> >> do two things: (1) estimate the probability density
> of this
> >> bivariate distribution using some nonparametric
> method
> >> (kernel, spline etc);
> >> 
> >> ?MASS::kde2d
> >> ?KernSmooth::bkde2D
> >> ?ade4::s.kde2d
> >> help(package=locfit)
> >> 
> >>> (2) sample a big dataset from this bivariate
> >> distribution for a simulation study.
> >> 
> >> What is wrong with `sample`?
> >> 
> >> # to get sample of size n without replacement
> >> set.seed(42)
> >> dfrm[ sample(1:NROW(dfrm), n) , ]
> >> 
> >> --David.
> >>> If my questions are not clear enough show my
> how I can
> >> improve, or which part is not clear enough. If you
> have any
> >> particular suggestions/comments, you are more than
> welcome.
> >> Thanks!
> >>> Heyi
> >>> 
> >>> 
> >>> --- On Fri, 3/23/12, Sarah Goslee 
> >> wrote:
> >>> 
>  From: Sarah Goslee 
>  Subject: Re: [R] Nonparametric bivariate
> >> distribution estimation and sampling
>  To: "heyi xiao" 
>  Cc: r-help@r-project.org
>  Date: Friday, March 23, 2012, 12:26 PM
>  R can do all of that and more.
>  
>  But you'll need to put some work in reading
> about
> >> how to use
>  R, about
>  the statistical methods involved, and about
> how to
> >> use them
>  to best
>  effect. You might want, for instance,
> generalized
> >> additive
>  models. Or
>  not. If your question isn't more
> fully-formed than
> >> this,
>  your best bet
>  is almost certainly to talk to a local
> >> statistician, 

Re: [R] Multivariate function from univariate functions

2012-03-25 Thread Mitchell Maltenfort
Naïve question: would a saturated multivariate model work as an
interpolation function?

On 3/24/12, physicistintheory  wrote:
> I'm relatively new to R and I'm stuck.
>
> I'm trying to construct a surface to optimize from a multivariate dataset.
> The dataset contains the response of a system to various stimuli.  I am
> trying to optimize the mix of stimuli to maximize the response.  To do so,
> I've interpolated the various datasets of type response vs. stimuli and I
> now have an array of functions "interps" whose length is the length of the
> array of the names of the various stimuli.  I've also created a vector
> containing names for the stimuli, vars = x.1, x.2, x.3...
>
> Anyway, each of the functions in interps depends only on one variable
> (obviously).  I would like to construct a function, call it, "surface" which
> is essentially: surface(vars) = interps[[1]]vars[[1]] +
> interps[[2]]vars[[2]]+...
>
> I've tried constructing surface recursively in a for loop:
> surface <- function(vars){0}
> for(i in 1:length(vars)){
> surface <- function(vars){surface(vars) + interps[[i]](vars[[i]])}
> }
>
> This results in an infinite recursion error...which I think I understand.
> Essentially, I'm trying to do the analog of something like i = i+1, but
> instead of adding integers, I want to add terms to a function.
>
> Any help is appreciated.
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Multivariate-function-from-univariate-functions-tp4502670p4502670.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Sent from my mobile device


Ersatzistician and Chutzpahthologist
I can answer any question.  "I don't know" is an answer. "I don't know yet"
is a better answer.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] SPLS does not output probabilities for predicted classes.

2012-03-25 Thread saskay
Hi,
I was using SPLS package to do multi-class classification and would require
probabilities to be output for each class.
In the vignette, the documentation does say that you can output
probabilities by requesting fit.type = "response", however I still only get
predicted classes, rather than the respective probabilities.

library(spls)
data(prostate)
# SPLSDA with eta=0.8 & 3 hidden components
f <- splsda( prostate$x, prostate$y, K=3, eta=0.8, scale.x=FALSE )
# Prediction on the training dataset
(pred.f <- predict( f, type="fit" ))
#predict probabilities
pred.f_withprobs <- predict( f, type="fit", fit.type = "response" )



--
View this message in context: 
http://r.789695.n4.nabble.com/SPLS-does-not-output-probabilities-for-predicted-classes-tp4503031p4503031.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to calculate deviance on test data

2012-03-25 Thread zmlr
Hi all,

If I got a Cox model based on training set, then how should I calculate the
Cox log partial likelihood for the test data?
Actually I am trying to calculate the deviance on test dataset to evaluate
the performance of prediction model, the equation is as follows: D =
-2{L(test)[beta_train] - L(test)[0]}. It means using the beta coefficients
got from training set to calculate the likelihood of test data. I know I can
got log likelihood for training model, but how to get it for test data?
Any package or function can do that? 
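
One commonly suggested approach (a sketch only; train, test, x1 and x2 are made-up names, and ?coxph should be checked): fit the model on the test data with the coefficients fixed at the training values via iter.max = 0, then read off the log partial likelihoods:

library(survival)
fit.train <- coxph(Surv(time, status) ~ x1 + x2, data = train)
fit.test  <- coxph(Surv(time, status) ~ x1 + x2, data = test,
                   init = coef(fit.train),
                   control = coxph.control(iter.max = 0))
loglik.beta <- fit.test$loglik[2]                                         # L(test)[beta_train]
loglik.null <- coxph(Surv(time, status) ~ x1 + x2, data = test)$loglik[1] # L(test)[0]
D <- -2 * (loglik.beta - loglik.null)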


--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-calculate-deviance-on-test-data-tp4503206p4503206.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] (no subject)

2012-03-25 Thread Anjana Thampi
How do you decompose inequality in R, say by gender?

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Work -Shift Scheduling - Constraint Linear Programming

2012-03-25 Thread agent dunham
Dear Community, 

I have a work-shift scheduling problem I'd like to solve via constraint
linear programming. 

Maybe something similar to
http://support.sas.com/documentation/cdl/en/orcpug/63349/HTML/default/viewer.htm#orcpug_clp_sect037.htm

Can anybody suggest me any package/R examples to solve this?

If more details of my little problem are needed, I can provide them. 
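
One candidate is the lpSolve package, which handles small binary assignment problems of this kind. A toy sketch (all numbers are made up) with 3 workers and 2 shifts, where each shift needs exactly one worker and each worker takes at most one shift:

library(lpSolve)
# decision variables x[w,s], ordered (w1s1, w1s2, w2s1, w2s2, w3s1, w3s2)
cost <- c(4, 6, 5, 3, 7, 2)
const.mat <- rbind(c(1,0,1,0,1,0),   # shift 1 gets exactly one worker
                   c(0,1,0,1,0,1),   # shift 2 gets exactly one worker
                   c(1,1,0,0,0,0),   # worker 1 works at most one shift
                   c(0,0,1,1,0,0),   # worker 2 works at most one shift
                   c(0,0,0,0,1,1))   # worker 3 works at most one shift
const.dir <- c("=", "=", "<=", "<=", "<=")
const.rhs <- c(1, 1, 1, 1, 1)
sol <- lp("min", cost, const.mat, const.dir, const.rhs, all.bin = TRUE)
matrix(sol$solution, ncol = 2, byrow = TRUE)   # rows = workers, columns = shifts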

Thanks in advance,

--
View this message in context: 
http://r.789695.n4.nabble.com/Work-Shift-Scheduling-Constraint-Linear-Programming-tp4503037p4503037.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] output by(...)

2012-03-25 Thread INCOMA GfK
perfect, thanks a lot! :o)
-z


From: Gabor Grothendieck [ggrothendi...@gmail.com]
Sent: 25 March 2012 12:27
To: Skála, Zdeněk (INCOMA GfK)
Cc: r-help@r-project.org
Subject: Re: [R] output by(...)

2012/3/25 Skála, Zdeněk (INCOMA GfK) :
> Dear all,
>
> I have a question that is probably pretty stupid, so apologies in advance...
>
> I do a simple
>
>> mydata.tab <- by(my.data.frame, my.data.frame$category, colMeans)
>
> ...works fine, but I need to output the results to some "flat" file (kind of 
> table) to work with it in Excel etc.
>
> So I am doing now
>
>> capture.output(data.frame(unlist(mydata.tab)), file="mydata.txt")
>
> ...and process the result in Excel.
> Do you know a more pretty way to do this task? Perhaps something other than 
> 'by()' to make a table of colMeans?
>

Try this:

do.call("rbind", by(iris[-5], iris[[5]], colMeans))




--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to test omitted level from a multiple level factor against overall mean in regression models?

2012-03-25 Thread Biedermann, Jürgen
Hi Gabor,

Thanks a lot for the answer. 
However, I'm not so much focusing on the pure effect value of the omitted 
factor level, but more on the statistical test of whether it
differs significantly from 0.
Do you know a way to do this as well?

Greetings Jürgen
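
One way to get that test (a sketch using multcomp, not taken from the reply quoted below): under contr.sum the omitted effect equals minus the sum of the displayed effects, so that linear combination can be tested directly against 0 on the reg1 fit from the example below:

library(multcomp)
K <- rbind("level 3 vs grand mean" = c(0, -1, -1))
summary(glht(reg1, linfct = K))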

From: Gabor Grothendieck [ggrothendi...@gmail.com]
Sent: Sunday, 25 March 2012 14:11
To: Biedermann, Jürgen
Cc: r-help@R-project.org
Subject: Re: [R] How to test omitted level from a multiple level factor against 
overall mean in regression models?

2012/3/25 "Biedermann, Jürgen" :
> Hi there,
>
> I have a linear model with one factor having three levels.
> I want to check if the different levels significantly differ from the overall 
> mean (using contr.sum).
> However one level (the last) is omitted in the standard procedure.
>
> To illustrate this:
>
> x <- as.factor(c(1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3))
> y <- 
> c(1.1,1.15,1.2,1.1,1.1,1.1,1.2,1.2,1.2,2.1,2.2,2.3,2.4,2.5,2.6,2.7,2.8,2.9,3,3.1)
> test <- data.frame(x,y)
> reg1 <- lm(y~C(x,contr.sum),data=test)
> summary(reg1)
>
> Coefficients:
> Estimate Std. Error t value Pr(>|t|)
> (Intercept)   1.60.06577  24.834 8.48e-15 ***
> C(x, contr.sum)1 -0.483330.10792  -4.479  0.00033 ***
> C(x, contr.sum)2 -0.483330.08936  -5.409 4.70e-05 ***
>
> Is it possible to get the effect for the third level (against the overall 
> mean) in the table too.
>
> I figured out:
>
> reg2 <- lm(y~C(relevel(x,3),contr.sum),data=test)
> summary(reg2)
>
> C(relevel(x, 3), contr.sum)1  0.966670.07951  12.158 8.24e-10 ***
> C(relevel(x, 3), contr.sum)2 -0.483330.10792  -4.479  0.00033 ***
>
>
> The first row now test the third level against the overall mean, but I find 
> this approach not so convenient.
> Moreover, I wonder if it is meaningful at all regarding the cumulation of 
> alpha error. Would a Bonferroni correction be sensible?
>

Try this:

> options(contrasts = c("contr.sum", "contr.poly"))
> reg1 <- lm(y~x,data=test)
> dummy.coef(reg1)
Full coefficients are

(Intercept):  1.63
x:   1  2  3
-0.483 -0.483  0.967

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Format wanted...

2012-03-25 Thread Marc Schwartz

On Mar 25, 2012, at 7:14 AM, Duncan Murdoch wrote:

> On 12-03-24 10:47 PM, J Toll wrote:
>> On Sat, Mar 24, 2012 at 7:30 PM, Duncan Murdoch
>>   wrote:
>>> Do we have a format that always includes a decimal point and a given number
>>> of significant digits, but otherwise drops unnecessary characters?  For
>>> example, if I wanted 5 digits, I'd want the following:
>>> 
>>> Round to 5 digits:
>>> 1.234567  ->  "1.2346"
>>> 
>>> Drop unnecessary zeros:
>>> 1.23  ->  "1.23"
>>> 
>>> Force inclusion of a decimal point:
>>> 1 ->  "1."
>>> 
>> 
>> Duncan,
>> 
>> Maybe sprintf() will work for you.  As it's a wrapper for C sprintf,
>> it should have its functionality.
> 
> Maybe, but with which format string?
> 
> Duncan Murdoch


I don't believe (though could be wrong), that you can do it all with one format 
string, but can do it conditionally based upon the input. According to the C 
printf documentation, the use of "#" forces a decimal point to be present, even 
if there are no trailing digits. Thus:

> sprintf("%#.f", 1)
[1] "1."

The other two values seem to be handled by signif() when applied to each value 
individually:

> signif(1.234567, 5)
[1] 1.2346

> signif(1.23, 5)
[1] 1.23

But, not when a vector:

> signif(c(1.234567, 1.23), 5)
[1] 1.2346 1.2300


So, wrapping that inside a function, using ifelse() to test for an integer 
value:

signif.d <- function(x, digits)
{
  ifelse(x == round(x), 
         sprintf("%#.f", x), 
         signif(x, digits))
}


x <- c(1.234567, 1.23, 1)

> signif.d(x, 5)
[1] "1.2346" "1.23"   "1."  

> signif.d(x, 6)
[1] "1.23457" "1.23""1." 

> signif.d(x, 7)
[1] "1.234567" "1.23" "1."  


Not extensively tested of course, but hopefully that might work for your needs 
Duncan.

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Format wanted...

2012-03-25 Thread Duncan Murdoch

On 12-03-24 10:47 PM, J Toll wrote:

On Sat, Mar 24, 2012 at 7:30 PM, Duncan Murdoch
  wrote:

Do we have a format that always includes a decimal point and a given number
of significant digits, but otherwise drops unnecessary characters?  For
example, if I wanted 5 digits, I'd want the following:

Round to 5 digits:
1.234567  ->  "1.2346"

Drop unnecessary zeros:
1.23  ->  "1.23"

Force inclusion of a decimal point:
1 ->  "1."



Duncan,

Maybe sprintf() will work for you.  As it's a wrapper for C sprintf,
it should have its functionality.


Maybe, but with which format string?

Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to test omitted level from a multiple level factor against overall mean in regression models?

2012-03-25 Thread Gabor Grothendieck
2012/3/25 "Biedermann, Jürgen" :
> Hi there,
>
> I have a linear model with one factor having three levels.
> I want to check if the different levels significantly differ from the overall 
> mean (using contr.sum).
> However one level (the last) is omitted in the standard procedure.
>
> To illustrate this:
>
> x <- as.factor(c(1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3))
> y <- 
> c(1.1,1.15,1.2,1.1,1.1,1.1,1.2,1.2,1.2,2.1,2.2,2.3,2.4,2.5,2.6,2.7,2.8,2.9,3,3.1)
> test <- data.frame(x,y)
> reg1 <- lm(y~C(x,contr.sum),data=test)
> summary(reg1)
>
> Coefficients:
>                 Estimate Std. Error t value Pr(>|t|)
> (Intercept)       1.6    0.06577  24.834 8.48e-15 ***
> C(x, contr.sum)1 -0.48333    0.10792  -4.479  0.00033 ***
> C(x, contr.sum)2 -0.48333    0.08936  -5.409 4.70e-05 ***
>
> Is it possible to get the effect for the third level (against the overall 
> mean) in the table too.
>
> I figured out:
>
> reg2 <- lm(y~C(relevel(x,3),contr.sum),data=test)
> summary(reg2)
>
> C(relevel(x, 3), contr.sum)1  0.96667    0.07951  12.158 8.24e-10 ***
> C(relevel(x, 3), contr.sum)2 -0.48333    0.10792  -4.479  0.00033 ***
>
>
> The first row now test the third level against the overall mean, but I find 
> this approach not so convenient.
> Moreover, I wonder if it is meaningful at all regarding the cumulation of 
> alpha error. Would a Bonferroni correction be sensible?
>

Try this:

> options(contrasts = c("contr.sum", "contr.poly"))
> reg1 <- lm(y~x,data=test)
> dummy.coef(reg1)
Full coefficients are

(Intercept):  1.63
x:   1  2  3
-0.483 -0.483  0.967

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] output by(...)

2012-03-25 Thread peter dalgaard

On Mar 25, 2012, at 12:27 , Gabor Grothendieck wrote:

> 2012/3/25 Skála, Zdeněk (INCOMA GfK) :
>> Dear all,
>> 
>> I have a question that is probably pretty stupid, so apologies in advance...
>> 
>> I do a simple
>> 
>>> mydata.tab <- by(my.data.frame, my.data.frame$category, colMeans)
>> 
>> ...works fine, but I need to output the results to some "flat" file (kind of 
>> table) to work with it in Excel etc.
>> 
>> So I am doing now
>> 
>>> capture.output(data.frame(unlist(mydata.tab)), file="mydata.txt")
>> 
>> ...and process the result in Excel.
>> Do you know a more pretty way to do this task? Perhaps something other than 
>> 'by()' to make a table of colMeans?
>> 
> 
> Try this:
> 
> do.call("rbind", by(iris[-5], iris[[5]], colMeans))
> 
> 

How about 

aggregate(iris[-5], iris[5], mean)

?

> 
> 
> -- 
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] output by(...)

2012-03-25 Thread Gabor Grothendieck
2012/3/25 Skála, Zdeněk (INCOMA GfK) :
> Dear all,
>
> I have a question that is probably pretty stupid, so apologies in advance...
>
> I do a simple
>
>> mydata.tab <- by(my.data.frame, my.data.frame$category, colMeans)
>
> ...works fine, but I need to output the results to some "flat" file (kind of 
> table) to work with it in Excel etc.
>
> So I am doing now
>
>> capture.output(data.frame(unlist(mydata.tab)), file="mydata.txt")
>
> ...and process the result in Excel.
> Do you know a more pretty way to do this task? Perhaps something other than 
> 'by()' to make a table of colMeans?
>

Try this:

do.call("rbind", by(iris[-5], iris[[5]], colMeans))




-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reading big files in chunks-ff package

2012-03-25 Thread Jan van der Laan


Your question is not completely clear. read.csv.ffdf automatically 
reads in the data in chunks. You don't have to do anything for that. You 
can specify the size of the chunks using the next.rows option.
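
For instance (a sketch; the file name and separator are taken from the quoted message below, and the chunk size is arbitrary):

library(ff)
dat <- read.csv.ffdf(file = "file.csv", sep = "|", header = TRUE,
                     next.rows = 500000)   # rows read per chunk after the first block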


Jan


On 03/24/2012 09:29 PM, Mav wrote:

Hello!
A question about reading large CSV files

I need to analyse several files with  sizes larger than 3 GB. Those files
have more than 10million rows (and up to 25 million) and 9 columns. Since I
don´t have a large RAM memory,  I think that the ff package can really help
me. I am trying to use read.csv.ffdf but I have some questions:

How can I read the files in several chunks…with an automatic way of
calculating the number of rows to include in each chunk? (my problem is that
the files have different number of rows)

For instance…. I have used
read.csv.ffdf(NULL, “file.csv”, sep="|", dec=".",header = T,row.names =
NULL,colClasses = c(rep("integer", 3), rep("integer", 10), rep("integer",
6)))
  But with this way I am reading the whole fileI would prefer to read it
in chunksbut I don´t know  how to read it in chunks

I have read the ff documentation but I am not good with R!

Thanks in advance!

--
View this message in context: 
http://r.789695.n4.nabble.com/Reading-big-files-in-chunks-ff-package-tp4502070p4502070.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] output by(...)

2012-03-25 Thread INCOMA GfK
Dear all,

I have a question that is probably pretty stupid, so apologies in advance...

I do a simple

> mydata.tab <- by(my.data.frame, my.data.frame$category, colMeans)

...works fine, but I need to output the results to some "flat" file (kind of 
table) to work with it in Excel etc.

So I am doing now

> capture.output(data.frame(unlist(mydata.tab)), file="mydata.txt")

...and process the result in Excel.
Do you know a more pretty way to do this task? Perhaps something other than 
'by()' to make a table of colMeans?

Many thanks!
Zdenek

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to test omitted level from a multiple level factor against overall mean in regression models?

2012-03-25 Thread Biedermann, Jürgen
Hi there,

I have a linear model with one factor having three levels.
I want to check if the different levels significantly differ from the overall 
mean (using contr.sum).
However one level (the last) is omitted in the standard procedure.

To illustrate this:

x <- as.factor(c(1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3))
y <- 
c(1.1,1.15,1.2,1.1,1.1,1.1,1.2,1.2,1.2,2.1,2.2,2.3,2.4,2.5,2.6,2.7,2.8,2.9,3,3.1)
test <- data.frame(x,y)
reg1 <- lm(y~C(x,contr.sum),data=test)
summary(reg1)

Coefficients:
 Estimate Std. Error t value Pr(>|t|)   
(Intercept)       1.6        0.06577  24.834 8.48e-15 ***
C(x, contr.sum)1 -0.48333    0.10792  -4.479  0.00033 ***
C(x, contr.sum)2 -0.48333    0.08936  -5.409 4.70e-05 ***

Is it possible to get the effect for the third level (against the overall mean) 
in the table too.

I figured out:

reg2 <- lm(y~C(relevel(x,3),contr.sum),data=test)
summary(reg2)

C(relevel(x, 3), contr.sum)1  0.96667    0.07951  12.158 8.24e-10 ***
C(relevel(x, 3), contr.sum)2 -0.48333    0.10792  -4.479  0.00033 ***


The first row now test the third level against the overall mean, but I find 
this approach not so convenient.
Moreover, I wonder if it is meaningful at all regarding the cumulation of alpha 
error. Would a Bonferroni correction be sensible?

Greetings and thanks in advance
Jürgen
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Handling 8GB .txt file in R?

2012-03-25 Thread Jan van der Laan


What you could try to do is skip the first 5 lines. After that the file 
seems to be 'normal'. With read.table.ffdf you could try something like


# open a connection to the file
con <- file('yourfile', 'rt')
# skip first 5 lines
tmp <- readLines(con, n=5)
# read the remainder using read.table.ffdf
ffdf <- read.table.ffdf(file=con)
# close connection
close(con)

HTH

Jan

On 03/25/2012 06:20 AM, iliketurtles wrote:

Thanks to all the suggestions. To the first individual that replied, I can't
do any stuff with unix or perl. All I know is R.

@KEN:
I'm using Windows 7, 64 bit.

@Steve:
Here's the readLines output.. As we can see, lines 1-3 are empty and line 5
is empty, and there's also empty elements after line 5!.

  [1] " "
   [2] "
"
   [3] " "
   [4] "  PERMNO  DATETICKERPERMCO PRC
VOLNUMTRDvwretdewretd"
   [5] ""
   [6] "   106/01/19867952  .
. . -0.000138  0.001926"
   [7] "   107/01/1986OMFGA   7952-2.56250
1000 .  0.013809  0.011061"
   [8] "   108/01/1986OMFGA   7952-2.5
12800 . -0.020744 -0.005117"
   [9] "   109/01/1986OMFGA   7952-2.5
1400 . -0.011219 -0.011588"
  [10] "   110/01/1986OMFGA   7952-2.5
8500 .  0.83  0.003651"
  [11] "   113/01/1986OMFGA   7952-2.62500
5450 .  0.002749  0.002433"

-


Isaac
Research Assistant
Quantitative Finance Faculty, UTS
--
View this message in context: 
http://r.789695.n4.nabble.com/Handling-8GB-txt-file-in-R-tp4500971p4502706.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Weird POSIXct behaviour

2012-03-25 Thread peter dalgaard
I see "NZDT" going in and out of your time values. Could that have something to 
do with it?
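
A minimal sketch of the kind of mismatch this can cause (an assumed scenario,
not verified against the original data): the same printed clock time in two
different time zones is two different instants, so the comparison fails.

t1 <- as.POSIXct("2012-03-20 00:59:57", tz = "Pacific/Auckland")  # NZDT
t2 <- as.POSIXct("2012-03-20 00:59:57", tz = "UTC")
t1 == t2                            # FALSE: the underlying instants differ
as.numeric(t1) - as.numeric(t2)     # the difference in seconds reveals the offset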

On Mar 25, 2012, at 10:15 , Worik R wrote:

> Friends
> 
> I have an xts that I wish to access.
> 
> Browse[2]> DATA.ba[[p]]["2012-03-20 00:59:57","bid"]
>   bid
> 2012-03-20 00:59:57 1.4993
> 
> So far so good.
> 
> Now putting the index into a variable:
> 
> Browse[2]> Time
> [1] "2012-03-20 00:59:57 NZDT"
> Browse[2]> DATA.ba[[p]][Time, "bid"]
> bid
> 
> Where has it gone?
> 
> 
> Looking closer
> 
> 
> Browse[2]> index(DATA.ba[[p]]["2012-03-20 00:59:57","bid"])
> [1] "2012-03-20 00:59:57 NZDT"
> Browse[2]>
> 
> Browse[2]> Time
> [1] "2012-03-20 00:59:57 NZDT"
> Browse[2]>
> 
> Browse[2]> index(DATA.ba[[p]]["2012-03-20 00:59:57","bid"]) == Time
> [1] FALSE
> Browse[2]>
> 
> Browse[2]> class(Time)
> [1] "POSIXct" "POSIXt"
> 
> Browse[2]> class(index(DATA.ba[[p]]["2012-03-20 00:59:57","bid"]))
> [1] "POSIXct" "POSIXt"
> 
> So the variable 'Time' should be good to index DATA.ba[[p]].  I am
> hopelessly confused.
> 
> cheers
> Worik
> 
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to install tclRequire(Iwidgets)

2012-03-25 Thread peter dalgaard

On Mar 25, 2012, at 08:11 , mrzung wrote:

> hi, i'm trying to make "Tabbed Notebook Widget",but cannot install
> "tclRequire(Iwidgets)".
> 
> I installed Activetcl but I don't know what can be the next step to execute
> "tclRequire(Iwidgets)".
> 
> how can i do that?

By adding Iwidgets to your Tcl/Tk installation. How to do that is not an R 
issue. (It depends on your platform, so I really don't know, but you might try 
looking at http://incrtcl.sourceforge.net )

> 
> Thanks
> 
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/How-to-install-tclRequire-Iwidgets-tp4502775p4502775.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Weird POSIXct behaviour

2012-03-25 Thread Worik R
Friends

I have an xts that I wish to access.

Browse[2]> DATA.ba[[p]]["2012-03-20 00:59:57","bid"]
   bid
2012-03-20 00:59:57 1.4993

So far so good.

Now putting the index into a variable:

Browse[2]> Time
[1] "2012-03-20 00:59:57 NZDT"
Browse[2]> DATA.ba[[p]][Time, "bid"]
 bid

Where has it gone?


Looking closer


Browse[2]> index(DATA.ba[[p]]["2012-03-20 00:59:57","bid"])
[1] "2012-03-20 00:59:57 NZDT"
Browse[2]>

Browse[2]> Time
[1] "2012-03-20 00:59:57 NZDT"
Browse[2]>

Browse[2]> index(DATA.ba[[p]]["2012-03-20 00:59:57","bid"]) == Time
[1] FALSE
Browse[2]>

Browse[2]> class(Time)
[1] "POSIXct" "POSIXt"

Browse[2]> class(index(DATA.ba[[p]]["2012-03-20 00:59:57","bid"]))
[1] "POSIXct" "POSIXt"

So the variable 'Time' should be good to index DATA.ba[[p]].  I am
hopelessly confused.

cheers
Worik


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to install tclRequire(Iwidgets)

2012-03-25 Thread mrzung
Hi, I'm trying to make a "Tabbed Notebook Widget", but cannot get
"tclRequire(Iwidgets)" to work.

I installed ActiveTcl, but I don't know what the next step should be to execute
"tclRequire(Iwidgets)".

How can I do that?

Thanks

--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-install-tclRequire-Iwidgets-tp4502775p4502775.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Handling 8GB .txt file in R?

2012-03-25 Thread iliketurtles
Thanks to all the suggestions. To the first individual that replied, I can't
do any stuff with unix or perl. All I know is R.

@KEN:
I'm using Windows 7, 64 bit.

@Steve:
Here's the readLines output. As we can see, lines 1-3 are empty, line 5 is
empty, and there are also empty elements after line 5!

 [1] " "


   
  [2] " 


  
"
  [3] " "   



  [4] "  PERMNO  DATETICKERPERMCO PRC  
VOLNUMTRDvwretdewretd"  


  [5] ""



  [6] "   106/01/19867952  .
 
. . -0.000138  0.001926"

  
  [7] "   107/01/1986OMFGA   7952-2.56250 
1000 .  0.013809  0.011061" 

 
  [8] "   108/01/1986OMFGA   7952-2.5
12800 . -0.020744 -0.005117"

  
  [9] "   109/01/1986OMFGA   7952-2.5 
1400 . -0.011219 -0.011588" 

 
 [10] "   110/01/1986OMFGA   7952-2.5 
8500 .  0.83  0.003651" 

 
 [11] "   113/01/1986OMFGA   7952-2.62500 
5450 .  0.002749  0.002433" 
   

-


Isaac
Research Assistant
Quantitative Finance Faculty, UTS
--
View this message in context: 
http://r.789695.n4.nabble.com/Handling-8GB-txt-file-in-R-tp4500971p4502706.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Multivariate function from univariate functions

2012-03-25 Thread physicistintheory
I'm relatively new to R and I'm stuck.

I'm trying to construct a surface to optimize from a multivariate dataset. 
The dataset contains the response of a system to various stimuli.  I am
trying to optimize the mix of stimuli to maximize the response.  To do so,
I've interpolated the various datasets of response vs. stimulus, and I now
have a list of functions, "interps", with one function per stimulus.  I've
also created a vector containing names for the stimuli, vars = x.1, x.2, x.3, ...

Anyway, each of the functions in interps depends only on one variable
(obviously).  I would like to construct a function, call it "surface", which
is essentially: surface(vars) = interps[[1]](vars[[1]]) +
interps[[2]](vars[[2]]) + ...

I've tried constructing surface recursively in a for loop:  
surface <- function(vars){0}
for(i in 1:length(vars)){
surface <- function(vars){surface(vars) + interps[[i]](vars[[i]])}
}

This results in an infinite recursion error...which I think I understand. 
Essentially, I'm trying to do the analog of something like i = i+1, but
instead of adding integers, I want to add terms to a function.

Any help is appreciated.
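
[A sketch of a non-recursive way to build the sum, assuming the 'interps' list
and a numeric vector 'vars' as described above: evaluate each univariate
function at its own coordinate and add the results, instead of redefining
'surface' inside the loop.]

surface <- function(vars) {
  sum(sapply(seq_along(vars), function(i) interps[[i]](vars[[i]])))
}
# surface(c(0.2, 1.5, 3)) then returns interps[[1]](0.2) + interps[[2]](1.5) + ...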

--
View this message in context: 
http://r.789695.n4.nabble.com/Multivariate-function-from-univariate-functions-tp4502670p4502670.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to compute within-group mean and sd?

2012-03-25 Thread reeyarn
Thanks Rui, Greg, and Michael.
Your answers helped a lot!

Best,
Reeyarn

On Sun, Mar 25, 2012 at 1:02 AM, Rui Barradas  wrote:

> Hello,
>
>
> You can use sql with package 'sqldf'.
>  library(sqldf) # It needs package 'tcltk'
> ?sqldf
>  tbl <- data.frame(firm_id=rep(c(1,2), 10), value=rnorm(20))
>  sqldf("SELECT firm_id, count(*), avg(value), stdev(value)
>  FROM tbl
>  GROUP BY firm_id;")
>
> I've changed the name of your data.frame because 'table' is an R function.
>  Hope this helps,
>
> Rui Barradas
>
>
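
For the record, a base-R sketch of the same grouped summary (no sqldf needed),
reusing the 'tbl' data.frame from the quoted answer:

res <- aggregate(value ~ firm_id, data = tbl,
                 FUN = function(v) c(n = length(v), mean = mean(v), sd = sd(v)))
# the summaries come back as a matrix column; do.call(data.frame, res) flattens them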


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] cubature

2012-03-25 Thread Jeff Newmiller
Who knows? You mention different machines, yet fail to hint at what they are,
much less provide sessionInfo() output, and x is not defined in your global
environment in your snippet. Please read the posting guide.

I am not familiar with this library, but I would be surprised if it were 
capable of handling variable limits. I didn't see any hint that such would be 
supported in the help page. Are you sure the same code is being used on 
different machines?
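
A sketch of one way around the constant-limit restriction (my own suggestion,
assuming the integrand from the quoted post): map the inner variable onto
(0,1) and carry the Jacobian, so adaptIntegrate() only ever sees fixed limits.

library(cubature)
g <- function(z) {
  u  <- z[1]                     # u in (0, 1)
  x2 <- z[2]                     # x2 in (0, 365)
  x1 <- u * (365 - x2)           # substitute x1 = u*(365 - x2)
  x1 * x2 * (365 - x2)           # original integrand times Jacobian d(x1)/d(u)
}
adaptIntegrate(g, lowerLimit = c(0, 0), upperLimit = c(1, 365))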
---
Jeff NewmillerThe .   .  Go Live...
DCN:Basics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

JMDS  wrote:

>Hi,
>
>I am using adaptIntegrate from Cubature to do numerical integration on
>a
>double integral with a 1 x 2 vector x.
>
>Say the function is something simple to start like f(x)=x1*x2 and I
>wish to
>integrate x1 over (0,365-x2) and x2 over (0,365)
>
>f <- function(x) {(x[2])*(x[1])} # "x" is vector
>int1<-adaptIntegrate(f, lowerLimit = c(0, 0), upperLimit = c(365-x[2],
>365))
>
>
>I recieve the following error:
>
>Error in adaptIntegrate(f, lowerLimit = c(0, 0), upperLimit = c(365 - 
>: 
>  object 'x' not found
>
>The problem is that this code works fine on other machines and I wonder
>If I
>am missing some referenced package somewhere.
>
>Any help would be greatly appreciated.
>
>Thanks!
>
>
>
>
>--
>View this message in context:
>http://r.789695.n4.nabble.com/cubature-tp4502516p4502516.html
>Sent from the R help mailing list archive at Nabble.com.
>
>__
>R-help@r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.