Re: [R] [R-pkgs] New package "list" for analyzing list survey experiments

2010-07-14 Thread Allan Engelhardt

On 13/07/10 19:16, Erik Iverson wrote:

Raubertas, Richard wrote:
I agree that 'list' is a terrible package name, but only secondarily 
because it is a data type.  The primary problem is that it is so generic


as to be almost totally uninformative about what the package does.
For some reason package writers seem to prefer maximally 
uninformative names for their packages.  To take some examples of 
recently announced packages, can anyone guess what packages 'FDTH', 
'rtv', or 'lavaan' do?  Why the aversion to informative names along 
the lines of
'Freq_dist_and_histogram', 'RandomTimeVariables', and 
'Latent_Variable_Analysis', respectively? 


I'm sure it's part tradition...

ls
cat
rm
cp
mv
su


No need to leave R, which has some 9 one-letter symbols, 35 two-letter 
symbols, and 121 three-letter symbols in the base and core packages.  It 
is just unreal.  (The equivalent numbers from my Linux system are 3, 56, 
and 148, and one of those one-letter commands is R!!)


Quiz yourself on these:

"c" "C" "D" "F" "I" "q" "s" "t" "T"

"ar" "as" "bs" "by" "cm" "de" "df" "dt" "el" "gc" "gl" "if" "Im" "is" 
"lh" "lm" "ls" "lu" "ns" "pf" "pi" "pt" "qf" "qq" "qr" "qt" "Re" "rf" 
"rm" "rt" "sd" "te" "ts" "VA" "vi"


"abs" "acf" "ACF" "AIC" "all" "aml" "any" "aov" "Arg" "ave" "bam" "bcv" 
"bdf" "BIC" "bmp" "BOD" "box" "bxp" "cat" "cav" "ccf" "cch" "cd4" "cgd" 
"co2" "CO2" "col" "cor" "cos" "cov" "cut" "DDT" "det"
"dim" "Dim" "dir" "end" "exp" "fft" "fgl" "fir" "fix" "for" "gam" "get" 
"glm" "gls" "Gun" "hat" "hcl" "hsv" "IGF" "IQR" "Kfn" "knn" "lag" "lcm" 
"lda" "lme" "log" "lqs" "mad" "Map" "max" "mca" "min" "mle" "Mod" "new" 
"nlm" "nls" "npk" "nsl" "OME" "one" "Ops" "pam" "par" "pbc" "PBG" "pdf" 
"pie" "png" "ppr" "qda" "raw" "rep" "rev" "rfs" "rgb" "rig" "rle" "rlm" 
"row" "rug" "seq" "sin" "SOM" "SSD" "SSI" "stl" "str" "sub" "sum" "svd" 
"svg" "tan" "tar" "tau" "tcl" "tmd" "try" "tsp" "two" "ucv" "unz" "url" 
"var" "x11" "X11" "xor"



Generated from R --vanilla with:

for (p in c("base", "boot", "class", "cluster", "codetools", "datasets", 
"foreign", "graphics", "grDevices", "grid", "KernSmooth", "lattice", 
"MASS", "Matrix", "methods", "mgcv", "nlme", "nnet", "rpart", "spatial", 
"splines", "stats", "stats4", "survival", "tcltk", "tools", "utils")) 
library(p, character.only=TRUE)

rm(p)
one <- unique(grep("^[[:alnum:]]+$", apropos("^.$"), value=TRUE))
two <- unique(grep("^[[:alnum:]]+$", apropos("^..$"), value=TRUE))
three <- unique(grep("^[[:alnum:]]+$", apropos("^...$"), value=TRUE))
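The three counts quoted above can be reproduced in one expression; a self-contained sketch (the helper `short_syms` is illustrative, not from the original post, and the counts depend on which packages are attached, so a bare session gives smaller numbers):

```r
# Count visible symbols of length n whose names are purely alphanumeric.
short_syms <- function(n) {
  pat <- paste("^", paste(rep(".", n), collapse = ""), "$", sep = "")
  unique(grep("^[[:alnum:]]+$", apropos(pat), value = TRUE))
}

# Tally one-, two- and three-character symbols in the current session.
sapply(1:3, function(n) length(short_syms(n)))
```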

and from the bash shell with

ls -1 {/usr,}/bin/? 2>/dev/null | perl -ne 'print substr $_,-2' | sort -u | wc -l
ls -1 {/usr,}/bin/?? | perl -ne 'print substr $_,-3' | sort -u | wc -l
ls -1 {/usr,}/bin/??? | perl -ne 'print substr $_,-4' | sort -u | wc -l

Allan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] send out put to file in R

2010-07-14 Thread Gábor Csárdi
On Thu, Jul 15, 2010 at 7:24 AM, chakri_amateur  wrote:
[...]
> wri write.graph ("F://new", "pajek") <-      decompose.graph(g,  mode="weak", 
> max.comps=NA,
> min.vertices= 20)
>
>
> But even that doesn't work

No wonder; writing computer programs is not just typing in random
words and letting the computer figure out what you are trying to do. At
least not yet.

> Is there a way in which I could convert my output to a graph ?

But you already have that in "test.net", no?

It would help if you could tell us what you are trying to do. If you
decompose a graph to components, you get a list of graphs; do you want
to save each component in a separate file? Then do this (untested):

for (i in seq_along(compo)) {
  write.graph(compo[[i]], file=paste(sep="", "new-", i, ".net"), format="pajek")
}

If you want to do something else, then please tell us.

Gabor

> Regards
> Chakri
>
> --- On Wed, 14/7/10, Peter Ehlers [via R] 
>  wrote:
>
> From: Peter Ehlers [via R] 
> Subject: Re: send out put to file in R
> To: "chakri_amateur" 
> Date: Wednesday, 14 July, 2010, 6:23 PM
>
>
>
>
> On 2010-07-14 4:04, chakri_amateur wrote:
>
>>
>
>> Hi
>
>>
>
>> I am using igraph package in R.
>
>> My goal is to read  a network (in "pajek" format) and decompose the network
>
>> into components.
>
>> In addition, I am also interested in sending this output to to a file.
>
>>
>
>> I am having problem in while writing to a file!
>
>>
>
>> my code looks like this
>
>> g<- read.graph ("F://test.net", "pajek")
>
>> compo<- decompose.graph(g,  mode="weak", max.comps=NA, min.vertices= 20)
>
>> write.graph (compo, "F://new", "pajek")
>
>>
>
>> The error message shown up was -- "Not a Graph Object"
>
>>
>
>> Could any one explain what is the problem here ?
>
> Sure: compo is not an igraph object; it's a list as
>
> class(compo) or reading the help page for decompose.graph
>
> would tell you.
>
>
>    -Peter Ehlers
>
>
>>
>
>> Thanks
>
>> chakri
>
>
> __
>
> [hidden email] mailing list
>
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
>
>
>
>
>
>
>
>
>
>
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/send-out-put-to-file-in-R-tp2288515p2289690.html
> Sent from the R help mailing list archive at Nabble.com.
>
>        [[alternative HTML version deleted]]
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>



-- 
Gabor Csardi      UNIL DGM



Re: [R] Arrange values on a timeline

2010-07-14 Thread Remko Duursma
Try this:

a <- data.frame(timestamp=c(3,5,8), mylabel=c("abc","def","ghi"))
b <- data.frame(timestamp=c(1:10))

txt <- as.character(a$mylabel)

# rows covered by each label, running through to the end of b
nrepeat <- diff(c(a$timestamp, nrow(b) + 1))

b$mylabel <- c(rep(NA, a$timestamp[1] - 1), rep(txt, nrepeat))
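For the several-million-row dataset Ralf mentions, the same lookup can also be done with findInterval(), which avoids building the repeat counts by hand (a sketch on the same toy data, not from the original reply):

```r
a <- data.frame(timestamp = c(3, 5, 8), mylabel = c("abc", "def", "ghi"))
b <- data.frame(timestamp = 1:10)

# findInterval() returns, for each b$timestamp, the index of the last
# label whose timestamp is <= it; 0 means "before the first label".
idx <- findInterval(b$timestamp, a$timestamp)
b$mylabel <- as.character(a$mylabel)[ifelse(idx == 0, NA, idx)]

b$mylabel
# NA NA "abc" "abc" "def" "def" "def" "ghi" "ghi" "ghi"
```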



greetings,
Remko



-
Remko Duursma
Research Lecturer

Centre for Plants and the Environment
University of Western Sydney
Hawkesbury Campus
Richmond NSW 2753

Dept of Biological Science
Macquarie University
North Ryde NSW 2109
Australia

Mobile: +61 (0)422 096908
www.remkoduursma.com



On Wed, Jul 14, 2010 at 3:04 PM, Ralf B  wrote:
> I have a set of labels arranged along a timeframe in a. Each label has
> a timestamp and marks a state until the next label. The dataframe a
> contains 5 such timestamps and 5 associated labels. This means, on a
> continious scale between 1-100, there are 5 markers. E.g. 'abc' marks
> the timestampls between 10 and 19, 'def' marks the timestamps between
> 20 and 32, and so on.
>
> a <- data.frame(timestamp=c(3,5,8), mylabel=c("abc","def","ghi"))
> b <- data.frame(timestamp=c(1:10))
>
> I would like to assign these labels as an extra collumn 'label' to the
> data.frame b which currently only consists of a the timestamp. The
> output would then look like this:
>
>       timestamp      label
> 1     1                    NA
> 2     2                    NA
> 3     3                    "abc"
> 4     4                    "abc"
> 5     5                    "def"
> 6     6                    "def"
> 7     7                    "def"
> 8     8                    "ghi"
> 9     9                    "ghi"
> 10  10                    "ghi"
>
> What is the simplest way to assign these labels based on timestamps to
> get this output. The real dataset is several millions of rows...
>
> Thanks,
> Ralf
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



[R] I can't figure out my plm model. Any ideas?

2010-07-14 Thread Salaam Batur
Dear R users,

I am using the plm package in R to build my model, but from the results I
can't quite figure out what it is... Can anyone tell me why? Am I missing
something?

R Results:

> ar1 <- plm(formula = ADOP ~ lag(ADOP, 1) + PE + WOR,
+            data = well, effect = "time", model = "within")
> summary(ar1)

Oneway (time) effect Within Model

Call:
plm(formula = ADOP ~ lag(ADOP, 1) + PE + WOR, data = well, effect = "time",
model = "within")

Unbalanced Panel: n=135, T=1-119, N=10972

Residuals :
Min.  1st Qu.   Median  3rd Qu. Max.
-25.9000  -0.8950  -0.0627   0.7210  25.4000

Coefficients :
                Estimate  Std. Error t-value  Pr(>|t|)
lag(ADOP, 1)  0.59081533  0.00598959  98.640 < 2.2e-16 ***
PE            0.04263590  0.00087449  48.755 < 2.2e-16 ***
WOR          -0.03717528  0.00072192 -51.495 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Total Sum of Squares:    148890
Residual Sum of Squares: 42814
F-statistic: 8960.98 on 3 and 10850 DF, p-value: < 2.22e-16

> summary(fixef(ar1, effect="time"))
        Estimate Std. Error t-value  Pr(>|t|)
22       2.66677    0.22335 11.9397 < 2.2e-16 ***
23       2.42303    0.22340 10.8464 < 2.2e-16 ***
24       2.49954    0.21370 11.6964 < 2.2e-16 ***
25       2.61619    0.21305 12.2799 < 2.2e-16 ***
26       3.21967    0.21451 15.0094 < 2.2e-16 ***
27       1.72400    0.21028  8.1986 2.220e-16 ***
28       2.58108    0.21009 12.2854 < 2.2e-16 ***
29       2.59461    0.21566 12.0309 < 2.2e-16 ***
200010   2.69361    0.21605 12.4676 < 2.2e-16 ***
200011   2.35014    0.22084 10.6419 < 2.2e-16 ***
200012   2.33155    0.22047 10.5751 < 2.2e-16 ***
200101   2.92930    0.21892 13.3808 < 2.2e-16 ***
200102   2.58167    0.22187 11.6361 < 2.2e-16 ***
200103   3.13288    0.21851 14.3377 < 2.2e-16 ***
200104   2.32652    0.21682 10.7303 < 2.2e-16 ***
200105   2.93256    0.21576 13.5918 < 2.2e-16 ***
200106   2.49128    0.21177 11.7640 < 2.2e-16 ***
200107   2.33528    0.21472 10.8759 < 2.2e-16 ***
200108   2.38340    0.21325 11.1767 < 2.2e-16 ***
200109   2.58050    0.21202 12.1709 < 2.2e-16 ***
...and so on
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>



ADOP(22) = 0.59081533*ADOP(21) + 0.04263590*PE(22) - 0.03717528*WOR(22) + 2.66677
(Feb 2000)
and so on...

It should look like this, right? When I put the data into this model, the
right-hand side and the left-hand side are not equal. What did I miss?



Re: [R] send out put to file in R

2010-07-14 Thread chakri_amateur

Dear Peter Ehlers,

Thanks.

The decompose.graph help page in R clearly states that the output is a
separate graph for each component.

It is true that the variable in which I stored my output is not a graph object.
I also tried the following command, to save the output to a file directly:

wri write.graph ("F://new", "pajek") <-      decompose.graph(g, 
 mode="weak", max.comps=NA, 
min.vertices= 20)



But even that doesn't work 


Is there a way in which I could convert my output to a graph ?

Regards
Chakri 

--- On Wed, 14/7/10, Peter Ehlers [via R] 
 wrote:

From: Peter Ehlers [via R] 
Subject: Re: send out put to file in R
To: "chakri_amateur" 
Date: Wednesday, 14 July, 2010, 6:23 PM




On 2010-07-14 4:04, chakri_amateur wrote:

>

> Hi

>

> I am using igraph package in R.

> My goal is to read  a network (in "pajek" format) and decompose the network

> into components.

> In addition, I am also interested in sending this output to to a file.

>

> I am having problem in while writing to a file!

>

> my code looks like this

> g<- read.graph ("F://test.net", "pajek")

> compo<- decompose.graph(g,  mode="weak", max.comps=NA, min.vertices= 20)

> write.graph (compo, "F://new", "pajek")

>

> The error message shown up was -- "Not a Graph Object"

>

> Could any one explain what is the problem here ?

Sure: compo is not an igraph object; it's a list as

class(compo) or reading the help page for decompose.graph

would tell you.


   -Peter Ehlers


>

> Thanks

> chakri


__

[hidden email] mailing list

https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.













-- 
View this message in context: 
http://r.789695.n4.nabble.com/send-out-put-to-file-in-R-tp2288515p2289690.html
Sent from the R help mailing list archive at Nabble.com.



[R] Histogram with two groups on the same graph (not on separate panels)

2010-07-14 Thread Kiyoshi Sasaki
I have been trying to produce a histogram with two groups (male and female 
snakes) on the same graph (either superimposed, or with the frequency bars 
appearing side by side). I found a couple of functions for superimposed 
histograms written by other people.
Below is the code I used for my data, which contains a column of svl (body 
size; snout-vent length) and a column of sex (male or female). My data 
structure is shown at the bottom of this message.
My question: when I ran the code below (modified from 
http://onertipaday.blogspot.com/2007/04/how-to-superimpose-histograms.html), I 
got the error message "Error in hist.default(X[[1L]], ...) : 'x' must be 
numeric". Could anyone help me figure out the problem, please? Do you know of 
alternative code or a function that produces a histogram with two groups, 
with the frequency bars appearing side by side, as shown in 
http://home.medewerker.uva.nl/g.dutilh/bestanden/multiple%20group%20histogram.png
 ?
 
gb <- read.csv(file = "D:\\data 12.24.06\\AllMamushiCorrected5.8.10_7.12.10.csv",
               header = TRUE, strip.white = TRUE, na.strings = "")
attach(gb)
 
superhist2pdf <- function(x, filename = "super_histograms.pdf",
                          dev = "pdf", title = "Superimposed Histograms",
                          nbreaks = "Sturges") {
  junk <- NULL
  grouping <- NULL
  for (i in 1:length(x)) {
    junk <- c(junk, x[[i]])
    grouping <- c(grouping, rep(i, length(x[[i]])))
  }
  grouping <- factor(grouping)
  n.gr <- length(table(grouping))
  xr <- range(junk)
  histL <- tapply(junk, grouping, hist, breaks = nbreaks, plot = FALSE)
  maxC <- max(sapply(lapply(histL, "[[", "counts"), max))
  if (dev == "pdf") { pdf(filename, version = "1.4") }
  if ((TC <- transparent.cols <- .Device %in% c("pdf", "png"))) {
    cols <- hcl(h = seq(30, by = 360/n.gr, length = n.gr), l = 65, alpha = 0.5)
  } else {
    h.den <- c(10, 15, 20)
    h.ang <- c(45, 15, -30)
  }
  if (TC) {
    plot(histL[[1]], xlim = xr, ylim = c(0, maxC), col = cols[1],
         xlab = "x", main = title)
  } else {
    plot(histL[[1]], xlim = xr, ylim = c(0, maxC), density = h.den[1],
         angle = h.ang[1], xlab = "x")
  }
  if (!transparent.cols) {
    for (j in 2:n.gr) plot(histL[[j]], add = TRUE, density = h.den[j],
                           angle = h.ang[j])
  } else {
    for (j in 2:n.gr) plot(histL[[j]], add = TRUE, col = cols[j])
  }
  invisible()
  if (dev == "pdf") dev.off()
}

female <- subset(gb, sex=="f", select=svl)
male <- subset(gb, sex=="m", select=svl)
l1 = list(female, male)
superhist2pdf(l1, nbreaks="Sturges")
 
Error in hist.default(X[[1L]], ...) : 'x' must be numeric
 

FYI: The objects ‘female’ and ‘male’ look like:
> female
     svl
1   51.5
2   52.5
3   52.5
4   58.5

277   NA
278   NA
279 55.4
280 57.5 
 
> male
     svl
5   41.8
14  49.5
17  49.0
20  53.0

231 47.6
235   NA
238 50.3
241 50.5
243 62.8
244 59.0
 
FYI:  The structure of my dataset, gb looks like:
> str(gb)
'data.frame':   308 obs. of 43 variables:
 $ id           : Factor w/ 290 levels "(023.541.040) G7",..: 241 
244 243 245 
278 193 194 195 196 197 ...
 $ studysite    : Factor w/ 29 levels "Assabu","Astushinai",..: 20 19 19 19 
19 
29 29 29 29 29 ...
 $ studysitecode: int  NA NA NA NA NA 7 7 7 7 7 ...
 $ subsite      : int  NA NA NA NA NA 18 18 18 18 18 ...
 $ sitecond     : int  NA NA NA NA NA 1 1 1 1 1 ...
 $ Habitat      : Factor w/ 6 levels "Beach","On road",..: 5 5 5 5 5 5 5 
5 5 5 
...
 $ Baskingspots : int  1 1 1 1 1 1 1 1 1 1 ...
 $ sex          : Factor w/ 2 levels "f","m": 1 1 1 1 2 1 1 1 1 1 ...
 $ sexcode      : int  NA NA NA 0 1 0 0 0 0 0 ...
 $ svl          : num  51.5 52.5 52.5 58.5 41.8 57.6 59 55.6 62 58.5 
...
 $ tl           : num  8 8 8 10 8.2 9.2 9 8.5 9.5 8.8 ...
 $ bm           : num  142 128 148.3 192.3 70.5 ...
 $ defensiveness: int  NA NA NA NA NA 1 1 1 1 2 ...
 $ ftr          : int  NA NA NA NA NA 5 10 60 18 30 ...
 $ latency      : num  NA NA NA NA NA NA NA NA NA NA ...
 $ dvc          : int  NA NA NA NA NA 25 25 38 28 28 ...
 $ RepCnd       : Factor w/ 4 levels "Male","Nonpregnant",..: 4 4 4 2 1 
2 2 2 2 
2 ...
 $ repcnd       : int  1 1 1 0 2 0 0 0 0 0 ...
 $ repstatus    : int  1 1 1 1 NA 0 0 0 0 0 ...
 $ LS.w_egg     : int  NA NA NA NA NA NA NA NA NA NA ...
 $ estimatedLS  : int  NA NA NA NA NA NA NA NA NA NA ...
 $ lit.size     : int  NA NA NA NA NA NA NA NA NA NA ...
 $ lit.mass     : num  NA NA NA NA NA NA NA NA NA NA ...
 $ post.BM      : num  NA NA NA NA NA NA NA NA NA NA ...
 $ rcm          : num  NA NA NA NA NA NA NA NA NA NA ...
 $ mean.nSVL    : num  NA NA NA NA NA NA NA NA NA NA ...
 $ mean.nBM     : num  NA NA NA NA NA NA NA NA NA NA ...
 $ ta           : num  23 20 20 20 21 20 20 20 20 20 ...
 $ tm           : num  24 29 29 30 20 20.8 20.8 20.8 20.8 20.8 ...
 $ tb           : num  23 29 30 NA 23 30 29.6 30.2 29 27 ...
 $ partdate     : int  NA NA

[R] RMySQL Load Error: package/namespace load failed for 'RMySQL'

2010-07-14 Thread neatgadgets

Hi,

I am brand new to the world of R, so please bear with me while I goof my way
through a question.

I am attempting to trial using R with MySQL.

The MySQL server is on a Linux box and I am using the Windows (32bit XP)
version of R. 

I have installed RMySQL successfully, however when I load it I get the
error:

Error : .onLoad failed in loadNamespace() for 'RMySQL', details:
  call: fun(...)
  error: A MySQL Registry key was found but the folder C:\Program
Files\MySQL\MySQL Tools for 5.0\/. doesn't contain a bin or lib/opt folder.
That's where we need to find libmySQL.dll. 
Error: package/namespace load failed for 'RMySQL'

Question 1: Does this mean MySQL has to be on the same machine?
Question 2: I just happen to have MySQL installed on this machine, and I
even have the file libmySQL.dll in a folder C:\Program Files\MySQL\MySQL
Tools for 5.0/ so I am unsure why this is an error
Question 3: Should I give up on windows before I get any further? 
-- 
View this message in context: 
http://r.789695.n4.nabble.com/RMySQL-Load-Error-package-namespace-load-failed-for-RMySQL-tp2289650p2289650.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] RMySQL Load Error: package/namespace load failed for 'RMySQL'

2010-07-14 Thread asbro

Ok, that was quick; I found the problem. I created a bin folder and copied
the dll into it from the folder above.

HOWEVER:

Now I get this error:

Error : .onLoad failed in loadNamespace() for 'RMySQL', details:
  call: inDL(x, as.logical(local), as.logical(now), ...)
  error: unable to load shared library
'C:/PROGRA~1/R/R-211~1.1/library/RMySQL/libs/RMySQL.dll':
  LoadLibrary failure:  Invalid access to memory location.


Error: package/namespace load failed for 'RMySQL'

-- 
View this message in context: 
http://r.789695.n4.nabble.com/RMySQL-Load-Error-package-namespace-load-failed-for-RMySQL-tp2289650p2289658.html
Sent from the R help mailing list archive at Nabble.com.



[R] Extreme Value Regression model

2010-07-14 Thread Shubha Vishwanath Karanth
Hi R,

 

Just like a Poisson regression model, is there a package in R to fit an
extreme value regression model? Thanks.

 

Thanks and Regards,

Shubha Karanth

 

 

This e-mail may contain confidential and/or privileged i...{{dropped:13}}



Re: [R] taking daily means from hourly data

2010-07-14 Thread Meissner, Tony (DFW)
Thanks Allan and Gabor.  Allan's code appears to be the simplest to run and 
Gabor provided some further insight.


Tschüß
Tony Meissner
Principal Scientist (Monitoring)
Resources Monitoring Group
Science, Monitoring and Information Division
Department for Water
"Imagine" ©
(ph) (08) 8595 2209
(mob) 0401 124 971
(fax) (08) 8595 2232
28 Vaughan Terrace, Berri SA 5343
PO Box 240, Berri SA 5343
DX 51103
***The information in this e-mail may be confidential and/or legally 
privileged.  Use or disclosure of the information by anyone other than the 
intended recipient is prohibited and may be unlawful.  If you have received 
this e-mail in error, please advise by return e-mail or by telephoning +61 8 
8595 2209





From: Allan Engelhardt [mailto:all...@cybaea.com]
Sent: Thursday, 15 July 2010 3:09 PM
To: Meissner, Tony (DFW)
Cc: r-help@r-project.org
Subject: Re: [R] taking daily means from hourly data

This is one way:

df <- data.frame(Time=as.POSIXct("2009-01-01", format="%Y-%m-%d") + seq(0, 
                   60*60*24*365-1, 60*60),
                 lev.morgan=3+runif(24*365),
                 lev.lock2=3+runif(24*365),
                 flow=1000+rnorm(24*365, 200),
                 direction=runif(24*365, 0, 360),
                 velocity=runif(24*365, 0, 10))

(df2 <- aggregate(df[ , -1], list(date=as.Date(df$Time)), FUN=mean, na.rm=TRUE))


Hope this helps

Allan


On 15/07/10 05:52, Meissner, Tony (DFW) wrote:

I have a data frame (morgan) of hourly river flow, river levels and wind 
direction and speed thus:

 Time   hour lev.morgan lev.lock2 lev.lock1 flow   direction  
velocity

1  2009-07-06 15:00:00   15  3.266 3.274 3.240 1710.6   180.282
4.352

2  2009-07-06 16:00:00   16  3.268 3.272 3.240 1441.8   192.338
5.496

3  2009-07-06 17:00:00   17  3.268 3.271 3.240 1300.1   202.294
2.695

4  2009-07-06 18:00:00   18  3.267 3.274 3.241 1099.1   237.161
2.035

5  2009-07-06 19:00:00   19  3.265 3.277 3.243  986.6   237.576
0.896

6  2009-07-06 20:00:00   20  3.266 3.281 3.242 1237.6   205.686
1.257

7  2009-07-06 21:00:00   21  3.267 3.280 3.242 1513.3    26.080
0.664

8  2009-07-06 22:00:00   22  3.267 3.281 3.242 1819.5   264.280
0.646

9  2009-07-06 23:00:00   23  3.267 3.281 3.242 1954.4   337.137
0.952

10 2009-07-07 00:00:000  3.267 3.281 3.242 1518.9   260.006
0.562

11 2009-07-07 01:00:001  3.267 3.281 3.242 1082.6   252.172
0.673

12 2009-07-07 02:00:002  3.267 3.280 3.243 1215.9   190.007
1.286

13 2009-07-07 03:00:003  3.267 3.279 3.244 1093.5   260.415
1.206

: :   :   :   :  : ::   
  :

: :   :   :   :  : ::   
  :



Time is of class POSIXct

I wish to take daily means of the flow, levels, and wind parameters and put 
them into a new dataframe.  I envisage doing this with the following example 
code:



morgan$fTime <- factor(substr(as.character(morgan$Time),1,10))

dflow <- tapply(morgan[,"flow"], morgan$fTime, mean)

day <- tapply(morgan[,"Time"], morgan$fTime, mean)

  :

  :



daily <- as.data.frame(cbind(day,dflow, dlev.morg,dlev.lock2, ...))

daily$day <- with(daily, as.POSIXct("1970-01-01", "%Y-%m-%d", 
tz="Australia/Adelaide") + day)

rownames(daily) <- NULL



Is there a more efficient way of doing this?  I am running R-2.11.0 under 
Windows XP



Tschüß

Tony Meissner

Principal Scientist (Monitoring)

Resources Monitoring Group

Science, Monitoring and Information Division

Department for Water

"Imagine" ©

*(ph) (08) 8595 2209

*(mob) 0401 124 971

*(fax) (08) 8595 2232

* 28 Vaughan Terrace, Berri SA 5343

PO Box 240, Berri SA 5343

DX 51103

***The information in this e-mail may be confidential and/or legally 
privileged.  Use or disclosure of the information by anyone other than the 
intended recipient is prohibited and may be unlawful.  If you have received 
this e-mail in error, please advise by return e-mail or by telephoning +61 8 
8595 2209









[[alternative HTML version deleted]]









__

R-help@r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help

PLEASE do read the posting guide http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.





Re: [R] taking daily means from hourly data

2010-07-14 Thread Allan Engelhardt
This is one way:

df <- data.frame(Time=as.POSIXct("2009-01-01", format="%Y-%m-%d") + seq(0, 
                   60*60*24*365-1, 60*60),
                 lev.morgan=3+runif(24*365),
                 lev.lock2=3+runif(24*365),
                 flow=1000+rnorm(24*365, 200),
                 direction=runif(24*365, 0, 360),
                 velocity=runif(24*365, 0, 10))
(df2 <- aggregate(df[ , -1], list(date=as.Date(df$Time)), FUN=mean, na.rm=TRUE))



Hope this helps

Allan


On 15/07/10 05:52, Meissner, Tony (DFW) wrote:
> I have a data frame (morgan) of hourly river flow, river levels and wind 
> direction and speed thus:
>   Time   hour lev.morgan lev.lock2 lev.lock1 flow   direction 
>  velocity
> 1  2009-07-06 15:00:00   15  3.266 3.274 3.240 1710.6   180.282   
>  4.352
> 2  2009-07-06 16:00:00   16  3.268 3.272 3.240 1441.8   192.338   
>  5.496
> 3  2009-07-06 17:00:00   17  3.268 3.271 3.240 1300.1   202.294   
>  2.695
> 4  2009-07-06 18:00:00   18  3.267 3.274 3.241 1099.1   237.161   
>  2.035
> 5  2009-07-06 19:00:00   19  3.265 3.277 3.243  986.6   237.576   
>  0.896
> 6  2009-07-06 20:00:00   20  3.266 3.281 3.242 1237.6   205.686   
>  1.257
> 7  2009-07-06 21:00:00   21  3.267 3.280 3.242 1513.326.080   
>  0.664
> 8  2009-07-06 22:00:00   22  3.267 3.281 3.242 1819.5   264.280   
>  0.646
> 9  2009-07-06 23:00:00   23  3.267 3.281 3.242 1954.4   337.137   
>  0.952
> 10 2009-07-07 00:00:000  3.267 3.281 3.242 1518.9   260.006   
>  0.562
> 11 2009-07-07 01:00:001  3.267 3.281 3.242 1082.6   252.172   
>  0.673
> 12 2009-07-07 02:00:002  3.267 3.280 3.243 1215.9   190.007   
>  1.286
> 13 2009-07-07 03:00:003  3.267 3.279 3.244 1093.5   260.415   
>  1.206
> : :   :   :   :  : :: 
> :
> : :   :   :   :  : :: 
> :
>
> Time is of class POSIXct
> I wish to take daily means of the flow, levels, and wind parameters and put 
> them into a new dataframe.  I envisage doing this with the following example 
> code:
>
> morgan$fTime<- factor(substr(as.character(morgan$Time),1,10))
> dflow<- tapply(morgan[,"flow"], morgan$fTime, mean)
> day<- tapply(morgan[,"Time"], morgan$fTime, mean)
>:
>:
>
> daily<- as.data.frame(cbind(day,dflow, dlev.morg,dlev.lock2, ...))
> daily$day<- with(daily, as.POSIXct("1970-01-01", "%Y-%m-%d", 
> tz="Australia/Adelaide") + day)
> rownames(daily)<- NULL
>
> Is there a more efficient way of doing this?  I am running R-2.11.0 under 
> Windows XP
>
> Tschüß
> Tony Meissner
> Principal Scientist (Monitoring)
> Resources Monitoring Group
> Science, Monitoring and Information Division
> Department for Water
> "Imagine" ©
> *(ph) (08) 8595 2209
> *(mob) 0401 124 971
> *(fax) (08) 8595 2232
> * 28 Vaughan Terrace, Berri SA 5343
>  PO Box 240, Berri SA 5343
>  DX 51103
> ***The information in this e-mail may be confidential and/or legally 
> privileged.  Use or disclosure of the information by anyone other than the 
> intended recipient is prohibited and may be unlawful.  If you have received 
> this e-mail in error, please advise by return e-mail or by telephoning +61 8 
> 8595 2209
>
>
>
>
>   [[alternative HTML version deleted]]
>
>
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



Re: [R] taking daily means from hourly data

2010-07-14 Thread Gabor Grothendieck
On Thu, Jul 15, 2010 at 12:52 AM, Meissner, Tony (DFW)
 wrote:
> I have a data frame (morgan) of hourly river flow, river levels and wind 
> direction and speed thus:
>         Time           hour lev.morgan lev.lock2 lev.lock1 flow   direction  
> velocity
> 1  2009-07-06 15:00:00   15      3.266     3.274     3.240 1710.6   180.282   
>  4.352
> 2  2009-07-06 16:00:00   16      3.268     3.272     3.240 1441.8   192.338   
>  5.496
> 3  2009-07-06 17:00:00   17      3.268     3.271     3.240 1300.1   202.294   
>  2.695
> 4  2009-07-06 18:00:00   18      3.267     3.274     3.241 1099.1   237.161   
>  2.035
> 5  2009-07-06 19:00:00   19      3.265     3.277     3.243  986.6   237.576   
>  0.896
> 6  2009-07-06 20:00:00   20      3.266     3.281     3.242 1237.6   205.686   
>  1.257
> 7  2009-07-06 21:00:00   21      3.267     3.280     3.242 1513.3    26.080   
>  0.664
> 8  2009-07-06 22:00:00   22      3.267     3.281     3.242 1819.5   264.280   
>  0.646
> 9  2009-07-06 23:00:00   23      3.267     3.281     3.242 1954.4   337.137   
>  0.952
> 10 2009-07-07 00:00:00    0      3.267     3.281     3.242 1518.9   260.006   
>  0.562
> 11 2009-07-07 01:00:00    1      3.267     3.281     3.242 1082.6   252.172   
>  0.673
> 12 2009-07-07 02:00:00    2      3.267     3.280     3.243 1215.9   190.007   
>  1.286
> 13 2009-07-07 03:00:00    3      3.267     3.279     3.244 1093.5   260.415   
>  1.206
> :         :               :       :               :          :     :        : 
>         :
> :         :               :       :               :          :     :        : 
>         :
>

There are many possibilities.  Here are three.

#1 can be done with only the core of R.

#2 produces a zoo series, which seems to be the logical representation
since it is, in fact, a series, so it is already in the form needed for
other series operations.  See the 3 vignettes that come with zoo.

With #3 it's easy to take different functions (avg, count, etc.) of
different columns, and if you already know SQL it's particularly
convenient.  See http://sqldf.googlecode.com

# DF2 is used in #1 and #3
DF2 <- data.frame(DF, Day = as.Date(format(DF$Time)))

# 1 - aggregate
aggregate(cbind(flow, direction, velocity) ~ Day, DF2, mean)

# 2 - zoo
library(zoo)
z <- read.zoo(DF, header = TRUE, tz = "GMT")
aggregate(z, as.Date, mean)

# 3 - sqldf
library(sqldf)
sqldf("select
   Day, avg(flow) Flow, avg(direction) Direction, avg(velocity) Velocity
   from DF2
   group by Day")

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Longitudinal negative binomial regression - robust sandwich estimator standard errors

2010-07-14 Thread Matt Cooper
Hi All,

I have a dataset, longitudinal in nature, each row is a 'visit' to a clinic,
which has numerous data fields and a count variable for the number of
'events' that occurred since the previous visit.

 ~50k rows, ~2k unique subjects, so ~25 rows/visits per subject; some have 50,
some have 3 or 4.

In Stata there is an adjustment for the fact that you have multiple rows per
subject: you can produce robust standard errors using sandwich estimators to
control for the correlation structure within each subject's set of visits.
That option is reasonably straightforward in Stata; however, I'm trying to
find something similar in R.

http://www.stata.com/help.cgi?nbreg
http://www.stata.com/help.cgi?vce_option

I'll admit I'm not all that familiar with the inner workings of these
functions but am learning about them.

glm.nb gives the same coefficients as nbreg in stata which is reassuring,
but I haven't yet found the same adjustment that vce is doing.

I've tried the cluster function, and the Zelig package with

zelig(my model, data = mydata, model= "negbin", by="id")

but I get the following error:

Error in `contrasts<-`(`*tmp*`, value = "contr.treatment") :
  contrasts can be applied only to factors with 2 or more levels

I'm not actually sure that is the correct command, as when I tried it with a
3-level factor instead of id it just ran the model 3 times, once for each
level of the factor; not what I'm after or what I expected from it.

Any thoughts or direction on this appreciated.
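[Editor's note: one way to get cluster-robust (sandwich) standard errors for a glm.nb fit, analogous to Stata's vce(cluster id), is via the sandwich and lmtest packages. A sketch; the model terms (events, age, treat) and the data frame are placeholders, not from the original post:

```r
library(MASS)      # glm.nb
library(sandwich)  # vcovCL: cluster-robust covariance estimator
library(lmtest)    # coeftest

# hypothetical model; 'events', 'age', 'treat' and 'id' are placeholders
fit <- glm.nb(events ~ age + treat, data = mydata)

# robust SEs clustered on subject, leaving the coefficients unchanged
coeftest(fit, vcov. = vcovCL(fit, cluster = mydata$id))
```

The coefficients stay identical to plain glm.nb; only the standard errors, and hence the z statistics and p-values, change.]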

Matt

[[alternative HTML version deleted]]



[R] taking daily means from hourly data

2010-07-14 Thread Meissner, Tony (DFW)
I have a data frame (morgan) of hourly river flow, river levels and wind 
direction and speed thus:
   Time                hour lev.morgan lev.lock2 lev.lock1   flow direction velocity
1  2009-07-06 15:00:00   15      3.266     3.274     3.240 1710.6   180.282    4.352
2  2009-07-06 16:00:00   16      3.268     3.272     3.240 1441.8   192.338    5.496
3  2009-07-06 17:00:00   17      3.268     3.271     3.240 1300.1   202.294    2.695
4  2009-07-06 18:00:00   18      3.267     3.274     3.241 1099.1   237.161    2.035
5  2009-07-06 19:00:00   19      3.265     3.277     3.243  986.6   237.576    0.896
6  2009-07-06 20:00:00   20      3.266     3.281     3.242 1237.6   205.686    1.257
7  2009-07-06 21:00:00   21      3.267     3.280     3.242 1513.3    26.080    0.664
8  2009-07-06 22:00:00   22      3.267     3.281     3.242 1819.5   264.280    0.646
9  2009-07-06 23:00:00   23      3.267     3.281     3.242 1954.4   337.137    0.952
10 2009-07-07 00:00:00    0      3.267     3.281     3.242 1518.9   260.006    0.562
11 2009-07-07 01:00:00    1      3.267     3.281     3.242 1082.6   252.172    0.673
12 2009-07-07 02:00:00    2      3.267     3.280     3.243 1215.9   190.007    1.286
13 2009-07-07 03:00:00    3      3.267     3.279     3.244 1093.5   260.415    1.206
:          :              :       :         :         :       :        :         :

Time is of class POSIXct
I wish to take daily means of the flow, levels, and wind parameters and put 
them into a new dataframe.  I envisage doing this with the following example 
code:

morgan$fTime <- factor(substr(as.character(morgan$Time),1,10))
dflow <- tapply(morgan[,"flow"], morgan$fTime, mean)
day <- tapply(morgan[,"Time"], morgan$fTime, mean)
  :
  :

daily <- as.data.frame(cbind(day, dflow, dlev.morg, dlev.lock2, ...))
daily$day <- as.POSIXct(daily$day, origin = "1970-01-01", 
tz="Australia/Adelaide")
rownames(daily) <- NULL

Is there a more efficient way of doing this?  I am running R-2.11.0 under 
Windows XP
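[Editor's note: the whole reduction can be done in one step with aggregate(); a sketch, assuming the data frame is named morgan as above:

```r
# collapse each numeric column to its daily mean, keyed on calendar day
morgan$Day <- as.Date(morgan$Time, tz = "Australia/Adelaide")
daily <- aggregate(cbind(lev.morgan, lev.lock2, lev.lock1,
                         flow, direction, velocity) ~ Day,
                   data = morgan, FUN = mean)
```

This avoids the tapply/cbind round trip and keeps Day as a Date column directly.]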

Tschüß
Tony Meissner
Principal Scientist (Monitoring)
Resources Monitoring Group
Science, Monitoring and Information Division
Department for Water
"Imagine" ©
*(ph) (08) 8595 2209
*(mob) 0401 124 971
*(fax) (08) 8595 2232
* 28 Vaughan Terrace, Berri SA 5343
PO Box 240, Berri SA 5343
DX 51103
***The information in this e-mail may be confidential and/or legally 
privileged.  Use or disclosure of the information by anyone other than the 
intended recipient is prohibited and may be unlawful.  If you have received 
this e-mail in error, please advise by return e-mail or by telephoning +61 8 
8595 2209




[[alternative HTML version deleted]]



[R] acf significance levels

2010-07-14 Thread nuncio m
Dear useRs,
 How can I save the correlations and the corresponding significance
levels from the acf() function?
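[Editor's note: acf() does not return the confidence limits, but they are easy to recompute; a sketch, where the 95% limit matches the dashed lines plot.acf draws under its default ci = 0.95:

```r
x <- rnorm(200)                       # example series
a <- acf(x, plot = FALSE)
ci <- qnorm(0.975) / sqrt(a$n.used)   # the dashed significance limit
res <- data.frame(lag = a$lag[, 1, 1], acf = a$acf[, 1, 1])
res$significant <- abs(res$acf) > ci
res[res$significant & res$lag != 0, ] # correlations beyond the limit
```

res can then be saved with write.csv() or similar.]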
Thanks
nuncio


-- 
Nuncio.M
Research Scientist
National Center for Antarctic and Ocean research
Head land Sada
Vasco da Gamma
Goa-403804

[[alternative HTML version deleted]]



[R] R2wd and ESS: printing source?

2010-07-14 Thread Bill Harris
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

I'm using R2wd and ESS.  ESS-mode doesn't let one fill wdBody() calls,
and printing out of Emacs (M-x print-buffer or M-x print-region) doesn't
wrap, so I miss most of the text on printed listings.  What do others
do to address that?

Thanks,

Bill
- -- 
Bill Harris  http://makingsense.facilitatedsystems.com/
Facilitated Systems  Everett, WA 98208 USA
http://www.facilitatedsystems.com/   phone: +1 425 374-1845
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.10 (GNU/Linux)

iEYEARECAAYFAkw+iasACgkQ3J3HaQTDvd9IHwCfdfowaeo52pgFpbpCS4QGH8uk
EmAAnAt7Jc/rGkbRrvtoyvww/2scGadJ
=8V1I
-END PGP SIGNATURE-




Re: [R] Cannot Build R From Source - Windows XP

2010-07-14 Thread Steve Pederson

Thanks for the help. That sorted it out straight away.

Duncan Murdoch wrote:

On 14/07/2010 12:01 PM, Steve Pederson wrote:

Hi,

I can't seem to install R from source. I've downloaded the latest 
Rtools211.exe from http://www.murdoch-sutherland.com/Rtools/ & done a 
full installation of that and Inno Setup.


I have set R_HOME as C:\R (and also tried using C:\R\R-2.11.1)

After successfully running 'tar xf R-2.11.1.tar.gz' the modifications 
I have made and saved as MkRules.local are:

BUILD_HTML = YES & ISDIR=C:/Program Files/Inno Setup 5

I've then run 'make all recommended' from R_HOME\src\gnuwin32 and it 
runs nicely for ages, until I get the following message:



building package 'base'
cannot create /tmp/R612: directory nonexistent
mv: cannot stat '/tmp/R612': No such file or directory
  


R thinks your temporary directory is called /tmp, but there's no such 
directory on your system.  R looks for a temporary directory in the 
environment variables
TMPDIR, TMP, TEMP.  Set TMPDIR to the path to a writable directory and 
you should get past this error.
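[Editor's note: concretely, before running make from the Windows command shell (the path is illustrative):

```shell
mkdir c:\tmp
set TMPDIR=c:/tmp
```
]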


Duncan Murdoch

make[3]: ***[mkR] Error 1
make[2]: ***[all] Error 2
make[1]: ***[R] Error 1
make: ***[all] Error 2


Sometimes the number changes to /tmp/R5776, or something similar but I 
don't think that's the issue.



My current setting for PATH are:
c:\Rtools\bin;c:\Rtools\perl\bin;c:\Rtools\MinGW\bin;C:\Program 
Files\PC Connectivity Solution\;C:\Program Files\Common 
Files\ArcSoft\Bin;%GTK_BASEPATH%\bin;c:\program 
files\imagemagick-6.4.1-q16;C:\texmf\miktex\bin;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;c:\dev-cpp\bin\;C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727;C:\Program 
Files\Common Files\DivX Shared\;C:\Program Files\QuickTime\QTSystem\



FWIW, It's a 4yo Dell laptop & I've come across a few quirks with 
installing software over the years. I had also previously installed 
2.11.1 from the windows executable, but this was uninstalled using the 
uninstall function that comes with it. I'm trying to rebuild to begin 
incorporating some calls to C I'm working on.


Thanks in advance,

Steve

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.
  







Re: [R] Question about food sampling analysis

2010-07-14 Thread kMan
Dear Sarah,

[snip...]
"I know that samples within each facility cannot be treated as independent,
so I need an approach that accounts for (1) clustering within facilities
and"

You could just use lm() and some planning. The data from within a specific
facility can be fit with a model to generate parameters that are compared
between facilities. Not too practical, though: assuming the 57 production
facilities each have their own analytical lab, you'll have 57 different fits
from which to collect the parameters for your between-facility test.
Questions about dependent data are fairly common, so it should be relatively
straightforward to get a solution and/or an idea for a suitable package from
the archives.
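[Editor's note: the per-facility fitting described above can be sketched with split() and lapply(); the data frame and column names here are placeholders:

```r
# one lm() fit per facility, then one row of coefficients per facility
fits <- lapply(split(dat, dat$facility),
               function(d) lm(concentration ~ batch, data = d))
params <- t(sapply(fits, coef))
```

The rows of params are then the aggregate values compared between facilities.]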

"(2) the different number of samples taken at each facility."
It's a waste of time to worry about that. You'll be comparing aggregate
values between groups, and you'll have too few data-points within a group to
detect within effects... 

[snip...]

Sincerely,
KeithC.



Re: [R] R's Data Dredging Philosophy for Distribution Fitting

2010-07-14 Thread Ben Bolker
emorway  engr.colostate.edu> writes:

> 
> 
> Forum, 
> 
> I'm a grad student in Civil Eng, took some Stats classes that required
> students learn R, and I have since taken to R and use it for as much as I
> can.  Back in my lab/office, many of my fellow grad students still use
> proprietary software at the behest of advisers who are familiar with the
> recommended software (Statistica, @Risk (Excel Add-on), etc).  I have spent
> a lot of time learning R and am confident it can generally out-process,
> out-graph, or more simply stated, out-perform most of these other software
> packages.  However, one area my view has been humbled in is distribution
> fitting.
> 
> I started by reading through
> http://cran.r-project.org/doc/contrib/Ricci-distributions-en.pdf  After that
> I started digging around on this forum and found posts like this one
> http://r.789695.n4.nabble.com/Fitting-usual-distributions-td80.html#a80
> that are close to what I'm after.  That is, given an observation dataset, I
> would like to call a function that cycles through numerous distributions
> (common or not) and then ranks them for me based on Chi-Square,
> Kolmogorov-Smirnov and/or Anderson-Darling, for example.  
> 
> This question was asked back in 2004:
> http://finzi.psych.upenn.edu/R/Rhelp02a/archive/37053.html but the response
> was that this kind of thing wasn't in R nor in proprietary software to the
> best of the responding author's memory.  In 2010, however, this is no longer
> true as @Risk's
> (http://www.palisade.com/risk/?gclid=CKvblPSM7KICFZQz5wodDRI2fg)
> "Distribution Fitting" function does this very thing.  And it is here that
> my R pride has taken a hit.  Based on the first response to the question
> posed here
>
http://r.789695.n4.nabble.com/Which-distribution-best-fits-the-data-td859448.html#a859448
> is it fair to say that the R community (I realize this is only 1 view) would
> take exception to this kind of "data mining"?  
> 
> Unless I've missed a discussion of a package that does this very thing, it
> seems as though I would need to code something up using fitdistr() and do
> all the ranking myself.  Undoubtedly that would be a good exercise for me,
> but it's hard for me to believe R would be a runner-up to something like
> distribution fitting in @Risk.
> 

   I was one of the respondents in some of the threads you list above,
and I still question why you're doing this in the first place: it's not
*necessarily* a silly thing to do, but that would be my default position.

  It's not hard to hack up something that tries all the distributions
fitdistr() knows and compares their AIC values (completely ignoring
sensible considerations like whether the distribution is discrete
or not ...)  See below ...

  It's hard to see how you could have a mechanistic (rather
than phenomenological) model in mind if you just want to try
a whole variety of families (not 1 or 2).  Perhaps some flexible family
like Johnson distributions would be appropriate, or log-spline
densities ...


distlist <- c("beta","cauchy","chi-squared","exponential",
  "f","gamma","geometric","log-normal","logistic",
  "negative binomial","normal","poisson","t","weibull")


x <- runif(1000)

dd <- function(...) {
  try(fitdistr(...),silent=TRUE)
}

library(MASS)
s <- lapply(as.list(distlist),dd,x=x)
names(s) <- distlist

sapply(s,function(z) if (inherits(z,"try-error")) NA else AIC(z))



Re: [R] R's Data Dredging Philosophy for Distribution Fitting

2010-07-14 Thread Frank E Harrell Jr

On 07/14/2010 06:22 PM, emorway wrote:


Forum,

I'm a grad student in Civil Eng, took some Stats classes that required
students learn R, and I have since taken to R and use it for as much as I
can.  Back in my lab/office, many of my fellow grad students still use
proprietary software at the behest of advisers who are familiar with the
recommended software (Statistica, @Risk (Excel Add-on), etc).  I have spent
a lot of time learning R and am confident it can generally out-process,
out-graph, or more simply stated, out-perform most of these other software
packages.  However, one area my view has been humbled in is distribution
fitting.

I started by reading through
http://cran.r-project.org/doc/contrib/Ricci-distributions-en.pdf  After that
I started digging around on this forum and found posts like this one
http://r.789695.n4.nabble.com/Fitting-usual-distributions-td80.html#a80
that are close to what I'm after.  That is, given an observation dataset, I
would like to call a function that cycles through numerous distributions
(common or not) and then ranks them for me based on Chi-Square,
Kolmogorov-Smirnov and/or Anderson-Darling, for example.

This question was asked back in 2004:
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/37053.html but the response
was that this kind of thing wasn't in R nor in proprietary software to the
best of the responding author's memory.  In 2010, however, this is no longer
true as @Risk's
(http://www.palisade.com/risk/?gclid=CKvblPSM7KICFZQz5wodDRI2fg)
"Distribution Fitting" function does this very thing.  And it is here that
my R pride has taken a hit.  Based on the first response to the question
posed here
http://r.789695.n4.nabble.com/Which-distribution-best-fits-the-data-td859448.html#a859448
is it fair to say that the R community (I realize this is only 1 view) would
take exception to this kind of "data mining"?

Unless I've missed a discussion of a package that does this very thing, it
seems as though I would need to code something up using fitdistr() and do
all the ranking myself.  Undoubtedly that would be a good exercise for me,
but it's hard for me to believe R would be a runner-up to something like
distribution fitting in @Risk.

Eric


Eric,

I didn't read the links you provided but the approach you have advocated 
(and you are not alone) is futile.  If you entertain more than about 2 
distributions, the variance of the final fits is no better than the 
variance of the empirical cumulative distribution function (once you 
properly adjust variances for model uncertainty).  So just go empirical. 
 In general if your touchstone is the observed data (as in checking 
goodness of fit of various parametric distributions), your final 
estimators will have the variance of empirical estimators.
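[Editor's note: "going empirical" is directly supported in base R via ecdf(); a short illustration:

```r
set.seed(1)
x <- rgamma(500, shape = 2)   # any observed sample
Fn <- ecdf(x)                 # a step function: the empirical CDF
Fn(2)                         # empirical estimate of P(X <= 2)
quantile(x, 0.95)             # empirical 95th percentile, no parametric assumption
plot(Fn, main = "Empirical CDF")
```
]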


Frank
--
Frank E Harrell Jr   Professor and ChairmanSchool of Medicine
 Department of Biostatistics   Vanderbilt University



Re: [R] Matrix Size

2010-07-14 Thread Christos Argyropoulos






If the system is sparse and you have a really large cluster to play 
with, then maybe (emphasis) PETSc/TAO is the right combination of tools 
for your problem.

http://www.mcs.anl.gov/petsc/petsc-as/

Christos

[[alternative HTML version deleted]]



Re: [R] export tables to excel files on multiple sheets with titles for each table

2010-07-14 Thread Whit Armstrong
It isn't beautiful, but I use this package to write excel files from linux.

http://github.com/armstrtw/Rexcelpoi

the basic idea is that each element of a list is written as a separate
sheet, but if a list element is itself a list, then all the elements
of that list are written to the same sheet (with a title corresponding
to the name of the list element).

-Whit


On Tue, Jul 13, 2010 at 4:21 PM, eugen pircalabelu
 wrote:
> Hello R-users,
> Checking the archives, I recently came across this topic:
> "export tables to Excel files"
> (http://r.789695.n4.nabble.com/export-tables-to-Excel-files-td1565679.html#a1565679),
>  and the following interesting references have been proposed:
> http://rwiki.sciviews.org/doku.php?id=tips:data-io:ms_windows
> http://www.r-bloggers.com/export-data-frames-to-multi-worksheet-excel-file-2/
>
> but my problem is somehow a small extension to what has been discussed, and
> although i have a solution, i seek something more elegant. I want to export
> multiple dataframes (on multiple sheets), but i also want each of them to have
> its own title that is to be written also in Excel. The packages/functions 
> that i
> have checked, cannot accommodate a title that is to be written on the sheet,
> along with the actual dataframe of interest.
>
> I can do something similar to what i need, but without writing the dataframes 
> on
> multiple sheets.
>
> #head(USArrests) and head(iris) written with accompanying title one under each
> other
>
> write.excel<-function (tab, ...){
>  zz <- file("example.dat", "a+b")
>  cat("\"TITLE extra line",file = zz, sep = "\n")
>  write.table(tab, file=zz, row.names=F,sep="\t")
>  close(zz)}
>  write.excel(head(USArrests))
>  write.excel(head(iris))
>
> Any suggestion on how to export the same information on two separate sheets, 
> and
> keeping also a title for each of them, is highly appreciated, as i have been
> searching for some time for a good solution.
>
> Thank you very much and have a great day ahead!
>
>
>
>
>
>
>  Eugen Pircalabelu
> (0032)471 842 140
> (0040)727 839 293
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



Re: [R] Matrix Size

2010-07-14 Thread paul s

On 07/14/2010 07:07 PM, Duncan Murdoch wrote:

It is capable of handling large data, but not that large in a single
matrix. The limit on the number of entries in any vector (and matrices
are stored as vectors) is about 2^31 ~ 2 billion. Your matrix needs
about 340 billion entries, so it's too big. (It won't all fit in memory,


Thank you for confirming this.


You need to break up the work into smaller pieces.


i agree, i will review what a sparse matrix is and how that impacts our 
overall calculations.


is it possible to distribute calculations through R across computers? so 
if i strung a series of PS3's(or the lab computers, ps3 cluster is just 
a fantasy) together, could those pieces then be distributed?


cheers
paul



Re: [R] Matrix Size

2010-07-14 Thread paul s

On 07/14/2010 06:10 PM, Douglas Bates wrote:

R stores matrices and other data objects in memory.  A matrix of that
size would require

2e+06*170000*8/2^30

[1] 2533.197


great, that is my understanding as well..


probably easier, rethink your problem.


yes. i am starting to do that now as i have run into the same memory 
issue with my code and wanted to look at statistical packages for crunching 
huge amounts of data.



Results from a linear regression producing 170,000 coefficient
estimates are unlikely to be useful.  The model matrix is essentially
guaranteed to be rank deficient.


interesting, i also raised this point: as observations grew, plotting 
the regression showed minimal impact on its characteristics. however i am 
working with an academic who claims all observations are needed to 
reflect a pure hedonic index.


is this what you mean by rank deficient?
http://en.wikipedia.org/wiki/Rank_%28linear_algebra%29

thank you for your response.

cheers
paul



Re: [R] Matrix Size

2010-07-14 Thread paul s



On 07/14/2010 06:15 PM, Peter Dalgaard wrote:

A quick calculation reveals that a matrix of that size requires about
2.7 TERAbytes of storage, so I'm a bit confused as to how you might
expect to fit it into 16GB of RAM...

However, even with terabytes of memory, you would be running into the
(current) limitation that a single vector in R can have at most 2^31-1 =
ca. 2 billion elements.


thank you for also confirming what Douglas had written.


Yes, you could be doing it wrong, but what is "it"?


we are trying to create a hedonic index: http://tinyurl.com/2fnl3jf


If the matrix is sparse, there are sparse matrix tools around...


interesting yet again! just read what this was and it seems like it 
could be. also part of the matrix could be a diagonal matrix.


1 0 0 0 0 0 0 0 0 0 0 1
1 0 0 0 0 0 1 0 0 0 0 0
1 0 0 0 0 0 0 0 1 0 0 0
0 1 0 0 0 0 0 1 0 0 0 0
0 1 0 0 0 0 0 0 0 1 0 0
0 1 0 0 0 0 1 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0 0 1
0 0 1 0 0 0 0 0 0 1 0 0
0 0 1 0 0 0 0 0 1 0 0 0
0 0 1 0 0 0 0 0 0 0 1 0
0 0 1 0 0 0 0 0 0 0 0 1
0 0 0 1 0 0 0 0 0 0 1 0
0 0 0 1 0 0 0 0 1 0 0 0
0 0 0 1 0 0 0 1 0 0 0 0
0 0 0 1 0 0 0 0 0 0 1 0

if it is a sparse matrix how would i test? just a smaller subset of data 
that i have run regression on producing similar coefficients?
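[Editor's note: a 0/1 design matrix like the one above is exactly the sparse case, and one quick way to see the payoff is to build a small stand-in with the Matrix package and compare storage; the dimensions here are illustrative, not the full 2,000,000 x 170,000 problem:

```r
library(Matrix)
set.seed(1)
n <- 1e5; p <- 1000                  # small stand-in for the real design
i <- seq_len(n)
j <- sample(p, n, replace = TRUE)    # one dummy indicator per row
X <- sparseMatrix(i = i, j = j, x = 1, dims = c(n, p))
print(object.size(X), units = "Mb")  # sparse storage: only nonzeros kept
n * p * 8 / 2^20                     # Mb the same matrix would need if dense
```

With roughly one nonzero per row, the sparse representation is smaller by a factor close to p.]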


cheers
paul



[R] R's Data Dredging Philosophy for Distribution Fitting

2010-07-14 Thread emorway

Forum, 

I'm a grad student in Civil Eng, took some Stats classes that required
students learn R, and I have since taken to R and use it for as much as I
can.  Back in my lab/office, many of my fellow grad students still use
proprietary software at the behest of advisers who are familiar with the
recommended software (Statistica, @Risk (Excel Add-on), etc).  I have spent
a lot of time learning R and am confident it can generally out-process,
out-graph, or more simply stated, out-perform most of these other software
packages.  However, one area my view has been humbled in is distribution
fitting.

I started by reading through
http://cran.r-project.org/doc/contrib/Ricci-distributions-en.pdf  After that
I started digging around on this forum and found posts like this one
http://r.789695.n4.nabble.com/Fitting-usual-distributions-td80.html#a80
that are close to what I'm after.  That is, given an observation dataset, I
would like to call a function that cycles through numerous distributions
(common or not) and then ranks them for me based on Chi-Square,
Kolmogorov-Smirnov and/or Anderson-Darling, for example.  

This question was asked back in 2004:
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/37053.html but the response
was that this kind of thing wasn't in R nor in proprietary software to the
best of the responding author's memory.  In 2010, however, this is no longer
true as @Risk's
(http://www.palisade.com/risk/?gclid=CKvblPSM7KICFZQz5wodDRI2fg)
"Distribution Fitting" function does this very thing.  And it is here that
my R pride has taken a hit.  Based on the first response to the question
posed here
http://r.789695.n4.nabble.com/Which-distribution-best-fits-the-data-td859448.html#a859448
is it fair to say that the R community (I realize this is only 1 view) would
take exception to this kind of "data mining"?  

Unless I've missed a discussion of a package that does this very thing, it
seems as though I would need to code something up using fitdistr() and do
all the ranking myself.  Undoubtedly that would be a good exercise for me,
but it's hard for me to believe R would be a runner-up to something like
distribution fitting in @Risk.

Eric
-- 
View this message in context: 
http://r.789695.n4.nabble.com/R-s-Data-Dredging-Philosophy-for-Distribution-Fitting-tp2289508p2289508.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Matrix Size

2010-07-14 Thread Duncan Murdoch

On 14/07/2010 5:23 PM, paul s wrote:

hi -

i just started using R as i am trying to figure out how perform a linear 
regression on a huge matrix.


i am sure this topic has passed through the email list before but could 
not find anything in the archives.


i have a matrix that is 2,000,000 x 170,000; the values right now are 
arbitrary.


i try to allocate this on a x86_64 machine with 16G of ram and i get the 
following:


 > x <- matrix(0,2000000,170000);
Error in matrix(0, 2e+06, 170000) : too many elements specified
 >

is R capable of handling data of this size? am i doing it wrong?


It is capable of handling large data, but not that large in a single 
matrix.  The limit on the number of entries in any vector (and matrices 
are stored as vectors) is about 2^31 ~ 2 billion.  Your matrix needs 
about 340 billion entries, so it's too big.  (It won't all fit in 
memory, either:  you've only got space for 2 billion numeric values in 
your 16G of RAM, and you also need space for the OS, etc.  But the OS 
can use disk space as virtual memory, so you might be able to get that 
much memory, it would just be very, very slow.)
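[Editor's note: the arithmetic above can be checked directly in R:

```r
.Machine$integer.max       # 2147483647: the per-vector length limit here
2e6 * 170000               # 3.4e11 entries requested, far over that limit
2e6 * 170000 * 8 / 2^30    # ~2533 GB needed to hold them as doubles
```
]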


You need to break up the work into smaller pieces.

Duncan Murdoch



cheers
paul

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




Re: [R] export tables to excel files on multiple sheets with titles for each table

2010-07-14 Thread Erich Neuwirth
You also could use RExcel and write some VBA macros doing this task for you.
You can essentially have the rcom R-centric solution or the
VBA-centric RExcel solution.


On Jul 14, 2010, at 12:19 AM, Marc Schwartz wrote:

> If I am correctly understanding what Eugen is trying to do, WriteXLS() won't 
> get him there. WriteXLS() will enable you to label/name the worksheets (tabs) 
> but not allow you to precede the actual data frame rows and columns on the 
> sheet with a title or label.
> 
> I suspect that you may have to look at the RCom package tools for this. This 
> provides greater flexibility in writing to the worksheets and cells. 
> 
> See http://rcom.univie.ac.at/ for more information.
> 
> HTH,
> 
> Marc Schwartz
> 
> 
> On Jul 13, 2010, at 4:09 PM, Felipe Carrillo wrote:
> 
>> Check the WriteXLS package, I think it does that and also saves
>> each R object on a different excel sheet.
>> 
>> Felipe D. Carrillo
>> Supervisory Fishery Biologist
>> Department of the Interior
>> US Fish & Wildlife Service
>> California, USA
>> 
>> 
>> 
>> - Original Message 
>>> From: eugen pircalabelu 
>>> To: R-help 
>>> Sent: Tue, July 13, 2010 1:21:33 PM
>>> Subject: [R] export tables to excel files on multiple sheets with titles 
>>> for 
>>> each table
>>> 
>>> Hello R-users,
>>> Checking the archives, I recently came across this topic: 
>>> "export tables to Excel files" 
>>> (http://r.789695.n4.nabble.com/export-tables-to-Excel-files-td1565679.html#a1565679),
>>> and the following interesting references have been proposed:
>>> http://rwiki.sciviews.org/doku.php?id=tips:data-io:ms_windows
>>> http://www.r-bloggers.com/export-data-frames-to-multi-worksheet-excel-file-2/
>>> 
>>> but my problem is somehow a small extension to what has been discussed, and 
>>> although i have a solution, i seek something more elegant. I want to export 
>>> multiple dataframes (on multiple sheets), but i also want each of them to 
>>> have its own title that is to be written also in Excel. The 
>>> packages/functions that i have checked cannot accommodate a title that is 
>>> to be written on the sheet, along with the actual dataframe of interest.
>>> 
>>> I can do something similar to what i need, but without writing the 
>>> dataframes on multiple sheets.
>>> 
>>> # head(USArrests) and head(iris) written with accompanying title one 
>>> # under the other
>>> 
>>> write.excel <- function(tab, ...) {
>>>   zz <- file("example.dat", "a+b")
>>>   cat("\"TITLE extra line", file = zz, sep = "\n")
>>>   write.table(tab, file = zz, row.names = FALSE, sep = "\t")
>>>   close(zz)
>>> }
>>> write.excel(head(USArrests))
>>> write.excel(head(iris))
>>> 
>>> Any suggestion on how to export the same information on two separate 
>>> sheets, while also keeping a title for each of them, is highly appreciated, 
>>> as i have been searching for some time for a good solution.
>>> 
>>> Thank you very much and have a great day ahead!
> 
> 

--
Erich Neuwirth
Didactic Center for Computer Science and Institute for Scientific Computing
University of Vienna





[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Matrix Size

2010-07-14 Thread Peter Dalgaard
paul s wrote:
> hi -
> 
> i just started using R as i am trying to figure out how to perform a linear 
> regression on a huge matrix.
> 
> i am sure this topic has passed through the email list before but could 
> not find anything in the archives.
> 
> i have a matrix that is 2,000,000 x 170,000; the values right now are 
> arbitrary.
> 
> i try to allocate this on an x86_64 machine with 16G of ram and i get the 
> following:
> 
>  > x <- matrix(0,2000000,170000);
> Error in matrix(0, 2e+06, 170000) : too many elements specified
>  >
> 
> is R capable of handling data of this size? am i doing it wrong?

A quick calculation reveals that a matrix of that size requires about
2.7 TERAbytes of storage, so I'm a bit confused as to how you might
expect to fit it into 16GB of RAM...

However, even with terabytes of memory, you would be running into the
(current) limitation that a single vector in R can have at most 2^31 - 1 =
ca. 2.1 billion elements.

Yes, you could be doing it wrong, but what is "it"? If the matrix is
sparse, there are sparse matrix tools around...
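
For the sparse case, a minimal sketch with the Matrix package (an assumption on my part — the poster has not said the data are mostly zeros):

```r
# Sketch: a sparse matrix stores only its nonzero entries, so dimensions
# far beyond dense limits become feasible when the fill-in is low.
library(Matrix)

m <- sparseMatrix(i = c(1, 5, 9), j = c(2, 5, 3), x = c(1.5, -2, 7),
                  dims = c(1e4, 1e4))   # 10,000 x 10,000, three nonzeros
print(object.size(m))   # a few KB, versus ~800 MB for the dense equivalent
```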


-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Matrix Size

2010-07-14 Thread Douglas Bates
On Wed, Jul 14, 2010 at 4:23 PM, paul s  wrote:
> hi -

> i just started using R as i am trying to figure out how perform a linear
> regression on a huge matrix.

> i am sure this topic has passed through the email list before but could not
> find anything in the archives.

> i have a matrix that is 2,000,000 x 170,000; the values right now are
> arbitrary.

> i try to allocate this on an x86_64 machine with 16G of ram and i get the
> following:

>> x <- matrix(0,2000000,170000);
> Error in matrix(0, 2e+06, 170000) : too many elements specified

R stores matrices and other data objects in memory.  A matrix of that
size would require
> 2e+06*170000*8/2^30
[1] 2533.197

gigabytes of memory.  Start looking for a machine with at least 5
terabytes of memory (you will need more than one copy of the matrix to
be stored) or, probably easier, rethink your problem.

Results from a linear regression producing 170,000 coefficient
estimates are unlikely to be useful.  The model matrix is essentially
guaranteed to be rank deficient.

>>
>
> is R capable of handling data of this size? am i doing it wrong?
>
> cheers
> paul
>
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to see the process of function sample()?

2010-07-14 Thread Peter Dalgaard
Wu Gong wrote:
> I have the same question about how to see the process behind a function. If I
> type sample in R, it really tells nothing about how R selects from a data
> set and creates samples. 
> 
> Thanks in advance for any help.
> 
> -
> A R learner.

Uwe Ligges. R Help Desk: Accessing the sources. R News, 6(4):43-45,
October 2006.  http://cran.r-project.org/doc/Rnews/


-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to see the process of function sample()?

2010-07-14 Thread Wu Gong

I have the same question about how to see the process behind a function. If I
type sample in R, it really tells nothing about how R selects from a data
set and creates samples. 

Thanks in advance for any help.

-
A R learner.
-- 
View this message in context: 
http://r.789695.n4.nabble.com/How-to-see-inside-of-this-function-tp2289376p2289417.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Merging columns along time line

2010-07-14 Thread Wu Gong

Correction:
b$label <- cut(b$timestamp, breaks=bks, labels=lbs, include.lowest = T,
right=F)

-
A R learner.
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Merging-columns-along-time-line-tp2289147p2289401.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Matrix Size

2010-07-14 Thread paul s

hi -

i just started using R as i am trying to figure out how to perform a linear 
regression on a huge matrix.


i am sure this topic has passed through the email list before but could 
not find anything in the archives.


i have a matrix that is 2,000,000 x 170,000; the values right now are 
arbitrary.


i try to allocate this on an x86_64 machine with 16G of ram and i get the 
following:


> x <- matrix(0,2000000,170000);
Error in matrix(0, 2e+06, 170000) : too many elements specified
>

is R capable of handling data of this size? am i doing it wrong?

cheers
paul

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] .Rprofile interfering with update.packages()

2010-07-14 Thread stephen sefick
I know this is a double post, but the subject line was really
misleading.  Sorry again.

This is the first time that I have tried to update packages with a
customized .Rprofile.  I start R with R --vanilla and it does not load my
.Rprofile, but when I issue the command update.packages(), R downloads the
packages as expected, but then seems to load .Rprofile before compiling the
package sources.  What am I doing wrong?
kindest regards,

Stephen Sefick

see- Session info output and .Rprofile

Ubuntu 10.04
R 2.11.1

Session info:

R version 2.11.1 (2010-05-31)
x86_64-pc-linux-gnu


locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
 [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
 [5] LC_MONETARY=C             LC_MESSAGES=en_US.utf8
 [7] LC_PAPER=en_US.utf8       LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] graphics  grDevices utils datasets  stats grid  methods
[8] base

other attached packages:
 [1] gpclib_1.5-1            StreamMetabolism_0.03-3 maptools_0.7-34
 [4] lattice_0.18-8          sp_0.9-64               foreign_0.8-40
 [7] chron_2.3-35            zoo_1.6-3               vegan_1.17-3
[10] ggplot2_0.8.8           proto_0.3-8             reshape_0.8.3
[13] plyr_0.1.9


.Rprofile:
#source USGS graphing function for base data
source("~/R_scripts/USGS.R")
source("~/R_scripts/publication_ggplot2_theme.R")
source("~/R_scripts/llScript.R")


#set help_type
options(help_type="html")

#exit to get around annoying q() behavior
exit <- function(save="no"){q(save=save)}

#most used libraries
library(ggplot2)
library(vegan)
library(StreamMetabolism)

#allow gpclib package to be used
gpclibPermit()
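
If the problem is that child R processes started during package installation also read ~/.Rprofile, one common defensive pattern (a sketch, not a confirmed diagnosis of this case) is to keep session-only conveniences inside an interactive() guard:

```r
# Sketch of a defensive ~/.Rprofile: run personal setup only in
# interactive sessions, so R CMD INSTALL and other child processes
# are unaffected by library() calls or sourced scripts.
if (interactive()) {
  options(help_type = "html")
  library(ggplot2)
  library(vegan)
  library(StreamMetabolism)
}
```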

-- 
Stephen Sefick

| Auburn University                                   |
| Department of Biological Sciences           |
| 331 Funchess Hall                                  |
| Auburn, Alabama                                   |
| 36849                                                    |
|___|
| sas0...@auburn.edu                             |
| http://www.auburn.edu/~sas0025             |
|___|

Let's not spend our time and resources thinking about things that are
so little or so large that all they really do for us is puff us up and
make us feel like gods.  We are mammals, and have not exhausted the
annoying little problems of being mammals.

                                -K. Mullis

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Query about wilcox.test() P-value

2010-07-14 Thread Peter Dalgaard
Marc Schwartz wrote:
> You need to understand the difference between how a value is stored
> in an R object with full floating point precision versus how a value
> in R is displayed (printed) in the console with a print "method".
> 
> In this case, wilcox.test() returns an object of class 'htest' (as
> noted in the Value section of ?wilcox.test). When the result of
> wilcox.test() is printed to the console (using print.htest()), the p
> value is displayed using the function format.pval(), which in this
> case returns:
> 
>> format.pval(2.928121e-165)
> [1] "< 2.22e-16"
> 
> This is common in R, where floating point values are not printed to
> full precision. The value displayed will be impacted upon by various
> characteristics, in some cases due to the application of specific
> print/formatting operations, or due to default options in R (see
> ?print.default).


On the other hand, it should be admitted that this at least partly
originates in times when we were less careful about computing the
appropriate tail of test statistic distributions. So one main point was
that a p-value of 0 could arise as "1 minus a number very close to 1".
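
Both points can be seen directly in R; a short sketch (the tail-area numbers are indicative):

```r
# Sketch: the stored p-value keeps full precision even when the print
# method truncates it, and computing the upper tail directly avoids
# the "1 minus a number very close to 1" problem.
res <- wilcox.test(1:50, 100:150)
res$p.value                      # exact stored value, not "< 2.22e-16"

1 - pnorm(30)                    # 0: the subtraction loses everything
pnorm(30, lower.tail = FALSE)    # tiny but nonzero (~5e-198)
```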

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] rows process in DF

2010-07-14 Thread Phil Spector

If you don't mind avoiding a loop, here's one way to
solve your problem:


myDF <- data.frame(id = c(100, 101),
                   d1 = c(.3, .3), d2 = c(.4, .4), d3 = c(-.2, .5),
                   d4 = c(-.3, .6), d5 = c(.5, -.2), d6 = c(.6, -.4),
                   d7 = c(-.9, -.5), d8 = c(-.8, -.6))
doit <- function(x) c(x[1],
                      sum_positive = sum(x[-1][x[-1] > 0]),
                      sum_negative = sum(x[-1][x[-1] < 0]))
t(apply(myDF, 1, doit))

      id sum_positive sum_negative
[1,] 100          1.8         -2.2
[2,] 101          1.8         -1.7
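
A vectorized variant of the same computation (a sketch; masking the matrix by sign and using rowSums() avoids apply() entirely):

```r
# Sketch: mask the data matrix by sign and row-sum each mask.
myDF <- data.frame(id = c(100, 101),
                   d1 = c(.3, .3), d2 = c(.4, .4), d3 = c(-.2, .5),
                   d4 = c(-.3, .6), d5 = c(.5, -.2), d6 = c(.6, -.4),
                   d7 = c(-.9, -.5), d8 = c(-.8, -.6))
m <- as.matrix(myDF[, -1])
data.frame(id           = myDF$id,
           sum_positive = rowSums(m * (m > 0)),   # 1.8, 1.8
           sum_negative = rowSums(m * (m < 0)))   # -2.2, -1.7
```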

- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu


On Wed, 14 Jul 2010, jd6688 wrote:



I have the following datasets:

id  d1  d2   d3   d4   d5   d6   d7   d8
1 100 0.3 0.4 -0.2 -0.3  0.5  0.6 -0.9 -0.8
2 101 0.3 0.4  0.5  0.6 -0.2 -0.4 -0.5 -0.6

what I am trying to accomplish:

loop through the rows && do the following:
   if the values from the columns of the current row > 0 then
sum_positive=total
   if the values from the columns of the current row < 0 then
sum_negative=total

   then discard the columns and create a new table


   id  sum_positive  sum_negative

1  100           1.8          -2.2
2  101           1.8          -1.7

I tried the following, but couldn't make it work; any inputs would be greatly
appreciated.

for (i in 1:nrow(myDF))  {
+myrow <-myDF[i,]
+   don't know how to move forward?
+ }

--
View this message in context: 
http://r.789695.n4.nabble.com/rows-process-in-DF-tp2289378p2289378.html
Sent from the R help mailing list archive at Nabble.com.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Merging columns along time line

2010-07-14 Thread Wu Gong

I read this case as: cut a data set at the breakpoints and assign each
segment a label name.

a <- data.frame(timestamp=c(3,5,8), mylabel=c("abc","def","ghi"))
b <- data.frame(timestamp=c(1:10))
bks <- c(a$timestamp,max(b$timestamp))
lbs <- a$mylabel
b$label <- cut(b$timestamp, breaks=bks, labels=lbs, include.lowest = T,
right=T)

-
A R learner.
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Merging-columns-along-time-line-tp2289147p2289395.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to see inside of this function?

2010-07-14 Thread Erik Iverson



Bogaso Christofer wrote:

Hi, there is a function Skewness() under the fBasics package. If I type
"skewness", I get the following:



Case matters in R, so please be precise.


 


skewness

function (x, ...) 
{
    UseMethod("skewness")
}



 


It would be great if someone could tell me how to see the code of this function.


That *is* the code of the function.

There are then further methods used for specific classes, which is 
probably what you want to see.


For instance, look at the code of summary vs. summary.lm .

Try methods(skewness) to see which ones have been defined.
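
The same inspection workflow sketched with generics that ship with base R (chosen here only because fBasics may not be installed):

```r
# Sketch: finding the real code behind an S3 generic.
methods(summary)                  # which classes have a summary method?
getS3method("summary", "lm")      # fetch a specific method, even if unexported
body(summary.data.frame)          # or inspect a visible method directly
```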



 


Secondly, suppose I create the following function:

 


fn1 <- function(x) return(x+2)

 


How can I make the above kind of shape, so that when a user types "fn1" he
will see:

 


fn1

function(x, ...)
{
    UseMethod("fn1")
}


I don't know what this means.  You just have to define your function that 
way.  Use an actual text editor to write your functions, then send them 
to R.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] qplot in ggplot2 not working any longer - (what did I do?)

2010-07-14 Thread stephen sefick
This is the first time that I have tried to update packages with a
customized .Rprofile.  I start R with R --vanilla and it does not load my
.Rprofile, but when I issue the command update.packages(), R downloads the
packages as expected, but then seems to load .Rprofile before compiling the
package sources.  What am I doing wrong?
kindest regards,

Stephen Sefick

see- Session info output and .Rprofile

Ubuntu 10.04
R 2.11.1

Session info:

R version 2.11.1 (2010-05-31)
x86_64-pc-linux-gnu


locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
 [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
 [5] LC_MONETARY=C             LC_MESSAGES=en_US.utf8
 [7] LC_PAPER=en_US.utf8       LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] graphics  grDevices utils datasets  stats grid  methods
[8] base

other attached packages:
 [1] gpclib_1.5-1            StreamMetabolism_0.03-3 maptools_0.7-34
 [4] lattice_0.18-8          sp_0.9-64               foreign_0.8-40
 [7] chron_2.3-35            zoo_1.6-3               vegan_1.17-3
[10] ggplot2_0.8.8           proto_0.3-8             reshape_0.8.3
[13] plyr_0.1.9


.Rprofile:
#source USGS graphing function for base data
source("~/R_scripts/USGS.R")
source("~/R_scripts/publication_ggplot2_theme.R")
source("~/R_scripts/llScript.R")


#set help_type
options(help_type="html")

#exit to get around annoying q() behavior
exit <- function(save="no"){q(save=save)}

#most used libraries
library(ggplot2)
library(vegan)
library(StreamMetabolism)

#allow gpclib package to be used
gpclibPermit()



-- 
Stephen Sefick

| Auburn University                                   |
| Department of Biological Sciences           |
| 331 Funchess Hall                                  |
| Auburn, Alabama                                   |
| 36849                                                    |
|___|
| sas0...@auburn.edu                             |
| http://www.auburn.edu/~sas0025             |
|___|

Let's not spend our time and resources thinking about things that are
so little or so large that all they really do for us is puff us up and
make us feel like gods.  We are mammals, and have not exhausted the
annoying little problems of being mammals.

                                -K. Mullis

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] rows process in DF

2010-07-14 Thread jd6688

I have the following datasets:

 id  d1  d2   d3   d4   d5   d6   d7   d8
1 100 0.3 0.4 -0.2 -0.3  0.5  0.6 -0.9 -0.8
2 101 0.3 0.4  0.5  0.6 -0.2 -0.4 -0.5 -0.6

what I am trying to accomplish:

 loop through the rows && do the following:
    if the values from the columns of the current row > 0 then
sum_positive=total
    if the values from the columns of the current row < 0 then
sum_negative=total

then discard the columns and create a new table 


    id  sum_positive  sum_negative

1  100           1.8          -2.2
2  101           1.8          -1.7

I tried the following, but couldn't make it work; any inputs would be greatly
appreciated.

 for (i in 1:nrow(myDF))  { 
+myrow <-myDF[i,] 
+   don't know how to move forward?
+ } 

-- 
View this message in context: 
http://r.789695.n4.nabble.com/rows-process-in-DF-tp2289378p2289378.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to see inside of this function?

2010-07-14 Thread Bogaso Christofer
Hi, there is a function Skewness() under the fBasics package. If I type
"skewness", I get the following:

 

> skewness

function (x, ...) 

{

UseMethod("skewness")

}



 

It would be great if someone could tell me how to see the code of this function.

 

Secondly, suppose I create the following function:

 

fn1 <- function(x) return(x+2)

 

How can I make the above kind of shape, so that when a user types "fn1" he
will see:

 

fn1

function(x, ...)
{
    UseMethod("fn1")
}

..

Etc.

 

Thanks and regards,

 


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Boxplot: Scale outliers

2010-07-14 Thread Peter Ehlers

On 2010-07-13 12:11, Robert Peter wrote:

Hello!

I am trying to scale the outliers in a boxplot. I am passing "pars =
list(boxwex=0.1, staplewex=0.1, outwex=0.1)" to the boxplot command. The
boxes are scaled correctly, but the circles (outliers) are not scaled at
all, and thus pretty big compared to the boxes scaled with 0.1.
Am I missing something?


Maybe you're looking for the 'outcex' parameter?
See ?bxp.

  -Peter Ehlers



Thanks in advance!
Robert


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Import graph object

2010-07-14 Thread Marc Schwartz
On Jul 14, 2010, at 3:19 PM, Michael Haenlein wrote:

> Dear all,
> 
> I have a txt file of the following format that describes the relationships
> between a network of a certain number of nodes.
> 
> {4, 2, 3}
> {3, 4, 1}
> {4, 2, 1}
> {2, 1, 3}
> {2, 3}
> {}
> {2, 5, 1}
> {3, 5, 4}
> {3, 4}
> {2, 5, 3}
> 
> For example the first line {4, 2, 3} implies that there is a connection
> between Node 1 and Node 4, a connection between Node 1 and Node 2 and a
> connection between Node 1 and Node 3. The second line {3, 4, 1} implies that
> there is a connection between Node 2 and Node 3 as well as Node 4 and Node
> 1. Note that some of the nodes can be isolated (i.e., not have any
> connections to any other node) which is then indicated by {}. Also note that
> the elements in each row are not necessarily ordered (i.e., {4, 2, 3}
> instead of {2, 3, 4}). I would like to (a) read the txt file into R and (b)
> convert it to an adjacency matrix. For example the adjacency matrix
> corresponding to the aforementioned example is as follows:
> 
> 0 1 1 1 0 0 0 0 0 0
> 1 0 1 1 0 0 0 0 0 0
> 1 1 0 1 0 0 0 0 0 0
> 1 1 1 0 0 0 0 0 0 0
> 0 1 1 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0
> 1 1 0 0 1 0 0 0 0 0
> 0 0 1 1 1 0 0 0 0 0
> 0 0 1 1 0 0 0 0 0 0
> 0 1 1 0 1 0 0 0 0 0
> 
> Is there any convenient way of doing this?
> 
> Thanks,
> 
> Michael


Read the file in with readLines():

# I am on OSX, so copied from the clipboard

Lines <- readLines(pipe("pbpaste"))

> Lines
 [1] "{4, 2, 3}" "{3, 4, 1}" "{4, 2, 1}" "{2, 1, 3}" "{2, 3}"   
 [6] "{}"        "{2, 5, 1}" "{3, 5, 4}" "{3, 4}"    "{2, 5, 3}"


# parse the numbers from Lines

L.split <- strsplit(Lines, split = "[{},]")

> L.split
[[1]]
[1] ""   "4"  " 2" " 3"

[[2]]
[1] ""   "3"  " 4" " 1"

[[3]]
[1] ""   "4"  " 2" " 1"

[[4]]
[1] ""   "2"  " 1" " 3"

[[5]]
[1] ""   "2"  " 3"

[[6]]
[1] "" ""

[[7]]
[1] ""   "2"  " 5" " 1"

[[8]]
[1] ""   "3"  " 5" " 4"

[[9]]
[1] ""   "3"  " 4"

[[10]]
[1] ""   "2"  " 5" " 3"


# Create an initial square matrix of 0's

mat <- matrix(0, length(Lines), length(Lines))

> mat
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    0    0    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     0
 [3,]    0    0    0    0    0    0    0    0    0     0
 [4,]    0    0    0    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    0    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    0     0



# Set the positions in each row to 1.  Note that as.numeric("") yields NA,
# so drop the NAs before indexing (an NA subscript in an assignment is an
# error in R)

for (i in seq(along = L.split)) {
    cols <- as.numeric(L.split[[i]])
    mat[i, cols[!is.na(cols)]] <- 1
}

> mat
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    1    1    1    0    0    0    0    0     0
 [2,]    1    0    1    1    0    0    0    0    0     0
 [3,]    1    1    0    1    0    0    0    0    0     0
 [4,]    1    1    1    0    0    0    0    0    0     0
 [5,]    0    1    1    0    0    0    0    0    0     0
 [6,]    0    0    0    0    0    0    0    0    0     0
 [7,]    1    1    0    0    1    0    0    0    0     0
 [8,]    0    0    1    1    1    0    0    0    0     0
 [9,]    0    0    1    1    0    0    0    0    0     0
[10,]    0    1    1    0    1    0    0    0    0     0


HTH,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Import graph object

2010-07-14 Thread Duncan Murdoch

On 14/07/2010 4:19 PM, Michael Haenlein wrote:

Dear all,

I have a txt file of the following format that describes the relationships
between a network of a certain number of nodes.

{4, 2, 3}
{3, 4, 1}
{4, 2, 1}
{2, 1, 3}
{2, 3}
{}
{2, 5, 1}
{3, 5, 4}
{3, 4}
{2, 5, 3}

For example the first line {4, 2, 3} implies that there is a connection
between Node 1 and Node 4, a connection between Node 1 and Node 2 and a
connection between Node 1 and Node 3. The second line {3, 4, 1} implies that
there is a connection between Node 2 and Node 3 as well as Node 4 and Node
1. Note that some of the nodes can be isolated (i.e., not have any
connections to any other node) which is then indicated by {}. Also note that
the elements in each row are not necessarily ordered (i.e., {4, 2, 3}
instead of {2, 3, 4}). I would like to (a) read the txt file into R and (b)
convert it to an adjacency matrix. For example the adjacency matrix
corresponding to the aforementioned example is as follows:

0 1 1 1 0 0 0 0 0 0
1 0 1 1 0 0 0 0 0 0
1 1 0 1 0 0 0 0 0 0
1 1 1 0 0 0 0 0 0 0
0 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
1 1 0 0 1 0 0 0 0 0
0 0 1 1 1 0 0 0 0 0
0 0 1 1 0 0 0 0 0 0
0 1 1 0 1 0 0 0 0 0

Is there any convenient way of doing this?


It'll take a little work, but the general strategy is this:

Use readLines to read the whole file, putting each line into one element 
of a character vector.

Use a loop of some sort to proceed through the vector.  Strip off the 
braces, use scan() to read the numbers, and use the numbers to set the 1's 
in the adjacency matrix.  For example (untested):


x <- readLines( .. )
M <- matrix(0, length(x), length(x))
for (i in seq_along(x)) {
  y <- gsub("[{},]", " ", x[i])
  entries <- scan(textConnection(y))
  M[i, entries] <- 1
}

This leaves a bunch of textConnections open so you could clean up by 
calling closeAllConnections (or close each one), but other than that it 
should work.
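
A close variant that avoids opening connections at all (a sketch; scan()'s text argument appeared in R versions later than this thread, so treat its availability as an assumption for modern R):

```r
# Sketch: scan(text = ...) parses the numbers without a textConnection,
# so there is nothing to close afterwards.
x <- c("{2, 3}", "{3, 1}", "{}")          # stand-in for readLines(...)
M <- matrix(0, length(x), length(x))
for (i in seq_along(x)) {
  entries <- scan(text = gsub("[{},]", " ", x[i]), quiet = TRUE)
  M[i, entries] <- 1                      # empty 'entries' assigns nothing
}
M
```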


Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question about string handling....

2010-07-14 Thread Henrique Dallazuanna
Another option could be:

df$var3 <- gsub(".*\\((.*)\\).*", "\\1", df$var2)

On Wed, Jul 14, 2010 at 3:21 PM, karena  wrote:

>
> Hi,
>
> I have a data.frame as following:
> var1 var2
> 1   ab_c_(ok)
> 2   okf789(db)_c
> 3   jojfiod(90).gt
> 4   "ij"_(78)__op
> 5   (iojfodjfo)_ab
>
> what I want is to create a new variable called "var3". the value of var3 is
> the content in the Parentheses. so var3 would be:
> var3
> ok
> db
> 90
> 78
> iojfodjfo
>
> how to do this?
>
> thanks,
>
> karena
> --
> View this message in context:
> http://r.789695.n4.nabble.com/question-about-string-handling-tp2289178p2289178.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Import graph object

2010-07-14 Thread Michael Haenlein
Dear all,

I have a txt file of the following format that describes the relationships
between a network of a certain number of nodes.

{4, 2, 3}
{3, 4, 1}
{4, 2, 1}
{2, 1, 3}
{2, 3}
{}
{2, 5, 1}
{3, 5, 4}
{3, 4}
{2, 5, 3}

For example the first line {4, 2, 3} implies that there is a connection
between Node 1 and Node 4, a connection between Node 1 and Node 2 and a
connection between Node 1 and Node 3. The second line {3, 4, 1} implies that
there is a connection between Node 2 and Node 3 as well as Node 4 and Node
1. Note that some of the nodes can be isolated (i.e., not have any
connections to any other node) which is then indicated by {}. Also note that
the elements in each row are not necessarily ordered (i.e., {4, 2, 3}
instead of {2, 3, 4}). I would like to (a) read the txt file into R and (b)
convert it to an adjacency matrix. For example the adjacency matrix
corresponding to the aforementioned example is as follows:

0 1 1 1 0 0 0 0 0 0
1 0 1 1 0 0 0 0 0 0
1 1 0 1 0 0 0 0 0 0
1 1 1 0 0 0 0 0 0 0
0 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
1 1 0 0 1 0 0 0 0 0
0 0 1 1 1 0 0 0 0 0
0 0 1 1 0 0 0 0 0 0
0 1 1 0 1 0 0 0 0 0

Is there any convenient way of doing this?

Thanks,

Michael

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question about string handling....

2010-07-14 Thread Wu Gong

Try this:

text <- 'var1 var2
1 ab_c_(ok)
2 okf789(db)_c
3 jojfiod(90).gt
4 "ij"_(78)__op
5 (iojfodjfo)_ab'

df <- read.table(textConnection(text), head=T, sep=" ",quote="")
df$var3 <- gsub("(.*\\()(.*)(\\).*)","\\2",df$var2)


-
A R learner.
-- 
View this message in context: 
http://r.789695.n4.nabble.com/question-about-string-handling-tp2289178p2289327.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question about string handling....

2010-07-14 Thread Gabor Grothendieck
On Wed, Jul 14, 2010 at 2:21 PM, karena  wrote:
>
> Hi,
>
> I have a data.frame as following:
> var1         var2
> 1           ab_c_(ok)
> 2           okf789(db)_c
> 3           jojfiod(90).gt
> 4           "ij"_(78)__op
> 5           (iojfodjfo)_ab
>
> what I want is to create a new variable called "var3". the value of var3 is
> the content in the Parentheses. so var3 would be:
> var3
> ok
> db
> 90
> 78
> iojfodjfo
>

Here are several alternatives.  The gsub solution matches everything
up to the ( as well as everything after the ) and replaces each with
nothing.  The strsplit solution splits each value into three fields
(everything before the (, everything within the (), and everything
after the )) and then picks off the second.  The strapply solution
matches everything from ( to ) and returns everything between them.
The code below works whether DF$var2 is factor or character, but if you
know it is character you can drop the as.character in #2 and #3.

# 1
gsub(".*[(]|[)].*", "", DF$var2)

# 2
sapply(strsplit(as.character(DF$var2), "[()]"), "[", 2)

# 3
library(gsubfn)
strapply(as.character(DF$var2), "[(](.*)[)]", simplify = TRUE)
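A fourth possibility, sketched along the same lines with a single sub(),
replaces the whole string with the group captured between ( and ):

```r
# 4
sub(".*\\(([^)]*)\\).*", "\\1", as.character(DF$var2))
```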



[R] question about string handling....

2010-07-14 Thread karena

Hi, 

I have a data.frame as following:
var1 var2
1   ab_c_(ok)
2   okf789(db)_c
3   jojfiod(90).gt
4   "ij"_(78)__op
5   (iojfodjfo)_ab

what I want is to create a new variable called "var3". the value of var3 is
the content in the Parentheses. so var3 would be:
var3
ok
db
90
78
iojfodjfo

how to do this?

thanks,

karena
-- 
View this message in context: 
http://r.789695.n4.nabble.com/question-about-string-handling-tp2289178p2289178.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Add Significance Codes to Data Frame

2010-07-14 Thread Marc Schwartz
On Jul 14, 2010, at 2:16 PM, darckeen wrote:

> 
> I was hoping that there might be some way to attach significance code like
> the ones from summary.lm to a dataframe.  Anyone know how to do something
> like that.  Here is the function i'd like to add that functionality to:
> 
> 
> add1.coef <- function(model,scope,test="F",p.value=1,order.by.p=FALSE)
> {
>   num <- length(model$coefficients)
>   add <- add1(model,scope,test=test)
>   sub <- subset(add,add$'Pr(F)'<=p.value)
> est <- sapply(rownames(sub), function(x)
>     update(model, paste("~ . +", x))$coefficients[num+1])
>   ret <- data.frame(est,sub$'Pr(F)')
>   
>   colnames(ret) <- c("Estimate","Pr(F)")
>   rownames(ret) <- rownames(sub)
>   ret <- format(ret,digits=1,nsmall=1,scientific=F)
>   
>   if (order.by.p) { ret <- ret[order(ret$'Pr(F)'),]}
>   return(ret)
> }
> 
> fscope <- as.formula(paste("dep.sign.up ~ ", paste(names(lr)[2:10],
> collapse= "+")))
> rslt <- add1.coef(lm(dep.sign.up ~ 1,
> data=lr),fscope,p.value=1,order.by.p=FALSE)
> 
>   Estimate Pr(F)
> r1.pop.total   0.02  0.09
> r1.pop.household   0.05  0.09
> r1.pop.female  0.03  0.09
> r1.pop.pct.female  14594.39  0.35
> r1.pop.male0.04  0.08
> r1.pop.pct.male   -13827.51  0.37
> r1.pop.density 0.06  0.09
> r1.age.0.4.pct 14581.65  0.39
> r1.age.5.14.pct 2849.15  0.76



Review the code for print.summary.lm(), in which you will find the use of 
printCoefmat(), in which you will find the use of symnum(). 

Using lm.D9 from example(lm):

ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2,10,20, labels=c("Ctl","Trt"))
weight <- c(ctl, trt)
lm.D9 <- lm(weight ~ group)


> printCoefmat(coef(summary(lm.D9)))
            Estimate Std. Error t value  Pr(>|t|)
(Intercept)  5.03200    0.22022 22.8501 9.547e-15 ***
groupTrt    -0.37100    0.31143 -1.1913     0.249
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 



pv <- coef(summary(lm.D9))[, "Pr(>|t|)"]


> pv
 (Intercept) groupTrt 
9.547128e-15 2.490232e-01 


Signif <- symnum(pv, corr = FALSE, na = FALSE, 
 cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1), 
 symbols = c("***", "**", "*", ".", " "))


> Signif
(Intercept)    groupTrt 
        ***             
attr(,"legend")
[1] 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


Note that 'Signif' is:

> str(Signif)
Class 'noquote'  atomic [1:2] ***  
  ..- attr(*, "legend")= chr "0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1"


So you will need to coerce it to a vector before cbind()ing to a data frame:

> as.vector(Signif)
[1] "***" " "  

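For instance, to attach the codes as a column (a minimal sketch continuing
from the objects above):

```r
# build a data frame of the coefficient table and append the codes
coefs <- as.data.frame(coef(summary(lm.D9)))
coefs$Signif <- as.vector(Signif)   # plain character, safe to cbind
coefs
```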

HTH,

Marc Schwartz



[R] fGarch: garchFit() with fixed coefficents

2010-07-14 Thread georger2

hello everybody,

I would like to fit a model to a time series (testing set) for out-of-sample
predictions using garchFit(). I would like to keep the coefficients of the
ARMA/GARCH model fixed at the values found by fitting the model to my
training set. The arima() fitting function has an option for that (the
'fixed' argument) but garchFit() doesn't.

It is very important for me to use the same coefficients for my testing set
as for my training set (where the model is identified), so that the
out-of-sample predictions for the testing set come from the same model. So
basically what I would like is to fit an ARMA-GARCH model with
predetermined coefficients. If you have any ideas please share them with
me; I will be very grateful.
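For reference, this is the arima() behaviour I mean; a minimal sketch with
simulated data (the AR value 0.5 is made up purely for illustration):

```r
set.seed(1)
x <- arima.sim(model = list(ar = 0.5), n = 200)

# In arima(), NA marks a parameter to estimate, while a number holds
# it fixed.  Here the AR(1) coefficient is pinned at 0.5 and only the
# intercept (mean) is estimated.
fit <- arima(x, order = c(1, 0, 0), fixed = c(0.5, NA))
coef(fit)   # ar1 is exactly 0.5 by construction
```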


Thanks,
G
-- 
View this message in context: 
http://r.789695.n4.nabble.com/fGarch-garchFit-with-fixed-coefficents-tp2289304p2289304.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] [R-pkgs] New package "list" for analyzing list surveyexperiments

2010-07-14 Thread Yves Rosseel

On 07/13/2010 07:46 PM, Raubertas, Richard wrote:

I agree that 'list' is a terrible package name, but only secondarily
because it is a data type.  The primary problem is that it is so generic

as to be almost totally uninformative about what the package does.

For some reason package writers seem to prefer maximally uninformative
names for their packages.  To take some examples of recently announced
packages, can anyone guess what packages 'FDTH', 'rtv', or 'lavaan'
do?  Why the aversion to informative names along the lines of
'Freq_dist_and_histogram', 'RandomTimeVariables', and
'Latent_Variable_Analysis', respectively?


As an author of a package with a maximally uninformative name (lavaan), 
I like to believe that strange names can have strange attractions. After 
all, you did notice the package, didn't you?


Yves Rosseel, UGent.



Re: [R] What is the opposite of rbind?

2010-07-14 Thread schuster
On Wednesday 14 July 2010 09:06:01 pm Addi Wei wrote:
> I combined 2 data frames together using rbind...  How do I unbind the data
>  at a specific row to create 2 separate data frames?
> 

if you have a column with an identifier: try split()

Example: 
 split(iris, iris$Species)


-- 

Friedrich Schuster
Dompfaffenweg 6
69123 Heidelberg



Re: [R] calling a c function from R

2010-07-14 Thread Fahim Md
Thanks a lot Matt,

In case anyone would like to see how I called a C function from R with
files as arguments: for simplicity, this example copies the content of the
input file into the output file.

-
  My main program is :
  source("parse.R")
  parseGBest('gbest40.seq', 'gbest40.out')  # .seq is the input file,
                                            # .out is the output file
---

  I wrote a wrapper function (parse.R) as follows:

  dyn.load("parse.so");
  parseGBest= function(infile, outfile)
  {
   # Do some whatever you like here
  .C("parse", infile, outfile)
  }
--
parse.c File looks like follows:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <R.h>

void parse(char **infile, char **outfile)  /* name must match .C("parse", ...) */
{

  char line[81];
   FILE *fr, *of;

  if (!(fr = fopen(*infile, "r")))
{
  fprintf(stdout, "Error while opening input file\n");
  return ;
}

  if (!(of = fopen(*outfile, "w")))
{
  fprintf(stdout, "Error while opening output file\n");
  return ;
}

  while(fgets(line, 81, fr) != NULL)
{
fputs(line, of);
}
fclose(fr);
fclose(of);
}
--

That's it!
I saved almost a week in parsing all those GenBank EST files (almost 413
files).

Thanks R&C interface team.
--Fahim







On Wed, Jul 14, 2010 at 10:18 AM, Matt Shotwell  wrote:

> Fahim,
>
> Please see the Writing R Extensions manual
> http://cran.r-project.org/doc/manuals/R-exts.pdf
>
> There are simple instructions in this document under the heading "System
> and foreign language interfaces."
>
> -Matt
>
>
> On Wed, 2010-07-14 at 01:21 -0400, Fahim Md wrote:
> > Hi,
> > I am trying to call a C function, that I wrote to parse a flat file,
>  into
> > R. The argument that will go into this function is an input file that I
> need
> > to parse and write the desired output in an output file.  I used some hit
> > and trial approach but i keep on getting the "file not found" or
> > "segmentation fault" error. I know that the error is in passing the
> argument
> > but I could not solve it.
> >
> > After reading  some of the tutorials, I understood how to do this if the
> > arguments are integers or floats. I am stuck when i am trying to send the
> > files. I am attaching stub of each file.
> > Help appreciated.
> > Thanks
> >
> > ---
> > My function call would be:
> > source("parse.R")
> > parseGBest('./gbest/inFile.seq',   './gbest/outFile.out');
> > ---
> > I wrote a wrapper function (parse.R) as follows:
> >
> > dyn.load("parse.so");
> > parseGBest = function(inFile, outFile)
> > {
> > .C( "parse" , inFile , outFile);
> > }
> >
> > How to write receive the filenames in function( , ) above. and how to
> call
> > .C
> >
> > 
> > parse.c file is as below:  How to receive the argument in funcion and how
> to
> > make it compatible with my argv[ ].
> >
> >
> > void parse( int argc, char *argv[] )  //This is working as standalone C
> > program. How to receive
> >   // the above files so
> that
> > it become compatible with my argv[ ]
> > {
> >
> > FILE *fr, *of;
> > char line[81];
> >
> >
> >  if ( argc == 3 )
> > {
> > if ( ( fr = fopen( argv[0], "r" )) == NULL )
> > {
> > puts( "Can't open input file.\n" );
> > exit( 0 );
> > }
> > if ( ( of = fopen( argv[1], "w" )) == NULL )
> > {
> > puts( "Output file not given.\n" );
> > }
> >   }
> >else
> > {printf("wrong usage: Try Agay!!! correct usage is:=
>  functionName
> > inputfileToParse outFileToWriteInto\n");
> >}
> > while(fgets(line, 81, fr) != NULL)
> >
> > --
> > ---
> > --
> > }
> >
> >
> >
> > Thanks again
> > Fahim
> >
> --
> Matthew S. Shotwell
> Graduate Student
> Division of Biostatistics and Epidemiology
> Medical University of South Carolina
> http://biostatmatt.com
>
>


-- 
Fahim Mohammad
Bioinformatics Lab
University of Louisville
Louisville, KY, USA
Ph:  +1-502-409-1167



Re: [R] Embedding graphics in a pdf()

2010-07-14 Thread Thomas Levine
Woah! That's so awesome!

And now I've found even more functions of my drawing programs that can be
replaced with R.

Tom

2010/7/14 Marc Schwartz 

> On Jul 14, 2010, at 1:38 PM, Thomas Levine wrote:
>
> > I've had two reasons for wanting to embed graphics in R pdf output.
> >
> > 1. I am plotting something on top of a surface (It's actually a desk.) of
> > which I have a picture and would like to place a picture underneath.
> > 2. I can produce all of my presentation slides in R without LaTeX but
> have a
> > few pictures that I need to include as slides. I would like to add images
> > inside the R script instead of manipulating them afterwards with
> Imagemagick
> > and pdftk.
> >
> > Can these be done?
> >
> > Tom
>
>
> See this reply (from Sunday) by David Winsemius on a similar query:
>
>  https://stat.ethz.ch/pipermail/r-help/2010-July/245291.html
>
> HTH,
>
> Marc Schwartz
>
>



[R] Add Significance Codes to Data Frame

2010-07-14 Thread darckeen

I was hoping that there might be some way to attach significance code like
the ones from summary.lm to a dataframe.  Anyone know how to do something
like that.  Here is the function i'd like to add that functionality to:


add1.coef <- function(model,scope,test="F",p.value=1,order.by.p=FALSE)
{
num <- length(model$coefficients)
add <- add1(model,scope,test=test)
sub <- subset(add,add$'Pr(F)'<=p.value)
est <- sapply(rownames(sub), function(x)
    update(model, paste("~ . +", x))$coefficients[num+1])
ret <- data.frame(est,sub$'Pr(F)')

colnames(ret) <- c("Estimate","Pr(F)")
rownames(ret) <- rownames(sub)
ret <- format(ret,digits=1,nsmall=1,scientific=F)

if (order.by.p) { ret <- ret[order(ret$'Pr(F)'),]}
return(ret)
}

fscope <- as.formula(paste("dep.sign.up ~ ", paste(names(lr)[2:10],
collapse= "+")))
rslt <- add1.coef(lm(dep.sign.up ~ 1,
data=lr),fscope,p.value=1,order.by.p=FALSE)

   Estimate Pr(F)
r1.pop.total   0.02  0.09
r1.pop.household   0.05  0.09
r1.pop.female  0.03  0.09
r1.pop.pct.female  14594.39  0.35
r1.pop.male0.04  0.08
r1.pop.pct.male   -13827.51  0.37
r1.pop.density 0.06  0.09
r1.age.0.4.pct 14581.65  0.39
r1.age.5.14.pct 2849.15  0.76

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Add-Significance-Codes-to-Data-Frame-tp2289263p2289263.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] What is the opposite of rbind?

2010-07-14 Thread RICHARD M. HEIBERGER
a <- rbind(1:3, 4:6, 7:9, 10:12, 13:15)
a1 <- a[1:3,]
a2 <- a[4:5,]
a1
a2



[R] What is the opposite of rbind?

2010-07-14 Thread Addi Wei

I combined 2 data frames together using rbind...  How do I unbind the data at
a specific row to create 2 separate data frames?  
-- 
View this message in context: 
http://r.789695.n4.nabble.com/What-is-the-opposite-of-rbind-tp2289244p2289244.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Multilevel IRT Modelling

2010-07-14 Thread Stuart Luppescu
On Wed, 2010-07-14 at 04:31 -0700, Dr. Federico Andreis wrote:
> does anybody know of a package (working under Linux) for multilevel
> IRT modelling?
> I'd love to do this without having to go on WINSTEPS or the like..

The first place to look would be the special issue of the Journal of
Statistical Software focusing on psychometrics in R. It has a lot of
valuable information.

http://www.jstatsoft.org/v20
-- 
Stuart Luppescu -=- slu .at. ccsr.uchicago.edu
University of Chicago -=- CCSR 
才文と智奈美の父 -=-Kernel 2.6.33-gentoo-r2
 To paraphrase provocatively, 'machine learning is statistics
 minus any checking of models and assumptions'.
 -- Brian D. Ripley (on the difference between machine learning
 and statistics), useR! 2004, Vienna (May 2004)



[R] Java Heap Space Error while running kpca command

2010-07-14 Thread Patrick J Rogers
Hi All,

I'm trying to run a kernel principal component analysis on a corpus with 89769 
text documents. I'm using the kpca command from the kernlab package. Here is 
the code:

output<-kpca(text, kernel=worddot(type="spectrum", length=1))

The problem is that when I run the kpca, it bails after only a few minutes, 
generating the following error:

Error in 
.jcall(.jnew("opennlp/maxent/io/SuffixSensitiveGISModelReader",  : 
  java.lang.OutOfMemoryError: Java heap space

The openNLP package is a dependency for the kernlab package.

I appreciate any help anyone can give!

--
Patrick Rogers



Re: [R] Printing status updates in while-loop

2010-07-14 Thread jim holtman
try:

print(counter)
flush.console()  # force the output

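A minimal self-contained sketch of the pattern (the loop bound and
reporting interval are arbitrary here):

```r
counter <- 0
while (counter < 100) {
  counter <- counter + 1
  if (counter %% 25 == 0) {   # report every 25th iteration
    cat("iteration", counter, "\n")
    flush.console()           # force the output to appear immediately
  }
}
```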
On Wed, Jul 14, 2010 at 2:31 PM, Michael Haenlein
 wrote:
> Dear all,
>
> I'm using a while loop in the context of an iterative optimization
> procedure. Within my while loop I have a counter variable that helps me to
> determine how long the loop has been running. Before the loop I initialize
> it as counter <- 0 and the last condition within my loop is counter <-
> counter + 1.
>
> I'd like to print out the current status of "counter" while the loop is
> running to know where the optimization routine is standing. I tried to do so
> by adding print(counter) within the while loop. This does however not seem
> to work as instead of printing regular updates all print commands are
> executed only after the loop is finished.
>
> Is there some easy way to print regular status updates while the while loop
> is still running?
>
> Thanks,
>
> Michael
>
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] Embedding graphics in a pdf()

2010-07-14 Thread Marc Schwartz
On Jul 14, 2010, at 1:38 PM, Thomas Levine wrote:

> I've had two reasons for wanting to embed graphics in R pdf output.
> 
> 1. I am plotting something on top of a surface (It's actually a desk.) of
> which I have a picture and would like to place a picture underneath.
> 2. I can produce all of my presentation slides in R without LaTeX but have a
> few pictures that I need to include as slides. I would like to add images
> inside the R script instead of manipulating them afterwards with Imagemagick
> and pdftk.
> 
> Can these be done?
> 
> Tom


See this reply (from Sunday) by David Winsemius on a similar query:

  https://stat.ethz.ch/pipermail/r-help/2010-July/245291.html

HTH,

Marc Schwartz



Re: [R] Write value to PHP webpage

2010-07-14 Thread Wu Gong

I suggest trying to write the data into .php file directly.

outfile <-paste(filepath,"distance",".php",sep="") 
data <- "Distance=num"
num <- 1000
data <- sub("num", num,data)
write(data,file=outfile)
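The same idea can be sketched a little more compactly with sprintf(); the
file name here is made up purely for illustration:

```r
num <- 1000
# substitute the value directly into the template and write it out
writeLines(sprintf("Distance=%d", num), con = "distance.php")
```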



-
A R learner.
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Write-value-to-PHP-webpage-tp2288169p2289206.html
Sent from the R help mailing list archive at Nabble.com.



[R] Embedding graphics in a pdf()

2010-07-14 Thread Thomas Levine
I've had two reasons for wanting to embed graphics in R pdf output.

1. I am plotting something on top of a surface (It's actually a desk.) of
which I have a picture and would like to place a picture underneath.
2. I can produce all of my presentation slides in R without LaTeX but have a
few pictures that I need to include as slides. I would like to add images
inside the R script instead of manipulating them afterwards with Imagemagick
and pdftk.

Can these be done?

Tom



[R] Printing status updates in while-loop

2010-07-14 Thread Michael Haenlein
Dear all,

I'm using a while loop in the context of an iterative optimization
procedure. Within my while loop I have a counter variable that helps me to
determine how long the loop has been running. Before the loop I initialize
it as counter <- 0 and the last condition within my loop is counter <-
counter + 1.

I'd like to print out the current status of "counter" while the loop is
running to know where the optimization routine is standing. I tried to do so
by adding print(counter) within the while loop. This does however not seem
to work as instead of printing regular updates all print commands are
executed only after the loop is finished.

Is there some easy way to print regular status updates while the while loop
is still running?

Thanks,

Michael



[R] Pointers to solutions to this PCA or Cononical Correlation type problem?

2010-07-14 Thread LosemindL

Hi all,

I am sure this is a well-studied stats problem, could anybody give me some
pointers?

It's similar to Canonical Correlation study.

We have a bunch of random variables and want to find a set of linear
combinations of them such that their mutual correlations are all bounded
by an upper bound alpha.

That is to say, given n random variables x_1, ..., x_n, we want to obtain m
new random variables,

y_1, y_2, ..., y_m, 

each one is a linear combination of the {x_1, ..., x_n},

and we want the maximum mutual correlation among {y_1, y_2, ..., y_m} to be
bounded by an upper bound alpha.

Under that constraint, we would like to have m as large as possible.

Any "optimal" way of doing that? Or good engineering approach?

Thanks a lot!
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Pointers-to-solutions-to-this-PCA-or-Cononical-Correlation-type-problem-tp2289177p2289177.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Want to exclude a column when using sample function

2010-07-14 Thread Addi Wei

Maybe I'm missing something, but after reading about the reshape package
I'm still not quite sure how reshaping the data will help me store
previously used samples and prevent them from being selected again in the
future.

This is my pseudo code thought process, but I'm not sure how to implement in
R

1.  Store the 10 samples out of 180 factors into an object called sample10.
2.  Delete sample10 from 180 factors to create a "new factor list" object. 
(should then have 170 factors left)
3. Run my analysis, then sample 3 out of sample10 and call it sample3
4.  Replace sample3 with 3 new factors from "new factor list" and call those
3 new factors new3factors
5. Subtract new3 factors from "new factor list"  (we're down to 167)
6. Run analysis on the new factors
7. Continue doing this until we run out of factors in "new factor list"
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Want-to-exclude-a-column-when-using-sample-function-tp2287988p2289175.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Want to exclude a column when using sample function

2010-07-14 Thread T.D. Rudolph

sample(data[, -(1:2)], 3, replace=FALSE)
or
sample(data[, !(names(data) %in% c("id","pID50"))], 3, replace=FALSE)

Tyler

On Wed, Jul 14, 2010 at 1:17 PM, Addi Wei [via R] <
ml-node+2289092-1912443153-77...@n4.nabble.com
> wrote:

>   id pID50  apol a_acca_acid a_aro a_base   a_count
> 1 mol.11  3.63 -0.882267 -0.527967 -0.298197 -1.032380  0 -1.063410
> 2 mol.14  3.38 -1.007330 -0.527967 -0.298197 -1.032380  0 -1.063410
> 3 mol.19  3.18  1.153560  1.407910 -0.298197  1.254100  0  1.160080
> 4 mol.20  3.14  0.183448 -0.527967 -0.298197  0.873019  0  0.290021
> 5 mol.29  2.77 -0.273901 -0.527967 -0.298197  0.110860  0 -0.193347
> 6 mol.30  2.74 -0.230593 -0.527967 -0.298197  0.110860  0  0.00
> 7 mol.40  2.16 -1.117550 -0.527967 -0.298197 -1.032380  0 -1.256760
> 8 mol.45  1.90 -0.383560 -0.527967 -0.298197  0.110860  0 -0.290021
> 9 mol.48  1.73 -0.383560 -0.527967 -0.298197  0.110860  0 -0.290021
>
>
> What if I want to exclude 2 columns?
> For example:   sample(data, 3, replace=FALSE) ##from my sample, i want
> to exclude *both *id and pID50
>
> Thanks much.
>
> --
>  View message @
> http://r.789695.n4.nabble.com/Want-to-exclude-a-column-when-using-sample-function-tp2287988p2289092.html
>

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Want-to-exclude-a-column-when-using-sample-function-tp2287988p2289173.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Using which function with xts

2010-07-14 Thread Wu Gong

Correction: 

data[abs(data$price-avg)<=3*std,]


-
A R learner.
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Using-which-function-with-xts-tp226p2289166.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Using which function with xts

2010-07-14 Thread Wu Gong

Let's say your "data" has 2 columns: one is "date" and another is "price",
then

avg = mean(data$price, na.rm=T)
std = sd(data$price, na.rm=T)

The data after those unwanted removed should be:

data[data$price-avg<=3*std,]

-
A R learner.
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Using-which-function-with-xts-tp226p2289163.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] count - help

2010-07-14 Thread Dieter Menne


barbara horta e costa wrote:
> 
> 
> Sample Season  Area Gear  Depth
>1   W   1  5   1
>2   Sp  1  3   1
>2   Sp  1  5   1
>2   Sp  1 11  1
>2   Sp  1 11  1
> 
> Sample: 1:28
> Season: I, P, V, O
> Area: 1:3
> Fishing gear: 1:12
> Depth: 1:8
> 
> each variable level is coded with numbers. e.g I have 3 areas and 12 gears
> 
> 

Here is an example how to create a data frame. If you post again, it is
polite and you will get faster answers when you supply sample data in this
form.

Dieter

n = 100
set.seed(123)
d = data.frame(Sample = sample(1:28,n,TRUE),  
   Season=sample(c("I","P","V","O"),n,TRUE),
   Area = sample(c(1:3,NA),n,TRUE),
   Gear = sample(c(1:12,NA),n,TRUE),
   Depth = sample(1:8,n,TRUE))

# The real job is rather simple in your case
# (package reshape is for the hard work)
xt <- xtabs(~ Season + Area + Gear + Depth, data=d, exclude=NULL, na.action=na.pass)
xt
ftable(xt, exclude=NULL, na.action=na.pass)


-- 
View this message in context: 
http://r.789695.n4.nabble.com/count-help-tp2288571p2289161.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] count - help

2010-07-14 Thread Dieter Menne


-- 
View this message in context: 
http://r.789695.n4.nabble.com/count-help-tp2288571p2289156.html
Sent from the R help mailing list archive at Nabble.com.



[R] Merging columns along time line

2010-07-14 Thread Ralf B
I am resending this, as I believe it has not arrived on the mailing
list when I first emailed.

I have a set of labels arranged along a timeline in data frame a. Each
label has a timestamp and marks a state until the next label. In the
example below, a contains 3 such timestamps and 3 associated labels: on
the continuous scale 1-10, 'abc' marks the timestamps from 3 up to 4,
'def' marks the timestamps from 5 up to 7, and so on.

a <- data.frame(timestamp=c(3,5,8), mylabel=c("abc","def","ghi"))
b <- data.frame(timestamp=c(1:10))

I would like to assign these labels as an extra collumn 'label' to the
data.frame b which currently only consists of a the timestamp. The
output would then look like this:

  timestamp  label
1 1NA
2 2NA
3 3"abc"
4 4"abc"
5 5"def"
6 6"def"
7 7"def"
8 8"ghi"
9 9"ghi"
10  10"ghi"

What is the simplest way to assign these labels based on timestamps to
get this output. The real dataset is several millions of rows...
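One approach I have been experimenting with is findInterval(), sketched
below, though I do not know whether it is the simplest or fastest way for
millions of rows:

```r
a <- data.frame(timestamp = c(3, 5, 8), mylabel = c("abc", "def", "ghi"))
b <- data.frame(timestamp = 1:10)

# For each timestamp in b, find which label interval it falls into;
# 0 means "before the first label", which should become NA.
idx <- findInterval(b$timestamp, a$timestamp)
idx[idx == 0] <- NA
b$label <- as.character(a$mylabel)[idx]
```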

Thanks,
Ralf



Re: [R] Want to exclude a column when using sample function

2010-07-14 Thread Jeff Newmiller
I would use "melt" from the reshape package so that "sample" could be used, 
rather than trying to process various selections of columns from a wide 
data.frame.

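Independent of the reshaping, one way to avoid re-drawing columns is to
sample names and keep a record of what has been used; a sketch with
made-up object names:

```r
pool <- names(data)          # all candidate factor names
used <- character(0)

draw <- function(n) {
  picked <- sample(setdiff(pool, used), n)
  used <<- c(used, picked)   # remember what has been drawn so far
  picked
}

first10 <- draw(10)          # 10 fresh columns
next3   <- draw(3)           # 3 more, never drawn before
```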
"Addi Wei"  wrote:

>
>Sorry to post multiple questions, but this is still related to the sample
>function.
>
>In my previous example, how do I save/store the current sample so when I run
>sample again (after analysis) I can exclude the samples that were previously
>chosen. 
>
>For example if I have 180 factors or columns...and I sample 10 from the data
>set using:
>sample10 <- sample(data, 10, replace=FALSE) 
>
>##and then from this sample of 10, I'll run my analysis, and then I wish to
>randomly replace 3 samples out of 10, with 3 new factors from the list of
>180 factors (that have not been chosen thus far).  I wish to store all the
>factors that have been chosen for analysis into an object to avoid me going
>back and picking those same factors again  
>-- 
>View this message in context: 
>http://r.789695.n4.nabble.com/Want-to-exclude-a-column-when-using-sample-function-tp2287988p2289118.html
>Sent from the R help mailing list archive at Nabble.com.
>
>__
>R-help@r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

---
Jeff Newmiller                        The     .   .   Go Live...
DCN:                          Basics:  ##.#.  ##.#.   Live Go...
                                Live:  OO#..  Dead: OO#..  Playing
Research Engineer (Solar/Batteries     O.O#.  #.O#.   with
/Software/Embedded Controllers)        .OO#.  .OO#.   rocks...1k
---
Sent from my phone. Please excuse my brevity.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] long to wide on larger data set

2010-07-14 Thread Juliet Hannah
Hi Matthew and Jim,

Thanks for all the suggestions as always. Matthew's post was very
informative in showing how things can be done much more efficiently
with data.table. I haven't had a chance to finish the reshaping
because my group was in a rush,
and someone else decided to do it in Perl. However, I did get a chance
to use the data.table package for the first time. In some preliminary
steps, I had to do some subsetting and recoding and this was superfast
with data.table. The tutorials were helpful in getting me up to speed.
Over the next few days
I plan to carry out the reshaping as a learning exercise so I'll be
ready next time. I'll post my results afterwards.

Thanks,

Juliet

On Mon, Jul 12, 2010 at 11:50 AM, Matthew Dowle  wrote:
> Juliet,
>
> I've been corrected off list. I did not read properly that you are on 64bit.
>
> The calculation should be :
>    53860858 * 4 * 8 /1024^3 = 1.6GB
> since pointers are 8 bytes on 64bit.
>
> Also, data.table is an add-on package so I should have included :
>
>   install.packages("data.table")
>   require(data.table)
>
> data.table is available on all platforms both 32bit and 64bit.
>
> Please forgive mistakes: 'someoone' should be 'someone', 'percieved' should
> be
> 'perceived' and 'testDate' should be 'testData' at the end.
>
> The rest still applies, and you might have a much easier time than I thought
> since you are on 64bit. I was working on the basis of squeezing into 32bit.
>
> Matthew
>
>
> "Matthew Dowle"  wrote in message
> news:i1faj2$lv...@dough.gmane.org...
>>
>> Hi Juliet,
>>
>> Thanks for the info.
>>
>> It is very slow because of the == in  testData[testData$V2==one_ind,]
>>
>> Why? Imagine someoone looks for 10 people in the phone directory. Would
>> they search the entire phone directory for the first person's phone
>> number, starting
>> on page 1, looking at every single name, even continuing to the end of the
>> book
>> after they had found them ?  Then would they start again from page 1 for
>> the 2nd
>> person, and then the 3rd, searching the entire phone directory from start
>> to finish
>> for each and every person ?  That code using == does that.  Some of us
>> call
>> that a 'vector scan' and is a common reason for R being percieved as slow.
>>
>> To do that more efficiently try this :
>>
>> testData = as.data.table(testData)
>> setkey(testData,V2)    # sorts data by V2
>> for (one_ind in mysamples) {
>>   one_sample <- testData[one_ind,]
>>   reshape(one_sample)
>> }
>>
>> or just this :
>>
>> testData = as.data.table(testData)
>> setkey(testData,V2)
>> testData[,reshape(.SD,...), by=V2]
>>
>> That should solve the vector scanning problem, and get you on to the
>> memory
>> problems which will need to be tackled. Since the 4 columns are character,
>> then
>> the object size should be roughly :
>>
>>    53860858 * 4 * 4 /1024^3 = 0.8GB
>>
>> That is more promising to work with in 32bit so there is hope. [ That
>> 0.8GB
>> ignores the (likely small) size of the unique strings in global string
>> hash (depending
>> on your data). ]
>>
>> It's likely that the as.data.table() fails with out of memory.  That is not
>> data.table
>> but unique. There is a change in unique.c in R 2.12 which makes unique
>> more
>> efficient and since factor calls unique, it may be necessary to use R
>> 2.12.
>>
>> If that still doesn't work, then there are several more tricks (and we
>> will need
>> further information), and there may be some tweaks needed to that code as
>> I
>> didn't test it,  but I think it should be possible in 32bit using R 2.12.
>>
>> Is it an option to just keep it in long format and use a data.table ?
>>
>>   testDate[, somecomplexrfunction(onecolumn, anothercolumn), by=list(V2) ]
>>
>> Why do you need to reshape from long to wide?
>>
>> HTH,
>> Matthew
>>
>>
>>
>> "Juliet Hannah"  wrote in message
>> news:aanlktinyvgmrvdp0svc-fylgogn2ro0omnugqbxx_...@mail.gmail.com...
>> Hi Jim,
>>
>> Thanks for responding. Here is the info I should have included before.
>> I should be able to access 4 GB.
>>
>>> str(myData)
>> 'data.frame':   53860857 obs. of  4 variables:
>> $ V1: chr  "23" "26" "200047" "200050" ...
>> $ V2: chr  "cv0001" "cv0001" "cv0001" "cv0001" ...
>> $ V3: chr  "A" "A" "A" "B" ...
>> $ V4: chr  "B" "B" "A" "B" ...
>>> sessionInfo()
>> R version 2.11.0 (2010-04-22)
>> x86_64-unknown-linux-gnu
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>> [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> On Mon, Jul 12, 2010 at 7:54 AM, jim holtman  wrote:
>>> What is the configuration you are running on (OS, memory, etc.)? What
>>> does your object consist of? Is it numeric, factors, etc.? Provide a
>
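The vector-scan versus keyed-lookup point above can be illustrated with a small self-contained sketch (assumes the data.table package is installed; with only 12 rows the speed difference is invisible, but the result is identical):

```r
library(data.table)

dt <- data.table(V2 = rep(c("a", "b", "c"), each = 4), x = 1:12)

# vector scan: == compares every row's V2 against "b"
slow <- dt[dt$V2 == "b", ]

# keyed subset: sort once by V2, then each lookup is a binary search
setkey(dt, V2)
fast <- dt["b", ]

identical(slow$x, fast$x)   # same rows either way
```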

[R] Dot Plot with Confidence Limits

2010-07-14 Thread Neil123

Hi, 

I have the following dataset and I would like to create a dotplot with
confidence limits:

  CAT1 CAT2   MEAN  Lower   Upper
1    1    1  0.619  0.392   0.845
2    1   10  1.774  1.030   2.518
3    1  100  7.700  4.810  10.586
4    1 1000 45.536 23.612  67.392
5    2    1  0.500  0.244   0.755
6    2   10  1.725  1.109   2.341
7    2  100 15.200 10.924  19.473
8    2 1000 88.200 48.030 128.369

I need the data grouped by the two categories (independent variables CAT1 &
CAT2). Each row would be a separate dot, and colour-coded by CAT1. 

I have been able to create a dotplot with the data of just one of the
CAT1's, but not both together in the same graph...
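In case it helps, one way to get both CAT1 groups on a single plot is base graphics, drawing the intervals with arrows(). A sketch using the numbers above; the colours and layout are arbitrary choices:

```r
d <- data.frame(
  CAT1  = factor(rep(1:2, each = 4)),
  CAT2  = factor(rep(c(1, 10, 100, 1000), 2)),
  MEAN  = c(0.619, 1.774, 7.700, 45.536, 0.500, 1.725, 15.200, 88.200),
  Lower = c(0.392, 1.030, 4.810, 23.612, 0.244, 1.109, 10.924, 48.030),
  Upper = c(0.845, 2.518, 10.586, 67.392, 0.755, 2.341, 19.473, 128.369)
)

x    <- seq_len(nrow(d))                       # one dot per row
cols <- c("blue", "red")[as.integer(d$CAT1)]   # colour-coded by CAT1

plot(x, d$MEAN, pch = 16, col = cols, xaxt = "n",
     ylim = range(d$Lower, d$Upper),
     xlab = "CAT2 (grouped by CAT1)", ylab = "Mean")
axis(1, at = x, labels = as.character(d$CAT2))
# vertical bars from Lower to Upper at each dot
arrows(x, d$Lower, x, d$Upper, angle = 90, code = 3, length = 0.05, col = cols)
legend("topleft", legend = paste("CAT1 =", levels(d$CAT1)),
       pch = 16, col = c("blue", "red"))
```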

Thanks in advance for your help.

Neil
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Dot-Plot-with-Confidence-Limits-tp2289086p2289086.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] BaselR meeting July 2010

2010-07-14 Thread Sarah Lewis
Good afternoon,

Mango Solutions AG (Switzerland) is proud to announce the forthcoming BaselR 
meeting. 

Thank you to everyone who supported our inaugural BaselR meeting on April 
28th; we were fortunate to have three excellent presentations which prompted 
lively discussion - our thanks go to Andreas Krause, Yann Abraham and Charles 
Roosen for their presentations, details of which are available at www.baselr.org.

The next BaselR meeting is Wednesday 28th July 2010

Venue:
transBARent
Viaduktstrasse 3
CH-4051 Basel
Tel. 061 222 21 31
Fax 061 222 21 32
i...@transbarent.ch

http://transbarent.sv-group.ch/de.html 

Agenda:

Introduction - Charles Roosen, Mango Solutions AG
Desktop Publishing with Sweave - Andrew Ellis, ETH Zurich
Professional Reporting with RExcel - Dominik Locher, THETA AG
R Generator Tool for Google Motion Charts - Sebastian Pérez Saaibi, ETH Zurich

The following BaselR meeting has been scheduled for: 

Wednesday 13th October 2010 

If you would like to join the BaselR mailing list and receive details of all 
BaselR meetings please email us at bas...@mango-solutions.com  

What is BaselR?

Similar to the well-known LondonR, this informal meeting is intended to serve 
as a platform for all local (and regional) R users to present and exchange 
their experiences and ideas around the usage of R. 

Mango Solutions aims to host such meetings about every quarter. A typical 
BaselR meeting will consist of 3-4 talks of about 20-25 min to give plenty of 
room for sharing your R experiences, discussions and exchange of ideas.

How to contribute? 

We are always looking for volunteers to present at subsequent meetings. If you 
think you have something interesting to present or know of someone who has, 
please contact us.

Take a look at previous presentations given at LondonR www.londonr.org 

For more information about Mango Solutions please contact us or visit our 
website www.mango-solutions.ch 



Sarah Lewis

Hadley Wickham, Creator of ggplot2 - first time teaching in the UK. 1st - 2nd  
November 2010. 
To book your seat please go to http://mango-solutions.com/news.html 
T: +44 (0)1249 767700 Ext: 200
F: +44 (0)1249 767707
M: +44 (0)7746 224226
www.mango-solutions.com
Unit 2 Greenways Business Park 
Bellinger Close
Chippenham
Wilts
SN15 1BN
UK 


LEGAL NOTICE
This message is intended for the use o...{{dropped:9}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Want to exclude a column when using sample function

2010-07-14 Thread Addi Wei

Sorry to post multiple questions, but this is still related to the sample
function.

In my previous example, how do I save/store the current sample so when I run
sample again (after analysis) I can exclude the samples that were previously
chosen. 

For example if I have 180 factors or columns...and I sample 10 from the data
set using:
sample10 <- sample(data, 10, replace=FALSE) 

##and then from this sample of 10, I'll run my analysis, and then I wish to
randomly replace 3 samples out of 10, with 3 new factors from the list of
180 factors (that have not been chosen thus far).  I wish to store all the
factors that have been chosen for analysis into an object to avoid me going
back and picking those same factors again  
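One way to keep that bookkeeping is a running vector of everything already used, with setdiff() supplying the not-yet-chosen pool. A sketch with made-up factor names standing in for the 180 column names:

```r
all_factors <- paste0("f", 1:180)      # stand-ins for the 180 column names

chosen <- sample(all_factors, 10)      # current sample of 10
used   <- chosen                       # record of everything tried so far

# ... run analysis on `chosen` ...

# swap 3 of the 10 for factors never chosen before
drop   <- sample(chosen, 3)
new3   <- sample(setdiff(all_factors, used), 3)
chosen <- c(setdiff(chosen, drop), new3)
used   <- union(used, new3)

length(chosen)   # still 10; `used` now holds 13 names
```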
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Want-to-exclude-a-column-when-using-sample-function-tp2287988p2289118.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Want to exclude a column when using sample function

2010-07-14 Thread Joshua Wiley
On Wed, Jul 14, 2010 at 10:24 AM, Federico Andreis
 wrote:
> I guess you could just use as an argument to sample
>
> data[,-c(1,2)]
>
> where 1 and 2 are id and pID50 column number

or if you do not know them and do not want to find out:

sample(data[ , - match(c("id", "pID50"), names(data))], 3, replace=FALSE)


>
> On Wed, Jul 14, 2010 at 7:17 PM, Addi Wei  wrote:
>
>>
>>      id pID50      apol     a_acc    a_acid     a_aro a_base   a_count
>> 1 mol.11  3.63 -0.882267 -0.527967 -0.298197 -1.032380      0 -1.063410
>> 2 mol.14  3.38 -1.007330 -0.527967 -0.298197 -1.032380      0 -1.063410
>> 3 mol.19  3.18  1.153560  1.407910 -0.298197  1.254100      0  1.160080
>> 4 mol.20  3.14  0.183448 -0.527967 -0.298197  0.873019      0  0.290021
>> 5 mol.29  2.77 -0.273901 -0.527967 -0.298197  0.110860      0 -0.193347
>> 6 mol.30  2.74 -0.230593 -0.527967 -0.298197  0.110860      0  0.00
>> 7 mol.40  2.16 -1.117550 -0.527967 -0.298197 -1.032380      0 -1.256760
>> 8 mol.45  1.90 -0.383560 -0.527967 -0.298197  0.110860      0 -0.290021
>> 9 mol.48  1.73 -0.383560 -0.527967 -0.298197  0.110860      0 -0.290021
>>
>>
>> What if I want to exclude 2 columns?
>> For example:   sample(data, 3, replace=FALSE)     ##from my sample, i want
>> to exclude both id and pID50
>>
>> Thanks much.
>> --
>> View this message in context:
>> http://r.789695.n4.nabble.com/Want-to-exclude-a-column-when-using-sample-function-tp2287988p2289092.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question about SVM in e1071

2010-07-14 Thread Jack Luo
Pau,

Thanks a lot for your email, I found it very helpful. Please see below for
my reply, thanks.

-Jack

On Wed, Jul 14, 2010 at 10:36 AM, Pau Carrio Gaspar wrote:

>  Hello Jack,
>
> 1) Why did you think that "larger C is more prone to overfitting than smaller
> C"?
>

There is some statement in the link http://www.dtreg.com/svm.htm:

"To allow some flexibility in separating the categories, SVM models have a
cost parameter, C, that controls the trade off between allowing training
errors and forcing rigid margins. It   creates a soft margin that permits
some misclassifications. Increasing the value of C increases the cost of
misclassifying points and forces the creation of a more accurate model that
may not generalize well."

My understanding is that this means larger C may not generalize well (prone
to overfitting).
2) If you look at the formulation of the quadratic programming problem you will
see that C rules the error of the "cutting plane" (and overfitting).
Therefore for high C you allow the "cutting plane" to cut the set more loosely,
so the SVM needs fewer points to build it. A proper explanation is in Kristin P.
Bennett and Colin Campbell, "Support Vector Machines: Hype or Hallelujah?",
SIGKDD Explorations, 2(2), 2000, 1-13.
http://www.idi.ntnu.no/emner/it3704/lectures/papers/Bennett_2000_Support.pdf

Could you be more specific about this? I don't quite understand.

>
> 3) you might find usefull this plots:
>
> library(e1071)
> m1 <- matrix( c(
> 0,0,0,1,1,2, 1, 2,3,2,3, 3, 0,
> 1,2,3,0, 1, 2, 3,
> 1,2,3,2,3,3, 0, 0,0,1, 1, 2, 4, 4,4,4,
> 0, 1, 2, 3,
> 1,1,1,1,1,1,-1,-1,  -1,-1,-1,-1, 1 ,1,1,1, 1,
> 1,-1,-1
> ), ncol = 3 )
>
> Y = m1[,3]
> X = m1[,1:2]
>
> df = data.frame( X , Y )
>
> par(mfcol=c(4,2))
> for( cost in c( 1e-3 ,1e-2 ,1e-1, 1e0,  1e+1, 1e+2 ,1e+3)) {
> #cost <- 1
> model.svm <- svm( Y ~ . , data = df ,  type = "C-classification" , kernel =
> "linear", cost = cost,
>  scale =FALSE )
> #print(model.svm$SV)
>
> plot(x=0,ylim=c(0,5), xlim=c(0,3),main= paste( "cost: ",cost, "#SV: ",
> nrow(model.svm$SV) ))
> points(m1[m1[,3]>0,1], m1[m1[,3]>0,2], pch=3, col="green")
> points(m1[m1[,3]<0,1], m1[m1[,3]<0,2], pch=4, col="blue")
> points(model.svm$SV[,1],model.svm$SV[,2], pch=18 , col = "red")
> }

Thanks a lot for the code, I really appreciate it. I've run it, but I am
not sure how I should interpret the scatter plot, although it is obvious
that the number of SVs decreases as cost increases.

>
> Regards
> Pau
>
>
> 2010/7/14 Jack Luo 
>
>> Hi,
>>
>> I have a question about the parameter C (cost) in svm function in e1071. I
>> thought larger C is prone to overfitting than smaller C, and hence leads
>> to
>> more support vectors. However, using the Wisconsin breast cancer example
>> on
>> the link:
>> http://planatscher.net/svmtut/svmtut.html
>> I found that the largest cost have fewest support vectors, which is
>> contrary
>> to what I think. please see the scripts below:
>> Am I misunderstanding something here?
>>
>> Thanks a bunch,
>>
>> -Jack
>>
>> > model1 <- svm(databctrain, classesbctrain, kernel = "linear", cost =
>> 0.01)
>> > model2 <- svm(databctrain, classesbctrain, kernel = "linear", cost = 1)
>> > model3 <- svm(databctrain, classesbctrain, kernel = "linear", cost =
>> 100)
>> > model1
>>
>> Call:
>> svm.default(x = databctrain, y = classesbctrain, kernel = "linear",
>>cost = 0.01)
>>
>>
>> Parameters:
>>   SVM-Type:  C-classification
>>  SVM-Kernel:  linear
>>   cost:  0.01
>>  gamma:  0.111
>>
>> Number of Support Vectors:  99
>>
>> > model2
>>
>> Call:
>> svm.default(x = databctrain, y = classesbctrain, kernel = "linear",
>>cost = 1)
>>
>>
>> Parameters:
>>   SVM-Type:  C-classification
>>  SVM-Kernel:  linear
>>   cost:  1
>>  gamma:  0.111
>>
>> Number of Support Vectors:  46
>>
>> > model3
>>
>> Call:
>> svm.default(x = databctrain, y = classesbctrain, kernel = "linear",
>>cost = 100)
>>
>>
>> Parameters:
>>   SVM-Type:  C-classification
>>  SVM-Kernel:  linear
>>   cost:  100
>>  gamma:  0.111
>>
>> Number of Support Vectors:  44
>>
>>[[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Want to exclude a column when using sample function

2010-07-14 Thread Federico Andreis
I guess you could just use as an argument to sample

data[,-c(1,2)]

where 1 and 2 are id and pID50 column number

On Wed, Jul 14, 2010 at 7:17 PM, Addi Wei  wrote:

>
>  id pID50  apol a_acca_acid a_aro a_base   a_count
> 1 mol.11  3.63 -0.882267 -0.527967 -0.298197 -1.032380  0 -1.063410
> 2 mol.14  3.38 -1.007330 -0.527967 -0.298197 -1.032380  0 -1.063410
> 3 mol.19  3.18  1.153560  1.407910 -0.298197  1.254100  0  1.160080
> 4 mol.20  3.14  0.183448 -0.527967 -0.298197  0.873019  0  0.290021
> 5 mol.29  2.77 -0.273901 -0.527967 -0.298197  0.110860  0 -0.193347
> 6 mol.30  2.74 -0.230593 -0.527967 -0.298197  0.110860  0  0.00
> 7 mol.40  2.16 -1.117550 -0.527967 -0.298197 -1.032380  0 -1.256760
> 8 mol.45  1.90 -0.383560 -0.527967 -0.298197  0.110860  0 -0.290021
> 9 mol.48  1.73 -0.383560 -0.527967 -0.298197  0.110860  0 -0.290021
>
>
> What if I want to exclude 2 columns?
> For example:   sample(data, 3, replace=FALSE) ##from my sample, i want
> to exclude both id and pID50
>
> Thanks much.
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Want-to-exclude-a-column-when-using-sample-function-tp2287988p2289092.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Want to exclude a column when using sample function

2010-07-14 Thread Addi Wei

  id pID50  apol a_acca_acid a_aro a_base   a_count
1 mol.11  3.63 -0.882267 -0.527967 -0.298197 -1.032380  0 -1.063410
2 mol.14  3.38 -1.007330 -0.527967 -0.298197 -1.032380  0 -1.063410
3 mol.19  3.18  1.153560  1.407910 -0.298197  1.254100  0  1.160080
4 mol.20  3.14  0.183448 -0.527967 -0.298197  0.873019  0  0.290021
5 mol.29  2.77 -0.273901 -0.527967 -0.298197  0.110860  0 -0.193347
6 mol.30  2.74 -0.230593 -0.527967 -0.298197  0.110860  0  0.00
7 mol.40  2.16 -1.117550 -0.527967 -0.298197 -1.032380  0 -1.256760
8 mol.45  1.90 -0.383560 -0.527967 -0.298197  0.110860  0 -0.290021
9 mol.48  1.73 -0.383560 -0.527967 -0.298197  0.110860  0 -0.290021


What if I want to exclude 2 columns?   
For example:   sample(data, 3, replace=FALSE) ##from my sample, i want
to exclude both id and pID50

Thanks much.  
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Want-to-exclude-a-column-when-using-sample-function-tp2287988p2289092.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] a very particular plot

2010-07-14 Thread Ista Zahn
Hi all,
Maybe I missed some crucial context (I did not follow the example all
the way through). But of course ggplot will make legends based on
integers. For example:

library(ggplot2)

dat <- data.frame(x=sample(1:10, 100, replace=TRUE),
  y=sample(1:10, 100, replace=TRUE),
  z=sample(1:10,100, replace=TRUE)
  )

str(dat)

ggplot(dat, aes(x=x, y=y, color=z)) + geom_point()

-Ista

On Wed, Jul 14, 2010 at 10:45 AM, Jeff Newmiller
 wrote:
> Ggplot will only produce legends based on factors. Integers are not factors.
>
> "Ian Bentley"  wrote:
>
>>Hi Dennis,
>>
>>Thanks for the quick reply.
>>Once I removed solid = TRUE, which was giving errors, the code is accepted
>>fine.
>>
>>It's strange though, no legend appears.  Even when I try something simple
>>like:
>>p + scale_shape_manual(values=1:3)
>>
>>No legend appears.  I can't find any similar problems on google.
>>
>>Thanks again,
>>Ian
>>
>>
>>On 14 July 2010 03:56, Dennis Murphy  wrote:
>>
>>> Hi:
>>>
>>> This is untested, so caveat emptor. I believe Hadley is busy teaching a
>>> ggplot2 course this week so his availability is limited at best. I guess I
>>> can give it a shot...
>>>
>>> You need a scale_shape_* construct to add to your plot, perhaps something
>>> like
>>>
>>> scale_shape_manual('Statistic', breaks = 1:3, labels = c('Min', 'Median',
>>> 'Max'), solid = TRUE)
>>>
>>> The 'Statistic' puts a title on the legend, the breaks argument should
>>> supply the values of the shapes,
>>> the labels argument should provide the label to associate to each shape,
>>> and solid = TRUE should
>>> produce the same behavior as in the geom_point() calls wrt shapes. [Notice
>>> how I say 'should'...]
>>>
>>> No guarantees this will work - scales are one of my greatest frustrations
>>> in ggplot2. Expect this to be the first of several iterations you'll have to
>>> go through to get it to work the way you want.
>>>
>>> HTH,
>>> Dennis
>>>
>>>
>>> On Tue, Jul 13, 2010 at 4:32 PM, Ian Bentley wrote:
>>>
 I've got a couple of more changes that I want to make to my plot, and I
 can't figure things out.  Thanks for all the help.

 I'm using this R script

 library(ggplot2)
 library(lattice)
 # Generate 50 data sets of size 100 and assign them to a list object

 low <- 1
 n <- 50
 #Load data from file
 for(i in low:n) assign(paste('df', i, sep = ''),
         read.table(paste("tot-LinkedList",i*100,"query.log",sep=''),
 header=TRUE))


 dnames <- paste('df', low:n, sep = '')
 l <- vector('list', n)
 for(i in seq_along(dnames)) l[[i]] <- with(get(dnames[i]), Send + Receive)
 ml <- melt(l)

 dsum <- ddply(ml, 'L1', summarise, mins = min(value), meds =
 median(value),
               maxs = max(value))


 p <- ggplot(ml, aes(x = L1*100, y = value)) +
     geom_point(alpha = 0.2) +
     geom_point(data = dsum, aes(y = mins), shape = 1, size = 3,
 solid=TRUE, colour='blue') +
     geom_point(data = dsum, aes(y = meds), shape = 2, size = 3,
 solid=TRUE, colour='blue') +
     geom_point(data = dsum, aes(y = maxs), shape = 3, size = 3,
 solid=TRUE, colour='blue') +
     geom_smooth(data = dsum, aes(y = mins)) +
     geom_smooth(data = dsum, aes(y = meds)) +
     geom_smooth(data = dsum, aes(y = maxs)) +
     opts(axis.text.x = theme_text(size = 7, angle = 90, hjust = 1), title
 = 'Linked List Query Costs Increasing Network Size')  +
     xlab('Network Complexity (nodes)') + ylab('Battery Cost (uJ)')

 --END--

 And this works great, except that I think that I am not being very R'y,
 since now I want to add a legend saying that circle (i.e. shape 1) is the
 minimum, and shape 2 is the med, and shape 3 is max.

 I'd also like to be able to move the legend to the top left part of the
 plot since that area is empty anyways.

 Is there any way that I can do it easily?

 Thanks
 Ian





 On 11 July 2010 10:29, Ian Bentley  wrote:

> Thanks to both of you!
>
>
> I was able to get exactly the plot I was looking for!
>
> Ian
>
> On 11 July 2010 09:30, Hadley Wickham  wrote:
>
>> Hi Ian,
>>
>> Have a look at the examples in http://had.co.nz/ggplot2/geom_tile.html
>> for some ideas on how to do this with ggplot2.
>>
>> Hadley
>>
>> On Sat, Jul 10, 2010 at 8:10 PM, Ian Bentley 
>> wrote:
>> > Hi all,
>> >
>> > Thanks for the really great help I've received on this board in the
>> past.
>> >
>> > I have a very particular graph that I'm trying to plot, and I'm not
>> really
>> > sure how to do it.  I think I should be able to use ggplot for this,
>> but I'm
>> > not really sure how.
>> >
>> > I have a data.frame which contains fifty sub frames containing one
>> hundred
>> > data points each

Re: [R] Cannot Build R From Source - Windows XP

2010-07-14 Thread Duncan Murdoch

On 14/07/2010 12:01 PM, Steve Pederson wrote:

Hi,

I can't seem to install R from source. I've downloaded the latest 
Rtools211.exe from http://www.murdoch-sutherland.com/Rtools/ & done a 
full installation of that and Inno Setup.


I have set R_HOME as C:\R (and also tried using C:\R\R-2.11.1)

After successfully running 'tar xf R-2.11.1.tar.gz' the modifications I 
have made and saved as MkRules.local are:

BUILD_HTML = YES
ISDIR = C:/Program Files/Inno Setup 5

I've then run 'make all recommended' from R_HOME\src\gnuwin32 and it 
runs nicely for ages, until I get the following message:



building package 'base'
cannot create /tmp/R612: directory nonexistent
mv: cannot stat '/tmp/R612': No such file or directory
  


R thinks your temporary directory is called /tmp, but there's no such 
directory on your system.  R looks for a temporary directory in the 
environment variables
TMPDIR, TMP, TEMP.  Set TMPDIR to the path to a writable directory and 
you should get past this error.
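For example, from the same shell before re-running make (c:/tmp is only an illustration; any writable directory works):

```shell
mkdir c:\tmp          :: create a writable temp directory (skip if it exists)
set TMPDIR=c:/tmp     :: Windows cmd syntax; use TMPDIR=c:/tmp make ... under sh
make all recommended
```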


Duncan Murdoch

make[3]: ***[mkR] Error 1
make[2]: ***[all] Error 2
make[1]: ***[R] Error 1
make: ***[all] Error 2


Sometimes the number changes to /tmp/R5776, or something similar but I 
don't think that's the issue.



My current setting for PATH are:
c:\Rtools\bin;c:\Rtools\perl\bin;c:\Rtools\MinGW\bin;C:\Program Files\PC 
Connectivity Solution\;C:\Program Files\Common 
Files\ArcSoft\Bin;%GTK_BASEPATH%\bin;c:\program 
files\imagemagick-6.4.1-q16;C:\texmf\miktex\bin;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;c:\dev-cpp\bin\;C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727;C:\Program 
Files\Common Files\DivX Shared\;C:\Program Files\QuickTime\QTSystem\



FWIW, It's a 4yo Dell laptop & I've come across a few quirks with 
installing software over the years. I had also previously installed 
2.11.1 from the windows executable, but this was uninstalled using the 
uninstall function that comes with it. I'm trying to rebuild to begin 
incorporating some calls to C I'm working on.


Thanks in advance,

Steve

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Estimating a differential equation parameters based on a data set

2010-07-14 Thread Oscar Rodriguez
Hello R community: 

Here is another question, 

How can I estimate the parameters of a differential equation based on a data set? 

If you know where I can find some examples, that would be even better, 
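For a worked flavour of the usual approach (simulate the ODE for candidate parameters, score against the data, minimise), here is a sketch assuming the deSolve package; the model, data, and names are all invented:

```r
library(deSolve)
set.seed(42)

# toy model: exponential decay dy/dt = -k * y, true k = 0.3
decay <- function(t, y, parms) list(-parms["k"] * y)

times <- 0:10
truth <- ode(y = c(y = 10), times = times, func = decay, parms = c(k = 0.3))
obs   <- truth[, "y"] + rnorm(length(times), sd = 0.05)   # noisy "data"

# sum of squared errors between simulation and observations
sse <- function(k) {
  sim <- ode(y = c(y = 10), times = times, func = decay, parms = c(k = k))
  sum((obs - sim[, "y"])^2)
}

k_hat <- optimize(sse, interval = c(0.01, 2))$minimum
k_hat   # close to the true value 0.3
```

For multi-parameter problems you would swap optimize() for optim(); the FME package builds on deSolve specifically for this kind of fitting and its vignettes contain worked examples.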

Thanks again, and cheers, 









Oscar Rodriguez Gonzalez 
Mobile: 519.823.3409 
PhD Student 
Canadian Research Institute for Food Safety 


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] convert data.frame to matrix -> NA factor problem

2010-07-14 Thread Erik Iverson



syrvn wrote:

Thanks again for your quick reply.

I understood your procedure, but now it is even more strange why my
conversion does not work. In your example, the NA values are in "brackets"
(<NA>), and what your procedure does is convert these <NA> values into NA;
then it is possible to use data.matrix to correctly convert the data.frame
into a matrix. But the data I read into R are already in the form where the
NA values are displayed as NA rather than <NA>. So the conversion should
actually work.


Have not followed this thread, but I think you're confused about what a 
true NA value is, at least with factors.  When values of a factor are 
missing, they are printed as <NA> to distinguish them from an actual factor 
level of "NA".  Numeric missings are printed NA.


> f1 <- factor("NA")
> f1
[1] NA
Levels: NA
> is.na(f1)
[1] FALSE

vs.

> n1 <- c(NA)
> n1
[1] NA
> is.na(n1)
[1] TRUE

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] problem of loading workspace

2010-07-14 Thread Yan Jiao
I'm using R x64 2.11.0(windows)

I was trying to load workspace I saved some days ago, but  got the
error:

ReadItem: unknown type 63, perhaps written by later version of R.

Has anyone come across the same problem? Any solutions?

 

 

Many thanks 

yan


**
This email and any files transmitted with it are confide...{{dropped:10}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ver 2.11.1 from xterm on Mac without lme4

2010-07-14 Thread Marc Schwartz
On Jul 14, 2010, at 10:17 AM, Andrea Foulkes wrote:

> Hi,
> 
> I just installed version 2.11.1 on my Mac OS X 10.5.2. When I try running R 
> from my xterm, I get the following error:
> 
> Error in loadNamespace(name) : there is no package called 'lme4'
> Fatal error: unable to restore saved data in .RData
> 
> I am returned to the terminal prompt and can not run R. I can open this 
> version of R from the R console and tried installing the lme4 source package 
> (per D. Bates' suggestion on 6-July) but get the following error:
> 
> trying URL 'http://cran.cict.fr/src/contrib/lme4_0.999375-34.tar.gz'
> Content type 'application/x-tar' length 1028012 bytes (1003 Kb)
> opened URL
> ==
> downloaded 1003 Kb
> 
> * installing *source* package ‘lme4’ ...
> ** libs
> *** arch - i386
> sh: make: command not found
> ERROR: compilation failed for package ‘lme4’
> * removing 
> ‘/Library/Frameworks/R.framework/Versions/2.11/Resources/library/lme4’
> 
> I do not need to use lme4 (though I would like to) but I do very much prefer 
> to work from my xterminal and apparently can not do without lme4. Any 
> suggestions appreciated!
> 
> Andrea


You appear to have something in your default .RData file that directly or 
indirectly (a package dependency) requires the presence of lme4.

You can try running:

  R --vanilla

from the command line to see if this resolves the problem. 

From memory, lme4 will not pass the required CRAN checks for OSX, which means 
that you have to install it from source. Since you are getting an error 
message about 'make' not being available, this indicates that you do not have 
the required Apple XCode Tools installed on your system, along with other 
required development tools.

So your choice is to delete the offending .RData file, which will be 
/Users/YOUR.USER.NAME/.RData, or to install XCode Tools and the other required 
components to enable you to install lme4 from source code.  

If you choose the former, note that by default, files that begin with a '.' are 
hidden from the normal Finder views. You can modify that behavior or since you 
seem comfortable using xterm, using 'ls -a' will list the files in the current 
directory, including those that are otherwise hidden.
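
If you would rather not lose the saved workspace outright, one option (a sketch, not something from the original reply) is to rename the file out of the way from inside an R session started with R --vanilla, so that the offending .RData is never loaded:

```r
# Rename ~/.RData out of the way instead of deleting it.
# Paths are expanded explicitly, since not every file function
# expands "~" on all platforms.
rdata <- path.expand("~/.RData")
if (file.exists(rdata)) {
  file.rename(rdata, paste0(rdata, ".bak"))  # keep a backup copy
}
```

Once lme4 (or whatever the workspace needs) is installed, the backup can be renamed back and loaded normally.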

If you choose the latter option, XCode Tools is available (after registering) 
from:

  http://developer.apple.com/technologies/xcode.html

and the additional tools are available from:

  http://cran.us.r-project.org/bin/macosx/tools/

For future reference, since this issue is specific to OSX, you would be better 
off posting to the R-sig-mac list:

  https://stat.ethz.ch/mailman/listinfo/r-sig-mac

HTH,

Marc Schwartz



Re: [R] how to update a bugs-model from inside R using R2WinBUGS

2010-07-14 Thread Ben Bolker
Frédéric Holzwarth  uni-leipzig.de> writes:

> 
> Hello there,
> 
> is there a way to update a model, which was called by "bugs()"? For 
> instance starting with few iterations and then updating more and more, 
> as is possible inside the WinBUGS-window. If set "debug=TRUE", then the 
> window remains open and updating is possible, but it is no longer 
> written into the R-object. Is it generally impossible?
> 
> Thanks,
> Frederic
> 
> 


  As far as I know (not having checked carefully), it's impossible --
this is one of the advantages of JAGS/Rjags/R2jags.
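
For readers unfamiliar with that workflow: with JAGS, the model object lives in the R session, so you can keep drawing samples from it incrementally. A minimal sketch using the rjags package (the model file "model.txt", the data list, and the monitored parameter "theta" are placeholders for your own model):

```r
library(rjags)  # assumes JAGS and the rjags package are installed

# "model.txt", the data, and "theta" are placeholders.
m <- jags.model("model.txt",
                data = list(y = y, N = length(y)),
                n.chains = 3)

update(m, n.iter = 1000)                   # burn-in: advance without recording
samp <- coda.samples(m, variable.names = "theta", n.iter = 2000)

# ...inspect samp, then continue sampling from where the chains stopped:
update(m, n.iter = 2000)
more <- coda.samples(m, "theta", n.iter = 2000)
```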

  Ben Bolker



Re: [R] Win Server x64/R: Memory Allocation Problem

2010-07-14 Thread Dirk Eddelbuettel
On Wed, Jul 14, 2010 at 05:51:17PM +0200, will.ea...@gmx.net wrote:
> Dear all,
> 
> how can I use R on a 64-bit Windows Server 2003 machine (24GB RAM) with more 
> than 3GB of working memory and make full use of it.
> 
> I started R --max-mem-size=3G since I got the warning that larger values are 
> too large and ignored.
> 
> In R I got: 
> > memory.size(max=FALSE)
> [1] 10.5
> > memory.size(max=TRUE)
> [1] 12.69
> > memory.limit()
> [1] 3072
> 
> but when I run the next command, I get an error:
> >climb.expset <- ReadAffy(celfile.path="./Data/Original/CLIMB/CEL/")
> Error: cannot allocate vector of size 2.4 Gb
> 
> Here is the R version I am using:
> platform   i386-pc-mingw32  
> arch   i386 
> os mingw32  
> system i386, mingw32   
> version.string R version 2.11.1 (2010-05-31)
> 
> What can I do?

Maybe you want to consider switching to the 64-bit version of R.

-- 
  Regards, Dirk
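
(As a quick check, not part of Dirk's reply: you can confirm from inside R which build you are actually running, since the version string above shows a 32-bit i386 build despite the 64-bit OS.)

```r
# Confirm which build of R is running.
.Machine$sizeof.pointer   # 8 on a 64-bit build, 4 on a 32-bit build
R.version$arch            # e.g. "x86_64" vs. "i386"
```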



Re: [R] convert data.frame to matrix -> NA factor problem

2010-07-14 Thread syrvn

Thanks again for your quick reply.

I understood your procedure but now it is even more strange why my
conversion does not work.
In your example, the NA values are in "brackets" (printed as <NA>) and what your
procedure does is to convert these <NA> values into true NA; then it is possible
to use data.matrix to correctly convert the data.frame into a matrix. But the
data I read into R are already in that form: the NA values are displayed as NA
rather than <NA>. So the conversion should actually work.

Cheers
-- 
View this message in context: 
http://r.789695.n4.nabble.com/convert-data-frame-to-matrix-NA-factor-problem-tp2288828p2289022.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] ver 2.11.1 from xterm on Mac without lme4

2010-07-14 Thread Faisal Moledina
On Wed, Jul 14, 2010 at 11:17 AM, Andrea Foulkes
 wrote:
> Hi,
>
> I just installed version 2.11.1 on my Mac OS X 10.5.2. When I try running R
> from my xterm, I get the following error:
>
> Error in loadNamespace(name) : there is no package called 'lme4'
> Fatal error: unable to restore saved data in .RData
>

It is likely trying to load the file ~/.RData which was saved from a
previous session. An object in that session must require the lme4
package. Delete that file to load a clean R session.

> I am returned to the terminal prompt and can not run R. I can open this
> version of R from the R console and tried installing the lme4 source package
> (per D. Bates' suggestion on 6-July) but get the following error:
>
> trying URL 'http://cran.cict.fr/src/contrib/lme4_0.999375-34.tar.gz'
> Content type 'application/x-tar' length 1028012 bytes (1003 Kb)
> opened URL
> ==
> downloaded 1003 Kb
>
> * installing *source* package ‘lme4’ ...
> ** libs
> *** arch - i386
> sh: make: command not found
> ERROR: compilation failed for package ‘lme4’
> * removing
> ‘/Library/Frameworks/R.framework/Versions/2.11/Resources/library/lme4’
>

You need to install the XCode Tools in order to get gcc installed on
your system, which includes 'make'. This will allow you to install
source packages. For OS X, binary packages are also available for most
packages.

> I do not need to use lme4 (though I would like to) but I do very much prefer
> to work from my xterminal and apparently can not do without lme4. Any
> suggestions appreciated!
>
> Andrea

Faisal



Re: [R] Query about wilcox.test() P-value

2010-07-14 Thread Marc Schwartz
You need to understand the difference between how a value is stored in an R 
object with full floating point precision versus how a value in R is displayed 
(printed) in the console with a print "method".

In this case, wilcox.test() returns an object of class 'htest' (as noted in the 
Value section of ?wilcox.test). When the result of wilcox.test() is printed to 
the console (using print.htest()), the p value is displayed using the function 
format.pval(), which in this case returns:

> format.pval(2.928121e-165)
[1] "< 2.22e-16"

This is common in R, where floating point values are not printed to full 
precision. The value displayed will be impacted upon by various 
characteristics, in some cases due to the application of specific 
print/formatting operations, or due to default options in R (see 
?print.default).

You might also want to look at ?.Machine which will provide other information 
specific to your platform relative to numerical characteristics.
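
The difference between the stored and displayed values can be reproduced directly (a sketch mirroring the example from this thread):

```r
# The stored p-value vs. what print.htest() displays.
set.seed(1)
x <- rnorm(500, mean = 30, sd = 3)
y <- rnorm(500, mean = 8000, sd = 6)

wt <- wilcox.test(x, y, alternative = "less")
wt$p.value                # full stored value, around 3e-165
format.pval(wt$p.value)   # what print() shows: "< 2.22e-16"
```

So picking up wt$p.value from the returned object is exactly the right way to get the unrounded value.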

HTH,

Marc Schwartz


On Jul 14, 2010, at 10:49 AM, Govind Chandra wrote:

> Hi Peter,
> 
> Thanks for your response. Yes, I am interested in P-values smaller
> than 1e-16. Below a certain value they may not tell much about
> significance but are useful for ordering (ranking), for example,
> differentially expressed genes in microarray data.  Something similar
> is done by sequence similarity searching tools such as BLAST (although
> they use expect values not P-values) to rank hits to a database. To me
> this is practically useful and harmless.
> 
> I am not a statistician but I use statistics and wish to avoid
> misusing it unknowingly or knowingly. Hence the query.
> 
> I would still like to know why there is this difference between
> the P-value printed on the console and that stored in the returned
> object.
> 
> Govind
> 
> 
> 
> On Wed, Jul 14, 2010 at 02:32:39PM +0100, Peter Ehlers wrote:
>> On 2010-07-14 3:53, Govind Chandra wrote:
>>> Hi,
>>> 
>>> I find that the p-value printed out by wilcox.test() and the p-value
>>> stored in the p.value attribute in the object returned by
>>> wilcox.test() are not the same. There seems to be a lower limit of
>>> 2.2e-16 for the printed value although it does say that it is less
>>> than that. What I want to know is the reason for the lower limit in
>>> the printed value of p-value and also whether I am doing the right
>>> thing by picking up the p-value from the p.value attribute of the
>>> returned object. An example R session is pasted below (although the
>>> test is probably not the right one for the kind of data).
>>> 
>>>>   x <- rnorm(500, mean = 30, sd = 3);
>>>>   y <- rnorm(500, mean = 8000, sd = 6);
>>>>   wilcox.test(x, y, alternative = "l");
>>> 
>>> Wilcoxon rank sum test with continuity correction
>>> 
>>> data:  x and y
>>> W = 0, p-value<  2.2e-16
>>> alternative hypothesis: true location shift is less than 0
>>> 
>>>>   wt <- wilcox.test(x, y, alternative = "l");
>>>>   wt$p.value;
>>> [1] 2.928121e-165
>> 
>> Are you really interested in P-values smaller than 10^(-16)?
>> Why? A reported P-value of 3e-165 is certainly not accurate
>> to 165 decimal places and should perhaps be reported as zero,
>> as t.test() does.
>> 
>> As to your example: there is no sense at all in doing a
>> test on such data (other than to satisfy some hypothetical
>> fanatical journal editor).
>> 
>>   -Peter Ehlers
>> 
>> 
>>> 
>>> My version for R is 2.11.1 (2010-05-31) running on x86_64 GNU/Linux
>>> (RHEL).
>>> 
>>> Thanks in advance for any help with this.
>>> 
>>> Govind



[R] Cannot Build R From Source - Windows XP

2010-07-14 Thread Steve Pederson

Hi,

I can't seem to install R from source. I've downloaded the latest 
Rtools211.exe from http://www.murdoch-sutherland.com/Rtools/ & done a 
full installation of that and Inno Setup.


I have set R_HOME as C:\R (and also tried using C:\R\R-2.11.1)

After successfully running 'tar xf R-2.11.1.tar.gz' the modifications I 
have made and saved as MkRules.local are:

BUILD_HTML = YES & ISDIR=C:/Program Files/Inno Setup 5

I've then run 'make all recommended' from R_HOME\src\gnuwin32 and it 
runs nicely for ages, until I get the following message:



building package 'base'
cannot create /tmp/R612: directory nonexistent
mv: cannot stat '/tmp/R612': No such file or directory
make[3]: ***[mkR] Error 1
make[2]: ***[all] Error 2
make[1]: ***[R] Error 1
make: ***[all] Error 2


Sometimes the number changes to /tmp/R5776, or something similar but I 
don't think that's the issue.



My current setting for PATH are:
c:\Rtools\bin;c:\Rtools\perl\bin;c:\Rtools\MinGW\bin;C:\Program Files\PC 
Connectivity Solution\;C:\Program Files\Common 
Files\ArcSoft\Bin;%GTK_BASEPATH%\bin;c:\program 
files\imagemagick-6.4.1-q16;C:\texmf\miktex\bin;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;c:\dev-cpp\bin\;C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727;C:\Program 
Files\Common Files\DivX Shared\;C:\Program Files\QuickTime\QTSystem\



FWIW, It's a 4yo Dell laptop & I've come across a few quirks with 
installing software over the years. I had also previously installed 
2.11.1 from the windows executable, but this was uninstalled using the 
uninstall function that comes with it. I'm trying to rebuild to begin 
incorporating some calls to C I'm working on.


Thanks in advance,

Steve



Re: [R] convert data.frame to matrix -> NA factor problem

2010-07-14 Thread Joshua Wiley
More importantly than just that they are factors, NA is actually a
level of X3.  If it were a factor column but NA was not a level, then
in the conversion to numeric it would not change into a 4; but it is
a level (in fact the 4th level), so it becomes a 4.  From ?factor, here
is the recommended way of converting factors to numeric.  You can see
that this converts to a matrix properly:

samp.frame <- data.frame(a = 1:10,
                         b = factor(c(rep(1:3, each = 3), NA), exclude = NULL))
str(samp.frame)
#Here NA becomes 4
samp.matrix <- data.matrix(samp.frame)
samp.matrix
#Convert the column in samp.frame first
samp.frame$b <- as.numeric(levels(samp.frame$b))[samp.frame$b]
str(samp.frame)
#Now convert to a matrix
samp.matrix <- data.matrix(samp.frame)
samp.matrix

I've never used the xlsx package, but an alternative to this process
would be to save the file from Excel as a text file and then read it
into R.  That way you could control whether things were read in as
factors or not.
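
That route can be sketched as follows (a self-contained example with made-up data standing in for a real export from Excel):

```r
# Read delimited text so the string "NA" becomes a real missing value
# and nothing is auto-converted to a factor.
tmp <- tempfile(fileext = ".txt")
writeLines(c("X1\tX2\tX3",
             "1\t0\t1",
             "0\t2\tNA"), tmp)

d <- read.table(tmp, header = TRUE, sep = "\t",
                na.strings = "NA",        # "NA" strings become true NA
                stringsAsFactors = FALSE) # no factor conversion
str(d)           # X3 is numeric with an NA, not a factor
data.matrix(d)   # converts cleanly; the NA stays NA
```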

Cheers,

Josh
On Wed, Jul 14, 2010 at 7:58 AM, syrvn  wrote:
>
> Hi,
>
> I used str() on my data set:
>
> $ X1            : num  1 1 0 1 1 1 1 1 1 1 ...
> $ X2            : num  0 1 0 2 1 2 0 2 2 0 ...
> $ X3            : Factor w/ 4 levels "0","1","2","NA": 2 1 3 1 1 1 1 1 1 3
> ...
> 
>
>
> The difference to your str() output is that in your case NA columns are
> "num" columns
> and in my case they are Factors. That's prob. why it replaces the NAs with 4
> after
> applying data.matrix.
>
> I use the package xlsx to read the data in as an excel file.
>
> Cheers
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/convert-data-frame-to-matrix-NA-factor-problem-tp2288828p227.html
> Sent from the R help mailing list archive at Nabble.com.
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/



[R] Win Server x64/R: Memory Allocation Problem

2010-07-14 Thread will . eagle
Dear all,

how can I use R on a 64-bit Windows Server 2003 machine (24GB RAM) with more 
than 3GB of working memory and make full use of it.

I started R --max-mem-size=3G since I got the warning that larger values are 
too large and ignored.

In R I got: 
> memory.size(max=FALSE)
[1] 10.5
> memory.size(max=TRUE)
[1] 12.69
> memory.limit()
[1] 3072

but when I run the next command, I get an error:
>climb.expset <- ReadAffy(celfile.path="./Data/Original/CLIMB/CEL/")
Error: cannot allocate vector of size 2.4 Gb

Here is the R version I am using:
platform   i386-pc-mingw32  
arch   i386 
os mingw32  
system i386, mingw32   
version.string R version 2.11.1 (2010-05-31)

What can I do?

Thanks a lot in advance,

Will



Re: [R] Query about wilcox.test() P-value

2010-07-14 Thread Govind Chandra
Hi Peter,

Thanks for your response. Yes, I am interested in P-values smaller
than 1e-16. Below a certain value they may not tell much about
significance but are useful for ordering (ranking), for example,
differentially expressed genes in microarray data.  Something similar
is done by sequence similarity searching tools such as BLAST (although
they use expect values not P-values) to rank hits to a database. To me
this is practically useful and harmless.

I am not a statistician but I use statistics and wish to avoid
misusing it unknowingly or knowingly. Hence the query.

I would still like to know why there is this difference between
the P-value printed on the console and that stored in the returned
object.

Govind



On Wed, Jul 14, 2010 at 02:32:39PM +0100, Peter Ehlers wrote:
> On 2010-07-14 3:53, Govind Chandra wrote:
> > Hi,
> >
> > I find that the p-value printed out by wilcox.test() and the p-value
> > stored in the p.value attribute in the object returned by
> > wilcox.test() are not the same. There seems to be a lower limit of
> > 2.2e-16 for the printed value although it does say that it is less
> > than that. What I want to know is the reason for the lower limit in
> > the printed value of p-value and also whether I am doing the right
> > thing by picking up the p-value from the p.value attribute of the
> > returned object. An example R session is pasted below (although the
> > test is probably not the right one for the kind of data).
> >
> >>   x<- rnorm(500, mean = 30, sd = 3);
> >>   y<- rnorm(500, mean = 8000, sd = 6);
> >>   wilcox.test(x, y, alternative = "l");
> >
> >  Wilcoxon rank sum test with continuity correction
> >
> > data:  x and y
> > W = 0, p-value<  2.2e-16
> > alternative hypothesis: true location shift is less than 0
> >
> >>   wt<- wilcox.test(x, y, alternative = "l");
> >>   wt$p.value;
> > [1] 2.928121e-165
> 
> Are you really interested in P-values smaller than 10^(-16)?
> Why? A reported P-value of 3e-165 is certainly not accurate
> to 165 decimal places and should perhaps be reported as zero,
> as t.test() does.
> 
> As to your example: there is no sense at all in doing a
> test on such data (other than to satisfy some hypothetical
> fanatical journal editor).
> 
>-Peter Ehlers
> 
> 
> >
> > My version for R is 2.11.1 (2010-05-31) running on x86_64 GNU/Linux
> > (RHEL).
> >
> > Thanks in advance for any help with this.
> >
> > Govind
> >
> >



Re: [R] [R-pkgs] New package "list" for analyzing list surveyexperiments

2010-07-14 Thread Jeffrey J. Hallman
Well, as the author of two CRAN packages with short names (tis and
fame), I maintain that short names can be fairly informative. The fame
package is an interface to FAME time series databases, and the tis
package implements the tis (TimeIndexedSeries) class and support classes
that it needs. 

When writing a package, you sometimes have to make reference to its
name.  For example, in .C() calls I use the 'package = "pkgname"'
argument pretty often. And it's nice to have the output from calling
search() look nice.

Jeff

"Raubertas, Richard"  writes:

> I agree that 'list' is a terrible package name, but only secondarily 
> because it is a data type.  The primary problem is that it is so generic
>
> as to be almost totally uninformative about what the package does.  
>
> For some reason package writers seem to prefer maximally uninformative 
> names for their packages.  To take some examples of recently announced 
> packages, can anyone guess what packages 'FDTH', 'rtv', or 'lavaan' 
> do?  Why the aversion to informative names along the lines of
> 'Freq_dist_and_histogram', 'RandomTimeVariables', and 
> 'Latent_Variable_Analysis', respectively? 

-- 
Jeff


