[R] write.csv causes system() and ? to stop working

2008-08-22 Thread J. Scott Olsson
I have a large (100MB, 940k lines) csv file.  If I run

g <- read.csv('combineTfe-dataFrame-TD.dat')
write.csv(g, file="tmp.data");

then ?mean returns without printing anything to the screen, and
system("touch test.dat") does nothing.

If instead I just write out the top of the data, as in

g <- read.csv('combineTfe-dataFrame-TD.dat')
tmp = g[1:100,]
write.csv(tmp, file="tmp.data");

?mean and system("ls") have the expected behaviour.

This is on an AMD64 machine running Ubuntu,  under both R 2.6.2 and
2.7.1 (a fresh install).  The same minimal code works fine on a
different (x86) machine.

Any help will be greatly appreciated.

thanks,
Scott

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] inner margins for lattice

2008-08-22 Thread Deepayan Sarkar
On Fri, Aug 22, 2008 at 11:28 AM, Todd Hatfield <[EMAIL PROTECTED]> wrote:
> I would like to control the inner margins of a lattice graph.  The graph is
> a single superposed panel, but there is too much white space around the data
> for my liking.  I've played around with some of the layout options, but so
> far have found only how to control outer margins.

If you mean the amount by which the data range is extended, try:

lattice.options(axis.padding = list(numeric = 0.01))
xyplot(1:10 ~ 1:10)

-Deepayan



Re: [R] How to handle "~" character after csv importation

2008-08-22 Thread jim holtman
Try 'check.names=FALSE' in the read.table call.
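A minimal sketch of the effect (the file name is from the original post; the "~" header is assumed):

```r
## check.names = FALSE keeps the headers verbatim instead of running
## them through make.names(), which is what turns "~" into "."
mydata <- read.table("mycsv.csv", sep = ",", header = TRUE,
                     na.strings = ".", as.is = TRUE, check.names = FALSE)
grep("~", names(mydata))                # now finds the "~" columns
grep(".", names(mydata), fixed = TRUE)  # literal "."; an unquoted "." is a
                                        # regex wildcard matching any character
```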

On Fri, Aug 22, 2008 at 6:11 PM, Sébastien <[EMAIL PROTECTED]> wrote:
> Dear R users,
>
> I have to import some csv files whose column headers contain the
> character "~". After the following import call, the character seems to
> be replaced by dots in the column names of my data frame. Moreover, I cannot
> query names(mydata) to find the index of a column whose header contains "~"
> or "."
>
>> mydata <- data.frame(read.table("mycsv.csv", sep = ",", header = TRUE,
>> na.strings = ".", as.is = TRUE))
>> mydata
>   P1 P2 P3 P1.P2 P1.P3 P2.P3 A B C
> 1  1  2  3     4     5     6 7 8 9
>> names(mydata)
> [1] "P1"    "P2"    "P3"    "P1.P2" "P1.P3" "P2.P3" "A"     "B"     "C"
>> grep("~",names(mydata))
> integer(0)
>> grep(".",names(mydata))
> [1] 1 2 3 4 5 6 7 8 9
>
> How can I check for this character? And by the way, can anyone explain to
> me the result of the last grep call?
>
> Thank you in advance.
>
> Sebastien
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] dimension values of an array

2008-08-22 Thread jim holtman
rownames(yourMatrix) <- sprintf("%.1f", seq(.1, 7, .1))
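If both dimensions need labels, dimnames() sets them in one step; a minimal sketch (matrix values are dummies):

```r
## 70 rows labelled 0.1, 0.2, ... 7.0; every column labelled "1"
m <- matrix(0, nrow = 70, ncol = 164)
dimnames(m) <- list(sprintf("%.1f", seq(0.1, 7, by = 0.1)),  # row labels
                    rep("1", 164))                           # column labels
```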

On Fri, Aug 22, 2008 at 6:48 PM, Jan Akko Eleveld
<[EMAIL PROTECTED]> wrote:
>
> Question:
>
> I created an array with 164 columns and 70 rows.
>
> I would like every column to have unit value 1
> and the rows 0.1, 0.2, 0.3 up to 7. How can I define that?
>
> Thanks for your help!
>
> Good weekend!
>
> Akko
>
>
>[[alternative HTML version deleted]]
>






Re: [R] Deleting NA in list()

2008-08-22 Thread jim holtman
Here is one way of doing it:

> lt
$a
[1] 1 2 3

$b
[1] NA

$c
[1] "a" "b" "c"

$d
[1] NA

> lt[!unlist(sapply(lt, function(x) any(is.na(x[1]))))]
$a
[1] 1 2 3

$c
[1] "a" "b" "c"

>
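An equivalent one-liner uses base R's Filter() to keep the elements whose first value is not NA; a minimal sketch:

```r
lt <- list(a = 1:3, b = NA, c = letters[1:3], d = NA)
Filter(function(x) !any(is.na(x[1])), lt)  # keeps $a and $c
```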


On Fri, Aug 22, 2008 at 2:41 PM, Dong-hyun Oh <[EMAIL PROTECTED]> wrote:
> Dear useRs,
>
> I would like to know the way of deleting NA in list().
>
> Following is an example.
>
> lt <- list(a = 1:3, b = NA, c = letters[1:3], d = NA)
>
> for(i in length(lt):1) {
>if(is.na(lt[[i]])) lt[[i]] <- NULL
> }
>
> How to simplify for() loop by using (l)apply family?
>
> Thank you in advance.
>
>
>
>
> =
> Dong-hyun Oh
> Center of Excellence for Science and Innovation Studies
> Royal Institute of Technology, Sweden
> e-mail: [EMAIL PROTECTED]
> cel: +46 73 563 45 22
>






Re: [R] problem with rbind

2008-08-22 Thread jim holtman
Try 'check.names=FALSE' in the read.table call.

On Fri, Aug 22, 2008 at 6:02 PM, kayj <[EMAIL PROTECTED]> wrote:
>
>
> I am trying to use rbind to stack the two data sets on top of each other, but I
> am getting an extra X in the column headers and the rows are numbered. How do I
> get rid of this problem? I appreciate your help
>
>
>
> x1<- read.table(file="data1.txt", header=T, sep="\t")
>  x2<-read.table(file="data2.txt", header=T, sep="\t")
>  y<-rbind(x1,x2)
>> y
>X0   X0.0.05  X0.05.0.1  X0.1.0.2  X0.2.0.3  X0.3.0.4  X0.4.0.5
> 1  0.1971 0.108 0.1 0.1 0.156 0.38 0.22
> 2  0.2024 0.106 0.0 0.1 0.15 0.1408173 0.55
>>
>
> --
> View this message in context: 
> http://www.nabble.com/problem-with-rbind-tp19116234p19116234.html
> Sent from the R help mailing list archive at Nabble.com.
>






Re: [R] Simple look up

2008-08-22 Thread PDXRugger

WTF mate, you think I'm about wasting time? I have been seeking a solution for
this for two weeks. Excuse my lack of programming savvy, but this is why I
posted.

bartjoosen wrote:
> 
> Please take a look at the manuals before posting (DO read the posting
> guide!)
> 
> Bart
> PS: yourdataframe[yourdataframe$TAZ==103,2]
> 
> 
> 
> 
> 
> PDXRugger wrote:
>> 
>> So I am hoping this solution is simple, which I believe it is.  I would
>> like to look up a value in one column of a data set and display the
>> corresponding value in the 2nd column.  For example
>> TAZ   VACANT ACRES
>> 100   45
>> 101   62
>> 102   23
>> 103   64
>> 104   101
>> 105   280
>> 
>> So if I want to find TAZ 103 it would give me 64 as the corresponding
>> value.  Hope you guys can help
>> 
>> Cheers
>> 
>> 
> 
> 
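For reference, Bart's indexing one-liner applied to the posted example (the data-frame name here is hypothetical):

```r
taz <- data.frame(TAZ = 100:105,
                  VACANT_ACRES = c(45, 62, 23, 64, 101, 280))
taz[taz$TAZ == 103, 2]  # 64
```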

-- 
View this message in context: 
http://www.nabble.com/Simple-look-up-tp19098777p19113270.html
Sent from the R help mailing list archive at Nabble.com.



[R] problem with rbind

2008-08-22 Thread kayj


I am trying to use rbind to stack the two data sets on top of each other, but I
am getting an extra X in the column headers and the rows are numbered. How do I
get rid of this problem? I appreciate your help



x1<- read.table(file="data1.txt", header=T, sep="\t")
 x2<-read.table(file="data2.txt", header=T, sep="\t")
 y<-rbind(x1,x2)
> y
X0   X0.0.05  X0.05.0.1  X0.1.0.2  X0.2.0.3  X0.3.0.4  X0.4.0.5
1  0.1971 0.108 0.1 0.1 0.156 0.38 0.22
2  0.2024 0.106 0.0 0.1 0.15 0.1408173 0.55
>

-- 
View this message in context: 
http://www.nabble.com/problem-with-rbind-tp19116234p19116234.html
Sent from the R help mailing list archive at Nabble.com.



[R] R

2008-08-22 Thread leidy patricia garzon rodriguez
Hi, my name is Leidy. I am a systems engineering student and I do not speak
English well.

I am working on a project that uses R through a Java interface, and I have a
problem because I need access to the p-values in the output below:



> s<-box.cox.powers(resp,2)

> s

Box-Cox Transformation to Normality



 Est.Power Std.Err. Wald(Power=0) Wald(Power=1)

   -0.0419    0.0274        -1.528      -38.0037



L.R. test, power = 0:  2.3469   df = 1   p = 0.1255

L.R. test, power = 1:  1732.1631   df = 1   p = 0



From this output I need to extract the p-value of each L.R. test, for power = 0
and power = 1 (in this case, 0.1255 and 0), but the instructions I have tried
do not return those values.

[[alternative HTML version deleted]]



[R] Deleting NA in list()

2008-08-22 Thread Dong-hyun Oh

Dear useRs,

I would like to know the way of deleting NA in list().

Following is an example.

lt <- list(a = 1:3, b = NA, c = letters[1:3], d = NA)

for(i in length(lt):1) {
if(is.na(lt[[i]])) lt[[i]] <- NULL
}

How to simplify for() loop by using (l)apply family?

Thank you in advance.




=
Dong-hyun Oh
Center of Excellence for Science and Innovation Studies
Royal Institute of Technology, Sweden
e-mail: [EMAIL PROTECTED]
cel: +46 73 563 45 22



Re: [R] Sending "..." to a C external

2008-08-22 Thread Emmanuel Charpentier
Le vendredi 22 août 2008 à 15:16 -0400, John Tillinghast a écrit :
> I'm trying to figure this out with "Writing R Extensions" but there's not a
> lot of detail on this issue.
> I want to write a (very simple really) C external that will be able to take
> "..." as an argument.
> (It's for optimizing a function that may have several parameters besides the
> ones being optimized.)

!!! That's a hard one. I have never undertaken this kind of job, but I expect
that your "..." argument, if you can reach it from C (which I don't know), will
be bound to a Lisp-like structure, notoriously hard to decode in C. Basically,
you'll have to write very low-level code (and duplicate a good chunk of the R
parser-interpreter...).

I'd rather handle the "..." argument in a wrapper that calls the relevant
C function with all arguments interpreted and bound... This wrapper would
probably be an order of magnitude slower than C code, but two orders of
magnitude easier to write (and maintain!). Since the "..." argument parsing
would be done *once* before the "grunt work" is handed to the C code, the
slowdown would (probably) be negligible...
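A sketch of that wrapper idea, assuming a hypothetical compiled routine "my_c_routine" callable through the .C interface:

```r
## Evaluate "..." once in R, flatten it to a plain numeric vector, and
## pass only atomic data to C -- no SEXP/pairlist decoding on the C side.
objective <- function(par, ...) {
  extras <- as.double(unlist(list(...)))  # "..." interpreted and bound here
  .C("my_c_routine",                      # hypothetical C entry point
     par    = as.double(par),
     extras = extras,
     n      = as.integer(length(extras)),
     value  = double(1))$value
}
```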

> I got the "showArgs" code (from R-exts) to compile and install, but it only
> gives me error messages when I run it. I think I'm supposed to pass it
> different arguments from what I'm doing, but I have no idea which ones.
> 
> What exactly are CAR, CDR, and CADR anyway? Why did the R development team
> choose this very un-C-like set of commands? 

'Cause they are fundamental to the Lisp-like language that is S/R. Read on...

> They are not explained much in
> R-exts.

At the risk of incurring the R-help deities' wrath:

"In the beginning, John McCarthy created Lisp 1.5, which begat MACLISP, which
begat ..., which begat Scheme ...". Notwithstanding a long inheritance story,
full of sound, fury, holy wars and bastardry (sometimes evoking European
royalty lines...), all Lisp-like languages bear one mark (dominant allele) of
their common ancestor: the list structure. This structure is implemented as a
pair of pointers, the first pointing to the first element of the list, the
second pointing to ... the list of the rest of the members (or nothing, aka NIL).

In McCarthy's implementation, which was machine code on an IBM 704 (we were in
1957, mind you...), such a pair was split across different parts of a CPU word,
known in the technical documentation of this dinosaur as the "address register"
and "decrement register" respectively. Hence the mnemonic names "Contents of
the Address Register" and "Contents of the Decrement Register" (aka "CAR" and
"CDR") for these two pointers, names which were also used for the (Lisp)
functions that access them.

Those names have stuck mainly for historical (sentimental? hysterical?)
reasons. Even when "reasonable" synonyms were introduced (e.g. "first" and
"rest" in Common Lisp), all the old hands (and young dummies who wanted to
emulate them) kept "car" and "cdr" close to their hearts.

So, fifty-one years after McCarthy's stroke of genius, this piece of CS
snobbery is still with us (and probably will be until 64-bit Linux time
counters roll over...).

Despite their C-like syntax, their huge collection of array and array-like
structures, and their loops, S and R are fundamentally Lisp-like languages.
Most notably, the representation of executable code is accessible to the code
itself: one can create a function which computes another function. Lisp was the
first language explicitly created for this purpose, and it is no happenstance
that many (most?) such languages build on the fundamentals of Lisp (or one of
its almost uncountable dialects). Many of R/S's "common ways of doing things"
have a Lisp smell: for example, it is no accident that {s|t|l}apply(), outer()
and suchlike are *way* more efficient than for(), and that they are strongly
reminiscent of Lisp's mapcar...

BTW, this ability to "compute the language" is probably the most fundamental
point distinguishing S/R from all the rest of the "statistical packages" (SPSS,
Stata, SAS and the rest of the crowd...).

Now I have to say that I'm not old enough to have first-hand knowledge of this
history. I was born, but not weaned, when McCarthy unleashed Lisp on an
unsuspecting world... I learned it ca. 1978-9, while discovering VLisp.

While I can't really help you (I still think that processing "..." at C level 
is either
hubris of the worst sort, pure folly or a desperate case), I hope to have 
entertained you.

Sincerely,

Emmanuel Charpentier



Re: [R] Visualization of two-mode matrix/ networks/ graphs

2008-08-22 Thread Brendan Casey

Jakob,

Could you please tell me how you got your 2-mode data INTO R/statnet/sna,
etc... ?

I can do many things with this data in ucinet, and even export it to pajek,
but I cannot get it to import even using the read.paj routine.  I have about
620 actors involved in 230 events.  When I convert it to an actor only
network, I also cannot get rid of the arrows!  I am trying to import from a
Pajek .net file

Success so far with drawing the two modes differently in UCInet, but I want
to do more statnet work on these networks, and also use the event timestamps
to show dynamic features of the network down the line.  Anyone?

-Brendan Casey



Jakob Mumm wrote:
> 
> Hey all,
> I'm looking for a plotting method for two-mode networks. Having a  n x m
> matrix where n is the first set of entities/ actors and m representing the
> second set, I want to colour both sets of actors differently. I tried
> "plot.network" from the network-package, but I did not succeed. Even the
> proposed solution from Gabor Grothendiek
> (http://finzi.psych.upenn.edu/R/Rhelp02a/archive/87003.html ) seems not
> well
> suited for the problem. Further, I'm wondering how to interpret
> "vertex.col :
> color for vertices; may be given as a vector or a vertex attribute name,
> if
> vertices are to be of different colors." . For the latter solution via
> attribute name, the documentation gives the following example:
> "vertex.col=2+(network.vertex.names(nflo)=="Medici")", which is not a
> problem, but how does one use the former solution, via a vector?
>  The command "gplot" from the sna package has a way of handling a two-mode
> matrix via gmode="twomode" (see help for gplot), but then one has to work
> with graphs and will lose (so far as I have understood it) information which
> is stored in the network class.
> For example, take the following adjacency matrix, where 1-6 (rows) is the
> first set of entities and 7-9 (cols) the second:
>   7 8 9
> 1 0 0 0
> 2 1 0 0
> 3 1 0 0
> 4 0 0 0
> 5 0 0 0
> 6 0 0 0
> Thanks in advance,
> Jakob 
> 
>   [[alternative HTML version deleted]]
> 

-- 
View this message in context: 
http://www.nabble.com/Visualization-of-two-mode-matrix--networks--graphs-tp12959659p19114539.html
Sent from the R help mailing list archive at Nabble.com.



[R] nlme gls front end crash

2008-08-22 Thread Daniel Malter
Hi, when I try to run GLS, R crashes. I have seen on the mailing list that
somebody has encountered the problem before (
http://tolstoy.newcastle.edu.au/R/help/05/01/10915.html ), but there seemed
to be no solution / the thread appears incomplete. The program runs gls1
(without correlation structure; code below) and crashes when I run gls2
(with correlation structure; code further below). Windows then shows a
standard crash window (the one where Windows asks to send an error report to
Microsoft) saying something like: "R GUI has encountered a problem and has
to close. We are sorry for the inconvenience..."  

The crash seems to depend on the specific correlation structure. It does not
crash with all correlation structures. My data has about 2000 data rows. Is
this a memory problem?

Thanks for any advice.
Daniel


The R code I am running is the following

newdata.new=groupedData(cash.end~rs|subj,data=data.new)
cs1 <- corARMA(value=c(0.9),p=1,form=~rs|subj)
cs1 <- Initialize(cs1, data = newdata.new)
corMatrix(cs1)
 
gls1=gls(log(cash.end+1)~factor(rs)+nogs+nols+ret+factor(poc)+posh+posl+negh
+negl,data=newdata.new,na.action=na.omit)

 
gls2=gls(log(cash.end+1)~factor(rs)+nogs+nols+ret+factor(poc)+posh+posl+negh
+negl,data=newdata.new,na.action=na.omit,corr=cs1)

And the session info is: 

R version 2.6.2 (2008-02-08) 
i386-pc-mingw32 

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 

other attached packages:
[1] nlme_3.1-87

loaded via a namespace (and not attached):
[1] grid_2.6.2 lattice_0.17-4


-
cuncta stricte discussurus



[R] dimension values of an array

2008-08-22 Thread Jan Akko Eleveld

Question: 
 
I created an array with 164 columns and 70 rows. 
 
I would like every column to have unit value 1 
and the rows 0.1, 0.2, 0.3 up to 7. How can I define that?
 
Thanks for your help!
 
Good weekend!
 
Akko


[[alternative HTML version deleted]]



[R] How to handle "~" character after csv importation

2008-08-22 Thread Sébastien

Dear R users,

I have to import some csv files whose column headers contain the
character "~". After the following import call, the character seems
to be replaced by dots in the column names of my data frame. Moreover, I
cannot query names(mydata) to find the index of a column whose header
contains "~" or "."


> mydata <- data.frame(read.table("mycsv.csv", sep = ",", header = 
TRUE, na.strings = ".", as.is = TRUE))

> mydata
 P1 P2 P3 P1.P2 P1.P3 P2.P3 A B C
1  1  2  3 4 5 6 7 8 9
> names(mydata)
[1] "P1"    "P2"    "P3"    "P1.P2" "P1.P3" "P2.P3" "A"     "B"     "C"
> grep("~",names(mydata))

integer(0)
> grep(".",names(mydata))
[1] 1 2 3 4 5 6 7 8 9

How can I check for this character? And by the way, can anyone explain
to me the result of the last grep call?


Thank you in advance.

Sebastien



[R] Using interactive plots to get information about data points

2008-08-22 Thread jcarmichael

I have been experimenting with interactive packages such as iplots and playwith.
Consider the following sample dataset:

A  B  C  D
1  5  5  9
3  2  8  4
1  7  3  0
7  2  2  6

Let's say I make a plot of variable A.  I would like to be able to click on
a data point (e.g. 3) and have a pop-up window tell me the corresponding
value for variable D (e.g. 4).  I am also trying to produce multiple small
plots.  For example, four side-by-side boxplots for each of the four
variables A, B, C, D.

Any help would be greatly appreciated!  Thanks!
-- 
View this message in context: 
http://www.nabble.com/Using-interactive-plots-to-get-information-about-data-points-tp19116114p19116114.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Newbie programming help

2008-08-22 Thread Chuck Cleland
On 8/22/2008 5:34 PM, Ranney, Steven wrote:
> All - 
> 
> Not sure if this is a real programming question, but here goes:
> 
> I have data that looks like
> 
> Lake  Length  Weight
> 1 158 45
> 1 179 70
> 1 200 125
> 1 202 150
> 1 206 145
> 1 209 165
> 1 210 140
> 1 215 175
> 1 216 152
> 1 220 150
> 1 221 165
> ...
> 
> where lake goes from 1 - 84 and the number of rows for each lake is variable 
> (but > ~20).  
> I'm trying to do two things: 1) build a simple linear model of the form 
> 
> {lm(log10(Weight)~log10(Length))}
> 
> for every separate lake in the data set; 2) I'd like to save the intercepts 
> and slopes 
> from each of these linear regressions into a separate data frame.  Any ideas? 
>  I think it would 
> probably require some kind of 'for' statement, but I'm just not that smart.

  Assuming the data are in a data frame called "mydf":

library(nlme)

fm1 <- lmList(log10(Weight)~log10(Length) | Lake, mydf)

coef(fm1)

?lmList

or

t(sapply(split(mydf, mydf$Lake),
function(x){coef(lm(log10(Weight)~log10(Length), data=x))}))

> Thanks for your help, 
> 
> SR  
> 
> Steven H. Ranney
> Graduate Research Assistant (Ph.D)
> USGS Montana Cooperative Fishery Research Unit
> Montana State University
> PO Box 173460
> Bozeman, MT 59717-3460
> 
> phone: (406) 994-6643
> fax:   (406) 994-7479
> 
> 
>   [[alternative HTML version deleted]]
> 


-- 
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894



Re: [R] help needed for HWE.exact in library "genetics"

2008-08-22 Thread David Duffy

"Sun, Ying" asked:


I have a genotype data for both case and controls and would
like to calculate the HW p-value.  However, since the
number of one genotype is 0, I got a weird result.  Would
someone help me to figure it out?  Or confirm it's right?
Thanks a lot.

...

HWE.exact(g1)
Exact Test for Hardy-Weinberg Equilibrium
data:  g1
N11 = 71, N12 = 9, N22 = 0, N1 = 151, N2 = 9,
p-value = 1


Yes, that is correct.  Double check it by calculating the
goodness-of-fit chi-square for the same hypothesis:

p <- 151/160; q <- 1-p; 80*c(p^2,2*p*q,q^2)
[1] 71.253125  8.493750  0.253125
...

David Duffy.

--
| David Duffy (MBBS PhD) ,-_|\
| email: [EMAIL PROTECTED]  ph: INT+61+7+3362-0217 fax: -0101  / *
| Epidemiology Unit, Queensland Institute of Medical Research   \_,-._/
| 300 Herston Rd, Brisbane, Queensland 4029, Australia  GPG 4D0B994A v



[R] Newbie programming help

2008-08-22 Thread Ranney, Steven
All - 

Not sure if this is a real programming question, but here goes:

I have data that looks like

Lake    Length  Weight
1   158 45
1   179 70
1   200 125
1   202 150
1   206 145
1   209 165
1   210 140
1   215 175
1   216 152
1   220 150
1   221 165
...

where lake goes from 1 - 84 and the number of rows for each lake is variable 
(but > ~20).  
I'm trying to do two things: 1) build a simple linear model of the form 

{lm(log10(Weight)~log10(Length))}

for every separate lake in the data set; 2) I'd like to save the intercepts and 
slopes 
from each of these linear regressions into a separate data frame.  Any ideas?  
I think it would 
probably require some kind of 'for' statement, but I'm just not that smart.

Thanks for your help, 

SR  

Steven H. Ranney
Graduate Research Assistant (Ph.D)
USGS Montana Cooperative Fishery Research Unit
Montana State University
PO Box 173460
Bozeman, MT 59717-3460

phone: (406) 994-6643
fax:   (406) 994-7479


[[alternative HTML version deleted]]



Re: [R] Building colorspace on RHEL5

2008-08-22 Thread Marc Schwartz
Hi Hadley,

on 08/22/2008 03:52 PM hadley wickham wrote:
> Dear all,
> 
> I'm having problems installing the colorspace package on Red Hat
> Enterprise Linux 5:
> 
> * Installing *source* package 'colorspace' ...
> ** libs
> gcc -std=gnu99 -I/usr/lib/R/include -I/usr/lib/R/include
> -I/usr/local/include -fpic  -O2 -g -pipe -Wall
> -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector
> --param=ssp-buffer-size=4 -m32 -march=i386 -mtune=generic
> -fasynchronous-unwind-tables -c colorspace.c -o colorspace.o
> In file included from /usr/include/features.h:352,
>  from /usr/include/stdlib.h:25,
>  from /usr/lib/R/include/R.h:28,
>  from colorspace.c:1:
> /usr/include/gnu/stubs.h:7:27: error: gnu/stubs-32.h: No such file or 
> directory

It looks like the above (gnu/stubs-32.h) is the key missing file with
the required headers preventing compilation.

> colorspace.c: In function 'CheckHex':
> colorspace.c:411: warning: implicit declaration of function 'isxdigit'
> colorspace.c: In function 'RGB_to_RColor':
> colorspace.c:1024: warning: unused variable 'Zn'
> colorspace.c:1024: warning: unused variable 'Yn'
> colorspace.c:1024: warning: unused variable 'Xn'
> colorspace.c: In function 'hex_to_RGB':
> colorspace.c:1115: warning: passing argument 1 of 'decodeHexStr'
> discards qualifiers from pointer target type
> colorspace.c: In function 'polarLAB_to_LAB':
> colorspace.c:218: warning: control reaches end of non-void function
> colorspace.c: In function 'XYZ_to_LUV':
> colorspace.c:314: warning: control reaches end of non-void function
> make: *** [colorspace.o] Error 1
> 
> I'm running 2.6.1 because lab computers can only have official rpms
> and uname -a gives:
> 
> Linux rl102-07.lab.rice.edu 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:15
> EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
> 
> Any ideas as to why the compile is failing?

At least on F9:

$ locate stubs-32.h
/usr/include/gnu/stubs-32.h

$ rpm -q --whatprovides /usr/include/gnu/stubs-32.h
glibc-devel-2.8-8.i386


So:

  # yum install glibc-devel

should take care of it, presuming that you are not missing anything else...

HTH,

Marc Schwartz



[R] Building colorspace on RHEL5

2008-08-22 Thread hadley wickham
Dear all,

I'm having problems installing the colorspace package on Red Hat
Enterprise Linux 5:

* Installing *source* package 'colorspace' ...
** libs
gcc -std=gnu99 -I/usr/lib/R/include -I/usr/lib/R/include
-I/usr/local/include -fpic  -O2 -g -pipe -Wall
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector
--param=ssp-buffer-size=4 -m32 -march=i386 -mtune=generic
-fasynchronous-unwind-tables -c colorspace.c -o colorspace.o
In file included from /usr/include/features.h:352,
 from /usr/include/stdlib.h:25,
 from /usr/lib/R/include/R.h:28,
 from colorspace.c:1:
/usr/include/gnu/stubs.h:7:27: error: gnu/stubs-32.h: No such file or directory
colorspace.c: In function 'CheckHex':
colorspace.c:411: warning: implicit declaration of function 'isxdigit'
colorspace.c: In function 'RGB_to_RColor':
colorspace.c:1024: warning: unused variable 'Zn'
colorspace.c:1024: warning: unused variable 'Yn'
colorspace.c:1024: warning: unused variable 'Xn'
colorspace.c: In function 'hex_to_RGB':
colorspace.c:1115: warning: passing argument 1 of 'decodeHexStr'
discards qualifiers from pointer target type
colorspace.c: In function 'polarLAB_to_LAB':
colorspace.c:218: warning: control reaches end of non-void function
colorspace.c: In function 'XYZ_to_LUV':
colorspace.c:314: warning: control reaches end of non-void function
make: *** [colorspace.o] Error 1

I'm running 2.6.1 because lab computers can only have official rpms
and uname -a gives:

Linux rl102-07.lab.rice.edu 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:15
EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

Any ideas as to why the compile is failing?

Thanks,

Hadley

-- 
http://had.co.nz/



Re: [R] Writing Rcmdr Plugins

2008-08-22 Thread Richard M. Heiberger
Bernhard and Irina, 

While tailoring the Rcmdr menu directly does work, I strongly recommend
against it.  It will lead to mass confusion, most likely for the author,
several months from now when it is installed on a different machine or when
R or Rcmdr is updated.

It is much cleaner to have a separate package.  It is also much easier to
write and maintain a separate package than it is to edit the original
Rcmdr-menus.txt.  While the startup cost to the author of writing an
RcmdrPlugin.xxx is about the same as the cost of modifying Rcmdr itself,
the maintenance is fantastically simpler for a Plugin package.

It is clear to the user of a Plugin package that your changes are not the
original.  When you modify the original Rcmdr, users will not be told
explicitly that they are using a modified version.  That will lead to
confusion on this list.

I distribute a Plugin (RcmdrPlugin.HH) on CRAN.  I originally distributed
it as a modification of the Rcmdr.  John Fox invented plugins partially as
a result of my experience.  It is much better for both of us to have a
clear separation of his responsibilities as designer of the Rcmdr, and
mine as designer of several specific functions.

Rich

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
Behalf Of Pfaff, Bernhard Dr.
Sent: Thursday, August 21, 2008 06:01
To: G. Jay Kerns; Irina Ursachi
Cc: r-help@r-project.org
Subject: Re: [R] Writing Rcmdr Plugins

Dear Irina,

though you asked explicitly for writing a RCommander-plugin package; I
just wanted to add that the former approach of tailor-making menues in
the Commander still works. That is, just copy your R file with the
tcl/tk functions into the /etc directory of the RCommander and include
your menu structure into the file "Rcmdr-menus.txt"

Best,
Bernhard



Re: [R] History pruning

2008-08-22 Thread Richard M. Heiberger
We have different starting points.  Please be sure that your modularity
allows a cleaned region as well as a history log to be the input to your
next
step.  The history log is incomplete; lines sent to the *R* buffer by C-c
C-n are
explicitly excluded from history.  Lines picked up from a saved transcript
file aren't in history.  Will your program handle this correctly:

aa <- bb +
cc

?  It is valid code.  Suppressing the line with cc will make it not valid
code.


Rich



Re: [R] sd(NA)

2008-08-22 Thread hadley wickham
Here's one approach:

try_default <- function(expr, default = NA, quiet = FALSE) {
  result <- default
  if (quiet) {
tryCatch(result <- expr, error = function(e) {})
  } else {
try(result <- expr)
  }
  result
}

failwith <- function(default = NULL, f, quiet = FALSE) {
  function(...) try_default(f(...), default, quiet = quiet)
}


sd2 <- failwith(NA, sd)
sd2(NA, na.rm=T)

sd3 <- failwith(NA, sd, quiet = T)
sd3(NA, na.rm=T)
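For the grid use case from the original post, a hedged sketch (the matrix layout is an assumption: rows as grid points, columns as time steps):

```r
## Wrap sd() so an all-NA series yields NA instead of an error (2.7.1 behaviour).
safe.sd <- function(x) if (all(is.na(x))) NA else sd(x, na.rm = TRUE)

grid <- rbind(c(1, 2, 3), c(NA, NA, NA))  # second grid point is all NA
apply(grid, 1, safe.sd)                   # [1]  1 NA
```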

Hadley

-- 
http://had.co.nz/



[R] sd(NA)

2008-08-22 Thread Roy Mendelssohn

Hi All:

This was discussed in the R-developers list (see the thread starting at
http://tolstoy.newcastle.edu.au/R/e3/devel/07/12/0560.html).  It has to do
with the behavior of sd() when the entire vector is NA.  The behavior has
changed between 2.6 and 2.7.1 as follows:


Run in Version 2.7.1
> tt <- rep(NA, 10)
> mean(tt, na.rm=T)
[1] NaN
> sd(tt, na.rm=T)
Error in var(x, na.rm=na.rm) : no complete element pairs

Run in Version 2.6.1
> tt <- rep(NA, 10)
> mean(tt, na.rm=T)
[1] NaN
> sd(tt, na.rm=T)
[1] NA

If I understand the discussion, 2.7.1 fails because 'na.rm=TRUE' removes
the NA's so that the vector is now empty.  What I have been unable to find
in any of the mailing lists is how you are supposed to program this in
2.7.1 when you are looping through a large number of
"variables"  (we have  time series on a grid and are calculating the
mean and sd of the time series at each grid point).


Any suggestions much appreciated.

-Roy M.

**
"The contents of this message do not reflect any position of the U.S.  
Government or NOAA."

**
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
1352 Lighthouse Avenue
Pacific Grove, CA 93950-2097

e-mail: [EMAIL PROTECTED] (Note new e-mail address)
voice: (831)-648-9029
fax: (831)-648-8440
www: http://www.pfeg.noaa.gov/

"Old age and treachery will overcome youth and skill."
"From those who have been given much, much will be expected"



Re: [R] lme questions re: repeated measures & covariance structure

2008-08-22 Thread Mark Na
Great, thanks for your reply!

I've just tracked down a copy of Pinheiro & Bates (2000) so I'll look at
that, too.

Thanks again, Mark


On Fri, Aug 22, 2008 at 9:56 AM, Christoph Scherber <
[EMAIL PROTECTED]> wrote:

> Dear Mark,
>
> I would include the repeated measure as the smallest stratum in the random
> effects specification:
>
> random=~1|sampleunit/year
>
> Setting up user-defined variance structures should be possible using for
> example:
>
> weights=varPower(form=~habitat)
>
> or also try out the available corStruct() classes (found in Pinheiro and
> Bates 2000)
>
> HTH
> Christoph
>
>
>
>
> Mark Na schrieb:
>
>> Hello,
>>
>> We are attempting to use nlme to fit a linear mixed model to explain bird
>> abundance as a function of habitat:
>>
>>
>> lme(abundance~habitat-1,data=data,method="ML",random=~1|sampleunit)
>>
>> The data consist of repeated counts of birds in sample units across
>> multiple
>> years, and we have two questions:
>>
>> 1) Is it necessary (and, if so, how) to specify the repeated measure
>> (years)? As written, the above code does not.
>>
>> 2) How can we specify a Toeplitz heterogeneous covariance structure for
>> this
>> model? We have searched the help file for lme, and the R-help archives,
>> but
>> cannot find any pertinent information. If that's not possible, can we
>> adapt
>> an existing covariance structure, and if so how?
>>
>> Thanks, Mark
>>
>>[[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>> .
>>
>>
> --
> Dr. rer.nat. Christoph Scherber
> University of Goettingen
> DNPW, Agroecology
> Waldweg 26
> D-37073 Goettingen
> Germany
>
> phone +49 (0)551 39 8807
> fax   +49 (0)551 39 8806
>
> Homepage http://www.gwdg.de/~cscherb1 
>
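Putting the suggestions above together, a hedged, untested sketch (variable and data names follow the original post; varIdent and corAR1 are illustrations chosen here, not a Toeplitz structure):

```r
library(nlme)

## Repeated measure (year) as the smallest stratum, plus one example
## variance function and correlation structure; both are assumptions.
fit <- lme(abundance ~ habitat - 1, data = data, method = "ML",
           random = ~ 1 | sampleunit/year,
           weights = varIdent(form = ~ 1 | habitat),
           correlation = corAR1(form = ~ 1 | sampleunit/year))
```

A heterogeneous Toeplitz covariance is not among the corStruct classes shipped with nlme; writing a user-defined corStruct class (Pinheiro & Bates 2000, ch. 5) would be needed for that.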

[[alternative HTML version deleted]]



[R] Sending "..." to a C external

2008-08-22 Thread John Tillinghast
I'm trying to figure this out with "Writing R Extensions" but there's not a
lot of detail on this issue.
I want to write a (very simple really) C external that will be able to take
"..." as an argument.
(It's for optimizing a function that may have several parameters besides the
ones being optimized.)

I got the "showArgs" code (from R-exts) to compile and install, but it only
gives me error messages when I run it. I think I'm supposed to pass it
different arguments from what I'm doing, but I have no idea which ones.

What exactly are CAR, CDR, and CADR anyway? Why did the R development team
choose this very un-C-like set of commands? They are not explained much in
R-exts.

Thanks in advance for any info.

John

[[alternative HTML version deleted]]



Re: [R] R and GNUPLOT

2008-08-22 Thread Greg Snow
There are some interface commands in the TeachingDemos package (see ?gp.open) 
for communicating between R and Gnuplot.  These commands are pretty basic, but 
looking at the code may give you some ideas of what to do next in your project 
(or maybe what not to do).
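A hedged sketch of driving gnuplot without shell.exec(), by writing commands to a pipe (the executable path and the .plt file name are assumptions; note that the console binary gnuplot.exe reads stdin, while the wgnuplot.exe GUI does not):

```r
## Open the console gnuplot as a write pipe and send commands to it.
gp <- pipe('"C:/gnuplot/bin/gnuplot.exe" -persist', open = "w")
writeLines(c("load 'myplot.plt'", "replot"), con = gp)
close(gp)  # flushes the commands and closes gnuplot's stdin
```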

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111



> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Fristachi, Anthony
> Sent: Friday, August 22, 2008 12:43 PM
> To: r-help@R-project.org
> Subject: [R] R and GNUPLOT
>
> Hello,
>
>  I am trying to send commands to GNUPLOT to load a .plt file
> that was generated with a C++ software module called with R,
> and to replot.
>
>  I am able to open the program with shell.exec (below), but I
> am at a loss as to what to do next.  Any suggestions?
>
> shell.exec("C:\\gnuplot\\bin\\wgnuplot.exe")
>
> Thanks
>
> >>>---
> Tony Fristachi
> Battelle, Statistics & Information Analysis
> 505 King Avenue, Rm 11-7-056
> Columbus, OH 43201
> Email: [EMAIL PROTECTED]
> Phone: 614/424.4910   Fax: 614/458.4910
> Cell: 513/375.3263
> Web:
> http://www.battelle.org/solutions/default.aspx?Nav_Area=Tech&Nav_SectionID=3
>
> "We are what we repeatedly do. Excellence then, is not an
> act, but a habit."
> Aristotle
>
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



[R] R and GNUPLOT

2008-08-22 Thread Fristachi, Anthony
Hello,

 I am trying to send commands to GNUPLOT to load a .plt file that was generated 
with a C++ software module called with R, and to replot.

 I am able to open the program with shell.exec (below), but I am at a loss
as to what to do next.  Any suggestions?

shell.exec("C:\\gnuplot\\bin\\wgnuplot.exe")

Thanks

>>>---
Tony Fristachi
Battelle, Statistics & Information Analysis
505 King Avenue, Rm 11-7-056
Columbus, OH 43201
Email: [EMAIL PROTECTED]
Phone: 614/424.4910   Fax: 614/458.4910
Cell: 513/375.3263
Web: 
http://www.battelle.org/solutions/default.aspx?Nav_Area=Tech&Nav_SectionID=3

"We are what we repeatedly do. Excellence then, is not an act, but a habit."
Aristotle


[[alternative HTML version deleted]]



[R] inner margins for lattice

2008-08-22 Thread Todd Hatfield
I would like to control the inner margins of a lattice graph.  The graph is
a single superposed panel, but there is too much white space around the data
for my liking.  I've played around with some of the layout options, but so
far have found only how to control outer margins.

R version 2.7.1 (2008-06-23)
MS Vista OS

Thanks for any help.

Todd Hatfield



Re: [R] Test of Homogeneity of Variances

2008-08-22 Thread Liaw, Andy
You don't need to test whether the _sample_ variances are different:  They
already are.  Statistical tests of hypotheses are not about sample
statistics, but about distributional characteristics.

It seems to me that reinforcement of some basic stat concept may do you
quite a bit of good.  If you don't have the basics, you're just going to
get more and more puzzled every step of the way.  Just a frank
suggestion.

Best,
Andy 

From: Daren Tan
> 
> I am testing whether the sample variances are equal. When 
> p-value < 0.05 (alpha), should accept null hypothesis (sample 
> variances are equal) or reject it ?
> 
> 
> The two new examples with each having same sample variances 
> also puzzle me. Why are the p-values different ?
> 
> bartlett.test(rep(rnorm(5),times=4), rep(1:5, each=4))
> 
> 
> Bartlett test of homogeneity of variances
> 
> 
> data:  rep(rnorm(5), times = 4) and rep(1:5, each = 4)
> Bartlett's K-squared = 0.8681, df = 4, p-value = 0.929
> 
> 
> bartlett.test(rep(rnorm(5),times=4), rep(1:5, each=4))
> 
> 
> Bartlett test of homogeneity of variances
> 
> 
> data:  rep(rnorm(5), times = 4) and rep(1:5, each = 4)
> Bartlett's K-squared = 3.5599, df = 4, p-value = 0.4688
> 
> 
> > From: [EMAIL PROTECTED]
> > To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> > Date: Fri, 22 Aug 2008 11:25:36 -0400
> > Subject: RE: [R] Test of Homogeneity of Variances
> >
> > What are your hypotheses? Once you state what they are, 
> interpretation should be straightforward.
> >
> >
> >
> > -Original Message-
> > From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Daren Tan
> > Sent: Friday, August 22, 2008 11:18 AM
> > To: [EMAIL PROTECTED]
> > Subject: [R] Test of Homogeneity of Variances
> >
> >
> > I am testing the homogeneity of variances via bartlett.test 
> and fligner.test. Using the following example, how should I 
> interpret the p-value in order to accept or reject the null 
> hypothesis ?
> >
> > set.seed(5)
> > x <- rnorm(20)
> > bartlett.test(x, rep(1:5, each=4))
> >
> >
> > Bartlett test of homogeneity of variances
> >
> >
> > data: x and rep(1:5, each = 4)
> > Bartlett's K-squared = 1.7709, df = 4, p-value = 0.7778
> >
> > fligner.test(x, rep(1:5, each=4))
> >
> > Fligner-Killeen test of homogeneity of variances
> >
> >
> > data: x and rep(1:5, each = 4)
> > Fligner-Killeen:med chi-squared = 1.0819, df = 4, p-value = 0.8971
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> > This email message, including any attachments, is for 
> ...{{dropped:6}}
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 



Re: [R] Test of Homogeneity of Variances

2008-08-22 Thread Mark Difford



Have you read the documentation to either of the functions you are using?

?bartlett.test

"Performs Bartlett's test of the null that the variances in each of the
groups (samples) are the same."

This explicitly tells you what is being tested, i.e. the null tested is that
var1 = var2.

?rnorm

Generates random deviates, so the answer [i.e. the p-value] will (almost
certainly) differ each time.  Only by setting a seed for the random number
generator will rnorm() generate the same set of numbers, and therefore the
same p-value.

?set.seed

##
set.seed(7)
bartlett.test(rep(rnorm(5),times=4), rep(1:5, each=4))
set.seed(7)
bartlett.test(rep(rnorm(5),times=4), rep(1:5, each=4))
set.seed(23)
bartlett.test(rep(rnorm(5),times=4), rep(1:5, each=4))
bartlett.test(rep(rnorm(5),times=4), rep(1:5, each=4))

HTH, Mark.


Daren Tan wrote:
> 
> 
> I am testing whether the sample variances are equal. When p-value < 0.05
> (alpha), should accept null hypothesis (sample variances are equal) or
> reject it ?
> 
> 
> The two new examples with each having same sample variances also puzzle
> me. Why are the p-values different ?
> 
> bartlett.test(rep(rnorm(5),times=4), rep(1:5, each=4))
> 
> 
> Bartlett test of homogeneity of variances
> 
> 
> data:  rep(rnorm(5), times = 4) and rep(1:5, each = 4)
> Bartlett's K-squared = 0.8681, df = 4, p-value = 0.929
> 
> 
> bartlett.test(rep(rnorm(5),times=4), rep(1:5, each=4))
> 
> 
> Bartlett test of homogeneity of variances
> 
> 
> data:  rep(rnorm(5), times = 4) and rep(1:5, each = 4)
> Bartlett's K-squared = 3.5599, df = 4, p-value = 0.4688
> 
> 
>> From: [EMAIL PROTECTED]
>> To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
>> Date: Fri, 22 Aug 2008 11:25:36 -0400
>> Subject: RE: [R] Test of Homogeneity of Variances
>>
>> What are your hypotheses? Once you state what they are, interpretation
>> should be straightforward.
>>
>>
>>
>> -Original Message-
>> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
>> On Behalf Of Daren Tan
>> Sent: Friday, August 22, 2008 11:18 AM
>> To: [EMAIL PROTECTED]
>> Subject: [R] Test of Homogeneity of Variances
>>
>>
>> I am testing the homogeneity of variances via bartlett.test and
>> fligner.test. Using the following example, how should I interpret the
>> p-value in order to accept or reject the null hypothesis ?
>>
>> set.seed(5)
>> x <- rnorm(20)
>> bartlett.test(x, rep(1:5, each=4))
>>
>>
>> Bartlett test of homogeneity of variances
>>
>>
>> data: x and rep(1:5, each = 4)
>> Bartlett's K-squared = 1.7709, df = 4, p-value = 0.7778
>>
>> fligner.test(x, rep(1:5, each=4))
>>
>> Fligner-Killeen test of homogeneity of variances
>>
>>
>> data: x and rep(1:5, each = 4)
>> Fligner-Killeen:med chi-squared = 1.0819, df = 4, p-value = 0.8971
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>> This email message, including any attachments, is for ...{{dropped:6}}
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Test-of-Homogeneity-of-Variances-tp19109383p19112613.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] How to read malformed csv files with read.table?

2008-08-22 Thread Martin Ballaschk


Hi Brian,


Am 22.08.2008 um 19:22 schrieb Prof Brian Ripley:

On Fri, 22 Aug 2008, Martin Ballaschk wrote:
My solution is reading the files without the headers (skip = 1) and
separately reading the headers with scan: scan("myfile.CSV", what =
"character", sep = "\t", nlines = 1).  After throwing out the first
two columns it should be possible to assign the scanned colnames to
the data.frame colnames.


Yes, but if you read the header first you can set the col.names via  
the arg to read.table().


Thanks! I plan to do it like that (actually it will be stuffed into a  
loop to read a bunch of files), seems to work:


> headernames <- scan("test.CSV", what = "character", sep = "\t",  
nlines = 1, skip = 4)
> my.table <- read.table("test.CSV", header=F, skip = 5, col.names =  
c("crap.1", "crap.2", headernames))


> head(my.table)
  crap.1 crap.2 time.ms C550.KMS Cyt_b559.KMS [...] etc.
1      0 Point1  -599.5    0.000        0.000
2      0 Point2  -598.0    0.019       -0.014
3      0 Point3  -596.5    0.025       -0.023
4      0 Point4  -595.0    0.034       -0.029
5      0 Point5  -593.5    0.049       -0.033
6      0 Point6  -592.0    0.068       -0.033
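The loop mentioned above might be sketched like this (the file pattern and the skip counts are assumptions carried over from the example):

```r
## Read every CSV in the working directory with the same header trick.
files <- list.files(pattern = "\\.CSV$")
tables <- lapply(files, function(f) {
  hdr <- scan(f, what = "character", sep = "\t", nlines = 1, skip = 4)
  read.table(f, header = FALSE, skip = 5,
             col.names = c("crap.1", "crap.2", hdr))
})
names(tables) <- files  # one data.frame per input file
```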

Cheers
Martin



Re: [R] simple generation of artificial data with defined features

2008-08-22 Thread Christos Hatzis
On the general question of how to create a dataset that matches the
frequencies in a table, the function as.data.frame can be useful.  It takes
as its argument an object of class 'table' and returns a data frame of
frequencies.

Consider for example table 6.1 of Fleiss et al (3rd Ed):

> birth.weight <- c(10,15,40,135)
> attr(birth.weight, "class") <- "table"
> attr(birth.weight, "dim") <- c(2,2)
> attr(birth.weight, "dimnames") <- list(c("A", "Ab"), c("B", "Bb"))
> birth.weight
 B  Bb
A   10  40
Ab  15 135
> summary(birth.weight)
Number of cases in table: 200 
Number of factors: 2 
Test for independence of all factors:
Chisq = 3.429, df = 1, p-value = 0.06408
> 
> bw.dt <- as.data.frame(birth.weight)

Observations (rows) in this table can then be replicated according to their
corresponding frequencies to yield the expanded dataset that conforms to
the original table.

> bw.dt.exp <- bw.dt[rep(1:nrow(bw.dt), bw.dt$Freq), -ncol(bw.dt)]
> dim(bw.dt.exp)
[1] 200   2
> table(bw.dt.exp)
Var2
Var1   B  Bb
  A   10  40
  Ab  15 135 

The above approach is not restricted to 2x2 tables, and it should be
straightforward to generate datasets that conform to arbitrary nxm
frequency tables.
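As an aside, the attribute assignments above can be condensed; this sketch builds the same table object directly (values are column-major, as before):

```r
## Same counts and dimnames as birth.weight, built in one step.
bw <- as.table(matrix(c(10, 15, 40, 135), nrow = 2,
                      dimnames = list(c("A", "Ab"), c("B", "Bb"))))
```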

-Christos Hatzis


> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Greg Snow
> Sent: Friday, August 22, 2008 12:41 PM
> To: drflxms; r-help@r-project.org
> Subject: Re: [R] simple generation of artificial data with 
> defined features
> 
> I don't think that the election data is the right data to 
> demonstrate Kappa, you need subjects that are classified by 2 
> or more different raters/methods.  The election data could be 
> considered classifying the voters into which party they voted 
> for, but you only have 1 rater.  Maybe if you had some survey 
> data that showed which party each voter voted for in 2 or 
> more elections, then that may be a good example dataset.  
> Otherwise you may want to stick with the sample datasets.
> 
> There are other packages that compute Kappa values as well (I 
> don't know if others calculate this particular version), but 
> some of those take the summary data as input rather than the 
> raw data, which may be easier if you just have the summary tables.
> 
> 
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> [EMAIL PROTECTED]
> (801) 408-8111
> 
> 
> 
> > -Original Message-
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of drflxms
> > Sent: Friday, August 22, 2008 6:12 AM
> > To: r-help@r-project.org
> > Subject: [R] simple generation of artificial data with defined 
> > features
> >
> > Dear R-colleagues,
> >
> > I am quite a newbie to R fighting my stupidity to solve a probably 
> > quite simple problem of generating artificial data with defined 
> > features.
> >
> > I am conducting a study of inter-observer-agreement in 
> > child-bronchoscopy. One of the most important measures is Kappa 
> > according to Fleiss, which is very comfortable available in 
> R through 
> > the irr-package.
> > Unfortunately medical doctors like me don't really 
> understand much of 
> > statistics. Therefore I'd like to give the reader an easy 
> > understandable example of Fleiss-Kappa in the Methods part. 
> To achieve 
> > this, I obtained a table with the results of the German 
> election from 
> > 2005:
> >
> > party    number of votes    percent
> >
> > SPD      16194665           34,2
> > CDU      13136740           27,8
> > CSU       3494309            7,4
> > Gruene    3838326            8,1
> > FDP       4648144            9,8
> > PDS       4118194            8,7
> >
> > I want to show the agreement of voters measured by Fleiss-Kappa. To 
> > calculate this with the kappam.fleiss-function of irr, I need a 
> > data.frame like this:
> >
> > (id of 1st voter) (id of 2nd voter)
> >
> > party spd cdu
> >
> > Of course I don't plan to calculate this with the million of cases 
> > mentioned in the table above (I am working on a small laptop). A 
> > division by 1000 would be more than perfect for this example. The 
> > exact format of the table is generally not so important, as I could 
> > reshape nearly every format with the help of the reshape-package.
> >
> > Unfortunately I could not figure out how to create such a 
> > fictive/artificial dataset as described above. Any 
> data.frame would be 
> > nice, that keeps at least the percentage. String-IDs of 
> parties could 
> > be substituted by numbers of course (would be even better 
> for function 
> > kappam.fleiss in irr!).
> >
> > I would appreciate any kind of help very much indeed.
> > Greetings from Munich,
> >
> > Felix Mueller-Sarnowski
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.htm

Re: [R] How to read malformed csv files with read.table?

2008-08-22 Thread Prof Brian Ripley

On Fri, 22 Aug 2008, Martin Ballaschk wrote:


Hi folks,

thank you for your friendly and immediate help!

Am 22.08.2008 um 17:14 schrieb Prof Brian Ripley:

Or, better, use header=FALSE, skip=1 and the col.names arg of read.table().


My solution is reading the files without the headers (skip = 1) and
separately reading the headers with scan: scan("myfile.CSV", what =
"character", sep = "\t", nlines = 1).  After throwing out the first
two columns it should be possible to assign the scanned colnames to
the data.frame colnames.


Yes, but if you read the header first you can set the col.names via the 
arg to read.table().


--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



Re: [R] grep, gsub and metacharacters

2008-08-22 Thread Greg Snow
Try this:

> myvector<-c("ajkhfkiuwe","Variable(kg/min)")
> gsub( "\\(kg/min\\)", "va", myvector )

Does that come close to what you want?  If not, maybe show an example of
what you want the result to look like.
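If escaping every metacharacter feels error-prone, a hedged alternative is fixed = TRUE, which makes gsub() treat the pattern as a literal string:

```r
myvector <- c("ajkhfkiuwe", "Variable(kg/min)")
gsub("(kg/min)", "va", myvector, fixed = TRUE)
# [1] "ajkhfkiuwe" "Variableva"
```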

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111



> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Judith Flores
> Sent: Friday, August 22, 2008 10:57 AM
> To: RHelp
> Subject: [R] grep, gsub and metacharacters
>
> Hello,
>
>I have an expression that I want to substitute for something else.
>
> myvector<-c("ajkhfkiuwe","Variable(kg/min)")
>
> if I use the following code to extract "variable(kg/min)" and
> substitute it for "va"
>
> gsub(myvector[grep("var", myvector, ignore=T)], "va", myvector)
>
>  grep identifies the element of the vector that matches my
> query, but it won't substitute the value. I have been reading
> the help pages for regular expression, but still can't figure
> out the syntax to read parenthesis and forward slashes, which
> are considered metacharacters.
>
>
> As usual, thank you for your help,
>
> Judith
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



Re: [R] How to read malformed csv files with read.table?

2008-08-22 Thread Martin Ballaschk

Hi folks,

thank you for your friendly and immediate help!

Am 22.08.2008 um 17:14 schrieb Prof Brian Ripley:
Or, better, use header=FALSE, skip=1 and the col.names arg of  
read.table().


My solution is reading the files without the headers (skip = 1) and
separately reading the headers with scan: scan("myfile.CSV", what =
"character", sep = "\t", nlines = 1).  After throwing out the first
two columns it should be possible to assign the scanned colnames to
the data.frame colnames.


Cheers
Martin



Re: [R] WinBUGS with R

2008-08-22 Thread Uwe Ligges



artimon wrote:

Dear Users,
I am new to both of things, so do not blame me too much...

I am busy with semiparametric regression and use WinBUGS to sample
posteriors.
The code to call Winbugs is as follows: 


data        <- list("y","X","n","m")  # My variables
inits.beta  <- rep(0,K)
inits.beta0 <- 0
inits       <- function(){ list(beta=inits.beta, beta0=inits.beta0, taueps=1.0E-3) }
parameters  <- list("sigma","beta","beta0","y.star")
fitm        <- bugs(data, inits, parameters, model.file="model.bug",
                    n.chains=3, n.iter=n.iter, n.burnin=n.burnin, n.thin=n.thin,
                    debug=FALSE, DIC=FALSE, digit=5, codaPkg=FALSE,
                    bugs.directory="C:/Program Files/WinBUGS14/")

>

but I always get the following error:
Error in FUN(X[[1L]], ...) : 
  .C(..): 'type' must be "real" for this format



I tried the web, but failed. Could anyone give me a clue?
Best!




- Your example is not reproducible for us (without the data).
- Is WinBUGS running correctly (see its output with debug=TRUE)?
- What does traceback() tell us?

Which versions of R, WinBUGS and R2WinBUGS are in use?

Uwe Ligges



Re: [R] Survey Design / Rake questions

2008-08-22 Thread Farley, Robert
I *think* I'm making progress, but I'm still failing at the same step.  My rake 
call fails with:
Error in postStratify.survey.design(design, strata[[i]], 
population.margins[[i]],  : 
  Stratifying variables don't match

To my naïve eyes, it seems that my factors are "in the wrong order".  If so, 
how do I "assert" an ordering in my survey dataframe, or copy an "image" from 
the survey dataframe to my marginals dataframes?  I'd prefer to "pull" the 
original marginals dataframe(s) from the survey dataframe so that I can 
automate that in production.

If that's not my problem, where might I look for enlightenment?  Neither "?why" 
nor ?whatamimissing return citations.  :-)


**
 How I fail Now **
SurveyData <- read.spss("C:/Data/R/orange_delivery.sav", use.value.labels=TRUE, 
max.value.labels=Inf, to.data.frame=TRUE)
> #===
> temp <- sub(' +$', '', SurveyData$direction_) 
> SurveyData$direction_ <- temp
> #===
> SurveyData$NumStn=abs(as.numeric(SurveyData$lineon)-as.numeric(SurveyData$lineoff))
> mean(SurveyData$NumStn)
[1] 6.785276
> ### Kludge
> SurveyData$NumStn <- pmax(1,SurveyData$NumStn)
> mean(SurveyData$NumStn)
[1] 6.789877
> SurveyData$NumStn <- as.factor(SurveyData$NumStn)
> ###
> EBSurvey <- subset(SurveyData, direction_ == "EASTBOUND" )
> XTTable <- xtabs(~direction_ , EBSurvey)
> XTTable
direction_
EASTBOUND 
  345 
> WBSurvey <- subset(SurveyData, direction_ == "WESTBOUND" )
> XTTable <- xtabs(~direction_ , WBSurvey)
> XTTable
direction_
WESTBOUND 
  307 
> #
> EBDesign <- svydesign(id=~sampn, weights=~expwgt, data=EBSurvey)
> #   svytable(~lineon+lineoff, EBDesign)
> StnName <- c( "Warner Center", "De Soto", "Pierce College", "Tampa", 
> "Reseda", "Balboa", "Woodley", "Sepulveda", "Van Nuys", "Woodman", "Valley 
> College", "Laurel Canyon", "North Hollywood")
> EBOnNewTots <- c(1000,   600, 1200, 500, 
> 1000,  500,   200, 250,   1000,   300,  
> 100,  123.65,0 )
> StnTraveld  <- c(as.character(1:12))
> EBNumStn<- c(673.65, 800, 1000, 1000,  800,  700,  600, 500, 400, 
> 200,  50, 50 )
> ByEBOn  <- data.frame(StnName,   Freq=EBOnNewTots)
> ByEBNum <- data.frame(StnTraveld, Freq=EBNumStn)
> RakedEBSurvey <- rake(EBDesign, list(~lineon, ~NumStn), list(ByEBOn, ByEBNum) 
> )
Error in postStratify.survey.design(design, strata[[i]], 
population.margins[[i]],  : 
  Stratifying variables don't match
> 
> str(EBSurvey$lineon)
 Factor w/ 13 levels "Warner Center",..: 3 1 1 1 2 13 1 5 1 5 ...
> EBSurvey$lineon[1:5]
[1] Pierce College Warner Center  Warner Center  Warner Center  De Soto   
Levels: Warner Center De Soto Pierce College Tampa Reseda Balboa Woodley 
Sepulveda Van Nuys Woodman Valley College Laurel Canyon North Hollywood
> str(ByEBOn$StnName)
 Factor w/ 13 levels "Balboa","De Soto",..: 11 2 5 8 6 1 12 7 10 13 ...
> ByEBOn$StnName[1:5]
[1] Warner Center  De SotoPierce College Tampa  Reseda
Levels: Balboa De Soto Laurel Canyon North Hollywood Pierce College Reseda 
Sepulveda Tampa Valley College Van Nuys Warner Center Woodley Woodman
> 
> str(EBSurvey$NumStn)
 Factor w/ 12 levels "1","2","3","4",..: 10 12 4 12 8 1 8 8 12 4 ...
> EBSurvey$NumStn[1:5]
[1] 10 12 4  12 8 
Levels: 1 2 3 4 5 6 7 8 9 10 11 12
> str(ByEBNum$StnTraveld)
 Factor w/ 12 levels "1","10","11",..: 1 5 6 7 8 9 10 11 12 2 ...
> ByEBNum$StnTraveld[1:5]
[1] 1 2 3 4 5
Levels: 1 10 11 12 2 3 4 5 6 7 8 9
>
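A note on the str() output above (an editor's sketch, not from the thread): one likely cause of the "Stratifying variables don't match" error is that rake() matches each population margin to the design by the variable name in the formula, so marginal data frames with columns named StnName/StnTraveld may not match ~lineon/~NumStn; renaming those columns to lineon/NumStn would be the first thing to try. Factor level *order* can also be aligned explicitly, shown here with toy factors rather than the survey objects:

```r
# Two factors with the same level set but different level orderings
a <- factor(c("x", "y", "z"))                             # levels: x y z
b <- factor(c("x", "y", "z"), levels = c("z", "y", "x"))  # levels: z y x
identical(levels(a), levels(b))     # FALSE: same set, different order
b <- factor(b, levels = levels(a))  # re-level b to a's ordering
identical(levels(a), levels(b))     # TRUE
```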

**
**


Robert Farley
Metro
www.Metro.net 


-Original Message-
From: Thomas Lumley [mailto:[EMAIL PROTECTED] 
Sent: Thursday, August 21, 2008 13:55
To: Farley, Robert
Cc: r-help@r-project.org
Subject: Re: [R] Survey Design / Rake questions

On Tue, 19 Aug 2008, Farley, Robert wrote:

> While I'm trying to catch up on the statistical basis of my task, could
> someone point me to how I should fix my R error?

The variables in the formula in rake() need to be the raw variables in the 
design object, not summary tables.

   -thomas

>
> Thanks
>
>
> 
> 
>> library(survey)
>> SurveyData <- read.spss("C:/Data/R/orange_delivery.sav",
> use.value.labels=TRUE, max.value.labels=Inf, to.data.frame=TRUE)
>>
> #===
> 
>> temp <- sub(' +$', '', SurveyData$direction_)
>> SurveyData$direction_ <- temp
>>
> #===
> 
>>
> SurveyData$NumStn=abs(as.numeric

[R] grep, gsub and metacharacters

2008-08-22 Thread Judith Flores
Hello,

   I have an expression that I want to substitute for something else.

myvector<-c("ajkhfkiuwe","Variable(kg/min)")

if I use the following code to extract "variable(kg/min)" and substitute it for 
"va"

gsub(myvector[grep("var", myvector, ignore=T)], "va", myvector)

 grep identifies the element of the vector that matches my query, but it won't 
substitute the value. I have been reading the help pages for regular 
expression, but still can't figure out the syntax to read parenthesis and 
forward slashes, which are considered metacharacters.


As usual, thank you for your help,

Judith
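For reference, two ways to handle the metacharacters in this pattern (a sketch using the vector from the question; only the parentheses need escaping, as the forward slash is not special in R regular expressions):

```r
myvector <- c("ajkhfkiuwe", "Variable(kg/min)")
# Option 1: escape the metacharacters ( and ) with doubled backslashes
gsub("variable\\(kg/min\\)", "va", myvector, ignore.case = TRUE)
# Option 2: match the pattern literally (fixed = TRUE overrides ignore.case)
gsub("Variable(kg/min)", "va", myvector, fixed = TRUE)
# Both return: "ajkhfkiuwe" "va"
```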



Re: [R] Quantile regression with complex survey data

2008-08-22 Thread Brian S Cade
Just as a follow-up on T. Lumley's post, 2 citations that may be useful in 
reference to application of quantile regression with survey samples are:

Bassett and Saleh.  1994.  L_1 estimation of the median of a survey 
population.  Nonparametric Statistics 3: 277-283.  (L_1 estimation of 
median is 0.50 quantile regression).

Bassett et al.  2002.  Quantile models and estimators for data analysis. 
Metrika 55: 17-26.  (describes weighted QR for survey of school 
characteristics and student achievement scores).

Brian

Brian S. Cade, PhD

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  [EMAIL PROTECTED]
tel:  970 226-9326



"Stas Kolenikov" <[EMAIL PROTECTED]> 
Sent by: [EMAIL PROTECTED]
08/20/2008 01:14 PM

To
"Cheng, Yiling (CDC/CCHP/NCCDPHP)" <[EMAIL PROTECTED]>
cc
r-help@r-project.org
Subject
Re: [R] Quantile regression with complex survey data






On Wed, Aug 20, 2008 at 8:12 AM, Cheng, Yiling (CDC/CCHP/NCCDPHP)
<[EMAIL PROTECTED]> wrote:
> I am working on the NHANES survey data, and want to apply quantile
> regression on these complex survey data. Does anyone know how to do
> this?

There are no references in technical literature (thinking, Annals,
JASA, JRSS B, Survey Methodology). Absolutely none. Zero. You might be
able to apply the procedure mechanically and then adjust the standard
errors, but God only knows what the population equivalent is of
whatever that model estimates. If there is a population analogue at
all.

In general, a quantile regression is a heavily model based concept:
for each value of the explanatory variables, there is a well defined
distribution of the response, and quantile regression puts additional
structure on it -- linearity of quantiles wrt to some explanatory
variables. That does not mesh well with the design paradigm according
to which the survey estimation is usually conducted. With the latter,
the finite population and characteristics of every unit are assumed
fixed, and randomness comes only from the sampling procedure. Within
that paradigm, you can define the marginal distribution of the
response (or any other) variable, but the conditional distributions
may simply be unavailable because there are no units in the population
satisfying the conditions.

-- 
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.





Re: [R] Combining multiple datasets

2008-08-22 Thread bogdan romocea

Here's a function that does what you asked; you may need to adjust the column
names of your data frames before using it. If all your data frames are
similar (same number of rows, same years), then try do.call('cbind',
yourList).


# This function takes a list of data frames and merges them into one data frame.
#   The data frames are assumed to have one common column name (the one used
#   for joins) and customized name(s) for the remaining column(s).
# LDF = list of data frames to be merged

rmerge <- function(LDF, verbose=FALSE)
{
DF <- LDF[[1]]
cat(paste("Started with", nrow(DF), "rows from", names(LDF)[1]), "\n")
for (i in 2:length(LDF)) {
DF <- merge(DF, LDF[[i]], all=TRUE)  #outer join
if (verbose) cat(paste("Adding ", nrow(LDF[[i]]), " rows from ",
names(LDF)[i], 
" (", nrow(DF), " rows so far)...", sep=""), "\n")
}
rownames(DF) <- NULL
cat(paste("Got", nrow(DF), "rows after recursive merging"), "\n")
DF
}
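The per-step outer join that the function performs can be seen on two tiny made-up frames (an illustration with hypothetical data, not from the thread):

```r
# Two series sharing a "year" column, with partially overlapping years
d1 <- data.frame(year = 1:3, a = c(10, 20, 30))
d2 <- data.frame(year = 2:4, b = c(5, 6, 7))
merged <- merge(d1, d2, all = TRUE)  # outer join on the shared column "year"
nrow(merged)                         # 4 rows: years 1-4, NAs where data is missing
```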




Alison Macalady wrote:
> 
> Hi,
> I've tried to figure this out using Intro to R and help(), to no avail 
> - I am new at this.
> 
> I'm trying to write a script that will read multiple files from a 
> directory and then merge them into a single new data frame.
> The original data are in a tree-ring specific format, and so I've first 
> used a function (read.rwl) from the dplR package to read each file, 
> translate each into a more intuitive time series format, and then put 
> each new data frame into a single object, demo.Rwl:
> 
>  >demo.Rwl <- list.files(pattern=".RWL$"); for(i in demo.Rwl) { x <- 
> read.rwl(i, header=TRUE); assign(print(i, quote=FALSE), x)}
> 
> This part seems to work.  Now, I want to be able to put all of the 
> individual data frames contained in demo.Rwl into a single data frame, 
> merging by rows (rows are calendar years in the time series). I think I 
> know how to do this by typing each data set name into a merge:
> 
>  >merge(x,y..., by.x=0, all=TRUE )
> 
> However, I'd like to be able to script something that will save me the 
> time of typing in all of the individual data set names, as I will be 
> repeating this operation many times with many different numbers of 
> original input files. Is there a way to code a merge such that every 
> individual data frame contained in demo.Rwl (a different number every 
> time) gets combined into a single data set?
> 
> Thanks for the help!
> 
> Ali
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Combining-multiple-datasets-tp19100369p19108774.html
Sent from the R help mailing list archive at Nabble.com.



[R] WinBUGS with R

2008-08-22 Thread artimon

Dear Users,
I am new to both of these things, so do not blame me too much...

I am busy with semiparametric regression and use WinBUGS to sample
posteriors.
The code to call Winbugs is as follows: 

data<- list("y","X","n","m") #My variables
inits.beta  <- rep(0,K)
inits.beta0 <- 0
inits   <- 
function(){list(beta=inits.beta,beta0=inits.beta0,taueps=1.0E-3)}
parameters  <- list("sigma","beta","beta0","y.star")
fitm<- bugs(data,inits,parameters,model.file="model.bug",
n.chains=3, n.iter=n.iter, n.burnin=n.burnin, n.thin = 
n.thin,
debug=FALSE,DIC=FALSE,digit=5,codaPkg=FALSE,
bugs.directory="C:/Program Files/WinBUGS14/")

but I always get the following error:
Error in FUN(X[[1L]], ...) : 
  .C(..): 'type' must be "real" for this format


I tried the web, but failed. Could anyone give me a clue?
Best!

-- 
View this message in context: 
http://www.nabble.com/WinBUGS-with-R-tp19108557p19108557.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] filtering out data

2008-08-22 Thread Kate Rohrbaugh
Thank you, thank you everyone!
 
This last worked the best for what I need to do.  The other options were 
definitely helpful in that they helped me understand how R is structured
(something I haven't been able to get my head around yet) and gave me good
ideas about how to manipulate data; however, they generated regressions of data
I don't care about (i.e., when "model1"==0).
 
Regards -- Kate
 
Kate Rohrbaugh
Independent Project Analysis, Inc.
44426 Atwater Drive
Ashburn, VA  20147
 
office - 703.726.5465
fax - 703.729.8301
email - [EMAIL PROTECTED] 
website - www.ipaglobal.com ( http://www.ipaglobal.com/ )


>>> Daniel Folkinshteyn <[EMAIL PROTECTED]> 8/22/2008 12:08 PM >>>
If I understand correctly the result you are trying to achieve, I think 
what you may be looking for is this:

model1.coeff <- lm(dv ~ iv1 + iv2 + iv3, data = 
merged.dataset[merged.dataset$model1 == 1, ])

This will give you a regression using only the data for which model1 == 1.

on 08/22/2008 11:07 AM jim holtman said the following:
> Exactly what do you want this statement to do:
> 
> if(merged.dataset$model1 == 1) model1.coeff <- lm(dv ~ iv1 + iv2 +
> iv3, data = merged.dataset)
> 
> The result in the "if" is a vector of logical values (assuming that
> merged.dataset is longer than 1); that is the source of the error
> message.  Do you want the regression to be done on every row?  So what
> is the problem you are trying to solve?
> 
> On Fri, Aug 22, 2008 at 10:47 AM, Kate Rohrbaugh
> <[EMAIL PROTECTED]> wrote:
>> Greetings,
>>
>> Apologies for such a simple question, but I have been trying to figure this 
>> out for a few days now (I'm quite new to R), have read manuals, list posts, 
>> etc., and not finding precisely what I need.  I am accustomed to working in 
>> Stata, but I am currently working through the multilevel package right now.  
>> In Stata, I would include the statement "if model1 == 1" at the end of my 
>> regression statement, but I can't seem to find the correct syntax for doing 
>> the same thing in R.
>>
>> I have successfully converted Stata datasets, merged, aggregated, run lm, 
>> etc., but this simple task is evading me entirely.
>>
>> Here's what I have tried (and various iterations):
>>
>> if(merged.dataset$model1 == 1) model1.coeff <- lm(dv ~ iv1 + iv2 + iv3, data 
>> = merged.dataset)
>> I get this:
>> Warning message:
>> In if (merged.dataset$model1 == 1) model1.coeff <- lm(dv ~ iv +  :
>>  the condition has length > 1 and only the first element will be used
>> What it seems to do is just run lm on the first opportunities of model1==1, 
>> but then doesn't try it again.  I tried sorting so that the model1==1 appear 
>> first, but that doesn't seem to be a great solution either.
>>
>> I'll just go back into Stata and create separate new datasets if I have to, 
>> but there HAS to be a more elegant way.
>>
>> Thank you for ANY feedback!
>>
>> Kate
>>
>> Kate Rohrbaugh
>> Independent Project Analysis, Inc.
>> 44426 Atwater Drive
>> Ashburn, VA  20147
>>
>> office - 703.726.5465
>> fax - 703.729.8301
>> email - [EMAIL PROTECTED] 
>> website - www.ipaglobal.com 
>>
>>
>> 
>> 
>> 
>> 
>>  
>>  
>> 
>> 
>> 
>> This email message and any attached files are 
>> confidential and are intended solely for the use of the addressee(s) named 
>> above. This communication may contain material protected by legal 
>> privileges. If you are not the intended recipient or person responsible for 
>> delivering this confidential communication to the intended recipient, you 
>> have received this communication in error; any review, use, dissemination, 
>> forwarding, printing, copying or other distribution of this email message 
>> and any attached files is strictly prohibited. Independent Project Analysis 
>> Inc. reserves the right to monitor any communication that is created, 
>> received, or sent on its network. If you have received this confidential 
>> communication in error, please notify the sender immediately by reply email 
>> message and permanently delete the original message. Thank you for your 
>> cooperation.
>>
>>
> 
> 
> 





 
 




Re: [R] simple generation of artificial data with defined features

2008-08-22 Thread Greg Snow
I don't think that the election data is the right data to demonstrate Kappa, 
you need subjects that are classified by 2 or more different raters/methods.  
The election data could be considered classifying the voters into which party 
they voted for, but you only have 1 rater.  Maybe if you had some survey data 
that showed which party each voter voted for in 2 or more elections, then that 
may be a good example dataset.  Otherwise you may want to stick with the sample 
datasets.

There are other packages that compute Kappa values as well (I don't know if 
others calculate this particular version), but some of those take the summary 
data as input rather than the raw data, which may be easier if you just have 
the summary tables.
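As for the data-construction part of the original question (an editor's sketch, not from the thread, and separate from whether election data suits Kappa): the scaled-down fictive dataset can be built by dividing the quoted 2005 vote counts by 1000 and expanding with rep():

```r
# Vote totals from the quoted election table, scaled down by 1000;
# the relative party shares are preserved by the scaling
votes <- c(SPD = 16194665, CDU = 13136740, CSU = 3494309,
           Gruene = 3838326, FDP = 4648144, PDS = 4118194)
fictive <- round(votes / 1000)
party <- rep(names(fictive), times = fictive)  # one entry per fictive voter
length(party)                                  # about 45000 fictive voters
```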


--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111



> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of drflxms
> Sent: Friday, August 22, 2008 6:12 AM
> To: r-help@r-project.org
> Subject: [R] simple generation of artificial data with
> defined features
>
> Dear R-colleagues,
>
> I am quite a newbie to R fighting my stupidity to solve a
> probably quite simple problem of generating artificial data
> with defined features.
>
> I am conducting a study of inter-observer-agreement in
> child-bronchoscopy. One of the most important measures is
> Kappa according to Fleiss, which is very comfortable
> available in R through the irr-package.
> Unfortunately medical doctors like me don't really understand
> much of statistics. Therefore I'd like to give the reader an
> easy understandable example of Fleiss-Kappa in the Methods
> part. To achieve this, I obtained a table with the results of
> the German election from 2005:
>
> party     number of votes   percent
>
> SPD       16194665          34,2
> CDU       13136740          27,8
> CSU       3494309           7,4
> Gruene    3838326           8,1
> FDP       4648144           9,8
> PDS       4118194           8,7
>
> I want to show the agreement of voters measured by
> Fleiss-Kappa. To calculate this with the
> kappam.fleiss-function of irr, I need a data.frame like this:
>
> (id of 1st voter) (id of 2nd voter)
>
> party spd cdu
>
> Of course I don't plan to calculate this with the millions of
> cases mentioned in the table above (I am working on a small
> laptop). A division by 1000 would be more than perfect for
> this example. The exact format of the table is generally not
> so important, as I could reshape nearly every format with the
> help of the reshape-package.
>
> Unfortunately I could not figure out how to create such a
> fictive/artificial dataset as described above. Any data.frame
> would be nice, that keeps at least the percentage. String-IDs
> of parties could be substituted by numbers of course (would
> be even better for function kappam.fleiss in irr!).
>
> I would appreciate any kind of help very much indeed.
> Greetings from Munich,
>
> Felix Mueller-Sarnowski
>
>



Re: [R] boxplot

2008-08-22 Thread jim holtman
Try this:  it came from the help on 'bxp' which is called from boxplot:

boxplot(count ~ spray, data = InsectSprays, col = "lightgray",medlwd=1)


On Fri, Aug 22, 2008 at 3:18 PM, Suyan Tian <[EMAIL PROTECTED]> wrote:
> Hi, I just made a boxplot but I want to change the thickness of the line
> used for the median to look a little thinner. Could anyone please help me
> figuring out the R-code to do it?
>
> Thanks, I really appreciate it.
>
> Suyan
>
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] call perl

2008-08-22 Thread Greg Snow
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Jay an
> Sent: Thursday, August 21, 2008 8:08 PM
> To: r-help@r-project.org
> Subject: [R] call perl
>
> Hi,
>
> It may be the old question.
> can anyone tell me how to call perl in R?



> # call perl in R
> cat('Here perl, here perl, come here perl\n')



OK, more seriously: it depends on what you want to do with Perl.

You can use the system or shell command to run any external program including 
perl.

You can open up a pipe connection to send info to or get info from external 
programs including perl.

There is the Rsperl interface (http://www.omegahat.org/RSPerl/)

The book S Poetry (http://www.burns-stat.com/pages/spoetry.html) has a 'perl' 
function that will run perl commands on a character vector (takes care of the 
exporting/importing for you).

If you just want Perl for the regular expressions, there are R functions such 
as regexpr that can use Perl regular expressions directly inside of R (link to 
pcre).

There may be other options, which is best depends on what you want perl to do 
for you.
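Two of the options above, sketched (the system() call assumes a perl binary is on the PATH, so it is left commented out here):

```r
# Run an external Perl one-liner and capture its output (assumes perl installed):
# out <- system("perl -e 'print 6 * 7'", intern = TRUE)
# Perl-compatible regular expressions without leaving R (perl = TRUE uses PCRE):
grepl("(?<=kg/)min", "flow in kg/min", perl = TRUE)  # look-behind match: TRUE
```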

Hope this helps,


> thanks
>
> Y.
>
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111



Re: [R] filtering out data

2008-08-22 Thread Thomas Lumley

On Fri, 22 Aug 2008, Kate Rohrbaugh wrote:


Greetings,

Apologies for such a simple question, but I have been trying to figure 
this out for a few days now (I'm quite new to R), have read manuals, 
list posts, etc., and not finding precisely what I need.  I am 
accustomed to working in Stata, but I am currently working through the 
multilevel package right now.  In Stata, I would include the statement 
"if model1 == 1" at the end of my regression statement, but I can't seem 
to find the correct syntax for doing the same thing in R.


You want the subset= argument to lm(), eg subset= model1==1

-thomas
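A sketch with made-up data (the column names dv/iv1/model1 mirror the poster's; the subset= argument and explicit row indexing give the same fit):

```r
set.seed(42)
merged.dataset <- data.frame(dv = rnorm(20), iv1 = rnorm(20),
                             model1 = rep(0:1, each = 10))
fit1 <- lm(dv ~ iv1, data = merged.dataset, subset = model1 == 1)
fit2 <- lm(dv ~ iv1, data = merged.dataset[merged.dataset$model1 == 1, ])
all.equal(coef(fit1), coef(fit2))  # TRUE: both use only the model1 == 1 rows
```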


I have successfully converted Stata datasets, merged, aggregated, run lm, etc., 
but this simple task is evading me entirely.

Here's what I have tried (and various iterations):

if(merged.dataset$model1 == 1) model1.coeff <- lm(dv ~ iv1 + iv2 + iv3, data = 
merged.dataset)
I get this:
Warning message:
In if (merged.dataset$model1 == 1) model1.coeff <- lm(dv ~ iv +  :
 the condition has length > 1 and only the first element will be used
What it seems to do is just run lm on the first opportunities of model1==1, but 
then doesn't try it again.  I tried sorting so that the model1==1 appear first, 
but that doesn't seem to be a great solution either.

I'll just go back into Stata and create separate new datasets if I have to, but 
there HAS to be a more elegant way.

Thank you for ANY feedback!

Kate

Kate Rohrbaugh
Independent Project Analysis, Inc.
44426 Atwater Drive
Ashburn, VA  20147

office - 703.726.5465
fax - 703.729.8301
email - [EMAIL PROTECTED]
website - www.ipaglobal.com






 
 







Thomas Lumley   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]   University of Washington, Seattle



Re: [R] Large data sets with R (binding to hadoop available?)

2008-08-22 Thread Martin Morgan
Hi Avram --

My understanding is that Google-like map / reduce achieves throughput
by coordinating distributed calculation with distributed data.

snow, Rmpi, nws, etc provide a way of distributing calculations, but
don't help with coordinating distributed calculation with distributed
data.

SQL (at least naively implemented as a single database server) doesn't
help with distributed data and the overhead of data movement from the
server to compute nodes might be devastating. A shared file system
across compute nodes (the implicit approach usually taken parallel R
applications) offloads data distribution to the file system, which may
be effective for not-too-large (10's of GB?) data.

Many non-trivial R algorithms are not directly usable in distributed
map, because they expect to operate on 'all of the data' rather than
on data chunks. Out-of-the-box 'reduce' in R is limited really to
collation (the parallel lapply-like functions) or sapply-like
simplification; one would rather have more talented reducers (e.g., to
aggregate bootstrap results).
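A toy illustration of the map/reduce pattern in base R (an editor's sketch; real Hadoop jobs distribute the chunks across machines, which this does not show):

```r
# "Map" a function over data chunks, then "reduce" (combine) the results
chunks <- split(1:100, rep(1:4, each = 25))  # 4 chunks of 25 numbers each
mapped <- lapply(chunks, sum)                # map step: per-chunk sums
Reduce(`+`, mapped)                          # reduce step: combine to 5050
```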

The list of talents required to exploit Hadoop starts to become
intimidating (R, Java, Hadoop, PIG, + cluster management, etc), so it
would certainly be useful to have that encapsulated in a way that
requires only R skills!

Martin

<[EMAIL PROTECTED]> writes:

> Hi
>
> Apart from database interfaces such as sqldf which Gabor has
> mentioned, there are also packages specifically for handling large
> data: see the "ff" package, for instance.
>
> I am currently playing with parallelizing R computations via Hadoop. I
> haven't looked at PIG yet though.
>
> Rory
>
>
> -Original Message- From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Roland Rau Sent: 21
> August 2008 20:04 To: Avram Aelony Cc: r-help@r-project.org Subject:
> Re: [R] Large data sets with R (binding to hadoop available?)
>
> Hi
>
> Avram Aelony wrote:
>> Dear R community,
>> I find R fantastic and use R whenever I can for my data analytic
>> needs.  Certain data sets, however, are so large that other tools
>> seem to be needed to pre-process data such that it can be brought
>> into R for further analysis.
>> Questions I have for the many expert contributors on this list are:
>> 1. How do others handle situations of large data sets (gigabytes,
>> terabytes) for analysis in R ?
>>
> I usually try to store the data in an SQLite database and interface
>> via functions from the packages RSQLite (and DBI).
>
> No idea about Question No. 2, though.
>
> Hope this helps,
> Roland
>
>
> P.S. When I am sure that I only need a certain subset of large data
>> sets, I still prefer to do some pre-processing in awk (gawk).
> 2.P.S. The size of my data sets are in the gigabyte range (not
>> terabyte range). This might be important if your data sets are
>> *really large* and you want to use sqlite:
>> http://www.sqlite.org/whentouse.html
>
>
> ***
> The Royal Bank of Scotland plc. Registered in Scotland No
>> 90312. Registered Office: 36 St Andrew Square, Edinburgh EH2 2YB.
> Authorised and regulated by the Financial Services Authority
>
>

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793



[R] boxplot

2008-08-22 Thread Suyan Tian
Hi, I just made a boxplot but I want to change the thickness of the  
line used for the median to look a little thinner. Could anyone please  
help me figuring out the R-code to do it?


Thanks, I really appreciate it.

Suyan



Re: [R] Large data sets with R (binding to hadoop available?)

2008-08-22 Thread Thomas Lumley

On Thu, 21 Aug 2008, Roland Rau wrote:

Hi

Avram Aelony wrote: (in part)


1. How do others handle situations of large data sets (gigabytes, 
terabytes) for analysis in R ?


I usually try to store the data in an SQLite database and interface via 
functions from the packages RSQLite (and DBI).


No idea about Question No. 2, though.

Hope this helps,
Roland


P.S. When I am sure that I only need a certain subset of large data sets, I 
still prefer to do some pre-processing in awk (gawk).
2.P.S. The size of my data sets are in the gigabyte range (not terabyte 
range). This might be important if your data sets are *really large* and you 
want to use sqlite: http://www.sqlite.org/whentouse.html




I use netCDF for (genomic) datasets in the 100Gb range, with the ncdf 
package, because SQLite was too slow for the sort of queries I needed. 
HDF5 would be another possibility; I'm not sure of the current status of 
the HDF5 support in Bioconductor, though.


-thomas

Thomas Lumley   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]   University of Washington, Seattle



Re: [R] Test of Homogeneity of Variances

2008-08-22 Thread Daren Tan

I am testing whether the sample variances are equal. When the p-value < 0.05 
(alpha), should I accept the null hypothesis (sample variances are equal) or 
reject it?


The two new examples, each supposedly having the same sample variances, also 
puzzle me. Why are the p-values different?

bartlett.test(rep(rnorm(5),times=4), rep(1:5, each=4))


Bartlett test of homogeneity of variances


data:  rep(rnorm(5), times = 4) and rep(1:5, each = 4)
Bartlett's K-squared = 0.8681, df = 4, p-value = 0.929


bartlett.test(rep(rnorm(5),times=4), rep(1:5, each=4))


Bartlett test of homogeneity of variances


data:  rep(rnorm(5), times = 4) and rep(1:5, each = 4)
Bartlett's K-squared = 3.5599, df = 4, p-value = 0.4688
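One likely source of the difference: each call draws a fresh rnorm(5), so the two tests are run on different data. A sketch showing that with a fixed seed the p-values agree (seed chosen arbitrarily):

```r
## each bartlett.test() call above generates new random data; with the
## same seed, both calls see identical data and return identical p-values
set.seed(1)
p1 <- bartlett.test(rep(rnorm(5), times = 4), rep(1:5, each = 4))$p.value
set.seed(1)
p2 <- bartlett.test(rep(rnorm(5), times = 4), rep(1:5, each = 4))$p.value
identical(p1, p2)  # TRUE
```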


> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Date: Fri, 22 Aug 2008 11:25:36 -0400
> Subject: RE: [R] Test of Homogeneity of Variances
>
> What are your hypotheses? Once you state what they are, interpretation should 
> be straightforward.
>
>
>
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Daren Tan
> Sent: Friday, August 22, 2008 11:18 AM
> To: [EMAIL PROTECTED]
> Subject: [R] Test of Homogeneity of Variances
>
>
> I am testing the homogeneity of variances via bartlett.test and fligner.test. 
> Using the following example, how should I interpret the p-value in order to 
> accept or reject the null hypothesis ?
>
> set.seed(5)
> x <- rnorm(20)
> bartlett.test(x, rep(1:5, each=4))
>
>
> Bartlett test of homogeneity of variances
>
>
> data: x and rep(1:5, each = 4)
> Bartlett's K-squared = 1.7709, df = 4, p-value = 0.7778
>
> fligner.test(x, rep(1:5, each=4))
>
> Fligner-Killeen test of homogeneity of variances
>
>
> data: x and rep(1:5, each = 4)
> Fligner-Killeen:med chi-squared = 1.0819, df = 4, p-value = 0.8971
>
> This email message, including any attachments, is for ...{{dropped:6}}



Re: [R] filtering out data

2008-08-22 Thread Daniel Folkinshteyn
If I understand correctly the result you are trying to achieve, I think 
what you may be looking for is this:


model1.coeff <- lm(dv ~ iv1 + iv2 + iv3, data = 
merged.dataset[merged.dataset$model1 == 1, ])


This will give you a regression using only the data for which model1 == 1.
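An equivalent, perhaps tidier, form uses lm()'s own `subset` argument. A self-contained sketch with toy data standing in for merged.dataset (same column names as in the question):

```r
## toy data with the same column names as in the question
set.seed(1)
merged.dataset <- data.frame(dv = rnorm(20), iv1 = rnorm(20),
                             iv2 = rnorm(20), iv3 = rnorm(20),
                             model1 = rep(0:1, 10))

## 'subset' restricts the fit to rows where model1 == 1
model1.coeff <- lm(dv ~ iv1 + iv2 + iv3, data = merged.dataset,
                   subset = model1 == 1)
coef(model1.coeff)
```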

on 08/22/2008 11:07 AM jim holtman said the following:

Exactly what do you want this statement to do:

if(merged.dataset$model1 == 1) model1.coeff <- lm(dv ~ iv1 + iv2 +
iv3, data = merged.dataset)

The result in the "if" is a vector of logical values (assuming that
merged.dataset is longer than 1); that is the source of the error
message.  Do you want the regression to be done on every row?  So what
is the problem you are trying to solve?

On Fri, Aug 22, 2008 at 10:47 AM, Kate Rohrbaugh
<[EMAIL PROTECTED]> wrote:

Greetings,

Apologies for such a simple question, but I have been trying to figure this out for a few 
days now (I'm quite new to R), have read manuals, list posts, etc., and not finding 
precisely what I need.  I am accustomed to working in Stata, but I am currently working 
through the multilevel package right now.  In Stata, I would include the statement 
"if model1 == 1" at the end of my regression statement, but I can't seem to 
find the correct syntax for doing the same thing in R.

I have successfully converted Stata datasets, merged, aggregated, run lm, etc., 
but this simple task is evading me entirely.

Here's what I have tried (and various iterations):

if(merged.dataset$model1 == 1) model1.coeff <- lm(dv ~ iv1 + iv2 + iv3, data = 
merged.dataset)
I get this:
Warning message:
In if (merged.dataset$model1 == 1) model1.coeff <- lm(dv ~ iv +  :
 the condition has length > 1 and only the first element will be used
What it seems to do is just run lm on the first opportunities of model1==1, but 
then doesn't try it again.  I tried sorting so that the model1==1 appear first, 
but that doesn't seem to be a great solution either.

I'll just go back into Stata and create separate new datasets if I have to, but 
there HAS to be a more elegant way.

Thank you for ANY feedback!

Kate

Kate Rohrbaugh
Independent Project Analysis, Inc.
44426 Atwater Drive
Ashburn, VA  20147

office - 703.726.5465
fax - 703.729.8301
email - [EMAIL PROTECTED]
website - www.ipaglobal.com






 
 



This email message and any attached files are confidential and are 
intended solely for the use of the addressee(s) named above. This communication may contain material protected by 
legal privileges. If you are not the intended recipient or person responsible for delivering this confidential 
communication to the intended recipient, you have received this communication in error; any review, use, 
dissemination, forwarding, printing, copying or other distribution of this email message and any attached files is 
strictly prohibited. Independent Project Analysis Inc. reserves the right to monitor any communication that is 
created, received, or sent on its network. If you have received this confidential communication in error, please 
notify the sender immediately by reply email message and permanently delete the original message. Thank you for 
your cooperation.










Re: [R] mapping a vector into predefined bins

2008-08-22 Thread Wolfgang Huber

Have a look at

  ? cut
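For example, a sketch with made-up bin centers and values, assuming the bin edges fall halfway between the centers in M:

```r
## bin centers and associated values, as in the question
centers <- c(0.01, 0.02, 0.03)
values  <- c("red", "orange", "yellow")
D <- c(0.0111, 0.0198, 0.0262)

## edges halfway between adjacent centers, open at both ends
edges <- c(-Inf, head(centers, -1) + diff(centers) / 2, Inf)
values[cut(D, edges, labels = FALSE)]  # "red" "orange" "yellow"
```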



Best wishes
 Wolfgang

--
Wolfgang Huber  EBI/EMBL  Cambridge UK  http://www.ebi.ac.uk/huber



22/08/2008 14:14 Dr. G. Lawyer scripsit
>Hi,
>I have a mapping M consisting of a vector of bin centers (say 0.01, 0.02,
>0.03 ..., 1) with associated values (say "red", "orange", "yellow", ...),
>stored as a 2-column matrix with column 1 the bin center and column 2 the
>value.
>I also have a vector D of data (say 0.9056, 0.4118,  0.2162,  ...).
>I want to map each datum in D to the value associated with its bin in M 
> (say
>0.0111-> red, 0.0198 -> orange, ...) .
>Does R have a built-in function for this?
>Thanks for tips/suggestions.
> 
>+glenn



Re: [R] lme questions re: repeated measures & covariance structure

2008-08-22 Thread Christoph Scherber

Dear Mark,

I would include the repeated measure as the smallest stratum in the random 
effects specification:

random=~1|sampleunit/year

Setting up user-defined variance structures should be possible using for 
example:

weights=varPower(form=~habitat)

or also try out the available corStruct() classes (found in Pinheiro and Bates 
2000)

HTH
Christoph




Mark Na schrieb:

Hello,

We are attempting to use nlme to fit a linear mixed model to explain bird
abundance as a function of habitat:


lme(abundance~habitat-1,data=data,method="ML",random=~1|sampleunit)

The data consist of repeated counts of birds in sample units across multiple
years, and we have two questions:

1) Is it necessary (and, if so, how) to specify the repeated measure
(years)? As written, the above code does not.

2) How can we specify a Toeplitz heterogeneous covariance structure for this
model? We have searched the help file for lme, and the R-help archives, but
cannot find any pertinent information. If that's not possible, can we adapt
an existing covariance structure, and if so how?

Thanks, Mark

[[alternative HTML version deleted]]




--
Dr. rer.nat. Christoph Scherber
University of Goettingen
DNPW, Agroecology
Waldweg 26
D-37073 Goettingen
Germany

phone +49 (0)551 39 8807
fax   +49 (0)551 39 8806

Homepage http://www.gwdg.de/~cscherb1



Re: [R] swap

2008-08-22 Thread Patrick Burns

I'm guessing that the following (untested)
does what is wanted:

function(x) {
   pos <- sample(length(x), 2, replace=FALSE)
   x[pos] <- x[ pos[2:1] ]
   x
}
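A quick check of the function above (named swap_pair here so it does not mask anything; seed chosen arbitrarily):

```r
swap_pair <- function(x) {
  pos <- sample(length(x), 2, replace = FALSE)
  x[pos] <- x[ pos[2:1] ]
  x
}

set.seed(1)
swap_pair(1:10)                          # same vector, two elements exchanged
swap_pair(swap_pair(swap_pair(1:10)))    # three swap steps, as asked
```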


Patrick Burns
[EMAIL PROTECTED]
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")

[EMAIL PROTECTED] wrote:

Hello Richie,
I would like to do three (or k) swap steps, in each step recursively 
swapping just 2 IDs

x <- 1:10
swap <- function(x){
  a <- sample(x,2)
  x[x==a[1]] <- swap[2]
  x[x==a[2]] <- swap[1]
  return(x)
  }
  swap(swap(swap(x))) -> mix



I tried my best with a response before, but if you want a sensible answer 
you are going to have to try harder explaining what you really want.


What do you mean by 'swap step'?

If you want to swap the position of two elements in a vector (as I suspect 
you might) then which positions do you want swapping?  Do you specify them 
yourself (as inputs to the swap function perhaps), or should they be 
randomly generated?


If you provide more context (the rest of your code, what you are trying to 
achieve etc.) the help will be better.


Regards,
Richie.

Mathematical Sciences Unit
HSL



ATTENTION:

This message contains privileged and confidential inform...{{dropped:20}}








Re: [R] Null and Alternate hypothesis for Significance test

2008-08-22 Thread Greg Snow
The problem with trying to prove that 2 distributions are equal is that there 
is exactly one way in which they can be equal and an infinite number of ways 
that they can be different (and it's rather a large infinity).  The traditional 
use of equality as the Null hypothesis works, because we can assume that it is 
true and then compute the probability of seeing data as or more extreme than 
that observed.  If we just want to assume that the 2 distributions are 
different and compute the probability of the similarity between the datasets, 
then we can assume differences that are small enough, but still non-zero, to 
easily result in the similarity of the data seen.  It is not hard to come up 
with 2 distributions that are clearly different, but which would yield 
identical datasets.  Given this, any test to show exact equality would need to 
have a p-value identically set at 1 (data observed is completely plausible with 
null hypothesis).  Or we could get a significantly higher powered 
test of the correct size by generating a p-value from a uniform(0,1) 
distribution.  Neither option has much scientific merit.

Stepping away from proving equality, the more common approach is to prove 
equivalence, where the alternative hypothesis is that the 2 distributions are 
close enough even if not equal.  Determining "close enough" is subjective and 
dependent on the scientific question.  Close enough is often determined by 
visual inspection of graphs rather than a hypothesis test.  If you insist on a 
hypothesis test, then you need to determine in advance what is meant by "close 
enough", both in deciding the distance measure and how big that distance has to 
be before they are no longer equivalent.

You asked about the KS distance measure, that is one option for choosing a 
distance, there are others, which works best depends on you and the scientific 
question.  Take for example 2 distributions F and G.  F is uniform(0,1) meaning 
the the density function is 1 between 0 and 1 and 0 elsewhere, the distribution 
of G is equal to 1 between 0 and 0.99 and also equal to 1 between 99.99 and 
100, zero elsewhere.  Are these 2 functions equivalent?  The 2 functions have 
99% overlap, the KS distance is small (0.01 if I remember correctly), but the 
means and variances are quite different.  When generating random values from 
the 2 distributions we will see very similar numbers with the exception that G 
will generate outliers near 100 1% of the time.  Some people would consider 
these 2 distributions equivalent (they are pretty much the same if we discard 
the outliers), while others would consider the potential outliers (very 
extreme) to make them non-equivalent.  You need to decide that based on the 
science from which your data comes.
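A sketch of that F-versus-G example with simulated draws (sample size and seed chosen arbitrarily): the sample KS distance stays near 0.01 even though the means are far apart.

```r
set.seed(1)
n <- 1e4
f <- runif(n)                        # F: uniform on (0, 1)
g <- ifelse(runif(n) < 0.99,         # G: 99% of mass uniform on (0, 0.99),
            runif(n, 0, 0.99),       #    1% of mass uniform on (99.99, 100)
            runif(n, 99.99, 100))
unname(ks.test(f, g)$statistic)      # small, near the theoretical D = 0.01
c(mean(f), mean(g))                  # the means differ by roughly 1
```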

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111



> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Nitin Agrawal
> Sent: Thursday, August 21, 2008 2:59 PM
> To: r-help@r-project.org
> Subject: [R] Null and Alternate hypothesis for Significance test
>
> Hi,
> I had a question about specifying the Null hypothesis in a
> significance test.
> Advance apologies if this has already been asked previously
> or is a naive question.
>
> I have two samples A and B, and I want to test whether A and
> B come from the same distribution. The default Null
> hypothesis would be H0: A=B But since I am trying to prove
> that A and B indeed come from the same distribution, I think
> this is not the right choice for the null hypothesis (it
> should be one that is set up to be rejected)
>
> How do I specify a null hypothesis H0: A not equal to B for
> say a KS test.
> An example to do this in R would be greatly appreciated.
>
> On a related note: what is a good way to measure the
> difference between observed and expected PDFs? Is the D
> statistic of the KS test a good choice?
>
> Thanks!
> Nitin
>
> [[alternative HTML version deleted]]
>



[R] lme questions re: repeated measures & covariance structure

2008-08-22 Thread Mark Na
Hello,

We are attempting to use nlme to fit a linear mixed model to explain bird
abundance as a function of habitat:


lme(abundance~habitat-1,data=data,method="ML",random=~1|sampleunit)

The data consist of repeated counts of birds in sample units across multiple
years, and we have two questions:

1) Is it necessary (and, if so, how) to specify the repeated measure
(years)? As written, the above code does not.

2) How can we specify a Toeplitz heterogeneous covariance structure for this
model? We have searched the help file for lme, and the R-help archives, but
cannot find any pertinent information. If that's not possible, can we adapt
an existing covariance structure, and if so how?

Thanks, Mark

[[alternative HTML version deleted]]



Re: [R] Test of Homogeneity of Variances

2008-08-22 Thread Richardson, Patrick
What are your hypotheses?  Once you state what they are, interpretation should 
be straightforward.



-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Daren Tan
Sent: Friday, August 22, 2008 11:18 AM
To: [EMAIL PROTECTED]
Subject: [R] Test of Homogeneity of Variances


I am testing the homogeneity of variances via bartlett.test and fligner.test. 
Using the following example, how should I interpret the p-value in order to 
accept or reject the null hypothesis ?

set.seed(5)
x <- rnorm(20)
bartlett.test(x, rep(1:5, each=4))


Bartlett test of homogeneity of variances


data:  x and rep(1:5, each = 4)
Bartlett's K-squared = 1.7709, df = 4, p-value = 0.7778

fligner.test(x, rep(1:5, each=4))

 Fligner-Killeen test of homogeneity of variances


data:  x and rep(1:5, each = 4)
Fligner-Killeen:med chi-squared = 1.0819, df = 4, p-value = 0.8971

This email message, including any attachments, is for th...{{dropped:6}}



Re: [R] HELP: how to add weight to a [x,y] coordinate

2008-08-22 Thread Greg Snow
There are several options.

You can use the sunflowerplot function.

You can plot solid points with a small alpha value (not all graphics devices 
support this), so that just a few points show up pale, but a lot show up as 
more opaque.

You can use the symbols function (or bubbleplot, or my.symbols, or ..., from 
other packages).

You can use the hexbin package and function (bioconductor) (I think this is my 
preferred method).
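One of these options sketched: accumulate the weights per (x, y) cell and let symbols() scale circle size by the accumulated weight. The data and column names below are simulated stand-ins for the Monte Carlo output.

```r
set.seed(1)
d <- data.frame(x = sample(0:170, 1e4, replace = TRUE),  # months, unit 1
                y = round(runif(1e4, 0, 6), 1),          # unit 0.1
                w = runif(1e4, 0, 0.5))                  # weights

## sum the weights falling in each (x, y) cell
agg <- aggregate(d["w"], by = d[c("x", "y")], FUN = sum)

## circle size grows with the accumulated weight at each point
with(agg, symbols(x, y, circles = sqrt(w), inches = 0.05))
```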

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111



> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Jan Akko Eleveld
> Sent: Thursday, August 21, 2008 3:11 PM
> To: r-help@r-project.org
> Subject: [R] HELP: how to add weight to a [x,y] coordinate
>
>
> Anyone who can help me with the following question?
>
>
> How can I add weight to [x,y] coordinates on a graph/scatterplot?
>
>
> Background:
> Monte Carlo simulation generated 730,000 [x,y] coordinates
> with a weight attached (from 0-0.5).
> Both x and y are rounded and fit on a raster with x-axis
> 0-170 months (smalles unit = 1 month) and y-axis 0-6
> (smallest unit=0.1).
> I would like every [x,y] to add its weight, so that after all
> 730,000 [x,y] are plotted, a maximum likelyhood becomes
> apparent through the size of the point that are made up by
> accumulated weights.
>
> Thank you in advance for any thoughts!
>
> Best, Akko Eleveld
>
>
>



[R] cross validation for lme?

2008-08-22 Thread Mark Na
Hello,

We would like to perform a cross validation on linear mixed models (lme) and
wonder if anyone knows of something analogous to cv.glm for such models?

Thanks, Mark

[[alternative HTML version deleted]]



[R] Test of Homogeneity of Variances

2008-08-22 Thread Daren Tan

I am testing the homogeneity of variances via bartlett.test and fligner.test. 
Using the following example, how should I interpret the p-value in order to 
accept or reject the null hypothesis ?

set.seed(5)
x <- rnorm(20)
bartlett.test(x, rep(1:5, each=4))


Bartlett test of homogeneity of variances


data:  x and rep(1:5, each = 4)
Bartlett's K-squared = 1.7709, df = 4, p-value = 0.7778

fligner.test(x, rep(1:5, each=4))

 Fligner-Killeen test of homogeneity of variances


data:  x and rep(1:5, each = 4)
Fligner-Killeen:med chi-squared = 1.0819, df = 4, p-value = 0.8971



Re: [R] How to read malformed csv files with read.table?

2008-08-22 Thread Prof Brian Ripley

On Fri, 22 Aug 2008, Daniel Folkinshteyn wrote:


on 08/22/2008 10:19 AM Martin Ballaschk said the following:
how do I read files that have two header fields less than they have 
columns? The easiest solution would be to insert one or two additional 
header fields, but I have a lot of files and that would be quite a lot of 
awful work.


Any ideas on how to solve that problem?



you could use read.table with "header = F", that way it will read the table 
without worrying about column names (they will end up in the first row of the 
data).


Or, better, use header=FALSE, skip=1 and the col.names arg of 
read.table().
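A sketch of that approach for the file in the question; the file name and the exact column names are hypothetical.

```r
## supply the full set of column names ourselves and skip the short header
cn <- c("row", "point", "time.ms", "C550", "Cyt_b559", "Cyt_b563",
        "Cyt_f", "P515FR", "Scatt", "Zea2", "PC", "P700")
dat <- read.table("myfile.CSV", sep = "\t", header = FALSE,
                  skip = 1, col.names = cn)
```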




Then, you can just delete the first row, or assign it to names(), or 
whatever.


if all the columns in all your files have the same names, you can read them 
all with header=F and col.names=vectorofcolumnnames, and then delete first 
row (which will contain the incomplete col names from the file).


The trouble with that approach is that you will get all factor columns.


hope this helps :)




--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



[R] testing for differences in sums - how to?

2008-08-22 Thread James Howey
If I want to test for a salesman effect on average sale, I model it as "Sale ~ 
Salesman". But what if I want to test for a salesman effect on total sales? 
How do I model that?

Given a significant salesman effect, how do I follow up with the equivalent of 
Tukey's HSD to determine who the (indistinguishably) top salesmen are?

If I add another factor, say "region" to the design, how do I then model the 
result?

Thanks,

jkh



Re: [R] filtering out data

2008-08-22 Thread jim holtman
Exactly what do you want this statement to do:

if(merged.dataset$model1 == 1) model1.coeff <- lm(dv ~ iv1 + iv2 +
iv3, data = merged.dataset)

The result in the "if" is a vector of logical values (assuming that
merged.dataset is longer than 1); that is the source of the error
message.  Do you want the regression to be done on every row?  So what
is the problem you are trying to solve?

On Fri, Aug 22, 2008 at 10:47 AM, Kate Rohrbaugh
<[EMAIL PROTECTED]> wrote:
> Greetings,
>
> Apologies for such a simple question, but I have been trying to figure this 
> out for a few days now (I'm quite new to R), have read manuals, list posts, 
> etc., and not finding precisely what I need.  I am accustomed to working in 
> Stata, but I am currently working through the multilevel package right now.  
> In Stata, I would include the statement "if model1 == 1" at the end of my 
> regression statement, but I can't seem to find the correct syntax for doing 
> the same thing in R.
>
> I have successfully converted Stata datasets, merged, aggregated, run lm, 
> etc., but this simple task is evading me entirely.
>
> Here's what I have tried (and various iterations):
>
> if(merged.dataset$model1 == 1) model1.coeff <- lm(dv ~ iv1 + iv2 + iv3, data 
> = merged.dataset)
> I get this:
> Warning message:
> In if (merged.dataset$model1 == 1) model1.coeff <- lm(dv ~ iv +  :
>  the condition has length > 1 and only the first element will be used
> What it seems to do is just run lm on the first opportunities of model1==1, 
> but then doesn't try it again.  I tried sorting so that the model1==1 appear 
> first, but that doesn't seem to be a great solution either.
>
> I'll just go back into Stata and create separate new datasets if I have to, 
> but there HAS to be a more elegant way.
>
> Thank you for ANY feedback!
>
> Kate
>
> Kate Rohrbaugh
> Independent Project Analysis, Inc.
> 44426 Atwater Drive
> Ashburn, VA  20147
>
> office - 703.726.5465
> fax - 703.729.8301
> email - [EMAIL PROTECTED]
> website - www.ipaglobal.com
>
>
> 
> 
> 
> 
>  
>  
> 
> 
> 
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] How to read malformed csv files with read.table?

2008-08-22 Thread Daniel Folkinshteyn

on 08/22/2008 10:19 AM Martin Ballaschk said the following:
how do I read files that have two header fields less than they have 
columns? The easiest solution would be to insert one or two additional 
header fields, but I have a lot of files and that would be quite a lot 
of awful work.


Any ideas on how to solve that problem?



you could use read.table with "header = F", that way it will read the 
table without worrying about column names (they will end up in the first 
row of the data).


Then, you can just delete the first row, or assign it to names(), or 
whatever.


if all the columns in all your files have the same names, you can read 
them all with header=F and col.names=vectorofcolumnnames, and then 
delete first row (which will contain the incomplete col names from the 
file).


hope this helps :)



Re: [R] How to read malformed csv files with read.table?

2008-08-22 Thread jim holtman
Try this.  It will read the file and see if there is a difference and
add in the extra headers:

x <- "   time/ms C550.KMSCyt_b559.KMSCyt_b563.KMS
Cyt_f_.KMS  P515FR.KMS  Scatt.KMS   Zea2.KMSPC
 P700
0   Point1  -599.500   0.000   0.000
0.000   0.000  0.000   0.000   0.000
0.000   0.000
0   Point2  -598.000  -0.012  -0.013
0.040   0.013  0.027   0.010   0.022
0.000   0.000
0   Point3  -596.500  -0.015  -0.015
0.044   0.020  0.025   0.010   0.033
0.000   0.000"
# find out how many dummy headers you have to add
x.c <- count.fields(textConnection(x))
x.diff <- x.c[2] - x.c[1]  # assume first line is short
x.connection <- textConnection(x)  # setup connection
if (x.diff > 0){
# read first line
x.first <- readLines(x.connection, n=1)
# add dummy headers
x.first <- paste(x.first, paste(LETTERS[1:x.diff], collapse=" "))
pushBack(x.first, x.connection)   # push back the line so it is
ready for read.table
}

input <- read.table(x.connection, header=TRUE)
closeAllConnections()



On Fri, Aug 22, 2008 at 10:19 AM, Martin Ballaschk
<[EMAIL PROTECTED]> wrote:
> Hi,
>
> how do I read files that have two header fields less than they have columns?
> The easiest solution would be to insert one or two additional header fields,
> but I have a lot of files and that would be quite a lot of awful work.
>
> Any ideas on how to solve that problem?
>
> ###
> R stuff:
>
>> read.table("myfile.CSV", sep = "\t", header = T)
>  Error in read.table("myfile.CSV", sep = "\t",  :
>  more columns than column names
>
>> count.fields("myfile.CSV", sep = "\t")
>   [1] 10 12 12 12 12 12 12 12 12 12 12 [...]
>
> ###
> ugly sample ("Exported by SDL DataTable component"):
>
>time/ms C550.KMSCyt_b559.KMSCyt_b563.KMSCyt_f_.KMS
>P515FR.KMS  Scatt.KMS   Zea2.KMSPC  P700
> 0   Point1  -599.500   0.000   0.000   0.000
>   0.000  0.000   0.000   0.000
> 0.000   0.000
> 0   Point2  -598.000  -0.012  -0.013   0.040
>   0.013  0.027   0.010   0.022
> 0.000   0.000
> 0   Point3  -596.500  -0.015  -0.015   0.044
>   0.020  0.025   0.010   0.033
> 0.000   0.000
> [...]
>
>
> Cheers,
> Martin
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] draw a function over a histogram

2008-08-22 Thread Rubén Roa-Ureta

Daniela Garavaglia wrote:

Sorry, I have some trouble with the graphics device.

How can I draw a function over a histogram?

Thanks so much.
  

Daniela,
What function?
Here is one example using density() and using dnorm()
x <- rnorm(1000,2,2)
hist(x,prob=TRUE)
lines(density(x,na.rm=TRUE),col="red")
library(MASS)
y <- fitdistr(x,"normal")
curve(dnorm(x,mean=y$estimate[1],sd=y$estimate[2]),add=TRUE)

HTH
R.


[[alternative HTML version deleted]] <--- What about this??






[R] filtering out data

2008-08-22 Thread Kate Rohrbaugh
Greetings,
 
Apologies for such a simple question, but I have been trying to figure this out 
for a few days now (I'm quite new to R), have read manuals, list posts, etc., 
and not finding precisely what I need.  I am accustomed to working in Stata, 
but I am currently working through the multilevel package right now.  In Stata, 
I would include the statement "if model1 == 1" at the end of my regression 
statement, but I can't seem to find the correct syntax for doing the same thing 
in R.
 
I have successfully converted Stata datasets, merged, aggregated, run lm, etc., 
but this simple task is evading me entirely.  
 
Here's what I have tried (and various iterations):
 
if(merged.dataset$model1 == 1) model1.coeff <- lm(dv ~ iv1 + iv2 + iv3, data = 
merged.dataset)
I get this:
Warning message:
In if (merged.dataset$model1 == 1) model1.coeff <- lm(dv ~ iv +  :
  the condition has length > 1 and only the first element will be used
What it seems to do is just run lm on the first opportunities of model1==1, but 
then doesn't try it again.  I tried sorting so that the model1==1 appear first, 
but that doesn't seem to be a great solution either.
 
I'll just go back into Stata and create separate new datasets if I have to, but 
there HAS to be a more elegant way.
 
Thank you for ANY feedback!
 
Kate

Kate Rohrbaugh
Independent Project Analysis, Inc.
44426 Atwater Drive
Ashburn, VA  20147
 
office - 703.726.5465
fax - 703.729.8301
email - [EMAIL PROTECTED]
website - www.ipaglobal.com






 
 



This email message and any attached files are 
confidential and are intended solely for the use of the addressee(s) named 
above. This communication may contain material protected by legal privileges. 
If you are not the intended recipient or person responsible for delivering this 
confidential communication to the intended recipient, you have received this 
communication in error; any review, use, dissemination, forwarding, printing, 
copying or other distribution of this email message and any attached files is 
strictly prohibited. Independent Project Analysis Inc. reserves the right to 
monitor any communication that is created, received, or sent on its network. If 
you have received this confidential communication in error, please notify the 
sender immediately by reply email message and permanently delete the original 
message. Thank you for your cooperation.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
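
The Stata-style "if model1 == 1" clause maps directly onto lm()'s subset
argument, so no separate dataset is needed. A minimal sketch with made-up
data (variable names borrowed from the post above; not part of the
original thread):

```r
## made-up data shaped like the question's merged.dataset
merged.dataset <- data.frame(
  dv  = rnorm(20),
  iv1 = rnorm(20), iv2 = rnorm(20), iv3 = rnorm(20),
  model1 = rep(c(0, 1), 10)
)

## fit only on the rows where model1 == 1
model1.coeff <- lm(dv ~ iv1 + iv2 + iv3, data = merged.dataset,
                   subset = model1 == 1)

## equivalent: filter the data frame first
model1.coeff2 <- lm(dv ~ iv1 + iv2 + iv3,
                    data = subset(merged.dataset, model1 == 1))
```

The subset argument is evaluated inside data, so bare variable names work,
exactly like the Stata if-clause.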




[R] mapping a vector into predefined bins

2008-08-22 Thread Dr. G. Lawyer

   Hi,
   I have a mapping M consisting of a vector of bin centers (say 0.01, 0.02,
   0.03 ..., 1) with associated values (say "red", "orange", "yellow", ...),
   stored as a 2-column matrix with column 1 the bin center and column 2 the
   value.
   I also have a vector D of data (say 0.9056, 0.4118,  0.2162,  ...).
   I want to map each datum in D to the value associated with its bin in M (say
   0.0111-> red, 0.0198 -> orange, ...) .
   Does R have a built-in function for this?
   Thanks for tips/suggestions.

   +glenn
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
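
R does have built-ins for this: findInterval() (or cut()) maps each datum to
a bin once you build breaks halfway between the centers. A small sketch with
made-up values (the color list is just a stand-in for the real mapping):

```r
centers <- seq(0.01, 1, by = 0.01)
values  <- rep(c("red", "orange", "yellow", "green"),
               length.out = length(centers))
M <- data.frame(center = centers, value = values, stringsAsFactors = FALSE)

D <- c(0.9056, 0.4118, 0.2162)

# breaks sit halfway between adjacent bin centers
breaks <- c(-Inf, head(centers, -1) + diff(centers) / 2, Inf)
idx <- findInterval(D, breaks)   # index of the nearest bin center
M$center[idx]                    # 0.91 0.41 0.22
M$value[idx]                     # the associated values
```

cut(D, breaks, labels = M$value) gives the same mapping as a factor, if
factor levels are more convenient than character lookups.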


[R] draw a function over a histogram

2008-08-22 Thread Daniela Garavaglia
Sorry, I have some troubles with the graph device.

How can I draw a function over a histogram?

Thanks so much.

 


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Re : Help on competing risk package cmprsk with time dependent covariate

2008-08-22 Thread Philippe Guardiola
Hello again,

I'm trying to use the timereg package as you suggested (R 2.7.1 on XP Pro). 
here is my script based on the example from timereg for a fine & gray model in 
which
relt = time to event, rels = status 0/1/2 2=competing, 1=event of interest, 
0=censored
random = covariate I want to test
 
library(timereg)
rel<-read.csv("relapse2.csv", header = TRUE, sep = ",", quote="\"", dec=".", 
fill = TRUE, comment.char="")
names(rel)

> names(rel)
[1] "upn"    "rels"   "relt"   "random"

times<-rel$relt[rel$rels==1]

fg1<-comp.risk(Surv(relt,rels>0)~const(random),rel,rel$rels,times[-1],causeS=1,resample.iid=1,model="prop")
summary(fg)

fg2<-comp.risk(Surv(relt,rels>0)~random,rel,rel$rels,times[-1],causeS=1,resample.iid=1,model="prop")
summary(fg)

Question1: can you confirm that "fg1" evaluates "random" with the
hypothesis that it has a constant hazard over time and that it is
assumed as being time dependent in "fg2" ?

Question 2: both summaries give me the following, which I don't understand at 
all; is there a mistake in my script?

Competing risks Model 

Test for nonparametric terms 

Test for non-significant effects 
            sup| hat B(t)/SD(t) |  p-value H_0: B(t)=0
(Intercept)                    0                     0
random                         0                     0

Test for time invariant effects 
            supremum test  p-value H_0: B(t)=b t
(Intercept)             0                      0
random                  0                      0

            int (b(t)-g(t,gam))^2dt  p-value H_0: constant effect 
(Intercept)                       0                             0
random                            0                             0

  Call: 
comp.risk(Surv(relt, rels > 0) ~ random, rel, rel$rels, times[-1], causeS = 
1, resample.iid = 1, model = "prop")


any help is very welcome
regards


Philippe G


- Message d'origine 
De : Arthur Allignol <[EMAIL PROTECTED]>
À : Philippe Guardiola <[EMAIL PROTECTED]>
Cc : R-help@r-project.org; [EMAIL PROTECTED]
Envoyé le : Vendredi, 22 Août 2008, 11h53mn 42s
Objet : Re: [R] Help on competing risk package cmprsk with time dependent 
covariate

Hello,

Something i don't understand
in your question.
Is treatment a time-dependent covariate?
That is, do patients receive the treatment
at the beginning of the study or later?

cmprsk cannot handle time-dependent covariates.

But if treatment is a baseline covariate,
but has a time-varying effect (i.e. does the subdistribution hazard 
ratio varies with time?), your solution
to assess that is weird, because you will transform
your baseline covariate into a time-dependent one,
thus considering all the patients to receive no treatment
the first year. For sure, the treatment wont have any
effect for the first year.
To assess a time-varying effect on competing risks,
i would either follow the cmprsk documentation, including
an interaction with functions of time, or use the comp.risk
function in the timereg package, which fits more flexible
models for the cumulative incidence functions.

Best regards,
Arthur Allignol


Philippe Guardiola wrote:
> Dear R users,
> 
> 
> I d like to assess the effect of "treatment" covariate on a disease relapse 
> risk with the package cmprsk. 
> However, the effect of this covariate on survival is time-dependent
> (assessed with cox.zph): no significant effect during the first year of 
> follow-up,
> then after 1 year a favorable effect is observed on survival (step
> function might be the correct way to say that ?). 
> For overall survival analysis
> I have used a time dependent Cox model which has confirmed this positive 
> effect after
> 1 year.
> Now I m moving to disease relapse incidence and a similar time dependency 
> seems to be present. 
> 
> what I d like to have is that: for
> patients without "treatment" the code for "treatment" covariate is
> always 0, and for patients who received "treatment" covariate I d like
> to have it = 0 during time interval 0 to 1 year, and equal to 1 after 1
> year. Correct me if I m wrong in trying to do so.
> 
> 
> First, I have run the following script (R2.7.1 under XPpro) according to 
> previous advices:
> 
> library(cmprsk)
> attach(LAMrelapse)
> fit1<- crr(rel.t, rel.s, treatment, treatment, function(uft)
> cbind(ifelse(uft<=1,1,0),ifelse(uft>1,1,0)), failcode=1,
> cencode=0, na.action=na.omit, gtol=1e-06, maxiter=10) 
> fit1
> 
> where:
> rel.t = time to event (in years)
> rel.s = status , =1 if disease relapse, =2 if death from non disease
> related cause (toxicity of previous chemotherapy), =0 if alive &
> not in relapse
> treatment =
> binary covariate (value: 0 or 1) representing the treatment to test
> (different from chemotherapy above, with no known toxicity)
> I have not yet added other covariates in the model.
> 
> 
> this script gave me the following result:
>> fit1 <- crr(relcmp.t, relcmp.s, treatment, treatment, function(uft) 
>> cbind(ifelse(uft <= 1, 1, 0), ifelse(uft > 1, 1, 0)), failcode = 1, cencode 
>> = 0, 
> na.a

[R] How to read malformed csv files with read.table?

2008-08-22 Thread Martin Ballaschk

Hi,

how do I read files that have two header fields less than they have  
columns? The easiest solution would be to insert one or two additional  
header fields, but I have a lot of files and that would be quite a lot  
of awful work.


Any ideas on how to solve that problem?

###
R stuff:

> read.table("myfile.CSV", sep = "\t", header = T)
  Error in read.table("myfile.CSV", sep = "\t",  :
  more columns than column names

> count.fields("myfile.CSV", sep = "\t")
   [1] 10 12 12 12 12 12 12 12 12 12 12 [...]

###
ugly sample ("Exported by SDL DataTable component"):

	time/ms	C550.KMS	Cyt_b559.KMS	Cyt_b563.KMS	Cyt_f_.KMS	P515FR.KMS	 
Scatt.KMS	Zea2.KMS	PC	P700
0	Point1	-599.500	   0.000	   0.000	   0.000	
0.000	   0.000	   0.000	   0.000	   0.000	   0.000
0	Point2	-598.000	  -0.012	  -0.013	   0.040	
0.013	   0.027	   0.010	   0.022	   0.000	   0.000
0	Point3	-596.500	  -0.015	  -0.015	   0.044	
0.020	   0.025	   0.010	   0.033	   0.000	   0.000

[...]


Cheers,
Martin

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
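
One hedged workaround (untested against the real files, and assuming — as in
the sample — that the two unnamed columns are the leading ones): read the
short header line and the data body separately, then supply the full set of
column names yourself:

```r
# read the 10-field header line on its own
hdr <- scan("myfile.CSV", what = "", sep = "\t", nlines = 1, quiet = TRUE)

# read the 12-column body, skipping the header row, and prepend two
# invented names for the unlabelled leading columns
dat <- read.table("myfile.CSV", sep = "\t", skip = 1,
                  col.names = c("index", "point", hdr))
```

Note that read.table() runs col.names through make.names() by default, so
"time/ms" becomes "time.ms"; assign names(dat) afterwards if the raw header
strings matter.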


Re: [R] swap

2008-08-22 Thread Richard . Cotton
> Hello Richie,
> I would like to do three (or k) swap steps in each step just 2 ID 
> recursive swapping
> x <- 1:10
> swap <- function(x){
>   a <- sample(x,2)
>   x[x==a[1]] <- swap[2]
>   x[x==a[2]] <- swap[1]
>   return(x)
>   }
>   swap(swap(swap(x))) -> mix

I tried my best with a response before, but if you want a sensible answer 
you are going to have to try harder explaining what you really want.

What do you mean by 'swap step'?

If you want to swap the position of two elements in a vector (as I suspect 
you might) then which positions do you want swapping?  Do you specify them 
yourself (as inputs to the swap function perhaps), or should they be 
randomly generated?

If you provide more context (the rest of your code, what you are trying to 
achieve etc.) the help will be better.

Regards,
Richie.

Mathematical Sciences Unit
HSL



ATTENTION:

This message contains privileged and confidential inform...{{dropped:20}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Coordinate systems for geostatistics in R

2008-08-22 Thread Paul Hiemstra

imicola schreef:

Hi,

I read somewhere that when carrying out geostatistical analysis in R you
should not use latitude and longitude...can anyone expand on this a little
for me, and what would be the best coordinate system to use?

I have my data in a geographic coordinate system, WGS84, decimal
degrees... is this the wrong format for such analyses?

I have also converted my data in the UTM projection and so have it in
metres (ranging from 480,000 to 550,000 E and 170,000 to 230,000 N).  


If I was to use the UTM coordinates, should I be using the actual
coordinates in metres, or should I convert this into an arbitrary coordinate
system (i.e. from 0 - 1) somehow?  


I have noticed that running an analysis on the data gives different results
depending on which type of system you use, so I want to make sure I have the
correct system.  I should also probably note that I am a geostatistical
novice!

Thanks,
  

Hi,

I use the gstat package for geostatistics. For doing the analysis I 
don't think it is necessary to convert to UTM. But maybe just do it to 
be on the safe side. If you use the spatial objects provided by the 
sp-package (http://cran.r-project.org/web/packages/sp/vignettes/sp.pdf) 
you can transform your data to other projections using the spTransform function (from the rgdal package).


Questions regarding geostatistics and spatial data will result in more 
answers on the r-sig-geo list.


cheers,
Paul

--
Drs. Paul Hiemstra
Department of Physical Geography
Faculty of Geosciences
University of Utrecht
Heidelberglaan 2
P.O. Box 80.115
3508 TC Utrecht
Phone: +31302535773
Fax:+31302531145
http://intamap.geo.uu.nl/~paul

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
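
A minimal sketch of that projection step (the coordinates are invented, and
the UTM zone must be chosen to match your own data), using sp together with
rgdal, which supplies the spTransform() methods for sp objects:

```r
library(sp)
library(rgdal)   # provides spTransform() methods for sp classes

d <- data.frame(lon = c(-1.30, -1.25), lat = c(51.55, 51.60), z = c(1, 2))
coordinates(d) <- ~ lon + lat                        # promote to SpatialPointsDataFrame
proj4string(d) <- CRS("+proj=longlat +datum=WGS84")  # declare the input CRS

# zone 30 happens to cover these made-up points; pick yours accordingly
d.utm <- spTransform(d, CRS("+proj=utm +zone=30 +datum=WGS84"))
coordinates(d.utm)   # eastings/northings in metres
```

After this, distance computations in gstat or geoR operate on planar metric
coordinates, as recommended above.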


Re: [R] nesting environments

2008-08-22 Thread Antje

Thanks a lot! I'll give it a closer look :-)

Ciao,
Antje



Gabor Grothendieck schrieb:

Look at the example section of ?new.env

Also the proto package on CRAN defines proto objects (which are just
environments with
slightly different semantics) and in the Applications section of the home page
has links to examples of using proto objects to create GUIs with gWidgets:
http://r-proto.googlecode.com

On Fri, Aug 22, 2008 at 8:54 AM, Antje <[EMAIL PROTECTED]> wrote:

Hello,

I got the hint to use nesting environments (for a GUI problem), but I'm not
sure whether I understand what he means (and I did not really succeed with
reading documentation - probably I looked for the wrong keywords?).
That's what he wrote:


Another good way to control the scoping is nesting environments (e.g.
through closures). That is, your object does not need to be global (in the
workspace) but it can be in a parent environment. In fact, global variables
won't even work when you put your code into a namespaced package, because
the namespace will be locked.

Can anybody explain it (probably with a simple example)
That would be great!

Thanks,
Antje

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] nesting environments

2008-08-22 Thread Gabor Grothendieck
Look at the example section of ?new.env

Also the proto package on CRAN defines proto objects (which are just
environments with
slightly different semantics) and in the Applications section of the home page
has links to examples of using proto objects to create GUIs with gWidgets:
http://r-proto.googlecode.com

On Fri, Aug 22, 2008 at 8:54 AM, Antje <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I got the hint to use nesting environments (for a GUI problem), but I'm not
> sure whether I understand what he means (and I did not really succeed with
> reading documentation - probably I looked for the wrong keywords?).
> That's what he wrote:
>
>> Another good way to control the scoping is nesting environments (e.g.
>> through closures). That is, your object does not need to be global (in the
>> workspace) but it can be in a parent environment. In fact, global variables
>> won't even work when you put your code into a namespaced package, because
>> the namespace will be locked.
>
> Can anybody explain it (probably with a simple example)
> That would be great!
>
> Thanks,
> Antje
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
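
A minimal sketch of the closure idea (not from the thread): the state lives
in the function's enclosing environment rather than in the global workspace,
which is the mechanism packages like proto build on:

```r
makeCounter <- function() {
  count <- 0                    # lives in makeCounter's own environment
  list(
    increment = function() {
      count <<- count + 1       # <<- assigns in the enclosing environment
      invisible(count)
    },
    get = function() count
  )
}

cnt <- makeCounter()
cnt$increment(); cnt$increment()
cnt$get()   # 2, with no global 'count' created
```

Each call to makeCounter() creates a fresh environment, so two counters
never interfere, and the code keeps working inside a locked namespace.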


Re: [R] swap

2008-08-22 Thread jim holtman
Is this what you want:

> swap <- function(z){
+ a <- sample(length(z), 2)
+ z[a] <- z[rev(a)]
+ z
+ }
> swap(1:10)
 [1]  2  1  3  4  5  6  7  8  9 10
>
> swap(1:10)
 [1]  5  2  3  4  1  6  7  8  9 10
> swap(1:10)
 [1]  8  2  3  4  5  6  7  1  9 10
> swap(1:10)
 [1]  2  1  3  4  5  6  7  8  9 10
> swap(1:10)
 [1]  4  2  3  1  5  6  7  8  9 10
> swap(swap(swap(1:10)))
 [1]  8 10  3  5  4  6  7  1  9  2
> swap(swap(swap(1:10)))
 [1]  8  2  7  4  5  6  3 10  9  1
>


On Fri, Aug 22, 2008 at 8:57 AM, amor Gandhi <[EMAIL PROTECTED]> wrote:
> Hello Richie,
> I would like to do three (or k) swap steps in each step just 2 ID recursive 
> swapping
> x <- 1:10
> swap <- function(x){
>   a <- sample(x,2)
>   x[x==a[1]] <- swap[2]
>   x[x==a[2]] <- swap[1]
>   return(x)
>   }
>   swap(swap(swap(x))) -> mix
>
> Is this possible?
> Thank you in advance!
> Amor
>
> __
>
>
>  Schutz gegen Massenmails.
>
>[[alternative HTML version deleted]]
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
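
To get the "three (or k) swap steps" the original poster asked for, the
corrected swap() above can simply be applied in a loop; a sketch:

```r
swap <- function(z) {
  a <- sample(length(z), 2)   # pick two positions at random
  z[a] <- z[rev(a)]           # exchange their contents
  z
}

# apply k successive pairwise swaps
kswap <- function(x, k) {
  for (i in seq_len(k)) x <- swap(x)
  x
}

mix <- kswap(1:10, 3)
```

Note that successive random swaps can undo each other, so mix may differ
from the input in fewer than 2*k positions.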


[R] swap

2008-08-22 Thread amor Gandhi
Hello Richie,
I would like to do three (or k) swap steps in each step just 2 ID recursive 
swapping
x <- 1:10
swap <- function(x){
  a <- sample(x,2)
  x[x==a[1]] <- swap[2]
  x[x==a[2]] <- swap[1]
  return(x)
  }
  swap(swap(swap(x))) -> mix
  
Is this possible?
Thank you in advance!
Amor


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] nesting environments

2008-08-22 Thread Antje

Hello,

I got the hint to use nesting environments (for a GUI problem), but I'm not 
sure whether I understand what he means (and I did not really succeed with 
reading documentation - probably I looked for the wrong keywords?).

That's what he wrote:

> Another good way to control the scoping is nesting environments (e.g. 
through closures). That is, your object does not need to be global (in the 
workspace) but it can be in a parent environment. In fact, global variables 
won't even work when you put your code into a namespaced package, because the 
namespace will be locked.


Can anybody explain it (probably with a simple example)
That would be great!

Thanks,
Antje

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Combining multiple datasets

2008-08-22 Thread jim holtman
Here is the way that I usually do it using lapply:

myData <- lapply(list.files(pattern=".RWL$"), function(.file)
read.rwl(.file, header=TRUE))
# combine into a single dataframe
myDF <- do.call(rbind, myData)



On Thu, Aug 21, 2008 at 10:42 PM, Alison Macalady <[EMAIL PROTECTED]> wrote:
> Hi,
> I've tried to figure this out using Intro to R and help(), to no avail - I
> am new at this.
>
> I'm trying to write a script that will read multiple files from a directory
> and then merge them into a single new data frame.
> The original data are in a tree-ring specific format, and so I've first used
> a function (read.rwl) from the dplR package to read each file, translate
> each into a more intuitive time series format, and then put each new data
> frame into a single object, demo.Rwl:
>
>>demo.Rwl <- list.files(pattern=".RWL$"); for(i in demo.Rwl) { x <-
>> read.rwl(i, header=TRUE); assign(print(i, quote=FALSE), x)}
>
> This part seems to work.  Now, I want to be able to put all of the
> individual data frames contained in demo.Rwl into a single data frame,
> merging by rows (rows are calendar years in the time series). I think I know
> how to do this by typing each data set name into a merge:
>
>>merge(x,y..., by.x=0, all=TRUE )
>
> However, I'd like to be able to script something that will save me the time
> of typing in all of the individual data set names, as I will be repeating
> this operation many times with many different numbers of original input
> files. Is there a way to code a merge such that every individual data frame
> contained in demo.Rwl (a different number every time) gets combined into a
> single data set?
>
> Thanks for the help!
>
> Ali
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Coordinate systems for geostatistics in R

2008-08-22 Thread Rubén Roa-Ureta

imicola wrote:

Hi,

I read somewhere that when carrying out geostatistical analysis in R you
should not use latitude and longitude...can anyone expand on this a little
for me, and what would be the best coordinate system to use?
  
Not only in R. In most systems, the inter-point distances are assumed to 
be planar (distances over an Euclidean space), whereas latitude and 
longitude are spherical. I guess there could be a geostatistical 
analysis based on spherical distances, but why to make things more 
complicated when projecting the spherical coordinates into planar 
coordinates b4 the analysis produces a good approximation and simplifies 
the analysis significantly? I use UTM coordinates, and transform from 
geodetic to metric with Eino Uikkanen's GeoConv program (it's free).



I have my data in a geographic coordinate system, WGS84, decimal
degrees... is this the wrong format for such analyses?
  
If the distances are short, it is not so wrong, and the wrongness 
increases with increasing distance.

I have also converted my data in the UTM projection and so have it in
metres (ranging from 480,000 to 550,000 E and 170,000 to 230,000 N).  


If I was to use the UTM coordinates, should I be using the actual
coordinates in metres, or should I convert this into an arbitrary coordinate
system (i.e. from 0 - 1) somehow?
It would be convenient to use km rather than m, so the range parameter 
would be closer in magnitude to the other parameters of the model. A 
very large range parameter in metres may cause numerical instability 
during minimization of the negative support function in likelihood-based 
models such as that implemented in geoR.
  


I have noticed that running an analysis on the data gives different results
depending on which type of system you use, so I want to make sure I have the
correct system.  I should also probably note that I am a geostatistical
novice!

Thanks,
  
Bottom line: your geostatistical software is probably based on distance 
calculations on an Euclidean space so it is wrong to input locations in 
spherical coordinates.

HTH
Ruben

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] subset grouped data with quantile and NA's

2008-08-22 Thread jim holtman
This will also remove the NAs from the output;  you will have to
change it to also keep the NAs.  Wasn't sure what you wanted to do
with them.

dat <- data.frame(fac = rep(c("a", "b"), each = 100),
 value = c(rnorm(130), rep(NA, 70)),
 other = rnorm(200))
# split the data
x.s <- split(dat, dat$fac, drop=TRUE)
# process the quantiles
x.l <- lapply(x.s, function(.fac){
# remove NAs from the output -- need to change if you want to keep NAs
.fac[(!is.na(.fac$value)) & (.fac$value <= quantile(.fac$value,
prob=0.95, na.rm=TRUE)),]
})
# put back into a dataframe
dat.new <- do.call(rbind, x.l)


On Fri, Aug 22, 2008 at 3:35 AM, David Carslaw
<[EMAIL PROTECTED]> wrote:
>
> I can't quite seem to solve a problem subsetting a data frame.  Here's a
> reproducible example.
>
> Given a data frame:
>
> dat <- data.frame(fac = rep(c("a", "b"), each = 100),
>  value = c(rnorm(130), rep(NA, 70)),
>  other = rnorm(200))
>
> What I want is a new data frame (with the same columns as dat) excluding the
> top 5% of "value" separately by "a" and "b". For example, this produces the
> results I'm after in an array:
>
> sub <- tapply(dat$value, dat$fac, function(x) x[x < quantile(x, probs =
> 0.95, na.rm = TRUE)])
>
> My difficulty is putting them into a data frame along with the other columns
> "fac" and "other". Note that quantile will return different length vectors
> due to different numbers of NAs for a and b.
>
> There's something I'm just not seeing - can you help?
>
> Many thanks.
>
> David Carslaw
>
> -
> Institute for Transport Studies
> University of Leeds
> --
> View this message in context: 
> http://www.nabble.com/subset-grouped-data-with-quantile-and-NA%27s-tp19102795p19102795.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] random effects in lmer

2008-08-22 Thread Line Johansen

Dear all,
I am running several generalized mixed models using lmer. The models 
typically look like this: 
model2xx<-lmer(numbers~Treatment+b+(1|Genotype)+(1|Field)+(1|Genotype:Treatment), 
family=quasipoisson)
All factors are categorical.  


And the output looks like this:
Generalized linear mixed model fit using Laplace
formula: numbers ~ Treatment + Dead + (1|Genotype) + (1|Field) + (1|Genotype:Treatment),
family = quasipoisson

Family: quasipoisson(log link)

 AIC  BIC logLik deviance

1033 1051 -509.4 1019

Random effects:
 Groups             Name        Variance Std.Dev.
 Genotype:Treatment (Intercept) 0.40555  0.63683
 Genotype           (Intercept) 1.16642  1.08001
 Field              (Intercept) 1.23738  1.11238
 Residual                       9.47740  3.07854
number of obs: 100, groups: Genotype1:Treatment1, 14; Genotype1, 5; 
Felt1, 4


Fixed effects:

             Estimate Std. Error t value
(Intercept)   3.20061    0.80010   4.000
Treatment a  -0.05482    0.42619  -0.129
Treatment c   0.08316    0.46395   0.179
Dead No       0.27604    0.14873   1.856

Correlation of Fixed Effects:
            (Intr) Trtmnta Trtmntc
Treatment a -0.247 
Treatment c -0.239  0.450  
Dead No     -0.111 -0.132   0.003  


I need some help to evaluate the importance of the random factors. The 
random factor of interest is Genotype. I have tried deleting random 
factors from the model and comparing the result with the original model 
by log likelihood-ratio statistics. Is this an appropriate method for 
testing the random factors in lmer? Is it possible to evaluate how much 
of the total variation the random factor Genotype explains?
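The comparison described here could be sketched as follows (a sketch only: the data object name is made up, the family argument mirrors the call above, and note that a likelihood-ratio test of a variance component sits on the boundary of the parameter space, so the naive p-value tends to be conservative):

    library(lme4)

    # full model, and a reduced model with the Genotype random effect removed
    full    <- lmer(numbers ~ Treatment + Dead + (1 | Genotype) + (1 | Field) +
                      (1 | Genotype:Treatment), family = quasipoisson, data = dat)
    reduced <- lmer(numbers ~ Treatment + Dead + (1 | Field) +
                      (1 | Genotype:Treatment), family = quasipoisson, data = dat)

    # likelihood-ratio comparison of the two fits
    anova(full, reduced)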
I am a new user of lmer and my questions are probably very naive, 
but I appreciate any help.
Thanks for the help.


Line



--

Line Johansen   

Department of Biology   
Norwegian University of Natural Science and Technology

Høgskoleringen 15
No-7491 Trondheim
Phone: 73 55 12 94

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] simple generation of artificial data with defined features

2008-08-22 Thread drflxms
Dear R-colleagues,

I am quite a newbie to R fighting my stupidity to solve a probably quite
simple problem of generating artificial data with defined features.

I am conducting a study of inter-observer-agreement in
child-bronchoscopy. One of the most important measures is Kappa
according to Fleiss, which is very comfortable available in R through
the irr-package.
Unfortunately medical doctors like me don't really understand much of
statistics. Therefore I'd like to give the reader an easy understandable
example of Fleiss-Kappa in the Methods part. To achieve this, I obtained
a table with the results of the German election from 2005:

party    number of votes    percent

SPD      16194665           34,2
CDU      13136740           27,8
CSU       3494309            7,4
Gruene    3838326            8,1
FDP       4648144            9,8
PDS       4118194            8,7

I want to show the agreement of voters measured by Fleiss-Kappa. To
calculate this with the kappam.fleiss-function of irr, I need a
data.frame like this:

        (id of 1st voter)  (id of 2nd voter)  ...
party   spd                cdu                ...

Of course I don't plan to calculate this with the million of cases
mentioned in the table above (I am working on a small laptop). A
division by 1000 would be more than perfect for this example. The exact
format of the table is generally not so important, as I could reshape
nearly every format with the help of the reshape-package.

Unfortunately I could not figure out how to create such a
fictive/artificial dataset as described above. Any data.frame would be
nice, that keeps at least the percentage. String-IDs of parties could be
substituted by numbers of course (would be even better for function
kappam.fleiss in irr!).
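One way to sketch such a fictive dataset in R, scaling the vote counts down by 1000 so the proportions are kept (a sketch only; the counts below are the table's figures rounded to thousands):

    # votes per party, in thousands (rounded from the table above)
    votes <- c(SPD = 16195, CDU = 13137, CSU = 3494,
               Gruene = 3838, FDP = 4648, PDS = 4118)

    # one row per fictive voter, labelled with the chosen party
    dat <- data.frame(id    = seq_len(sum(votes)),
                      party = rep(names(votes), times = votes))

    table(dat$party)  # counts in proportion to the election result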

I would appreciate any kind of help very much indeed.
Greetings from Munich,

Felix Mueller-Sarnowski

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] barplot with anchored bars

2008-08-22 Thread Jim Lemon
On Thu, 2008-08-21 at 15:47 -0300, Paulo Barata wrote:
> Dear R list members,
> 
> How to produce barplots anchored to the x-axis (not floating
> above the x-axis) with a box around?
> 
> With both following codes, the lower horizontal line of the
> box is below the y = 0 line:
> 
> # first code
> x <- c(1,2,3,4)
> barplot(x,yaxs='i')
> box()
> 
> # second code
> x <- c(1,2,3,4)
> op <- par(yaxs='i')
> barplot(x)
> box()
> par(op)
> 
> The parameter yaxs='i' does not seem to work with barplot.
> 
> I use R version 2.7.1, running on Windows XP.
> 
Hi Paolo,
Have a look at the barp function in the plotrix package.

Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Bias reduction method

2008-08-22 Thread Julio Rojas
Dear all, I'm doing a simulation study of an M/M/1 queue and I would like to 
know if there are bias-correction methods in R for time series (estimation of 
warm-up points or something else).

Hope you can help me.



  

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] help needed for HWE.exact in library "genetics"

2008-08-22 Thread Neil Shephard

You could follow the advice given when loading the library (see the code you
posted for details) and use the enhanced genetics packages to cross-validate
the results (ideally you should get the same answer).

The results are not weird, though.

You're working with SNPs, and having a homozygote with a frequency of zero
isn't that unlikely; it depends on the minor allele frequency (MAF).  In
this instance it's the controls you appear to be concerned about, and for
this locus the MAF is 0.0145, so in a sample of 173
individuals you would expect to see 0.0363 individuals with the recessive
genotype (basic Mendelian genetics: 173 * 0.0145^2).  Thus at best you
might expect to see one person in a sample of that size, and it's not
surprising that you've not seen any.
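The expected count above is just this arithmetic, which can be checked directly in R:

    maf <- 0.0145   # minor allele frequency at this locus
    n   <- 173      # number of control individuals
    n * maf^2       # expected count of minor-allele homozygotes
    # [1] 0.03637325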

You may also find the following references of interest...

Guangyong Zou, Allan Donner
The Merits of Testing Hardy-Weinberg Equilibrium in the Analysis of
Unmatched Case-Control Data: A Cautionary Note
Annals of Human Genetics
Volume 70, Issue 6, Pages 923-933

(basically advocates the use of a robust test for association such as the
trend test over the two-stage design of testing for HWeqm and then testing
for association).

There is follow up discussion of the article in 

Yik Y. Teo, Andrew E. Fry, Taane G. Clark, E. S. Tai, Mark Seielstad
On the Usage of HWE for Identifying Genotyping Errors 
Annals of Human Genetics
Volume 71 Issue 5 , Pages 701 - 703

...and a rejoinder from Zou and Donner...

Guangyong Zou, Allan Donner
The Reply of Zou & Donner to Teo's Commentary on their Manuscript (p
704-704)
Annals of Human Genetics
Volume 71 Issue 5 , Pages 701 - 703

Neil

Sun, Ying [BSD] - HGD wrote:
> 
> 
>> library( "genetics" )
> NOTE: THIS PACKAGE IS NOW OBSOLETE.
>   The R-Genetics project has developed an set of enhanced genetics
>   packages to replace 'genetics'. Please visit the project homepage
>   at http://rgenetics.org for informtion.
> 

-- 
View this message in context: 
http://www.nabble.com/help-needed-for-HWE.exact-in-library-%22genetics%22-tp19100974p19105112.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] call perl

2008-08-22 Thread Rory.WINSTON
Maybe

?system

Might be what you need here?
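For instance, a minimal sketch (the script name here is made up):

    # run a Perl script from R and capture its standard output
    out <- system("perl myscript.pl", intern = TRUE)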

-Rory


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jay an
Sent: 22 August 2008 03:08
To: r-help@r-project.org
Subject: [R] call perl

Hi,

It may be the old question.
can anyone tell me how to call perl in R?
thanks

Y.





[[alternative HTML version deleted]]


***
The Royal Bank of Scotland plc. Registered in Scotland No 90312. Registered 
Office: 36 St Andrew Square, Edinburgh EH2 2YB. 
Authorised and regulated by the Financial Services Authority 

This e-mail message is confidential and for use by the=2...{{dropped:22}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Large data sets with R (binding to hadoop available?)

2008-08-22 Thread Rory.WINSTON
Hi

Apart from database interfaces such as sqldf which Gabor has mentioned, there 
are also packages specifically for handling large data: see the "ff" package, 
for instance.

I am currently playing with parallelizing R computations via Hadoop. I haven't 
looked at PIG yet though.

Rory


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Roland Rau
Sent: 21 August 2008 20:04
To: Avram Aelony
Cc: r-help@r-project.org
Subject: Re: [R] Large data sets with R (binding to hadoop available?)

Hi

Avram Aelony wrote:
>
> Dear R community,
>
> I find R fantastic and use R whenever I can for my data analytic needs.
> Certain data sets, however, are so large that other tools seem to be
> needed to pre-process data such that it can be brought into R for
> further analysis.
>
> Questions I have for the many expert contributors on this list are:
>
> 1. How do others handle situations of large data sets (gigabytes,
> terabytes) for analysis in R ?
>
I usually try to store the data in an SQLite database and interface via 
functions from the packages RSQLite (and DBI).
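A minimal sketch of that workflow (the file and table names are made up):

    library(RSQLite)

    # open (or create) an SQLite database file
    con <- dbConnect(dbDriver("SQLite"), dbname = "big.db")

    # load the raw data into the database once...
    dbWriteTable(con, "mydata", read.csv("big.csv"))

    # ...then query only the subset needed for each analysis
    sub <- dbGetQuery(con, "SELECT * FROM mydata WHERE value > 0")

    dbDisconnect(con)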

No idea about Question No. 2, though.

Hope this helps,
Roland


P.S. When I am sure that I only need a certain subset of large data sets, I 
still prefer to do some pre-processing in awk (gawk).
2.P.S. The size of my data sets are in the gigabyte range (not terabyte range). 
This might be important if your data sets are *really large* and you want to 
use sqlite: http://www.sqlite.org/whentouse.html

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

***
The Royal Bank of Scotland plc. Registered in Scotland No 90312. Registered 
Office: 36 St Andrew Square, Edinburgh EH2 2YB. 
Authorised and regulated by the Financial Services Authority 

This e-mail message is confidential and for use by the=2...{{dropped:22}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Coordinate systems for geostatistics in R

2008-08-22 Thread imicola

Hi,

I read somewhere that when carrying out geostatistical analysis in R you
should not use latitude and longitude...can anyone expand on this a little
for me, and what would be the best coordinate system to use?

I have my data in a geographic coordinate system, WGS84, decimal
degrees... is this the wrong format for such analyses?

I have also converted my data in the UTM projection and so have it in
metres (ranging from 480,000 to 550,000 E and 170,000 to 230,000 N).  

If I was to use the UTM coordinates, should I be using the actual
coordinates in metres, or should I convert this into an arbitrary coordinate
system (i.e. from 0 - 1) somehow?  

I have noticed that running an analysis on the data gives different results
depending on which type of system you use, so I want to make sure I have the
correct system.  I should also probably note that I am a geostatistical
novice!

Thanks,
-- 
View this message in context: 
http://www.nabble.com/Coordinate-systems-for-geostatistics-in-R-tp19104598p19104598.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] swap

2008-08-22 Thread Richard . Cotton
> I wonder if there is any swap function in R that does the following:
> x <- seq(1,10,12)

This just makes x equal to 1.  I'm guessing you meant something like
x <- 1:10

> x1 <- swap(x)
> x1 
> 1 8 3 4 5 6 7 2 10

It's not terribly clear what you want swap to do.  You can reorder the 
elements of x using sample, e.g.
sample(x)
[1]  6 10  8  4  2  5  1  9  3  7
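If the intention was instead to exchange two fixed positions (the example output in the question looks like positions 2 and 8 were exchanged), a small helper could be sketched:

    # exchange the elements of x at two chosen positions i and j
    swap <- function(x, i, j) {
      x[c(i, j)] <- x[c(j, i)]
      x
    }
    swap(1:10, 2, 8)
    # [1]  1  8  3  4  5  6  7  2  9 10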

Regards,
Richie.

Mathematical Sciences Unit
HSL



ATTENTION:

This message contains privileged and confidential inform...{{dropped:20}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help on competing risk package cmprsk with time dependent covariate

2008-08-22 Thread Arthur Allignol

Hello,

There is something I don't understand
in your question:
is treatment a time-dependent covariate?
That is, do patients receive the treatment
at the beginning of the study or later?

cmprsk cannot handle time-dependent covariates.

But if treatment is a baseline covariate
that has a time-varying effect (i.e. the subdistribution hazard 
ratio varies with time), your solution
to assess that is odd, because you will transform
your baseline covariate into a time-dependent one,
thus considering all the patients to receive no treatment during
the first year. For sure, the treatment won't have any
effect for the first year.
To assess a time-varying effect on competing risks,
i would either follow the cmprsk documentation, including
an interaction with functions of time, or use the comp.risk
function in the timereg package, which fits more flexible
models for the cumulative incidence functions.
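The cmprsk route mentioned above (an interaction between a baseline covariate and a function of time) could be sketched like this; the object names are made up, and the idea is that column j of cov2 is multiplied elementwise by column j of the matrix returned by tf:

    library(cmprsk)

    # treat: baseline 0/1 treatment indicator;
    # the tf() column switches the treatment term on only after one year
    fit <- crr(ftime, fstatus,
               cov1 = cbind(treat),
               cov2 = cbind(treat),
               tf   = function(uft) cbind(as.numeric(uft > 1)),
               failcode = 1, cencode = 0)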

Best regards,
Arthur Allignol


Philippe Guardiola wrote:

Dear R users,


I'd like to assess the effect of the "treatment" covariate on disease relapse risk 
with the package cmprsk. 
However, the effect of this covariate on survival is time-dependent 
(assessed with cox.zph): no significant effect during the first year of 
follow-up, then after 1 year a favorable effect is observed on survival (a step 
function might be the correct way to say that?). 
For overall survival analysis I have used a time-dependent Cox model, which has 
confirmed this positive effect after 1 year.
Now I'm moving to disease relapse incidence, and a similar time dependency seems 
to be present. 


What I'd like to have is this: for
patients without "treatment", the code for the "treatment" covariate is
always 0, and for patients who received "treatment", I'd like
it to be 0 during the time interval 0 to 1 year, and equal to 1 after 1
year. Correct me if I'm wrong in trying to do so.


First, I ran the following script (R 2.7.1 under XP Pro) according to 
previous advice:

library(cmprsk)
attach(LAMrelapse)
fit1 <- crr(rel.t, rel.s, treatment, treatment,
            function(uft) cbind(ifelse(uft <= 1, 1, 0), ifelse(uft > 1, 1, 0)),
            failcode = 1, cencode = 0, na.action = na.omit,
            gtol = 1e-06, maxiter = 10)
fit1

where:
rel.t = time to event (in years)
rel.s = status , =1 if disease relapse, =2 if death from non disease
related cause (toxicity of previous chemotherapy), =0 if alive &
not in relapse
treatment =
binary covariate (value: 0 or 1) representing the treatment to test
(different from chemotherapy above, with no known toxicity)
I have not yet added other covariates in the model.


this script gave me the following result:
fit1 <- crr(relcmp.t, relcmp.s, treatment, treatment, function(uft) cbind(ifelse(uft <= 1, 1, 0), ifelse(uft > 1, 1, 0)), failcode = 1, cencode = 0, 

na.action = na.omit, gtol = 1e-006, maxiter = 10)

fit1
convergence:  TRUE 
coefficients:

[1] -0.6808  0.7508
standard errors:
[1] 0.2881 0.3644
two-sided p-values:
[1] 0.018 0.039

...That I don't understand at all, since it looks as if the "treatment"
covariate also had a significant effect in the first period of time!? 
This is absolutely not the case. 
So I'm surely wrong with a part of this script... cov2 and tf are
pretty obscure to me in the help file of the package. I would really
appreciate advice regarding these two terms. 


I was thinking that I might change: cbind(ifelse(uft <= 1, 1, 0), ifelse(uft > 1, 1, 0))
into: cbind(ifelse(uft <= 1, 0, 1), ifelse(uft > 1, 1, 0))

But since I only have one covariate (treatment) to test, shouldn't I just write 
the following:
fit1 <- crr(rel.t, rel.s, treatment, treatment,
            function(uft) ifelse(uft <= 1, 0, 1),
            failcode = 1, cencode = 0, na.action = na.omit,
            gtol = 1e-06, maxiter = 10)

which gives me :

fit1
convergence:  TRUE 
coefficients:

[1]  0.06995 -0.75080
standard errors:
[1] 0.2236 0.3644
two-sided p-values:
[1] 0.750 0.039

which, if I understand things
correctly (I'm not sure at all!), confirms that before 1 year the effect of 
the "treatment" covariate is not
significant, but is significant after 1 year of follow-up. But there again I'm 
not sure of the result I obtain...

Any help would be greatly appreciated with cov2 and tf.

Thanks if you have some time for this,


Philippe Guardiola


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Very confused with class

2008-08-22 Thread Dan Davison

Hi Rich,


Richard M. Heiberger wrote:
> 
> Dan,
> 
> The real problem is the use of csv files.  csv files don't handle missing
> values
> ("#VALUE" is most likely from Excel), dates, or other complications very
> well.
> 
> Read your Excel file directly into
> R with one of the packages designed specifically for that purpose.  I
> recommend
> RExcel (Windows only) which allows complete two-way communication between
> R
> and Excel.
> Missing values and dates are handled correctly.
> You can download the RExcelInstaller package from CRAN.
> 

I'm sure RExcel is an excellent technology. However, it is an unnecessarily
complex technology in this instance. What I was trying to do was help the
original poster read in tabular data stored in a standard text format, which
is a fundamental skill for any R programmer. In general, I would encourage
people (beginners especially) to avoid the use of hi-tech solutions, when
simple text-based solutions suffice. But when people do need to have more
sophisticated integration of R and e.g. Excel, it's nice that the tools
exist.

Dan


Richard M. Heiberger wrote:
> 
> 
> Rich
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Very-confused-with-class-tp19090246p19104343.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] swap

2008-08-22 Thread amor Gandhi
Hello everybody,
 
I wonder if there is any swap function in R that does the following:
x <- seq(1,10,12)
x1 <- swap(x)
x1 
1 8 3 4 5 6 7 2 10
Thank you very much!
 
Amor


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] psychometric functions

2008-08-22 Thread Mario Maiworm
Paul,
thanks a lot for your advice! I got the Kuss et al. paper and started to
read; this is more than I had expected. Very cool paper, and an algorithm
implemented in R that seems to be even superior to psignifit in Matlab. So
thanks again!
Mario


__

Mario Maiworm
Biological Psychology and Neuropsychology
University of Hamburg
Von-Melle-Park 11
D-20146 Hamburg

Phone: +49 40 42838 8265
Fax: +49 40 42838 6591

http://bpn.uni-hamburg.de/Maiworm_e.html
http://cinacs.org
__

>>> -Original Message-
>>> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
>>> project.org] On Behalf Of Paul Artes
>>> Sent: Thursday, August 21, 2008 5:37 PM
>>> To: r-help@r-project.org
>>> Subject: Re: [R] psychometric functions
>>> 
>>> 
>>> There is a nice paper by Yssaad-Fesselier and Knoblauch on "Modelling
>>> Psychometric Functions in R".
>>> http://hal.archives-ouvertes.fr/docs/00/13/17/99/PDF/B125.pdf
>>> 
>>> You might also be interested in this:
>>> http://www.journalofvision.org/5/5/8/article.aspx
>>> which comes from the same group as the psignifit toolbox for matlab
>>> (methinks), but is a step ahead.
>>> 
>>> Best wishes
>>> 
>>> Paul
>>> --
>>> View this message in context: http://www.nabble.com/psychometric-
>>> functions-tp19086590p19091317.html
>>> Sent from the R help mailing list archive at Nabble.com.
>>> 
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-
>>> guide.html
>>> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] chisq

2008-08-22 Thread Ted Harding
On 22-Aug-08 08:19:15, Marco Chiapello wrote:
> Hi,
> the chisq value for .05 level of signif. and 3 DF is 7.81 (one-sided).
> Is there a way to calculated it in R (giving level of signif. and DF)?
> 
> Marco

  qchisq(0.95,df=3)
# [1] 7.814728
  qchisq(0.05,df=3,lower.tail=FALSE)
# [1] 7.814728

Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 22-Aug-08   Time: 10:15:36
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Retrieving sample points

2008-08-22 Thread Yemi Oyeyemi
Hi everyone. My problem is about multivariate analysis. Given multivariate 
data X of dimension (n by p), where n is the number of observations and p is the 
number of variables, I want a sample of size p+1 whose product of the 
eigenvalues of the covariance matrix is the minimum among all possible 
samples (n choose (p+1)). I have written a syntax which works 
well, but the problem is that it does not work when the dimension of X is large 
(when n is greater than 25). I think the problem is memory size; is there 
any way I can maximize the memory?
Here is the syntax that I used.
 
bestunits <- function(x){
  n <- nrow(x)
  p <- ncol(x)
  h <- p+1
  nchoosek <- function(n,h){
    factorial(n)/(factorial(h+1)*factorial(n-h-1))}
  vec11 <- matrix(nrow=nchoosek(n,h), ncol=p)
  vec12 <- matrix(nrow=nchoosek(n,h), ncol=h)
  for (i in 1:nchoosek(n,h)){
    flag <- sample(1:n, h, replace=FALSE)
    z <- x[flag,]
    y <- eigen(var(z), only.values=TRUE)$values
    vec11[i,] <- y
    vec12[i,] <- flag
  }
  product <- apply(vec11,1,prod)
  best <- which.min(product)
  sample <- vec12[best,]
  bestsample <- x[sample,]
  return(bestsample)
}
Thanks
Oyeyemi, Gafar Matanmi
Department of Statistics
University of Ilorin
Ilorin, Nigeria


  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

