Re: [R] Dropping rows conditionally

2009-03-04 Thread Adrian Dusa
Hi Lazarus,
It would be simpler with mdat as a matrix (before coercing to a data.frame). 
There may be a more direct way to compare a matrix with a vector, but I can't 
find it at the moment; in any case, this works:
mdatT <- matrix(mdat %in% c(1, 11, 20), ncol=3)
> mdat[!apply(mdatT, 1, any), ]
 C.1 C.2 C.3
row2   4   5   6
row3   7   8   9
row5  13  14  15
row6  16  17  18
Or you can use apply directly on a data.frame, with the same result:
mdat <- as.data.frame(mdat)
> mdat[!apply(mdat, 1, function(x) any(x %in% c(1, 11, 20))), ]
 C.1 C.2 C.3
row2   4   5   6
row3   7   8   9
row5  13  14  15
row6  16  17  18
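Since the original question also mentions `NA', note that x %in% c(1, 11, 20) is 
FALSE for NA, so missing values would not be caught by the test above. A sketch 
that also drops rows containing NAs (the NA in row 4 is introduced here purely 
for illustration):

```r
mdat <- matrix(1:21, nrow = 7, ncol = 3, byrow = TRUE,
               dimnames = list(paste0("row", 1:7), c("C.1", "C.2", "C.3")))
mdat <- as.data.frame(mdat)
mdat[4, 2] <- NA  # introduce a missing value for illustration

# drop a row if any cell is NA or matches one of the unwanted values
drop <- apply(mdat, 1, function(x) any(is.na(x) | x %in% c(1, 11, 20)))
mdat[!drop, ]  # rows 2, 3, 5 and 6 remain
```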

hth,
Adrian

On Thursday 05 March 2009, Lazarus Mramba wrote:
> Dear R-help team,
>
> I am getting addicted to using R but keep on getting many challenges on the
> way especially on data management (data cleaning).
>
> I have been wanting to drop all the rows if their values are  `NA' or have
> specific values like 1 or 2 or 3.
>
>
> mdat <- matrix(1:21, nrow = 7, ncol=3, byrow=TRUE,
>dimnames = list(c("row1",
> "row2","row3","row4","row5","row6","row7"), c("C.1", "C.2", "C.3")))
> mdat<-data.frame(mdat)
> mdat
>
>   C.1 C.2 C.3
> row1   1   2   3
> row2   4   5   6
> row3   7   8   9
> row4  10  11  12
> row5  13  14  15
> row6  16  17  18
> row7  19  20  21
>
> I want to say drop row if value=1 or value =11 or value =20
>
> How do I do that?
>
>
> Kind regards,
> Lazarus Mramba


-- 
Adrian Dusa
Romanian Social Data Archive
1, Schitu Magureanu Bd.
050025 Bucharest sector 5
Romania
Tel.:+40 21 3126618 \
 +40 21 3120210 / int.101
Fax: +40 21 3158391



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question about the use of large datasets in R

2009-03-04 Thread Thomas Lumley

On Wed, 4 Mar 2009, Vadlamani, Satish {FLNA} wrote:


Hi:
Sorry if this is a double post. I posted the same thing this morning and did 
not see it.

I just started using R and am asking the following questions so that I can plan 
for the future when I may have to analyze volume data.

1) What are the limitations of R when it comes to handling large datasets? 
Say, for example, something like a 200M-row by 15-column data frame (between 
1.5 and 2 GB in size)? Will the limitation be based on the specifications of 
the hardware or on R itself?


It depends a lot on what you want to do.  The default situation in R is that 
all the data are loaded into memory, in which case the rule of thumb is that 
you want data sets no larger than 1/3 of memory. If you have, say, a system 
with 8Gb memory and a 64-bit version of R you should be ok.

It is often possible to work with much larger data sets than this; you just 
need to arrange for the whole thing not to be loaded simultaneously.  The right 
strategy depends on the problem.

For example, linear and generalized linear models on large data sets can be 
fitted with the biglm package.  The various database interface packages and the 
packages for netCDF and HDF5 allow subsets of a data set to be loaded easily. 
Packages such as bigmemory and ff allow at least some operations to be carried 
out on file-backed data objects.
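As an illustration of the chunking idea with biglm (a sketch: the file name, the
simulated data, and the 250-row chunk size are all invented for illustration),
the model is fitted to the first chunk and then updated with each subsequent
chunk, so only one chunk is ever in memory:

```r
library(biglm)

# write a made-up CSV so the sketch is self-contained
set.seed(42)
dat <- data.frame(x1 = rnorm(1000), x2 = rnorm(1000))
dat$y <- 1 + 2 * dat$x1 - 3 * dat$x2 + rnorm(1000)
write.csv(dat, "bigdata.csv", row.names = FALSE)

# fit on the first chunk, then fold in the remaining chunks one at a time
con <- file("bigdata.csv", open = "r")
chunk <- read.csv(con, nrows = 250)
cols  <- names(chunk)
fit   <- biglm(y ~ x1 + x2, data = chunk)
repeat {
  chunk <- tryCatch(read.csv(con, header = FALSE, nrows = 250,
                             col.names = cols),
                    error = function(e) NULL)  # read.csv errors at end of file
  if (is.null(chunk)) break
  fit <- update(fit, chunk)   # update.biglm folds the new chunk into the fit
}
close(con)
unlink("bigdata.csv")
coef(fit)
```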



2) Is R 32 bit compiled or 64 bit (on say Windows and AIX)


On AIX, 64 bit. On Windows, currently only 32-bit although there is work 
towards a 64-bit version.


4) Should I also be looking at SAS, for this reason alone? (We do have SAS 
in-house, but the problem is that I am still not sure what we have a license 
for, etc.)


I would guess that it would be cheaper to buy hardware on which the problem can 
be solved in R than to buy a SAS license (last time I looked, suitable 
rack-mount Linux boxes were under USD3000). If you already have SAS available 
it would be worth looking at it. For some large-data problems it will be faster 
or easier to use, but not for all.


 -thomas

Thomas Lumley   Assoc. Professor, Biostatistics
tlum...@u.washington.eduUniversity of Washington, Seattle



Re: [R] Inference for R Spam

2009-03-04 Thread Wacek Kusnierczyk
Rolf Turner wrote:
>
> Sports scores are random variables.  You don't know a priori what the
> scores are
> going to be, do you?  (Well, if you do, you must be able to make a
> *lot* of money
> betting on games!)  After the game is over they aren't random any
> more; they're
> just numbers.  But that applies to any random variable.  A random
> variable is
> random only until it is observed, then POOF! it turns into a number.
>

may i respectfully disagree?

to call for a reference, [1] says (p. 26, def. 1.4.1):

a random variable is a function from sample space S into the real
numbers.

and it's a pretty standard definition.

do you really turn a *function* into a *number* by *observing the
function*?  in the example above, you have a sample space, which
consists of possible outcomes of a class of sports events.  you have a
random variable -- a function that maps from the number of goals into,
well, the number of goals. 

after a sports event, the function is no less random, and no more a
number.  you have observed an event, you have computed one realization
of the function (here's your number, which happens to be an integer) --
but the random variable does not turn into anything.
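the point can even be made concrete in R (a toy sketch; the die-roll sample
space and the squaring map are invented for illustration):

```r
S <- 1:6                       # sample space of a die roll
X <- function(omega) omega^2   # a random variable: a function on S

omega_obs <- sample(S, 1)      # observe one outcome
x_obs     <- X(omega_obs)      # one realization -- just a number

is.function(X)     # TRUE: still a function after the observation
is.numeric(x_obs)  # TRUE: the realization, not the variable, is the number
```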

vQ


[1] Casella, Berger. Statistical Inference, 1st 1990



[R] Dropping rows conditionally

2009-03-04 Thread Lazarus Mramba

Dear R-help team,

I am getting addicted to using R but keep on getting many challenges on the way 
especially on data management (data cleaning).

I have been wanting to drop all the rows if their values are  `NA' or have 
specific values like 1 or 2 or 3.


mdat <- matrix(1:21, nrow = 7, ncol=3, byrow=TRUE,
   dimnames = list(c("row1", 
"row2","row3","row4","row5","row6","row7"),
   c("C.1", "C.2", "C.3")))
mdat<-data.frame(mdat)
mdat

  C.1 C.2 C.3
row1   1   2   3
row2   4   5   6
row3   7   8   9
row4  10  11  12
row5  13  14  15
row6  16  17  18
row7  19  20  21

I want to say drop row if value=1 or value =11 or value =20

How do I do that?


Kind regards,
Lazarus Mramba
Junior Statistician
P.O Box 986, 80108,
Kilifi, Kenya
Mobile No. +254721292370
Tel: +254 41 522063
Tel: +254 41 522390
(office extension : 419)

This e-mail (including any attachment to it) contains information
which is confidential. It is intended only for the use of the named
recipient. If you have received this e-mail in error, please let us know
by replying to the sender, and immediately delete it from your system.
Please note, that in these circumstances, the use, disclosure,
distribution or copying of this information is strictly prohibited. We
apologize for any inconvenience that may have been caused to you.
KEMRI-Wellcome Trust Programme cannot accept any responsibility for the accuracy
or completeness of this message as it has been transmitted over a public
network. KEMRI-Wellcome Trust Programme reserves the right to monitor all 
incoming and
outgoing email traffic. Although the Programme has taken reasonable
precautions to ensure no viruses are present in emails, it cannot
accept responsibility for any loss or damage arising from the use of the
email or attachments. Any views expressed in this message are those of
the individual sender, except where the sender specifically states them
to be the views of KEMRI-Wellcome Trust Programme.



Re: [R] new r user

2009-03-04 Thread Daniel Nordlund
> -Original Message-
> From: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] On Behalf Of Kris
> Sent: Wednesday, March 04, 2009 8:28 PM
> To: r-help@r-project.org
> Subject: [R] new r user
> 
> When adding a trend line to a scatterplot (e.g. abline(90, 4, col="red")), I
> believe the "90" is the intercept and "4" is the slope.  How do I determine
> the intercept and slope for the abline command?
> 
>  
> 
> Kristopher R. Deininger
> Management Strategy & Entrepreneurship
> PhD Student 2012
> 
> Robert H. Smith School of Business
> University of Maryland
> 3330L Van Munching Hall
> College Park, MD 20742-1815
> 

If you are asking how to find the least-squares estimates for slope and
intercept for a given set of data, then look at 

?lm

If you are asking about something else, then you will need to provide more
context.
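A minimal sketch (on made-up data) of going from lm to abline:

```r
set.seed(1)
x <- 1:20
y <- 90 + 4 * x + rnorm(20, sd = 5)  # made-up data scattered around a line

fit <- lm(y ~ x)   # least-squares estimates
coef(fit)          # named vector: (Intercept) and x (the slope)

plot(x, y)
abline(fit, col = "red")  # abline accepts an lm fit directly
```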

Hope this is helpful,

Dan

Daniel Nordlund
Bothell, WA USA



[R] new r user

2009-03-04 Thread Kris
When adding a trend line to a scatterplot (e.g. abline(90, 4, col="red")), I
believe the "90" is the intercept and "4" is the slope.  How do I determine
the intercept and slope for the abline command?

 

Kristopher R. Deininger
Management Strategy & Entrepreneurship
PhD Student 2012

Robert H. Smith School of Business
University of Maryland
3330L Van Munching Hall
College Park, MD 20742-1815

301-405-0878 OFFICE
806-441-6697 MOBILE
301-314-9120 FAX

  kdeinin...@rhsmith.umd.edu
  http://www.rhsmith.umd.edu

 





Re: [R] new user

2009-03-04 Thread stephen sefick
try
?abline

everything should be there for you

stephen sefick

On Wed, Mar 4, 2009 at 11:30 PM, kris deininger
 wrote:
>
>
>
> When adding a trend line to a scatterplot (e.g. abline(90, 4, col="red")), I
> believe the "90" is the intercept and "4" is the slope.  How do I determine
> the intercept and slope for the abline command?
>
>
>
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>



-- 
Stephen Sefick

Let's not spend our time and resources thinking about things that are
so little or so large that all they really do for us is puff us up and
make us feel like gods.  We are mammals, and have not exhausted the
annoying little problems of being mammals.

-K. Mullis



Re: [R] Bug in by() with dates as levels?

2009-03-04 Thread hadley wickham
On Wed, Mar 4, 2009 at 5:22 PM, oren cheyette  wrote:
> Trying to use dates in their R-native form (e.g., POSIXct) rather than
> turning them into character strings, I've encountered the following problem.
> I create a data frame where one column is dates. Then I use "by()" to do a
> calculation on grouped subsets of the data. When I try to extract values
> from the result, I get "subscript out of bounds". The example below shows
> the error.
>
>> x <- data.frame(A= c("X", "Y", "X", "X", "Y", "Y", "Z" ), D =
> as.POSIXct(c("2008-11-03","2008-11-03","2008-11-03", "2008-11-04",
> "2009-01-13", "2009-01-13", "2009-01-13")), Z = 1:7)
>> m <- by(x, list(A=x$A, D=x$D), function(v) { sum(v$Z); })
>> m[x$A[1], x$D[1]]
> Error: subscript out of bounds
>
>
> Rgds,
> Oren Cheyette

You might also want to take a look at the plyr package:

install.packages("plyr")
library(plyr)
ddply(x, .(A, D), function(df) sum(df$Z))
dlply(x, .(A, D), function(df) sum(df$Z))

More info at http://had.co.nz/plyr
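If you prefer to stay in base R, aggregate gives a plain data frame for the
same grouped sum (assuming the same x as above), avoiding indexing into the
by() array entirely:

```r
x <- data.frame(
  A = c("X", "Y", "X", "X", "Y", "Y", "Z"),
  D = as.POSIXct(c("2008-11-03", "2008-11-03", "2008-11-03", "2008-11-04",
                   "2009-01-13", "2009-01-13", "2009-01-13")),
  Z = 1:7)

# one row per (A, D) combination that actually occurs in the data
sums <- aggregate(Z ~ A + D, data = x, FUN = sum)
sums
```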

Hadley

-- 
http://had.co.nz/



Re: [R] output formatting

2009-03-04 Thread Pele

Hi Kingsford - this is exactly what I am looking for...
Many thanks!!


Kingsford Jones wrote:
> 
> I'm guessing you processed a data frame with the 'by' function.
> Rather than restructuring the by output, try using a different
> function on your data frame.  For example
> 
>> #install.packages(doBy)
>> summaryBy(breaks ~ tension + wool, data=warpbreaks, FUN=sum)
>   tension wool breaks.sum
> 1       L    A        401
> 2       L    B        254
> 3       M    A        216
> 4       M    B        259
> 5       H    A        221
> 6       H    B        169
> 
> as opposed to
> 
>> with(warpbreaks, by(breaks, list(tension,wool), sum))
> : L
> : A
> [1] 401
> --
> : M
> : A
> [1] 216
> --
> : H
> : A
> [1] 221
> --
> : L
> : B
> [1] 254
> --
> : M
> : B
> [1] 259
> --
> : H
> : B
> [1] 169
> 
> 
> hth,
> Kingsford Jones
> 
> On Wed, Mar 4, 2009 at 8:17 PM, Pele  wrote:
>>
>> Hi R users,
>>
>> I have an R object with the following attributes:
>>
>>> str(sales.bykey1)
>>  'by' int [1:3, 1:2, 1:52] 268 79 118 359 87 147 453 130 81 483 ...
>>  - attr(*, "dimnames")=List of 3
>>  ..$ GROUP: chr [1:3] "III" "II" "I"
>>  ..$ year           : chr [1:2] "2006" "2007"
>>  ..$ week           : chr [1:52] "1" "2" "3" "4" ...
>>  - attr(*, "call")= language by.data.frame(data = vars, INDICES = bykey1,
>> FUN = sum)
>>
>>> sales.bykey1
>> ---
>> GROUP: III
>> year: 2007
>> week: 51
>> [1] 64
>> ---
>> GROUP: II
>> year: 2007
>> week: 51
>> [1] 17
>> ---
>> GROUP: I
>> year: 2007
>> week: 51
>> [1] 21
>> ---
>> GROUP: III
>> year: 2006
>> week: 52
>> [1] 14
>> ---
>> GROUP: II
>> year: 2006
>> week: 52
>> [1] 62
>> --
>> GROUP: I
>> year: 2006
>> week: 52
>> [1] 10
>>
>>
>> Can anyone share the most efficient way to convert the output
>> (sales.bykey1)
>> above to look like this:
>>
>>
>> GROUP   Year    week    sales
>> III     2007    51      64
>> II      2007    51      17
>> I       2007    51      21
>> III     2006    52      14
>> II      2006    52      62
>> I       2006    52      10
>>
>> Many thanks in advance for any help!
>> --
>> View this message in context:
>> http://www.nabble.com/output-formatting-tp22344554p22344554.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> 

-- 
View this message in context: 
http://www.nabble.com/output-formatting-tp22344554p22345085.html
Sent from the R help mailing list archive at Nabble.com.



[R] new user

2009-03-04 Thread kris deininger



When adding a trend line to a scatterplot (e.g. abline(90, 4, col="red")), I
believe the "90" is the intercept and "4" is the slope.  How do I determine
the intercept and slope for the abline command?






Re: [R] output formatting

2009-03-04 Thread Kingsford Jones
I'm guessing you processed a data frame with the 'by' function.
Rather than restructuring the by output, try using a different
function on your data frame.  For example

> #install.packages(doBy)
> summaryBy(breaks ~ tension + wool, data=warpbreaks, FUN=sum)
  tension wool breaks.sum
1       L    A        401
2       L    B        254
3       M    A        216
4       M    B        259
5       H    A        221
6       H    B        169

as opposed to

> with(warpbreaks, by(breaks, list(tension,wool), sum))
: L
: A
[1] 401
--
: M
: A
[1] 216
--
: H
: A
[1] 221
--
: L
: B
[1] 254
--
: M
: B
[1] 259
--
: H
: B
[1] 169
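A base-R route to the same flat layout, if installing doBy is not an option, is
tapply followed by as.data.frame on the resulting table (a sketch):

```r
# sum breaks within each tension/wool cell, then flatten to a data frame
tab <- with(warpbreaks,
            tapply(breaks, list(tension = tension, wool = wool), sum))
flat <- as.data.frame(as.table(tab), responseName = "breaks.sum")
flat
```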


hth,
Kingsford Jones

On Wed, Mar 4, 2009 at 8:17 PM, Pele  wrote:
>
> Hi R users,
>
> I have an R object with the following attributes:
>
>> str(sales.bykey1)
>  'by' int [1:3, 1:2, 1:52] 268 79 118 359 87 147 453 130 81 483 ...
>  - attr(*, "dimnames")=List of 3
>  ..$ GROUP: chr [1:3] "III" "II" "I"
>  ..$ year           : chr [1:2] "2006" "2007"
>  ..$ week           : chr [1:52] "1" "2" "3" "4" ...
>  - attr(*, "call")= language by.data.frame(data = vars, INDICES = bykey1,
> FUN = sum)
>
>> sales.bykey1
> ---
> GROUP: III
> year: 2007
> week: 51
> [1] 64
> ---
> GROUP: II
> year: 2007
> week: 51
> [1] 17
> ---
> GROUP: I
> year: 2007
> week: 51
> [1] 21
> ---
> GROUP: III
> year: 2006
> week: 52
> [1] 14
> ---
> GROUP: II
> year: 2006
> week: 52
> [1] 62
> --
> GROUP: I
> year: 2006
> week: 52
> [1] 10
>
>
> Can anyone share the most efficient way to convert the output (sales.bykey1)
> above to look like this:
>
>
> GROUP   Year    week    sales
> III     2007    51      64
> II      2007    51      17
> I       2007    51      21
> III     2006    52      14
> II      2006    52      62
> I       2006    52      10
>
> Many thanks in advance for any help!
> --
> View this message in context: 
> http://www.nabble.com/output-formatting-tp22344554p22344554.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



Re: [R] listing functions in base package

2009-03-04 Thread Kingsford Jones
On Wed, Mar 4, 2009 at 8:40 PM, Fuchs Ira  wrote:
> So functions in the base package are all written in C?

No, a large proportion are written in R and the code can be seen in
the console by typing the function name.  C is generally used where
speed is a concern.
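Two quick checks that show the split for any given function:

```r
body(which.min)      # .Internal(which.min(x)): the real work happens in C
body(apply)          # many lines of ordinary R code

is.primitive(sum)    # TRUE: implemented entirely in C
is.primitive(apply)  # FALSE: written in R
```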

Kingsford


> Thanks.
> On Mar 4, 2009, at 10:26 PM, Kingsford Jones wrote:
>
> see
>
> https://stat.ethz.ch/pipermail/r-help/2008-January/151694.html
>
> hth,
> Kingsford Jones
>
> On Wed, Mar 4, 2009 at 7:30 PM, Fuchs Ira  wrote:
>> How can I print the definition of a function that is in the base package?
>>
>> for example, if I type:
>>
>> which.min
>>
>> I get
>>
>> function (x)
>> .Internal(which.min(x))
>> 
>>
>> How can I see the definition of this function?
>>
>> Thanks.
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>



Re: [R] listing functions in base package

2009-03-04 Thread Fuchs Ira
So functions in the base package are all written in C?

Thanks.

On Mar 4, 2009, at 10:26 PM, Kingsford Jones wrote:

> see
>
> https://stat.ethz.ch/pipermail/r-help/2008-January/151694.html
>
> hth,
> Kingsford Jones
>
> On Wed, Mar 4, 2009 at 7:30 PM, Fuchs Ira  wrote:
> > How can I print the definition of a function that is in the base  
> package?
> >
> > for example, if I type:
> >
> > which.min
> >
> > I get
> >
> > function (x)
> > .Internal(which.min(x))
> > 
> >
> > How can I see the definition of this function?
> >
> > Thanks.
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting- 
> guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>





Re: [R] Selecting one row or multiple rows per ID

2009-03-04 Thread Vedula, Satyanarayana
Many thanks, Hadley! It was really helpful.



Cheers,

Swaroop



-Original Message-
From: hadley wickham [mailto:h.wick...@gmail.com]
Sent: Wednesday, March 04, 2009 9:56 AM
To: Vedula, Satyanarayana
Cc: r-help@r-project.org
Subject: Re: [R] Selecting one row or multiple rows per ID



On Wed, Mar 4, 2009 at 12:09 AM, Vedula, Satyanarayana

 wrote:

> Hi,
>
> Could someone help with coding this in R?
>
> I need to select one row per patient i in clinic j. The data is organized
> similar to that shown below.
>
> Two columns - patient i in column j identify each unique patient. There are
> two columns on outcome. Some patients have multiple rows with each row
> representing one visit, coded for in the column, visit. Some patients have
> just one row indicating data from a single visit.
>
> I need to select one row per patient i in clinic j using the following
> algorithm:
>
> If patient has outcome recorded at visit 2, then outcome = outcome columns at visit 2
> If patient does not have visit 2, then outcome = outcome at visit 5
> If patient does not have visit 2 and visit 5, then outcome = outcome at visit 4
> If patient does not have visits 2, 5, and 4, then outcome = outcome at visit 3
> If patient does not have visits 2, 5, 4, and 3, then outcome = outcome at visit 1
> If patient does not have any of the visits, outcome = missing
>
> Patient    Clinic    Visit    Outcome_left  Outcome_right
> patient 1  clinic 1  visit 2  22            21
> patient 1  clinic 3  visit 1  21            21
> patient 1  clinic 3  visit 2  21            22
> patient 1  clinic 3  visit 3  20            22
> patient 3  clinic 5  visit 1  24            21
> patient 3  clinic 5  visit 3  21            22
> patient 3  clinic 5  visit 4  22            23
> patient 3  clinic 5  visit 5  22            22
>
> I need to select just the first row for patient 1/clinic 1; the second row
> (visit 2) for patient 1/clinic 3; and the fourth row (visit 5) for patient
> 3/clinic 5.



I'd approach this problem in the following way:

df <- read.csv(textConnection("
Patient,Clinic,Visit,Outcome_left,Outcome_right
patient 1,clinic 1,visit 2,22,21
patient 1,clinic 3,visit 1,21,21
patient 1,clinic 3,visit 2,21,22
patient 1,clinic 3,visit 3,20,22
patient 3,clinic 5,visit 1,24,21
patient 3,clinic 5,visit 3,21,22
patient 3,clinic 5,visit 4,22,23
patient 3,clinic 5,visit 5,22,22
"), header = TRUE)
closeAllConnections()

# With a single patient it's pretty easy to find the preferred visit
preferred_visit <- paste("visit", c(2, 5, 4, 3, 1))

one <- subset(df, Patient == "patient 3" & Clinic == "clinic 5")
best_visit <- na.omit(match(preferred_visit, one$Visit))[1]
one[best_visit, ]

# We then turn this into a function
find_best_visit <- function(one) {
  best_visit <- na.omit(match(preferred_visit, one$Visit))[1]
  one[best_visit, ]
}

# Then apply it to every combination of patient and clinic with plyr
library(plyr)
ddply(df, .(Patient, Clinic), find_best_visit)

# You can learn more about plyr at http://had.co.nz/plyr

Hadley



--

http://had.co.nz/




Re: [R] listing functions in base package

2009-03-04 Thread Kingsford Jones
see

https://stat.ethz.ch/pipermail/r-help/2008-January/151694.html

hth,
Kingsford Jones

On Wed, Mar 4, 2009 at 7:30 PM, Fuchs Ira  wrote:
> How can I print the definition of a function that is in the base package?
>
> for example, if I type:
>
> which.min
>
> I get
>
> function (x)
> .Internal(which.min(x))
> 
>
> How can I see the definition of this function?
>
> Thanks.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



[R] output formatting

2009-03-04 Thread Pele

Hi R users,

I have an R object with the following attributes:

> str(sales.bykey1)
 'by' int [1:3, 1:2, 1:52] 268 79 118 359 87 147 453 130 81 483 ...
 - attr(*, "dimnames")=List of 3
  ..$ GROUP: chr [1:3] "III" "II" "I"
  ..$ year   : chr [1:2] "2006" "2007"
  ..$ week   : chr [1:52] "1" "2" "3" "4" ...
 - attr(*, "call")= language by.data.frame(data = vars, INDICES = bykey1,
FUN = sum)

> sales.bykey1
--- 
GROUP: III
year: 2007
week: 51
[1] 64
--- 
GROUP: II
year: 2007
week: 51
[1] 17
--- 
GROUP: I
year: 2007
week: 51
[1] 21
--- 
GROUP: III
year: 2006
week: 52
[1] 14
--- 
GROUP: II
year: 2006
week: 52
[1] 62
-- 
GROUP: I
year: 2006
week: 52
[1] 10


Can anyone share the most efficient way to convert the output (sales.bykey1)
above to look like this:


GROUP   Year    week    sales
III     2007    51      64
II      2007    51      17
I       2007    51      21
III     2006    52      14
II      2006    52      62
I       2006    52      10

Many thanks in advance for any help!
-- 
View this message in context: 
http://www.nabble.com/output-formatting-tp22344554p22344554.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Package for determining correlation for mixed "Level of Measurement"

2009-03-04 Thread Jason Rupert
Well, it seems like I may need to use a few different correlation coefficient 
tests:
(1) For the Nominal scale to Interval Scale, I may need to be using the 
Point-biserial correlation coefficients (rpb).  It turns out that the ltm 
Package calculates that correlation coefficient.   Will be trying ltm out 
tomorrow...(unless there is a more standard/prefered method - still learning 
about such things)

(2) For the Ordinal scale to Interval scale, I am still looking for a 
correlation coefficient test that will allow those two to be compared.   Any 
suggestions there are really appreciated.

Thanks again for all the feedback, and I continue to be amazed by all the 
capability that is present within the user added R packages and native 
capability.  
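As a point of reference, both cases mentioned above can also be computed with 
base R's cor alone (a sketch on invented toy data): the point-biserial 
coefficient is just Pearson's r with the binary variable coded 0/1, and 
Spearman's rank correlation is a common choice for the ordinal-vs-interval case:

```r
# nominal (binary) vs interval: point-biserial = Pearson's r on a 0/1 coding
city  <- factor(c("A", "A", "B", "B", "B", "A"))
value <- c(10, 12, 20, 18, 22, 11)
r_pb  <- cor(as.numeric(city == "B"), value)

# ordinal vs interval: Spearman's rank correlation respects the ordering
temp <- ordered(c("low", "medium", "high", "high", "medium", "low"),
                levels = c("low", "medium", "high"))
age  <- c(30, 45, 60, 55, 50, 35)
r_s  <- cor(as.numeric(temp), age, method = "spearman")

c(point_biserial = r_pb, spearman = r_s)
```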

Cheers.



--- On Tue, 3/3/09, Jason Rupert  wrote:
From: Jason Rupert 
Subject: [R] Package for determining correlation for mixed "Level of 
Measurement"
To: R-help@r-project.org
Date: Tuesday, March 3, 2009, 9:38 PM

My data set has a mixed level of measurement:
Nominal scale - location (city)
Ordinal scale - temperature (low, medium, high)
Interval scale - age & value

Just curious if there is an R package available that will handle the mixed
"Level of Measurement".

Looking to do graphical presentation of the correlation and also a qualitative
analysis of the correlation.  

Thanks again for any info and feedback.   






  

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



  



[R] listing functions in base package

2009-03-04 Thread Fuchs Ira
How can I print the definition of a function that is in the base  
package?


for example, if I type:

which.min

I get

function (x)
.Internal(which.min(x))


How can I see the definition of this function?

Thanks.



Re: [R] Inference for R Spam

2009-03-04 Thread Rolf Turner


On 5/03/2009, at 3:06 PM, David Winsemius wrote:


I mostly agree with you, Rolf (and Gunter). I would challenge your
joint use of the term "scientists". My quibble arises not regarding
biomedical practitioners (who may be irredeemable as a group)  but
rather regarding physicists. At least in that domain, I believe those
domain experts are at least as likely, and possibly more so, to
understand issues relating to randomness as are statisticians.
Randomness has been theoretically embedded in the domain for the last
90 years or so.


My impression --- and I could be wrong --- is that physicists' understanding
of randomness is very narrow and constrained.  They tend to think  
along the
lines of chaotic dynamical systems (although perhaps not consciously;  
and they
may not explicitly express themselves in this way).  They also tend  
to think
exclusively in terms of measurement error as the source of  
variability.  Which
may be appropriate in the applications with which they are concerned,  
but is
pretty limited.  Also they're a rather arrogant bunch.  E.g.  
Rutherford (???):

``If I need statistics to analyze my data I need more data.''

cheers,

Rolf Turner




Re: [R] Regressão linear

2009-03-04 Thread Ben Bolker
Sueli Rodrigues  esalq.usp.br> writes:
> 
> Hello. I have a file in which every 6 lines correspond to one sample, and I
> need the linear-regression coefficients for each sample. How do I make the
> program treat each block of 6 lines as a separate sample rather than
> computing over the whole file?
> I am using the function: model = lm(y ~ x)
> 

  You're more likely to get a response if you post to the list
in English (even fractured English).

 Based on what Google translator thinks you said (you want
to perform linear regressions on 6-line subsets of a data set?),
here's a starting point (assuming your data are in a data frame
mydata, and have column names x and y):

# one group id per consecutive block of 6 rows
splitdat <- split(mydata, rep(seq_len(nrow(mydata) / 6), each = 6))
linfits <- lapply(splitdat, lm, formula = y ~ x)
coefs <- sapply(linfits, coef)

or something like that.

  Ben Bolker



Re: [R] Inference for R Spam

2009-03-04 Thread David Winsemius
I mostly agree with you, Rolf (and Gunter). I would challenge your  
joint use of the term "scientists". My quibble arises not regarding  
biomedical practitioners (who may be irredeemable as a group)  but  
rather regarding physicists. At least in that domain, I believe those  
domain experts are at least as likely, and possibly more so, to  
understand issues relating to randomness as are statisticians.  
Randomness has been theoretically embedded in the domain for the last  
90 years or so.


--
David Winsemius, MD


--
On Mar 4, 2009, at 6:43 PM, Rolf Turner wrote:



On 5/03/2009, at 12:13 PM, Bert Gunter wrote:



"The purpose of the subject or discipline ``statistics'' is in  
essence

to answer the question ``could the phenomenon we observed have arisen
simply by chance?'', or to quantify the *uncertainty* in any estimate
that we make of a quantity."


May I take strong issue with this characterization? It is far too  
narrow and
constraining. We are scientists first and foremost. The most  
important and
useful thing I do is to collaborate with other scientists to frame  
good
questions, design good experiments and studies, and gain insight  
into the
results of those experiments and studies (usually via graphical  
displays,
for which R is superbly suited). Blessing data with P-values is  
rarely of
much importance, and is often frankly irrelevant and even  
misleading (but

that's another rant).

George Box said this much better than I: "The business of the  
statistician

is to catalyze the scientific learning process."

This is much much more than you intimate.


I must respectfully disagree.  Far be it from me to argue with  
George Box,
but nevertheless ... it may be statisticians *business* to catalyze  
the
scientific learning process, but that is the business of *any*  
scientist.

What we bring to the process is our understanding of the essentials of
statistics, just as the chemist brings her understanding of the  
essentials

of chemistry and the biologist her understanding of the essentials of
biology.

The essentials of statistics consist in answering the question of  
``could
this phenomenon have arisen by chance?''  This is where we  
contribute in a
way that other scientists do not.  They don't understand  
variability, the
poor dears.  (Unless they have been well taught and thereby have  
become
in part statisticians themselves.) They have a devastating tendency  
to treat
an estimated regression line as *the* regression line, the truth.   
And so on.


The *way* we address the question of ``could it have happened by  
chance''
and the way we address the problem of quantifying variability is  
indeed open

to a broad range of techniques including graphics.

Note that I did not say word one about p-values.  The example I gave  
was
a scientific question --- is there a difference in the home field  
advantage
between the English Premier Division and the equivalent division or  
league
in Italy?  How much of a difference?  You may wish to throw in a p- 
value,
or you may not.  You will probably wish to look at a confidence  
interval.
You may wish to look at the question from the point of view of the  
distribution
of (home) - (away) differences, in which case graphics will most  
certainly
help.  But it comes down to answering the basic question.  If you  
have no
ability to answer such questions you are not, or might as well not  
be, a

statistician.

cheers,

Rolf Turner


##
Attention:\ This e-mail message is privileged and confid...{{dropped:9}}


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




[R] Regressão linear

2009-03-04 Thread Sueli Rodrigues

Hello. I have a file in which every 6 lines correspond to one sample, and I
need the linear regression coefficients for each sample. How do I make the
program treat each group of 6 lines as one sample rather than computing over
the whole file?
I am using the function: model=lm(y ~ x)


Sueli Rodrigues

Eng. Agrônoma - UNESP
Mestranda - USP/ESALQ
PPG-Solos e Nutrição de Plantas
Fones (19)93442981
  (19)33719762

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] problems with exporting a chart

2009-03-04 Thread Elena Wilson
Dear Uwe, 
Thank you very much for your email. I think I have worked out that the problem 
was related to the coordinates of the legend that are manually specified in the 
leg_loc command. However, I'm not exactly sure what was wrong with exporting 
the picture of the plot...  

To avoid the problem I have to run plot.new() before running the histogram 
command to refresh the default parameters of the Device window (that might have 
changed after plotting previous charts or changing the size of the device 
window or some other reasons I don't know of...) 

I'm sorry that I wasn't clear enough in my question. I'll try to be clearer 
this time and give you an idea of what happened.

If there is one legend, keywords like "topleft", "topright", etc. are very good, 
but they don't work with several legends to be placed on each individual 
sub-plot. Maybe you can suggest a better way of placing multiple legends on the 
charts that would automatically detect the coordinates for each legend???

R version 2.8.0
Lattice version 0.17-15

With regards to format and device, I've tried many ways:
- Use the (Windows) device directly, then once the chart is ready I either copy 
it as a metafile and then paste to a document, or try to save it as 
pdf/png/jpeg etc OR
- Use pdf / png / jpeg functions to directly save the output to an external file

This is the code (which works now):

*To generate a similar data frame to the one I use:
data=data.frame("Size"=rep(c(60,70,150,250, 450),each=500), 
"Delta_R2"=rnorm(2500,mean=0,sd=1)) 
attach(data)

library(lattice)
plot.new() # I run it here to restore the default par options 

histogram(~Delta_R2|as.factor(Size), type="percent", col="red", xlab="Delta OLS 
- SV R squared", main="R Squared Deviations")
leg_loc=matrix(c(-0.05, 0.3, 0.65, -0.05, 0.3, 0.4, 0.4, 0.4, 0.98, 
0.98),ncol=2, nrow=5, byrow=FALSE) # specifying the coordinates of the legends
for (i in 1:5) {
nR=(i-1)*500+1
nR2=nR+499 
z=data[nR:nR2,2]

m<-mean(z)
std<-sqrt(var(z))
iqr=IQR(z)
median=median(z)
legend(leg_loc[i,1],leg_loc[i,2], cex=0.7,
legend= paste(
"Mean=",round(m,3),'\n',
"SD=",round(std,3),'\n',
"Median =",round(median,3),'\n',
"IQR=", round(iqr,3)),bty="n")
}

Then I copy it or save as pdf / png / jpeg etc...

Thanks a lot for getting back to me regarding this!

Best regards, 

Elena Wilson
DBM Consultants Pty Ltd
5-7 Guest Street, Hawthorn, Victoria 3122, Australia
T: (61 3) 9819 1555
www.dbmconsultants.com

Please consider the environment before printing this email.

NOTICE - The information contained in this email may be confidential and/or 
privileged. You should only read, disclose, re-transmit, copy, distribute, act 
in reliance on or commercialise the information if you are authorised to do so. 
If you receive this email communication in error, please notify us immediately 
by email to d...@dbmcons.com.au, or reply by email direct to the sender and 
then destroy any electronic or paper copy of this message.

 -Original Message-
From:   Uwe Ligges [mailto:lig...@statistik.tu-dortmund.de] 
Sent:   Wednesday, 4 March 2009 9:42 PM
To: Elena Wilson
Cc: r-help@r-project.org
Subject:Re: [R] problems with exporting a chart

Please read the posting guide which asks you to answer basic questions 
such as:

Which R / lattice versions are we talking about?
Which is the "any" format?
Are you using the Devices directly or are you using some other way to 
copy contents of one device into another device?
What is the exact, minimal code (including data!) that reproduces your 
problem? It would be nice if we could copy and paste it and have it work on our 
machines.
Why do you call plot.new()?

Uwe Ligges




Elena Wilson wrote:
> Dear R helpers, 
> 
> I have a problem with exporting a chart (to any format). The graphic device 
> becomes inactive and I get the 'Error: invalid graphics state' error message. 
> I searched the help, web and FAQ but couldn't find the solution.
> 
> This is my code:
> I chart a histogram for differences in R2 by sample size (an extract from the 
> data is below). Altogether I have n=2500 observations (n=500 per sample size)
> 
> Size; Delta_R2
> 60; 0.0073842 
> 60; 0.0007156 
> ...
> 70; 0.0049717
> 70; 0.0121892 
> ...
> 150; 0.0139615 
> 150; 0.0088114
> ...
> 250; 0.0027976
> 250; 0.0109080 
> ...
> 450; 0.0050917
> 450; 0.0088114
> ...
> 
> The histogram works ok and I can save  or copy to pdf/jpeg/png etc  with no 
> problems
> 
> library(lattice)
> plot.new()
> histogram(~Delta_R2|as.factor(Size), type="percent", col="red", xlab="Delta 
> OLS - SV R squared", main="R Squared Deviations")  
> 
> Once I put the legends (5 text boxes) on the chart and I try to save or copy 
> it as pdf / jpeg/png etc I get the above mentioned error message.
> 
> This is the code for adding the legends:
> 
> 
> *The locations of the legends for each chart
> leg_loc=matrix(c( -0.1, 0.26,  0.62, -0.1, 0.26, 0.4, 0.4, 0.4, 1, 1),ncol=2, 
> nrow=5, byrow=FALSE)
> 
> *Calculate the statistics for 

[R] text at the upper left corner outside of the plot region

2009-03-04 Thread batholdy

Hi,

is there a way to place text at the upper left corner (or another  
corner) of the plot?


I want to place it really at the upper left corner of the whole plot  
(the file I get),

not at the upper left corner of the plot-region.



I tried text() and mtext(), and corner.label() of the plotrix package  
but it didn't work out.



thanks!

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Bug in by() with dates as levels?

2009-03-04 Thread Jorge Ivan Velez
Dear Oren,
Try this:

> x <- data.frame(A= c("X", "Y", "X", "X", "Y", "Y", "Z" ), D =
+ as.POSIXct(c("2008-11-03","2008-11-03","2008-11-03", "2008-11-04",
+ "2009-01-13", "2009-01-13", "2009-01-13")), Z = 1:7)
>
> m<-with(x,tapply(Z, list(A,D), sum))
> m[rownames(m)==x$A[1],]
2008-11-03 2008-11-04 2009-01-13
 4  4 NA
> m[,colnames(m)==x$D[1]]
 X  Y  Z
 4  2 NA
> m[rownames(m)==x$A[1],colnames(m)==x$D[1]]
[1] 4
> x$A[1]
[1] X
Levels: X Y Z
> x$D[1]
[1] "2008-11-03 EST"

See ?with, ?tapply, ?rownames, and ?colnames for more information.
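If you prefer to stay with by(), one possible workaround (a sketch, not the
only way) is to index the result with character strings: the dimnames of the
result are character, whereas subscripting with a raw POSIXct value coerces
the date to a very large number, which is what produces "subscript out of
bounds":

```r
x <- data.frame(A = c("X", "Y", "X", "X", "Y", "Y", "Z"),
                D = as.POSIXct(c("2008-11-03", "2008-11-03", "2008-11-03",
                                 "2008-11-04", "2009-01-13", "2009-01-13",
                                 "2009-01-13")),
                Z = 1:7)
m <- by(x, list(A = x$A, D = x$D), function(v) sum(v$Z))

# Character indices match the dimnames; m[x$A[1], x$D[1]] would not:
m[as.character(x$A[1]), as.character(x$D[1])]
```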

HTH,

Jorge


On Wed, Mar 4, 2009 at 6:22 PM, oren cheyette  wrote:

> Trying to use dates in their R-native form (e.g., POSIXct) rather than
> turning them into character strings, I've encountered the following
> problem.
> I create a data frame where one column is dates. Then I use "by()" to do a
> calculation on grouped subsets of the data. When I try to extract values
> from the result, I get "subscript out of bounds". The example below shows
> the error.
>
> > x <- data.frame(A= c("X", "Y", "X", "X", "Y", "Y", "Z" ), D =
> as.POSIXct(c("2008-11-03","2008-11-03","2008-11-03", "2008-11-04",
> "2009-01-13", "2009-01-13", "2009-01-13")), Z = 1:7)
> > m <- by(x, list(A=x$A, D=x$D), function(v) { sum(v$Z); })
> > m[x$A[1], x$D[1]]
> Error: subscript out of bounds
>
>
> Rgds,
> Oren Cheyette
>
>[[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Inference for R Spam

2009-03-04 Thread Rolf Turner


On 5/03/2009, at 12:13 PM, Bert Gunter wrote:



"The purpose of the subject or discipline ``statistics'' is in essence
to answer the question ``could the phenomenon we observed have arisen
simply by chance?'', or to quantify the *uncertainty* in any estimate
that we make of a quantity."


May I take strong issue with this characterization? It is far too  
narrow and
constraining. We are scientists first and foremost. The most  
important and
useful thing I do is to collaborate with other scientists to frame  
good
questions, design good experiments and studies, and gain insight  
into the
results of those experiments and studies (usually via graphical  
displays,
for which R is superbly suited). Blessing data with P-values is  
rarely of
much importance, and is often frankly irrelevant and even  
misleading (but

that's another rant).

George Box said this much better than I: "The business of the  
statistician

is to catalyze the scientific learning process."

This is much much more than you intimate.


I must respectfully disagree.  Far be it from me to argue with George  
Box,

but nevertheless ... it may be statisticians *business* to catalyze the
scientific learning process, but that is the business of *any*  
scientist.

What we bring to the process is our understanding of the essentials of
statistics, just as the chemist brings her understanding of the  
essentials

of chemistry and the biologist her understanding of the essentials of
biology.

The essentials of statistics consist in answering the question of  
``could
this phenomenon have arisen by chance?''  This is where we contribute  
in a
way that other scientists do not.  They don't understand variability,  
the

poor dears.  (Unless they have been well taught and thereby have become
in part statisticians themselves.) They have a devastating tendency  
to treat
an estimated regression line as *the* regression line, the truth.   
And so on.


The *way* we address the question of ``could it have happened by  
chance''
and the way we address the problem of quantifying variability is  
indeed open

to a broad range of techniques including graphics.

Note that I did not say word one about p-values.  The example I gave was
a scientific question --- is there a difference in the home field  
advantage
between the English Premier Division and the equivalent division or  
league
in Italy?  How much of a difference?  You may wish to throw in a p- 
value,
or you may not.  You will probably wish to look at a confidence  
interval.
You may wish to look at the question from the point of view of the  
distribution
of (home) - (away) differences, in which case graphics will most  
certainly
help.  But it comes down to answering the basic question.  If you  
have no

ability to answer such questions you are not, or might as well not be, a
statistician.

cheers,

Rolf Turner


##
Attention:\ This e-mail message is privileged and confid...{{dropped:9}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Bug in by() with dates as levels?

2009-03-04 Thread oren cheyette
Trying to use dates in their R-native form (e.g., POSIXct) rather than
turning them into character strings, I've encountered the following problem.
I create a data frame where one column is dates. Then I use "by()" to do a
calculation on grouped subsets of the data. When I try to extract values
from the result, I get "subscript out of bounds". The example below shows
the error.

> x <- data.frame(A= c("X", "Y", "X", "X", "Y", "Y", "Z" ), D =
as.POSIXct(c("2008-11-03","2008-11-03","2008-11-03", "2008-11-04",
"2009-01-13", "2009-01-13", "2009-01-13")), Z = 1:7)
> m <- by(x, list(A=x$A, D=x$D), function(v) { sum(v$Z); })
> m[x$A[1], x$D[1]]
Error: subscript out of bounds


Rgds,
Oren Cheyette

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Inference for R Spam

2009-03-04 Thread Bert Gunter

"The purpose of the subject or discipline ``statistics'' is in essence
to answer the question ``could the phenomenon we observed have arisen
simply by chance?'', or to quantify the *uncertainty* in any estimate
that we make of a quantity."


May I take strong issue with this characterization? It is far too narrow and
constraining. We are scientists first and foremost. The most important and
useful thing I do is to collaborate with other scientists to frame good
questions, design good experiments and studies, and gain insight into the
results of those experiments and studies (usually via graphical displays,
for which R is superbly suited). Blessing data with P-values is rarely of
much importance, and is often frankly irrelevant and even misleading (but
that's another rant). 

George Box said this much better than I: "The business of the statistician
is to catalyze the scientific learning process."

This is much much more than you intimate.

Cheers to all,

Bert Gunter
Genentech Nonclinical Biostatistics

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Question about the use of large datasets in R

2009-03-04 Thread Vadlamani, Satish {FLNA}
Hi:
Sorry if this is a double post. I posted the same thing this morning and did 
not see it.

I just started using R and am asking the following questions so that I can plan 
for the future when I may have to analyze high-volume data.

1) What are the limitations of R when it comes to handling large datasets? Say, 
for example, a data frame of 200M rows and 15 columns (between 1.5 and 
2 GB in size)? Will the limitation be based on the specifications of the 
hardware or on R itself?
2) Is R compiled 32-bit or 64-bit (on, say, Windows and AIX)?
3) Are there any other points to note / things to keep in mind when handling 
large datasets?
4) Should I be looking at SAS also only for this reason (we do have SAS 
in-house but the problem is that I am still not sure what we have license for, 
etc.)

Any pointers / thoughts will be appreciated.

Satish

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Test mail

2009-03-04 Thread Vadlamani, Satish {FLNA}
Hi:
This is a test mail. Thanks.
Satish

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] mapping lat and long with maps package

2009-03-04 Thread David Winsemius
Well, you're the one who offered code without designating what  
libraries were loaded or required. Here's my sessionInfo, ... what's  
yours?


> sessionInfo()
R version 2.8.1 Patched (2009-01-07 r47515)
i386-apple-darwin9.6.0

locale:
en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] maps_2.0-40 zoo_1.5-5

loaded via a namespace (and not attached):
[1] grid_2.8.1  lattice_0.17-20 tools_2.8.1

--
David Winsemius
On Mar 4, 2009, at 5:41 PM, Alina Sheyman wrote:


When I run this code i get the following error messages

Error in mapgetg(database, gon, as.polygon, xlim, ylim) :
  NA/NaN/Inf in foreign function call (arg 6)
In addition: Warning messages:
1: In min(x, na.rm = na.rm) :
  no non-missing arguments to min; returning Inf
2: In max(x, na.rm = na.rm) :
  no non-missing arguments to max; returning -Inf

On Wed, Mar 4, 2009 at 5:27 PM, David Winsemius > wrote:
The example on the help page would seem to be completely on point if  
I understand your desire to be plotting text at particular long,lat  
coordinates:


?map

text(long, lat, "text")

#
data(ozone)
map("state", xlim = range(ozone$x), ylim = range(ozone$y))
text(ozone$x, ozone$y, ozone$median)
box()

--
David Winsemius


On Mar 4, 2009, at 5:04 PM, Alina Sheyman wrote:

I am trying to overlay a data frame with lat and longitude(which  
refer to

zip codes) on the map of US that I get by using map ("states").
Is there anyway to do this or do I have to resort to using maptools?

thank you

   [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Map

2009-03-04 Thread Dr. Alireza Zolfaghari
Hi list,
I don't know why the top of the map gets truncated. I'd appreciate it if anyone
could give me a solution.

filename="C:\\temp\\test.pdf"
pdf(file=filename, width=15, height=10)
library(maps)
require("mapproj")
longlatLimit<-c(-106.65,  -93.53 ,  25.93 ,  36.49)
par(plt=c(0,1,0,1),cex=1,cex.main=1)  #Set plotting parameters
map(projection="azequalarea",
type="n",xlim=longlatLimit[1:2],ylim=longlatLimit[3:4])
bound<-c(floor(longlatLimit[1]), ceiling(longlatLimit[2]),
floor(longlatLimit[3]), ceiling(longlatLimit[4]))
map.grid(lim=bound,col="light grey")
dev.off()


Regards,
Alireza

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] mapping lat and long with maps package

2009-03-04 Thread Alina Sheyman
When I run this code I get the following error messages:

Error in mapgetg(database, gon, as.polygon, xlim, ylim) :
  NA/NaN/Inf in foreign function call (arg 6)
In addition: Warning messages:
1: In min(x, na.rm = na.rm) :
  no non-missing arguments to min; returning Inf
2: In max(x, na.rm = na.rm) :
  no non-missing arguments to max; returning -Inf

On Wed, Mar 4, 2009 at 5:27 PM, David Winsemius wrote:

> The example on the help page would seem to be completely on point if I
> understand your desire to be plotting text at particular long,lat
> coordinates:
>
> ?map
>
> text(long, lat, "text")
>
> #
> data(ozone)
> map("state", xlim = range(ozone$x), ylim = range(ozone$y))
> text(ozone$x, ozone$y, ozone$median)
> box()
>
> --
> David Winsemius
>
>
> On Mar 4, 2009, at 5:04 PM, Alina Sheyman wrote:
>
>  I am trying to overlay a data frame with lat and longitude(which refer to
>> zip codes) on the map of US that I get by using map ("states").
>> Is there anyway to do this or do I have to resort to using maptools?
>>
>> thank you
>>
>>[[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] mapping lat and long with maps package

2009-03-04 Thread David Winsemius
The example on the help page would seem to be completely on point if I  
understand your desire to be plotting text at particular long,lat  
coordinates:


?map

text(long, lat, "text")

#
data(ozone)
map("state", xlim = range(ozone$x), ylim = range(ozone$y))
text(ozone$x, ozone$y, ozone$median)
box()

--
David Winsemius

On Mar 4, 2009, at 5:04 PM, Alina Sheyman wrote:

I am trying to overlay a data frame with lat and longitude(which  
refer to

zip codes) on the map of US that I get by using map ("states").
Is there anyway to do this or do I have to resort to using maptools?

thank you

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] mapping lat and long with maps package

2009-03-04 Thread Alina Sheyman
I am trying to overlay a data frame with lat and longitude (which refer to
zip codes) on the map of the US that I get by using map("state").
Is there any way to do this or do I have to resort to using maptools?

thank you

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to create many variables at one time?

2009-03-04 Thread David Winsemius

You can also change the column names to something else en mass:

colnames(dat) <- paste("X",1:100,sep="")

I next tried constructing the X names inside data.frame, but failed  
using the paste function. The help page for data.frame has a paragraph  
that begins "How the names of the data frame are created is complex..."


At least the following will result in names of the form X.1, X.2 ...

dat <- data.frame( X = replicate(100, rnorm(10)) )

--
David Winsemius
On Mar 4, 2009, at 4:44 PM, Kingsford Jones wrote:

On Wed, Mar 4, 2009 at 2:34 PM, Manli Yan   
wrote:

 Hi:
 I need to create many variables at one time,how to do this in R?
 for eg ,X1,X2...X100?


It depends what you want.  If you want 100 random normal variables of
length 10, stored in a data.frame with names V1, V2, ..., V100 try

dat <- as.data.frame(replicate(100, rnorm(10)))


hth,
Kingsford Jones



 Thanks~

   [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] help with GAM

2009-03-04 Thread Daniel Malter
Type unique(density)

How many different unique values does density take? The wiggliness of the
smooth term consumes degrees of freedom. That is, the more wiggly your
smooth term, the more DFs it consumes. If you get the error you got, you
have to reduce the degrees of freedom alloted to the smooth term manually.
See ?gam

If, however, the unique levels of density are very few, then gam may not be
the right approach anyway. 
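For illustration only (fake data, assuming the mgcv package already loaded in
the original post), the k argument of s() caps the basis dimension; it must
not exceed the number of unique covariate values:

```r
library(mgcv)
set.seed(1)

# Fake data: only 5 unique density values, 20 trials per observation
density <- rep(c(10, 20, 40, 80, 160), each = 6)
alive   <- rbinom(length(density), size = 20, prob = 0.5)
y       <- cbind(alive, 20 - alive)   # successes, failures

# The default k = 10 would fail here; k = 4 fits within 5 unique values
m1 <- gam(y ~ s(density, k = 4), family = binomial)
```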

Da.


-
cuncta stricte discussurus
-

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Las dA
Sent: Wednesday, March 04, 2009 1:04 PM
To: r-help@r-project.org
Subject: [R] help with GAM

Hi

I'm trying to do a GAM analysis and have the following codes entered into R
(density is = sample number, alive are the successes)

density<-as.real(density)

y<-cbind(alive,density-alive)

library(mgcv)

m1<-gam(y~s(density),binomial)

at which point I get the following error message

Error in smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) :
A term has fewer unique covariate combinations than specified maximum
degrees of freedom

What am I doing wrong?  Please help!

Thanks!

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to generate fake population (ie. not sample) data?

2009-03-04 Thread HBaize


Could that be extended to generate a population data set with known skew and
kurtosis? 
If so, how? 

Thanks in advance for suggested solutions.

Harold



Daniel Nordlund-2 wrote:
> 
> 
> 
> Something like this may help get you started.
> 
> std.pop <- function(x,mu,stdev){
>   ((x-mean(x))/sd(x)*stdev)+mu
>   }
> 
> population <- std.pop(rnorm(1000),10,5)
> 
> Hope this is helpful,
> 
> Dan
> 
> Daniel Nordlund
> Bothell, WA USA
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> 

-- 
View this message in context: 
http://www.nabble.com/How-to-generate-fake-population-%28ie.-not-sample%29-data--tp22322488p22340256.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to create many variables at one time?

2009-03-04 Thread jim holtman
Have you considered using a 'list'? much easier to manage than a lot
of individual objects.

mylist <- lapply(1:100, runif)
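Building on that, you can (if you like) give the elements the X1...X100 names
and index by name; the names here are chosen purely for illustration:

```r
mylist <- lapply(1:100, function(i) rnorm(10))   # 100 variables, 10 values each
names(mylist) <- paste("X", 1:100, sep = "")

mylist$X5                    # the fifth "variable"
head(sapply(mylist, mean))   # operate on all of them at once
```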


On Wed, Mar 4, 2009 at 4:34 PM, Manli Yan  wrote:
>  Hi:
>  I need to create many variables at one time,how to do this in R?
>  for eg ,X1,X2...X100?
>
>  Thanks~
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] changing font size for y-axis factor labels

2009-03-04 Thread Kingsford Jones
Does cex.axis not work, in that it reduces the size for both x and y
axes?  If that's the case, try calling plot with axes=FALSE, and then
add the axes separately with the axis function.
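A sketch of that approach with made-up data (category names and values are
arbitrary): suppress the automatic axes, then draw the y axis by hand with
its own cex.axis:

```r
vals <- c(Alpha = 4, Beta = 7, Gamma = 2)   # made-up categories

mid <- barplot(vals, horiz = TRUE, axes = FALSE, axisnames = FALSE)
axis(1)                                      # numeric x axis at default size
axis(2, at = mid, labels = names(vals),
     las = 1, cex.axis = 0.6, tick = FALSE)  # category labels at 60% size
```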

Kingsford Jones

On Wed, Mar 4, 2009 at 1:59 PM, Tiffany Vidal
 wrote:
> I am trying to reduce the font size for y-axis labels, not ylab, but the
> actual categorical names.  I have tried cex, cex.axis, cex.lab, font, but
> none seem to do the trick.  Any ideas?  thank you.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>



Re: [R] place text out of plot region

2009-03-04 Thread Eik Vettorazzi

Why use text()?
There is a function called "mtext" for that task.
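For example, a small sketch (margin text via `mtext`; `text` with clipping disabled is an alternative):

```r
plot(1:10)
# place text in the figure margin below the plot region
mtext("a note outside the plot region", side = 1, line = 4, cex = 0.8)
# text() can also draw outside the plot region once clipping is turned off
par(xpd = NA)
text(5.5, -1.5, "another note, via text() with xpd = NA")
```
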

hth.

batho...@googlemail.com wrote:

Hi,

is there a way to place text out of the plot region with text() ?



thanks!



--
Eik Vettorazzi
Institut für Medizinische Biometrie und Epidemiologie
Universitätsklinikum Hamburg-Eppendorf

Martinistr. 52
20246 Hamburg

T ++49/40/42803-8243
F ++49/40/42803-7790



Re: [R] how to create many variables at one time?

2009-03-04 Thread Jorge Ivan Velez
Dear Manli,
Do you mean the names of the variables?  If so, something like this should
work:

paste('X',1:100,sep="")


HTH,

Jorge


On Wed, Mar 4, 2009 at 4:34 PM, Manli Yan  wrote:

>  Hi:
>  I need to create many variables at one time,how to do this in R?
>  for eg ,X1,X2...X100?
>
>  Thanks~
>
>[[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>




Re: [R] how to create many variables at one time?

2009-03-04 Thread Eik Vettorazzi

Hi Manli,
you may consider structuring your data in some appropriate form like a
data.frame or list. It is often not the best approach to hold information
scattered across many separate variables.

But if you *really* want to create 100 separate variables, something like

for (i in 1:100) assign(paste("X",i,sep=""), some_values)

will do the job.
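And to get such variables back afterwards, a sketch of retrieving them by constructed name with `get` (the five normal draws per variable are only illustrative):

```r
# create X1..X100, each holding five normal draws
for (i in 1:100) assign(paste("X", i, sep = ""), rnorm(5))
X17                                           # each variable now exists by name
# gather them back into a single list for further work
vals <- lapply(paste("X", 1:100, sep = ""), get)
```
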


Manli Yan wrote:

  Hi:
  I need to create many variables at one time,how to do this in R?
  for eg ,X1,X2...X100?

 Thanks~


  


--
Eik Vettorazzi
Institut für Medizinische Biometrie und Epidemiologie
Universitätsklinikum Hamburg-Eppendorf

Martinistr. 52
20246 Hamburg

T ++49/40/42803-8243
F ++49/40/42803-7790



[R] place text out of plot region

2009-03-04 Thread batholdy

Hi,

is there a way to place text out of the plot region with text() ?



thanks!



Re: [R] how to create many variables at one time?

2009-03-04 Thread Kingsford Jones
On Wed, Mar 4, 2009 at 2:34 PM, Manli Yan  wrote:
>  Hi:
>  I need to create many variables at one time,how to do this in R?
>  for eg ,X1,X2...X100?

It depends what you want.  If you want 100 random normal variables of
length 10, stored in a data.frame with names V1, V2, ..., V100 try

dat <- as.data.frame(replicate(100, rnorm(10)))


hth,
Kingsford Jones

>
>  Thanks~
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



Re: [R] FW: flow control

2009-03-04 Thread William Dunlap
   The help page for ?"for" says that:

   The index seq in a for loop is evaluated at the start of the loop;
changing
   it subsequently does not affect the loop. The variable var has the
same type
   as seq, and is read-only: assigning to it does not alter seq.

The help file is not right when seq is a list() or other recursive
type.  In that case var has the type of seq[[i]] where i is the current
iteration count.  (I think this is true in general, since [ and [[ act
the same for nonrecursive types when the indices are such that a scalar
would be returned.  However that explanation is unnecessarily
complicated in the nonrecursive case.)

Also, the variable var is not really read-only.  You can alter it but
it gets reset to the next value in seq at the start of each iteration.
You cannot affect the meaning of 'next' to force it to, e.g, omit or
repeat iterations.
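Both points can be checked directly; a small sketch:

```r
# with a list as seq, var takes the type of each element in turn
for (v in list(1L, "a", c(2.5, 3.5))) print(class(v))
# prints "integer", "character", "numeric"

# assigning to the loop variable does not affect later iterations
for (i in 1:3) {
    i <- i + 100
    cat(i, "")              # prints 101 102 103
}
```
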

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com 



Re: [R] Colormap that look good in gray scale

2009-03-04 Thread thibert

Thanks,
   Here is my partial solution, based on what you suggested:
library(TeachingDemos)

z<-colors()
zz<-col2grey(z)
#index sorted
zzz<-sort(zz,index.return = TRUE)$ix

x<-z # colors in order of their greyscale
y<-z # greyscale values sorted into a gradient
for (i in 1:length(z)){
   x[i]<-z[zzz[i]]
   y[i]<-zz[zzz[i]]
}

myCol<-round(seq(from=1,to=length(x),length.out=10))
myCol<-x[myCol]

I then look at the result and swap any colors that are too similar for
another color with a nearby greyscale value.



[R] changing font size for y-axis factor labels

2009-03-04 Thread Tiffany Vidal
I am trying to reduce the font size for y-axis labels, not ylab, but the 
actual categorical names.  I have tried cex, cex.axis, cex.lab, font, 
but none seem to do the trick.  Any ideas?  thank you.


[R] how to create many variables at one time?

2009-03-04 Thread Manli Yan
  Hi:
  I need to create many variables at one time; how do I do this in R?
  For example: X1, X2, ..., X100?

 Thanks~




Re: [R] adding value labels on Interaction Plot

2009-03-04 Thread Dimitri Liakhovitski
I'll reply to my own post - to make sure no one wastes his/her time on that.
I was able to solve the problem only after I modified the original
function interaction.plot (see below). All I did was add one line
before the final } - asking it to return the means of the numeric
(dependent) variable.
After that, the following code worked:

d=data.frame(xx=c(1,1,1,1,2,2,2,2,3,3,3,3),yy=c(3,3,4,4,3,3,4,4,3,3,4,4),zz=c(-1.1,-1.3,0,0.6,-0.5,1,3.3,-1.3,4.4,3.5,5.1,3.5))
d[[1]]<-as.factor(d[[1]])
d[[2]]<-as.factor(d[[2]])
print(d)

coordinates<-modified.interaction.plot(d$xx, d$yy, d$zz, fun=mean,
  type="b", col=c("red","blue"), legend=F,
  lty=c(1,2), lwd=2, pch=c(18,24),
  xlab="X Label (level of xx)",
  ylab="Y Label (level of zz)",
  main="Chart Label")

grid(nx=NA, ny=NULL,col = "lightgray", lty = "dotted",
 lwd = par("lwd"), equilogs = TRUE)

legend("bottomright",c("3","4"),bty="n",lty=c(1,2),lwd=2,
pch=c(18,24),col=c("red","blue"),title="Level of yy")

coordinates<-as.data.frame(coordinates)
str(coordinates)

for(i in 1:length(coordinates)) {

text(x=as.numeric(row.names(coordinates)),y=coordinates[[i]],labels=coordinates[[i]],pos=3)
}


### Modified interaction.plot function ###
modified.interaction.plot=function (x.factor, trace.factor, response,
fun = mean, type = c("l",
"p", "b"), legend = TRUE, trace.label = deparse(substitute(trace.factor)),
fixed = FALSE, xlab = deparse(substitute(x.factor)), ylab = ylabel,
ylim = range(cells, na.rm = TRUE), lty = nc:1, col = 1, pch = c(1:9,
0, letters), xpd = NULL, leg.bg = par("bg"), leg.bty = "n",
xtick = FALSE, xaxt = par("xaxt"), axes = TRUE, ...)
{
ylabel <- paste(deparse(substitute(fun)), "of ",
deparse(substitute(response)))
type <- match.arg(type)
cells <- tapply(response, list(x.factor, trace.factor), fun)
nr <- nrow(cells)
nc <- ncol(cells)
xvals <- 1:nr
if (is.ordered(x.factor)) {
wn <- getOption("warn")
options(warn = -1)
xnm <- as.numeric(levels(x.factor))
options(warn = wn)
if (!any(is.na(xnm)))
xvals <- xnm
}
xlabs <- rownames(cells)
ylabs <- colnames(cells)
nch <- max(sapply(ylabs, nchar, type = "width"))
if (is.null(xlabs))
xlabs <- as.character(xvals)
if (is.null(ylabs))
ylabs <- as.character(1:nc)
xlim <- range(xvals)
xleg <- xlim[2] + 0.05 * diff(xlim)
xlim <- xlim + c(-0.2/nr, if (legend) 0.2 + 0.02 * nch else 0.2/nr) *
diff(xlim)
matplot(xvals, cells, ..., type = type, xlim = xlim, ylim = ylim,
xlab = xlab, ylab = ylab, axes = axes, xaxt = "n", col = col,
lty = lty, pch = pch)
if (axes && xaxt != "n") {
axisInt <- function(x, main, sub, lwd, bg, log, asp,
...) axis(1, x, ...)
mgp. <- par("mgp")
if (!xtick)
mgp.[2] <- 0
axisInt(1, at = xvals, labels = xlabs, tick = xtick,
mgp = mgp., xaxt = xaxt, ...)
}
if (legend) {
yrng <- diff(ylim)
yleg <- ylim[2] - 0.1 * yrng
if (!is.null(xpd) || {
xpd. <- par("xpd")
!is.na(xpd.) && !xpd. && (xpd <- TRUE)
}) {
op <- par(xpd = xpd)
on.exit(par(op))
}
text(xleg, ylim[2] - 0.05 * yrng, paste("  ", trace.label),
adj = 0)
if (!fixed) {
ord <- sort.list(cells[nr, ], decreasing = TRUE)
ylabs <- ylabs[ord]
lty <- lty[1 + (ord - 1)%%length(lty)]
col <- col[1 + (ord - 1)%%length(col)]
pch <- pch[ord]
}
legend(xleg, yleg, legend = ylabs, col = col, pch = if (type %in%
c("p", "b"))
pch, lty = if (type %in% c("l", "b"))
lty, bty = leg.bty, bg = leg.bg)
}
invisible()
return(cells)
}
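For what it's worth, the same cell means can be computed without touching interaction.plot at all; a sketch using the example data from this thread (tapply with the same grouping reproduces what the modified function returns):

```r
d <- data.frame(xx = factor(c(1,1,1,1,2,2,2,2,3,3,3,3)),
                yy = factor(c(3,3,4,4,3,3,4,4,3,3,4,4)),
                zz = c(-1.1,-1.3,0,0.6,-0.5,1,3.3,-1.3,4.4,3.5,5.1,3.5))
cells <- tapply(d$zz, list(d$xx, d$yy), mean)   # rows = xx levels, cols = yy levels
interaction.plot(d$xx, d$yy, d$zz, type = "b", legend = FALSE)
for (j in seq_len(ncol(cells)))                 # label each plotted mean
    text(seq_len(nrow(cells)), cells[, j],
         labels = round(cells[, j], 2), pos = 3)
```
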


On Wed, Mar 4, 2009 at 3:48 PM, Dimitri Liakhovitski  wrote:
> Thank you, David, however, I am not sure this approach works.
> Let's try it again - I slightly modified d to make it more clear:
>
> d=data.frame(xx=c(1,1,1,1,2,2,2,2,3,3,3,3),yy=c(3,3,4,4,3,3,4,4,3,3,4,4),zz=c(-1.1,-1.3,0,0.6,-0.5,1,3.3,-1.3,4.4,3.5,5.1,3.5))
> d[[1]]<-as.factor(d[[1]])
> d[[2]]<-as.factor(d[[2]])
> print(d)
>
> interaction.plot(d$xx, d$yy, d$zz, fun=mean,
>  type="b", col=c("red","blue"), legend=F,
>  lty=c(1,2), lwd=2, pch=c(18,24),
>  xlab="X Label (level of xx)",
>  ylab="Y Label (level of zz)",
>  main="Chart Label")
>
> grid(nx=NA, ny=NULL,col = "lightgray", lty = "dotted",
>     lwd = par("lwd"), equilogs = TRUE)
>
> legend("bottomright",c("3","4"),bty="n",lty=c(1,2),lwd=2,
> pch=c(18,24),col=c("red","blue"),title="Level of yy")
>
>
> The dots on both lines show means on zz for a given combination of
> factors xx and yy.
>
> # If I add this line:
> with(d, text(xx,zz,paste(zz)))
> It adds the zz values for ALL data points in d - instead of the means
> that are shown on the graph...
> Any advice?
>
> Dimitri
>
>
> d=data.frame(xx=c(3,3,3,2,2,2,1,1,1)

Re: [R] FW: flow control

2009-03-04 Thread Christos Hatzis
Hi Ken,

The help page for ?"for" says that:

The index seq in a for loop is evaluated at the start of the loop; changing
it subsequently does not affect the loop. The variable var has the same type
as seq, and is read-only: assigning to it does not alter seq.

So you cannot do what you want to do with a for loop.  But you could do what
you want with a while loop:

i <- 0
while(i < 20) {
i <- i + 1
cat(i, "\n")
if(i %% 5 == 0) i <- i + 2
}

-Christos

> -Original Message-
> From: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] On Behalf Of Lo, Ken
> Sent: Wednesday, March 04, 2009 3:53 PM
> To: r-help@r-project.org
> Subject: [R] FW: flow control
> 
> Hi all,
> 
> I need a little help with flow control in R.  What I'd like 
> to do is to advance a for loop by changing its counter.  
> However, what seems obvious to me does not yield the proper 
> results.  An example of my problem is
> 
> 
> for (i in seq(1, some_number, some_increment)){
>   
>   if (some_condition == T) i <- i + 2;  #want to advance 
> the loop by 2 }
> 
> Whenever the counter goes to the next step, the next item in 
> the original sequence seq(1, some_number, some_increment) 
> replaces the counter value.  Is there a way for me to do this?
> 
> Best,
> 
> Ken
> 
>   
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
>



Re: [R] Eliminate Factors from Data Frame

2009-03-04 Thread David Winsemius
It's in the R-FAQ. I can't remember whether it's 7.20 or 7.35, but it's in
that general area.
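The idiom that FAQ entry describes, as a short sketch (the values here are made up):

```r
f <- factor(c("3.5", "7", "10.25"))
as.numeric(f)                  # wrong: returns the internal level codes
as.numeric(as.character(f))    # correct: 3.50 7.00 10.25
as.numeric(levels(f))[f]       # same result; coerces only the levels
```
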


--
David Winsemius


On Mar 4, 2009, at 3:38 PM, Bob Roberts wrote:


Hi,
  I formed a 49 by 3 data frame by reading in a text file using  
read.table(), and combining it with a matrix that I made by using  
unlist() on a list of character strings. I would like to do some  
simple arithmetic operations on the elements in the data frame  
columns (e.g. column 3/column2) but the values in the data frame are  
stored as factors and using stringsAsFactors=FALSE did not work. I  
get this error when doing arithmetic operations:
In Ops.factor(dataframe$col3, dataframe$col2) : / not meaningful for  
factors
Is there a way to store these values not as factors or convert them  
to numeric values? Or a way to do operations on factors in data  
frames? I know R has I() but that didn't help in this case. Thanks  
so much.









[R] FW: flow control

2009-03-04 Thread Lo, Ken
Hi all,

I need a little help with flow control in R.  What I'd like to do is to
advance a for loop by changing its counter.  However, what seems obvious
to me does not yield the proper results.  An example of my problem is


for (i in seq(1, some_number, some_increment)){

if (some_condition == T) i <- i + 2;  #want to advance the loop
by 2
}

Whenever the counter goes to the next step, the next item in the
original sequence seq(1, some_number, some_increment) replaces the
counter value.  Is there a way for me to do this?

Best,

Ken





Re: [R] adding value labels on Interaction Plot

2009-03-04 Thread Dimitri Liakhovitski
Thank you, David, however, I am not sure this approach works.
Let's try it again - I slightly modified d to make it more clear:

d=data.frame(xx=c(1,1,1,1,2,2,2,2,3,3,3,3),yy=c(3,3,4,4,3,3,4,4,3,3,4,4),zz=c(-1.1,-1.3,0,0.6,-0.5,1,3.3,-1.3,4.4,3.5,5.1,3.5))
d[[1]]<-as.factor(d[[1]])
d[[2]]<-as.factor(d[[2]])
print(d)

interaction.plot(d$xx, d$yy, d$zz, fun=mean,
  type="b", col=c("red","blue"), legend=F,
  lty=c(1,2), lwd=2, pch=c(18,24),
  xlab="X Label (level of xx)",
  ylab="Y Label (level of zz)",
  main="Chart Label")

grid(nx=NA, ny=NULL,col = "lightgray", lty = "dotted",
 lwd = par("lwd"), equilogs = TRUE)

legend("bottomright",c("3","4"),bty="n",lty=c(1,2),lwd=2,
pch=c(18,24),col=c("red","blue"),title="Level of yy")


The dots on both lines show means on zz for a given combination of
factors xx and yy.

# If I add this line:
with(d, text(xx,zz,paste(zz)))
It adds the zz values for ALL data points in d - instead of the means
that are shown on the graph...
Any advice?

Dimitri


d=data.frame(xx=c(3,3,3,2,2,2,1,1,1),yy=c(4,3,4,3,4,3,4,3,4),zz=c(5.1,4.4,3.5,3.3,-1.1,-1.3,0,-0.5,0.6))
d[[1]]<-as.factor(d[[1]])
d[[2]]<-as.factor(d[[2]])
print(d)

interaction.plot(d$xx, d$yy, d$zz, fun=mean,
  type="b", col=c("red","blue"), legend=F,
  lty=c(1,2), lwd=2, pch=c(18,24),
  xlab="X Label",
  ylab="Y Label",
  main="Chart Label")

grid(nx=NA, ny=NULL,col = "lightgray", lty = "dotted",
 lwd = par("lwd"), equilogs = TRUE)

legend("bottomright",c("0.25","0.50","0.75"),bty="n",lty=c(1,2,3),lwd=2,
pch=c(18,24,22),col=c(PrimaryColors[c(6,4)],SecondaryColors[c(4)]),title="R
Squared")


On Wed, Mar 4, 2009 at 2:06 PM, David Winsemius  wrote:
> See if this helps. After your code, submit this to R:
>
> with(d, text(xx[xx==3],zz[xx==3],paste("3, ",zz[xx==3])))
>
> After that has convinced you that xx and zz are being used properly,  you
> can try the more general approach:
>
> with(d, text(xx,zz,paste(xx, " , ", zz)))
>
> I would have used ZZ rather than "Y Label" on the y axis, because yy is
> being used as a grouping parameter and the plotted value is really zz
>
> --
> David Winsemius
>
> On Mar 4, 2009, at 11:52 AM, Dimitri Liakhovitski wrote:
>
>> Hello - and sorry for what might look like a simple graphics question.
>>
>> I am building an interaction plot for d:
>>
>>
>> d=data.frame(xx=c(3,3,2,2,1,1),yy=c(4,3,4,3,4,3),zz=c(5.1,4.4,3.5,3.3,-1.1,-1.3))
>> d[[1]]<-as.factor(d[[1]])
>> d[[2]]<-as.factor(d[[2]])
>> print(d)
>>
>> interaction.plot(d$xx, d$yy, d$zz,
>>  type="b", col=c("red","blue"), legend=F,
>>  lty=c(1,2), lwd=2, pch=c(18,24),
>>  xlab="X Label",
>>  ylab="Y Label",
>>  main="Chart Label")
>>
>> I am trying and not succeeding in adding Y values (value labels in
>> Excel speak) near the data points on 3 lines of the graph.
>> I understand that I might have to use "text". But how do I tell text
>> to use the actual coordinates of the dots on the lines?
>>
>>
>> Thank you very much!
>>
>> --
>> Dimitri Liakhovitski
>> MarketTools, Inc.
>> dimitri.liakhovit...@markettools.com
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>



-- 
Dimitri Liakhovitski
MarketTools, Inc.
dimitri.liakhovit...@markettools.com



Re: [R] Grouped Boxplot

2009-03-04 Thread Richard M. Heiberger
## you may need to
## install.packages("HH")
library(HH)

tmp <- data.frame(y=rnorm(500),
  g=rep.int(c("A", "B", "C", "D"), 125),
  a=factor(rbinom(500, 1, .5)))
bwplot(y ~ g | a, data=tmp)
bwplot(y ~ a | g, data=tmp)

tmp$ga <- with(tmp, interaction(a, g))
position(tmp$ga) <- c(1.1, 1.9, 3.1, 3.9, 5.1, 5.9, 7.1, 7.9)
bwplot(y ~ ga, data=tmp,
   panel=panel.bwplot.intermediate.hh,
   col=c(1,1,2,2,3,3,4,4),
   main="This is what you asked for")

## I normally want this in context
interaction2wt(y ~ a + g, data=tmp)

interaction2wt(y ~ a + g, data=tmp, simple=TRUE)


position(tmp$a) <- c(1.5, 3.5)
interaction2wt(y ~ a + g, data=tmp, simple=TRUE)
interaction2wt(y ~ a + g, data=tmp,
   simple=TRUE, simple.scale=list(a=.2, g=.4),
   sub="upper-right panel is what you asked for")



[R] Eliminate Factors from Data Frame

2009-03-04 Thread Bob Roberts
Hi,
   I formed a 49 by 3 data frame by reading in a text file using read.table(), 
and combining it with a matrix that I made by using unlist() on a list of 
character strings. I would like to do some simple arithmetic operations on the 
elements in the data frame columns (e.g. column 3/column2) but the values in 
the data frame are stored as factors and using stringsAsFactors=FALSE did not 
work. I get this error when doing arithmetic operations:
In Ops.factor(dataframe$col3, dataframe$col2) : / not meaningful for factors
Is there a way to store these values not as factors or convert them to numeric 
values? Or a way to do operations on factors in data frames? I know R has I() 
but that didn't help in this case. Thanks so much.


  



[R] (no subject)

2009-03-04 Thread DingJane

Hi, everybody!
I am a newcomer to R, so my question may unavoidably sound stupid. Sorry!
In my experiment there are two types of soil (soil F and soil D); half of each
was steam-sterilized (yielding FS and DS soil). Equal volumes of soil from two
of the four soil types (F, D, FS, DS) were mixed as follows: F+F, F+D, F+FS,
F+DS, D+F, D+FS, D+DS, FS+DS (eight treatments).
Two types of plant (A, B) were planted in pots in the eight soil treatments.
For each treatment*plant combination there were 40 pots, divided into 5 groups
(8 pots per group); in total there were 80 groups for plants A and B. The 40
groups for plant A were randomly arranged in plot 1, and those for plant B in
plot 2.
The experiment was sampled three times. On each sampling date, one pot was
randomly chosen from each group to measure biomass (80 pots per sampling date).
Now my questions are as follows:
Do different plants respond to the soil treatments differently?
Does a plant's reaction to the soil treatments depend on time?
Do soils F and D differ significantly in their effects on plant biomass?
Does soil sterilization have any additional effect on plant biomass in this
experiment?
Which is the most important factor for biomass accumulation?
It does not seem feasible to run six parallel one-way ANOVAs, one for each
plant and sampling-date combination, to answer the questions above. It has
taken me a long time to learn the lme4 package in R, but so far fruitlessly.
Would anybody recommend a model and formula for these questions? Thank you!
As you can see, my English is about as good as my statistics! Sorry again!
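[Not an answer to the design questions, but a sketch of what a single model for all the data might look like. Every variable name below is a placeholder, the data are simulated stand-ins with the structure described above, and the aov/Error form is used only so the example runs in base R; the lme4 equivalent is given in a comment.]

```r
set.seed(1)
# simulated stand-in data: 2 plants x 8 soil mixtures x 3 dates x 5 groups
soils <- expand.grid(plant   = c("A", "B"),
                     mixture = c("F+F", "F+D", "F+FS", "F+DS",
                                 "D+F", "D+FS", "D+DS", "FS+DS"),
                     time    = factor(1:3),
                     group   = factor(1:5))
soils$biomass <- rnorm(nrow(soils), mean = 10)
# fixed effects for plant, soil mixture, time and their interactions;
# groups of pots form the error stratum for repeated sampling
fit <- aov(biomass ~ plant * mixture * time + Error(group), data = soils)
summary(fit)
# a roughly equivalent lme4 formulation would be:
# lmer(biomass ~ plant * mixture * time + (1 | group), data = soils)
```
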
_
ÃλÃKͼ£¬°Ù±äÔìÐÍ£¬ÈÃÄãµÄÕÕƬÓëÖÚ²»Í¬£¬¿ìÀ´MClubÊÔÊÔ°É£¡




Re: [R] modifying a built in function from the stats package (fixing arima)

2009-03-04 Thread Rolf Turner


On 4/03/2009, at 10:06 PM, Marc Vinyes wrote:


Dear Carlos and Kjetil,

Thanks for your answer.

I do not think that is the way to go. If you believe that your  
algorithm
is better than the existing one, talk to the author of the package  
and

discuss the improvement. The whole community will benefit.


I should be able to *easily* modify it and test it first!


Copy the existing function into a new file, edit it and load it via
source.



3)  after downloading the source package (stats) containung arima,
rename it (my.arima) and then do the changes.


I obviously saved it with a different name and I was expecting it  
to work

out of the box but I get an error that I don't know how to solve:
Error in Delta %+% c(1, rep(0, seasonal$period - 1), -1) :
  object "R_TSconv" not found

Other people have previously discussed this in this list with no  
success...

http://www.nabble.com/Foreign-function-call-td21836156.html

Any other hints or maybe help with the error that I'm getting?



If you ***look at the code*** for arima you will see that ``%+%'' is
defined in terms of a call to ``.Call()'' which calls ``R_TSconv''.  So
apparently R_TSconv is a C or Fortran function or subroutine in a
``shared object library'' or dll upon which arima depends.  Hence to do
anything with it you'll need to get that shared object library and
dynamically load it.  (E.g. get the code, SHLIB it, and dynamically load
the resulting shared object library.)

The code is all available from the R source tarball.

If this is a challenge for you then the best advice would be not to
mess with it.

I don't think that ***I'd*** mess with it, and I'm at least partially
au fait with this stuff.


cheers,

Rolf Turner




Re: [R] Inference for R Spam

2009-03-04 Thread Rolf Turner


On 5/03/2009, at 4:54 AM, Michael A. Miller wrote:


"Rolf" == Rolf Turner  writes:



On 4/03/2009, at 11:50 AM, Michael A. Miller wrote:



Sports scores are not statistics, they are measurements
(counts) of the number of times each team scores.  There
is no sampling and vanishingly small possibility of
systematic error in the measurement.



I think this comment indicates a fundamental
misunderstanding of the nature of statistics in general and
the concept of variability in particular.  Measurement
error is only *one possible* source of variability and is
often a minor --- or as in the case of sports scores a
non-existent --- source.


Would you elaborate Rolf?  I'm was referring to measurements, not
statistics.  Isn't calling scores statistics similar to saying
that the values of some response in an individual subject before
and after treatment are statistics?  I think they are just
measured values and that if they are measured accurately enough,
they can be precisely known.  It is in considering the
distribution of similar measurements obtained in repeated trials
that statistics come into play.


From my perspective as a baseball fan (I know I'm in Indiana and
I ought to be more of a basketball fan, but I grew up as a Cubs
watcher and still can't shake it), it doesn't seem to me that the
purpose of the score is to allow for some inference about the
overall population of teams.  It is about which team beats the
other one and entertainment (and hot dogs) for the fans.


Well the *purpose* of the score has nothing to do with statistics
as such, but then the ``purpose'' of many (most?) observations
to which the ideas of statistics are applied has nothing to do
with statistics either.

Technically a statistic is any function of a *sample* (sample =
a collection of random variables), including any one of these
random variables themselves.

The purpose of the subject or discipline ``statistics'' is in essence
to answer the question ``could the phenomenon we observed have arisen
simply by chance?'', or to quantify the *uncertainty* in any estimate
that we make of a quantity.

E.g., to stick with the sports idea:  We might ask ``Is there a home
field advantage?'' or ``How big is the home field advantage?'' or
``Is the home field advantage in the Premier Division (English football)
bigger than that in the equivalent division/league in Italian  
football?''


We would collect a sample or samples of pairs of scores

(X,Y) = (home team score, away team score)

and analyse these scores in some way, possibly on the basis of the
differences X - Y, possibly not, in order to answer these *statistical*
questions.  Note that there is *variability* or *uncertainty* in the
differences X - Y.  Even if we knew exactly that the home field advantage
was 1.576 goals, we would not be able to say that the home team would
always win by exactly 1.576 goals.  In fact the home team would *never*
win by exactly 1.576 goals! :-)


Sports scores are random variables.  You don't know a priori what the
scores are going to be, do you?  (Well, if you do, you must be able to
make a *lot* of money betting on games!)  After the game is over they
aren't random any more; they're just numbers.  But that applies to any
random variable.  A random variable is random only until it is observed,
then POOF! it turns into a number.

The randomness in the scores does not arise from measurement error.
This is usually the case with integer valued random variables.  An
ornithologist counting birds' nests in quadrats does not have to contend
with measurement error.  Well, some ornithologists might --- depends on
how well they were taught to count.  But the quadrat counts are random
variables (statistics) nevertheless --- until they are observed.

cheers,

Rolf Turner




Re: [R] Colormap that look good in gray scale

2009-03-04 Thread Greg Snow
This does not fully answer your question, but there is a function col2grey (or 
col2gray) in the TeachingDemos package that will help you see what a given 
color plot will look like (approximately) when printed/photocopied in grayscale.

For your example you would do something like:

> plot(1:6,col=col2grey(c(1,7,5,3,2,4)),pch=c(1,20,20,20,20,20))

To view the grayscale version of the plot, then try with different colors until 
you are happy with the results.

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> project.org] On Behalf Of thibert
> Sent: Wednesday, March 04, 2009 11:23 AM
> To: r-help@r-project.org
> Subject: [R] Colormap that look good in gray scale
> 
> 
> Hi,
>I am looking for a colormap (in color) that look like a gradient in
> gray
> scale. It is to allow people without color printer to print the color
> graph
> and have something meaningful in gray scale.
> 
> It can be something like this
> plot(1:6,col=c(1,7,5,3,2,4),pch=c(1,20,20,20,20,20))
> but with an arbitrary number of different colors, not just six.
> 
> Thanks
> --
> View this message in context: http://www.nabble.com/Colormap-that-look-
> good-in-gray-scale-tp22336097p22336097.html
> Sent from the R help mailing list archive at Nabble.com.
> 



Re: [R] Grouped Boxplot

2009-03-04 Thread David Winsemius

Q1:
See if this seems any better. I took the liberty of reconstructing
your initial example in a longer dataframe:


dta <- data.frame(val = sample(t, 1000),
                  g   = gl(4, 250, labels=c("A", "B", "C", "D")),
                  G2  = gl(2, 1, labels=c("XX", "YY")))


# arguments to data.frame are recycled, so one does not need to make the
# gl call with a length of 500; in fact, that only confuses things (or at
# least it does for me).


table(dta$G2,dta$g)

#   A   B   C   D
#  XX 125 125 125 125
#  YY 125 125 125 125

 boxplot( val ~ G2 + g, data=dta)

Q2:

?boxplot  #especially parameter at=

boxplot( val ~ g + G2, data=dta, at = 0.8*c(1,2,3,4,6,7,8,9),  
boxwex=0.4)


--
David Winsemius

On Mar 4, 2009, at 1:50 PM, soeren.vo...@eawag.ch wrote:


Pls forgive me heavy-handed data generation -- newby ;-)

### start ###
# example data
g <- rep.int(c("A", "B", "C", "D"), 125)
t <- rnorm(5000)
a <- sample(t, 500, replace=TRUE)
b <- sample(t, 500, replace=TRUE)

# what I actually want to have:
boxplot(a | b ~ g)

# but that does obviously not produce what I want, instead
i <- data.frame(g, a, rep("one", length(g)))
j <- data.frame(g, b, rep("two", length(g)))
names(i) <- c("Group", "Number", "Word")
names(j) <- c("Group", "Number", "Word")
k <- as.data.frame(rbind(i, j))
boxplot(k$Number ~ k$Word * k$Group)
### end ###

Questions: (1) Is there an easier way? (2) How can I get additional  
space between the A:D groups?


Thank you

Sören





Re: [R] Diff btw percentile and quantile

2009-03-04 Thread Wacek Kusnierczyk
William Dunlap wrote:



> I entered the same x into Excel 2003 and used the formulae
> =percentile(A1:10,0),
> =percentile(A1:A10,.125), ..., =percentile(A1:A10,1) and got the results
>1, 1.125, 2.25, 3, 4, 6.875, 8, 8.875, 10
> This matches only R's type 7, the default.
>   

hurray!  in this respect, excel is as good as r ;)

vQ



Re: [R] Table Transformation

2009-03-04 Thread hadley wickham
On Wed, Mar 4, 2009 at 11:58 AM, Christian Pilger
 wrote:
>
> Dear R-experts,
>
> recently, I started to discover the world of R. I came across a problem,
> that I was unable to solve by myself (including searches in R-help, etc.)
>
> I have a flat table similar to
>
> key1    key2    value1
>
> abcd_1  BP      10
> abcd_1  BSMP    1A
> abcd_1  PD      25
> abcd_2  BP      20
> abcd_3  BP      80
> abcd_4  IA      30
> abcd_4  PD      70
> abcd_4  PS      N
>
> I wish to transform this table to obtain the following result:
>
>        key2
> key1    BP      BSMP    IA      PD      PS
> abcd_1  "10"    "1A"    ""      "25"    ""
> abcd_2  "20"    ""      ""      ""      ""
> abcd_3  "80"    ""      ""      ""      ""
> abcd_4  ""      ""      "30"    "70"    "N"
>
> I considered "table" and "xtabs" but I could not get the desired result: I
> received cross-tables key1 vs. key2 that contained counts within the cells.
>
> Can anybody help me?

With the reshape package:

cast(mydf, key1 ~ key2)

You can find out more at http://had.co.nz/reshape
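For readers without the reshape package, the same pivot can be sketched in base R with tapply (data re-typed here from the quoted post):

```r
# Base-R sketch of the long-to-wide pivot; cast() above is more concise.
mydf <- data.frame(
  key1   = c("abcd_1","abcd_1","abcd_1","abcd_2","abcd_3","abcd_4","abcd_4","abcd_4"),
  key2   = c("BP","BSMP","PD","BP","BP","IA","PD","PS"),
  value1 = c("10","1A","25","20","80","30","70","N"),
  stringsAsFactors = FALSE
)
# One cell per key1/key2 pair; take the first (only) value for each.
wide <- tapply(mydf$value1, list(mydf$key1, mydf$key2), function(x) x[1])
wide[is.na(wide)] <- ""   # empty string where a key1/key2 pair is absent
wide
```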

Hadley

-- 
http://had.co.nz/



[R] dividing ts objects of different frequencies

2009-03-04 Thread Stephen J. Barr
Hello,

First, sorry for sending HTML emails earlier. This should now be in
plain-text mode.

I have two time series (ts) objects: one is yearly (population) and the
other is quarterly (bankruptcy statistics). I would like to produce a
quarterly time series object that consists of bankruptcy/population.
Is there a pre-built function to intelligently divide these time
series:

br.ts = ts(data=br.df[,-1], frequency = 4, start=c(2001,1), end=c(2008,2))
distPop.ts = ts(data=distPop.df[,-1], frequency=1, start=2000, end=2008)

The time series would consist of the elements (in pseudocode):

br.ts[2001Q1]/distPop.ts[2001] , br.ts[2001Q2]/distPop[2001],
br.ts[2001Q3]/distPop.ts[2001] , br.ts[2001Q4]/distPop[2001],
br.ts[2002Q1]/distPop.ts[2002] , br.ts[2002Q2]/distPop[2002],
etc.

I know that this would not be too difficult to write but does anything
like this already exist?
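Lacking a ready-made helper, one sketch (assuming both series are complete over the common window, with hypothetical stand-ins for br.ts and distPop.ts since the poster's data frames are not shown) is to expand the yearly series to quarterly frequency and divide:

```r
# Hypothetical stand-ins for the poster's series.
pop  <- ts(c(100, 102, 104), frequency = 1, start = 2001)   # yearly
bank <- ts(1:12, frequency = 4, start = c(2001, 1))         # quarterly
# Repeat each yearly value 4 times to get a quarterly-frequency series,
# then divide elementwise.
pop_q <- ts(rep(as.numeric(pop), each = 4),
            frequency = 4, start = c(2001, 1))
rate <- bank / pop_q
rate
```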

Thank you,

-stephen



-- 
==
Stephen J. Barr
University of Washington
Dept. of Applied and Computational Math Sciences
Dept. of Economics
WEB: www.econsteve.com
==






Re: [R] Table Transformation

2009-03-04 Thread Paul Johnson
On Wed, Mar 4, 2009 at 11:58 AM, Christian Pilger
 wrote:
>
> Dear R-experts,
>
> recently, I started to discover the world of R. I came across a problem,
> that I was unable to solve by myself (including searches in R-help, etc.)
>
> I have a flat table similar to
>
> key1    key2    value1
>
> abcd_1  BP      10
> abcd_1  BSMP    1A
> abcd_1  PD      25
> abcd_2  BP      20
> abcd_3  BP      80
> abcd_4  IA      30
> abcd_4  PD      70
> abcd_4  PS      N
>
> I wish to transform this table to obtain the following result:
>
>        key2
> key1    BP      BSMP    IA      PD      PS
> abcd_1  "10"    "1A"    ""      "25"    ""
> abcd_2  "20"    ""      ""      ""      ""
> abcd_3  "80"    ""      ""      ""      ""
> abcd_4  ""      ""      "30"    "70"    "N"
>

I think we would say that a dataframe of the first type is in the
"long" format, while the other one you want is in the "wide" format.
I've done changes like that with the "reshape" function that is in the
stats package.

This example you propose is like making one column for each "country"
where key 1 is like the "year" in which the observation is made.
Right?

You don't have an easily cut-and-pasteable code example, so I've
generated a little working example. Here, x1 is key 1 and x2 is key 2.

> x1 <- gl(4,5, labels=c("c1","c2","c3","c4"))
> x1
 [1] c1 c1 c1 c1 c1 c2 c2 c2 c2 c2 c3 c3 c3 c3 c3 c4 c4 c4 c4 c4
Levels: c1 c2 c3 c4
> x2 <- rep(1:5,4)
> df <- data.frame(x1, x2, y=rnorm(20))
> df
   x1 x2   y
1  c1  1  0.02095747
2  c1  2  0.05926233
3  c1  3 -0.07561916
4  c1  4 -1.06272710
5  c1  5 -1.89202032
6  c2  1 -0.04549782
7  c2  2 -0.68333187
8  c2  3 -0.99151410
9  c2  4 -0.29070280
10 c2  5 -0.97655024
11 c3  1  0.33411223
12 c3  2 -0.24907340
13 c3  3 -0.25469819
14 c3  4  1.23956157
15 c3  5 -1.38162430
16 c4  1  0.50343661
17 c4  2 -0.58126964
18 c4  3  0.24256348
19 c4  4 -0.39398578
20 c4  5  0.01664450
> reshape(df, direction="wide", timevar="x2", idvar="x1")
   x1         y.1         y.2         y.3        y.4        y.5
1  c1  0.02095747  0.05926233 -0.07561916 -1.0627271 -1.8920203
6  c2 -0.04549782 -0.68333187 -0.99151410 -0.2907028 -0.9765502
11 c3  0.33411223 -0.24907340 -0.25469819  1.2395616 -1.3816243
16 c4  0.50343661 -0.58126964  0.24256348 -0.3939858  0.0166445
>

Your case will have many missings, but I think the idea is the same.

HTH


-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [R] adding value labels on Interaction Plot

2009-03-04 Thread David Winsemius

See if this helps. After your code, submit this to R:

with(d, text(xx[xx==3],zz[xx==3],paste("3, ",zz[xx==3])))

After that has convinced you that xx and zz are being used properly,   
you can try the more general approach:


with(d, text(xx,zz,paste(xx, " , ", zz)))

I would have used ZZ rather than "Y Label" on the y axis, because yy  
is being used as a grouping parameter and the plotted value is really zz


--
David Winsemius

On Mar 4, 2009, at 11:52 AM, Dimitri Liakhovitski wrote:


Hello - and sorry for what might look like a simple graphics question.

I am building an interaction plot for d:

d = data.frame(xx=c(3,3,2,2,1,1), yy=c(4,3,4,3,4,3),
               zz=c(5.1,4.4,3.5,3.3,-1.1,-1.3))

d[[1]]<-as.factor(d[[1]])
d[[2]]<-as.factor(d[[2]])
print(d)

interaction.plot(d$xx, d$yy, d$zz,
 type="b", col=c("red","blue"), legend=F,
 lty=c(1,2), lwd=2, pch=c(18,24),
 xlab="X Label",
 ylab="Y Label",
 main="Chart Label")

I am trying and not succeeding in adding Y values (value labels in
Excel speak) near the data points on 3 lines of the graph.
I understand that I might have to use "text". But how do I tell text
to use the actual coordinates of the dots on the lines?


Thank you very much!

--
Dimitri Liakhovitski
MarketTools, Inc.
dimitri.liakhovit...@markettools.com





Re: [R] best fit line

2009-03-04 Thread Jorge Ivan Velez
Hi Anuj,
Take a look at ?nls. It might be useful in this case.

HTH,

Jorge


On Wed, Mar 4, 2009 at 1:22 PM, anujgoel  wrote:

>
> Dear R Community,
> I am plotting this simple x-y plot (raw data & plot attached).
> I cant fit a linear regression line to it. I have to figure out what is the
> best fit for this graph. Is there a way to tell which regression to use for
> this kind of data?
> Also, after selecting the best fit model, I need to extrapolate what could
> be the other possible data points.
> I am new to R. Could anyone please help?
> Thanks.
> Anuj
> http://www.nabble.com/file/p22336095/plot.jpg
> --
> View this message in context:
> http://www.nabble.com/best-fit-line-tp22336095p22336095.html
> Sent from the R help mailing list archive at Nabble.com.
>
>[[alternative HTML version deleted]]
>
>

[[alternative HTML version deleted]]



Re: [R] Diff btw percentile and quantile

2009-03-04 Thread William Dunlap
Excel 2003's help for percentile just says it interpolates
between the quantiles in the data:  
   Array   is the array or range of data that defines relative standing.
   K   is the percentile value in the range 0..1, inclusive.
 If array is empty or contains more than 8,191 data points,
PERCENTILE returns the #NUM! error value. 
 If k is nonnumeric, PERCENTILE returns the #VALUE! error value. 
 If k is < 0 or if k > 1, PERCENTILE returns the #NUM! error value. 
 If k is not a multiple of 1/(n - 1), PERCENTILE interpolates
to determine the value at the k-th percentile. 
so some experimentation is in order.

I found that the call to R's quantile gives a different result
for each of the 9 documented values of the type argument:
   x <- c(1,1,2,3,3,5,8,8,9,10)
   quantile(x, probs=(0:8)/8, type=i)   # for i in 1:9
E.g.,
sapply(1:9, function(type) quantile(x=x, probs=(0:8)/8, type=type))
      type=1 type=2 type=3 type=4 type=5 type=6 type=7   type=8    type=9
0%         1      1      1   1.00   1.00  1.000  1.000 1.000000  1.000000
12.5%      1      1      1   1.00   1.00  1.000  1.125 1.000000  1.000000
25%        2      2      1   1.50   2.00  1.750  2.250 1.916667  1.937500
37.5%      3      3      3   2.75   3.00  3.000  3.000 3.000000  3.000000
50%        3      4      3   3.00   4.00  4.000  4.000 4.000000  4.000000
62.5%      8      8      5   5.75   7.25  7.625  6.875 7.375000  7.343750
75%        8      8      8   8.00   8.00  8.250  8.000 8.083333  8.062500
87.5%      9      9      9   8.75   9.25  9.625  8.875 9.375000  9.343750
100%      10     10     10  10.00  10.00 10.000 10.000 10.000000 10.000000

I entered the same x into Excel 2003 and used the formulae
=percentile(A1:A10,0),
=percentile(A1:A10,.125), ..., =percentile(A1:A10,1) and got the results
   1, 1.125, 2.25, 3, 4, 6.875, 8, 8.875, 10
This matches only R's type 7, the default.

They also match S+'s default quantile calculation.

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com 


Ted Harding wrote:
> On 04-Mar-09 16:56:14, Wacek Kusnierczyk wrote:
> (Ted Harding) wrote:
> 
>> So, with reference to your original question
>>  "Excel has percentile() function. R function quantile() does the
>>   same thing. Is there any significant difference btw percentile
>>   and quantile?"
>> the answer is that they in effect give the same results, though
>> differ with respect to how they are to be fed (quantile eats
>> probabilities, percentile eats percentages). [Though (since I am
>> not familiar with Excel) I cannot rule out that Excel's percentile()
>> function also eats probabilities; in which case its name would be
>> an example of sloppy nomenclature on Excel's part; which I cannot
>> rule out on general grounds either].
> 
> i am not familiar enough with excel to prove or disprove what you say
> above, but in general such claims should be grounded in the respective
> documentations. 
> 
> there are a number of ways to compute empirical quantiles (see, e.g.,
> [1]), and it's possible that the one used by r's quantile by default
> (see ?quantile) is not the one used by excel (where you probably have
> no choice;  help in oocalc does not specify the method, and i guess
> that excel's does not either).
> 
> have you actually confirmed that excel's percentile() does the same as
> r's quantile() (modulo the scaling)?
> vQ

I have now googled around a bit. All references to the Excel
percentile() function say that you feed it the fractional value
corresponding to the percentage. So, for example, to get the
80-th percentile you would give it 0.8. Hence Excel should call
it "quantile"!

As to the algorithm, Wikipedia states the following (translated
into R syntax):

  Many software packages, such as Microsoft Excel, use the
  following method recommended by NIST[4] to estimate the
  value, vp, of the pth percentile of an ascending ordered
  dataset containing N elements with values v[1],v[2],...,v[N]:

n = (p/100)*(N-1) + 1

  n is then split into its integer component, k and decimal
  component, d, such that n = k + d.
  If k = 1, then the value for that percentile, vp, is the
  first member of the ordered dataset, v[1].
  If k = N, then the value for that percentile, vp, is the
  Nth member of the ordered dataset, v[N].
  Otherwise, 1 < k < N and vp = v[k] + d*(v[k + 1] - v[k]).

Note that the Wikipedia article uses the "%" interpretation of
"p-th percentile", i.e. the point which is (p/100) of the way
along the distribution.

It looks as though R's quantile with type=4 might be the same,
since it is explained as "linear interpolation of the empirical
cdf", which is what the above description of Excel's method does.
However, R's default type is 7, which is different.
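A direct transcription of the NIST recipe makes the comparison concrete. Note that n = (p/100)*(N-1) + 1 is exactly the type-7 index h = p*(N-1) + 1, so the recipe reproduces R's default type 7, consistent with William Dunlap's Excel experiment earlier in the thread:

```r
# Transcription of the NIST/Excel percentile interpolation quoted above.
nist_percentile <- function(v, p) {
  v <- sort(v)
  N <- length(v)
  n <- (p / 100) * (N - 1) + 1   # fractional rank
  k <- floor(n)
  d <- n - k
  if (k >= N) return(v[N])       # p = 100 gives the maximum
  v[k] + d * (v[k + 1] - v[k])   # linear interpolation between order stats
}
x <- c(1, 1, 2, 3, 3, 5, 8, 8, 9, 10)
nist_percentile(x, 62.5)               # 6.875
unname(quantile(x, 0.625, type = 7))   # 6.875, i.e. the same
```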

Ted.


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 04-Mar-09   Time: 17:29:50
--

[R] Grouped Boxplot

2009-03-04 Thread soeren . vogel

Pls forgive me heavy-handed data generation -- newby ;-)

### start ###
# example data
g <- rep.int(c("A", "B", "C", "D"), 125)
t <- rnorm(5000)
a <- sample(t, 500, replace=TRUE)
b <- sample(t, 500, replace=TRUE)

# what I actually want to have:
boxplot(a | b ~ g)

# but that does obviously not produce what I want, instead
i <- data.frame(g, a, rep("one", length(g)))
j <- data.frame(g, b, rep("two", length(g)))
names(i) <- c("Group", "Number", "Word")
names(j) <- c("Group", "Number", "Word")
k <- as.data.frame(rbind(i, j))
boxplot(k$Number ~ k$Word * k$Group)
### end ###

Questions: (1) Is there an easier way? (2) How can I get additional  
space between the A:D groups?


Thank you

Sören



Re: [R] best fit line

2009-03-04 Thread David Winsemius


On Mar 4, 2009, at 1:22 PM, anujgoel wrote:



Dear R Community,
I am plotting this simple x-y plot (raw data & plot attached).
I cant fit a linear regression line to it. I have to figure out what  
is the

best fit for this graph.


That is virtually impossible to define rigorously. The "best fit"  
would go through all the points precisely, the extreme form of  
overfitting, but the mathematical result would not be informative at  
all. Where did these data come from? What domain of science are you  
working on? You want a result that incorporates the relationships  
known to exist in your domain of investigation and summarizes the data  
without overfitting. I am being intentionally vague in what follows  
because this looks like homework.



Is there a way to tell which regression to use for
this kind of data?


Not really. It looks rather "hyperbolic", so you might think about the
formula for a hyperbola and then use the lm function.
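For instance, a hypothetical sketch (invented data, since the poster's data are only in an attachment) of fitting y = a + b/x with lm and then extrapolating:

```r
# Made-up data with a hyperbola-like shape; not the poster's data.
set.seed(1)
x <- 1:20
y <- 2 + 10 / x + rnorm(20, sd = 0.2)
fit <- lm(y ~ I(1 / x))        # linear regression on the transformed term 1/x
coef(fit)                      # estimates near a = 2, b = 10
predict(fit, newdata = data.frame(x = 25))   # extrapolate beyond the data
```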




Also, after selecting the best fit model, I need to extrapolate what  
could

be the other possible data points.


The predict functions are used for this purpose. Consult your
documentation.

?predict

--
David Winsemius



I am new to R. Could anyone please help?
Thanks.
Anuj
http://www.nabble.com/file/p22336095/plot.jpg
--
View this message in context: 
http://www.nabble.com/best-fit-line-tp22336095p22336095.html
Sent from the R help mailing list archive at Nabble.com.

[[alternative HTML version deleted]]





[R] arima additive seasonality

2009-03-04 Thread Martin Ivanov
Hello!
I asked in this forum about what kind of seasonality the function arima() from 
stats implements. Now that I have been answered that it implements the 
Box-Jenkins multiplicative seasonality, I would like to ask whether there is in 
R possibility to model ARIMA with additive seasonality. I mean whether there is 
a function or package that implements additive seasonality. In case there is no 
such function, I would be thankful if someone suggests some code for the 
purpose. 

Thank you very much for your attention.

Regards,
Martin





Re: [R] best fit line

2009-03-04 Thread Uwe Ligges



anujgoel wrote:

Dear R Community,
I am plotting this simple x-y plot (raw data & plot attached). 
I cant fit a linear regression line to it. I have to figure out what is the

best fit for this graph. Is there a way to tell which regression to use for
this kind of data?
Also, after selecting the best fit model, I need to extrapolate what could
be the other possible data points.
I am new to R. Could anyone please help?


Homework?

PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Uwe Ligges


Thanks.
Anuj
http://www.nabble.com/file/p22336095/plot.jpg




Re: [R] Colormap that look good in gray scale

2009-03-04 Thread Achim Zeileis

On Wed, 4 Mar 2009, thibert wrote:



Hi,
  I am looking for a colormap (in color) that look like a gradient in gray
scale. It is to allow people without color printer to print the color graph
and have something meaningful in gray scale.

It can be something like this
plot(1:6,col=c(1,7,5,3,2,4),pch=c(1,20,20,20,20,20))
but with an arbitrary number of different colors, not just six.


There is some discussion of this in our manuscript
  Achim Zeileis, Kurt Hornik, and Paul Murrell
  Escaping RGBland: Selecting colors for statistical graphics
which is forthcoming in CSDA (Computational Statistics & Data Analysis), 
for a preprint see

  http://statmath.wu-wien.ac.at/~zeileis/papers/Zeileis+Hornik+Murrell-2008.pdf

This discusses the choice of colors, especially for shading areas. If you
want to use this for shading points or lines, I would recommend using
relatively dark and colorful colors and different plotting characters.


R packages that provide useful tools for coloring graphics include 
colorspace, RColorBrewer, ggplot2, and plotrix.
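One concrete recipe along these lines, for an arbitrary number of colors n, is to hold hue and chroma fixed and vary only luminance with base R's hcl(); the greyscale rendering is then a monotone gradient (a sketch illustrating the idea, not code from the manuscript):

```r
# A luminance ramp at fixed hue and chroma: distinguishable in color,
# and a clean light-to-dark gradient when printed in greyscale.
grey_safe <- function(n) {
  hcl(h = 260, c = 60, l = seq(90, 25, length.out = n))
}
plot(1:10, col = grey_safe(10), pch = 20, cex = 3)
```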


hth,
Z


Thanks
--
View this message in context: 
http://www.nabble.com/Colormap-that-look-good-in-gray-scale-tp22336097p22336097.html
Sent from the R help mailing list archive at Nabble.com.







Re: [R] scatter plot question

2009-03-04 Thread Tim Cavileer

At 12:19 AM 3/4/2009, you wrote:

plot(x,rho,pch=id)


Or this.
Tim

> dat
  id  x rho
1  A  1 0.1
2  B 20 0.5
3  C  2 0.9

> labels<-dat$id
> labels
[1] "A" "B" "C"
> plot(dat$x,dat$rho,pch=labels)



Re: [R] Table Transformation

2009-03-04 Thread Uwe Ligges

See ?reshape

Uwe Ligges

Christian Pilger wrote:

Dear R-experts,

recently, I started to discover the world of R. I came across a problem,
that I was unable to solve by myself (including searches in R-help, etc.)

I have a flat table similar to

key1    key2    value1

abcd_1  BP      10
abcd_1  BSMP    1A
abcd_1  PD      25
abcd_2  BP      20
abcd_3  BP      80
abcd_4  IA      30
abcd_4  PD      70
abcd_4  PS      N

I wish to transform this table to obtain the following result:

        key2
key1    BP      BSMP    IA      PD      PS
abcd_1  "10"    "1A"    ""      "25"    ""
abcd_2  "20"    ""      ""      ""      ""
abcd_3  "80"    ""      ""      ""      ""
abcd_4  ""      ""      "30"    "70"    "N"

I considered "table" and "xtabs" but I could not get the desired result: I
received cross-tables key1 vs. key2 that contained counts within the cells.

Can anybody help me?

Best wishes,

Christian





Re: [R] adding value labels on Interaction Plot

2009-03-04 Thread Paul Johnson
On Wed, Mar 4, 2009 at 10:52 AM, Dimitri Liakhovitski  wrote:
> Hello - and sorry for what might look like a simple graphics question.
>
> I am building an interaction plot for d:
>
> d=data.frame(xx=c(3,3,2,2,1,1),yy=c(4,3,4,3,4,3),zz=c(5.1,4.4,3.5,3.3,-1.1,-1.3))
> d[[1]]<-as.factor(d[[1]])
> d[[2]]<-as.factor(d[[2]])
> print(d)
>
> interaction.plot(d$xx, d$yy, d$zz,
>  type="b", col=c("red","blue"), legend=F,
>  lty=c(1,2), lwd=2, pch=c(18,24),
>  xlab="X Label",
>  ylab="Y Label",
>  main="Chart Label")
>
> I am trying and not succeeding in adding Y values (value labels in
> Excel speak) near the data points on 3 lines of the graph.
> I understand that I might have to use "text". But how do I tell text
> to use the actual coordinates of the dots on the lines?
>
>
> Thank you very much!
>

I'm not understanding your trouble, exactly. I had not heard of
"interaction.plot" before and so I've run your code and it is an
interesting function. Thanks for providing the working example.

I can help you with the text.

It is easy to add text. A command like

text( 1.3, 1, "whatever", pos=3)

will put the word "whatever" on top of coordinates x and y (if you leave
out pos=3, R centers the text on the coordinates).

If you need to find out x , y before running that, you can.  the
locator function will return coordinates. Run

locator(1)

and then left click on a point in the graph. Coordinates will pop out
on the screen.

And you can make the text placement depend on locator

text(locator(1), "whatever", pos=3)

I don't think you want to do that work interactively, however.  It can
be automated.

You can add a column of names in your data frame and more or less
automate the plotting as well.  I did this to test.

mylabels <- c("A","B","C","D","E","F")
text(d$xx,d$zz, mylabels, pos=3)

This almost works perfectly, but it plops the labels in the wrong spots.

I'd like to change the command so that the position of the text for
the red line would be on the right, while the position of the text for
the blue line is on the left.

It appears to me your variable yy is the one that separates the 2
lines. Correct? I observe:

as.numeric(d$yy)
[1] 2 1 2 1 2 1

We want the blue ones on the left, for them we need pos=2. For the
others, we want pos=4

Ach. I tried this

text( d$xx, d$zz, mylabels, pos=2*as.numeric(d$yy))

but it comes out backward.  So how about this:

text( d$xx, d$zz, mylabels, pos=as.numeric(d$yy))

That positions the red ones below the line and the blue ones to the
left.  That doesn't look too bad to me.

Anyway, I think you get the idea.

If you wanted to simply stop plotting the circles, and put the letters
"right on" the spot, that would be easy as well.




-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



[R] Colormap that look good in gray scale

2009-03-04 Thread thibert

Hi,
   I am looking for a colormap (in color) that looks like a gradient in gray
scale, to allow people without a color printer to print the color graph
and still have something meaningful in gray scale.

It can be something like this 
plot(1:6,col=c(1,7,5,3,2,4),pch=c(1,20,20,20,20,20))
but with an arbitrary number of different colors, not just six.

Thanks
-- 
View this message in context: 
http://www.nabble.com/Colormap-that-look-good-in-gray-scale-tp22336097p22336097.html
Sent from the R help mailing list archive at Nabble.com.



[R] Table Transformation

2009-03-04 Thread Christian Pilger

Dear R-experts,

recently, I started to discover the world of R. I came across a problem,
that I was unable to solve by myself (including searches in R-help, etc.)

I have a flat table similar to

key1    key2    value1

abcd_1  BP      10
abcd_1  BSMP    1A
abcd_1  PD      25
abcd_2  BP      20
abcd_3  BP      80
abcd_4  IA      30
abcd_4  PD      70
abcd_4  PS      N

I wish to transform this table to obtain the following result:

        key2
key1    BP      BSMP    IA      PD      PS
abcd_1  "10"    "1A"    ""      "25"    ""
abcd_2  "20"    ""      ""      ""      ""
abcd_3  "80"    ""      ""      ""      ""
abcd_4  ""      ""      "30"    "70"    "N"

I considered "table" and "xtabs" but I could not get the desired result: I
received cross-tables key1 vs. key2 that contained counts within the cells.

Can anybody help me?

Best wishes,

Christian

-- 
View this message in context: 
http://www.nabble.com/Table-Transformation-tp22335545p22335545.html
Sent from the R help mailing list archive at Nabble.com.



[R] best fit line

2009-03-04 Thread anujgoel

Dear R Community,
I am plotting this simple x-y plot (raw data & plot attached). 
I cant fit a linear regression line to it. I have to figure out what is the
best fit for this graph. Is there a way to tell which regression to use for
this kind of data?
Also, after selecting the best fit model, I need to extrapolate what could
be the other possible data points.
I am new to R. Could anyone please help?
Thanks.
Anuj
http://www.nabble.com/file/p22336095/plot.jpg 
-- 
View this message in context: 
http://www.nabble.com/best-fit-line-tp22336095p22336095.html
Sent from the R help mailing list archive at Nabble.com.
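Since the poster's raw data are in an attachment, here is a sketch with invented x/y values showing the usual workflow: fit several candidate models, compare them (e.g. by AIC), then extrapolate with predict():

```r
# sketch: comparing candidate fits on invented data, then extrapolating
set.seed(1)
x <- 1:10
y <- 2 + 0.5 * x + 0.1 * x^2 + rnorm(10, sd = 0.2)
fit1 <- lm(y ~ x)               # straight line
fit2 <- lm(y ~ poly(x, 2))      # quadratic
AIC(fit1, fit2)                 # lower AIC suggests the better-supported model
predict(fit2, newdata = data.frame(x = 11:12))   # extrapolation (use with care)
```

Extrapolation beyond the observed range is risky with any fitted model; the comparison above only ranks fits within the data.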




Re: [R] change individual label colours in a cluster plot?

2009-03-04 Thread Sur Nathan

Hi Jim,

  How are you? I saw your posting. I am trying to do clustering for
co-authorship. What I have is an undirected graph. I want to have clusters
for 393 nodes.

I am attaching the file along with this mail. If you move to the section
Cluster, I am looking to do something like that. Is it something you are
familiar with?

Can you tell me how you did it in R?

Nathan


Jim Ottaway wrote:
> 
>> patricia garcía gonzález  writes:
> 
>> Hi,
> 
>> If you have a variable, that defines what you want to differentiate
>> (sociology, economics etc.) then you can add color depending on the
>> value of that variable. You will have to convert it to numeric if it is
>> not. An example would be
> 
>> plot( iris[ , 1 ], iris[ , 2], col = iris[ , 3 ] )
> 
> 
> Thank you.  I'm not sure that I can do that with an hclust object,
> though: perhaps something using the text function and the order data in
> the hclust object might work?
> 
> Currently, I'm getting good results using a script to edit the
> postscript output, but I'm keen to find an R solution, if only to
> improve my understanding of R graphics.
> 
> Yours sincerely,
> -- 
> Jim Ottaway
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> 
http://www.nabble.com/file/p22336078/Clustering%2BTechnique.pdf
Clustering+Technique.pdf 
-- 
View this message in context: 
http://www.nabble.com/change-individual-label-colours-in-a-cluster-plot--tp21852671p22336078.html
Sent from the R help mailing list archive at Nabble.com.
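For the dendrogram-label question raised in this thread, one standard approach is to convert the hclust object to a dendrogram and set per-leaf `lab.col` attributes via dendrapply(). A sketch with stand-in data (the co-authorship graph is not available; USArrests is used purely for illustration):

```r
# sketch: colouring individual labels of a cluster dendrogram (stand-in data)
hc   <- hclust(dist(USArrests[1:10, ]))
dnd  <- as.dendrogram(hc)
labs <- rownames(USArrests)[1:10]
cols <- setNames(rep(c("red", "blue"), length.out = 10), labs)  # one colour per label
colour_leaf <- function(n) {
  if (is.leaf(n)) {
    # attach a label colour to this leaf's plotting parameters
    attr(n, "nodePar") <- c(attr(n, "nodePar"),
                            list(lab.col = cols[[attr(n, "label")]]))
  }
  n
}
plot(dendrapply(dnd, colour_leaf))
```

In practice `cols` would be driven by the grouping variable (sociology, economics, etc.) rather than alternating colours.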



[R] help with GAM

2009-03-04 Thread Las dA
Hi

I'm trying to do a GAM analysis and have the following codes entered
into R (density is = sample number, alive are the successes)

density<-as.real(density)

y<-cbind(alive,density-alive)

library(mgcv)

m1<-gam(y~s(density),binomial)

at which point I get the following error message

Error in smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) :
A term has fewer unique covariate combinations than specified maximum
degrees of freedom

What am I doing wrong?  Please help!

Thanks!
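This mgcv error usually means s() requested more basis functions than there are unique values of the covariate; passing a smaller basis dimension k is the common fix. A sketch with invented data (the poster's density/alive vectors are not shown; also note as.numeric() is the usual replacement for as.real()):

```r
# sketch: with only 4 distinct covariate values, the default basis for
# s(density) is too large; an explicit small k avoids the error
library(mgcv)
set.seed(1)
density <- rep(c(5, 10, 20, 40), each = 5)   # only 4 unique values
alive   <- rbinom(length(density), size = density, prob = 0.6)
y  <- cbind(alive, density - alive)          # successes, failures
m1 <- gam(y ~ s(density, k = 3), family = binomial)  # k < no. of unique values
summary(m1)
```

With so few unique covariate values, a smooth may be over-parameterised anyway; a plain glm() with a low-order polynomial would be worth comparing.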



Re: [R] Unrealistic dispersion parameter for quasibinomial

2009-03-04 Thread Prof Brian Ripley

For the record


residuals(model)

  1   2   3   4   5
 5.55860143 -0.00073852  2.49255235 -1.41987341 -0.00042425
  6   7   8
-0.94389158  2.72987046 -1.15760836

residuals(model, "pearson")

  1   2   3   4   5
 3.5362e+03 -5.e-04  2.3366e+00 -1.0080e+00 -2.e-04
  6   7   8
-8.8378e-01  2.4038e+00 -1.1646e+00

fitted(model)

 1  2  3  4  5
1.5994e-08 5.0502e-09 4.9946e-01 1.5873e-02 3.2140e-09
 6  7  8
2.0924e-02 8.0191e-01 6.1900e-01

so according to the model a very rare event occurred.  That is what is
'unrealistic' (and Ben Bolker supposed correctly).

How dispersion should be estimated is a matter of some debate (see 
e.g. McCullagh and Nelder), but the model here is simply inadequate.
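The dispersion estimate in question is simply the Pearson chi-square divided by the residual degrees of freedom, so a single enormous Pearson residual (about 3536 in the model above) dominates it. A synthetic illustration of the computation (data invented):

```r
# sketch: reproduce the quasibinomial dispersion estimate by hand
set.seed(42)
n <- rep(50, 20)
p <- plogis(rnorm(20, sd = 2))               # extra-binomial variation in p
alive <- rbinom(20, n, p)
m <- glm(cbind(alive, n - alive) ~ 1, family = quasibinomial)
phi_hand <- sum(residuals(m, "pearson")^2) / df.residual(m)
c(by_hand = phi_hand, reported = summary(m)$dispersion)   # the two agree
```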



On Mon, 2 Mar 2009, Menelaos Stavrinides wrote:


I am running a binomial glm with response variable the number of mites of two
species, y <- cbind(mitea, miteb), against two continuous variables (temperature
and predatory mites) - see below. My model shows overdispersion as the
residual deviance is 48.81  on 5  degrees of freedom. If I use quasibinomial
to account for overdispersion the dispersion parameter estimate is  2501139,
which seems unrealistic. Any ideas as to why I am getting such a huge
dispersion parameter?


y<-cbind(psmno,wsmno)
ldhours<-log(idhours+1)
lwpm<-log(wpm2wkb+1)
y

     psmno wsmno
[1,]     1     4
[2,]     0    54
[3,]     8     1
[4,]     0    63
[5,]     0    28
[6,]     4   291
[7,]    46     3
[8,]   117    85

ldhours

[1] 0.00 2.308567 5.078473 4.875035 2.339399 3.723039 5.572344 5.250384

lwpm

[1] 0.6931472 2.1972246 0.000 0.6931472 2.3025851 0.000 0.000
[8] 0.000

model<-glm(y~ldhours+lwpm,binomial)
summary(model)


Call:
glm(formula = y ~ ldhours + lwpm, family = binomial)

Deviance Residuals:
1   2   3   4   5   6
5.5586025  -0.0007385   2.4925511  -1.4198734  -0.0004242  -0.9438916
7   8
2.7298663  -1.1576062

Coefficients:
   Estimate Std. Error z value Pr(>|z|)
(Intercept) -14.4029 1.3705 -10.509  < 2e-16 ***
ldhours   2.8357 0.2656  10.677  < 2e-16 ***
lwpm -5.1188 1.4689  -3.485 0.000492 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

   Null deviance: 441.20  on 7  degrees of freedom
Residual deviance:  48.81  on 5  degrees of freedom
AIC: 70.398

Number of Fisher Scoring iterations: 8


model2<-glm(y~ldhours+lwpm,quasibinomial)
summary(model2)


Call:
glm(formula = y ~ ldhours + lwpm, family = quasibinomial)

Deviance Residuals:
1   2   3   4   5   6
5.5586025  -0.0007385   2.4925511  -1.4198734  -0.0004242  -0.9438916
7   8
2.7298663  -1.1576062

Coefficients:
   Estimate Std. Error t value Pr(>|t|)
(Intercept)  -14.403   2167.435  -0.007    0.995
ldhours        2.836    420.015   0.007    0.995
lwpm          -5.119   2323.044  -0.002    0.998

(Dispersion parameter for quasibinomial family taken to be 2501139)

   Null deviance: 441.20  on 7  degrees of freedom
Residual deviance:  48.81  on 5  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 8

Thanks,
Mel

--
Menelaos Stavrinides
Ph.D. Candidate
Environmental Science, Policy and Management
137 Mulford Hall MC #3114
University of California
Berkeley, CA 94720-3114 USA
Tel: 510 717 5249





--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



Re: [R] regular expression question

2009-03-04 Thread Wacek Kusnierczyk
Greg Snow wrote:
> Here is another approach that still uses strsplit if you want to stay with
> that:
>
>   
>> tmp <- '(-0.791,-0.263].(-38,-1.24].(0.96,2.43]'
>> strsplit(tmp, '\\.(?=\\()', perl=TRUE)
>> 
> [[1]]
> [1] "(-0.791,-0.263]" "(-38,-1.24]" "(0.96,2.43]"   
>
> This uses the Perl 'look-ahead' indicator to say only match on a period that 
> is followed by a '(', but don't include the '(' in the match.
>   

right;  you could extend this pattern to split the string by every dot
that does not separate two digits, for example:
   
strsplit(tmp, '(?...', perl=TRUE)  # [lookaround pattern truncated in the archive]


[R] dividing time series of different frequencies

2009-03-04 Thread Stephen J. Barr
Hello,

I have two time series objects, 1 is yearly (population) and the other is
quarterly (bankruptcy statistics). I would like to produce a quarterly time
series object that consists of bankruptcy/population. Is there a pre-built
function to intelligently divide these time series?

The series I want is:

br2001Q1/pop2001, br2001Q2/pop2001, br2001Q3/pop2001, br2001Q4/pop2001,
br2002Q1/pop2002, br2002Q2/pop2002, etc.

This would not be too difficult to write but does anything like this already
exist?

-stephen
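A sketch of one way to do this, assuming each quarter is divided by that year's population (all numbers below are invented): expand the yearly series to quarterly with rep(), then divide elementwise.

```r
# sketch: align a yearly series with a quarterly one, then divide
pop  <- ts(c(1000, 1020, 1045), start = 2001, frequency = 1)
br   <- ts(seq(40, 62, by = 2), start = c(2001, 1), frequency = 4)
popq <- ts(rep(as.numeric(pop), each = 4),        # repeat each year 4 times
           start = c(2001, 1), frequency = 4)
rate <- br / popq                                 # quarterly bankruptcy rate
rate
```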




Re: [R] lattice: remove box around a wireframe

2009-03-04 Thread Thomas Roth (geb. Kaliwe)

:-) works!


Sundar Dorai-Raj schrieb:

(Sorry for the repeat. Forgot to copy R-help)

Try,

test = data.frame(expand.grid(c(1:10), c(1:10)))
z = test[,1] + test[,2]
test = cbind(test, z)
names(test) = c("x", "y", "z")
require(lattice)
wireframe(z ~ x*y, data = test,
 par.settings = list(axis.line = list(col = "transparent")),
 par.box = c(col = "transparent") )

--sundar

On Wed, Mar 4, 2009 at 8:17 AM, Thomas Roth (geb. Kaliwe)
 wrote:
  

#Hi,
#
#somebody knows how to  remove the outer box around a wireframe and reduce
the height
#
#

test = data.frame(expand.grid(c(1:10), c(1:10)))
z = test[,1] + test[,2]
test = cbind(test, z)
names(test) = c("x", "y", "z")
require(lattice)
wireframe(z ~ x*y, data = test, par.box = c(col = "transparent") )  #not
this one but the remaining outer box.

Thanks in advance

Thomas Roth










Re: [R] regular expression question

2009-03-04 Thread Greg Snow
Here is another approach that still uses strsplit if you want to stay with that:

> tmp <- '(-0.791,-0.263].(-38,-1.24].(0.96,2.43]'
> strsplit(tmp, '\\.(?=\\()', perl=TRUE)
[[1]]
[1] "(-0.791,-0.263]" "(-38,-1.24]" "(0.96,2.43]"   

This uses the Perl 'look-ahead' indicator to say only match on a period that is 
followed by a '(', but don't include the '(' in the match.

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> project.org] On Behalf Of markle...@verizon.net
> Sent: Monday, March 02, 2009 11:17 PM
> To: r-help@r-project.org
> Subject: [R] regular expression question
> 
> can someone show me how to use a regular expression to break the string
> at the bottom up into its three components :
> 
> (-0.791,-0.263]
> (-38,-1.24]
> (0.96,2.43]
> 
> I tried to use strsplit because of my regexpitis ( it's not curable.
> i've
> been to many doctors all over NYC. they tell me there's no cure  )  but
> it doesn't work because there are also dots inside the brackets. Thanks.
> 
> (-0.791,-0.263].(-38,-1.24].(0.96,2.43]
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.



Re: [R] Diff btw percentile and quantile

2009-03-04 Thread Ted Harding
On 04-Mar-09 16:56:14, Wacek Kusnierczyk wrote:
> (Ted Harding) wrote:
> 
>> So, with reference to your original question
>>  "Excel has percentile() function. R function quantile() does the
>>   same thing. Is there any significant difference btw percentile
>>   and quantile?"
>> the answer is that they in effect give the same results, though
>> differ with respect to how they are to be fed (quantile eats
>> probabilities, percentile eats percentages). [Though (since I am
>> not familiar with Excel) I cannot rule out that Excel's percentile()
>> function also eats probabilities; in which case its name would be
>> an example of sloppy nomenclature on Excel's part; which I cannot
>> rule out on general grounds either].
> 
> i am not familiar enough with excel to prove or disprove what you say
> above, but in general such claims should be grounded in the respective
> documentations. 
> 
> there are a number of ways to compute empirical quantiles (see, e.g.,
> [1]), and it's possible that the one used by r's quantile by default
> (see ?quantile) is not the one used by excel (where you probably have
> no choice;  help in oocalc does not specify the method, and i guess
> that excel's does not either).
> 
> have you actually confirmed that excel's percentile() does the same as
> r's quantile() (modulo the scaling)?
> vQ

I have now googled around a bit. All references to the Excel
percentile() function say that you feed it the fractional value
corresponding to the percentage. So, for example, to get the
80-th percentile you would give it 0.8. Hence Excel should call
it "quantile"!

As to the algorithm, Wikipedia states the following (translated
into R syntax):

  Many software packages, such as Microsoft Excel, use the
  following method recommended by NIST[4] to estimate the
  value, vp, of the pth percentile of an ascending ordered
  dataset containing N elements with values v[1],v[2],...,v[N]:

n = (p/100)*(N-1) + 1

  n is then split into its integer component, k and decimal
  component, d, such that n = k + d.
  If k = 1, then the value for that percentile, vp, is the
  first member of the ordered dataset, v[1].
  If k = N, then the value for that percentile, vp, is the
  Nth member of the ordered dataset, v[N].
  Otherwise, 1 < k < N and vp = v[k] + d*(v[k + 1] - v[k]).

Note that the Wikipedia article uses the "%" interpretation of
"p-th percentile", i.e. the point which is (p/100) of the way
along the distribution.

Although "linear interpolation of the empirical cdf" sounds like R's
type=4, the formula above -- n = (p/100)*(N-1) + 1 with linear
interpolation between the order statistics -- is exactly what R's
quantile() computes with its default type=7, so Excel's percentile()
and R's default quantile() should agree.
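A quick numerical check (data invented) that the NIST rule described above coincides with R's default quantile algorithm:

```r
# NIST/Excel rule vs. R's default quantile type
x <- c(15, 20, 35, 40, 50)
p <- 0.4
n <- p * (length(x) - 1) + 1          # NIST: n = (p/100)*(N-1) + 1, p as a fraction
k <- floor(n); d <- n - k             # integer and decimal parts
v <- sort(x)
nist <- v[k] + d * (v[k + 1] - v[k])  # interpolate between order statistics
q7   <- unname(quantile(x, probs = p, type = 7))
c(nist = nist, type7 = q7)            # both 29
```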

Ted.


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 04-Mar-09   Time: 17:29:50
-- XFMail --



Re: [R] behavior of squishplot in TeachingDemos

2009-03-04 Thread Greg Snow
Thank you for finding this.  Yes in some cases the parameter settings need to 
be updated by a call to plot.new for the calculations to be correct (if you 
carried out your example 2 more times you would see that the 3rd plot is also 
incorrect since it is still using the dimensions of the 2nd plot in the 
calculations).

I have added a call to plot.new inside of the squishplot function for the next 
version, but until that comes out (I have been meaning to get it out for a 
while, but don't have a specific time frame) the work around that you found of 
calling plot.new before squishplot is the best thing to do.

Thanks,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> project.org] On Behalf Of Stephen Tucker
> Sent: Tuesday, March 03, 2009 9:17 AM
> To: r-help@r-project.org
> Subject: [R] behavior of squishplot in TeachingDemos
> 
> Hi list,
> I wonder if anyone has had this experience with squishplot() in the
> TeachingDemos package.
> 
> Taking the example from the ?image help page,
> 
> library(TeachingDemos)
> x <- 10*(1:nrow(volcano))
> y <- 10*(1:ncol(volcano))
> 
> layout(matrix(c(1,2,3,4),ncol=2,byrow=TRUE),height=c(2,1))
> ## 1st plot
> op <- squishplot(range(x),range(y),1)
> image(x, y, volcano, col = terrain.colors(100))
> par(op)
> ## 2nd plot
> op <- squishplot(range(x),range(y),1)
> image(x, y, volcano, col = terrain.colors(100))
> par(op)
> 
> The second plot comes out looking as expected, but the first plot is
> not squished in the desired proportions. I tried tracking the
> modifications to par('pin') and par('plt') in the function but gave up
> midway through in desire for haste - not sure what is going on but I
> did find that taking advantage of the behavior above, calling
> plot.new(); par(new=TRUE)
> before the first plot makes things work as expected. So the full code
> would be
> 
> layout(matrix(c(1,2,3,4),ncol=2,byrow=TRUE),height=c(2,1))
> ## 1st plot
> op <- squishplot(range(x),range(y),1)
> plot.new()
> par(new=TRUE)
> image(x, y, volcano, col = terrain.colors(100))
> par(op)
> ## 2nd plot
> op <- squishplot(range(x),range(y),1)
> image(x, y, volcano, col = terrain.colors(100))
> par(op)
> 
> +++
> 
> I wonder if this behavior is not surprising? It is a great function
> overall though - thanks for the contribution.
> 
> Stephen
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.



Re: [R] lattice: remove box around a wireframe

2009-03-04 Thread Sundar Dorai-Raj
(Sorry for the repeat. Forgot to copy R-help)

Try,

test = data.frame(expand.grid(c(1:10), c(1:10)))
z = test[,1] + test[,2]
test = cbind(test, z)
names(test) = c("x", "y", "z")
require(lattice)
wireframe(z ~ x*y, data = test,
 par.settings = list(axis.line = list(col = "transparent")),
 par.box = c(col = "transparent") )

--sundar

On Wed, Mar 4, 2009 at 8:17 AM, Thomas Roth (geb. Kaliwe)
 wrote:
> #Hi,
> #
> #somebody knows how to  remove the outer box around a wireframe and reduce
> the height
> #
> #
>
> test = data.frame(expand.grid(c(1:10), c(1:10)))
> z = test[,1] + test[,2]
> test = cbind(test, z)
> names(test) = c("x", "y", "z")
> require(lattice)
> wireframe(z ~ x*y, data = test, par.box = c(col = "transparent") )  #not
> this one but the remaining outer box.
>
> Thanks in advance
>
> Thomas Roth
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



[R] readline in vi mode on OSX

2009-03-04 Thread Dave Murray-Rust

Hi All,

This is a slightly arcane question, but I'm wondering if anyone else  
uses vi mode with R? On my platform, across several versions, there is  
some broken behaviour. When executing commands like 'df)' (to delete  
up to the next bracket) the cursor moves to the next ), but nothing is  
deleted. In general, many delete/replace commands work only as movement.


Has anyone else come across this, and if so, did you find a fix?

The same commands work correctly on the command line, so I'm assuming
R has its own built-in version of readline, which is causing problems.


I'm currently running R2.8.1 on OSX 10.5.6, but the bug has been there  
for both R and OSX upgrades. It isn't present on Linux.


Cheers,
dave


--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



Re: [R] Diff btw percentile and quantile

2009-03-04 Thread Wacek Kusnierczyk
(Ted Harding) wrote:



> So, with reference to your original question
>  "Excel has percentile() function. R function quantile() does the
>   same thing. Is there any significant difference btw percentile
>   and quantile?"
> the answer is that they in effect give the same results, though
> differ with respect to how they are to be fed (quantile eats
> probabilities, percentile eats percentages). [Though (since I am
> not familiar with Excel) I cannot rule out that Excel's percentile()
> function also eats probabilities; in which case its name would be
> an example of sloppy nomenclature on Excel's part; which I cannot
> rule out on general grounds either].
>   

i am not familiar enough with excel to prove or disprove what you say
above, but in general such claims should be grounded in the respective
documentations. 

there are a number of ways to compute empirical quantiles (see, e.g.,
[1]), and it's possible that the one used by r's quantile by default
(see ?quantile) is not the one used by excel (where you probably have no
choice;  help in oocalc does not specify the method, and i guess that
excel's does not either).

have you actually confirmed that excel's percentile() does the same as
r's quantile() (modulo the scaling)?

vQ

[1] http://www.jstor.org/stable/2684934



[R] adding value labels on Interaction Plot

2009-03-04 Thread Dimitri Liakhovitski
Hello - and sorry for what might look like a simple graphics question.

I am building an interaction plot for d:

d=data.frame(xx=c(3,3,2,2,1,1),yy=c(4,3,4,3,4,3),zz=c(5.1,4.4,3.5,3.3,-1.1,-1.3))
d[[1]]<-as.factor(d[[1]])
d[[2]]<-as.factor(d[[2]])
print(d)

interaction.plot(d$xx, d$yy, d$zz,
  type="b", col=c("red","blue"), legend=F,
  lty=c(1,2), lwd=2, pch=c(18,24),
  xlab="X Label",
  ylab="Y Label",
  main="Chart Label")

I am trying and not succeeding in adding Y values (value labels in
Excel speak) near the data points on 3 lines of the graph.
I understand that I might have to use "text". But how do I tell text
to use the actual coordinates of the dots on the lines?


Thank you very much!

-- 
Dimitri Liakhovitski
MarketTools, Inc.
dimitri.liakhovit...@markettools.com
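One way to answer the coordinate question (a sketch, not the only approach): interaction.plot() places the x factor levels at positions 1, 2, ..., so as.numeric() on the factor recovers the x coordinates, and the zz values are the y coordinates text() needs.

```r
# reproduce the poster's plot, then label each point with its y value
d <- data.frame(xx = factor(c(3, 3, 2, 2, 1, 1)),
                yy = factor(c(4, 3, 4, 3, 4, 3)),
                zz = c(5.1, 4.4, 3.5, 3.3, -1.1, -1.3))
interaction.plot(d$xx, d$yy, d$zz, type = "b", col = c("red", "blue"),
                 legend = FALSE, lty = c(1, 2), lwd = 2, pch = c(18, 24),
                 xlab = "X Label", ylab = "Y Label", main = "Chart Label")
# pos = 3 puts each label just above its point
text(x = as.numeric(d$xx), y = d$zz, labels = d$zz, pos = 3, cex = 0.8)
```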



[R] Error in -class : invalid argument to unary operator

2009-03-04 Thread srfc

Hi guys, I have been using R for a few months now and have come across an
error that I have been trying to fix for a week or so now. I am trying to
build a classifier that will classify the wine dataset using Naive Bayes.

My code is as follows

library (e1071)

wine<- read.csv("C:\\Rproject\\Wine\\wine.csv")
split<-sample(nrow(wine), floor(nrow(wine) * 0.5))
wine_training <- wine[split, ] 
wine_testing <- iris[-split, ]



naive_bayes <-naiveBayes(class~.,data=wine_training) 



x_testing <- subset(wine_testing, select = -class)
y_testing <- wine_testing$class # just grab Species variable of
iris_training
pred <- predict(naive_bayes, x_testing)


tab<-table(pred, y_testing)


ca <- classAgreement(tab)

print(tab)
print(ca)


when I enter this code in I get the error 


Error in -class : invalid argument to unary operator

If anybody could give me any sort of advice, it would be most welcome. Thanks
-- 
View this message in context: 
http://www.nabble.com/Error-in--class-%3A-invalid-argument-to-unary-operator-tp22333600p22333600.html
Sent from the R help mailing list archive at Nabble.com.
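A sketch of the two likely culprits, using a stand-in data frame since wine.csv is not available: the testing set was built from iris, which has no 'class' column, so `select = -class` fell through to the base function class() and produced the unary-operator error.

```r
# stand-in for the wine data (invented values, but same column layout)
wine  <- data.frame(class = rep(1:3, each = 4), a = rnorm(12), b = rnorm(12))
split <- sample(nrow(wine), floor(nrow(wine) * 0.5))
wine_testing <- wine[-split, ]                      # was: iris[-split, ]
x_testing <- subset(wine_testing, select = -class)  # fine once the column exists
str(x_testing)
```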



Re: [R] Diff btw percentile and quantile

2009-03-04 Thread Ted Harding
On 04-Mar-09 16:10:29, megh wrote:
> Yes, I am aware of those definitions. However, I wanted to know the
> difference between the words "Percentile" and "quantile", if any.
> Secondly, your link navigates to some non-English site, which I could
> not understand.

"Percentile" and "quantile" are in effect the same thing.
The difference is in how they express what they refer to.

For example, the Median of a distribution is the 0.5 Quantile,
and is the 50% percentile.

So, for 0 <= p <= 1, refer to either the p-th quantile,
or to the (100*p)-th percentile.

Thus R has the function quantile(), whose ?quantile states:
 The generic function 'quantile' produces sample quantiles
 corresponding to the given probabilities. The smallest
 observation corresponds to a probability of 0 and the
 largest to a probability of 1.

R (in its basic distribution) does not have a function percentile(),
but, given a series P of percentages, e.g.
  P <- c(1,5,10,25,50,75,90,95,99)
one could obtain the equivalent as
  quantile(X,probs=P/100).

So, with reference to your original question
 "Excel has percentile() function. R function quantile() does the
  same thing. Is there any significant difference btw percentile
  and quantile?"
the answer is that they in effect give the same results, though
differ with respect to how they are to be fed (quantile eats
probabilities, percentile eats percentages). [Though (since I am
not familiar with Excel) I cannot rule out that Excel's percentile()
function also eats probabilities; in which case its name would be
an example of sloppy nomenclature on Excel's part; which I cannot
rule out on general grounds either].
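A concrete version of the rescaling just described (data invented):

```r
# "percentiles" via quantile() by rescaling percentages to probabilities
set.seed(1)
X <- rnorm(100)
P <- c(1, 5, 10, 25, 50, 75, 90, 95, 99)   # percentages
pct <- quantile(X, probs = P / 100)        # the corresponding quantiles
pct
```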

Ted.

> Dieter Menne wrote:
>> megh  yahoo.com> writes:
>>> To calculate Percentile for a set of observations Excel has
>>> percentile() function. R function quantile() does the same thing.
>>> Is there any significant difference btw percentile and quantile?
>> 
>> If you check the documentation of quantile, you will note that
>> there are 9 variants of quantile which may give different values
>> for small sample sizes and many ties.
>> 
>> I found a German page that explains the algorithm Excel uses:
>> 
>> http://www.excel4managers.de/index.php?page=quantile01
>> 
>> but I did not check if which of the R-variants this is equivalent to.
>> 
>> Dieter


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 04-Mar-09   Time: 16:31:42
-- XFMail --



[R] lattice: remove box around a wireframe

2009-03-04 Thread Thomas Roth (geb. Kaliwe)

#Hi,
#
#somebody knows how to  remove the outer box around a wireframe and 
reduce the height

#
#

test = data.frame(expand.grid(c(1:10), c(1:10)))
z = test[,1] + test[,2]
test = cbind(test, z)
names(test) = c("x", "y", "z")
require(lattice)
wireframe(z ~ x*y, data = test, par.box = c(col = "transparent") )  #not 
this one but the remaining outer box.


Thanks in advance

Thomas Roth



Re: [R] Diff btw percentile and quantile

2009-03-04 Thread megh

Yes, I am aware of those definitions. However, I wanted to know the difference
between the words "Percentile" and "quantile", if any. Secondly, your link
navigates to some non-English site, which I could not understand.


Dieter Menne wrote:
> 
> megh  yahoo.com> writes:
> 
> 
>  
>> To calculate Percentile for a set of observations Excel has percentile()
>> function. R function quantile() does the same thing. Is there any
>> significant difference btw percentile and quantile?
> 
> If you check the documentation of quantile, you will note that there are 9
> variants of quantile which may give different values for small sample
> sizes and many ties.
> 
> I found a German page that explains the algorithm Excel uses:
> 
> http://www.excel4managers.de/index.php?page=quantile01
> 
> but I did not check if which of the R-variants this is equivalent to.
> 
> Dieter
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Diff-btw-percentile-and-quantile-tp22328375p22333142.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] How to generate fake population (ie. not sample) data?

2009-03-04 Thread Daniel Nordlund
> -Original Message-
> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] 
> Sent: Wednesday, March 04, 2009 3:17 AM
> To: Daniel Nordlund
> Cc: r-help@r-project.org
> Subject: Re: [R] How to generate fake population (ie. not 
> sample) data?
> 
> On Wed, Mar 4, 2009 at 2:48 AM, Daniel Nordlund 
>  wrote:
> >> -Original Message-
> >> From: r-help-boun...@r-project.org
> >> [mailto:r-help-boun...@r-project.org] On Behalf Of CB
> >> Sent: Tuesday, March 03, 2009 10:05 PM
> >> To: David Winsemius
> >> Cc: r-help@r-project.org
> >> Subject: Re: [R] How to generate fake population (ie. not
> >> sample) data?
> >>
> >> My understanding is that rnorm(n, x, s) will give me an 
> n-sized sample
> >> from an (x, s) normal distribution. So the vector returned 
> will have a
> >> mean from the sampling distribution of the mean. But what 
> I want is a
> >> set of n numbers literally with a mean of x and sd of s.
> >>
> >> I am at the very beginning of my R journey, so my 
> apologies if this is
> >> a particularly naive enquiry.
> >>
> >> 2009/3/4 David Winsemius :
> >> > In what ways is rnorm not a satisfactory answer?
> >> >
> >> > --
> >> > David Winsemius
> >> >
> >> > On Mar 3, 2009, at 9:33 PM, CB wrote:
> >> >
> >> >> This seems like it should be obvious, but searches I've
> >> tried all come
> >> >> up with rnorm etc.
> >> >>
> >> >> Is there a way of generating normally-distributed 
> 'population' data
> >> >> with known parameters?
> >> >>
> >> >> Cheers, CB.
> >> >>
> >
> > Something like this may help get you started.
> >
> > std.pop <- function(x,mu,stdev){
> >  ((x-mean(x))/sd(x)*stdev)+mu
> >  }
> 
> Note the scale function, i.e. the above can also be written:
> 
> stdev * scale(x) + mu
> 
> 
Gabor,

Thanks for pointing out the scale() function for the OP.  I suspected that
something like that existed,  but in a (very) quick look around didn't find
it.

Dan 

Daniel Nordlund
Bothell, WA USA  
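The scale() idiom quoted above, sketched end to end: unlike a plain rnorm() draw, the result has exactly the requested mean and sd, not just in expectation.

```r
# force a sample to have exactly the target mean and sd via scale()
set.seed(1)
mu <- 10; stdev <- 2
x   <- rnorm(25)
pop <- as.numeric(stdev * scale(x) + mu)   # scale() centres and standardises x
c(mean = mean(pop), sd = sd(pop))          # 10 and 2, up to rounding error
```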



Re: [R] Inefficiency of SAS Programming

2009-03-04 Thread Millo Giovanni
Dear Ajay,

just to deny the implicit statement 'corporate user'='moron' surfacing
here and there in this interesting thread :^). This might be a
statistical regularity but should by no means be considered a theorem,
as there are counter-examples available. You can find people willing to
learn both languages, appreciate the difference between them and use
each where it is particularly strong even in corporations and burosaurs
of any kind.

IMVHO, acceptance of R in the corporate world has little to do with
syntax and much with legacies, (discharge of-) responsibilities and the
distance between the decision maker/buyer and those who are actually
working with the SW. Otherwise, assuming that 'corporate users' are not at a
significant cerebral disadvantage (which I like to assume), the penetration
of R in education and in small and large companies should be the same, which
I'm afraid it is not.

So I believe it boils down to industrial organization and the open
source vs. commercial development model, rather than to some kind of
(more or less appropriate) function rebranding. It is the *difference*
in syntax w.r.t. SAS that prompted the shift to R, in my case at least.
It was its ease and 'cleanliness' of installation (no registry entries,
no access to forbidden directories required) which allowed me to
experiment with it without having to mess with the IT Dept. (which would
probably have put an end to my quest). It was its open source nature
that allowed me to install it anywhere I liked to.

My 2 Euro-Cents
Giovanni

Disclaimer: just thinking of the Proc Step gives me shivers; yet I
recognize SAS is fast and powerful. I could understand somebody wanting
to execute SAS through R syntax, but the opposite is beyond my grasp. 



--

Message: 72
Date: Wed, 4 Mar 2009 08:44:51 +1300
From: Rolf Turner 
Subject: Re: [R] Inefficiency of SAS Programming
To: Ajay ohri 
Cc: "r-help-boun...@r-project.org" ,
"Gerard M. Keogh" , list
, R,  Greg Snow 
Message-ID: <8993cba0-46a3-41de-abbb-29db205fb...@auckland.ac.nz>
Content-Type: text/plain; charset="US-ASCII"; delsp=yes; format=flowed


On 3/03/2009, at 5:58 PM, Ajay ohri wrote:

> for an " inefficient " language , it sure has dominated the predictive

> analytics world for 3 plus decades. I referred once to intellectual 
> jealousy between newton and liebnitz.
>
> i am going ahead and creating the R package called "Anne".
>
> It basically is meant only for SAS users who want to learn R , without

> upsetting the schedule of the corporate users.
>
> Simply put , it is a wrapper on SAS language using the function
> command...ie
> procunivariate function in "Anne" package would call the summary  
> function
> and so on...

Reminds me of fortune(38).

cheers,

Rolf Turner




Re: [R] Inference for R Spam

2009-03-04 Thread Michael A. Miller
> "Rolf" == Rolf Turner  writes:

> On 4/03/2009, at 11:50 AM, Michael A. Miller wrote:

>> Sports scores are not statistics, they are measurements
>> (counts) of the number of times each team scores.  There
>> is no sampling and vanishingly small possibility of
>> systematic error in the measurement.

> I think this comment indicates a fundamental
> misunderstanding of the nature of statistics in general and
> the concept of variability in particular.  Measurement
> error is only *one possible* source of variability and is
> often a minor --- or as in the case of sports scores a
> non-existent --- source.

Would you elaborate, Rolf? I was referring to measurements, not
statistics. Isn't calling scores statistics similar to saying
that the values of some response in an individual subject before
and after treatment are statistics?  I think they are just
measured values and that if they are measured accurately enough,
they can be precisely known.  It is in considering the
distribution of similar measurements obtained in repeated trials
that statistics come into play.

From my perspective as a baseball fan (I know I'm in Indiana and
I ought to be more of a basketball fan, but I grew up as a Cubs
watcher and still can't shake it), it doesn't seem to me that the
purpose of the score is to allow for some inference about the
overall population of teams.  It is about which team beats the
other one and entertainment (and hot dogs) for the fans.

Mike


-- 
Michael A. Miller mmill...@iupui.edu
  Department of Radiology, Indiana University School of Medicine


