Re: [R] Awk and Vilno

2007-06-13 Thread Tim Churches
Rogerio Porto wrote:
> Hey,
> 
>> What we should really compare is the four situations:
>> R alone
>> R + awk
>> R + vilno
>> R + awk + vilno
>> and maybe "R + SAS Data step"
>> and see what scripts are more  elegant (read 'short and understandable')

I don't think that short and understandable necessarily go hand-in-hand.
Sometimes longer scripts which are more explicit and use fewer tricky
syntax shortcuts are much easier to understand a year or two later. Ease
and speed of script writing (taking into account the learning curve and
the time taken to consult the scripting language's documentation) are
important, as is the ability to revisit a script, or examine someone
else's, and work out what it does and how it works. Speed of execution
also counts with large datasets. Finally, the ubiquity of the tool, and
whether it is freely available on many platforms, either pre-installed
or in an easy-to-install form, are also considerations.

> what do you guys think of creating a R-wiki page for syntax
> comparisons among the various options to enhance R use?
> 
> I already have two sugestions:
> 
> 1) syntax examples for using R and other tools to manipulate
> and analyze large datasets (with a concise description of the
> datasets);
> 
> 2) syntax examples for using R and other tools (or R alone) to clean
> and prepare datasets (simple and very small datasets, for didactic
> purposes).

The ability of the tools to scale to large or very large datasets is
also a consideration, as is their speed when dealing with such large data.

> I think this could be interesting for R users and would promote other
> software tools, since it seems there are a lot of R users who use
> other tools as well.
> 
> Besides that, questions on the two subjects above are prevalent
> on this list. Thus a wiki page seems to be the right place to discuss
> and teach this to other users.
> 
> What do you think?

Yes, happy to contribute R + Python examples to such wiki pages. Please
post the URL.

Tim C

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Reduced Error Logistic Regression, and R?

2007-04-25 Thread Tim Churches
This news item in a data mining newsletter makes various claims for a technique 
called "Reduced Error Logistic Regression": 
http://www.kdnuggets.com/news/2007/n08/12i.html

In brief, are these (ambitious) claims justified and if so, has this technique 
been implemented in R (or does anyone have any plans to do so)? 

Tim C



Re: [R] aggregate similar to SPSS

2007-04-25 Thread Tim Churches
Andrew Robinson <[EMAIL PROTECTED]> wrote:
> can I suggest, without offending, that you purchase and read Peter
> Dalgaard's "Introductory Statistics with R" or Michael Crawley's
> "Statistics: An Introduction using R" or Venables and Ripley's "Modern
> Applied Statistics with S" or Maindonald and Braun's "Data Analysis
> and Graphics Using R: An Example-based Approach",
> or download and read An Introduction to R 
> http://cran.r-project.org/doc/manuals/R-intro.pdf
> or one of the numerous contributed documents at
> http://cran.r-project.org/other-docs.html

For Natalie, who is an SPSS user, may I strongly recommend "R FOR SAS AND SPSS 
USERS" by Bob Muenchen at http://oit.utk.edu/scc/RforSAS&SPSSusers.pdf

This is a really, really excellent document which has proven to be an
invaluable resource in introducing my SAS- and SPSS-using colleagues to
the delights of R.

And it is free (as in available at no cost).
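To answer Natalie's immediate question directly: yes, R can do this with
table() and prop.table(). A minimal sketch, using the data from her post
(the vector name smoke.status is my own invention):

```r
# 1 = smoker, 2 = non-smoker, as coded in Natalie's example
smoke.status <- c(1, 1, 1, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 2)

# frequency counts, then percentages
table(smoke.status)
round(100 * prop.table(table(smoke.status)), 1)
# smokers (1) are 6/14 = 42.9%, non-smokers (2) are 8/14 = 57.1%
```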

Tim C

> On Wed, Apr 25, 2007 at 03:32:11PM -0600, Natalie O'Toole wrote:
> > Hi,
> > 
> > Does anyone know if, with R, you can take a set of numbers and
> > aggregate them like you can in SPSS? For example, could you calculate
> > the percentage of people who smoke based on a dataset like the
> > following:
> > 
> > smoke = 1
> > non-smoke = 2
> > 
> > variable
> > 1
> > 1
> > 1
> > 2
> > 2
> > 1
> > 1
> > 1
> > 2
> > 2
> > 2
> > 2
> > 2
> > 2
> > 
> > 
> > When aggregated, SPSS can tell you what percentage of persons are
> > smokers based on the frequency of 1's and 2's. Can the R statistical
> > package do a similar thing?
> > 
> > Thanks,
> > 
> > Nat
> > 
> 
> -- 
> Andrew Robinson  
> Department of Mathematics and StatisticsTel: +61-3-8344-9763
> University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599
> http://www.ms.unimelb.edu.au/~andrewpr
> http://blogs.mbs.edu/fishing-in-the-bay/
> 



Re: [R] sas.get problem

2007-04-11 Thread Tim Churches
John Kane wrote:
> How do I make this change? I naively have tried by
> a) list sas.get and copy to editor
> b) reload R without loading Hmisc
> c) made recommended changes to sas.get
> d) stuck a "sas.get <- " in front of the function and
> ran it. 

Here is what I do, until Frank fixes the problem in the Hmisc package
itself:

a) list sas.get and copy to editor
b) make the change to line 127 as described
c) preface the function with "sas.get <- "
d) save that as "sas_get_fixed.R"
e) reload R and load Hmisc
f) source("sas_get_fixed.R")

The final step will mask the original, broken sas.get function with the
fixed version.
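In other words, something like this (the file name is whatever you chose
in step d):

```r
library(Hmisc)
# sas_get_fixed.R contains the patched function definition from
# steps a) to c), prefaced with "sas.get <- "
source("sas_get_fixed.R")
# sas.get now resolves to the fixed copy in the global environment,
# masking the broken one in the Hmisc package
```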

Tim C



Re: [R] sas.get problem

2007-04-10 Thread Tim Churches
John Kane wrote:
> I  have 3 SAS files all in the directory F:/sas, two
> data files
> and a format file :
> form.ea1.sas7bdat
> form.ea2.sas7bdat
> sas.fmts.sas7bdat
> 
> F is a USB.
> 
> I am trying import them to R using "sas.get".
> 
> I have not used SAS since I was downloading data from 
> mainframe
> and having to write JCL.  I had forgotten how bizarre
> SAS can be.
> I currently have not even figured out how to load the
> files into SAS but
> they look fine since I can import them with no problem
> into SPSS.
> 
> I am using R2.4.1 under Windows XP
> SAS files were created with SAS 9.x
> They convert easily into SPSS 14
> 
> In the example below I have tried various versions of the file names
> with no luck.
> Can anyone suggest some approach(es) that I might take?
> 
> Example.
> 
> library(Hmisc)
> mydata <- sas.get(library="F:/sas", mem="form.ea1",
>  format.library="sas.fmts.sas7bdat",
>sasprog = '"C:Program Files/SAS/SAS
> 9.1/sas.exe"')
> 
> Error message  (one of several that I have gotten
> while trying various things.)
> The filename, directory name, or volume label syntax
> is incorrect.
> Error in sas.get(library = "F:/sas", mem = "form.ea1",
> format.library = "sas.fmts.sas7bdat",  :
> SAS job failed with status 1
> In addition: Warning messages:
> 1: sas.fmts.sas7bdat/formats.sc? or formats.sas7bcat 
> not found. Formatting ignored.
>  in: sas.get(library = "F:/sas", mem = "form.ea1",
> format.library = "sas.fmts.sas7bdat",
> 2: 'cmd' execution failed with error code 1 in:
> shell(cmd, wait = TRUE, intern = output)

The sas.get function in the Hmisc library is broken under Windows.

Change line 127 from:

status <- sys(paste(shQuote(sasprog), shQuote(sasin), "-log",
shQuote(log.file)), output = FALSE)

to:

status <- system(paste(shQuote(sasprog), shQuote(sasin), "-log",
shQuote(log.file)))

I found this fix in the R-help archives, sorry, don't have the original
to hand so I can't give proper attribution, but the fix is not due to
me. But it does work for me. I believe Frank Harrell has been notified
of the problem and the fix. Once patched and working correctly, the
sas.get function in the Hmisc library is fantastic - thanks Frank!

Tim C



Re: [R] Datamining-package rattle() Errors

2007-02-27 Thread Tim Churches
j.joshua thomas wrote:
> Dear Group
> 
> I have a few errors while installing the package rattle from CRAN.
> 
> I do the installing from the local zip files...
> 
> I am using R 2.4.0 - do I have to upgrade to R 2.4.1?

You *do* have to read the r-help posting guide and take exact heed of
what it suggests: http://www.r-project.org/posting-guide.html

Tim C

> ~~
> 
> utils:::menuInstallLocal()
> package 'rattle' successfully unpacked and MD5 sums checked
> updating HTML package descriptions
>> help(rattle)
> No documentation for 'rattle' in specified packages and libraries:
> you could try 'help.search("rattle")'
>> library(rattle)
> Rattle, Graphical interface for data mining using R, Version 2.2.0.
> Copyright (C) 2006 [EMAIL PROTECTED], GPL
> Type "rattle()" to shake, rattle, and roll your data.
> Warning message:
> package 'rattle' was built under R version 2.4.1
>> rattle()
> Error in rattle() : could not find function "gladeXMLNew"
> In addition: Warning message:
> there is no package called 'RGtk2' in: library(package, lib.loc = lib.loc,
> character.only = TRUE, logical = TRUE,
>> local({pkg <- select.list(sort(.packages(all.available = TRUE)))
> + if(nchar(pkg)) library(pkg, character.only=TRUE)})
>> update.packages(ask='graphics')
> 
> 
> On 2/28/07, Roberto Perdisci <[EMAIL PROTECTED]> wrote:
>> Hi,
>> out of curiosity, what is the name of the package you found?
>>
>> Roberto
>>
>> On 2/27/07, j.joshua thomas <[EMAIL PROTECTED]> wrote:
>>> Dear Group,
>>>
>>> I have found the package.
>>>
>>> Thanks very much
>>>
>>>
>>> JJ
>>> ---
>>>
>>>
>>> On 2/28/07, j.joshua thomas <[EMAIL PROTECTED]> wrote:

 I couldn't locate the package rattle - I need someone's help.


 JJ
 ---



 On 2/28/07, Daniel Nordlund <[EMAIL PROTECTED]> wrote:
>> -Original Message-
>> From: [EMAIL PROTECTED] [mailto:
> [EMAIL PROTECTED]
>> On Behalf Of j.joshua thomas
>> Sent: Tuesday, February 27, 2007 5:52 PM
>> To: r-help@stat.math.ethz.ch
>> Subject: Re: [R] Datamining-package-?
>>
>> Hi again,
>> The idea of preprocessing is mainly based on the need to prepare the
>> data before they are actually used in pattern extraction, or to feed
>> the data into EAs (Genetic Algorithms). There is no standard practice
>> yet; however, the frequently used steps are:
>>
>> 1. the extraction of derived attributes, that is, quantities that
>> accompany but are not directly related to the data patterns, and may
>> prove meaningful or increase the understanding of the patterns;
>>
>> 2. the removal of some existing attributes that should be of no
>> concern to the mining process, owing to their insignificance.
>>
>> So I am looking for a package that can do the two points mentioned
>> above. Initially I would like to visualize the data as patterns and
>> understand the patterns.
>>
>>
> <<>>
>
> Joshua,
>
> You might take a look at the package rattle on CRAN for initially
> looking at your data and doing some basic data mining.
>
> Hope this is helpful,
>
> Dan
>
> Daniel Nordlund
> Bothell, WA, USA
>


 --
 Lecturer J. Joshua Thomas
 KDU College Penang Campus
 Research Student,
 University Sains Malaysia

>>>
>>>
>>> --
>>> Lecturer J. Joshua Thomas
>>> KDU College Penang Campus
>>> Research Student,
>>> University Sains Malaysia
>>>
>>> [[alternative HTML version deleted]]
>>>
> 
> 
>



Re: [R] RPy and the evil underscore

2007-02-25 Thread Tim Churches
Alberto Vieira Ferreira Monteiro wrote:
> It seems like I will join two threads :-)

Please address RPy-specific questions to the Rpy mailing list, where
they will be answered swiftly and without annoyance to everyone else on
this general r-help mailing list.

> Ok, RPy was installed (in Fedora Core 4, yum -y install rpy), and it 
> is running. However, I have a doubt, and the (meagre) documentation
> doesn't seem to address it.
>
> In python, when I do this:
> 
>>> import rpy
>>> rpy.r.setwd("/mypath")
>>> rpy.r.source("myfile.r")
> 
> Everything happens as expected. But now, there's
> a problem if I try to use a function in myfile:
> 
>>> x = my_function(1)
>>> x = r.my_function(1)
>>> x = rpy.my_function(1)
>>> x = rpy.r.my_function(1)
> 
> None of them work: the problem is that the _ is mistreated.
> If the function has "." instead of "_", it works:
> 
>>> x = rpy.r.my_function(1)
> 
> This is weird: I must write the R routine with a ".", but then
> rpy translates it to "_"!

Object identifiers cannot begin with an underscore in R, but they can in
Python. To avoid having to confusingly special-case this difference, the
RPy designers elected to translate underscores in Python object names to
dots in R object names.

All this is clearly documented in the RPy manual at
http://rpy.sourceforge.net/rpy/doc/rpy_html/R-objects-look-up.html#R-objects-look-up
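The rule itself is trivial. As a plain-Python illustration (my own
sketch of the mapping, not RPy's actual source):

```python
# RPy maps underscores in Python-side attribute names to dots in
# R-side object names, so rpy.r.my_function looks up my.function in R.
def rpy_name_to_r(name):
    """Sketch of the Python-to-R name translation rule."""
    return name.replace("_", ".")

print(rpy_name_to_r("my_function"))  # my.function
print(rpy_name_to_r("read_csv"))     # read.csv
```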

Tim C



[R] Google, hard disc drives and R

2007-02-18 Thread Tim Churches
A recent paper from Google Labs, interesting in many respects, not the
least the exclusive use of R for data analysis and graphics (alas not
cited in the approved manner):

http://labs.google.com/papers/disk_failures.pdf

Perhaps some of the eminences grises of the R Foundation could prevail
upon Google to make some of the data reported in the paper available for
inclusion in an R library or two, for pedagogical purposes?

Tim C



Re: [R] How to speed up or avoid the for-loops in this example?

2007-02-14 Thread Tim Churches
Marc Schwartz wrote:
> OK, here is one possible solution, though perhaps with a bit more time,
> there may be more optimal approaches. 
> 
> Using your example data above, but first noting that you do not want to
> use:
> 
>   df <- data.frame(cbind(subject,year,event.of.interest))
> 
> Using cbind() first, creates a matrix and causes all columns to be
> coerced to a common data type, obviating the benefit of data frames to
> be able to handle multiple data types. 

Yes, quite right, the cbind() was unnecessary. I'm not making my real
data frame that way, however.

> So, now on to the solution:
> 
> # First, order the data frame by increasing order of
> # subject number and decreasing order for event.of.interest
> # This ensures that these columns are properly sorted
> # to facilitate the subsequent code. 
> 
> df <- df[order(df$subject, -df$event.of.interest), ]
> 
> 
> So, 'df' will look like:
> 
>> df
>    subject year event.of.interest
> 2        1 1982              TRUE
> 3        1 1996              TRUE
> 1        1 1980             FALSE
> 4        2 1985             FALSE
> 5        2 1987             FALSE
> 7        3 1991              TRUE
> 9        3 1999              TRUE
> 6        3 1990             FALSE
> 8        3 1992             FALSE
> 10       4 1972              TRUE
> 11       4 1983             FALSE
> 
> 
> # Now use the combinations of sapply(), rle(), seq() and unlist() to
> # generate per subject sequences. Note that rle() returns:
> #
> # > rle(df$subject)
> # Run Length Encoding
> #   lengths: int [1:4] 3 2 4 2
> #   values : num [1:4] 1 2 3 4
> #
> # See ?rle, ?seq, ?sapply and ?unlist
> 
> df$subject.seq <- unlist(sapply(rle(df$subject)$lengths, 
> function(x) seq(x)))
> 
> 
> So, 'df' now looks like:
> 
>> df
>    subject year event.of.interest subject.seq
> 2        1 1982              TRUE           1
> 3        1 1996              TRUE           2
> 1        1 1980             FALSE           3
> 4        2 1985             FALSE           1
> 5        2 1987             FALSE           2
> 7        3 1991              TRUE           1
> 9        3 1999              TRUE           2
> 6        3 1990             FALSE           3
> 8        3 1992             FALSE           4
> 10       4 1972              TRUE           1
> 11       4 1983             FALSE           2
> 
> 
> # Now set event.seq to all 0's
> 
> df$event.seq <- 0
> 
> 
> So, 'df' now looks like:
> 
>> df
>    subject year event.of.interest subject.seq event.seq
> 2        1 1982              TRUE           1         0
> 3        1 1996              TRUE           2         0
> 1        1 1980             FALSE           3         0
> 4        2 1985             FALSE           1         0
> 5        2 1987             FALSE           2         0
> 7        3 1991              TRUE           1         0
> 9        3 1999              TRUE           2         0
> 6        3 1990             FALSE           3         0
> 8        3 1992             FALSE           4         0
> 10       4 1972              TRUE           1         0
> 11       4 1983             FALSE           2         0
> 
> 
> # Get the unique subject id's
> # See ?unique
> 
> subj.id <- unique(df$subject)
> 
> 
> # Now get the indices for each subject where event.of.interest
> # is TRUE.  See ?which
> 
> events <- sapply(subj.id, 
>  function(x) which(df$subject == x & df$event.of.interest))
> 
> 
> So, 'events' looks like:
> 
>> events
> [[1]]
> [1] 1 2
> 
> [[2]]
> integer(0)
> 
> [[3]]
> [1] 6 7
> 
> [[4]]
> [1] 10
> 
> 
> # Now use sapply() on the above list to create
> # individual sequences per list element:
> 
> seq <- sapply(events, function(x) seq(along = x))
> 
> 
> So 'seq' looks like:
> 
>> seq
> [[1]]
> [1] 1 2
> 
> [[2]]
> integer(0)
> 
> [[3]]
> [1] 1 2
> 
> [[4]]
> [1] 1
> 
> 
> # So, for the final step, assign the event sequence values in 'seq' to
> # the row indices in 'events':
> 
> df$event.seq[unlist(events)] <- unlist(seq)
> 
> 
> So, 'df' now looks like this:
> 
>> df
>    subject year event.of.interest subject.seq event.seq
> 2        1 1982              TRUE           1         1
> 3        1 1996              TRUE           2         2
> 1        1 1980             FALSE           3         0
> 4        2 1985             FALSE           1         0
> 5        2 1987             FALSE           2         0
> 7        3 1991              TRUE           1         1
> 9        3 1999              TRUE           2         2
> 6        3 1990             FALSE           3         0
> 8        3 1992             FALSE           4         0
> 10       4 1972              TRUE           1         1
> 11       4 1983             FALSE           2         0
> 
> 
> HTH,
> 
> Marc SChwartz

Wow, that's very slick, if a bit tricky. It works, but it is a bit
slower and more complex than the Holtman/Nielsen approach. Still, there
are some interesting ideas there which I shall bear in mind.

Re: [R] How to speed up or avoid the for-loops in this example?

2007-02-14 Thread Tim Churches
jim holtman wrote:
> On 2/14/07, Tim Churches <[EMAIL PROTECTED]> wrote:
>> Any advice, tips, clues or pointers to resources on how best to speed up
>> or, better still, avoid the loops in the following example code much
>> appreciated. My actual dataset has several tens of thousands of rows and
>> lots of columns, and these loops take a rather long time to run.
>> Everything else which I need to do is done using vectors and those parts
>> all run very quickly indeed. I spent quite a while doing searches on
>> r-help and re-reading the various manuals, but couldn't find any
>> existing relevant advice. I am sure the solution is obvious, but it
>> escapes me.
>>
>> Tim C
>>
>> # create an example data frame, multiple events per subject
>>
>> year <- c(1980,1982,1996,1985,1987,1990,1991,1992,1999,1972,1983)
>> event.of.interest <- c(F,T,T,F,F,F,T,F,T,T,F)
>> subject <- c(1,1,1,2,2,3,3,3,3,4,4)
>> df <- data.frame(cbind(subject,year,event.of.interest))
>>
>> # add a per-subject sequence number
>>
>> df$subject.seq <- 1
>> for (i in 2:nrow(df)) {
>> if (df$subject[i-1] == df$subject[i]) df$subject.seq[i] <-
>> df$subject.seq[i-1] + 1
>> }
>> df
> 
> # add an event sequence number which is zero until the first
>> # event of interest for that subject happens, and then increments
>> # thereafter
>>
>> df$event.seq <- 0
>> for (i in 1:nrow(df)) {
>> if (df$subject.seq[i] == 1 ) {
>>current.event.seq <- 0
>> }
>> if (event.of.interest[i] == 1 | current.event.seq > 0)
>> current.event.seq <- current.event.seq + 1
>> df$event.seq[i] <- current.event.seq
>> }
>> df
> 
> 
> 
> try:
> 
>> df <- data.frame(cbind(subject,year,event.of.interest))
>> df <- do.call(rbind,by(df, df$subject, function(z){z$subject.seq <-
> seq(nrow(z)); z}))
>> df
>      subject year event.of.interest subject.seq
> 1.1        1 1980                 0           1
> 1.2        1 1982                 1           2
> 1.3        1 1996                 1           3
> 2.4        2 1985                 0           1
> 2.5        2 1987                 0           2
> 3.6        3 1990                 0           1
> 3.7        3 1991                 1           2
> 3.8        3 1992                 0           3
> 3.9        3 1999                 1           4
> 4.10       4 1972                 1           1
> 4.11       4 1983                 0           2
>> # determine first event
>> df <- do.call(rbind, by(df, df$subject, function(x){
> + # determine first event
> + .first <- cumsum(x$event.of.interest)
> + # create sequence after first non-zero
> + .first <- cumsum(.first > 0)
> + x$event.seq <- .first
> + x
> + }))
>> df
>        subject year event.of.interest subject.seq event.seq
> 1.1.1        1 1980                 0           1         0
> 1.1.2        1 1982                 1           2         1
> 1.1.3        1 1996                 1           3         2
> 2.2.4        2 1985                 0           1         0
> 2.2.5        2 1987                 0           2         0
> 3.3.6        3 1990                 0           1         0
> 3.3.7        3 1991                 1           2         1
> 3.3.8        3 1992                 0           3         2
> 3.3.9        3 1999                 1           4         3
> 4.4.10       4 1972                 1           1         1
> 4.4.11       4 1983                 0           2         2

Thanks Jim, that works a treat, over an order of magnitude faster than
the for-loops.

Anders Nielsen also provided this solution:

  df$subject.seq <- unlist(tapply(df$subject, df$subject,
                                  function(x) 1:length(x)))

Doing it that way is about 5 times faster than using rbind(). But Jim's
use of cumsum on the logical vector is very nifty.

I have now combined Jim's function with Anders' column-oriented approach
and the result is that my code now runs about two orders of magnitude
faster.
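For the archives, the combined version looks something like this (a
sketch against the example data frame from my original post; it assumes
the rows of df are ordered by subject, as in the example):

```r
# per-subject sequence number (Anders' tapply approach)
df$subject.seq <- unlist(tapply(df$subject, df$subject, seq_along))

# event sequence: zero until the subject's first event of interest,
# then incrementing (Jim's double-cumsum trick, applied per subject)
df$event.seq <- unlist(tapply(df$event.of.interest, df$subject,
                              function(x) cumsum(cumsum(x) > 0)))
```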

Many thanks,

Tim C



[R] How to speed up or avoid the for-loops in this example?

2007-02-14 Thread Tim Churches
Any advice, tips, clues or pointers to resources on how best to speed up
or, better still, avoid the loops in the following example code much
appreciated. My actual dataset has several tens of thousands of rows and
lots of columns, and these loops take a rather long time to run.
Everything else which I need to do is done using vectors and those parts
all run very quickly indeed. I spent quite a while doing searches on
r-help and re-reading the various manuals, but couldn't find any
existing relevant advice. I am sure the solution is obvious, but it
escapes me.

Tim C

# create an example data frame, multiple events per subject

year <- c(1980,1982,1996,1985,1987,1990,1991,1992,1999,1972,1983)
event.of.interest <- c(F,T,T,F,F,F,T,F,T,T,F)
subject <- c(1,1,1,2,2,3,3,3,3,4,4)
df <- data.frame(cbind(subject,year,event.of.interest))

# add a per-subject sequence number

df$subject.seq <- 1
for (i in 2:nrow(df)) {
 if (df$subject[i-1] == df$subject[i]) df$subject.seq[i] <-
df$subject.seq[i-1] + 1
}
df

# add an event sequence number which is zero until the first
# event of interest for that subject happens, and then increments
# thereafter

df$event.seq <- 0
for (i in 1:nrow(df)) {
 if (df$subject.seq[i] == 1 ) {
current.event.seq <- 0
 }
 if (event.of.interest[i] == 1 | current.event.seq > 0)
current.event.seq <- current.event.seq + 1
 df$event.seq[i] <- current.event.seq
}
df



Re: [R] Adding error bars to a trellis barchart display

2006-05-13 Thread Tim Churches
Chris Bergstresser wrote:
> Hi all --
> 
>I'm using trellis to generate bar charts, but there's no built-in
> function to generate error bars or confidence intervals, as far as I
> can tell.  I assumed I could just write my own panel function to add
> them, so I searched the archive, and found a posting from the author
> of the package stating "... placing multiple bars side by side needs
> specialized calculations, which are done within panel.barchart. To add
> bars to these, you will need to reproduce those calculations."
>Just so I'm clear on this -- there's no capacity to add bars to the
> plot, nor to find out the coordinates of the bars in the graphs
> themselves.  If you want them, you have to completely rewrite
> panel.barchart.  Is this correct?  Are there really so few people
> using error bars with bar charts?

One of our projects does confidence intervals on bar charts produced
using the lattice library. It is quite feasible without too much effort
- see:
http://members.optusnet.com.au/tchur/NetEpi-Analysis-0-8-Screenshot-5.png

Sorry I don't have time to extract the code which does this right now,
but you can dissect it out yourself from the NetEpi-Analysis-0.8 tarball
at http://sourceforge.net/project/showfiles.php?group_id=123700 -
although the R code is embedded in Python classes, which might make
extrication a bit more difficult (and which is why I don't have time to
do it right now). But from memory the chunk of R code which overrides
the default panel function is fairly self-contained and you should be
able to identify it fairly easily - just grep the source code for likely
strings such as "panel.barchart" to discover where it is.
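For the simple case of one bar per category (not the side-by-side
grouped case, which is the hard one Chris mentions), the idea reduces to
something like the following sketch, with made-up data and confidence
limits:

```r
library(lattice)

# made-up summary data: one bar per category, with CI limits
d <- data.frame(cat   = c("a", "b", "c"),
                mean  = c(3.0, 5.0, 4.0),
                lower = c(2.5, 4.2, 3.1),
                upper = c(3.5, 5.8, 4.9))

barchart(mean ~ cat, data = d, ylim = c(0, 7),
         panel = function(x, y, ...) {
             panel.barchart(x, y, ...)
             # overlay the error bars at the bar midpoints
             panel.arrows(x, d$lower, x, d$upper,
                          angle = 90, code = 3, length = 0.05)
         })
```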

Other screenshots can be downloaded from
http://sourceforge.net/project/showfiles.php?group_id=123700 if anyone
is interested.

Tim C



Re: [R] using GDD fonts

2006-04-12 Thread Tim Churches
Luiz Rodrigo Tozzi wrote:
> Hi
> 
> I was searching for an X replacement for my job in R and I found GDD.
> 
> I installed it and I meet all the system requirements. My problem
> (maybe a dumb one) is that every plot comes out with no font, and I
> can't find a single example of a plot WITH FONT DETAIL on the list.
> 
> can anybody help me?
> 
> a simple example:
> 
> library(GDD)
> GDD("something.png", type="png", width = 700, height = 500)
> par(cex.axis=0.65,lab=c(12,12,0),mar=c(2.5, 2.5, 2.5, 2.5))
> plot(rnorm(100))
> mtext("Something",side=3,padj=-0.33,cex=1)
> dev.off()
> 
> thanks in advance!

This might help - we found that we needed to install the MS TT fonts and
make sure that GDD can find them, as per the README. :

Simon Urbanek <[EMAIL PROTECTED]> wrote:
> Tim,
>
> On Jun 9, 2005, at 3:51 AM, Tim CHURCHES wrote:
>
>> I tried GDD 0.1-7 with Lattice graphs in R 2.1.0 (on Linux). It
>> doesn't segfault now but it is still not producing any usable output
>> - the output png file is produced but nly with a few lines on it.
>> Still the alpha channel problem? Have you been able to produce any
>> Lattice graphs with it?
>
> I know of no such problem, I tested a few lattice graphics and they
> worked. Can you, please, send me reproducible example and your output?
> Also send me, please output of
> library(GDD)
> .Call("gdd_look_up_font", NULL)

Sorry, my laziness. GDD was unable to find any fonts. After I installed
the MS TT fonts and set their location as per the GDD README, it worked
perfectly with both old-style R graphics and lattice graphics. The
output looks very nice indeed. We'll do a bit more testing (and let you
know if we find any problems), but it looks like we can at last drop the
requirement for Xvfb when using R in a Web application. Great work! From
our point of view, GDD solves one of the biggest problems with R for Web
applications.

Cheers,

Tim C



Re: [R] Histogram over a Large Data Set (DB): How?

2005-11-18 Thread Tim Churches
Eric Eide wrote:
> "Sean" == Sean Davis <[EMAIL PROTECTED]> writes:
> 
>   Sean> Have you tried just grabbing the whole column using dbGetQuery?
>   Sean> Try doing this:
>   Sean> 
>   Sean> spams <- dbGetQuery(con,"select unixtime from email limit
>   Sean> 100")
>   Sean> 
>   Sean> Then increase from 1,000,000 to 1.5 million, to 2 million, etc.
>   Sean> until you break something (run out of memory), if you do at all.
> 
> Yes, you are right.  For the example problem that I posed, R can indeed 
> process
> the entire query result in memory.  (The R process grows to 240MB, though!)
> 
>   Sean> However, the BETTER way to do this, if you already have the data
>   Sean> in the database is to allow the database to do the histogram for
>   Sean> you.  For example, to get a count of spams by day, in MySQL do
>   Sean> something like: [...]
> 
> Yes, again you are right --- the particular problem that I posed is probably
> better handled by formulating a more sophisticated SQL query.
> 
> But really, my goal isn't to solve the the example problem that I posed ---
> rather, it is to understand how people use R to process very large data sets.
> The research project that I'm working on will eventually need to deal with
> query results that cannot fit in main memory, and for which the built-in
> statistical facilities of most DBMSs will be insufficient.
> 
> Some of my colleagues have previously written their analyses "by hand," using
> various scripting languages to read and process records from a DB in chunks.
> Writing things in this way, however, can be tedious and error-prone.  Instead
> of taking this approach, I would like to be able to use existing statistics
> packages that have the ability to deal with large datasets in good ways.
> 
> So, I seek to understand the ways that people deal with these sorts of
> situations in R.  Your advice is very helpful --- one should solve problems in
> the simplest ways available! --- but I would still like to understand the
> harder cases, and how one can use "general" R functions in combination with
> DBI's `dbApply' and `fetch' interfaces, which divide results into chunks.

You might be interested in our project: "NetEpi Analysis", which aims to
provide interactive exploratory data analysis and basic epidemiological
analysis via both a Web front end and a Python programmatic API (forgive
the redundancy in "programmatic API") for datasets up to around 30
million rows (and as many columns as you like) on 32-bit platforms -
hundreds of millions of rows should be feasible on 64-bit platforms. It
stores data column-wise in memory-mapped on-disc arrays, and uses set
operations on ordinal indexes to permit rapid subsetting and
cross-tabulation of categorical (factored) data. It is written in
Python, but uses R for graphics and some (but not all) statistical
calculations (and for model fitting when we get round to providing
facilities for same).

See http://www.netepi.org - still in alpha, with an update coming out by
December. Although it is aimed at epidemiological analysis (of large
administrative health datasets), I dare say it might be useful for
exploring large databases of spam too.
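To make the chunked approach concrete, here is a rough sketch using
DBI's fetch(), with the table and column names from Eric's example
(start.time and end.time, the assumed-known range of the data, are
placeholders):

```r
library(DBI)

# 'con' is an open DBI connection (not shown)
res <- dbSendQuery(con, "SELECT unixtime FROM email")

breaks <- seq(from = start.time, to = end.time, by = 86400)  # daily bins
counts <- numeric(length(breaks) - 1)

repeat {
    chunk <- fetch(res, n = 100000)   # 100,000 rows at a time
    if (nrow(chunk) == 0) break
    h <- hist(chunk$unixtime, breaks = breaks, plot = FALSE)
    counts <- counts + h$counts       # accumulate the bin counts
}
dbClearResult(res)
```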

Tim C



[R] mid-p CIs for common odds ratio

2005-10-19 Thread Tim Churches
mantelhaen.test() gives the exact conditional p-value (for independence) and 
confidence intervals (CIs) for the common odds ratio for a stratified 2x2 table. 
The epitools package by Tomas Aragon (available via CRAN) contains functions 
which use fisher.test() to calculate mid-p exact p-values and CIs for the CMLE 
odds ratio for a single 2x2 table. The mid-p p-value for independence for a 
stratified 2x2 table is easy to calculate using mantelhaen.test(), but can 
anyone suggest a method for calculation of mid-p CIs for the common odds ratio? 
A search in the usual places draws a blank (but I am sure someone will 
immediately prove me wrong on that point...). Thanks to Andy Dean (of Epi-Info 
fame), I have a copy of public domain Pascal code from 1991 by David Martin and 
Harland Austin which calculates mid-p CIs for the common odds ratio by finding 
polynomial roots. Before trying to replicate that code in R (or C), I was 
wondering if anyone could suggest a better or easier way?
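[Editorial note: for later readers, the standard (non-mid-p) exact conditional analysis referred to above looks like this; the counts are invented for illustration.]

```r
# Hypothetical stratified table: exposure x outcome in two strata (2x2x2)
tab <- array(c(10, 5, 3, 12,
                8, 4, 2, 10),
             dim = c(2, 2, 2))

mt <- mantelhaen.test(tab, exact = TRUE)
mt$estimate   # conditional MLE of the common odds ratio
mt$conf.int   # exact (but not mid-p) confidence interval
```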

Tim C



Re: [R] Efficient ways of finding functions and Breslow-Day test for homogeneity of the odds ratio

2005-10-18 Thread Tim Churches
Marc Schwartz (via MN) <[EMAIL PROTECTED]> wrote:
> 
> On Wed, 2005-10-19 at 06:47 +1000, Tim Churches wrote:
> > Marc Schwartz (via MN) wrote:
> > 
> > > There is also code for the Woolf test in ?mantelhaen.test
> > 
> > Is there? How is it obtained? The documentation on mantelhaen.test in R
> > 2.2.0 contains a note: "Currently, no inference on homogeneity of the
> > odds ratios is performed." and a quick scan of the source code for the
> > function didn't reveal any mention of Woolf's test.
> > 
> > Tim C
> 
> 
> Review the code in the examples on the cited help page...
> 
> :-)

OK, I see it now, thanks.

Tim C

> 
> HTH,
> 
> Marc



Re: [R] Efficient ways of finding functions and Breslow-Day test for homogeneity of the odds ratio

2005-10-18 Thread Tim Churches
Marc Schwartz (via MN) wrote:

> There is also code for the Woolf test in ?mantelhaen.test

Is there? How is it obtained? The documentation on mantelhaen.test in R
2.2.0 contains a note: "Currently, no inference on homogeneity of the
odds ratios is performed." and a quick scan of the source code for the
function didn't reveal any mention of Woolf's test.

Tim C



Re: [R] Efficient ways of finding functions and Breslow-Day test for homogeneity of the odds ratio

2005-10-18 Thread Tim Churches
MJ Price, Social Medicine wrote:
> I have been trying to find a function to calculate the Breslow-Day test for 
> homogeneity of the odds ratio in R. I know the test can be performed in SAS 
> but I was wondering if anyone could help me perform this in R.

I don't recall seeing the Breslow-Day test anywhere in an R package, but
the vcd package (available via CRAN) has a function called woolf_test()
to calculate Woolf's test for homogeneity of ORs.
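[Editorial note: Woolf's test is also easy to compute by hand in base R - a sketch with invented counts, using the same 0.5 continuity correction that vcd's woolf_test() applies; vcd gives the same answer with less typing.]

```r
# Hypothetical stratified 2x2 tables (2x2x2 array: rows x cols x strata)
tab <- array(c(10, 5, 3, 12,
                8, 4, 2, 10),
             dim = c(2, 2, 2))

k   <- dim(tab)[3]
lor <- w <- numeric(k)
for (i in seq_len(k)) {
  x <- tab[, , i] + 0.5                 # Haldane-Anscombe correction
  lor[i] <- log(x[1, 1] * x[2, 2] / (x[1, 2] * x[2, 1]))
  w[i]   <- 1 / sum(1 / x)              # inverse-variance weight for log OR
}
# Weighted sum of squared deviations of stratum log ORs from their pooled mean
stat <- sum(w * (lor - weighted.mean(lor, w))^2)
pval <- pchisq(stat, df = k - 1, lower.tail = FALSE)
```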

Tim C



Re: [R] running JPEG device on R 1.9.1 using xvfb-run on Linux

2005-10-13 Thread Tim Churches
Prof Brian Ripley wrote:
> On Wed, 12 Oct 2005, David Zhao wrote:
> 
> 
>>Does anybody have experience in running jpeg device using xvfb-run on
>>linux? I've been having sporadic problem with: /usr/X11/bin/xvfb-run
>>/usr/bin/R --no-save < Rinput.txt, with error saying: error in X11
>>connection. Especially when I run it from a perl script.
> 
> 
> Not sure what `xvfb-run on Linux' is, as it is not on my Linux (FC3).
> If you Google it you will find a number of problems reported on Debian 
> lists.  Here I would suspect timing.
> 
> What I do is to run Xvfb on screen 5 by
> 
> Xvfb :5 &
> setenv DISPLAY :5
> 
> and do not have a problem with the jpeg() or png() devices.  I do have a 
> problem with the rgl() package, but then I often do on-screen (on both 32- 
> and even more so 64-bit FC3).

For R-embedded-in-Python (via RPy) on a Web server, we have been using a
Python programme to automatically start Xvfb if it is not already
running. You can find a copy of the programme in the NetEpi-Analysis
tarball available at
http://sourceforge.net/project/showfiles.php?group_id=123700

The tricky bit is managing the permissions for the Xvfb session,
particularly in a Web server context - you need to take care. However,
this use of Xvfb has been perfectly reliable (on Red Hat Enterprise
Linux 2.1 and 3 with R 2.0 and R 2.1).
> 
>>Is there a better way of doing this? or how can I fix the problem.
> 
> You really should update your R.

Yes. We now use GDD, an alternative R graphics device for raster
graphics (JPEG and PNG), available via CRAN. It allows R to generate
JPEG and PNG files directly on a Linux or Unix machine without the need
for an X server to be running (not even Xvfb). The quality of the
output is also better than that of the standard R X11/png/jpeg graphics
devices, thanks to GDD's use of anti-aliased fonts. Earlier versions of
GDD were a bit buggy, but so far we have found the latest version
(0.1.7) to be fine. It is a bit fiddly to install all the libraries it
requires, as well as the recommended (no-cost) Microsoft TrueType fonts,
but the effort is worth it. Many thanks to Simon Urbanek for his work on
GDD.

Tim C



Re: [R] Leading in line-wrapped Lattice axis value and panel labels

2005-09-07 Thread Tim Churches
Paul Murrell wrote:

> Hi
>
> Deepayan Sarkar wrote:
> > On 9/7/05, Tim Churches <[EMAIL PROTECTED]> wrote:
> >
> >> Version 2.1.1 Platforms: all
> >>
> >> What is the trellis parameter (or is there a trellis parameter) to
> >> set the leading (the gap between lines) when long axis values
> >> labels or panel header labels wrap over more than one line? By
> >> default, there is a huge gap between lines, and much looking and
> >> experimentation has not revealed to me a suitable parameter to
> >> adjust this.
> >>
> >
> > There is none. Whatever grid.text does happens.
>
> grid does have a "lineheight" graphical parameter.  For example,
>
> library(grid)
> grid.text("line one\nlinetwo",
>   x=rep(1:3/4, each=3),
>   y=rep(1:3/4, 3),
>   gp=gpar(lineheight=1:9/2))
>
> Could you add this in relevant places in trellis.par Deepayan?
>
Is there a work around we could use in the meantime, or should we 
attempt to hack trellis.par as per Paul's suggestion (gulp!)?  I suppose 
that is like asking "Should we attempt to climb Teichelmann?" - it 
depends... We have increased the depth of the panel headers, but this 
wastes plotting area and the tops of the tees and effs on the upper line 
and the bottoms of the gees and whys on the bottom line are still cut 
off, so large is the gap between the two lines.  And increasing the
panel header depth doesn't help with y-axis labels - typically the
second line of one label will abut the first line of the next label,
giving a result rather like:

Value
   -
One
Value
  -
Two
Value
 -
Three
Value
-
Four

where the actual value labels are "Value One", "Value Two" etc and the 
"-" are the tick marks. Less than ideal.

Suggestions for interim fixes (other than using abbreviated labels... 
we've thought of that) most welcome.

Tim C



[R] Leading in line-wrapped Lattice value and panel labels

2005-09-06 Thread Tim Churches
Version 2.1.1
Platforms: all

What is the trellis parameter (or is there a trellis parameter) to set the 
leading (the gap between lines) when long axis values labels or panel header 
labels wrap over more than one line? By default, there is a huge gap between 
lines, and much looking and experimentation has not revealed to me a suitable 
parameter to adjust this.

Tim C



Re: [R] The Perils of PowerPoint

2005-09-02 Thread Tim Churches
(Ted Harding) wrote:

>By the way, the Washington Post/Minneapolis Star Tribune article is
>somewhat reminiscent of a short (15 min) broadcast on BBC Radio 4
>back on October 18 2004 15:45-16:00 called
>
>  "Microsoft Powerpoint and the Decline of Civilisation"
>
>which explores similar themes and also frequently quotes Tufte.
>Unfortunately it lapsed for ever from "Listen Again" after the
>statutory week, so I can't point you to a replay. (However, I
>have carefully preserved the cassette recording I made).
>  
>
Try http://sooper.org/misc/powerpoint.mp3 (copyright law notwithstanding...)

Tim C



[R] Confidence interval bars on Lattice barchart with groups

2005-06-25 Thread Tim Churches
I am trying to add confidence (error) bars to lattice barcharts (and
dotplots, and xyplots). I found this helpful message from Deepayan
Sarkar and based the code below on it:
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/50299.html

However, I can't get it to work with groups, as illustrated. I am sure I
am missing something elementary, but I am unsure what.

Using R 2.1.1 on various platforms. I am aware of xYplot in the Hmisc
library but would prefer to avoid any dependency on a non-core R
library, if possible.

Tim C

##
# set up dummy test data
testdata <- data.frame(
dsr=c(1,2,3,4,5,6,7,8,9,10,0,1,2,3,4,5,6,7,8,9,
  2,3,4,5,6,7,8,9,10,11,3,4,5,6,7,8,9,10,11,12),
year=as.factor(c(1998,1998,1998,1998,1998,1998,1998,1998,1998,1998,
 1999,1999,1999,1999,1999,1999,1999,1999,1999,1999,
 2000,2000,2000,2000,2000,2000,2000,2000,2000,2000,
 2001,2001,2001,2001,2001,2001,2001,2001,2001,2001)),
geog_area=c('North','South','East','West','Middle',
'North','South','East','West','Middle',
'North','South','East','West','Middle',
'North','South','East','West','Middle',
'North','South','East','West','Middle',
'North','South','East','West','Middle',
'North','South','East','West','Middle',
'North','South','East','West','Middle'),
sex=c('Male','Male','Male','Male','Male',
  'Female','Female','Female','Female','Female',
  'Male','Male','Male','Male','Male',
  'Female','Female','Female','Female','Female',
  'Male','Male','Male','Male','Male',
  'Female','Female','Female','Female','Female',
  'Male','Male','Male','Male','Male',
  'Female','Female','Female','Female','Female'),
age=c('Old','Old','Old','Old','Old',
  'Young','Young','Young','Young','Young',
  'Old','Old','Old','Old','Old',
  'Young','Young','Young','Young','Young',
  'Old','Old','Old','Old','Old',
  'Young','Young','Young','Young','Young',
  'Old','Old','Old','Old','Old',
  'Young','Young','Young','Young','Young'))

# add dummy lower and upper confidence limits
testdata$dsr_ll <- testdata$dsr - 0.7
testdata$dsr_ul <- testdata$dsr + 0.5

# examine the test data
testdata

# check that a normal barchart with groups works OK - it does
barchart(geog_area ~ dsr | year, testdata, groups=sex, origin = 0)

# this works as expected, but not sure what the error messages mean
with(testdata,barchart(geog_area ~ dsr | year + sex,
  origin = 0,
  dsr_ul = dsr_ul,
  dsr_ll = dsr_ll,
  panel = function(x, y, ..., dsr_ll, dsr_ul, subscripts) {
  panel.barchart(x, y, subscripts, ...)
  dsr_ll <- dsr_ll[subscripts]
  dsr_ul <- dsr_ul[subscripts]
  panel.segments(dsr_ll,
 as.numeric(y),
 dsr_ul,
 as.numeric(y),
 col = 'red', lwd = 2)}
  ))

# no idea what I am doing wrong here, but there is not one bar per
group... something
# to do with panel.groups???
with(testdata,barchart(geog_area ~ dsr | year, groups=sex,
  origin = 0,
  dsr_ul = dsr_ul,
  dsr_ll = dsr_ll,
  panel = function(x, y, ..., dsr_ll, dsr_ul, subscripts,
groups) {
  panel.barchart(x, y, subscripts, groups, ...)
  dsr_ll <- dsr_ll[subscripts]
  dsr_ul <- dsr_ul[subscripts]
  panel.segments(dsr_ll,
 as.numeric(y),
 dsr_ul,
 as.numeric(y),
 col = 'red', lwd = 2)}
  ))
##



Re: [R] Runnning R remotely

2005-02-02 Thread Tim Churches
Laura Quinn wrote:
I wasn't aware that it was possible to use postscript in the same fashion
as png, eg:
png(file,width=x,height=y,)
image(map)
text(text)
title(title)
box()
dev.off()
As there are a large number of iterations png has been working nicely
(when not working remotely!), especially as it has proven easy to convert
the output into GIFs and then into animated GIFs. Could anyone suggest an alternative
approach in this case?
 

Start an Xvfb (X11 virtual frame buffer) server in your remote ssh 
session. R will then use that as an X11 device to produce the PNG 
output. If you are running in a hostile network environment, consider 
using authentication and/or switching off network access to the Xvfb 
session - see the man pages for Xvfb. Xvfb is installed by default on 
most recent Linux distributions - if not, there should be an installable 
package available for it for your flavour of Linux.
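[Editorial note: a sketch of the R side once Xvfb is running. Display number :5 and the use of the built-in `volcano` matrix as a stand-in for the poster's map are assumptions for illustration.]

```r
# Assumes an Xvfb server was started in the ssh session, e.g.:  Xvfb :5 &
Sys.setenv(DISPLAY = ":5")

for (i in 1:3) {                      # one PNG frame per iteration
  png(sprintf("frame%03d.png", i), width = 640, height = 480)
  image(volcano)                      # stand-in for the real map object
  title(main = sprintf("Frame %d", i))
  box()
  dev.off()
}
```

The resulting frames can then be converted to GIFs and assembled into an animation with an external tool, as the poster describes.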

Tim C


Re: [R] How to make R faster?

2005-01-25 Thread Tim Churches
ebashi wrote:
Dear R users;
I am using R for a project. I have some PHP forms that
pass parameters to R for calculations, and publish the
result in HTML format by CGIwithR. I'm using a Linux
machine and everything works perfectly. However, it
is too slow: it takes 5 to 10 seconds to run, and
even if I start R from the shell it takes the same
amount of time, which is probably due to installing
packages. My first question is that how can i make R
run faster? and second if I am supposed to reduce the
packages which are being loaded at initiation of R,
how can I limit it to only the packages that i want?
and third how can i make R not to get open each time,
and let it sit on the server so that, when i pass
something to it , i get result faster?
 

Have a look at RSOAP, which does exactly what you suggest and allows you 
to communicate with the R session via SOAP. I'm sure there are SOAP 
libraries available for PHP. See 
http://research.warnes.net/projects/rzope/rsoap/

Tim C


Re: [R] Plotting with Statistics::R, Perl/R

2005-01-24 Thread Tim Churches
Peter Dalgaard wrote:
d) Use bitmap(). It requires a working Ghostscript install, but is
otherwise much more convenient. Newer versions of Ghostscript have
some quite decent antialiasing built into some of the png devices.
Currently you need a small hack to pass the extra options to
Ghostscript -- we should probably add a gsOptions argument in due
course. This works for me on FC3 (Ghostscript 7.07):
mybitmap(file="foo.png", type="png16m", gsOptions=" -dTextAlphaBits=4
-dGraphicsAlphaBits=4 ")
where mybitmap() is a modified bitmap() that just sticks the options
into the command line. There are definitely better ways...
[The antialiasing is not quite perfect. In particular, the axes stand
out from the box around plots, presumably because an additive model is
used (so that if you draw a line on top of itself, the result becomes
darker). Also, text gets a little muddy at the default 9pt @ 72dpi, so
you probably want to increase the pointsize or the resolution.]
 

Apart from the significant quality issues which you mention, the other 
problem with using bitmap() in a Web server environment is the speed 
issue - it takes much longer to produce the output. Whether it takes too 
long depends on the users of your Web application, and how many 
simultaneous users there are. However, most users are more worried by 
the poor quality of the fonts in output produced by bitmap().
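[Editorial note: for reference, the stock bitmap() call (without the gsOptions hack from the quoted message) looks like this. It requires a working Ghostscript, found via R_GSCMD or `gs` on the PATH; the guard below just keeps the sketch runnable where Ghostscript is absent.]

```r
# bitmap() drives Ghostscript behind the scenes, so no X server is needed
if (nzchar(Sys.which("gs"))) {
  bitmap("foo.png", type = "png16m", width = 6, height = 4, res = 150)
  plot(1:10)
  dev.off()
}
```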

Tim C


Re: [R] Plotting with Statistics::R, Perl/R

2005-01-21 Thread Tim Churches
Joe Conway wrote:
Dirk Eddelbuettel wrote:
On Fri, Jan 21, 2005 at 06:06:45PM -0800, Leah Barrera wrote:
I am trying to plot in R from a perl script using the Statistics::R
 package as my bridge.  The following are the conditions:
0. I am running from a Linux server.
Plotting certain formats requires the X11 server to be present, as the
font metrics for those formats can be supplied only by the X11 server.
Other drivers don't need the font metrics from X11 -- I think pdf is a
good counterexample. When you run in 'batch' via a Perl script, you
don't have the X11 server -- even though it may be on the machine and
running, it is not associated with the particular session running
your Perl job.  There are two common fixes:
a) if you must have png() as a format, you can start a virtual X11
server with the xvfb server -- this is a bit involved, but doable;

Attached is an init script I use to start up xvfb on Linux.
HTH,
Joe

#!/bin/bash
#
# Xvfb          Starts Xvfb.
#
#
# chkconfig: 2345 12 88
# description: Xvfb is a facility that applications requiring an X frame buffer \
#              can use in place of actually running X on the server
# Source function library.
. /etc/init.d/functions
[ -f /usr/X11R6/bin/Xvfb ] || exit 0
XVFB="/usr/X11R6/bin/Xvfb :5 -screen 0 1024x768x16"
RETVAL=0
umask 077
start() {
   echo -n $"Starting Xvfb: "
   $XVFB&
   RETVAL=$?
   echo_success
   echo
   [ $RETVAL = 0 ] && touch /var/lock/subsys/Xvfb
   return $RETVAL
}
stop() {
   echo -n $"Shutting down Xvfb: "
   killproc Xvfb
   RETVAL=$?
   echo
   [ $RETVAL = 0 ] && rm -f /var/lock/subsys/Xvfb
   return $RETVAL
}
restart() {
   stop
   start
}
case "$1" in
 start)
   start
   ;;
 stop)
   stop
   ;;
 restart|reload)
   restart
   ;;
 condrestart)
   [ -f /var/lock/subsys/Xvfb ] && restart || :
   ;;
 *)
   echo $"Usage: $0 {start|stop|restart|condrestart}"
   exit 1
esac
exit $RETVAL
 

Hmm, the only problem with that is that, if I am not mistaken, you are 
starting Xvfb without any authentication, and I am told by people who 
know about such things that in the context of an Internet-accessible Web 
server, having an X server accepting unauthenticated connections is not 
a good idea. In other, less hostile environments, it might be OK. Maybe 
such concerns are unreasonable paranoia, but my motto is better safe 
than sorry when it comes to Internet-facing servers. I think there are 
also other switches you can pass to Xvfb to stop it listening on various 
TCP/IP ports etc.

Tim C


Re: [R] Plotting with Statistics::R, Perl/R

2005-01-21 Thread Tim Churches
Dirk Eddelbuettel wrote:
Plotting certain formats requires the X11 server to be present, as the font
metrics for those formats can be supplied only by the X11 server. Other drivers
don't need the font metrics from X11 -- I think pdf is a good counterexample.
When you run in 'batch' via a Perl script, you don't have the X11 server --
even though it may be on the machine and running, it is not associated with
the particular session running your Perl job.  There are two common fixes:
a) if you must have png() as a format, you can start a virtual X11 server
  with the xvfb server -- this is a bit involved, but doable;
 

An example of a Python programme which manages the starting of an Xvfb 
server when one is required can be found in the xvfb_spawn.py file in the
SOOMv0 directory of the tarball for NetEpi Analysis, which can be 
downloaded by following the links at http://www.netepi.org

xvfb_spawn.py was written for use with RPy, which is a Python-to-R 
bridge, when used in a Web server setting (hence no X11 display server 
available). It should be possible to translate the programme to Perl, or 
to write something similar in Perl. Comments in the code note some 
potential security traps for the unwary.

Hopefully one day the dependency of the R raster graphics devices on an 
X11 server will be removed. R on Win32 doesn't have that dependency (but 
then, Windows machines, even servers, have displays running all the time 
as part of their kernel, and who would wish that on any other operating 
system?). However, there are several graphics back-ends which produce 
very high quality raster graphics on POSIX platforms without the need 
for an X11 device to be present - Agg ("Anti-grain geometry", see 
http://www.antigrain.com/) and Cairo (see http://cairographics.org/) 
spring to mind (usually disclaimers about the foregoing comments not 
meaning to seem like ingratitude to the R development team etc apply).

Tim C


Re: [R] SAS or R software

2004-12-18 Thread Tim Churches
Shawn Way wrote:
I've seen multiple comments about MS Excel's precision and accuracy.
Can you please point me in the right direction in locating information
about these?
 

As always, Google is your friend, but see for example 
http://www.nwpho.org.uk/sadb/Poisson%20CI%20in%20spreadsheets.pdf

Tim C


Re: [R] Lattice graph with segments

2004-12-04 Thread Tim Churches
Andrew Robinson wrote:
> Ruud,
>
> try something like the following (not debugged, no coffee yet):
>
>
> xyplot(coupon.period ~ median, data=prepayment,
>  subscripts=T, 
>  panel=function(x,y,subscripts,...){
>panel.xyplot(x,y)
>panel.segments(deel1$lcl[subscripts], deel$ucl[subscripts])
>  }
> )
>
Not quite:
library(lattice)
prepayment <- data.frame(median=c(10.89,12.54,10.62,8.46,7.54,4.39),
 ucl=c(NA,11.66,9.98,8.05,7.27,4.28),
 lcl=c(14.26,13.34,11.04,8.72,7.90,4.59),
 coupon.period=c('a','b','c','d','e','f'))
xyplot(coupon.period ~ median, data=prepayment,
 subscripts=T,  
 panel=function(x,y,subscripts,...){
   panel.xyplot(x,y)
   panel.segments(prepayment$lcl[subscripts], prepayment$ucl[subscripts])
 }
)
throws the error:
Error in max(length(x0), length(x1), length(y0), length(y1)) :
Argument "x1" is missing, with no default
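[Editorial note: for later readers, a sketch of the fix. panel.segments() takes four coordinate vectors (x0, y0, x1, y1), so the horizontal confidence interval needs the y positions passed as well; the data frame is the one from the message above.]

```r
library(lattice)

prepayment <- data.frame(median = c(10.89, 12.54, 10.62, 8.46, 7.54, 4.39),
                         ucl = c(NA, 11.66, 9.98, 8.05, 7.27, 4.28),
                         lcl = c(14.26, 13.34, 11.04, 8.72, 7.90, 4.59),
                         coupon.period = c("a", "b", "c", "d", "e", "f"))

p <- xyplot(coupon.period ~ median, data = prepayment,
            panel = function(x, y, subscripts, ...) {
              panel.xyplot(x, y, ...)
              # draw each interval as a horizontal segment at its row's y
              panel.segments(prepayment$lcl[subscripts], as.numeric(y),
                             prepayment$ucl[subscripts], as.numeric(y))
            })
p   # printing the trellis object draws the plot
```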
Tim C


Re: [R] How about a mascot for R?

2004-12-02 Thread Tim Churches
Damian Betebenner wrote:
R users,
How come R doesn't have a mascot? 
Perhaps someone with artistic flair could create a mascot based on this 
image? It would help to give newcomers to R-help the right idea:

http://www.accesscom.com/~alvaro/alien/thepics/ripley1__.jpg
Tim C




[R] Unable to understand strptime() behaviour

2004-12-01 Thread Tim Churches
R V2.0.1 on Windows XP.
I have read the help pages on strptime() over and over, but can't
understand why strptime() is producing the following results.
  > v <- format("2002-11-31", format="%Y-%m-%d")
  > v
[1] "2002-11-31"
  > factor(v, levels=v)
[1] 2002-11-31
Levels: 2002-11-31
  > x <- strptime("2002-11-31", format="%Y-%m-%d")
  > x
[1] "2002-12-01"
  > factor(x, levels=x)
[1] 
Levels: 2002-12-01 NA NA NA NA NA NA NA NA
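[Editorial note: two things are going on here. On the poster's platform strptime() normalised the invalid date (November has only 30 days) to 2002-12-01 rather than returning NA, and strptime() returns a POSIXlt object - a list of components, not a string - so factor(x, levels=x) sees the components, producing the spurious NA levels. A sketch with a valid date, converting to character first:]

```r
x <- strptime("2002-11-30", format = "%Y-%m-%d")
class(x)             # "POSIXlt" "POSIXt": a list-like object, not a string
length(unclass(x))   # several components (sec, min, hour, mday, mon, year, ...)

# Convert to character before building a factor
v <- format(x, "%Y-%m-%d")
factor(v, levels = v)   # a single level, as intended
```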
Tim C


Re: Re: [R] draft of posting guide. Sorry.

2003-12-22 Thread Tim Churches
A.J. Rossini <[EMAIL PROTECTED]> wrote:
> However, the amount (and quality) of
> (freely-available, at least for the cost of download, which might not
> be free) documentation for R is simply incredible.  The closest that
> I've seen, for freely available languages, is Python, for actual
> quality of documentation.

The Python documentation is truly excellent, but I agree, the R documentation is 
even better. Sometimes the R help is a bit terse, which simply means that one 
has to think a bit to work out what is meant, but I have never found it to be 
insufficient.

Tim C



Re: RE: [R] R and Memory

2003-12-02 Thread Tim Churches
Mulholland, Tom <[EMAIL PROTECTED]> wrote:
> 
> I would suggest that you make a more thorough search of the
> R-Archives.
> (http://finzi.psych.upenn.edu/search.html) If you do you will find
> this
> discussion has been had several times and that the type of machine
> you
> are running will have an impact upon what you can do. My feeling is
> that
> you are going have to knuckle down with the documentation and
> understand
> how R works and then when you have specific issues that show you have
> read all the appropriate documentation, you might try another message
> to
> the list.
> 
> Ciao, Tom

Another approach is to not try to bring all your data into R at once - it is unlikely 
that you actually need every column of every row in your dataset to undertake any 
particular analysis. The trick is to bring into R only those rows and columns which 
you need at a particular moment, and then discard them.

The best way to do this is to manage your data in an SQL database such as 
MySQL or PostgreSQL, and then use one of the R database interfaces to issue 
queries against this database and surface the query results as a data frame.  
Remember to compose your queries in such a way as to retrieve only the rows 
and columns you really need at any particular moment, and don't forget to delete 
these data frames as soon as you have finished with them (or at least, as soon 
as you need more space in your R session).

There is also an (experimental I think) package which allows lazy or virtual 
loading of database queries into data frames, so that the query results are paged 
into memory as they are needed. But I doubt you will need that.
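[Editorial note: using today's package names, the pattern looks something like the sketch below. RSQLite and the table/column names are stand-ins for illustration; the MySQL and PostgreSQL DBI drivers follow the same interface.]

```r
library(DBI)

# In-memory SQLite database standing in for a real server-based DB
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "visits",
             data.frame(id = 1:5, age = c(23, 34, 45, 56, 67)))

# Pull back only the rows and columns this particular analysis needs
adults <- dbGetQuery(con, "SELECT id, age FROM visits WHERE age >= 40")
mean(adults$age)

rm(adults)          # discard the data frame once finished with it
dbDisconnect(con)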

Tim C

> 
> _
>  
> Tom Mulholland
> Senior Policy Officer
> WA Country Health Service
> Tel: (08) 9222 4062
>  
> The contents of this e-mail transmission are confidential and may be
> protected by professional privilege. The contents are intended only
> for
> the named recipients of this e-mail. If you are not the intended
> recipient, you are hereby notified that any use, reproduction,
> disclosure or distribution of the information contained in this
> e-mail
> is prohibited. Please notify the sender immediately.
> 
> 
> -Original Message-
> From: Edward McNeil [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, 2 December 2003 8:45 AM
> To: [EMAIL PROTECTED]
> Subject: [R] R and Memory
> 
> 
> Dear all,
> This is my first post.
> We have started to use R here and have also started teaching it to
> our
> PhD students. Our unit will be the HQ for developing R throughout
> Thailand.
> 
> I would like some help with a problem we are having. We have one
> sample
> of data that is quite large in fact - over 2 million records (ok ok
> it's
> more like a population!). The data is stored in SPSS. The file is
> over
> 350Mb but SPSS happily stores this much data. Now when I try to read
> it
> into R it grunts and groans for a few seconds and then reports that
> there is not enough memory (the computer has 250MB RAM). I have tried
> setting the memory in the command line (--max-vsize and
> --max-mem-size)
> but all to no avail.
> 
> Any help would be muchly appreciated!
> 
> Edward McNeil (son of Don)
> Epidemiology Unit
> Faculty of Medicine
> Prince of Songkhla University
> Hat Yai  90110
> THAILAND
> 
> __
> [EMAIL PROTECTED] mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> 
> __
> [EMAIL PROTECTED] mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help



Re: [R] SAS transport files and the foreign package

2003-01-18 Thread Tim Churches
On Sat, 2003-01-18 at 07:45, Frank E Harrell Jr wrote:
> I had no idea how strange the XPORT format really is.

Like the fact that the IBM double precision representation used in XPORT
uses 7 bits for the exponent and 56 bits for the mantissa, whereas IEEE
format uses 11 bits for the exponent and 52 bits for the mantissa.
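[Editorial note: since both layouts occupy 64 bits (one sign bit each), the trade-off can be put in rough numbers. The bit counts come from the two format definitions; IEEE gains one implicit mantissa bit, while IBM's hexadecimal normalisation means its effective precision can be up to three bits less than the nominal 56.]

```r
# Decimal digits of precision ~ mantissa bits * log10(2)
bits_to_digits <- function(bits) bits * log10(2)

digits <- c(ibm_xport = bits_to_digits(56),       # about 16.9 digits, narrow exponent range
            ieee_754  = bits_to_digits(52 + 1))   # about 16.0 digits, wide exponent range
digits
```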
  
> Following Duncan Temple Lang's suggestion I am contacting one of our
> clients to see what they think about moving towards XML for this.
> My guess is that XML will take a while to be used routinely for 
> this and that the sometimes huge datasets involved will cause XML 
> files to be monstrous (compression will help but will tax memory
> usage of R at least temporarily during processing).  

The nice things about the SAS XML engine are: 
a) all the metadata associated with a dataset is included in the
generated XML file, including not just the names of the formats
for each variable (column), but the actual format value labels
themselves. 
b) more than one dataset can be included in a single generated XML
export file
c) like the XPORT format, it is close to foolproof from the SAS user's
point of view, because the SAS XML engine does all the work.

The generated files are indeed huge (relative to the
amount of actual data they contain). For our purposes,
this is not likely to be a huge problem - we select
and/or summarise data in SAS, and then pass the subset or 
summary set to R. At the moment, we are experimenting with 
parsing the SAS XML files with Python and then passing the 
data to R via RPy (the Python-to-R bridge) - mainly because
I am slightly more adept at writing Python than R. However, the
ability of R to read SAS XML files directly, and to set
up categorical SAS variables which have formats as factor
columns in R data.frames, would be fabulous.

Tim C




Re: [R] calling R from python (fwd)

2002-12-23 Thread Tim Churches
Agustin Lobo wrote:
> 
> A question for a (experienced) user of the RPython package on
> linux.
> 
> I'm trying to call R from python on a linux (Suse 7.3) box.

Since you are calling R from Python, you could try Walter Moreira's
excellent RPy module. I found it much easier to install than RSPython
(provided you follow Walter's instructions), and it has been very
reliable. It is also very efficient at converting Numeric Python arrays
to R, and has a very easy to use object model - much nicer than
RSPython's. See http://rpy.sourceforge.net

Of course, RSPython can also call Python from R, which RPy can't do.

Tim C
