[R] Sweave: \Sexpr and variables with special chars

2010-11-16 Thread Ralf B
I am using \Sexpr to include a variable in a title of a Sweave document:

\documentclass[a4paper]{article}
<<>>=
#mytitlevar <- "Stuff" # case 1, everything is fine
mytitlevar <- "Stuff_first" # case 2, f is turned into a subscript
@
\title{MyTitle: \\ \Sexpr{mytitlevar} }
\begin{document}
\maketitle
\end{document}

When doing this, the variable's value seems to be interpreted by
LaTeX. In case #2, the 'f' of 'Stuff_first' is printed as a subscript
because of the preceding underscore. Is there a way to turn the
variable's value (the text) into a form that is not interpreted by
LaTeX and/or Sweave? I understand that this is perhaps more of a
LaTeX question than an R question...

Thanks,
Ralf
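
One possible approach, sketched under assumptions: escape the underscore in R before \Sexpr inserts the value into the LaTeX source. The helper name escapeUnderscore is hypothetical (it is not part of Sweave) and it only handles underscores; other LaTeX special characters such as %, & or # would need similar treatment.

```r
# Hypothetical helper, not part of Sweave: make underscores literal for LaTeX.
escapeUnderscore <- function(x) {
  # In a regex replacement, "\\\\_" produces one backslash followed by "_".
  gsub("_", "\\\\_", x)
}

mytitlevar <- "Stuff_first"
cat(escapeUnderscore(mytitlevar), "\n")
```

In the document the title line would then presumably read \title{MyTitle: \\ \Sexpr{escapeUnderscore(mytitlevar)} }.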

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Sweave: Conditional code chunks?

2010-11-15 Thread Ralf B
I have a code chunk that produces a figure. In my special case,
however, the data does not always exist. When the data exists, the
code chunk is of course trivial (case #1). But what do I do in
case #2, where the data does not exist?
I can obviously prevent the code from being executed by checking for the
existence of the object x, but at the Sweave level I have a static
figure chunk. Here is an example that should be reproducible:

# case 1
x <- c(1,2,3)
# case 2 - no definition of variable
#x <- c(1,2,3)

<<fig=TRUE>>=
if (exists("x")) {
  plot(x)
}
@

In a way I would need a conditional chunk or a chunk that draws a
figure only if it was generated and ignores it otherwise.

Any ideas?

Ralf
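
A minimal sketch of one way this might be done, assuming the chunk is declared with results=tex instead of fig=TRUE: the R code writes the figure file itself and prints the \includegraphics line only when the data exists. The helper name conditionalFigure and the file name are hypothetical.

```r
# Hypothetical helper: draw the figure and emit \includegraphics only if
# the named object exists; otherwise emit nothing at all.
conditionalFigure <- function(name, file, envir = globalenv()) {
  if (!exists(name, envir = envir)) return(invisible(FALSE))
  pdf(file)                           # open a PDF device for this one figure
  plot(get(name, envir = envir))
  dev.off()
  # With results=tex this line becomes part of the LaTeX output.
  cat("\\includegraphics{", file, "}\n", sep = "")
  invisible(TRUE)
}
```

Inside a chunk such as <<results=tex, echo=FALSE>>= the call would be conditionalFigure("x", "fig-x.pdf"). File paths containing backslashes would need extra care.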



Re: [R] Sweave question

2010-11-15 Thread Ralf B
Thank you for all your comments. Through my own research I found this
method, which, in addition to your suggestions, seems to do what I want:

tools::texi2dvi("myfile.tex", pdf=TRUE)

Thanks again,
Ralf
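
For reference, a sketch of how the two steps might be combined in one function. The names texNameFor and sweaveToPdf are hypothetical, and a working LaTeX installation is assumed.

```r
# Hypothetical helper: derive the .tex file name that Sweave writes for an .Rnw file.
texNameFor <- function(rnw) {
  sub("\\.[RrSs]nw$", ".tex", basename(rnw))
}

# Hypothetical wrapper: weave the document, then compile the result to PDF.
sweaveToPdf <- function(rnw) {
  utils::Sweave(rnw)                  # writes the .tex file into the working directory
  tex <- texNameFor(rnw)
  tools::texi2dvi(tex, pdf = TRUE)    # runs LaTeX as often as needed
  invisible(sub("\\.tex$", ".pdf", tex))
}
```

A call such as sweaveToPdf("myfile.Rnw") would then be expected to leave myfile.pdf in the working directory.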

On Mon, Nov 15, 2010 at 6:42 AM, Duncan Murdoch
 wrote:
> On 15/11/2010 6:22 AM, Dieter Menne wrote:
>>
>>
>> Duncan Murdoch-2 wrote:
>>>
>>> See SweavePDF() in the patchDVI package on R-forge.
>>>
>>>
>>
>> In case googling patchDVI only show a few Japanese Pages, and search for
>> patchDVI in R-Forge gives nothing: try
>>
>> https://r-forge.r-project.org/projects/sweavesearch/
>>
>> (or did I miss something obvious, Duncan?)
>
> No, I just didn't realize that it was hard to find.  But you can always
> select R-forge as a repository, and then install.packages() will find it.
>
> Duncan Murdoch
>



[R] Full path to currently executed script file

2010-11-14 Thread Ralf B
I am looking for a way to determine the full filepath to the currently
executed script. Any ideas?

Ralf
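
A rough sketch of one commonly used approach, with loud caveats: it relies on the variable 'ofile' that source() keeps in its own frame (an undocumented internal detail that may change) and on the --file= argument that Rscript passes. The function name currentScriptPath is hypothetical.

```r
# Hypothetical helper: best-effort path of the script that is currently running.
currentScriptPath <- function() {
  # Case 1: the script is run through source(); look for its frame,
  # innermost first. 'ofile' is an internal variable of source().
  for (frame in rev(sys.frames())) {
    if (is.character(frame$ofile)) return(normalizePath(frame$ofile))
  }
  # Case 2: the script is run as "Rscript path/to/script.R".
  args <- commandArgs(trailingOnly = FALSE)
  fileArg <- grep("^--file=", args, value = TRUE)
  if (length(fileArg) > 0) {
    return(normalizePath(sub("^--file=", "", fileArg[1])))
  }
  NA_character_                       # e.g. code typed at the console
}
```

In an interactive session, with code pasted at the prompt, there is no script file, so the function returns NA.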



Re: [R] Sweave question

2010-11-13 Thread Ralf B
Thank you. The article you cited explains on the last page how this is
done: it shows how Sweave is run from within R and says that it
creates the .tex file.

My last remaining question is whether there is a way to process this
Sweave .tex output by running LaTeX from R. In other words, what is
the command to run LaTeX from within R? Or am I perhaps thinking in too
complicated a way, and there is a single command that creates the .tex and
the pdf/ps in a single step? In the end, I would like to do everything
between the Sweave document and the final pdf/ps output from within R,
without the need to make external calls.

Ralf


On Sat, Nov 13, 2010 at 4:29 PM, Johannes Huesing  wrote:
> Ralf B  [Sat, Nov 13, 2010 at 10:03:49PM CET]:
>> It seems that Sweave is supposed to be used from Latex and R is called
>> during the LaTeX compilation process whenever R chunks appear.
>
> This is not how it works.
>
> In the first page of
> http://www.statistik.lmu.de/~leisch/Sweave/Sweave-Rnews-2002-3.pdf
> that the file is first processed by R before it can be typeset by
> LaTeX.
>
>> What
>> about the other way round? I would like to run it triggered by R. Is
>> this possible?
>
> To my understanding this is how it's done.
>
>> I understand that this does not correspond to the idea
>> of literate programming since it means that there is R code running
>> outside the document,
>
> You lost me here.
>
>> but for my practical approach, I would like to
>> use Sweave more like a report extension at the end of my already
>> existing R scripts that combined a number of diagrams to a pdf file.
>>
>> My second question is, does Sweave create a potential performance
>> bottleneck when used with very big data analysis compared with when
>> using R directly?
>>
>
> Not really, because the only overhead is tangling the Sweave file.
> If it is very big, you may want to process only the parts you have
> changed last. The package weaver seems to come in handy then, see
> http://bioconductor.org/packages/2.6/bioc/vignettes/weaver/inst/doc/weaver_howTo.pdf
> --
> Johannes Hüsing               There is something fascinating about science.
>                              One gets such wholesale returns of conjecture
> mailto:johan...@huesing.name  from such a trifling investment of fact.
> http://derwisch.wikidot.com         (Mark Twain, "Life on the Mississippi")
>



[R] Sweave question

2010-11-13 Thread Ralf B
It seems that Sweave is supposed to be used from Latex and R is called
during the LaTeX compilation process whenever R chunks appear. What
about the other way round? I would like to run it triggered by R. Is
this possible? I understand that this does not correspond to the idea
of literate programming since it means that there is R code running
outside the document, but for my practical approach, I would like to
use Sweave more like a report extension at the end of my already
existing R scripts, combining a number of diagrams into a pdf file.

My second question is, does Sweave create a potential performance
bottleneck when used with very big data analysis compared with when
using R directly?

Ralf



Re: [R] Merge postscript files into ps/pdf

2010-11-12 Thread Ralf B
Assuming I went to the trouble of changing the existing R
scripts that create the mentioned postscripts/pdfs, how can I get
a series of scripts to append to a single ps/pdf? I would want the first
script to create the file if it does not yet exist and all others to
append new pages to it. I tried this simple example:

#first.R
pdfFileName <- paste("C:/testfile.pdf", sep="")
pdf(pdfFileName, onefile=TRUE)
plot(c(1,2,3))
abline(v = 2)
dev.off()

#second.R
pdfFileName <- paste("C:/testfile.pdf", sep="")
pdf(pdfFileName, onefile=TRUE)
plot(c(1,2,3))
abline(h = 2)
dev.off()

The second overwrites the first, so I cannot accumulate pages across
different scripts. The same happens if I open several pdf devices in the
same script, even though they share the same
file and have 'onefile' set to TRUE. Is it really limited
to a single device?

Ralf
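
As far as I understand it, every pdf() call starts a new file, and 'onefile' only controls whether the pages drawn while that one device is open go into the same file. A minimal sketch of a workaround, assuming the plotting code of each script can be wrapped in a function: a driver opens the device once, runs all plotting functions, and closes it once. All function names here are hypothetical.

```r
# Each former script contributes only its plotting commands.
plotFirst  <- function() { plot(c(1, 2, 3)); abline(v = 2) }
plotSecond <- function() { plot(c(1, 2, 3)); abline(h = 2) }

# Hypothetical driver: one device, one page per plotting function.
writeAllPlots <- function(file, plotters) {
  pdf(file, onefile = TRUE)
  on.exit(dev.off(), add = TRUE)      # close the device even if a plot fails
  for (p in plotters) p()
  invisible(file)
}

outFile <- file.path(tempdir(), "testfile.pdf")
writeAllPlots(outFile, list(plotFirst, plotSecond))
```

The same idea works with source(): the driver calls pdf(), sources each script (which must not open or close devices itself), and calls dev.off() at the end.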





On Fri, Nov 12, 2010 at 2:28 AM, Joshua Wiley  wrote:
> Hi Ralf,
>
> It is easy to make a bunch of graphs in one file (each on its own
> page), using the onefile = TRUE argument to postscript() or pdf()
> (depending what type of file you want).  I usually use Ghostscript for
> tinkering with already created postscript or PDF files.  To me there
> is more appropriate software than R to use if you want to
> edit/merge/manipulate postscript or PDF files.
>
> Cheers,
>
> Josh
>
> On Thu, Nov 11, 2010 at 11:07 PM, Ralf B  wrote:
>> I created multiple postscript files using ?postscript. How can I merge
>> them into a single postscript file using R? How can I merge them into
>> a single pdf file?
>>
>> Thanks a lot,
>> Ralf
>>
>
>
>
> --
> Joshua Wiley
> Ph.D. Student, Health Psychology
> University of California, Los Angeles
> http://www.joshuawiley.com/
>



Re: [R] Merge postscript files into ps/pdf

2010-11-12 Thread Ralf B
I know such programs. However, for my specific problem I have an R
script that creates a report (which I have to create many times), and I
would like to append about 100 single-page PostScript files at the end as
an appendix. The file names are controlled, so it would be easy to detect
them; I am just missing a function/package that would let me, for example,
print them to a postscript graphics device.

Ralf

On Fri, Nov 12, 2010 at 11:47 AM, Greg Snow  wrote:
> The best approach if creating all the files using R is to change how you 
> create the graphs so that they all go to one file to begin with (as mentioned 
> by Joshua), but if some of the files are created differently (rgl, external 
> programs), then this is not an option.
>
> One external program that is fairly easy to use is pdftk which will 
> concatenate multiple pdf files into 1 (among other things).  If you want more 
> control of layout then you can use LaTeX which will read and include ps/pdf.
>
> If you need to use R, then you can read ps files using the grImport package 
> and then replot them to a postscript/pdf device with onefile set to TRUE.
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.s...@imail.org
> 801.408.8111
>
>
>> -Original Message-
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
>> project.org] On Behalf Of Ralf B
>> Sent: Friday, November 12, 2010 12:07 AM
>> To: r-help Mailing List
>> Subject: [R] Merge postscript files into ps/pdf
>>
>> I created multiple postscript files using ?postscript. How can I merge
>> them into a single postscript file using R? How can I merge them into
>> a single pdf file?
>>
>> Thanks a lot,
>> Ralf
>>
>



[R] Merge postscript files into ps/pdf

2010-11-11 Thread Ralf B
I created multiple postscript files using ?postscript. How can I merge
them into a single postscript file using R? How can I merge them into
a single pdf file?

Thanks a lot,
Ralf



[R] time question

2010-11-09 Thread Ralf B
I have this script which I use to get an epoch with accuracy of 1
second (based on R's inability to calculate millisecond-accurate
timestamps -- at least I have not seen a straightforward solution :)
):

nowInSeconds <- as.numeric(Sys.time())
nowInMS <- nowInSeconds * 1000
print(nowInSeconds)
print(as.character(nowInMS))

when running this I get the following:

> nowInSeconds <- as.numeric(Sys.time())
> nowInMS <- nowInSeconds * 1000
> print(nowInSeconds)
[1] 1289312002
> print(as.character(nowInMS))
[1] "1289312002093"


I wonder where the 93 milliseconds come from. Is this a random number?
A rounding error? Can somebody explain this?

Best,
Ralf
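
The trailing 093 is most likely neither random nor a rounding error: Sys.time() carries sub-second resolution, and print() shows only 7 significant digits by default, which hides the fraction. A small sketch with a fixed timestamp (the value is an assumption chosen to mimic the output above):

```r
# A fixed time stamp with a fractional part, standing in for Sys.time().
stamp <- as.POSIXct(1289312002.093, origin = "1970-01-01", tz = "UTC")
secs <- as.numeric(stamp)

print(secs)                 # default of 7 significant digits hides the fraction
print(secs, digits = 15)    # the fractional seconds become visible

# Milliseconds within the current second, and the epoch in milliseconds.
millis <- round((secs %% 1) * 1000)
epochMs <- sprintf("%.0f", secs * 1000)
print(millis)
print(epochMs)
```

So the milliseconds are already contained in the value returned by as.numeric(Sys.time()); only the default printing suggests a resolution of one second.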



[R] Rserve alternative?

2010-11-08 Thread Ralf B
The Rserve documentation at

http://rosuda.org/Rserve/doc.shtml#start

states that even when multiple connections are made to Rserve,
Windows won't separate the workspaces physically; the connections share an
environment, which will obviously cause problems, so it should not be used
that way. Are there any alternatives for the Windows platform?

Ralf



[R] Rserve causes Perl error

2010-11-07 Thread Ralf B
Hi all,

I tried to run Rserve: I installed it from CRAN using

install.packages("Rserve")

and tried to run it from the command line using:

R CMD Rserve

I am getting an error telling me that the command perl cannot be
found. What is wrong and what can I do to fix this? Do I need to
install any other packages or is it just a path problem?

Ralf



Re: [R] Create variable by name

2010-10-06 Thread Ralf B
Hi all,

I have scripts that each have one variable called 'output'; the script
creates a data frame and stores it in 'output'. Then
the data frame is written to disk. This is simple when working with a
single script. However, as soon as one script calls another, the variables
overwrite each other.

Here is a little example that fixes the problem for one particular case
by using variables with different names:


#
# Script A, either called directly or from another script
#
outputA <- NULL
getOutput <- function(){
  return(outputA)
}
outputA <- "script A"

#
# Script B, includes and executes script A as part of its own
#
output <- NULL
output <- "Script B"
source("C:/data/poodle/coder/overwritingTest/scriptA.R")
outputA <- getOutput()
print(paste("Output script B:", output))
print(paste("Output script A:", outputA ))

However, I simply want to ensure that script A's output does not interfere with
the one produced by script B, without script B needing to know what A called
its variable. I want script B to be able to access the output of script A easily.
What I would ideally need (I think) is an OO approach. I was thinking that I
could perhaps store a generic variable whose name is generated from the script
name (i.e. scriptA_output, scriptB_output).
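
One alternative to generated variable names, sketched under assumptions: run each script in its own environment with sys.source(), so that both scripts can keep the name 'output'. The helper name runIsolated is hypothetical, and a temporary file stands in for scriptA.R.

```r
# Hypothetical helper: run a script in a fresh environment and return it.
runIsolated <- function(path) {
  env <- new.env(parent = globalenv())
  sys.source(path, envir = env)       # assignments in the script land in 'env'
  env
}

# Stand-in for script A, written to a temporary file for this example.
scriptA <- tempfile(fileext = ".R")
writeLines('output <- "script A"', scriptA)

# Script B keeps its own 'output' and reads A's result from the environment.
output <- "Script B"
resultA <- runIsolated(scriptA)
print(paste("Output script B:", output))
print(paste("Output script A:", resultA$output))
```

Script B only needs to know that every script stores its result in 'output', not how a script names anything else.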

On Wed, Oct 6, 2010 at 1:24 PM, Greg Snow  wrote:
> Possible? Yes (as others have shown), advisable? No, see fortune(106), 
> fortune(236), and possibly fortune(181).
>
> What is your ultimate goal? Maybe we can help you find a better way.
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.s...@imail.org
> 801.408.8111
>
>
>> -Original Message-
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
>> project.org] On Behalf Of Ralf B
>> Sent: Wednesday, October 06, 2010 10:32 AM
>> To: r-help Mailing List
>> Subject: [R] Create variable by name
>>
>> Can one create a variable through a function by name
>>
>> createVariable <- function(name) {
>>       outputVariable = name
>>       name <- NULL
>> }
>>
>> after calling
>>
>> createVariable("myVar")
>>
>> I would like to have a variable myVar initialized with NULL in my
>> environment. Is this possible?
>>
>> Ralf
>>
>



[R] Create variable by name

2010-10-06 Thread Ralf B
Can one create a variable by name through a function?

createVariable <- function(name) {
outputVariable = name
name <- NULL
}

after calling

createVariable("myVar")

I would like to have a variable myVar initialized with NULL in my
environment. Is this possible?

Ralf
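
For illustration, a minimal sketch using assign(). The default envir = parent.frame() creates the variable in the environment the function is called from. Note that replies in this thread advise against this style; a named list is often easier to handle.

```r
# Create a variable called 'name' with value NULL in the caller's environment.
createVariable <- function(name, envir = parent.frame()) {
  assign(name, NULL, envir = envir)
  invisible(name)
}

createVariable("myVar")
print(exists("myVar"))
print(is.null(myVar))
```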



[R] Source awareness?

2010-10-06 Thread Ralf B
Here is the general (perhaps silly) question first: Is it possible for a
script to find out whether it was sourced by another script or run
directly?

Here is a small example with two scripts:

# script A
print ("This is script A")

# script B
source("C:/scriptA.R")
print ("This is script B")

I would like to modify script A in a way so that it only outputs 'This
is script A' if it was called directly, but keeps quiet in the other
case.

In addition to that, is it possible to access the stack of script
calls from the environment?

Ralf
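
A minimal sketch, assuming the calling script uses a plain source("...") call: sys.calls() returns the stack of active calls, and the script can look for source in it. The helper name isSourced is hypothetical, and a call written as base::source(...) would not be detected by this simple check.

```r
# Hypothetical helper: TRUE if some active call on the stack is source(...).
isSourced <- function() {
  calls <- as.list(sys.calls())
  any(vapply(calls,
             function(cl) identical(cl[[1]], as.name("source")),
             logical(1)))
}

# Script A could then guard its output like this:
if (!isSourced()) print("This is script A")
```

The same sys.calls() list also answers the second question: it is the call stack as seen from the point where it is evaluated.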



Re: [R] Inheritance and automatic function call on script exit

2010-10-06 Thread Ralf B
Here is the modified script with what I learned from Joshua:

#
# superscript
#

output <- NULL

writeOutput <- function() {
processTime <- proc.time()
outputFilename <- paste("C:/myOutput_", processTime[3], ".csv", sep="")
write.csv(output, file = outputFilename)
}
on.exit(writeOutput, add=T)


#
# subscript
#

source("C:/superscript.R")
output <- data.frame(a=c(1,2,3), b=c(4,5,6))

For some reason, the file is not created, so the call does not seem to
happen. What am I doing wrong?

Ralf
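
Two things probably play a role here. First, on.exit(writeOutput, add=T) names the function but does not call it; it would have to be writeOutput(). Second, on.exit() is documented as acting when the current function exits, and at the top level of a script there is no enclosing function whose exit marks the end of the script. A sketch of a workaround, assuming the subscripts are started through a wrapper (the name runSubscript is hypothetical):

```r
# Hypothetical wrapper: run a subscript and write its 'output' when done,
# whether the subscript finishes normally or stops with an error.
runSubscript <- function(path, outFile) {
  env <- new.env(parent = globalenv())
  env$output <- NULL
  on.exit({
    if (!is.null(env$output)) {
      write.csv(env$output, file = outFile, row.names = FALSE)
    }
  }, add = TRUE)
  sys.source(path, envir = env)       # the subscript assigns to 'output'
  invisible(outFile)
}
```

Because the wrapper is a function, its on.exit() handler runs as soon as the subscript has finished.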



Re: [R] Inheritance and automatic function call on script exit

2010-10-06 Thread Ralf B
I think base::on.exit() will do the trick. Thank you :)

Ralf

On Wed, Oct 6, 2010 at 11:24 AM, Ralf B  wrote:
>> If you are running these interactively, you could make your own
>> source() function.  In that function you could define the super and
>> subscripts, and have it call writeOutput on.exit().  I suspect you
>> could get something like that to work even in batch mode by having R
>> load the function by default and some tweaking of your scripts.
>
> What if I do not control the subscripts but only the superscript. In
> other words, other people keep adding subscripts and the function of
> my superscript only ensures certain standard behaviors.
>
> Ralf
>
>
>
>>
>>>
>>> Best,
>>> Ralf
>>>
>>
>> --
>> Joshua Wiley
>> Ph.D. Student, Health Psychology
>> University of California, Los Angeles
>> http://www.joshuawiley.com/
>>
>



Re: [R] Inheritance and automatic function call on script exit

2010-10-06 Thread Ralf B
> If you are running these interactively, you could make your own
> source() function.  In that function you could define the super and
> subscripts, and have it call writeOutput on.exit().  I suspect you
> could get something like that to work even in batch mode by having R
> load the function by default and some tweaking of your scripts.

What if I do not control the subscripts but only the superscript? In
other words, other people keep adding subscripts, and my superscript
only ensures certain standard behaviors.

Ralf



>
>>
>> Best,
>> Ralf
>>
>
> --
> Joshua Wiley
> Ph.D. Student, Health Psychology
> University of California, Los Angeles
> http://www.joshuawiley.com/
>



[R] Inheritance and automatic function call on script exit

2010-10-06 Thread Ralf B
Hi all,

In order to add certain standard functionality to a large set of
scripts that I maintain, I developed a superscript that I manually
include at the beginning of every script. Here is an example of a very
simplified superscript and subscript.

#
# superscript
#

output <- NULL

writeOutput <- function() {
processTime <- proc.time()
outputFilename <- paste("C:/myOutput_", processTime[3], ".csv", sep="")
write.csv(output, file = outputFilename)
}



#
# subscript
#

source("C:/superscript.R")
output <- data.frame(a=c(1,2,3), b=c(4,5,6))
writeOutput()


I would like to

a) avoid the need to include the super script manually. Does R support
some kind of script inheritance?
b) Is it possible to call writeOutput() automatically when a script is exiting?

Best,
Ralf



[R] rimage package problems

2010-09-27 Thread Ralf B
Hi all,

I tried to install the rimage package in order to get the function
?read.jpeg. However, I get this error regardless of which mirror I
choose:

install.packages("rimage")
--- Please select a CRAN mirror for use in this session ---
Warning message:
In getDependencies(pkgs, dependencies, available, lib) :
  package ‘rimage’ is not available
>

Does anybody know what happened to the package? Is there an
alternative? I simply want to draw a background picture for a plot
using the standard graphics package.

Thanks,
Ralf
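
For the drawing part, a rough sketch with base graphics only: rasterImage() can paint an image over the plot region before the points are added. The image here is a synthetic grey gradient; reading a real JPEG would need a separate reader, for example readJPEG() from the jpeg package (an assumption, it is not mentioned in this thread). The function name drawWithBackground is hypothetical.

```r
# Hypothetical helper: plot x/y on top of a raster image filling the plot region.
drawWithBackground <- function(img, x, y) {
  plot(x, y, type = "n")              # set up the axes without drawing points
  usr <- par("usr")                   # plot region: xleft, xright, ybottom, ytop
  rasterImage(img, usr[1], usr[3], usr[2], usr[4])
  points(x, y, pch = 19)
}

# Synthetic background: a 10 x 10 grey gradient instead of a real picture.
img <- as.raster(matrix(seq(0.6, 1, length.out = 100), nrow = 10))

bgFile <- tempfile(fileext = ".pdf")
pdf(bgFile)
drawWithBackground(img, c(1, 2, 3), c(4, 5, 4))
invisible(dev.off())
```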



Re: [R] Determine area between two density plots

2010-09-23 Thread Ralf B
I wonder what the best way is to access those values. I am using the
following code:

x1 <- c(1,2,1,3,5,6,6,7,7,8)
x2 <- c(1,2,1,3,5,6,5,3,8,7)
d1 <- density(x1, na.rm = TRUE)
d2 <- density(x2, na.rm = TRUE)
plot(d1, lwd=3, main="bla")
lines(d2, lty=2, lwd=3)
d1[1]
d1[2]

The last two lines give me access to about 1000 values, but I don't know if
this is the right approach. I also don't know why they are in two
columns. Does density have a safer way to get at those values?

Ralf
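
For what it is worth, density() returns a list, so d1[1] and d1[2] are its first two components: the x grid and the estimated density y on that grid. They can be accessed by name as d1$x and d1$y. A sketch of how the area between two curves might then be computed, assuming both densities are evaluated on the same grid (the padding of three bandwidths and the trapezoidal rule are my own choices):

```r
x1 <- c(1, 2, 1, 3, 5, 6, 6, 7, 7, 8)
x2 <- c(1, 2, 1, 3, 5, 6, 5, 3, 8, 7)

# Common grid that covers both samples plus three bandwidths on each side.
pad <- 3 * max(bw.nrd0(x1), bw.nrd0(x2))
lo <- min(x1, x2) - pad
hi <- max(x1, x2) + pad
d1 <- density(x1, from = lo, to = hi, n = 512)
d2 <- density(x2, from = lo, to = hi, n = 512)

# Trapezoidal rule applied to the absolute difference of the two curves.
gap <- abs(d1$y - d2$y)
step <- diff(d1$x)
areaBetween <- sum(step * (head(gap, -1) + tail(gap, -1)) / 2)
print(areaBetween)
```

Since each density integrates to about 1, the result lies between 0 (identical curves) and 2 (no overlap).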




On Wed, Sep 22, 2010 at 5:25 PM, Peter Alspach
 wrote:
> Tena koe Ralf
>
> If you save the results of density()
>
> x1Den <- density(x1)
>
> you get the x and y values of the line which is plotted.  Similarly for x2 - 
> you can then use these to shade the joint area and find the area.  Tinkering 
> with the arguments of density to make the x values for each the same will 
> make this process easier.  Let me know if you'd like more details.
>
> HTH 
>
> Peter Alspach
>
>> -Original Message-
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
>> project.org] On Behalf Of Ralf B
>> Sent: Thursday, 23 September 2010 8:55 a.m.
>> To: r-help Mailing List
>> Subject: [R] Determine area between two density plots
>>
>> Hi group,
>>
>> I am creating two density plots as shown in the code below:
>>
>> x1 <- c(1,4,5,3,2,3,4,5,6,5,4,3,2,1,1,1,2,3)
>> x2 <- c(1,4,5,3,5,7,4,5,6,1,1,1,2,1,1,1,2,3)
>> plot(density(x1, na.rm = TRUE))
>> polygon(density(x2, na.rm = TRUE), border="blue")
>>
>> How can I determine the area that is covered between the two plots as
>> a number and how can I grey (or highlight with a pattern) the area
>> that lies between the two lines?
>>
>> Thanks,
>> Ralf
>>



[R] Length of vector without NA's

2010-09-23 Thread Ralf B
Hi,

this following code:

x<-c(1,2,NA)
length(x)

returns 3, correctly counting numbers as well as NA's. How can I
exclude NA's from this count?

Ralf
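
A short sketch of the usual idiom: is.na() flags the missing values, and summing the negated flags counts the remaining ones.

```r
x <- c(1, 2, NA)

# !is.na(x) is TRUE for every non-missing element; sum() counts the TRUEs.
nonMissing <- sum(!is.na(x))
print(nonMissing)

# Equivalent: drop the NA values first, then take the length.
print(length(na.omit(x)))
```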



[R] Plotting densities

2010-09-23 Thread Ralf B
Hi group,

I am currently plotting two densities using the following code:

x1 <- c(1,2,1,3,5,6,6,7,7,8)
x2 <- c(1,2,1,3,5,6,5,7)
plot(density(x1, na.rm = TRUE))
polygon(density(x2, na.rm = TRUE), border="blue")

However, I would like to avoid bordering the second density as it adds
a nasty bottom line which I would like to avoid.
I would also rather have a dashed or dotted line for the second
(currently blue) density but without the bottom part.
Any idea how to do that?

Best,
Ralf
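
One option, sketched here: polygon() closes the shape, which is where the bottom line comes from, whereas lines() draws only the curve and accepts a line type. The output goes to a temporary PDF file only to keep the example self-contained.

```r
x1 <- c(1, 2, 1, 3, 5, 6, 6, 7, 7, 8)
x2 <- c(1, 2, 1, 3, 5, 6, 5, 7)
d1 <- density(x1, na.rm = TRUE)
d2 <- density(x2, na.rm = TRUE)

densFile <- tempfile(fileext = ".pdf")
pdf(densFile)
# ylim is widened so that the taller of the two curves is not cut off.
plot(d1, ylim = c(0, max(d1$y, d2$y)), main = "")
lines(d2, lty = 2)                    # dashed curve, no closing bottom line
invisible(dev.off())
```

lty = 3 would give a dotted line instead of a dashed one.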



[R] Determine area between two density plots

2010-09-22 Thread Ralf B
Hi group,

I am creating two density plots as shown in the code below:

x1 <- c(1,4,5,3,2,3,4,5,6,5,4,3,2,1,1,1,2,3)
x2 <- c(1,4,5,3,5,7,4,5,6,1,1,1,2,1,1,1,2,3)
plot(density(x1, na.rm = TRUE))
polygon(density(x2, na.rm = TRUE), border="blue")

How can I determine the area that is covered between the two plots as
a number and how can I grey (or highlight with a pattern) the area
that lies between the two lines?

Thanks,
Ralf
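
For the shading part, a minimal sketch: evaluate both densities on a common grid and hand polygon() one curve forwards and the other backwards, so that the closed shape is exactly the region between them. The grid limits (three bandwidths of padding) and the output file are assumptions made for this example.

```r
x1 <- c(1, 4, 5, 3, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1, 1, 1, 2, 3)
x2 <- c(1, 4, 5, 3, 5, 7, 4, 5, 6, 1, 1, 1, 2, 1, 1, 1, 2, 3)

# Same x grid for both curves.
pad <- 3 * max(bw.nrd0(x1), bw.nrd0(x2))
lo <- min(x1, x2) - pad
hi <- max(x1, x2) + pad
d1 <- density(x1, from = lo, to = hi, na.rm = TRUE)
d2 <- density(x2, from = lo, to = hi, na.rm = TRUE)

shadeFile <- tempfile(fileext = ".pdf")
pdf(shadeFile)
plot(d1$x, d1$y, type = "n", ylim = c(0, max(d1$y, d2$y)),
     xlab = "x", ylab = "Density")
# First curve left to right, second curve right to left: the enclosed area.
polygon(c(d1$x, rev(d2$x)), c(d1$y, rev(d2$y)), col = "grey", border = NA)
lines(d1$x, d1$y)
lines(d2$x, d2$y, col = "blue")
invisible(dev.off())
```

For a pattern instead of a solid grey, polygon() also takes the arguments density and angle to draw shading lines.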



[R] Multiple Lorenz curves in one diagram

2010-09-22 Thread Ralf B
Hi group,

I would like to draw multiple Lorenz curves in a single plot using
data already prepared. Here is a simple example:

require("lawstat")
lorenz.curve(c(1,2,3),c(4,5,4))
lorenz.curve(c(1,2,3),c(4,2,1))

This example draws two separate graphs. How can I combine them in a
distinguishable way? I tried ?polygon without success...

Ralf
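
A sketch that sidesteps lawstat entirely, so it is not the package's method: compute the points of an unweighted Lorenz curve by hand, draw the first curve with plot(), and add the second with lines() in a different line type. The helper name lorenzPoints and the sample values are assumptions for illustration.

```r
# Hypothetical helper: points of an unweighted Lorenz curve.
lorenzPoints <- function(v) {
  v <- sort(v)
  list(p = c(0, seq_along(v) / length(v)),   # cumulative share of observations
       L = c(0, cumsum(v) / sum(v)))         # cumulative share of the total
}

a <- lorenzPoints(c(4, 5, 4))
b <- lorenzPoints(c(4, 2, 1))

lorenzFile <- tempfile(fileext = ".pdf")
pdf(lorenzFile)
plot(a$p, a$L, type = "l", xlim = c(0, 1), ylim = c(0, 1),
     xlab = "Cumulative share of observations",
     ylab = "Cumulative share of total")
lines(b$p, b$L, lty = 2)              # second curve, dashed
abline(0, 1, col = "grey")            # line of perfect equality
legend("topleft", legend = c("Data 1", "Data 2"), lty = c(1, 2))
invisible(dev.off())
```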



[R] Extracting bins and frequencies from frequency table

2010-09-22 Thread Ralf B
Dear R users,

I would like to create a frequency table from raw data and then access
the classes/bins and their respective frequencies separately. Here is
the code to create the frequency table:


x1 <- c(1,5,1,1,2,2,3,4,5,3,2,3,6,4,3,8)
t1 <- table(x1)
print(t1[1])

It's easy to plot this, but how do I actually access the frequencies
alone and the bins alone?
Basically I am looking to get:

bins <- c(1, 2, 3, 4, 5, 6, 8)
freq <- c(3, 3, 4, 2, 2, 1, 1)

When running

print(t1[1])

I only get one pair. It seems to be organized that way. Is there a
better way? Perhaps 'table' is not the right approach?

Thanks a lot,
Ralf
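
A minimal sketch of one way to split the table: the bins are the names of the table, and the counts are its values.

```r
x1 <- c(1, 5, 1, 1, 2, 2, 3, 4, 5, 3, 2, 3, 6, 4, 3, 8)
t1 <- table(x1)

bins <- as.numeric(names(t1))   # names are character, so convert them back
freq <- as.vector(t1)           # drops the table attributes, keeps the counts

print(bins)
print(freq)
```

as.data.frame(t1) is another option if bins and counts should stay side by side in one object.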



[R] Combined plot: Scatter + density plot

2010-09-20 Thread Ralf B
Hi,

In order to save space in a publication, it would be nice to have a
combined scatter and density plot similar to what is shown at

http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=78

I wonder if anybody perhaps has already developed code for this and is
willing to share. This is the reproducible code for the histogram
version obtained from the site:

def.par <- par(no.readonly = TRUE) # save default, for resetting...
x <- pmin(3, pmax(-3, rnorm(50)))
y <- pmin(3, pmax(-3, rnorm(50)))
xhist <- hist(x, breaks=seq(-3,3,0.5), plot=FALSE)
yhist <- hist(y, breaks=seq(-3,3,0.5), plot=FALSE)
top <- max(c(xhist$counts, yhist$counts))
xrange <- c(-3,3)
yrange <- c(-3,3)
nf <- layout(matrix(c(2,0,1,3),2,2,byrow=TRUE), c(3,1), c(1,3), TRUE)
par(mar=c(3,3,1,1))
plot(x, y, xlim=xrange, ylim=yrange, xlab="", ylab="")
par(mar=c(0,3,1,1))
barplot(xhist$counts, axes=FALSE, ylim=c(0, top), space=0)
par(mar=c(3,0,1,1))
barplot(yhist$counts, axes=FALSE, xlim=c(0, top), space=0, horiz=TRUE)

par(def.par)

I am basically stuck from line 6 onward, where the bin information from the
histogram is used to determine the plotting sizes. Densities are
different: they don't have (equal) bins, so their size would need to be
determined differently. I wonder if somebody here has created such a
diagram already and is willing to share ideas/code/pointers to similar
examples. Your effort is highly appreciated.

Thanks a lot,
Ralf
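
A minimal sketch of the density variant, under the assumption that
plain kernel density estimates from density() are acceptable for the
margins. The function name scatter_density is made up for this
example. Instead of bar heights, each marginal panel draws the density
curve, and the scatter plot takes its axis limits from the range over
which the densities were evaluated so that the panels line up:

```r
# Scatter plot with marginal kernel density curves (base graphics only).
scatter_density <- function(x, y) {
  dx <- density(x)                      # kernel density estimate of x
  dy <- density(y)                      # kernel density estimate of y
  def.par <- par(no.readonly = TRUE)    # save settings, as in the original
  on.exit(par(def.par))                 # restore them when done

  # Same 2 x 2 layout as the histogram version.
  layout(matrix(c(2, 0, 1, 3), 2, 2, byrow = TRUE), c(3, 1), c(1, 3), TRUE)

  par(mar = c(3, 3, 1, 1))              # panel 1: scatter plot
  plot(x, y, xlim = range(dx$x), ylim = range(dy$x), xlab = "", ylab = "")

  par(mar = c(0, 3, 1, 1))              # panel 2: density of x on top
  plot(dx$x, dx$y, type = "l", axes = FALSE, xlab = "", ylab = "",
       xlim = range(dx$x))

  par(mar = c(3, 0, 1, 1))              # panel 3: density of y, rotated
  plot(dy$y, dy$x, type = "l", axes = FALSE, xlab = "", ylab = "",
       ylim = range(dy$x))

  invisible(list(x = dx, y = dy))       # return the estimates for reuse
}

set.seed(1)
dens <- scatter_density(rnorm(50), rnorm(50))
```

Each marginal curve is scaled to its own panel here, so the two
density heights are not directly comparable; a shared limit such as
max(dx$y, dy$y) could be passed as ylim and xlim if that matters.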



[R] Baumgartner-Weiss-Schindler test

2010-08-22 Thread Ralf B
Hi R group,

I am wondering if there is any implementation of the
Baumgartner-Weiss-Schindler test in R, as described in:

http://www.jstor.org/stable/2533862

It is a non-parametric test that works similarly to KS and others,
testing the null hypothesis that two sets of data originate from the
same population.

Best,
Ralf



Re: [R] offlist comment Re: KS Test question (2)

2010-08-05 Thread Ralf B
Hi David,

I would like to apologize for what I wrote earlier. It was late and I
was frustrated. Please give me time to adapt to the formal structures
of the forum.

Best,
Ralf

On Thu, Aug 5, 2010 at 7:32 AM, David Winsemius  wrote:
>
> On Aug 5, 2010, at 4:10 AM, Ralf B wrote:
>
>> This is unbelievable. Now people like yourself start doing background
>> searches on one and accusing one of not being professional
>
> Your words, not mine.
>
>> plus posting cheeky R code.
>
> It appeared that you were having problems and did not have an efficient
> strategy for searching the archives, so I shared with you code that I
> developed and have put in my .Rprofile setup file. I do not see where that is
> "posting cheeky R code". I saw it as trying to be constructive. Using it
> would only be part of the recommended actions to take before posting
>
>
>> The reason why I submitted the questions I have
>> submitted was that these answers did not satisfy my particular problem
>> (or perhaps I mistakenly thought so). The point here is that the forum
>> should be a forum where one should be allowed to ask questions without
>> first studying the history of the entire forum in fear that
>> someone might have asked it before.
>
> If you read the Posting Guide I think you will find precisely the opposite
> expectation explicitly presented. Using my "cheeky code" would only be part
> of the recommended actions to take before posting if you follow the
> recommendations of the "Do your homework before posting:" section. This list
> was not set up to be a chat room or a tutoring center for general questions
> in statistics.
>
> While you are reading the Posting Guide, please note that it expresses this
> advice regarding posting messages that were sent privately:
>
> "Take care when you quote other people's comments to respect their rights,
> e.g., as summarized here. In particular
>
>        • Private messages should never be quoted without permission,  "
>
>
>> I was hoping that I could find
>> clearer answers than what I was able to read. I do know how to search
>> in Google. But I am not an expert in statistics, as you already found
>> in your background check. If I would be fluent in statistics and R
>> and if past answers would have exactly satisfied my problem I would
>> not post here and I certainly would not have occupied your expensive
>> attention.
>>
>>
>>
>>
>>
>> On Wed, Aug 4, 2010 at 6:16 PM, David Winsemius 
>> wrote:
>>>
>>> On Aug 4, 2010, at 5:49 PM, Ralf B wrote:
>>>
>>>> Hi R Users,
>>>>
>>>> I have two vectors, x and y, of equal length representing two types of
>>>> data from two studies. I would like to test if they are similar enough
>>>> to use them interchangeably. No assumptions about distributions can be
>>>> made (initial tests clearly show that they are not normal).
>>>> Here some result:
>>>>
>>>> Two-sample Kolmogorov-Smirnov test
>>>>
>>>> data:  x and y
>>>> D = 0.1091, p-value < 2.2e-16
>>>> alternative hypothesis: two-sided
>>>>
>>>> Warning message:
>>>> In ks.test(x[1:nx], y[1:nx], exact = FALSE) :
>>>>  cannot compute correct p-values with ties
>>>>
>>>> Here some questions:
>>>>
>>>> a) What does the warning message mean and what does it imply?
>>>> b) The data is very noisy and the initial result shows that there is
>>>> no relation between x and y. Is there a way to calculate an effect
>>>> size?
>>>> c) Can the p-value be used, when running tests over a large amount of
>>>> different data sets, as a metric for ranking similarity between x and
>>>> y data sets?
>>>
>>> There has been quite a bit of discussion on this list over the years
>>> about why KS test is not good in this situation. If I read the results
>>> of a search on your name correctly, you are in a department of
>>> Information Sciences. I would have thought that the first reaction of
>>> someone in that field would be to do a search on a question. Why are
>>> you filling up the archives with questions that have been repeatedly
>>> asked and answered?
>>>
>>> Do you need help in this area?
>>>
>>> rhelpSearch <- function(string,
>>>                 restrict = c("Rhelp10", "Rhelp08", "Rhelp02", "functions"),
>>>                 matchesPerPage = 100, ...)
>>>        RSiteSearch(string = string, restrict = restrict,
>>>                    matchesPerPage = matchesPerPage, ...)
>>>
>>>
>>> rhelpSearch("KS.test ties p-value")
>>>
>>>>
>>>> Best
>>>> R.
>
>
> --
> David Winsemius, MD
> West Hartford, CT
>
>



Re: [R] Error: cannot allocate vector of size xxx Mb

2010-08-05 Thread Ralf B
Thank you for such a careful and thorough analysis of the problem and
your comparison with your configuration. I very much appreciate it.
For completeness and (perhaps) further comparison, I have executed
'version' and sessionInfo() as well:


> version
               _
platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status         RC
major          2
minor          10.0
year           2009
month          10
day            25
svn rev        50206
language       R
version.string R version 2.10.0 RC (2009-10-25 r50206)
> sessionInfo()
R version 2.10.0 RC (2009-10-25 r50206)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
 [1] splines   stats4    grid      stats     graphics  grDevices utils
 [8] datasets  methods   base

other attached packages:
 [1] flexmix_2.2-7     multcomp_1.1-7    survival_2.35-8   mvtnorm_0.9-9
 [5] modeltools_0.2-16 lattice_0.18-3    car_1.2-16        psych_1.0-88
 [9] nortest_1.0       gplots_2.8.0      caTools_1.10      bitops_1.0-4.1
[13] gdata_2.8.0       gtools_2.6.2      ggplot2_0.8.7     digest_0.4.2
[17] reshape_0.8.3     plyr_0.1.9        proto_0.3-8       RJDBC_0.1-5
[21] rJava_0.8-2       DBI_0.2-5

loaded via a namespace (and not attached):
[1] tools_2.10.0

> memory.limit()
[1] 2047



Also, the example I presented was a simplified reproduction of the
real data structure. My real data structure does not have reused
vectors. I merely wanted to show the error occurring when processing
large vectors into data frames and then binding these data frames
together. I hope this additional information helps. I might add that I
am running this in StatET under Eclipse with 512 MB of allocated RAM
in the environment.

Besides adding more memory, can you spot simple ways of how memory use
can be improved? I know that I am running quite a bit of baggage.
Unfortunately my script is rather comprehensive and my example is
really just a simplified part that I created to reproduce the problem.

Thanks,
Ralf






On Thu, Aug 5, 2010 at 4:44 AM, Petr PIKAL  wrote:
> Hi
>
> r-help-boun...@r-project.org napsal dne 05.08.2010 09:53:21:
>
>> I am dealing with very large data frames, artificially created with
>> the following code, that are combined using rbind.
>>
>>
>> a <- rnorm(500)
>> b <- rnorm(500)
>> c <- rnorm(500)
>> d <- rnorm(500)
>> first <- data.frame(one=a, two=b, three=c, four=d)
>> second <- data.frame(one=d, two=c, three=b, four=a)
>
> Up to this point there is no error on my system
>
>> version
>               _
> platform       i386-pc-mingw32
> arch           i386
> os             mingw32
> system         i386, mingw32
> status         Under development (unstable)
> major          2
> minor          12.0
> year           2010
> month          05
> day            31
> svn rev        52164
> language       R
> version.string R version 2.12.0 Under development (unstable) (2010-05-31
> r52164)
>
>> sessionInfo()
> R version 2.12.0 Under development (unstable) (2010-05-31 r52164)
> Platform: i386-pc-mingw32/i386 (32-bit)
>
> attached base packages:
> [1] stats     grDevices datasets  utils     graphics  methods   base
>
> other attached packages:
> [1] lattice_0.18-8 fun_1.0
>
> loaded via a namespace (and not attached):
> [1] grid_2.12.0  tools_2.12.0
>
>> rbind(first, second)
>
> Although size of first and second is only roughly 160 MB their
> concatenation probably consumes all remaining memory space as you already
> have a-d first and second in memory.
>
> Regards
> Petr
>
>>
>> which results in the following error for each of the statements:
>>
>> > a <- rnorm(500)
>> Error: cannot allocate vector of size 38.1 Mb
>> > b <- rnorm(500)
>> Error: cannot allocate vector of size 38.1 Mb
>> > c <- rnorm(500)
>> Error: cannot allocate vector of size 38.1 Mb
>> > d <- rnorm(500)
>> Error: cannot allocate vector of size 38.1 Mb
>> > first <- data.frame(one=a, two=b, three=c, four=d)
>> Error: cannot allocate vector of size 38.1 Mb
>> > second <- data.frame(one=d, two=c, three=b, four=a)
>> Error: cannot allocate vector of size 38.1 Mb
>> > rbind(first, second)
>>
>> When running memory.limit() I am getting this:
>>
>> memory.limit()
>> [1] 2047
>>
>> Which shows me that I have 2 GB of memory available. What is wrong?
>> Shouldn't 38 MB be very feasible?
>>
>> Best,
>> Ralf
>>
>
>


Re: [R] offlist comment Re: KS Test question (2)

2010-08-05 Thread Ralf B
This is unbelievable. Now people like yourself start doing background
searches on one and accusing one of not being professional plus
posting cheeky R code. The reason why I submitted the questions I have
submitted was that these answers did not satisfy my particular problem
(or perhaps I mistakenly thought so). The point here is that the forum
should be a forum where one should be allowed to ask questions without
first studying the history of the entire forum in fear that
someone might have asked it before. I was hoping that I could find
clearer answers than what I was able to read. I do know how to search
in Google. But I am not an expert in statistics, as you already found
in your background check. If I would be fluent in statistics and R
and if past answers would have exactly satisfied my problem I would
not post here and I certainly would not have occupied your expensive
attention.





On Wed, Aug 4, 2010 at 6:16 PM, David Winsemius  wrote:
>
> On Aug 4, 2010, at 5:49 PM, Ralf B wrote:
>
>> Hi R Users,
>>
>> I have two vectors, x and y, of equal length representing two types of
>> data from two studies. I would like to test if they are similar enough
>> to use them interchangeably. No assumptions about distributions can be
>> made (initial tests clearly show that they are not normal).
>> Here some result:
>>
>> Two-sample Kolmogorov-Smirnov test
>>
>> data:  x and y
>> D = 0.1091, p-value < 2.2e-16
>> alternative hypothesis: two-sided
>>
>> Warning message:
>> In ks.test(x[1:nx], y[1:nx], exact = FALSE) :
>>  cannot compute correct p-values with ties
>>
>> Here some questions:
>>
>> a) What does the warning message mean and what does it imply?
>> b) The data is very noisy and the initial result shows that there is
>> no relation between x and y. Is there a way to calculate an effect
>> size?
>> c) Can the p-value be used, when running tests over a large amount of
>> different data sets, as a metric for ranking similarity between x and
>> y data sets?
>
> There has been quite a bit of discussion on this list over the years about
> why KS test is not good in this situation. If I read the results of a search
> on your name correctly, you are in a department of Information Sciences. I
> would have thought that the first reaction of someone in that field would be
> to do a search on a question. Why are you filling up the archives with
> questions that have been repeatedly asked and  answered?
>
> Do you need help in this area?
>
> rhelpSearch <- function(string,
>                  restrict = c("Rhelp10", "Rhelp08", "Rhelp02", "functions"),
>                  matchesPerPage = 100, ...)
>         RSiteSearch(string = string, restrict = restrict,
>                     matchesPerPage = matchesPerPage, ...)
>
>
> rhelpSearch("KS.test ties p-value")
>
>>
>> Best
>> R.
>>
>
> David Winsemius, MD
> West Hartford, CT
>
>



[R] Error: cannot allocate vector of size xxx Mb

2010-08-05 Thread Ralf B
I am dealing with very large data frames, artificially created with
the following code, that are combined using rbind.


a <- rnorm(500)
b <- rnorm(500)
c <- rnorm(500)
d <- rnorm(500)
first <- data.frame(one=a, two=b, three=c, four=d)
second <- data.frame(one=d, two=c, three=b, four=a)
rbind(first, second)

which results in the following error for each of the statements:

> a <- rnorm(500)
Error: cannot allocate vector of size 38.1 Mb
> b <- rnorm(500)
Error: cannot allocate vector of size 38.1 Mb
> c <- rnorm(500)
Error: cannot allocate vector of size 38.1 Mb
> d <- rnorm(500)
Error: cannot allocate vector of size 38.1 Mb
> first <- data.frame(one=a, two=b, three=c, four=d)
Error: cannot allocate vector of size 38.1 Mb
> second <- data.frame(one=d, two=c, three=b, four=a)
Error: cannot allocate vector of size 38.1 Mb
> rbind(first, second)

When running memory.limit() I am getting this:

memory.limit()
[1] 2047

Which shows me that I have 2 GB of memory available. What is wrong?
Shouldn't 38 MB be very feasible?

Best,
Ralf
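
A short worked calculation that may help read the message. As far as
the R documentation on memory limits describes it, the size in the
error is that of the single allocation that failed, not the total in
use, and memory.limit() reports a cap rather than what is currently
free. On a 32-bit build a large vector also needs one contiguous block
of address space, which can be unavailable even when the total looks
sufficient. A 38.1 Mb request corresponds to a numeric vector of five
million doubles:

```r
# Size of one numeric vector: 8 bytes per double.
n      <- 5e6
vec_mb <- n * 8 / 1024^2
print(round(vec_mb, 1))          # size of one such vector in Mb

# Four vectors plus two data frames that each hold four columns:
total_mb <- vec_mb * (4 + 2 * 4)
print(round(total_mb))           # rough total before rbind() is even called

# Inspecting and releasing objects (small vector used for illustration):
tmp <- numeric(1000)
print(object.size(tmp), units = "Kb")
rm(tmp)                          # drop the object ...
invisible(gc())                  # ... and let R return the memory
```

Removing intermediate vectors with rm() once they have been copied
into a data frame, and starting from a fresh session with fewer
attached packages, are the usual first things to try.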



Re: [R] split / lapply over multiple columns

2010-08-04 Thread Ralf B
Besides beauty, is there an actual advantage in terms of run-time
and/or memory use?

Ralf

On Wed, Aug 4, 2010 at 3:44 PM, Bert Gunter  wrote:
> It's not that it's "bad" -- it's just unnecessarily clumsy. Almost
> always, tapply/by will do the same thing more simply.
>
> -- Bert
>
> On Wed, Aug 4, 2010 at 10:10 AM, Ralf B  wrote:
>>> In general, the lapply(split(...)) construction should never be used.
>>
>> Why? What makes it so bad to use?
>>
>



[R] Output (graphics and table/text)

2010-08-04 Thread Ralf B
Hi R Users,

I need to produce a simple report consisting of some graphs and a
statistic. Here is a simplification of it:

# graphics output test
a <- c(1,3,2,1,4)
b <- c(2,1,1,1,2)
c <- c(4,7,2,4,5)
d <- rnorm(500)
e <- rnorm(600)
op <- par(mfrow=c(3,2))
pie(a)
pie(b)
pie(c)
text(ks.test(d,e))

Obviously, the ks.test result does not make it to the output. How can
this be achieved by a) simply dumping the text into the fourth quadrant
so that coordinates are relative to that quadrant? b) presenting the
output as a little table without the need to use a LaTeX solution?

Thanks a lot,
Ralf
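
One possible approach for a), sketched with base graphics: open an
empty panel with plot.new(), whose coordinates run from 0 to 1 in both
directions, and write the formatted test result into it with text().
A 2 x 2 layout is assumed here so that the text lands in the fourth
quadrant; the wording of the three text lines is made up for the
example.

```r
set.seed(1)
a <- c(1, 3, 2, 1, 4)
b <- c(2, 1, 1, 1, 2)
c <- c(4, 7, 2, 4, 5)
d <- rnorm(500)
e <- rnorm(600)

res <- ks.test(d, e)

# Build the lines to display; text() needs character strings, not an
# "htest" object.
lines_out <- c("Two-sample KS test",
               sprintf("D       = %.4f", res$statistic),
               sprintf("p-value = %.4g", res$p.value))

op <- par(mfrow = c(2, 2))
pie(a)
pie(b)
pie(c)
plot.new()                                  # empty fourth panel
text(0.5, 0.5, paste(lines_out, collapse = "\n"),
     family = "mono")                       # monospaced so columns align
par(op)
```

For b), the same idea extends to a small table: format each row to a
fixed width with sprintf() or format() and draw the rows in a
monospaced font.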



[R] KS Test question (2)

2010-08-04 Thread Ralf B
Hi R Users,

I have two vectors, x and y, of equal length representing two types of
data from two studies. I would like to test if they are similar enough
to use them interchangeably. No assumptions about distributions can be
made (initial tests clearly show that they are not normal).
Here is a result:

Two-sample Kolmogorov-Smirnov test

data:  x and y
D = 0.1091, p-value < 2.2e-16
alternative hypothesis: two-sided

Warning message:
In ks.test(x[1:nx], y[1:nx], exact = FALSE) :
  cannot compute correct p-values with ties

Here are some questions:

a) What does the warning message mean and what does it imply?
b) The data is very noisy and the initial result shows that there is
no relation between x and y. Is there a way to calculate an effect
size?
c) Can the p-value be used, when running tests over a large amount of
different data sets, as a metric for ranking similarity between x and
y data sets?

Best
R.
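
Some background that may help with a) and b), with made-up data. The
message is a warning rather than an error: the test assumes continuous
distributions, under which repeated values should not occur, so with
ties the reported p-value is only approximate. The statistic D is the
largest vertical distance between the two empirical distribution
functions and always lies between 0 and 1, which makes it usable as a
rough measure of how far apart the samples are. The sketch below forces
ties by rounding and recomputes D by hand:

```r
set.seed(42)
x <- round(rnorm(200), 1)               # rounding creates repeated values
y <- round(rnorm(200, mean = 0.3), 1)

n_ties <- sum(duplicated(c(x, y)))      # number of tied observations
print(n_ties)

# Older versions of R warn about ties here; the warning is suppressed
# only to keep the example output short.
res <- suppressWarnings(ks.test(x, y, exact = FALSE))

# D recomputed directly from the empirical distribution functions.
grid     <- sort(unique(c(x, y)))
D_manual <- max(abs(ecdf(x)(grid) - ecdf(y)(grid)))

print(unname(res$statistic))
print(D_manual)
```

Regarding c): with large samples even a small D gives a tiny p-value,
so D is probably the more informative quantity to compare across data
sets.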



[R] KS Test questions

2010-08-04 Thread Ralf B
1) When running ks.test, I am getting the following warning after the
test presents its result:

'ks.test(x, y) : cannot compute correct p-values with ties'

I wonder what it means and what causes it.

2) Also, how do I calculate an effect size from this statistic?

R.



Re: [R] split / lapply over multiple columns

2010-08-04 Thread Ralf B
> In general, the lapply(split(...)) construction should never be used.

Why? What makes it so bad to use?



[R] Kullback–Leibler divergence question (flexmix::KLdiv) Urgent!

2010-08-03 Thread Ralf B
Hi all,

x <- cbind(rnorm(500),rnorm(500))
KLdiv(x, eps=1e-4)
KLdiv(x, eps=1e-5)
KLdiv(x, eps=1e-6)
KLdiv(x, eps=1e-7)
KLdiv(x, eps=1e-8)
KLdiv(x, eps=1e-9)
KLdiv(x, eps=1e-10)
...
KLdiv(x, eps=1e-100)
...
KLdiv(x, eps=1e-1000)

When calling flexmix::KLdiv using the given code I get results that
increase in value the smaller I pick the accuracy parameter 'eps', until
finally reaching infinity. If I pick the number too low, I get NA as a
result.

What is the best value for eps and how should one deal with this?
Should I simply pick a value that returns a result and then keep the
accuracy value constant at this level for all my analyses in order to
get comparable results?

Ralf
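
An illustration of why this pattern can occur, under the assumption
(taken from my reading of the flexmix documentation) that eps is a
floor: probabilities below eps are replaced by eps before the logs are
taken. The function kl_discrete below is a hand-written stand-in for
the idea and is not the flexmix implementation. Wherever one
distribution has mass and the other has essentially none, the term
log(p / eps) grows without bound as eps shrinks, and a literal such as
1e-1000 is not representable as a double and is read as exactly 0:

```r
# Discrete KL divergence with a floor on the probabilities
# (hypothetical helper, for illustration only).
kl_discrete <- function(p, q, eps) {
  p <- pmax(p, eps)
  q <- pmax(q, eps)
  sum(p * log(p / q))
}

p <- c(0.5, 0.5, 0.0)
q <- c(0.5, 0.0, 0.5)

eps_values <- c(1e-4, 1e-8, 1e-16)
kl_values  <- sapply(eps_values, function(e) kl_discrete(p, q, e))
print(kl_values)                    # grows as eps gets smaller

print(1e-1000 == 0)                 # the literal underflows to zero
print(kl_discrete(p, q, 1e-1000))   # 0 * log(0) appears, so no finite value
```

If this is what happens with the real data, the divergence depends on
eps by construction, and keeping one fixed eps for all comparisons is
the only way to make the numbers comparable with each other.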



Re: [R] Adding collumn to existing data frame

2010-08-03 Thread Ralf B
Data frame is given by the rest of the script and not really an
option. Other than that, you are absolutely right.

Ralf

On Tue, Aug 3, 2010 at 11:34 PM, Dennis Murphy  wrote:
> Wouldn't a list be a better object type if the variables you want to add
> have variable lengths? This way you don't have to worry about nuisances such
> as NA padding. Just a thought...
>
> Dennis
>
> On Tue, Aug 3, 2010 at 7:54 PM, Ralf B  wrote:
>>
>> Actually it does -- one has to feed the result back into the
>> original variable:
>>
>> add.col <- function(df, vec, namevec){
>>        if (nrow(df) < length(vec) ){ df <-  # pads rows if needed
>>        rbind(df, matrix(NA, length(vec)-nrow(df), ncol(df),
>>                dimnames=list( NULL, names(df) ) ) )
>>        }
>>      length(vec) <- nrow(df) # pads with NA's
>>      df[, namevec] <- vec; # names new col properly
>>      return(df)
>> }
>>
>> mydata <- NULL
>> mydata <- data.frame(userid = c(5, 6, 5, 6, 5, 6), taskid = c(1, 1, 2, 2,
>> 3, 3),
>>      stuff = 11:16)
>> mydata  <- add.col(mydata, c(1,2,3,4),"test1")
>> mydata  <- add.col(mydata, c(1,2,3,4,5,6,7,8),"test2")
>> mydata
>>
>>
>> Thanks a lot, David and all others here who made the effort!
>> Ralf
>>
>>
>> On Tue, Aug 3, 2010 at 10:37 PM, David Winsemius 
>> wrote:
>> >
>> > On Aug 3, 2010, at 10:35 PM, David Winsemius wrote:
>> >
>> >>
>> >> On Aug 3, 2010, at 8:32 PM, Ralf B wrote:
>> >>
>> >>> Hi experts,
>> >>>
>> >>> I am trying to write a very flexible method that allows me to add a
>> >>> new column to an existing data frame. This is what I have so far:
>> >>>
>> >>> add.column <- function(df, new.col, name) {
>> >>>        n.row <- dim(df)[1]
>> >>>        length(new.col) <- n.row
>> >>>        names(new.col) <- name
>> >>>        return(cbind(df, new.col))
>> >>> }
>> >>>
>> >>> df <- NULL
>> >>> df <- data.frame(a=c(1,2,3))
>> >>> df
>> >>> # corect: added NA to new collumn
>> >>> df <- add.column(df,c(1,2),'myNewColumn2')
>> >>> df
>> >>> # problem: not added, data frame should be extended with NAs
>> >>> add.column(df,c(1,2,3,4),'myNewColumn3')
>> >>> df
>> >>>
>> >>>
>> >>> However, there are two problems:
>> >>>
>> >>> 1) The column name is not renamed accurately but always set to
>> >>> 'new.col' . Surely this could be done outside the function, but it
>> >>> would be better if its self contained.
>> >>
>> >> Try this:
>> >>
>> >> add.col <- function(df, vec, namevec){
>> >>                         length(vec) <- nrow(df) # pads with NA's
>> >>                         cbind(df, namevec=vec)} # names new col properly
>> >>
>> > Actually it doesn't name column correctly...  see below for a method
>> > with "[<-" .
>> >
>> >>> 2) It does not work for cases where new.col is longer than the length
>> >>> of the data frame. In such cases, I would like to add NA's to the data
>> >>> frame if it has less rows.
>> >>
>> >> Don't have a compact answer to this. (Tried re-dimensioning with "dim()
>> >> <-"  but it was not accepted by the interpreter.  Would need to add a
>> >> test
>> >> at the beginning and then pad with rows of NA's using rbind before
>> >> cbinding
>> >> as above.
>> >>
>> >> add.col <- function(df, vec, namevec){
>> >>              if (nrow(df) < length(vec) ){ df <-  # pads rows if needed
>> >>                    rbind(df, matrix(NA, length(vec)-nrow(df), ncol(df),
>> >>                                     dimnames=list( NULL, names(df) ) )
>> >> ) }
>> >>              length(vec) <- nrow(df) # pads with NA's
>> >>              df[, namevec] <- vec; # names new col properly
>> >>        return(df)}
>> >>
>> >>>
>> >>> Any ideas to to solve this?
>> >>
>> >> Has not been tested with columns of varying types.
>> >>
>> >
>> > David Winsemius, MD
>> > West Hartford, CT
>> >
>> >
>>
>
>



Re: [R] Adding collumn to existing data frame

2010-08-03 Thread Ralf B
Actually it does -- one has to feed the result back into the
original variable:

add.col <- function(df, vec, namevec){
  if (nrow(df) < length(vec)) {  # pads rows if needed
    df <- rbind(df, matrix(NA, length(vec) - nrow(df), ncol(df),
                           dimnames = list(NULL, names(df))))
  }
  length(vec) <- nrow(df)  # pads with NA's
  df[, namevec] <- vec     # names new col properly
  return(df)
}

mydata <- NULL
mydata <- data.frame(userid = c(5, 6, 5, 6, 5, 6), taskid = c(1, 1, 2, 2, 3, 3),
  stuff = 11:16)
mydata  <- add.col(mydata, c(1,2,3,4),"test1")
mydata  <- add.col(mydata, c(1,2,3,4,5,6,7,8),"test2")
mydata


Thanks a lot, David and all others here who made the effort!
Ralf


On Tue, Aug 3, 2010 at 10:37 PM, David Winsemius  wrote:
>
> On Aug 3, 2010, at 10:35 PM, David Winsemius wrote:
>
>>
>> On Aug 3, 2010, at 8:32 PM, Ralf B wrote:
>>
>>> Hi experts,
>>>
>>> I am trying to write a very flexible method that allows me to add a
>>> new column to an existing data frame. This is what I have so far:
>>>
>>> add.column <- function(df, new.col, name) {
>>>        n.row <- dim(df)[1]
>>>        length(new.col) <- n.row
>>>        names(new.col) <- name
>>>        return(cbind(df, new.col))
>>> }
>>>
>>> df <- NULL
>>> df <- data.frame(a=c(1,2,3))
>>> df
>>> # corect: added NA to new collumn
>>> df <- add.column(df,c(1,2),'myNewColumn2')
>>> df
>>> # problem: not added, data frame should be extended with NAs
>>> add.column(df,c(1,2,3,4),'myNewColumn3')
>>> df
>>>
>>>
>>> However, there are two problems:
>>>
>>> 1) The column name is not renamed accurately but always set to
>>> 'new.col' . Surely this could be done outside the function, but it
>>> would be better if its self contained.
>>
>> Try this:
>>
>> add.col <- function(df, vec, namevec){
>>                         length(vec) <- nrow(df) # pads with NA's
>>                         cbind(df, namevec=vec)} # names new col properly
>>
> Actually it doesn't name column correctly...  see below for a method with
> "[<-" .
>
>>> 2) It does not work for cases where new.col is longer than the length
>>> of the data frame. In such cases, I would like to add NA's to the data
>>> frame if it has less rows.
>>
>> Don't have a compact answer to this. (Tried re-dimensioning with "dim()
>> <-"  but it was not accepted by the interpreter.  Would need to add a test
>> at the beginning and then pad with rows of NA's using rbind before cbinding
>> as above.
>>
>> add.col <- function(df, vec, namevec){
>>              if (nrow(df) < length(vec) ){ df <-  # pads rows if needed
>>                    rbind(df, matrix(NA, length(vec)-nrow(df), ncol(df),
>>                                     dimnames=list( NULL, names(df) ) ) ) }
>>              length(vec) <- nrow(df) # pads with NA's
>>              df[, namevec] <- vec; # names new col properly
>>        return(df)}
>>
>>>
>>> Any ideas to to solve this?
>>
>> Has not been tested with columns of varying types.
>>
>
> David Winsemius, MD
> West Hartford, CT
>
>



[R] split / lapply over multiple columns

2010-08-03 Thread Ralf B
Hi all,

I have a data frame with columns over which I would like to run
repeated functions for data analysis. Currently I am only running
recursively over two columns, where column 1 has two states over
which I split and column 2 has three states. The function therefore runs
2 x 3 = 6 times as shown when running the following code:

mydata <- data.frame(userid = c(5, 6, 5, 6, 5, 6), taskid = c(1, 1, 2, 2, 3, 3),
  stuff = 11:16)
mydata
mydata <- mydata[with(mydata, order(userid, taskid)), ]
mydata

lapply(split(mydata, mydata[,1]), function(x){
  lapply(split(x, x[,2]), function(y){
    print(paste("result:",y))
  })
})

This traverses the tree like this:

5,1
5,2
5,3
6,1
6,2
6,3

Is there an easier way of doing that? I would like to provide the two
columns (index 1 and index 2) directly and have the ?lapply function
perform its lambda function directly on each member of the tree
automatically. How can I do that?

Best,
Ralf
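
A sketch of three ways to visit every userid/taskid combination in a
single call, using the data from the question. The per-group function
(a sum over 'stuff') is only a placeholder. split() accepts a list of
grouping columns, and by() and tapply() take the grouping columns
directly:

```r
mydata <- data.frame(userid = c(5, 6, 5, 6, 5, 6),
                     taskid = c(1, 1, 2, 2, 3, 3),
                     stuff  = 11:16)

# 1) One split over both columns, then a single lapply.
groups <- split(mydata, list(mydata$userid, mydata$taskid), drop = TRUE)
res    <- lapply(groups, function(g) sum(g$stuff))
print(names(res))      # one entry per userid.taskid combination

# 2) by() hands each sub data frame to the function.
res_by <- by(mydata, mydata[, c("userid", "taskid")],
             function(g) sum(g$stuff))

# 3) tapply() works on a single column and returns a userid x taskid matrix.
res_t <- tapply(mydata$stuff, mydata[, c("userid", "taskid")], sum)
print(res_t)
```

Note that the groups come back named "5.1", "6.1", ... with the first
grouping column varying fastest, which differs from the nested
traversal order shown above.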



[R] Adding collumn to existing data frame

2010-08-03 Thread Ralf B
Hi experts,

I am trying to write a very flexible method that allows me to add a
new column to an existing data frame. This is what I have so far:

add.column <- function(df, new.col, name) {
  n.row <- dim(df)[1]
  length(new.col) <- n.row
  names(new.col) <- name
  return(cbind(df, new.col))
}

df <- NULL
df <- data.frame(a=c(1,2,3))
df
# correct: added NA to new column
df <- add.column(df,c(1,2),'myNewColumn2')
df
# problem: not added, data frame should be extended with NAs
add.column(df,c(1,2,3,4),'myNewColumn3')
df


However, there are two problems:

1) The column name is not renamed accurately but always set to
'new.col'. Surely this could be done outside the function, but it
would be better if it's self-contained.
2) It does not work for cases where new.col is longer than the length
of the data frame. In such cases, I would like to add NAs to the data
frame if it has fewer rows.

Any ideas how to solve this?

Ralf



[R] Flip axis on hist2d plot

2010-07-30 Thread Ralf B
I am plotting a heatmap using the hist2d function:

require("gplots")
x <- rnorm(2000)
y <- rnorm(2000)
hist2d(x, y, freq=TRUE, nbins=50, col = c("white",heat.colors(256)))

However, I would like to flip the vertical y axis so that the upper
left corner serves as the y-origin. How can I do that?

Ralf
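
A possible workaround, sketched without hist2d itself: the counts are
rebuilt with cut() and table() and drawn with image(), which accepts a
reversed ylim so that the y axis runs from top to bottom. The bin
construction here is an assumption and may not match the binning that
hist2d uses internally.

```r
set.seed(1)
x <- rnorm(2000)
y <- rnorm(2000)
nbins <- 50

# Equal-width bin edges spanning the data range in each direction.
xb <- seq(min(x), max(x), length.out = nbins + 1)
yb <- seq(min(y), max(y), length.out = nbins + 1)

# Two-way table of bin memberships, turned into a plain matrix
# (rows correspond to x bins, columns to y bins).
tab    <- table(cut(x, xb, include.lowest = TRUE),
                cut(y, yb, include.lowest = TRUE))
counts <- matrix(as.vector(tab), nrow = nbins, ncol = nbins)

# Reversing ylim flips the vertical axis: smallest y at the top.
image(xb, yb, counts, ylim = rev(range(yb)),
      col = c("white", heat.colors(256)), xlab = "x", ylab = "y")
```

Since image() takes the bin edges as x and y, the same reversed ylim
idea should carry over to any matrix of counts, however it was
obtained.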



[R] Unique rows in data frame (with condition)

2010-07-29 Thread Ralf B
I have to deal with data frames that contain multiple entries for the
same item (based on an identifying column 'id'). The second column
mostly corresponds to the id column, which means that duplicate
entries can be eliminated with ?unique.

a <- unique(data.frame(timestamp=c(3,3,3,5,8), mylabel=c("a","a","a","b","c")))

However sometimes I have dataframes like this:

a <- unique(data.frame(timestamp=c(3,3,3,5,8), mylabel=c("a","z","a","b","c")))

which then results in:

   timestamp mylabel
1 3   a
2 3   z
4 5   b
5 8   c

However, I want only the first occurrence of each timestamp, keeping
the first label, resulting in an output like this:

   timestamp mylabel
1 3   a
4 5   b
5 8   c

Is there something like groupBy (like in SQL) ?

Best,
Ralf
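
A minimal sketch of one way to get that output with base R:
duplicated() flags every repeat of a value after its first appearance,
so negating it on the key column keeps the first row for each
timestamp, whatever the label is.

```r
df <- data.frame(timestamp = c(3, 3, 3, 5, 8),
                 mylabel   = c("a", "z", "a", "b", "c"))

# Keep only the first row seen for each timestamp.
first_per_ts <- df[!duplicated(df$timestamp), ]
print(first_per_ts)
```

This relies on the rows already being in the order in which "first"
is meant; if not, sorting with order() beforehand would be needed.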



[R] Linear Interpolation question

2010-07-29 Thread Ralf B
Hi R experts,

I have the following timeseries data:

#example data structure
a <- c(NA,1,NA,5,NA,NA,NA,10,NA,NA)
c <- c(1:10)
df <- data.frame(timestamp=a, sequence=c)
print(df)

where I would like to linearly interpolate between the points 1, 5, and
10 in 'timestamp'. Original timestamps should not be modified. Here is
the code I use to run the interpolation (so far):

# linear interpolation
print(c)
results <- approx(df$sequence, df$timestamp, n=NROW(df))
print(results)
df$timestamp <- results$y

# plotting
plot(c, a, main = "Linear Interpolation with approx")
points(results, col = 2, pch = "*")

# new dataframe
print(df)

When looking at the resulting data frame, however, I can see that the
original timestamps have been shifted as well. What I would like to
have is a result where the timestamps at positions 2, 4 and 8 remain
unchanged at the values 1, 5, and 10. I would also like values before
the first item to be constant. So the data frame should look like this:

   timestamp sequence
1       1.00        1
2       1.00        2
3       3.00        3
4       5.00        4
5       6.25        5
6       7.50        6
7       8.75        7
8      10.00        8
9      10.00        9
10     10.00       10

How do I have to change my script to make that work?

Ralf
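
A sketch of one way to get the table above: ask approx() for values at the original sequence positions via xout, instead of n equally spaced points between the first and last known value, and use rule = 2 so that positions outside the known range take the nearest known value. The column is called seq_no here only to avoid reusing the name c; filled is likewise a made-up name.

```r
a <- c(NA, 1, NA, 5, NA, NA, NA, 10, NA, NA)
df <- data.frame(timestamp = a, seq_no = 1:10)

# Interpolate only over rows with a known timestamp, then evaluate at every
# sequence position; rule = 2 holds the end values constant outside the range
known <- !is.na(df$timestamp)
filled <- approx(x = df$seq_no[known], y = df$timestamp[known],
                 xout = df$seq_no, rule = 2)$y

df$timestamp <- filled
print(df)
```

The known points at positions 2, 4 and 8 stay at 1, 5 and 10 because they are evaluated exactly at their own x values.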



[R] KLdiv question

2010-07-29 Thread Ralf B
I have a data set that causes flexmix::KLdiv to produce NA as a
result, and I was told that increasing the sensitivity of the 'esp'
value can be used to avoid a lot of values being set to a default
(which presumably causes the problem).

Now here is my question.

When running KLdiv on a normal distribution:

a <- rnorm(5)
b <- rnorm(5)
mydata <- cbind(a,b)
KLdiv(mydata, esp=1e-4)
KLdiv(mydata, esp=1e-5)
KLdiv(mydata, esp=1e-6)
KLdiv(mydata, esp=1e-7)
KLdiv(mydata, esp=1e-8)
KLdiv(mydata, esp=1e-9)
KLdiv(mydata, esp=1e-10)
KLdiv(mydata, esp=1e-100)

the result is stable independent from the chosen esp accuracy.
However, when I run the data on a distribution such as values in a
given range, I get NA and the method seems not to work independently
of how high I choose the accuracy.


y1 <- sample(1:1280, 20, replace=T)
y2 <- sample(1:1280, 20, replace=T)
mydata2 <- cbind(y1,y2)
KLdiv(mydata2, esp=1e-4)
KLdiv(mydata2, esp=1e-5)
KLdiv(mydata2, esp=1e-6)
KLdiv(mydata2, esp=1e-7)
KLdiv(mydata2, esp=1e-8)
KLdiv(mydata2, esp=1e-9)
KLdiv(mydata2, esp=1e-10)
KLdiv(mydata2, esp=1e-100)

Am I doing something wrong here? Does KL have any distributional
assumptions that I violate?

Best,
Ralf
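
For comparison, here is a small base R sketch of a Kullback-Leibler divergence computed from binned frequencies, with a floor on the bin probabilities so that empty bins do not produce log(0). It is not flexmix's implementation: the function kl_binned, its binning and its eps floor are assumptions made for illustration. Note also that the reply quoted elsewhere in this archive spells the KLdiv argument 'eps', not 'esp'.

```r
# KL divergence D(P || Q) between two samples, estimated from shared bins
kl_binned <- function(x, y, nbins = 20, eps = 1e-4) {
  # Common bin edges so that both samples are counted on the same grid
  breaks <- seq(min(x, y), max(x, y), length.out = nbins + 1)
  p <- as.vector(table(cut(x, breaks, include.lowest = TRUE)))
  q <- as.vector(table(cut(y, breaks, include.lowest = TRUE)))
  # Relative frequencies, floored at eps, then renormalised to sum to 1
  p <- pmax(p / sum(p), eps)
  q <- pmax(q / sum(q), eps)
  p <- p / sum(p)
  q <- q / sum(q)
  sum(p * log(p / q))
}

set.seed(1)
y1 <- sample(1:1280, 20, replace = TRUE)
y2 <- sample(1:1280, 20, replace = TRUE)
kl_binned(y1, y2)
kl_binned(y1, y1)  # identical samples give a divergence of 0
```

With only 20 observations spread over a wide range, many bins are empty, so the result depends strongly on the floor; that sensitivity is worth keeping in mind when interpreting any such estimate.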



Re: [R] Reset R environment through R command

2010-07-29 Thread Ralf B
With environment I actually meant workspace.

On Thu, Jul 29, 2010 at 1:22 PM, Ralf B  wrote:
> Is it possible to remove all variables in the current environment
> through a R command.
>
> Here is what I want:
>
> x <- 5
> y <- 10:20
> reset()
> print(x)
> print(y)
>
> Output should be NULL for x and y, and not 5 and 10:20.
>
> Can one do that in R?
>
> Best,
> Ralf
>



[R] Reset R environment through R command

2010-07-29 Thread Ralf B
Is it possible to remove all variables in the current environment
through an R command?

Here is what I want:

x <- 5
y <- 10:20
reset()
print(x)
print(y)

Output should be NULL for x and y, and not 5 and 10:20.

Can one do that in R?

Best,
Ralf
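
A minimal sketch of such a reset() helper, assuming that "reset" means removing every object from an environment (the workspace by default). The function is hypothetical, not part of base R; the demonstration uses a scratch environment so that running it does not wipe a real workspace.

```r
# Remove all objects, including hidden ones, from the given environment
reset <- function(env = globalenv()) {
  rm(list = ls(envir = env, all.names = TRUE), envir = env)
}

# Demonstration on a scratch environment instead of the real workspace
scratch <- new.env()
assign("x", 5, envir = scratch)
assign("y", 10:20, envir = scratch)
reset(scratch)

ls(scratch)                                     # character(0)
exists("x", envir = scratch, inherits = FALSE)  # FALSE
```

After removal, print(x) does not return NULL; it stops with an "object 'x' not found" error, so exists() is the safer check.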



[R] Spearman's Correlation Coefficient to compare distributions?

2010-07-29 Thread Ralf B
Hi,

I have distributions from two different data sets and I would like to
measure how similar their distributions (in terms of their bin
frequencies) are. In other words, I am not interested in the exact
sequence of data points but rather in their distributional
properties and in their similarities.
Spearman's Correlation Coefficient is used to compare data without the
assumption of normality. I wonder if this measure can also be used to
compare distributional data rather than the data points that are
summarized in a distribution. Here is the example code that exemplifies
what I would like to check:

aNorm <- rnorm(100)
bNorm <- rnorm(100)
cUni <- runif(100)
ha <- hist(aNorm)
hb <- hist(bNorm)
hc <- hist(cUni)
print(ha$counts)
print(hb$counts)
print(hc$counts)
# relatively similar
n <- min(c(NROW(ha$counts),NROW(hb$counts)))
cor.test(ha$counts[1:n], hb$counts[1:n], method="spearman")
# quite different
n <- min(c(NROW(ha$counts),NROW(hc$counts)))
cor.test(ha$counts[1:n], hc$counts[1:n], method="spearman")

Does this make sense or am I violating some assumptions of the coefficient?

Thanks,
R.



[R] Statistical mailing list

2010-07-28 Thread Ralf B
I am looking for a mailing list for general statistical questions that
are not R related. Do you have any suggestions for lists that are busy
and helpful and/or lists that you use and recommend?

Thanks in advance,
Ralf



Re: [R] Question about KLdiv and large datasets

2010-07-18 Thread Ralf B
Is the 'eps' argument part of KLdiv (I was not able to find it in the
help pages) or part of a general environment (such as the graphics
parameters 'par')? I am asking so that I can read about what it
actually does, to resolve the question you already raised about its
reliability...

Ralf

On Fri, Jul 16, 2010 at 10:41 AM, Peter Ehlers  wrote:
> On 2010-07-16 7:56, Ralf B wrote:
>>
>> Hi all,
>>
>> when running KL on a small data set, everything is fine:
>>
>> require("flexmix")
>> n<- 20
>> a<- rnorm(n)
>> b<- rnorm(n)
>> mydata<- cbind(a,b)
>> KLdiv(mydata)
>>
>> however, when this dataset increases
>>
>> require("flexmix")
>> n<- 1000
>> a<- rnorm(n)
>> b<- rnorm(n)
>> mydata<- cbind(a,b)
>> KLdiv(mydata)
>>
>>
>> KL seems to be not defined. Can somebody explain what is going on?
>>
>> Thanks,
>> Ralf
>
> Ralf,
>
> You can adjust the 'eps=' argument. But I don't know
> what this will do to the reliability of the results.
>
> KLdiv(mydata, eps = 1e-7)
>
>  -Peter Ehlers
>



[R] Question about KLdiv and large datasets

2010-07-16 Thread Ralf B
Hi all,

when running KL on a small data set, everything is fine:

require("flexmix")
n <- 20
a <- rnorm(n)
b <- rnorm(n)
mydata <- cbind(a,b)
KLdiv(mydata)

however, when this dataset increases

require("flexmix")
n <- 1000
a <- rnorm(n)
b <- rnorm(n)
mydata <- cbind(a,b)
KLdiv(mydata)


KL seems to be not defined. Can somebody explain what is going on?

Thanks,
Ralf



[R] How to transform: 4 columns into two columns stacked

2010-07-16 Thread Ralf B
I have the following data structure:

n=5
mydata <- data.frame(id=1:n, x=rnorm(n), y=rnorm(n), id=1:n,
x=rnorm(n), y=rnorm(n))
print(mydata)

producing the following representation

  id          x           y id.1       x.1        y.1
1  1  0.5326855 -2.07633703    1 0.7930274 -1.0530558
2  2  0.7888909  0.63354693    2 0.5908323 -1.3543282
3  3  0.5350803 -0.20108931    3 2.5079242 -0.4657274
4  4 -1.3041960 -0.25195129    4 1.6294046 -1.4094830
5  5  0.3109767 -0.02305981    5 0.5183756  1.3084776


However, I need to transform this data into this form:

   id          x           y
1   1  0.5326855 -2.07633703
2   2  0.7888909  0.63354693
3   3  0.5350803 -0.20108931
4   4 -1.3041960 -0.25195129
5   5  0.3109767 -0.02305981
6   1  0.7930274 -1.0530558
7   2  0.5908323 -1.3543282
8   3  2.5079242 -0.4657274
9   4  1.6294046 -1.4094830
10  5  0.5183756  1.3084776

What is the simplest way to do that?

Thanks a lot in advance!
Ralf
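
One simple way to stack the two column triples, sketched in base R: select each triple, give the second the same column names as the first, and rbind() them. The names first, second and stacked are made up for this example.

```r
set.seed(1)
n <- 5
mydata <- data.frame(id = 1:n, x = rnorm(n), y = rnorm(n),
                     id = 1:n, x = rnorm(n), y = rnorm(n))
# data.frame() renames the duplicated columns to id.1, x.1 and y.1

first  <- mydata[, c("id", "x", "y")]
second <- mydata[, c("id.1", "x.1", "y.1")]
names(second) <- names(first)    # rbind() requires matching column names

stacked <- rbind(first, second)
rownames(stacked) <- NULL        # renumber rows 1..10
print(stacked)
```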



[R] KLdiv question (data.frame)

2010-07-15 Thread Ralf B
Hi all,

I wonder why KLdiv does not work with data.frames:

n <- 50
mydata <- data.frame(
sequence=c(1:n),
data1=c(rnorm(n)),
data2=c(rnorm(n))
)
# does NOT work
KLdiv(mydata)
# works fine
dataOnly <- cbind(mydata$data1, mydata$data2)
KLdiv(dataOnly)


Any ideas? Is there a better implementation that can deal with
data.frame or is there a simpler way of converting?

Ralf



[R] Repeated analysis over groups / Splitting by group variable

2010-07-15 Thread Ralf B
I am performing some analysis over a large data frame and would like
to conduct repeated analysis over grouped-up subsets. How can I do
that?

Here some example code for clarification:

require("flexmix")  # for Kullback-Leibler divergence
n <- 23
groups <- c(1,2,3)
mydata <- data.frame(
sequence=c(1:n),
data1=c(rnorm(n)),
data2=c(rnorm(n)),
group=rep(sample(groups, n, replace=TRUE))
)
# Part 1: full stats (works fine)
dataOnly <- cbind(mydata$data1, mydata$data2, mydata$group)
KLdiv(dataOnly)

#
# Part 2: again - but once for each group (error)
#
by(dataOnly, groups, KLdiv(dataOnly))

The error I am getting is: Error in tapply(1L:23L, list(INDICES = c(1,
2, 3)), function (x)  :
  arguments must have same length

Are there better ways than 'by'? I would like to use different stats
and functions, and therefore I am looking for a splitter whose output I
can hand to any statistical function I want.

Any ideas?

Ralf
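
A sketch of a generic splitter using split() and lapply(). The error above arises because by() expects one group value per row (mydata$group, 23 values) rather than the three distinct group codes, and because its third argument must be a function, not the result of calling one. colMeans() stands in here for whatever statistic is wanted; with flexmix loaded, a function wrapping KLdiv could presumably be passed instead, which is not tested here.

```r
set.seed(1)
n <- 23
groups <- c(1, 2, 3)
mydata <- data.frame(sequence = 1:n,
                     data1 = rnorm(n),
                     data2 = rnorm(n),
                     group = sample(groups, n, replace = TRUE))

# One data frame per group, keyed by the group value
pieces <- split(mydata[, c("data1", "data2")], mydata$group)

# Apply any function that accepts a numeric matrix to each piece
per_group <- lapply(pieces, function(d) colMeans(as.matrix(d)))
print(per_group)
```

The equivalent by() call would be by(mydata[, c("data1", "data2")], mydata$group, function(d) colMeans(as.matrix(d))).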



[R] Merging columns along time line

2010-07-14 Thread Ralf B
I am resending this, as I believe it has not arrived on the mailing
list when I first emailed.

I have a set of labels arranged along a timeline in a. Each label has
a timestamp and marks a state until the next label. The data frame a
contains 5 such timestamps and 5 associated labels. This means, on a
continuous scale between 1-100, there are 5 markers. E.g. 'abc' marks
the timestamps between 10 and 19, 'def' marks the timestamps between
20 and 32, and so on.

a <- data.frame(timestamp=c(3,5,8), mylabel=c("abc","def","ghi"))
b <- data.frame(timestamp=c(1:10))

I would like to assign these labels as an extra column 'label' to the
data.frame b, which currently only consists of the timestamp. The
output would then look like this:

   timestamp label
1          1    NA
2          2    NA
3          3 "abc"
4          4 "abc"
5          5 "def"
6          6 "def"
7          7 "def"
8          8 "ghi"
9          9 "ghi"
10        10 "ghi"

What is the simplest way to assign these labels based on timestamps to
get this output? The real dataset is several millions of rows...

Thanks,
Ralf



[R] Arrange values on a timeline

2010-07-13 Thread Ralf B
I have a set of labels arranged along a timeline in a. Each label has
a timestamp and marks a state until the next label. The data frame a
contains 5 such timestamps and 5 associated labels. This means, on a
continuous scale between 1-100, there are 5 markers. E.g. 'abc' marks
the timestamps between 10 and 19, 'def' marks the timestamps between
20 and 32, and so on.

a <- data.frame(timestamp=c(3,5,8), mylabel=c("abc","def","ghi"))
b <- data.frame(timestamp=c(1:10))

I would like to assign these labels as an extra column 'label' to the
data.frame b, which currently only consists of the timestamp. The
output would then look like this:

   timestamp label
1          1    NA
2          2    NA
3          3 "abc"
4          4 "abc"
5          5 "def"
6          6 "def"
7          7 "def"
8          8 "ghi"
9          9 "ghi"
10        10 "ghi"

What is the simplest way to assign these labels based on timestamps to
get this output? The real dataset is several millions of rows...

Thanks,
Ralf
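
A sketch of one vectorised way to do this with findInterval(), which returns, for each timestamp in b, the index of the last marker in a that is less than or equal to it (0 if there is none). It assumes that a is sorted by timestamp; idx is a made-up name.

```r
a <- data.frame(timestamp = c(3, 5, 8),
                mylabel = c("abc", "def", "ghi"),
                stringsAsFactors = FALSE)
b <- data.frame(timestamp = 1:10)

# Index of the most recent marker at or before each timestamp
idx <- findInterval(b$timestamp, a$timestamp)
idx[idx == 0] <- NA          # timestamps before the first marker get NA

b$label <- a$mylabel[idx]
print(b)
```

Because findInterval() works on whole vectors without an explicit loop, the same call should remain practical for several million rows.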



Re: [R] StartsWith over vector of Strings?

2010-07-13 Thread Ralf B
When running the combined code with your suggested line:

content <- data.frame(urls=c(
"http://www.google.com/search?source=ig&hl=en&rlz=&=&q=stuff&aq=f&aqi=g10&aql=&oq=&gs_rfai=CrrIS3VU8TJqcMJHuzASm9qyBBgAAAKoEBU_QsmVh",
"http://search.yahoo.com/search;_ylt=Atvki9MVpnxuEcPmXLEWgMqbvZx4?p=stuff&toggle=1&cop=mss&ei=UTF-8&fr=yfp-t-701")
)
searchset <- data.frame(signatures=c("http://www.google.com/search"))
content[na.omit(pmatch(searchset, content$urls))]
print(content)

I am getting both URLs as results, but in fact, would expect only the
first URL. Am I overlooking something?


Ralf

On Tue, Jul 13, 2010 at 12:03 PM, Greg Snow  wrote:
> content[na.omit(pmatch(searchset, content,,TRUE))]
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.s...@imail.org
> 801.408.8111
>
>
>> -----Original Message-
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
>> project.org] On Behalf Of Ralf B
>> Sent: Tuesday, July 13, 2010 5:47 AM
>> To: r-help@r-project.org
>> Subject: [R] StartsWith over vector of Strings?
>>
>> Given vectors of strings of arbitrary length
>>
>> content <- c("abc", "def")
>> searchset <- c("a", "abc", "abcdef", "d", "def", "defghi")
>>
>> Is it possible to determine the content String set that matches the
>> searchset in the sense of 'startswith' ? This would be a vector of all
>> strings in content that start with the string of any of the strings in
>> the searchset. In the little example here, this would be:
>>
>> result <- c("abc", "abc", "def", "def")
>>
>> Best,
>> Ralf
>>
>



[R] Substring function?

2010-07-13 Thread Ralf B
Hi all,

I would like to detect all strings in the vector 'content' that
contain the strings from the vector 'search'. Here a code example:

content <- data.frame(urls=c(
"http://www.google.com/search?source=ig&hl=en&rlz=&=&q=stuff&aq=f&aqi=g10&aql=&oq=&gs_rfai=CrrIS3",
"http://search.yahoo.com/search;_ylt=Atvki9MVpnxuEcPmXLEWgMqbvZx4?p=stuff&toggle=1")
)
search <- data.frame(signatures=c("http://www.google.com/search"))
subset(content, search$signatures %in% content$urls)

I am getting an empty result:

[1] urls
<0 rows> (or 0-length row.names)


What I would like to achieve is the return of
"http://www.google.com/search?source=ig&hl=en&rlz=&=&q=stuff&aq=f&aqi=g10&aql=&oq=&gs_rfai=CrrIS3".
Is that possible? In practice I would like to run this over 1000s of
strings in 'content' and 100s of strings in 'search'. Could I run into
performance issues with this approach and, if so, are there better
ways?

Best,
Ralf
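
A sketch of substring detection with grepl(), assuming that "contains" means a literal (non-regex) match; fixed = TRUE keeps characters such as '?' and '.' from being read as regular-expression syntax. The %in% operator tests whole-string equality, which is why the subset above comes back empty. The shortened URLs and the names urls, signatures and hit are made up for this example.

```r
urls <- c("http://www.google.com/search?source=ig&hl=en&q=stuff",
          "http://search.yahoo.com/search?p=stuff&toggle=1")
signatures <- c("http://www.google.com/search")

# One logical vector per signature, combined with OR:
# TRUE where a URL contains at least one of the signatures
hit <- Reduce(`|`, lapply(signatures,
                          function(s) grepl(s, urls, fixed = TRUE)))

urls[hit]
```

This runs one vectorised pass over the URLs per signature, so a few hundred signatures against a few thousand URLs means a few hundred grepl() calls.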



[R] StartsWith over vector of Strings?

2010-07-13 Thread Ralf B
Given vectors of strings of arbitrary length

content <- c("abc", "def")
searchset <- c("a", "abc", "abcdef", "d", "def", "defghi")

Is it possible to determine the content String set that matches the
searchset in the sense of 'startswith' ? This would be a vector of all
strings in content that start with the string of any of the strings in
the searchset. In the little example here, this would be:

result <- c("abc", "abc", "def", "def")

Best,
Ralf
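
A sketch of the 'startswith' matching using outer() with startsWith(), which is in base R from version 3.3.0 on (an assumption about the R version in use; on older versions, substr(x, 1, nchar(prefix)) == prefix does the same job). The names m and hits are made up for this example.

```r
content <- c("abc", "def")
searchset <- c("a", "abc", "abcdef", "d", "def", "defghi")

# m[i, j] is TRUE when content[i] starts with searchset[j]
m <- outer(content, searchset, startsWith)

# One row per (content, prefix) match, ordered by prefix
hits <- which(m, arr.ind = TRUE)
result <- content[hits[, "row"]]
result
```

Each content string appears once per matching prefix, which reproduces the duplicated entries in the desired result.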



Re: [R] Fast string comparison

2010-07-13 Thread Ralf B
I see. I did not get this performance because I did not compare the
vectors directly but ran seemingly expensive for-loops to do it
iteratively... :(

R.

On Tue, Jul 13, 2010 at 1:42 AM, Hadley Wickham  wrote:
> strings <- replicate(1e5, paste(sample(letters, 100, rep = T), collapse = ""))
> system.time(strings[-1] == strings[-1e5])
> #   user  system elapsed
> #  0.016   0.000   0.017
>
> So it takes ~1/100 of a second to do ~100,000 string comparisons. You
> need to provide a reproducible example that illustrates why you think
> string comparisons are slow.
>
> Hadley
>
>
> On Tue, Jul 13, 2010 at 6:52 AM, Ralf B  wrote:
>> I am asking this question because String comparison in R seems to be
>> awfully slow (based on profiling results) and I wonder if perhaps '=='
>> alone is not the best one can do. I did not ask for anything
>> particular and I don't think I need to provide a self-contained source
>> example for the question. So, to re-phrase my question, are there more
>> (runtime) effective ways to find out if two strings (about 100-150
>> characters long) are equal?
>>
>> Ralf
>>
>>
>>
>>
>>
>>
>> On Sun, Jul 11, 2010 at 2:37 PM, Sharpie  wrote:
>>>
>>>
>>> Ralf B wrote:
>>>>
>>>> What is the fastest way to compare two strings in R?
>>>>
>>>> Ralf
>>>>
>>>
>>> Which way is not fast enough?
>>>
>>> In other words, are you asking this question because profiling showed one of
>>> R's string comparison operations is causing a massive bottleneck in your
>>> code? If so, which one and how are you using it?
>>>
>>> -Charlie
>>>
>>> -
>>> Charlie Sharpsteen
>>> Undergraduate-- Environmental Resources Engineering
>>> Humboldt State University
>>> --
>>> View this message in context: 
>>> http://r.789695.n4.nabble.com/Fast-string-comparison-tp2285156p2285409.html
>>> Sent from the R help mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>
>
>
> --
> Assistant Professor / Dobelman Family Junior Chair
> Department of Statistics / Rice University
> http://had.co.nz/
>



Re: [R] Fast string comparison

2010-07-12 Thread Ralf B
I am asking this question because String comparison in R seems to be
awfully slow (based on profiling results) and I wonder if perhaps '=='
alone is not the best one can do. I did not ask for anything
particular and I don't think I need to provide a self-contained source
example for the question. So, to re-phrase my question, are there more
(runtime) effective ways to find out if two strings (about 100-150
characters long) are equal?

Ralf






On Sun, Jul 11, 2010 at 2:37 PM, Sharpie  wrote:
>
>
> Ralf B wrote:
>>
>> What is the fastest way to compare two strings in R?
>>
>> Ralf
>>
>
> Which way is not fast enough?
>
> In other words, are you asking this question because profiling showed one of
> R's string comparison operations is causing a massive bottleneck in your
> code? If so, which one and how are you using it?
>
> -Charlie
>
> -
> Charlie Sharpsteen
> Undergraduate-- Environmental Resources Engineering
> Humboldt State University
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Fast-string-comparison-tp2285156p2285409.html
> Sent from the R help mailing list archive at Nabble.com.
>
>



[R] Fast string comparison

2010-07-11 Thread Ralf B
What is the fastest way to compare two strings in R?

Ralf



[R] Mirror axis on hist2d plot - how?

2010-07-09 Thread Ralf B
The following code produces a heatmap based on normalized data. I
would like to mirror x and y axis for this plot. Any idea how to do
that?


require("gplots")
x <- rnorm(500)
y <- rnorm(500)
hist2d(x, y, freq=TRUE, nbins=50, col = c("white",heat.colors(256)))

Best,
Ralf



Re: [R] Current script name from R

2010-07-09 Thread Ralf B
I am using RGUI, the command line, or the StatET Eclipse environment.
Should this not work the same way in all of them?

Ralf

On Fri, Jul 9, 2010 at 7:11 AM, Allan Engelhardt  wrote:
> I'm assuming you are using Rscript (please provide self-contained examples
> when posting) in which case you could look for the element in
> (base|R.utils)::commandArgs() that begin with the string "--file=" - the
> rest is the file name.  See the asValues= parameter in help("commandArgs",
> package="R.utils") for a nice way to get the parameter.
>
> For an invocation of the form R < foo.R you'd need to inspect your system's
> process table (so don't do that).
>
> Hope this helps.
>
> Allan
>
> On 09/07/10 10:48, Ralf B wrote:
>>
>> Is there a way for a script to find out about its own name ?
>>
>> Ralf
>>
>>
>



[R] KLdiv produces NA. Why?

2010-07-09 Thread Ralf B
I am trying to calculate a Kullback-Leibler divergence from two
vectors of integers, but I get NA as a result when trying to calculate
the measure. Why?

x <- cbind(stuff$X, morestuff$X)

x[1:5,]

     [,1] [,2]
[1,]  293  938
[2,]  293  942
[3,]  297  949
[4,]  290  956
[5,]  294  959

KLdiv(x)


     [,1] [,2]
[1,]    0   NA
[2,]   NA    0


Best,
Ralf



[R] Plotting text in existing plot?

2010-07-09 Thread Ralf B
I would like to plot some text in an existing plot. Is there a
very simple way to do that? It does not need to be pretty at all (just
maybe a way to center it or define a position within the plot).

Ralf
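
A minimal sketch using the base graphics functions text() and mtext(), which draw onto the currently open plot. The labels and coordinates are arbitrary; centering is done here by taking the midpoint of the plot region reported by par("usr").

```r
plot(1:10, main = "Example plot")

# par("usr") gives the plot region as c(x_min, x_max, y_min, y_max)
usr <- par("usr")
cx <- mean(usr[1:2])
cy <- mean(usr[3:4])

text(cx, cy, "centered label")             # middle of the plot region
text(2, 9, "left-aligned at (2, 9)", adj = 0)
mtext("note in the top margin", side = 3, line = 0.5)
```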



[R] Current script name from R

2010-07-09 Thread Ralf B
Is there a way for a script to find out about its own name ?

Ralf
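
A sketch of the --file= approach described in the reply in this thread, wrapped in a small helper. The function name script_name is made up; it only finds a name when R was started through Rscript or R --file=..., and returns NA in an interactive session such as RGUI.

```r
# Extract the script name from the command-line arguments, if present
script_name <- function(args = commandArgs(trailingOnly = FALSE)) {
  file_arg <- grep("^--file=", args, value = TRUE)
  if (length(file_arg) == 0) {
    return(NA_character_)      # not started from a script file
  }
  basename(sub("^--file=", "", file_arg[1]))
}

# Example with a hand-written argument vector
script_name(c("R", "--no-echo", "--file=/tmp/analysis.R"))  # "analysis.R"
script_name(c("R", "--interactive"))                        # NA
```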



[R] R^2 in loess and predict?

2010-07-09 Thread Ralf B
Parametric regression produces R^2 as a measure of how well the model
predicts the sample and adjusted R^2 as a measure of how well it
models the population. What is the equivalent for non-parametric
regression (e.g. the loess function)?

Ralf
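
One commonly used descriptive analogue is a pseudo R^2, computed as 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below applies it to a loess fit on the built-in cars data set; it is only a summary of in-sample fit, does not carry the same inferential meaning as R^2 in a linear model, and is not something loess() reports itself.

```r
fit <- loess(dist ~ speed, data = cars)

ss_res <- sum(residuals(fit)^2)                   # unexplained variation
ss_tot <- sum((cars$dist - mean(cars$dist))^2)    # total variation
pseudo_r2 <- 1 - ss_res / ss_tot
pseudo_r2
```

Since a more flexible smoother (smaller span) will always push this value up, comparing fits on held-out data is a more honest check of predictive quality.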



[R] Non-parametric regression

2010-07-09 Thread Ralf B
I have two data sets, each a vector of 1000 numbers, each vector
representing a distribution (i.e. 1000 numbers, each of which
represents a frequency at one point on a scale between 1 and 1000).
For simplification, here is a short version with only 5 points.


a <- c(8,10,8,12,4)
b <- c(7,11,8,10,5)

Leaving the obvious discussion about causality aside for a moment, I
would like to see how well I can predict b from a using a regression.
Since I do not know anything about the distribution type and have
already discovered non-normality, I cannot use parametric regression or
anything GLM for that matter.

How should I proceed in using non-parametric regression to model
vector a and see how well it predicts b? Perhaps you could extend the
given lines into a short example script to give me an idea? Are there
any other options?

Best,
Ralf



[R] Fast String operations in R ? Cost of String operations

2010-07-04 Thread Ralf B
Hi experts,

I am currently developing some code that checks a large number of
strings for the existence of sub-strings and patterns (detecting
sub-strings within URLs). I wonder if there is information about how
well particular string operations perform in R, together with
comparisons. Are there recommendations (based on such information)
regarding what operations should be used and what should be avoided?
Are there libraries and functions that provide optimized string
operations for such needs, or is R simply not the right choice for that?

Best,
Ralf



[R] Profiler for R ?

2010-07-04 Thread Ralf B
Hi,

is there such a thing as a profiler for R that informs about a) how
much processing time is used by particular functions and commands and
b) how much memory is used for creating how many objects (or types of
data structures)? In a way I am looking for something similar to the
java profiler (which is started by command line and provides profiling
information collected from the run of a particular program). Is there
such a tool through the R command line or RGUI? Are there profilers
available for the Eclipse StatET or through another package or
extension?

Thanks,
Ralf
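
A short sketch of the tools that ship with base R: Rprof() with summaryRprof() for sampling where time is spent, system.time() for timing a single expression, and object.size() for the memory taken by one object. The workload (sorting random numbers) is arbitrary and only there to give the sampler something to record.

```r
prof_file <- tempfile(fileext = ".out")

Rprof(prof_file)               # start the sampling profiler
for (i in 1:300) {
  v <- sort(runif(1e5))        # arbitrary work to profile
}
Rprof(NULL)                    # stop profiling

prof <- summaryRprof(prof_file)
head(prof$by.self)             # time spent in each function itself

timing <- system.time(sort(runif(1e6)))   # timing of one expression
size <- object.size(runif(1e6))           # bytes used by one object
print(timing)
print(size, units = "MB")
```

Rprof() also accepts memory.profiling = TRUE to add memory columns to its samples.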



[R] Good Package(s) for String and URL processing?

2010-07-01 Thread Ralf B
Are there packages that allow improved string and URL processing?
E.g. extract parts of URLs such as sub-domains, top-level domain,
protocols (e.g. https, http, ftp), file type based on endings, check
if a URL is valid or not, etc...

I am currently only using split and paste. Are there better and more
efficient ways to handle strings e.g. finding sub-strings or to do
pattern matching?
What packages do you use if you have to do a lot of String processing
and you don't have the option to go to another language such as Perl
or Python?

Thanks,
Ralf
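
Without any extra package, base R regular expressions already cover a fair amount of this. The sketch below pulls the protocol, host and top-level domain out of a URL with sub() and tests for a pattern with grepl(). The patterns are simplified assumptions for illustration and are not a complete URL parser or validator.

```r
url <- "https://stat.ethz.ch/mailman/listinfo/r-help"

# Scheme: everything before "://"
protocol <- sub("^([a-zA-Z][a-zA-Z0-9+.-]*)://.*$", "\\1", url)

# Host: characters after "://" up to the first "/", ":", "?" or "#"
host <- sub("^[a-zA-Z][a-zA-Z0-9+.-]*://([^/:?#]+).*$", "\\1", url)

# Top-level domain: the part of the host after the last dot
tld <- sub("^.*\\.", "", host)

# Very rough plausibility check: known scheme followed by a host
looks_valid <- grepl("^(https?|ftp)://[^/:?#]+", url)

c(protocol = protocol, host = host, tld = tld)
```

All of these functions are vectorised, so url can be a whole character vector of URLs.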



[R] Assigning variable value as name to cbind column

2010-06-24 Thread Ralf B
Hi all,

I have this (non-working) script:

dataTest <- data.frame(col1=c(1,2,3))
new.data <- c(1,2)
name <- "test"
n.row <- dim(dataTest)[1]
length(new.data) <- n.row
names(new.data) <- name
cbind(dataTest, name=new.data)
print(dataTest)

and would like to bind the new column 'new.data' to 'dataTest' by
using the value of the variable 'name' as the column name.

The end result should look like this:

  col1 test
1  1  1
2  2  2
3  3  NA


The best I got was that 'name' became the column name but never the
actual value of 'name'. How can I do that?

(This is actually a function that runs many times -- this means a
manual workaround is not feasible).

Ralf
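
A sketch of one way to do this: assign the new column with [[ ]], which evaluates name and uses its value as the column name. In cbind(dataTest, name = new.data), the word on the left of '=' is taken literally as an argument name, and the result is not assigned back to dataTest either.

```r
dataTest <- data.frame(col1 = c(1, 2, 3))
new.data <- c(1, 2)
name <- "test"

# Pad the shorter vector with NA up to the number of rows
length(new.data) <- nrow(dataTest)

# [[name]] uses the value stored in 'name' ("test") as the column name
dataTest[[name]] <- new.data
print(dataTest)
```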



Re: [R] Simple qqplot question

2010-06-24 Thread Ralf B
Short recap: I have two distributions, data and data2, each built from
about 3 million data points; they appear similar when looking at
densities and histograms. I plotted qqplots for further eye-balling:

qqplot(data, data2, xlab = "1", ylab = "2")

and get an almost perfect diagonal line which means they are in fact
very alike. Now I tried to check normality using qqnorm -- and I think
I am doing something wrong here:

qqnorm(data, main = "Q-Q normality plot for 1")
qqnorm(data2, main = "Q-Q normality plot for 2")

I am getting perfect S-shaped curves (??) for both distributions. Am I
missing something here?

|
|   *  *   *  *
|   *
|*
|*
|   *
|*
| *
| * * *
|-

Thanks, Ralf
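
For the reference line asked about further down in this thread, a minimal sketch: when both samples are on the same scale, the "perfect match" in a two-sample qqplot is the identity line y = x, which abline(0, 1) draws. The exponential samples are made up so that the example runs; they stand in for data and data2.

```r
set.seed(1)
datax <- rexp(1000)
datay <- rexp(1000)

# qqplot() returns the sorted quantile pairs invisibly
qq <- qqplot(datax, datay,
             xlab = "datax quantiles", ylab = "datay quantiles")

# Identity line: points on it mean equal quantiles in both samples
abline(0, 1, col = "red")
```

Unlike qqline(), which is tied to a theoretical normal reference, this line makes no assumption about the shape of either distribution.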

On Thu, Jun 24, 2010 at 8:23 PM, Ralf B  wrote:
> Unfortunately not. I want a qqplot from two variables.
>
> Ralf
>
>
> On Thu, Jun 24, 2010 at 7:23 PM, Joris Meys  wrote:
>> Also take a look at qq.plot in the package "car". Gives you exactly
>> what you want.
>> Cheers
>> Joris
>>
>> On Fri, Jun 25, 2010 at 12:55 AM, Ralf B  wrote:
>>> More details...
>>>
>>> I have two distributions which are very similar. I have plotted
>>> density plots already from the two distributions. In addition,
>>> I created a qqplot that show an almost straight line. What I want is a
>>> line that represents the ideal case in which the two
>>> distributions match perfectly. I would use this line to see how much
>>> the errors divert at different stages of the plot.
>>>
>>> Ralf
>>>
>>>
>>>
>>> On Thu, Jun 24, 2010 at 5:56 PM, stephen sefick  wrote:
>>>> You are going to have to define the question a little better.  Also,
>>>> please provide a reproducible example.
>>>>
>>>> On Thu, Jun 24, 2010 at 4:44 PM, Ralf B  wrote:
>>>>> I am a beginner in R, so please don't step on me if this is too
>>>>> simple. I have two data sets datax and datay for which I created a
>>>>> qqplot
>>>>>
>>>>> qqplot(datax,datay)
>>>>>
>>>>> but now I want a line that indicates the perfect match so that I can
>>>>> see how much the plot diverts from the ideal. This ideal however is
>>>>> not normal, so I think qqnorm and qqline cannot be applied.
>>>>>
>>>>> Perhaps you can help?
>>>>>
>>>>> Ralf
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Stephen Sefick
>>>> 
>>>> | Auburn University                                   |
>>>> | Department of Biological Sciences           |
>>>> | 331 Funchess Hall                                  |
>>>> | Auburn, Alabama                                   |
>>>> | 36849                                                    |
>>>> |___|
>>>> | sas0...@auburn.edu                             |
>>>> | http://www.auburn.edu/~sas0025             |
>>>> |___|
>>>>
>>>> Let's not spend our time and resources thinking about things that are
>>>> so little or so large that all they really do for us is puff us up and
>>>> make us feel like gods.  We are mammals, and have not exhausted the
>>>> annoying little problems of being mammals.
>>>>
>>>>                                                                -K. Mullis
>>>>
>>>
>>>
>>
>>
>>
>> --
>> Joris Meys
>> Statistical consultant
>>
>> Ghent University
>> Faculty of Bioscience Engineering
>> Department of Applied mathematics, biometrics and process control
>>
>> tel : +32 9 264 59 87
>> joris.m...@ugent.be
>> ---
>> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>>
>



Re: [R] what "density" is plotting ?

2010-06-24 Thread Ralf B
density() computes a kernel density estimate from your data (Gaussian
kernel by default). It makes no assumption about an underlying
parametric distribution.

Ralf
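
A small sketch of what that means in practice, using a made-up
non-normal sample (the data and variable names are assumptions for
illustration). The estimate is a smooth curve built from the data, and
the area under it is approximately 1, as for any probability density:

```r
set.seed(1)
x <- rexp(100)                    # deliberately non-normal sample (placeholder)

d <- density(x)                   # kernel density estimate, Gaussian kernel by default
area <- sum(d$y) * diff(d$x)[1]   # Riemann sum over the evenly spaced grid, close to 1
print(area)
plot(d)                           # the curve that plot(density(x)) shows
```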

On Thu, Jun 24, 2010 at 10:48 PM, Carrie Li  wrote:
> Hello, Ralf,
>
> Sorry, I was not being clear.
> I mean probability density function
> like normal f(x)=(1/2*pi*sd )*exp()  something like that .
> Sorry about the confusion
>
> Carrie
>
> On Thu, Jun 24, 2010 at 10:43 PM, Ralf B  wrote:
>>
>> Hi Carrie,
>>
>> the output is defined by you; density() only creates the function
>> which you need to plot using the plot() function. When you call
>> plot(density(x)) you get the output on the screen. You need to use
>> pdf() if you want to create a pdf file, png() for creating a png file
>> or postscript if you like ps; there are many others.
>>
>> Ralf
>>
>> On Thu, Jun 24, 2010 at 10:35 PM, Carrie Li 
>> wrote:
>> > Hi everyone,
>> >
>> > I am confused regarding the function "density".
>> > suppose that there is a sample x of 100 data points, and
>> > plot(density(x))
>> > gives it's pdf ?
>> > or it's more like histogram only ?
>> >
>> > thanks for any answering
>> >
>> > Carrie
>> >
>> >        [[alternative HTML version deleted]]
>> >
>> >
>
>



Re: [R] what "density" is plotting ?

2010-06-24 Thread Ralf B
Hi Carrie,

the output is defined by you; density() only computes the estimate,
which you then draw with the plot() function. When you call
plot(density(x)) you get the output on the screen. You need to open
pdf() first if you want to create a pdf file, png() for a png file,
or postscript() if you like ps; there are many other devices.

Ralf
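
A minimal sketch of writing the plot to a file, assuming placeholder
data and a temporary output path (both hypothetical):

```r
x <- rnorm(100)                              # placeholder data
out <- file.path(tempdir(), "density.pdf")   # hypothetical output path

pdf(out)            # open a PDF file device
plot(density(x))    # drawing now goes to the file, not the screen
dev.off()           # close the device so the file is written
```

Replacing pdf(out) with png(out) (and a .png file name) gives a PNG
instead; the plotting call itself stays the same.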

On Thu, Jun 24, 2010 at 10:35 PM, Carrie Li  wrote:
> Hi everyone,
>
> I am confused regarding the function "density".
> suppose that there is a sample x of 100 data points, and plot(density(x))
> gives it's pdf ?
> or it's more like histogram only ?
>
> thanks for any answering
>
> Carrie
>
>        [[alternative HTML version deleted]]
>
>



[R] Handouts / Reports or just simply printing text to PDF?

2010-06-24 Thread Ralf B
I assume R won't easily generate nice reports (unless one starts using
Sweave and LaTeX) but perhaps somebody here knows a package that can
create report-like output for special cases? How can I simply plot
output into PDF? Perhaps you know a package I should check out? What
do you guys do to create handouts (before actually publishing)?

Thanks in advance,
Ralf
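
One base-R possibility, sketched under the assumption that a plain
page of monospaced text plus an ordinary plot is enough for a handout
(the file path is hypothetical; 'cars' is a data set shipped with R):

```r
out <- file.path(tempdir(), "handout.pdf")   # hypothetical output path
txt <- capture.output(summary(cars))         # printed output captured as text lines

pdf(out, width = 8, height = 6)
plot.new()                                   # page 1: empty page to draw on
text(0, 1, paste(txt, collapse = "\n"),
     adj = c(0, 1), family = "mono", cex = 0.8)
plot(cars, main = "cars")                    # page 2: an ordinary plot
dev.off()
```

This only places text on a graphics page; for anything with real
layout, Sweave and LaTeX remain the more capable route.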



Re: [R] Simple qqplot question

2010-06-24 Thread Ralf B
Unfortunately not. I want a qqplot from two variables.

Ralf


On Thu, Jun 24, 2010 at 7:23 PM, Joris Meys  wrote:
> Also take a look at qq.plot in the package "car". Gives you exactly
> what you want.
> Cheers
> Joris
>
> On Fri, Jun 25, 2010 at 12:55 AM, Ralf B  wrote:
>> More details...
>>
>> I have two distributions which are very similar. I have plotted
>> density plots already from the two distributions. In addition,
>> I created a qqplot that show an almost straight line. What I want is a
>> line that represents the ideal case in which the two
>> distributions match perfectly. I would use this line to see how much
>> the errors divert at different stages of the plot.
>>
>> Ralf
>>
>>
>>
>> On Thu, Jun 24, 2010 at 5:56 PM, stephen sefick  wrote:
>>> You are going to have to define the question a little better.  Also,
>>> please provide a reproducible example.
>>>
>>> On Thu, Jun 24, 2010 at 4:44 PM, Ralf B  wrote:
>>>> I am a beginner in R, so please don't step on me if this is too
>>>> simple. I have two data sets datax and datay for which I created a
>>>> qqplot
>>>>
>>>> qqplot(datax,datay)
>>>>
>>>> but now I want a line that indicates the perfect match so that I can
>>>> see how much the plot diverts from the ideal. This ideal however is
>>>> not normal, so I think qqnorm and qqline cannot be applied.
>>>>
>>>> Perhaps you can help?
>>>>
>>>> Ralf
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Stephen Sefick
>>> 
>>> | Auburn University                                   |
>>> | Department of Biological Sciences           |
>>> | 331 Funchess Hall                                  |
>>> | Auburn, Alabama                                   |
>>> | 36849                                                    |
>>> |___|
>>> | sas0...@auburn.edu                             |
>>> | http://www.auburn.edu/~sas0025             |
>>> |___|
>>>
>>> Let's not spend our time and resources thinking about things that are
>>> so little or so large that all they really do for us is puff us up and
>>> make us feel like gods.  We are mammals, and have not exhausted the
>>> annoying little problems of being mammals.
>>>
>>>                                                                -K. Mullis
>>>
>>
>>
>
>
>
> --
> Joris Meys
> Statistical consultant
>
> Ghent University
> Faculty of Bioscience Engineering
> Department of Applied mathematics, biometrics and process control
>
> tel : +32 9 264 59 87
> joris.m...@ugent.be
> ---
> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>



Re: [R] Simple qqplot question

2010-06-24 Thread Ralf B
More details...

I have two distributions which are very similar. I have plotted
density plots already from the two distributions. In addition,
I created a qqplot that shows an almost straight line. What I want is a
line that represents the ideal case in which the two
distributions match perfectly. I would use this line to see how much
the plot deviates from it at different stages.

Ralf



On Thu, Jun 24, 2010 at 5:56 PM, stephen sefick  wrote:
> You are going to have to define the question a little better.  Also,
> please provide a reproducible example.
>
> On Thu, Jun 24, 2010 at 4:44 PM, Ralf B  wrote:
>> I am a beginner in R, so please don't step on me if this is too
>> simple. I have two data sets datax and datay for which I created a
>> qqplot
>>
>> qqplot(datax,datay)
>>
>> but now I want a line that indicates the perfect match so that I can
>> see how much the plot diverts from the ideal. This ideal however is
>> not normal, so I think qqnorm and qqline cannot be applied.
>>
>> Perhaps you can help?
>>
>> Ralf
>>
>>
>
>
>
> --
> Stephen Sefick
> 
> | Auburn University                                   |
> | Department of Biological Sciences           |
> | 331 Funchess Hall                                  |
> | Auburn, Alabama                                   |
> | 36849                                                    |
> |___|
> | sas0...@auburn.edu                             |
> | http://www.auburn.edu/~sas0025             |
> |___|
>
> Let's not spend our time and resources thinking about things that are
> so little or so large that all they really do for us is puff us up and
> make us feel like gods.  We are mammals, and have not exhausted the
> annoying little problems of being mammals.
>
>                                                                -K. Mullis
>



[R] Simple qqplot question

2010-06-24 Thread Ralf B
I am a beginner in R, so please don't step on me if this is too
simple. I have two data sets datax and datay for which I created a
qqplot

qqplot(datax,datay)

but now I want a line that indicates the perfect match so that I can
see how much the plot deviates from the ideal. This ideal however is
not normal, so I think qqnorm and qqline cannot be applied.

Perhaps you can help?
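
A minimal sketch, assuming both samples are measured on the same scale
(the data below are placeholders, not the real datax and datay). If the
two distributions were identical, the points would fall on the line
y = x, so abline(0, 1) draws the ideal case:

```r
set.seed(42)
datax <- rexp(1000)          # placeholder samples
datay <- rexp(1000) + 0.1

qq <- qqplot(datax, datay)   # also returns the paired quantiles invisibly
abline(0, 1, col = "red")    # y = x: where points fall if the distributions match

# Vertical distance of each point from the reference line
deviation <- qq$y - qq$x
summary(deviation)
```

No normality is assumed here; the line only encodes "same quantiles in
both samples".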

Ralf



[R] Install package automatically if not there?

2010-06-24 Thread Ralf B
Hi fans,

is it possible for a script to check whether a package has been installed?
I want to install it automatically if it is missing, so that scripts
do not crash when running on a new machine...

Ralf
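
A minimal sketch of such a check. The helper name ensure_package and
its 'installer' argument are made up for illustration; only
requireNamespace() and install.packages() are real R functions:

```r
# Hypothetical helper: install 'pkg' only when it cannot be found.
# 'installer' is a parameter so the behaviour can be tried without network access.
ensure_package <- function(pkg, installer = utils::install.packages) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    installer(pkg)
  }
  # TRUE if the package is available after the attempt
  invisible(requireNamespace(pkg, quietly = TRUE))
}

ensure_package("stats")   # already present, so nothing is installed
library(stats)            # attach as usual afterwards
```

On a fresh machine install.packages() may still need a CRAN mirror to
be set (for example via its 'repos' argument) when run non-interactively.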



Re: [R] read.csv does not find my file (windows xp)

2010-06-24 Thread Ralf B
Yep! I forgot to use sep="" for paste and introduced a space in front
of the filename... damn, 1 hour of my life!

Ralf
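
For the record, a short sketch of the pitfall, using the path from the
original post in '/' notation (the split into folder and file name is
an assumption about how the path was assembled):

```r
folder <- "C:/myfolder/mysubfolder/"
fname  <- "mydata.csv"

bad  <- paste(folder, fname)             # default sep = " " adds a space before the file name
good <- paste(folder, fname, sep = "")   # no separator
alt  <- file.path("C:/myfolder/mysubfolder", fname)   # joins the parts with "/"

print(c(bad, good, alt))

# Checking before reading makes a stray space easy to spot
if (!file.exists(good)) message("not found: ", shQuote(good))
```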

2010/6/24 Uwe Ligges :
>
>
> On 24.06.2010 19:02, Ralf B wrote:
>>
>> I try to load a file
>>
>> myData<- read.csv(file="C:\\myfolder\\mysubfolder\\mydata.csv",
>> head=TRUE, sep=";")
>>
>> and get this error:
>>
>> Error in file(file, "rt") : cannot open the connection
>> In addition: Warning message:
>> In file(file, "rt") :
>>   cannot open file 'C:\myfolder\mysubfolder\mydata.csv: No such file
>> or directory
>>
>> am I overlooking something?
>>
>> I am getting the same error when I write the path in '/' notation...
>> Does R not tolorate drive letters?
>
>
> It does, and if you can open the file with other software, then you
> probably misspelled folder or filename.
>
> Uwe Ligges
>
>
>
>> Ralf
>>
>



[R] read.csv does not find my file (windows xp)

2010-06-24 Thread Ralf B
I try to load a file

myData <- read.csv(file="C:\\myfolder\\mysubfolder\\mydata.csv",
head=TRUE, sep=";")

and get this error:

Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file 'C:\myfolder\mysubfolder\mydata.csv: No such file
or directory

am I overlooking something?

I am getting the same error when I write the path in '/' notation...
Does R not tolerate drive letters?

Ralf



Re: [R] RJDBC vs RMySQL vs ???

2010-06-24 Thread Ralf B
Unfortunately, I have a lot of errors with RMySQL -- but that is
another thread...

Ralf

On Thu, Jun 24, 2010 at 10:31 AM, James W. MacDonald
 wrote:
> Hi Ralf,
>
> Ralf B wrote:
>>
>> Sorry for the lack of details. Since I run the same SQL first directly
>> on MySQL (using the MySQL Query Browser) and then again using R
>> through the RJDBC interface, I assume that I won't simply have a badly
>> constructed SQL query. However, just to clear possible objection, here
>> the SQL:
>>
>>
>> # Extracts vector of data points
>> getData <- function(connection) {
>>        queryStart <- "SELECT id1, id2, x, y FROM `mytable` "
>>        queryEnd <- ";"
>>        query <- paste(queryStart, " WHERE id1 IN(", id1s, ") AND id2 IN(",
>> id2s, ") AND subtype='TYPE1'", queryEnd)
>>        # execute query
>>        data =  dbGetQuery(connection, query)
>>        return(data)
>> }
>>
>> When running this method using either RGUI or the command line, I have
>> a runtime that reaches an incredible 10 minutes (!) for selecting
>> about 50k - 80k data points (which I consider not much) based on the
>> range of IDs I choose. The table size is about 5-8 million data points
>> total. The same SQL query directly executed in MySQL Query Browser
>> takes about 20 seconds which I would consider fine. There are no
>> indices created for any of the fields but since the query runs a lot
>> faster in the query browser I don't suspect this to be the main
>> reason.
>>
>> Any ideas?
>
> Well, the RJDBC rforge page has this note:
>
> Note: The current implementation of RJDBC is done entirely in R, no Java
> code is used. This means that it may not be extremely efficient and could be
> potentially sped up by using Java native code. However, it was sufficient
> for most tasks we tested. If you have performance issues with RJDBC, please
> let us know and tell us more details about your test case.
>
> And from my quick peek at the page, it appears RJDBC is designed to allow
> one to query any DBMS. Since RMySQL is MySQL-specific, it may be more
> efficient. Anyway, why don't you just try it and see?
>
> Best,
>
> Jim
>
>
>>
>> Best,
>> Ralf
>>
>>
>>
>>
>> On Wed, Jun 23, 2010 at 4:36 PM, James W. MacDonald
>>  wrote:
>>>
>>> Hi Ralf,
>>>
>>> Ralf B wrote:
>>>>
>>>> I am running a simple SQL SELECT statement that involvs 50k + data
>>>> points using R and the RJDBC interface. I am facing very slow response
>>>> times in both the RGUI and the R console. When running this SQL
>>>> statement directly in a SQL client I have processing times that are a
>>>> lot lot faster (which means that the SQL statement itself is not the
>>>> problem).
>>>>
>>>> Did any of you compare RJDBC vs RMySQL or is there a better, more
>>>> efficient way to extract large data from databases using R? Would you
>>>> recommend dumping data out completely into flat files and working with
>>>> flat files instead? I expected that this would not be such a problem
>>>> given that businesses maintain their data in DBs and R is supposed to
>>>> be good in shifting around data. Am I doing something wrong?
>>>
>>> Well, if you don't show people what you have done, how can anybody tell
>>> if
>>> you are doing something wrong or not?
>>>
>>> I have no experience with RJDBC, so cannot say anything about that.
>>> However,
>>> I have always found RMySQL to be speedy enough. As an example:
>>>
>>>> library(RMySQL)
>>>
>>> Loading required package: DBI
>>>>
>>>> con <- dbConnect("MySQL", host="genome-mysql.cse.ucsc.edu", user =
>>>> "genome", dbname = "hg18")
>>>> system.time(a <- dbGetQuery(con, "select name, chromEnd from snp129
>>>> where
>>>> chrom='chr1' and chromStart between 1 and 1e8;")
>>>
>>> + )
>>>  user  system elapsed
>>>  7.95    0.06   38.59
>>>>
>>>> dim(a)
>>>
>>> [1] 508676      2
>>>
>>> So 40 seconds to get half a million records. Since this is via the
>>> internet,
>>> I have to imagine things would be much faster querying a local DB.
>>>
>>> But then you never say what constitutes 'slow' for you, s

Re: [R] Comparing distributions

2010-06-23 Thread Ralf B
The diagram only serves as a rough example to give you an idea.

To be more precise I would like to give more detail: The data
represents movements from two types of pointing device (e.g. mouse,
pointer) along an axis. The data has different parameters -- such as
different pointing devices, different axes, split by different
experiment conditions etc. but the problem is always the same: I would
like to find out if their distributions correlate and would like to have
some kind of 'objective' (Yes, I know -- nothing is objective -- but
eye-balling isn't either.) measure, test, etc. These would be
accompanied by Q-Q plots and density plots to get a general feeling of
what is going on and become part of the discussion. I don't expect a
solution from here, but perhaps a general direction where I could find
my kind of problem being understood.

Ralf



On Wed, Jun 23, 2010 at 10:07 PM, Robert A LaBudde  wrote:
> Your "*" curve apparently dominates your "+" curve.
>
> If they have the same total number of data each, as you say, they both
> cannot sum to the same value (e.g., N = 1 or 1.000).
>
> So there is something going on that you aren't mentioning.
>
> Try comparing CDFs instead of pdfs.
>
> At 03:33 PM 6/23/2010, Ralf B wrote:
>>
>> I am trying to do something in R and would appreciate a push into the
>> right direction. I hope some of you experts can help.
>>
>> I have two distributions obtrained from 1 datapoints each (about
>> 1 datapoints each, non-normal with multi-model shape (when
>> eye-balling densities) but other then that I know little about its
>> distribution). When plotting the two distributions together I can see
>> that the two densities are alike with a certain distance to each other
>> (e.g. 50 units on the X axis). I tried to plot a simplified picture of
>> the density plot below:
>>
>>
>>
>>
>> |
>> |                                                         *
>> |                                                      *     *
>> |                                                   *    +   *
>> |                                              *     +     +  *
>> |                     *        +           *   +            +  *
>> |                 *        +*     +   *  +                   + *
>> |              *       +       *     +                           +*
>> |           *       +                                               +*
>> |        *       +                                                    +*
>> |     *      +                                                          +
>> *
>> |  *      +
>> + *
>> |___
>>
>>
>> What I would like to do is to formally test their similarity or
>> otherwise measure it more reliably than just showing and discussing a
>> plot. Is there a general approach other then using a Mann-Whitney test
>> which is very strict and seems to assume a perfect match. Is there a
>> test that takes in a certain 'band' (e.g. 50,100, 150 units on X) or
>> are there any other similarity measures that could give me a statistic
>> about how close these two distributions are to each other ? All I can
>> say from eye-balling is that they seem to follow each other and it
>> appears that one distribution is shifted by a amount from the other.
>> Any ideas?
>>
>> Ralf
>>
>
> 
> Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: r...@lcfltd.com
> Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
> 824 Timberlake Drive                     Tel: 757-467-0954
> Virginia Beach, VA 23464-3239            Fax: 757-467-2947
>
> "Vere scire est per causas scire"
> 
>
>



Re: [R] RJDBC vs RMySQL vs ???

2010-06-23 Thread Ralf B
Sorry for the lack of details. Since I run the same SQL first directly
on MySQL (using the MySQL Query Browser) and then again using R
through the RJDBC interface, I assume that I won't simply have a badly
constructed SQL query. However, just to clear up possible objections, here
is the SQL:


# Extracts vector of data points
getData <- function(connection) {
        queryStart <- "SELECT id1, id2, x, y FROM `mytable` "
        queryEnd <- ";"
        query <- paste(queryStart, " WHERE id1 IN(", id1s, ") AND id2 IN(",
                       id2s, ") AND subtype='TYPE1'", queryEnd)
        # execute query
        data <- dbGetQuery(connection, query)
        return(data)
}

When running this method using either RGUI or the command line, I have
a runtime that reaches an incredible 10 minutes (!) for selecting
about 50k - 80k data points (which I consider not much) based on the
range of IDs I choose. The table size is about 5-8 million data points
total. The same SQL query directly executed in MySQL Query Browser
takes about 20 seconds which I would consider fine. There are no
indices created for any of the fields but since the query runs a lot
faster in the query browser I don't suspect this to be the main
reason.

Any ideas?
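
To make the query text itself easy to inspect, here is a small sketch
that builds it from vectors of ids. The helper name build_query is made
up, and it assumes id1s and id2s are numeric vectors (character ids
would need quoting). No database connection is opened here:

```r
# Hypothetical helper: builds the query text from numeric id vectors.
build_query <- function(id1s, id2s) {
  paste0("SELECT id1, id2, x, y FROM `mytable`",
         " WHERE id1 IN (", paste(id1s, collapse = ", "), ")",
         " AND id2 IN (", paste(id2s, collapse = ", "), ")",
         " AND subtype = 'TYPE1';")
}

query <- build_query(1:3, c(10, 20))
cat(query, "\n")

# With an open DBI connection 'con' (not created here), timing the call
# shows how long the fetch takes from R's side:
# system.time(data <- dbGetQuery(con, query))
```

Printing the query before sending it makes it possible to paste exactly
the same text into the query browser for a like-for-like comparison.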

Best,
Ralf




On Wed, Jun 23, 2010 at 4:36 PM, James W. MacDonald
 wrote:
> Hi Ralf,
>
> Ralf B wrote:
>>
>> I am running a simple SQL SELECT statement that involvs 50k + data
>> points using R and the RJDBC interface. I am facing very slow response
>> times in both the RGUI and the R console. When running this SQL
>> statement directly in a SQL client I have processing times that are a
>> lot lot faster (which means that the SQL statement itself is not the
>> problem).
>>
>> Did any of you compare RJDBC vs RMySQL or is there a better, more
>> efficient way to extract large data from databases using R? Would you
>> recommend dumping data out completely into flat files and working with
>> flat files instead? I expected that this would not be such a problem
>> given that businesses maintain their data in DBs and R is supposed to
>> be good in shifting around data. Am I doing something wrong?
>
> Well, if you don't show people what you have done, how can anybody tell if
> you are doing something wrong or not?
>
> I have no experience with RJDBC, so cannot say anything about that. However,
> I have always found RMySQL to be speedy enough. As an example:
>
>> library(RMySQL)
> Loading required package: DBI
>> con <- dbConnect("MySQL", host="genome-mysql.cse.ucsc.edu", user =
>> "genome", dbname = "hg18")
>> system.time(a <- dbGetQuery(con, "select name, chromEnd from snp129 where
>> chrom='chr1' and chromStart between 1 and 1e8;")
> + )
>   user  system elapsed
>   7.95    0.06   38.59
>> dim(a)
> [1] 508676      2
>
> So 40 seconds to get half a million records. Since this is via the internet,
> I have to imagine things would be much faster querying a local DB.
>
> But then you never say what constitutes 'slow' for you, so maybe this is
> slow as well?
>
> Best,
>
> Jim
>
>
>>
>> Ralf
>>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> Douglas Lab
> University of Michigan
> Department of Human Genetics
> 5912 Buhl
> 1241 E. Catherine St.
> Ann Arbor MI 48109-5618
> 734-615-7826
> **
> Electronic Mail is not secure, may not be read every day, and should not be
> used for urgent or sensitive issues
>



[R] RJDBC vs RMySQL vs ???

2010-06-23 Thread Ralf B
I am running a simple SQL SELECT statement that involves 50k + data
points using R and the RJDBC interface. I am facing very slow response
times in both the RGUI and the R console. When running this SQL
statement directly in a SQL client I have processing times that are a
lot faster (which means that the SQL statement itself is not the
problem).

Did any of you compare RJDBC vs RMySQL or is there a better, more
efficient way to extract large data from databases using R? Would you
recommend dumping data out completely into flat files and working with
flat files instead? I expected that this would not be such a problem
given that businesses maintain their data in DBs and R is supposed to
be good at shifting around data. Am I doing something wrong?

Ralf



[R] Comparing distributions

2010-06-23 Thread Ralf B
I am trying to do something in R and would appreciate a push into the
right direction. I hope some of you experts can help.

I have two distributions obtained from 1 datapoints each (about
1 datapoints each, non-normal with multimodal shape (when
eye-balling densities) but other than that I know little about its
distribution). When plotting the two distributions together I can see
that the two densities are alike with a certain distance to each other
(e.g. 50 units on the X axis). I tried to plot a simplified picture of
the density plot below:




|
|                                                         *
|                                                      *     *
|                                                   *    +   *
|                                              *     +     +  *
|                     *        +           *   +            +  *
|                 *        +*     +   *  +                   + *
|              *       +       *     +                           +*
|           *       +                                               +*
|        *       +                                                    +*
|     *      +                                                          + *
|  *      +                                                               + *
|___


What I would like to do is to formally test their similarity or
otherwise measure it more reliably than just showing and discussing a
plot. Is there a general approach other than using a Mann-Whitney test,
which is very strict and seems to assume a perfect match? Is there a
test that takes in a certain 'band' (e.g. 50, 100, 150 units on X) or
are there any other similarity measures that could give me a statistic
about how close these two distributions are to each other? All I can
say from eye-balling is that they seem to follow each other and it
appears that one distribution is shifted by some amount from the other.
Any ideas?

Ralf
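
One possible direction, sketched on simulated data (the bimodal samples
and the shift of 50 units are assumptions made up for illustration):
the two-sample Kolmogorov-Smirnov statistic D measures the largest gap
between the two empirical CDFs, and comparing D before and after
removing an estimated shift separates "different location" from
"different shape".

```r
set.seed(7)
# Placeholder bimodal samples; 'b' follows the same shape, shifted by 50 units
a <- c(rnorm(5000, 100, 20), rnorm(5000, 300, 30))
b <- c(rnorm(5000, 100, 20), rnorm(5000, 300, 30)) + 50

ks_raw <- ks.test(a, b)                 # two-sample Kolmogorov-Smirnov test

# Estimate the shift from matching quantiles, then compare after removing it
p <- seq(0.05, 0.95, by = 0.05)
shift <- median(quantile(b, p) - quantile(a, p))
ks_aligned <- ks.test(a, b - shift)

print(c(D_raw = unname(ks_raw$statistic),
        D_aligned = unname(ks_aligned$statistic),
        shift = shift))
```

Since the shift is estimated from the same data, the p-value of the
aligned test should be read with caution; D itself is still usable as a
descriptive distance between the two curves.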



[R] About normality tests (2) ...

2010-06-23 Thread Ralf B
In addition to the previous email:

What plots would you suggest in addition to density / histogram plots
and how can I produce them with R ? Perhaps one of you has an example
?

Thanks a lot,
Ralf



[R] About normality tests...

2010-06-23 Thread Ralf B
Hi all,

I have two very large samples of data (1+ data points) and would
like to perform normality tests on it. I know that p < .05 means that
a data set is considered as not normal with any of the two tests. I am
also aware that large samples tend to lead more likely to normal
results (Andy Field, 2005).

I have a few questions to ensure that I am using them right.

1) The Shapiro-Wilk test requires providing mean and sd. Is it
correct to add here the mean and sd of the data itself (since I am
comparing to a normal distribution with the same parameters) ?

mySD <- sd(mydata$myfield)
myMean <- mean(mydata$myfield)
shapiro.test(rnorm(100, mean = myMean, sd = mySD))

2) If I just want to test each distribution individually, I assume
that I am doing a one-sample Kolmogorov-Smirnov test. Is that correct?

3) If I simply want to know if normality exists or not, what should I
put for the parameter 'alternative' ? Does it actually matter?

alternative = c("two.sided", "less", "greater")
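
For reference, a sketch of how the two tests are usually called on the
data itself, with simulated values standing in for mydata$myfield (an
assumption). shapiro.test() takes only the sample, with no mean or sd
argument, and accepts between 3 and 5000 values; ks.test() is where the
parameters of the reference normal are passed, and its default
alternative "two.sided" is the one that asks about any departure:

```r
set.seed(3)
x <- rnorm(20000, mean = 5, sd = 2)   # placeholder for mydata$myfield

# Shapiro-Wilk: data only, at most 5000 values, so a random subsample is used
sw <- shapiro.test(sample(x, 5000))

# One-sample Kolmogorov-Smirnov against a normal with the sample's mean and sd
ks <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))

print(sw$p.value)
print(ks$p.value)

qqnorm(x); qqline(x)                  # visual check alongside the tests
```

Estimating mean and sd from the same sample makes the KS p-value
conservative, so it is best treated as a rough guide here.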

Thank you,
Ralf



[R] RJDBC - sloooooow - HELP!

2010-06-17 Thread Ralf B
Hi all,

I am suffering from a very slow RJDBC (7 rows from a simple
select take like 10 minutes). Does anybody know if RMySQL is faster?
Or RODBC in that respect? What are alternatives and what can be done
to get a realistic performance out of MySQL when connected to R's JRI
?

Best,
Ralf



Re: [R] Column name defined by function variable

2010-06-17 Thread Ralf B
Sorry, it's late and I am getting tired ;)

I modified it based on your suggestion:

#combine data
add.col <- function(df, new.col, name) {
  n.row <- dim(df)[1]
  length(new.col) <- n.row
  names(new.col) <- name
  cbind(df, new.col)
}
data <- data.frame(stuff1=as.numeric(d2$points))
data <- add.col(data, as.numeric(d1$morepoints), "stuff2")

but the column in the data frame is still called 'new.col' and not 'stuff2'.

Any further ideas?
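
A possible explanation: names(new.col) <- name labels the elements of the
vector, not the column, and cbind() takes the column name from the
argument expression ('new.col'). A minimal sketch of one way around this,
renaming the last column after binding (the example data are made up):

```r
add.col <- function(df, new.col, name) {
  n.row <- nrow(df)
  length(new.col) <- n.row           # pads with NA or truncates to n.row
  out <- cbind(df, new.col)
  names(out)[ncol(out)] <- name      # rename the column that was just added
  out
}

myData <- data.frame(stuff1 = c(1, 2, 3))   # hypothetical example data
result <- add.col(myData, c(5, 6, 7, 8), "stuff2")
names(result)
```

Assigning with df[[name]] <- new.col inside the function would be an
alternative that avoids cbind() altogether.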

Best,
Ralf



On Thu, Jun 17, 2010 at 5:14 AM, Ivan Calandra
 wrote:
> Hi,
>
> I haven't checked much of what you wrote, so just a blind guess. What about in
> the function's body before cbind():
> names(new.col) <- "more stuff"
> ?
>
> HTH,
> Ivan
>
> Le 6/17/2010 11:09, Ralf B a écrit :
>>
>> Hi all,
>>
>> probably a simple problem for you but I am stuck.
>>
>> This simple function adds columns (with differing length) to data frames:
>>
>> add.col<- function(df, new.col) {
>>        n.row<- dim(df)[1]
>>        length(new.col)<- n.row
>>        cbind(df, new.col)
>> }
>>
>> Now I would like to extend that method. A new parameter 'name' should
>> allow people to pass in a name for that new column. Is that possible
>> and how can this be achieved?
>>
>> Example:
>>
>> myData<- data.frame(c(1,2,3))
>> add.col(myData, c(5,6,7,8), 'more stuff')
>>
>> adds a new column named 'more stuff' to the dataframe myData.
>>
>>
>> Any ideas?
>>
>> Best,
>> Ralf
>>
>>
>>
>
> --
> Ivan CALANDRA
> PhD Student
> University of Hamburg
> Biozentrum Grindel und Zoologisches Museum
> Abt. Säugetiere
> Martin-Luther-King-Platz 3
> D-20146 Hamburg, GERMANY
> +49(0)40 42838 6231
> ivan.calan...@uni-hamburg.de
>
> **
> http://www.for771.uni-bonn.de
> http://webapp5.rrz.uni-hamburg.de/mammals/eng/mitarbeiter.php
>
>



[R] Testing for differences between 2 unknown distributions/densities

2010-06-17 Thread Ralf B
Hi all,

I have two distributions / densities (drew density plots and
eye-balled some data). Given that I don't want to make any assumptions
about the data (e.g. normality, existence of certain distribution
types and parameters), what are my options for testing that the
distributions are the same?
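
One distribution-free option is the two-sample Kolmogorov-Smirnov test,
which compares the two empirical distribution functions. A minimal
sketch, assuming two simulated samples 'a' and 'b' in place of the real
data:

```r
set.seed(7)
a <- rnorm(300)   # hypothetical sample 1
b <- rexp(300)    # hypothetical sample 2, clearly different in shape

# Null hypothesis: both samples come from the same continuous distribution
ks <- ks.test(a, b)
ks$statistic
ks$p.value
```

A small p-value is evidence that the distributions differ. The test
assumes continuous data, so heavily tied data would need a different
approach (for example a permutation test).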

Thanks,
Ralf



[R] Column name defined by function variable

2010-06-17 Thread Ralf B
Hi all,

probably a simple problem for you but I am stuck.

This simple function adds columns (with differing length) to data frames:

add.col <- function(df, new.col) {
  n.row <- dim(df)[1]
  length(new.col) <- n.row
  cbind(df, new.col)
}

Now I would like to extend that method. A new parameter 'name' should
allow people to pass in a name for that new column. Is that possible
and how can this be achieved?

Example:

myData <- data.frame(c(1,2,3))
add.col(myData, c(5,6,7,8), 'more stuff')

adds a new column named 'more stuff' to the dataframe myData.


Any ideas?

Best,
Ralf



[R] Strange behavior when plotting with ggplot2 and lattice

2010-05-14 Thread Ralf B
Hi all,

I have the following script, which won't plot (tried in RGUI and also
in Eclipse StatET):

library(ggplot2)# for plotting results
userids <- c(1,2,3)
for (userid in userids){
qplot(c(1:10), c(1:20))
}
print ("end")

No plot shows up. If I run the following:

library(ggplot2)# for plotting results
userids <- c(1,2,3)
for (userid in userids){
blabla)))
qplot(c(1:10), c(1:20))
}
print ("end")

which contains a syntax mistake in line 4, then I get the plot output
on the screen. I have the same issue when using lattice, but it's OK
when using the standard graphics plot.
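
A likely cause: lattice and ggplot2 functions return plot objects, which
are only drawn when printed. At the top level R prints them
automatically, but not inside a for loop. A minimal sketch using lattice,
with made-up data per user; the same explicit print() applies to qplot():

```r
library(lattice)

userids <- c(1, 2, 3)
plots <- vector("list", length(userids))

for (i in seq_along(userids)) {
  d <- data.frame(x = 1:10, y = (1:10) * userids[i])  # hypothetical data
  p <- xyplot(y ~ x, data = d, main = paste("User", userids[i]))
  print(p)        # explicit print() is what draws the plot inside the loop
  plots[[i]] <- p # keep the object in case it is needed later
}
print("end")
```

Base graphics functions such as plot() draw as a side effect, which would
explain why they work in the same loop without print().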

WHAT IS GOING ON???

Ralf



[R] ggplot2: qplot won't work

2010-05-12 Thread Ralf B
I have a script running in the StatET Eclipse environment that
executes the ggplot2 command qplot in a function:

# Creates the plot
createPlot <- function(){
print("Lets plot!")
qplot(1:10, letters[1:10])
}

When executing the qplot line directly, it works. When executing the
script, it does not open a window and it does not plot. Is there
something important I have forgotten? I know that the function is
called because I always get my 'Lets plot!'. When using the normal
graphics plot functions, it also works seamlessly.

Of course I am importing the library ggplot2 at the beginning of my
script - here is the import log:

 library(gdata) # for trim function
gdata: Unable to locate valid perl interpreter
gdata:
gdata: read.xls() will be unable to read Excel XLS and XLSX files
gdata: unless the 'perl=' argument is used to specify the location of a
gdata: valid perl intrpreter.
gdata:
gdata: (To avoid display of this message in the future, please ensure
gdata: perl is installed and available on the executable search path.)
gdata: Unable to load perl libaries needed by read.xls()
gdata: to support 'XLX' (Excel 97-2004) files.

gdata: Unable to load perl libaries needed by read.xls()
gdata: to support 'XLSX' (Excel 2007+) files.

gdata: Run the function 'installXLSXsupport()'
gdata: to automatically download and install the perl
gdata: libaries needed to support Excel XLS and XLSX formats.

Attaching package: 'gdata'


The following object(s) are masked from package:utils :

 object.size

Warning message:
package 'gdata' was built under R version 2.10.1
> library(TTR)  # for moving averages (SMA,...) smoothing
Loading required package: xts
Loading required package: zoo
Warning messages:
1: package 'TTR' was built under R version 2.10.1
2: package 'xts' was built under R version 2.10.1
3: package 'zoo' was built under R version 2.10.1
> library(ggplot2)  # for plotting results
Loading required package: proto
Loading required package: grid
Loading required package: reshape
Loading required package: plyr
Loading required package: digest

Attaching package: 'ggplot2'


The following object(s) are masked from package:gdata :

 interleave

Warning messages:
1: package 'ggplot2' was built under R version 2.10.1
2: package 'proto' was built under R version 2.10.1
3: package 'reshape' was built under R version 2.10.1
4: package 'plyr' was built under R version 2.10.1
5: package 'digest' was built under R version 2.10.1


What is wrong here?
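
A possible explanation: ggplot2 returns a plot object that is drawn only
when it is printed. Typing a line at the console prints the result
automatically, but a script run through source() does not. A minimal
sketch of the function with an explicit print(), written with ggplot()
rather than qplot() (a substitution made here for illustration):

```r
library(ggplot2)

# Creates the plot and draws it explicitly
createPlot <- function() {
  print("Lets plot!")
  d <- data.frame(x = 1:10, y = letters[1:10])
  p <- ggplot(d, aes(x = x, y = y)) + geom_point()
  print(p)       # draws the plot even when the script is sourced
  invisible(p)   # return the object without printing it a second time
}

p <- createPlot()
```

Returning the object invisibly lets the caller save or modify the plot
afterwards, for example with ggsave().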

Ralf



[R] Smoothing Techniques - short stepwise functions with spikes

2010-05-11 Thread Ralf B
R Friends,

I have data from which I would like to learn a more general
(smoothed) trend by applying data smoothing methods. Data points
follow a positive stepwise function.


[ASCII diagram garbled in the archive: data points follow a stepwise
function, with occasional spikes of one or two points]


Data points from each step should not interact with any other
step. The outliers I want to remove are spikes as shown in the
diagram. These spikes do not have more than one or two points. I
consider larger groups relevant and want to keep them in. I
sometimes have fewer than 5 points for each step, and up to 50 at most.
Given these conditions, would you suggest using one of the moving
averages (e.g. SMA, EMA, DEMA, ...) or the locally linear regression
(lowess) method? Are there any other options? Does anybody know a
good site that gives an overview of all methods without going too much
into mathematical details, focusing instead on the requirements and
underlying assumptions of each method? Is there perhaps even a package
that runs and visualizes a comparison on the data, similar to packages
like 'party'? (With 1000s of active packages, one can always hope for
that.)
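
One option not named above is a running median, which tends to remove
short spikes while keeping step edges sharper than a moving average does.
A minimal sketch with base R's runmed(), on made-up step data containing
a single one-point spike:

```r
# Hypothetical step data: three steps of six points each
steps <- c(rep(1, 6), rep(3, 6), rep(5, 6))
x <- steps
x[3] <- 10   # a one-point spike on the first step

# A window of k = 5 removes spikes of up to two points; k must be odd
smoothed <- runmed(x, k = 5)

plot(x, type = "p")
lines(smoothed, col = "red")
```

The window width k is the main assumption here: a spike is removed only
if it is shorter than half the window, so steps with very few points may
be flattened as well.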

Thanks in advance!
Ralf



[R] Corrupt R installation?

2010-05-10 Thread Ralf B
I installed the lattice package, and got an error that R was not able
to remove the previous version of lattice. Now my installation seems
to be corrupt, even affecting other packages. I am getting this error
when loading TTR:

> library(TTR)
Loading required package: xts
Loading required package: zoo
Error in loadNamespace(i, c(lib.loc, .libPaths())) :
  there is no package called 'lattice'
In addition: Warning messages:
1: package 'TTR' was built under R version 2.10.1
2: package 'xts' was built under R version 2.10.1
3: package 'zoo' was built under R version 2.10.1
Error: package 'zoo' could not be loaded

My question now is: is there a way to manually remove lattice (or
what's left of it)? Or do I have to go through the process of
completely re-installing? What do you guys do to prevent such a
situation - is there an easy way to secure an R installation?
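
A minimal sketch of how the state of the library could be inspected
before deciding on a full re-install. The clean-up calls are left
commented out on purpose; whether they are sufficient for this particular
failure is an assumption:

```r
# Is lattice still visible in any library on the search path?
lattice_found <- "lattice" %in% rownames(installed.packages())

# Lock directories that an interrupted install may have left behind
stale_locks <- list.files(.libPaths(), pattern = "^00LOCK",
                          full.names = TRUE)

lattice_found
stale_locks

# Possible clean-up, best run in a fresh R session with no packages loaded:
# if (lattice_found) remove.packages("lattice")
# unlink(stale_locks, recursive = TRUE)
# install.packages("lattice")
```

Updating packages only from a fresh session, before anything is loaded,
reduces the chance that files of the old version are still in use.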

Ralf



[R] Bug in DEMA (Moving Average smoothing algorithm) ?

2010-05-10 Thread Ralf B
When running DEMA(data, 5) on a vector 'data' of length 5, my R engine
stops. Is this function or the R environment facing a bug here or am I
doing something wrong? DEMA should work if the smoothing window size
is the same size as the data length, right?
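
For illustration, a double EMA is commonly defined as
2 * EMA(x) - EMA(EMA(x)). The base-R sketch below is not TTR's
implementation; ema_sketch() and dema_sketch() are hypothetical helpers
that assume the EMA is seeded with the mean of the first n values. Under
that assumption the second pass only has one non-missing value when
length(x) equals n, so at least 2n - 1 points are needed:

```r
# Hypothetical EMA: seeded with the mean of the first n non-missing values
ema_sketch <- function(x, n) {
  ok <- which(!is.na(x))
  if (length(ok) < n) {
    stop("need at least n non-missing values, got ", length(ok))
  }
  alpha <- 2 / (n + 1)
  out <- rep(NA_real_, length(x))
  start <- ok[n]                          # first position with a full window
  out[start] <- mean(x[ok[seq_len(n)]])
  for (i in seq_len(length(x) - start) + start) {
    out[i] <- alpha * x[i] + (1 - alpha) * out[i - 1]
  }
  out
}

# Hypothetical double EMA built from two passes of ema_sketch()
dema_sketch <- function(x, n) {
  e1 <- ema_sketch(x, n)
  e2 <- ema_sketch(e1, n)   # fails if e1 has fewer than n non-missing values
  2 * e1 - e2
}

short <- tryCatch(dema_sketch(1:5, 5), error = function(e) "error")
long  <- dema_sketch(1:9, 5)
```

Whether TTR's DEMA fails for the same reason is not confirmed here; the
sketch only shows why a window equal to the data length is a boundary
case for this definition.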

(I am working with Eclipse 3.5. and the StatET environment.)

Ralf
