date:20100516

[R] Vector recycling and zoo

2010-05-16 Thread Sean Carmody

I am a bit confused about the different approaches taken to recycling in
plain data frames and zoo objects. When carrying out simple arithmetic,
data frames seem to recycle single arguments, but zoo objects do not. Here is
an example:

> x <- data.frame(a=1:5*2, b=1:5*3)
> x
   a  b
1  2  3
2  4  6
3  6  9
4  8 12
5 10 15
> x$a/x$a[1]
[1] 1 2 3 4 5
> library(zoo)
> x <- zoo(x)
> x$a/x$a[1]
1
1


I feel understanding this difference would lead me to a greater
understanding of the zoo module!

Sean.

-- 
Sean Carmody
Twitter: http://twitter.com/seancarmody
Stable: http://mulestable.net/sean

The Stubborn Mule
Blog: http://www.stubbornmule.net
Forum: http://mulestable.net/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Location attribute

2010-05-16 Thread M. (AMFOR)

Hi everybody, a question: how can I find the position (number) of an
attribute from its name?

E.g.

X1  X2  X3  X4  X5  X6
1   3   5   2   1   7
6   7   4   5   2   9

How can I find out that the attribute X4 is in position 4?

I hope you can help me.

Thank you all very much in advance.

Agustín

[R] p value

2010-05-16 Thread Soham


How to compute the p-value of a statistic generally?
-- 
View this message in context: 
http://r.789695.n4.nabble.com/p-value-tp2217867p2217867.html
Sent from the R help mailing list archive at Nabble.com.

[R] How to manage an error message about NA/NaN/Inf

2010-05-16 Thread Moohwan Kim

Dear R Family,

I get an error message and would like to learn how to deal with it.
The original series is as follows (only the first 10 observations are shown):
> dif_transaud[1:10]
 [1]  0.0065880493 -0.0065880490 -0.0131743570  0.0197745715  0.0065889175
 [6]  0.0131813110  0.0065923924 -0.0395587070  0.0000156455  0.0197693578

Then I transformed them as follows:
> dif_transaud_sq <- dif_transaud^2
> lnabsdif_transaud <- 0.5*log(dif_transaud_sq)
> lnabsdif_transaud[1:10]
 [1]  -5.022498  -5.022498  -4.329483  -3.923358  -5.022366  -4.328955
 [7]  -5.021839  -3.229969 -11.065327  -3.923622

Finally, I ran the following, which is part of a wavelet transform:
> mra.out <- mra(lnabsdif_transaud, filter="la8", n.levels=8,
+   boundary="reflection", fast=TRUE, method="modwt")

However, this triggered an error message:
 Error in FUN(1L[[1L]], ...) : NA/NaN/Inf in foreign function call (arg 1)

I guess there are a few large negative numbers in lnabsdif_transaud.
I was wondering if there is an appropriate way to truncate those
numbers.

Regards,
Moohwan Kim
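
The error text points at NA, NaN or Inf values in the input rather than at
merely large negative ones; log() of an exact zero, for instance, gives -Inf.
A minimal sketch for locating such entries before calling the wavelet routine.
The vector d is a made-up stand-in for dif_transaud, and the zero and the NA in
it are assumptions for illustration only:

```r
# Hypothetical stand-in for dif_transaud: contains one exact zero and one NA.
d <- c(0.0065880493, -0.0065880490, 0, 0.0197745715, NA)

ln_abs <- 0.5 * log(d^2)     # same transform as in the post, i.e. log(|d|)

bad <- !is.finite(ln_abs)    # TRUE for NA, NaN, Inf and -Inf
which(bad)                   # positions of the offending values

# One possible treatment (an assumption, not a recommendation for all data):
# drop the non-finite entries before further processing.
ln_abs_clean <- ln_abs[!bad]
```

Whether dropping, imputing or flooring such values is appropriate depends on
the analysis; the sketch only shows how to find them.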

[R] number of location attribute with its name

2010-05-16 Thread M. (AMFOR)

Hi everybody, a question: how can I find the position (number) of an
attribute from its name?

E.g.

X1  X2  X3  X4  X5  X6
1   3   5   2   1   7
6   7   4   5   2   9

How can I find out that the attribute X4 is in position 4?

I hope you can help me.

Thank you all very much in advance.

Agustín

Re: [R] How to rank matrix data by deciles?

2010-05-16 Thread Peter Ehlers


On 2010-05-13 17:50, Phil Spector wrote:

Vincent -
I'm afraid there's no solution other than artificially modifying
the zeroes:


vec

 [1] 26.58950617 5.73074074 5.9622 5.6478 20.95728395 0. 0.0700 12.8689
 [9] 3.64543210 0.05049383 25.6089 3.53246914 0. 31.39049383 3.77641975 13.19617284
[17] 0.

cut(vec,quantile(vec,(0:10)/10),include.lowest=TRUE,label=FALSE)

Error in cut.default(vec, quantile(vec, (0:10)/10), include.lowest =
TRUE, :
'breaks' are not unique

vec[vec==0] = jitter(vec[vec==0])
cut(vec,quantile(vec,(0:10)/10),include.lowest=TRUE,label=FALSE)

[1] 10 6 7 5 9 1 3 7 4 2 9 4 2 10 5 8 1

It gives an answer, but it may not make sense for all data.

- Phil



The problem is that quantile() produces duplicate values
for the breaks used in cut(). Phil's suggestion modifies
the data. It might be preferable to modify the breaks:

  eps <- .Machine$double.eps  #or use something like 1e-10
  brks <- quantile(vec, (0:10)/10) + eps*(0:10)
  cut(vec, brks, include.lowest=TRUE, labels=FALSE)
  #[1] 10  6  7  5  9  1  3  7  4  2  9  4  1 10  5  8  1

 -Peter Ehlers
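
The break-nudging idea above can be checked on a small example. This is only a
sketch: the vector below is made up (three zeros among 17 values, mirroring the
failing column), and 1e-10 is used in place of .Machine$double.eps on the
assumption that the data are of moderate magnitude:

```r
# Hypothetical data: three zeros among 17 values, as in the failing column.
vec <- c(0, 0, 0, 1:14)

brks <- quantile(vec, (0:10)/10)
anyDuplicated(brks) > 0      # the 0% and 10% breaks are both 0

# cut() refuses duplicated breaks, so the direct call fails.
direct <- try(cut(vec, brks, include.lowest = TRUE, labels = FALSE),
              silent = TRUE)
inherits(direct, "try-error")

# Nudging each break by a small, increasing amount makes them strictly
# increasing, so cut() can assign every value to a bin.
brks2 <- brks + 1e-10 * (0:10)
deciles <- cut(vec, brks2, include.lowest = TRUE, labels = FALSE)
```

Note that all tied values end up in the same bin, so the bins are no longer of
equal size; that may or may not be acceptable for a given analysis.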


On Thu, 13 May 2010, vincent.deluard wrote:





Dear Phil,

You helped me with a request to rank matrix columns by deciles two weeks
ago.

This really unblocked me on this project, but I found a little bug.

As before, my data is in a matrix:


madebt[1:16,1:2]

X4.19.2010 X4.16.2010
[1,] 26.61197531 26.58950617
[2,] 5.72765432 5.73074074
[3,] 5.95839506 5.9622
[4,] 5.6433 5.6478
[5,] 20.93814815 20.95728395
[6,] 0. 0.
[7,] 0.0700 0.0700
[8,] 12.87802469 12.8689
[9,] 3.64407407 3.64543210
[10,] 0.05037037 0.05049383
[11,] 25.59024691 25.6089
[12,] 3.47987654 3.53246914
[13,] 0. 0.
[14,] 31.39037037 31.39049383
[15,] 3.78296296 3.77641975
[16,] 13.17876543 13.19617284

The apply function will work for this sample of my data:

debtdeciles = apply(madebt[1:16,1:2],2,function(x)
cut(x,quantile(x,(0:10)/10,
na.rm=TRUE),label=FALSE,include.lowest=TRUE))

debtdeciles

X4.19.2010 X4.16.2010
[1,] 10 10
[2,] 6 6
[3,] 6 6
[4,] 5 5
[5,] 8 8
[6,] 1 1
[7,] 2 2
[8,] 7 7
[9,] 4 4
[10,] 2 2
[11,] 9 9
[12,] 3 3
[13,] 1 1
[14,] 10 10
[15,] 4 4
[16,] 8 8

However, it will fail for


madebt[1:17,1:2]

X4.19.2010 X4.16.2010
[1,] 26.61197531 26.58950617
[2,] 5.72765432 5.73074074
[3,] 5.95839506 5.9622
[4,] 5.6433 5.6478
[5,] 20.93814815 20.95728395
[6,] 0. 0.
[7,] 0.0700 0.0700
[8,] 12.87802469 12.8689
[9,] 3.64407407 3.64543210
[10,] 0.05037037 0.05049383
[11,] 25.59024691 25.6089
[12,] 3.47987654 3.53246914
[13,] 0. 0.
[14,] 31.39037037 31.39049383
[15,] 3.78296296 3.77641975
[16,] 13.17876543 13.19617284
[17,] 0. 0.



debtdeciles = apply(madebt[1:17,1:2],2,function(x)
cut(x,quantile(x,(0:10)/10,
na.rm=TRUE),label=FALSE,include.lowest=TRUE))
Error in cut.default(x, quantile(x, (0:10)/10, na.rm = TRUE), label = FALSE, :
'breaks' are not unique

My guess is that we now have 3 zeros in each column. For each decile, we
cannot have more than 2 elements (total of 17 numbers in each column) and I
believe R cannot determine where to put the third zero. Do you have any
solution for this problem?

Many thanks,

--
View this message in context:
http://r.789695.n4.nabble.com/How-to-rank-matrix-data-by-deciles-tp2133496p2215945.html

Sent from the R help mailing list archive at Nabble.com.



[R] sample

2010-05-16 Thread Laetitia Schmid


Hi,
I am sampling two random columns from females and two random columns
from males to produce tetraploid offspring. For every female I am
sampling a random male.
In the end I want to write out a matrix with all the offspring, but
that does not work. I always get only the offspring from the last
female. There must be a mistake in my script:


moms <- read.delim("females.txt", stringsAsFactors=FALSE, header=TRUE)
dads <- read.delim("males.txt", stringsAsFactors=FALSE, header=TRUE)

output_offspring <- data.frame()

for (i in 1:nrow(moms)){
   rdad=sample(1:nrow(dads),1)
   kid <- c(sample(moms[i,2:5],2),sample(dads[rdad,2:5],2))
   output_offspring <- rbind(output_offspring,c(moms$SampleID[i],dads$SampleID[rdad],kid))
}
write.table(output_offspring,"offspring_7.txt",row.names=T,col.names=T,quote=F)


females.txt:

SampleID A1 A2 A3 A4
GM920222 GATTGCC GATTGCC GATAGAC GATAGAC
GM930040 GTCATCA GAGTGCA ACTATAA GATTGCC
GM930040 GTCATCA GAGTGCA ACTATAA GATTGCC
GM960023 GATTGCC GTCATCA GATTGCC GATTGCC
GM920224 ACTAGAA GTCATCA GTCATCA ACTAGAA
GM920224 ACTAGAA GTCATCA GTCATCA ACTAGAA
GM920034 GATTGCC GTCATCA GATTGCA GATTGCA
GM920096 GATTGCC GATTGCC GATTGCA GATTGCC
GM930029 GTCATCA GATTGCC GTCATCA GATTGCC
GM940031 GATTGCC GAGTGCA GATTGCA ACTAGAA
GM960028 GATTGCC GAGTGCA GATTGCA ACTAGAA
GM980007 GTCATCA GATTGCC ACTTGAA GTCATCA
GM970009 ACTAGAA GTCAGAA GTCAGCA ACTAGCA
GM930026 ACTAGAA GAGTGCA GAGTGCA ACTAGAA
GM920031 GATTGCC GTCATCA GATTGCC GATTGCC
GM990105 GATTGCC GATTGCC GTCAGCA GTCAGCA
GM920202 GATTGCC GATTGCC GATTGCC GATTGCC
GM920089 GAGTGCA GTCAGAA ACTATCA GATTGCC
GM980051 ACTAGAA ACTAGAA GATAGCC GATAGCC
GM930109 GTCATCA GAGTGCA GAA ACTAGAA
GM940039 GTCATCA GAGTGCA GTTTGCC ACTTTCA
GM050099 GAGTGCA GTCAGAA GTTATCC ACTTTCA
GM050099 GAGTGCA GTCAGAA GTTATCC ACTTTCA
GM030005 ACTAGAA GAGTGCA ACTAGAA ACTAGAA
GM050009 ACTAGAA GATTGCC GATTGCC ACTAGAA
GM990027 GATTGCC GAGTGCA GATTGCA GATTGCC
GM990066 GATTGCC GTCATCA GTCATCA GATTGCC

males.txt:

SampleID A1 A2 A3 A4
WI920425 ACTAGAA ACCATCA ACTAGAA ACTAGAA
WI920408 ACTAGAA ACTAGAA ACTAGAA ACTAGAA
WI920009 ACTAGAA ACTAGAA ACTAGAA GATTGCC
WI920352 ACTTTCA ACGTTCA GAGAGAA GATTGCA
WI920004 GATTGCC GATTGCC ACTAGAA ACTAGAA
WI920353 ACTAGAA GATTGCC ACTAGAA GATTGCC
WI920410 ACTAGAA GTCAGAA GAGTACC ACTTTCA
WI920007 ACTAGAA ACTTTCA GAATGCA GTTAGAC
WI920015 ACTTTCA ACGTTCA GTCAGAA GATTGCC
WI920426 ACTAGAA GTCATCA GTCATCA ACTAGAA
WI920433 ACTAGAA GTCAGAA GTCTGCA ACTTGCA
WI920370 GATTGCC GAGTGCA GATTGCA ACTAGAA
WI920437 GTCATCA GTCAGAA GATTGCC ACTTTCA
WI920027 GATTGCC GAGTGCA GATTACC GATTGCC
WI920415 GATTGCC GAGTGCA GTCATCA ACT
WI920023 ACTTTCA GTCAGAA GAGATCA GATTGCC
WI920360 GATTGCC GTCATCA GATTGCA ACTTTCA
WI920017 GATTGCC GTCAGAA GATTTCC ACTAGCA
WI920028 GTCATCA GTCAGAA GATTGCC ACTTGCA
WI920361 GATTGCC GAGTGCA GTCAGCA GATTGCC
WI920367 GATTGCC GATTGCC GTCATCA GATTGCC
WI920366 GATTGCC GATTGCC GTCTGCA GTCTGCA
WI920365 GATTGCC GAGTGCA GTCAGCA GTTTGCC
WI920362 GATTGCC GAGTGCA GATTGCA ACTAGAA
WI920441 GATTGCC GAGTGCA GATTGCA ACTAGAA
WI920022 GTCATCA GTCAGAA GATTGCC ACTTTCA
WI920356 GTCATCA GTCAGAA GATTGCC ACTTTCA
WI920355 GATTGCC GATTGCC GATTGCC GATTGCC
WI920423 GATTGCC GATTGCC GATTGCC GATTGCC
WI920021 GATTGCC GATTGCC GTCAGCA GATTGCC
WI920359 GATTGCC GATTGCC GTCAGCA GATTGCC
WI920024 GATTGCC GATTGCC GATTGCC GATTGCC
WI920369 GATTGCC GATTGCC GATTGCC GATTGCC
WI920416 GATTGCC GATTGCC GATTGCC GATTGCC
WI920427 GATTGCC GATTGCC GATTGCC GATTGCC
WI920428 GATTGCC GATTGCC GATTGCC GATTGCC
WI920431 GATTGCC GATTGCC GATTGCC GATTGCC
WI920001 GATTGCC GTCATCA GTCATCA GATTGCC
WI920010 GATTGCC GTCATCA GTCATCA GATTGCC
WI920349 GATTGCC GTCATCA GTCATCA GATTGCC
WI920363 GATTGCC GTCATCA GTCATCA GATTGCC
WI920417 GATTGCC GTCATCA GTCATCA GATTGCC
WI920430 GATTGCC GTCATCA GTCATCA GATTGCC

Re: [R] improvement

2010-05-16 Thread Sebastian Kruk

Hi, what if I just want a vector filled with the names that have length(index) > 0?

For example, if

nombreC <- c("Juan", "Carlos", "Ana", "María")
nombreL <- c("Juan Campo", "Carlos Gallardo", "Ana Iglesias",
             "María Bacaldi", "Juan Grondona", "Dario Grandineti",
             "Jaime Acosta", "Lourdes Serrano")

I would like to obtain a matrix called vaca with two columns, name and
index, where name is an element of nombreC and index is its position in
nombreL. I don't want entries for elements of nombreC that do not appear in
nombreL. I would also like to count how many matches there are.

For example, vaca:

   name    index
1  Juan    1
2  Juan    5
3  Carlos  2
4  Ana     3
5  María   4

The code is:
vaca <- do.call(rbind,lapply(noquote(nombreC),function(.name) {
  index <- grep(.name,noquote(nombreL))
  index <- if( length(index) > 0) index else 0
  data.frame(name=.name,index=index)
}))

vaca <- vaca[vaca$index>0,]
cuenta <- nrow(vaca)

Thanks,

Sebastián
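
A possible shortcut, offered only as a sketch: if unmatched names are never
given a placeholder index of 0, there is nothing to filter out afterwards. The
variable name matches is introduced here for illustration, and fixed = TRUE
assumes the names should be matched literally rather than as regular
expressions:

```r
nombreC <- c("Juan", "Carlos", "Ana", "María")
nombreL <- c("Juan Campo", "Carlos Gallardo", "Ana Iglesias",
             "María Bacaldi", "Juan Grondona", "Dario Grandineti",
             "Jaime Acosta", "Lourdes Serrano")

# For each short name, the positions in nombreL that contain it
# (an empty integer vector when there is no match).
matches <- lapply(nombreC, function(nm) grep(nm, nombreL, fixed = TRUE))

# Names without matches contribute zero rows, so no filtering step is needed.
vaca <- data.frame(name  = rep(nombreC, lengths(matches)),
                   index = unlist(matches))
cuenta <- nrow(vaca)
```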


2010/5/11  markle...@verizon.net:
> Hi: I added another column to make the output more understandable. I hope it
> helps.
>
>
> do.call(rbind,lapply(nombreC,function(.name) {
>    index <- grep(.name,nombreL)
>    nummatches <- if (length(index) > 0) length(index) else 0
>    index <- if( length(index) > 0) index else 0
>    data.frame(name=.name,index=index,nummatches=nummatches)
> }))


Re: [R] Location attribute

2010-05-16 Thread Ted Harding

On 16-May-10 06:11:52, Agustín Muñoz M. (AMFOR) wrote:
> Hi everybody, a question: how can I find the position (number) of an
> attribute from its name?
>
> E.g.
>
> X1  X2  X3  X4  X5  X6
> 1   3   5   2   1   7
> 6   7   4   5   2   9
>
> How can I find out that the attribute X4 is in position 4?
>
> I hope you can help me.
>
> Thank you all very much in advance.
> Agustín

You can use the function colnames(), with either a matrix or a
dataframe, to extract (or set) the column names:

  X <- matrix(c( 1,3,5,2,1,7,6,7,4,5,2,9), byrow=TRUE,nrow=2)
  colnames(X) <- c("X1","X2","X3","X4","X5","X6")
  X
  #      X1 X2 X3 X4 X5 X6
  # [1,]  1  3  5  2  1  7
  # [2,]  6  7  4  5  2  9
  colnames(X)
  # [1] "X1" "X2" "X3" "X4" "X5" "X6"
  which(colnames(X)=="X4")
  # [1] 4

  X <- data.frame(X1=c(1,6),X2=c(3,7),X3=c(5,4),
                  X4=c(2,5),X5=c(1,2),X6=c(7,9))
  X
  #   X1 X2 X3 X4 X5 X6
  # 1  1  3  5  2  1  7
  # 2  6  7  4  5  2  9
  colnames(X)
  # [1] "X1" "X2" "X3" "X4" "X5" "X6"
  which(colnames(X)=="X4")
  # [1] 4

So, in either case,

  which(colnames(X)=="X4")

will give the result you want.

Hoping this helps,
Ted.
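
As a small variation on the which() approach, base R's match() may be handy
when several names are looked up at once or when a name might be missing. A
sketch only; the names pos_one, pos_many and pos_none are illustrative:

```r
X <- data.frame(X1 = c(1, 6), X2 = c(3, 7), X3 = c(5, 4),
                X4 = c(2, 5), X5 = c(1, 2), X6 = c(7, 9))

pos_one  <- match("X4", colnames(X))           # position of a single name
pos_many <- match(c("X6", "X2"), colnames(X))  # several names, in the order asked
pos_none <- match("X9", colnames(X))           # NA when the name is absent
```

Unlike which(), match() returns only the first position for each name, which
is usually what is wanted when column names are unique.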


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 16-May-10   Time: 09:42:28
-- XFMail --

Re: [R] p value

2010-05-16 Thread Tal Galili

Hi Soham,
I don't feel your question is well defined.
But an equally ill defined answer would be:
Through a permutation test.
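
To make that suggestion a little more concrete, here is a minimal sketch of a
two-sample permutation test, assuming the statistic of interest is a difference
in group means. The data, the number of permutations (n_perm) and all variable
names are made up for illustration:

```r
set.seed(42)

x <- c(5.1, 4.9, 6.2, 5.8, 6.0)   # hypothetical group A
y <- c(4.2, 4.8, 5.0, 4.5, 4.1)   # hypothetical group B

obs    <- mean(x) - mean(y)       # observed statistic
pooled <- c(x, y)
n_perm <- 9999

# Re-assign group labels at random and recompute the statistic each time.
perm_stats <- replicate(n_perm, {
  idx <- sample(length(pooled), length(x))
  mean(pooled[idx]) - mean(pooled[-idx])
})

# Two-sided p-value: share of permuted statistics at least as extreme as the
# observed one, counting the observed arrangement itself.
p_value <- (sum(abs(perm_stats) >= abs(obs)) + 1) / (n_perm + 1)
```

The same recipe applies to other statistics by replacing the two mean()
differences, provided the group labels are exchangeable under the null
hypothesis.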



Contact
Details:---
Contact me: tal.gal...@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
--




On Sat, May 15, 2010 at 7:04 PM, Soham soham.tommarvolorid...@gmail.com wrote:


 How to compute the p-value of a statistic generally?
 --
 View this message in context:
 http://r.789695.n4.nabble.com/p-value-tp2217867p2217867.html
 Sent from the R help mailing list archive at Nabble.com.

Re: [R] problem with making multiple plots (geom_pointrange) in a loop (ggplot2)

2010-05-16 Thread baptiste auguie

Hi,

On 16 May 2010 03:31, michael westphal mi_westp...@yahoo.com wrote:
[ snipped ]
 Any suggestions?


I'd suggest that you:

- read the posting guide
- upgrade your R to the latest version
- don't post to two mailing lists
- make your example minimal, self-contained, reproducible
- show the result of sessionInfo()

HTH,

baptiste

[R] RODBC-Error-sqlSave

2010-05-16 Thread Johan Lassen

Dear R-community,

After repeating the sqlSave command 3 times on a data frame (of size 13149
rows * 5 columns) to my MS Access database, I get the following error:

Error in sqlSave(channel, eksport_transp_acc_2, "transp_acc_scenarier",  :
unable to append to table ‘transp_acc_scenarier’

This means that the first 2 saves complete, but the third one
somehow does not. I suspect that it may be due to an out-of-memory
problem. My PC has 2 CPUs, 1.83 GHz, 0.99 GB RAM.

Does anyone have an idea of what causes the problem and how to solve it? I have
also tried the function gc(), but without success.

Thanks in advance,
Best regards,
Johan



PS:
I use the following code, where the file eksport_transp_acc_2_rbind.csv is
of size 13149*5:


library(RODBC)

eksport_transp_acc_2 <-
read.table(file = "results/csv/eksport_transp_acc_2_rbind.csv",
 sep = ";", header = T)

sqlSave(channel, eksport_transp_acc_2,
"transp_acc_scenarier", append = T, fast = F, rownames = F)





-- 
Johan Lassen

In the cities people live in time -
in the mountains people live in space

Re: [R] sample

2010-05-16 Thread Peter Ehlers


On 2010-05-16 2:23, Laetitia Schmid wrote:

Hi,
I am sampling two random columns from females and two random columns
from males to produce tetraploid offspring. For every female I am
sampling a random male.
In the end I want to write out a a matrix with all the offspring, but
that does not work. I get always only the offspring from the last
females. There must be a mistake in my script:

moms <- read.delim("females.txt", stringsAsFactors=FALSE, header=TRUE)
dads <- read.delim("males.txt", stringsAsFactors=FALSE, header=TRUE)

output_offspring <- data.frame()

for (i in 1:nrow(moms)){
rdad=sample(1:nrow(dads),1)
kid <- c(sample(moms[i,2:5],2),sample(dads[rdad,2:5],2))
output_offspring <- rbind(output_offspring,c(moms$SampleID[i],dads$SampleID[rdad],kid))

}


(When I run your code, I get an error.)

It's always best to pre-assign your output to have the
desired dimensions and then fill in the cells:

output_offspring <- as.data.frame(matrix(, nrow=nrow(moms), ncol=6),
stringsAsFactors=FALSE)


for (i in 1:nrow(moms)){
   rdad <- sample(1:nrow(dads),1)
   kid <- c(sample(moms[i,2:5],2), sample(dads[rdad,2:5],2))
   output_offspring[i,] <- c(moms$SampleID[i], dads$SampleID[rdad], kid)
}

Personally, I would work with matrices, since all of your data
are string variables.

 -Peter Ehlers
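
The matrix-based variant mentioned above might look roughly like this. It is a
sketch under assumptions: the two small data frames reuse a few rows from the
posted files instead of reading females.txt and males.txt, and the output
column names (Mom, Dad, M1, M2, D1, D2) are invented for illustration:

```r
# A few rows taken from the posted files, standing in for read.delim().
moms <- data.frame(
  SampleID = c("GM920222", "GM930040", "GM960023"),
  A1 = c("GATTGCC", "GTCATCA", "GATTGCC"),
  A2 = c("GATTGCC", "GAGTGCA", "GTCATCA"),
  A3 = c("GATAGAC", "ACTATAA", "GATTGCC"),
  A4 = c("GATAGAC", "GATTGCC", "GATTGCC"),
  stringsAsFactors = FALSE)
dads <- data.frame(
  SampleID = c("WI920425", "WI920408", "WI920009"),
  A1 = c("ACTAGAA", "ACTAGAA", "ACTAGAA"),
  A2 = c("ACCATCA", "ACTAGAA", "ACTAGAA"),
  A3 = c("ACTAGAA", "ACTAGAA", "ACTAGAA"),
  A4 = c("ACTAGAA", "ACTAGAA", "GATTGCC"),
  stringsAsFactors = FALSE)

# Character matrices of the four allele columns.
mom_alleles <- as.matrix(moms[, 2:5])
dad_alleles <- as.matrix(dads[, 2:5])

# Pre-allocate one row per female, then fill row by row.
offspring <- matrix(NA_character_, nrow = nrow(moms), ncol = 6,
                    dimnames = list(NULL, c("Mom", "Dad", "M1", "M2", "D1", "D2")))

set.seed(1)
for (i in seq_len(nrow(moms))) {
  rdad <- sample(nrow(dads), 1)                 # random father for this mother
  offspring[i, ] <- c(moms$SampleID[i], dads$SampleID[rdad],
                      sample(mom_alleles[i, ], 2),
                      sample(dad_alleles[rdad, ], 2))
}
```

The resulting character matrix can be passed to write.table() in the same way
as the data frame.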


write.table(output_offspring,"offspring_7.txt",row.names=T,col.names=T,quote=F)


females.txt:

SampleID A1 A2 A3 A4
GM920222 GATTGCC GATTGCC GATAGAC GATAGAC
GM930040 GTCATCA GAGTGCA ACTATAA GATTGCC
GM930040 GTCATCA GAGTGCA ACTATAA GATTGCC
GM960023 GATTGCC GTCATCA GATTGCC GATTGCC
GM920224 ACTAGAA GTCATCA GTCATCA ACTAGAA
GM920224 ACTAGAA GTCATCA GTCATCA ACTAGAA
GM920034 GATTGCC GTCATCA GATTGCA GATTGCA
GM920096 GATTGCC GATTGCC GATTGCA GATTGCC
GM930029 GTCATCA GATTGCC GTCATCA GATTGCC
GM940031 GATTGCC GAGTGCA GATTGCA ACTAGAA
GM960028 GATTGCC GAGTGCA GATTGCA ACTAGAA
GM980007 GTCATCA GATTGCC ACTTGAA GTCATCA
GM970009 ACTAGAA GTCAGAA GTCAGCA ACTAGCA
GM930026 ACTAGAA GAGTGCA GAGTGCA ACTAGAA
GM920031 GATTGCC GTCATCA GATTGCC GATTGCC
GM990105 GATTGCC GATTGCC GTCAGCA GTCAGCA
GM920202 GATTGCC GATTGCC GATTGCC GATTGCC
GM920089 GAGTGCA GTCAGAA ACTATCA GATTGCC
GM980051 ACTAGAA ACTAGAA GATAGCC GATAGCC
GM930109 GTCATCA GAGTGCA GAA ACTAGAA
GM940039 GTCATCA GAGTGCA GTTTGCC ACTTTCA
GM050099 GAGTGCA GTCAGAA GTTATCC ACTTTCA
GM050099 GAGTGCA GTCAGAA GTTATCC ACTTTCA
GM030005 ACTAGAA GAGTGCA ACTAGAA ACTAGAA
GM050009 ACTAGAA GATTGCC GATTGCC ACTAGAA
GM990027 GATTGCC GAGTGCA GATTGCA GATTGCC
GM990066 GATTGCC GTCATCA GTCATCA GATTGCC

males.txt:

SampleID A1 A2 A3 A4
WI920425 ACTAGAA ACCATCA ACTAGAA ACTAGAA
WI920408 ACTAGAA ACTAGAA ACTAGAA ACTAGAA
WI920009 ACTAGAA ACTAGAA ACTAGAA GATTGCC
WI920352 ACTTTCA ACGTTCA GAGAGAA GATTGCA
WI920004 GATTGCC GATTGCC ACTAGAA ACTAGAA
WI920353 ACTAGAA GATTGCC ACTAGAA GATTGCC
WI920410 ACTAGAA GTCAGAA GAGTACC ACTTTCA
WI920007 ACTAGAA ACTTTCA GAATGCA GTTAGAC
WI920015 ACTTTCA ACGTTCA GTCAGAA GATTGCC
WI920426 ACTAGAA GTCATCA GTCATCA ACTAGAA
WI920433 ACTAGAA GTCAGAA GTCTGCA ACTTGCA
WI920370 GATTGCC GAGTGCA GATTGCA ACTAGAA
WI920437 GTCATCA GTCAGAA GATTGCC ACTTTCA
WI920027 GATTGCC GAGTGCA GATTACC GATTGCC
WI920415 GATTGCC GAGTGCA GTCATCA ACT
WI920023 ACTTTCA GTCAGAA GAGATCA GATTGCC
WI920360 GATTGCC GTCATCA GATTGCA ACTTTCA
WI920017 GATTGCC GTCAGAA GATTTCC ACTAGCA
WI920028 GTCATCA GTCAGAA GATTGCC ACTTGCA
WI920361 GATTGCC GAGTGCA GTCAGCA GATTGCC
WI920367 GATTGCC GATTGCC GTCATCA GATTGCC
WI920366 GATTGCC GATTGCC GTCTGCA GTCTGCA
WI920365 GATTGCC GAGTGCA GTCAGCA GTTTGCC
WI920362 GATTGCC GAGTGCA GATTGCA ACTAGAA
WI920441 GATTGCC GAGTGCA GATTGCA ACTAGAA
WI920022 GTCATCA GTCAGAA GATTGCC ACTTTCA
WI920356 GTCATCA GTCAGAA GATTGCC ACTTTCA
WI920355 GATTGCC GATTGCC GATTGCC GATTGCC
WI920423 GATTGCC GATTGCC GATTGCC GATTGCC
WI920021 GATTGCC GATTGCC GTCAGCA GATTGCC
WI920359 GATTGCC GATTGCC GTCAGCA GATTGCC
WI920024 GATTGCC GATTGCC GATTGCC GATTGCC
WI920369 GATTGCC GATTGCC GATTGCC GATTGCC
WI920416 GATTGCC GATTGCC GATTGCC GATTGCC
WI920427 GATTGCC GATTGCC GATTGCC GATTGCC
WI920428 GATTGCC GATTGCC GATTGCC GATTGCC
WI920431 GATTGCC GATTGCC GATTGCC GATTGCC
WI920001 GATTGCC GTCATCA GTCATCA GATTGCC
WI920010 GATTGCC GTCATCA GTCATCA GATTGCC
WI920349 GATTGCC GTCATCA GTCATCA GATTGCC
WI920363 GATTGCC GTCATCA GTCATCA GATTGCC
WI920417 GATTGCC GTCATCA GTCATCA GATTGCC
WI920430 GATTGCC GTCATCA GTCATCA GATTGCC

Re: [R] abline limit constrain x-range how?

2010-05-16 Thread Jim Lemon


On 05/16/2010 12:03 AM, Giovanni Azua wrote:

Hello,

I managed to linearize my LDA decision boundaries. Now I would like to call abline three
times but be able to specify the exact x range. I was reading the documentation, but it doesn't seem to
support this use case. Are there alternatives? The reason I use abline is that I first call
plot to plot all three datasets and then call abline to add these decision
boundary lines to the existing plot ...


Hi Giovanni,
Try the ablineclip function in the plotrix package.

Jim
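
If installing plotrix is not an option, a line restricted to a given x range
can also be drawn with base graphics. The following is a sketch of that
alternative, not of ablineclip itself; the helper clipped_line and its
arguments (intercept a, slope b, limits x1 and x2) are hypothetical names:

```r
# Draw y = a + b * x only between x1 and x2 on the current plot.
clipped_line <- function(a, b, x1, x2, ...) {
  segments(x1, a + b * x1, x2, a + b * x2, ...)
  invisible(c(y1 = a + b * x1, y2 = a + b * x2))   # endpoints, for inspection
}

plot(1:10, 1:10)
ends <- clipped_line(a = 1, b = 0.5, x1 = 2, x2 = 6, col = "red")
```

Calling the helper three times with different intercepts, slopes and limits
would add three bounded lines to the same plot.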

Re: [R] Normalizing plot tick values

2010-05-16 Thread Jim Lemon


On 05/16/2010 03:10 AM, rajesh j wrote:

Hi,

I have a plot whose tick values along the axis have a certain range, 0 to x.
I need to normalize this range without changing my data files. For example,
if my plot has tick values at 10, 20, 30, 40, 50, ... I have to make these 2, 4, 6,
etc., but without changing the plot data. I am hoping I can add something
to the plot command along the lines of "tick values divided by a quantity".
Any help is appreciated.


Hi Rajesh,
I think the axis.mult function in the plotrix package will do what you want.

Jim
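
A base-graphics alternative, sketched here under the assumption that only the
tick labels need rescaling: suppress the default x axis and draw one whose
labels are the tick positions divided by a constant. The data and the divisor 5
are made up to match the 10, 20, 30 -> 2, 4, 6 example:

```r
x <- seq(0, 50, by = 5)       # hypothetical data
y <- sqrt(x)

plot(x, y, xaxt = "n")        # draw the plot without the default x axis
ticks <- axTicks(1)           # tick positions R would have used
axis(1, at = ticks, labels = ticks / 5)   # same positions, rescaled labels
```

The plotted data are untouched; only the text printed at each tick changes.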

Re: [R] Vector recycling and zoo

2010-05-16 Thread David Winsemius



On May 16, 2010, at 2:00 AM, Sean Carmody wrote:

I am a bit confused about the different approaches taken to recycling in
plain data frames and zoo objects. When carrying out simple arithmetic,
data frames seem to recycle single arguments, but zoo objects do not. Here is
an example:


x <- data.frame(a=1:5*2, b=1:5*3)
x

  a  b
1  2  3
2  4  6
3  6  9
4  8 12
5 10 15

x$a/x$a[1]

[1] 1 2 3 4 5

x <- zoo(x)
x$a/x$a[1]

1
1




I feel understanding this difference would lead me to a greater
understanding of the zoo module!


I think you do have misunderstandings about the zoo package but I do  
not think it is in the area of vector recycling. Notice the effect of  
your application of the zoo function to x:


> x$a
 1  2  3  4  5
 2  4  6  8 10
> x$a[1]
1
2

You have in effect transposed the elements in x and are now getting a  
two element column vector when requesting x$a[1].  The term vector  
recycling is applied to situations where short vectors are reused  
starting with their first elements until the necessary length is  
achieved. For instance if you request:


> data.frame(x=1:2, y=letters[1:10])
   x y
1  1 a
2  2 b
3  1 c
4  2 d
5  1 e
6  2 f
7  1 g
8  2 h
9  1 i
10 2 j

Or plot(1:10, col=c("red","green"))
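
A side note on the zoo output: the two printed lines of x$a[1] can be read as
the index (1) above the value (2) of a one-element series, and arithmetic
between two zoo objects is carried out on the intersection of their indexes
rather than by recycling the shorter one. A minimal sketch, assuming the zoo
package is installed; dropping the index with coredata() is one way to get the
data-frame-like result (the names aligned and ratio are illustrative):

```r
library(zoo)

x <- zoo(data.frame(a = 1:5 * 2, b = 1:5 * 3))

# Both operands are zoo series: only index 1 is common to both,
# so the result has a single element.
aligned <- x$a / x$a[1]

# A plain number on the right-hand side carries no index,
# so it is recycled over all five rows.
ratio <- x$a / coredata(x$a[1])
```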



Sean.

--
Sean Carmody


[R] Loading Intraday Time Series Data

2010-05-16 Thread Steve Johns


Hi,

I am trying to load a data file that looks like this:

Date,Time,Open,High,Low,Close,Up,Down
05/02/2001,0030,421.20,421.20,421.20,421.20,11,0
05/02/2001,0130,421.20,421.40,421.20,421.40,7,0
05/02/2001,0200,421.30,421.30,421.30,421.30,0,5
05/02/2001,0230,421.60,421.60,421.50,421.50,26,1
etc.

into an R timeseries or ts object.

The key point is that both the date and time need to become part of the 
index.


With zoo, this line will load the data:

z <- read.zoo("foo_hs.csv", format = "%m/%d/%Y", sep = ",", header = TRUE)

but the Time does not become part of the index this way.  This means the 
index is non-unique, and that is not the goal.


Could someone kindly show me a way, using R itself, to deal with the 
separate Date and Time columns so as to properly combine them into the 
index for the timeseries?


Thanks!

Re: [R] number of location attribute with its name

2010-05-16 Thread David Winsemius



On May 16, 2010, at 2:24 AM, Agustín Muñoz M. (AMFOR) wrote:


Hi everybody, a question: how can I find the position (number) of an
attribute from its name?

E.g.

X1  X2  X3  X4  X5  X6
1   3   5   2   1   7
6   7   4   5   2   9

How can I find out that the attribute X4 is in position 4?


It is probably not an attribute in R terms, but rather a column name.  
Your English is a bit unclear as to what you want but perhaps you want:


names(dataframe_name)[4]

--

Re: [R] RODBC-Error-sqlSave

2010-05-16 Thread Orvalho Augusto

Let us see whether it is an R issue.

Try this:
Import the CSV into MS Access directly, using Access's own import feature.

If that succeeds, we will then check R.

Caveman


On Sun, May 16, 2010 at 11:48 AM, Johan Lassen johanlas...@gmail.com wrote:
 Dear R-community,

 After repeating the sqlSave-command 3 times on a dataframe (of size 13149
 rows * 5 columns) to my MS-Access database I get the following error:

 *Error in sqlSave(channel, eksport_transp_acc_2, transp_acc_scenarier,  :
 unable to append to table ‘transp_acc_scenarier’*
 **
 This means that the first 2 savings are completed, but the third-one
 is somehow not. I have an idea that perhaps it is due to some out-of-memory
 problem. My PC has 2 CPUs, 1.83 G Hz, 0.99 GB RAM.

 Have anyone got some idea of what causes and solves the problem? I have
 tried also with the function *gc()*, but without success.

 Thanks in advance,
 Best regards,
 Johan



 PS:
 I use the following code, where the file *eksport_transp_acc_2_rbind.csv* is
 of size 13149*5:


 library(RODBC)

 eksport_transp_acc_2 <-
 read.table(file = "results/csv/eksport_transp_acc_2_rbind.csv",
  sep = ";", header = T)

 sqlSave(channel, eksport_transp_acc_2,
 "transp_acc_scenarier", append = T, fast = F, rownames = F)





 --
 Johan Lassen

 In the cities people live in time -
 in the mountains people live in space

Re: [R] Loading Intraday Time Series Data

2010-05-16 Thread Gabor Grothendieck

In zoo the index= argument to read.zoo can be a vector of column
indices to indicate that the time is split across multiple columns and
the FUN= argument can be used to process the multiple columns.  In
this example the resulting z uses chron:


L <- "Date,Time,Open,High,Low,Close,Up,Down
05/02/2001,0030,421.20,421.20,421.20,421.20,11,0
05/02/2001,0130,421.20,421.40,421.20,421.40,7,0
05/02/2001,0200,421.30,421.30,421.30,421.30,0,5
05/02/2001,0230,421.60,421.60,421.50,421.50,26,1"

library(zoo)
library(chron)

f <- function(x) chron(paste(x[,1]), sprintf("%04d00", x[,2]),
   format = c("M/D/Y", "HMS"))

# z <- read.zoo("myfile.csv", index = 1:2, sep=",", header = TRUE, FUN  = f)

z <- read.zoo(textConnection(L), index = 1:2, sep=",", header = TRUE, FUN  = f)


On Sun, May 16, 2010 at 7:22 AM, Steve Johns <steve.jo...@verizon.net> wrote:
> Hi,
>
> I am trying to load a data file that looks like this:
>
> Date,Time,Open,High,Low,Close,Up,Down
> 05/02/2001,0030,421.20,421.20,421.20,421.20,11,0
> 05/02/2001,0130,421.20,421.40,421.20,421.40,7,0
> 05/02/2001,0200,421.30,421.30,421.30,421.30,0,5
> 05/02/2001,0230,421.60,421.60,421.50,421.50,26,1
> etc.
>
> into an R timeseries or ts object.
>
> The key point is that both the date and time need to become part of the
> index.
>
> With zoo, this line will load the data:
>
> z <- read.zoo("foo_hs.csv", format = "%m/%d/%Y", sep=",", header = TRUE )
>
> but the Time does not become part of the index this way.  This means the
> index is non-unique, and that is not the goal.
>
> Could someone kindly show me a way, using R itself, to deal with the
> separate Date and Time columns so as to properly combine them into the index
> for the timeseries?
>
> Thanks!


Re: [R] Loading Intraday Time Series Data

2010-05-16 Thread John Kane

Hi Steve,

I think what you want to do is get a unique time-date from the first two 
columns.  

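Try something like this (changing the file name, obviously).
mydates should give you a date-time vector that you can add to the existing
data.frame.

mydata <- read.table("C:/rdata/dates.junk.csv", header=TRUE, sep=",",
 colClasses=c("character","character", "numeric" , "numeric",
   "numeric", "numeric", "numeric", "numeric"))

# combine the Date and Time columns into one string per row
df1 <- paste(mydata[,1], mydata[,2])

# the original poster's format string was month-first;
# if that applies to your file, use "%m/%d/%Y %H%M" here instead
mydates <- strptime(df1, "%d/%m/%Y %H%M")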
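
As a self-contained sketch of the same idea in base R only, the two columns can be pasted together and parsed in one step. The inline sample and the names txt, dat and stamp are illustrative, and the month-first format and the UTC time zone are assumptions (the former taken from the original poster's format = "%m/%d/%Y"):

```r
# Inline copy of the sample rows so the example runs without a file
txt <- "Date,Time,Open,High,Low,Close,Up,Down
05/02/2001,0030,421.20,421.20,421.20,421.20,11,0
05/02/2001,0130,421.20,421.40,421.20,421.40,7,0
05/02/2001,0200,421.30,421.30,421.30,421.30,0,5
05/02/2001,0230,421.60,421.60,421.50,421.50,26,1"

# Read Date and Time as character so the leading zeros in Time survive
dat <- read.csv(text = txt,
                colClasses = c("character", "character", rep("numeric", 6)))

# Paste the two columns and parse them as one date-time
# (assumed month-first and UTC)
stamp <- as.POSIXct(paste(dat$Date, dat$Time),
                    format = "%m/%d/%Y %H%M", tz = "UTC")

# 0 here means every index value is unique
anyDuplicated(stamp)
```

A vector like stamp could then serve as the index of a time series object, which is what makes the index unique.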


--- On Sun, 5/16/10, Steve Johns <steve.jo...@verizon.net> wrote:

> [...]

Re: [R] Questions about ggplot2

2010-05-16 Thread Juliet Hannah

I started with the summarized data, and there are different ways to do
this. For this example, let there be four columns and a corresponding
sum of 1s.

library(ggplot2)
mydf <- data.frame(colname = c("A","B","C","D"), mycolsum = c(1:4))
p <- ggplot(mydf, aes(x = colname, y = mycolsum))
p <- p + geom_bar(stat = "identity")

# Here is one way a legend would be created, and how to remove it.

library(ggplot2)
mydf <- data.frame(colname = c("A","B","C","D"), mycolsum = c(1:4))
p <- ggplot(mydf, aes(fill = colname, x = colname, y = mycolsum))
p <- p + geom_bar(stat = "identity")
p + opts(legend.position = "none")
# (later ggplot2 releases replace opts() with theme(legend.position = "none"))
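
The step from raw 0/1 columns to the summarized data frame used above can be sketched as follows. The data frame raw and its values are made up for illustration, and the plotting call is guarded so that the summary step runs even where ggplot2 is not installed:

```r
# Hypothetical raw data: one 0/1 column per bar
raw <- data.frame(A = c(1, 0, 0, 0),
                  B = c(1, 1, 0, 0),
                  C = c(1, 1, 1, 0),
                  D = c(1, 1, 1, 1))

# One row per column: its name and its count of 1s
mydf <- data.frame(colname = names(raw), mycolsum = unname(colSums(raw)))

# Draw the bars only if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(mydf, ggplot2::aes(x = colname, y = mycolsum)) +
    ggplot2::geom_bar(stat = "identity")
}
```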


On Thu, May 13, 2010 at 11:33 AM, Christopher David Desjardins
<cddesjard...@gmail.com> wrote:
> Hi I have two questions about using ggplot2.
>
> First, I have multiple columns of data that I would like to combine into one
> histogram where each column of data would correspond to one bar in the
> histogram. Each column has 0 or 1s and I want my bars in the histogram to
> correspond to the sum of the 1s in each column. Does that make sense?
>
> Second, is there a way to completely turn off the legend?
>
> Thanks!
> Chris
>
> PS - Please cc me on the email as I'm a digest subscriber.


Re: [R] Vector recycling and zoo

2010-05-16 Thread Gabor Grothendieck

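When you combine zoo objects with arithmetic, zoo first merges them on their
index using all = FALSE, so only index values present in both operands survive.
Here x$a[1] has the single index value 1, so the result has one element:

> library(zoo)
> x <- data.frame(a=1:5*2, b=1:5*3)
> x <- zoo(x); x
   a  b
1  2  3
2  4  6
3  6  9
4  8 12
5 10 15
>
> # these two are the same
>
> x$a/x$a[1]
1
1
>
> m <- merge(x$a, x$a[1], all = FALSE)
> m
  x$a x$a[1]
1   2      2
> m[,1]/m[,2]
1
1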
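
If the aim is the data-frame style result (every element divided by the first), one possible workaround is to strip the index from the divisor so that it is an ordinary number and gets recycled. This is a sketch rather than the only way to do it; the names aligned, first_value and rescaled are illustrative:

```r
library(zoo)

x <- zoo(data.frame(a = 1:5 * 2, b = 1:5 * 3))

# zoo / zoo: operands are aligned on the index, so only index 1 survives
aligned <- x$a / x$a[1]

# zoo / plain number: the scalar is recycled along the whole series
first_value <- coredata(x$a)[1]
rescaled <- x$a / first_value

# values of the rescaled series, without the index
coredata(rescaled)
```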





On Sun, May 16, 2010 at 3:00 AM, Sean Carmody <seancarm...@gmail.com> wrote:
> [...]

[R] Box-Cox Transformation: Drastic differences when varying added constants

2010-05-16 Thread Holger Steinmetz


Dear experts,

I tried to learn about Box-Cox-transformation but found the following thing:

When I had to add a constant to make all values of the original variable
positive, I found that 
the lambda estimates (box.cox.powers-function) differed dramatically
depending on the specific constant chosen.

In addition, the correlation between the transformed variable and the
original were not 1 (as I think it should be to use the transformed variable
meaningfully) but much lower.

With higher added values (and a right-skewed variable) the lambda estimate
was even negative and the correlation between the transformed variable and
the original variable was -.91!!?

I guess there is something fundamental missing in my current thinking about
Box-Cox...

Best,
Holger


P.S. Here is what I did:

# Creating a skewed variable X (mixture of two normals)
x1 = rnorm(120,0,.5)
x2 = rnorm(40,2.5,2)
X = c(x1,x2)

# Adding a small constant
Xnew1 = X +abs(min(X))+ .1
box.cox.powers(Xnew1)
Xtrans1 = Xnew1^.2682 #(the value of the lambda estimate)

# Adding a larger constant
Xnew2 = X +abs(min(X)) + 1
box.cox.powers(Xnew2)
Xtrans2 = Xnew2^-.2543 #(the value of the lambda estimate)

#Plotting it all
par(mfrow=c(3,2))
hist(X)
qqnorm(X)
qqline(X,lty=2) 
hist(Xtrans1)
qqnorm(Xtrans1)
qqline(Xtrans1,lty=2) 
hist(Xtrans2)
qqnorm(Xtrans2)
qqline(Xtrans2,lty=2) 

#correlation among original and transformed variables
round(cor(cbind(X,Xtrans1,Xtrans2)),2)

[R] Reading JPEG file, converting to HEX

2010-05-16 Thread Dennis Fisher

Colleagues,

I am using R to assemble RTF documents (which are plain text).  I need to embed 
a JPEG graphic that was created with R.  I presume that the steps need to be:
a.  read the file into R
b.  convert the object to HEX format
c.  write the converted object to a textfile.  

If I read the file into R using readLines, I get the following (only the first 
5 lines shown):
  > readLines("/path/to/file")
  [1] \xff\xd8\xff\xe0
  [2] \002\xa3\003\001\
  [3] \v\xff\xc4
  [4] \026\027\030\031\032%'()*456789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz\x83\x84\x85\x86\x87\x88\x89\x8a\x92\x93\x94\x95\x96\x97\x98\x99\x9a\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xff\xc4
  [5] \v\xff\xc4

and I also receive a number of warning messages:
 Warning messages:
 1: In encodeString(object, quote = "\"", na.encode = FALSE) :
   it is not known that wchar_t is Unicode on this platform

I assume (naively) that I need some other approach to reading the file.  I also 
presume (again, naively) that once I have read the file successfully, I can 
convert the contents to hex format.  However, it is not obvious to me what 
approach should be used to read the file.  I found the command read.jpeg (in 
rimage; of note, I needed to use the Windows version because the OSX version 
appears to be broken).  However, this command creates an imagematrix, which 
does not appear to be what I need.

Any thoughts?  

Thanks in advance.

Dennis

Dennis Fisher MD
P  (The P Less Than Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com


Re: [R] p value

2010-05-16 Thread Bert Gunter

runif(1) 


Bert Gunter
Genentech Nonclinical Statistics

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Soham
Sent: Saturday, May 15, 2010 9:05 AM
To: r-help@r-project.org
Subject: [R] p value


How to compute the p-value of a statistic generally?

[R] function density (stats): parameter n

2010-05-16 Thread Paulo Barata



Dear R-list members,

About the parameter n of the function density() (Kernel Density
Estimation, package stats):

The R HTML documentation says about the parameter n: "the number
of equally spaced points at which the density is to be estimated.
When n > 512, it is rounded up to the next power of 2 for
efficiency reasons."
Note: 512 is the default size for n.

The code below:

data <- rnorm(500)
d <- density(data, n=800)
length(d$x)

produces this result:
[1] 800

Here, according to the R HTML documentation, d$x gives the n
coordinates of the points where the density is estimated.

Of course, given that n=800, the next power of 2 would be 1024.

With regard to the parameter n, does the R documentation match
what function density() actually does?

I am using R 2.11.0 running on Windows XP.

Thank you very much.

Paulo Barata

--
Paulo Barata
Fundacao Oswaldo Cruz - Oswaldo Cruz Foundation
Rua Leopoldo Bulhoes 1480 - 8A
21041-210  Rio de Janeiro - RJ
Brazil

E-mail: pbar...@infolink.com.br
Alternative e-mail: paulo.bar...@ensp.fiocruz.br
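
One reading that would reconcile the two, offered here as an interpretation rather than as something stated in the documentation text quoted above, is that the rounding applies to the grid used internally for the FFT step, and that the result is then interpolated back onto the n points the caller asked for. The sketch below only checks the observable part; internal_n is an illustrative name:

```r
set.seed(1)
data <- rnorm(500)
d <- density(data, n = 800)

# number of points returned to the caller
length(d$x)

# the power of 2 that 800 would be rounded up to
internal_n <- 2^ceiling(log2(800))
internal_n
```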


Re: [R] plot for linear discriminant

2010-05-16 Thread Hadley Wickham

Hi Giovanni,

Have a look at the classifly package for an alternative approach that
works for all classification algorithms.  If you provided a small
reproducible example, I could step through it for you.

Hadley

On Sat, May 15, 2010 at 6:19 AM, Giovanni Azua <brave...@gmail.com> wrote:
> Hello,
>
> I have a labelled dataset with three classes. I have computed manually the
> LDA hyperplane that separate the classes from each other i.e.
>
> \hat{\delta}_j(x)=x^Tb_j + c_j where b_j \in \mathbb{R}^p and c_j \in \mathbb{R}
>
> my concrete b_j looks like e.g.
> b_j <- rbind(1,2)
> c_j <- 3
>
> How can I plot y=x^Tb_j + c_j ??  two problems:
>
> 1- I need lines and the dimension of my x is 2
> 2- I would like the plotted lines to end when they intersect so they nicely
> show the decision boundaries
>
> Any pointers? maybe an example with ggplot2 I could not find any from the
> showcase documentation page ...
>
> Thanks in advance,
> Best regards,
> Giovanni




-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/


Re: [R] Reading JPEG file, converting to HEX

2010-05-16 Thread jim holtman

Use readBin:

readBin('/path/to/file', 'raw', n=1000000)

Here is an example of reading in a JPEG file:

> x <- readBin(fileName,'raw',n=1000000)
> str(x)
 raw [1:801403] ff d8 ff e1 ...
> # convert to a HEX string (on the first 20 bytes)
> paste(sprintf("%s", x[1:20]), collapse='')
[1] "ffd8ffe1fffe4578696600004d4d002a00000008"
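
The three steps from the question (read the file, convert to hex, write to a text file) can be put together in base R roughly as below. This is a sketch: file_to_hex is a hypothetical helper name, the six bytes are a stand-in for a real JPEG, and wrapping the hex string in RTF markup is not shown:

```r
# Hypothetical helper: return a file's bytes as one lowercase hex string
file_to_hex <- function(path) {
  # use the file size so the whole file is read, whatever its length
  bytes <- readBin(path, what = "raw", n = file.info(path)$size)
  paste(as.character(bytes), collapse = "")
}

# Demo on a small temporary binary file instead of a real image
tmp <- tempfile(fileext = ".bin")
writeBin(as.raw(c(0xff, 0xd8, 0xff, 0xe0, 0x00, 0x10)), tmp)
hex <- file_to_hex(tmp)

# Write the hex text to a plain text file
out <- tempfile(fileext = ".txt")
writeLines(hex, out)
```

Using the file size for n avoids having to guess an upper bound on the number of bytes.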



On Sun, May 16, 2010 at 11:13 AM, Dennis Fisher <fis...@plessthan.com> wrote:

> [...]


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


Re: [R] how to profile R interpreter?

2010-05-16 Thread Erich Neuwirth

Look for Rprof in the utils package.


On 5/12/2010 9:22 PM, xiaoming gu wrote:
> Hi, all. Does anyone know how to profile R interpreter? I've tried gprof but
> it doesn't work. Thanks.
>
> Xiaoming

-- 
Erich Neuwirth, University of Vienna
Faculty of Computer Science
Center for Computer Science Didactics and Learning Research
Visit our SunSITE at http://sunsite.univie.ac.at
Phone: +43-1-4277-39902 Fax: +43-1-4277-39459
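
A minimal sketch of an Rprof session follows, assuming an R build with profiling support. Note that Rprof samples the R-level call stack, which may or may not be what the original question about profiling the interpreter was after. slow_sort and prof_file are illustrative names:

```r
# Hypothetical workload to profile
slow_sort <- function(reps) {
  for (i in seq_len(reps)) sort(runif(1e5))
  invisible(NULL)
}

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)   # start sampling the call stack
slow_sort(100)
Rprof(NULL)        # stop profiling

# Summarise time by function; this can fail if the run was too short
# to collect any samples, hence the tryCatch
prof <- tryCatch(summaryRprof(prof_file), error = function(e) NULL)
if (!is.null(prof)) head(prof$by.self)
```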


Re: [R] Discretize factors?

2010-05-16 Thread Noah Silverman

Thanks,

That gives me exactly what I'm looking for.

Two quick questions:

1) What would be the fastest way to do this if I have other continuous
data as well?  For example, I have a data frame with 10 variables and
want to discretize one of them using this method (say, column 6).
I thought something like this would work, but it gives me an error:

new.data <- rbind(data[,1:5], model.matrix(~0+data[,6]), data[,7:10])

Error in rbind(deparse.level, ...) :
  numbers of columns of arguments do not match




2) What exactly is it doing?  It appears as if it is a formula similar
to lm, but not actually doing any regression?


Thanks again!

-N
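
On both questions, a sketch under assumptions: the pieces have the same number of rows but different numbers of columns, so they need to be joined side by side with cbind() rather than stacked with rbind(); and model.matrix() only expands a formula into the design matrix that lm() would use, without fitting anything. The data frame dat below is made up for illustration:

```r
set.seed(42)

# Hypothetical 10-column data frame (V1..V10) with a factor in column 6
dat <- as.data.frame(matrix(rnorm(60), nrow = 6))
dat[[6]] <- factor(c("A", "B", "B", "C", "C", "C"))

# Design matrix with one 0/1 column per level; "~ 0 +" drops the intercept
dummies <- model.matrix(~ 0 + dat[[6]])
colnames(dummies) <- paste0("V6", levels(dat[[6]]))

# Join column-wise: 5 + 3 + 4 = 12 columns, still 6 rows
new.data <- cbind(dat[, 1:5], dummies, dat[, 7:10])
dim(new.data)
```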


On 5/15/10 11:17 AM, Thomas Stewart wrote:
> Maybe this?
>
> group <- factor(c("A", "B","B","C","C","C"))
> model.matrix(~0+group)
>
> -tgs
>
> On Sat, May 15, 2010 at 2:02 PM, Noah Silverman
> <n...@smartmediacorp.com> wrote:
>
> > Hi,
> >
> > I'm looking for an easy way to discretize factors in R
> >
> > I've noticed that the lm function does this automatically with a nice
> > result.
> >
> > If I have
> >
> > group <- c("A", "B","B","C","C","C")
> >
> > and run:
> >
> > lm(result ~ x1 + group)
> >
> > The lm function has split the group into separate binary variables
> > {0,1}
> > before performing the regression.  I now have:
> > groupA
> > groupB
> > groupC
> >
> > Some of the other models that I want to try won't accept factors, so
> > they need to be discretized this way.
> >
> > Is there a command in R for this, or some easy shortcut?  (I tried
> > digging into the lm code, but couldn't find where this is being done.)
> >
> > Thanks!
> >
> > -N


Re: [R] Box-Cox Transformation: Drastic differences when varying added constants

2010-05-16 Thread Peter Ehlers


On 2010-05-16 6:22, Holger Steinmetz wrote:

> Dear experts,
>
> I tried to learn about Box-Cox-transformation but found the following thing:
>
> When I had to add a constant to make all values of the original variable
> positive, I found that
> the lambda estimates (box.cox.powers-function) differed dramatically
> depending on the specific constant chosen.

Let's say that x is such that 1/x has a Normal distribution,
i.e. lambda = -1.
Then y = (1/x) + b also has a Normal distribution.
But you're expecting 1/(x+b) to also have a Normal distribution,
which in general it will not, so the estimated lambda changes with b.

> In addition, the correlation between the transformed variable and the
> original were not 1 (as I think it should be to use the transformed variable
> meaningfully) but much lower.

Again, your expectation is faulty. The relationship between the
original and transformed variables is not linear (otherwise,
why do the transformation?), but cor() computes the Pearson
correlation coefficient by default. Try method='spearman'.
Better yet, plot the transformed variables vs the original
variable for further enlightenment.

 -Peter Ehlers
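
The point about Pearson versus rank correlation can be checked numerically. The sketch below uses made-up right-skewed data and a hypothetical helper box_cox that applies the usual (x^lambda - 1) / lambda scaling, whereas the posted code raises x to the bare power. With a negative lambda the bare power reverses the ordering, while the scaled form keeps it:

```r
set.seed(1)
x <- rexp(200) + 0.1   # hypothetical right-skewed, positive data

# Box-Cox transform with the usual scaling; log(x) is the lambda = 0 case
box_cox <- function(x, lambda) {
  if (lambda == 0) log(x) else (x^lambda - 1) / lambda
}

raw_power <- x^(-0.25)          # bare power, as in the posted code
scaled <- box_cox(x, -0.25)     # scaled form

pearson_raw <- cor(x, raw_power)                         # negative, not -1
spearman_raw <- cor(x, raw_power, method = "spearman")   # rank correlation
spearman_scaled <- cor(x, scaled, method = "spearman")   # rank correlation
c(pearson_raw, spearman_raw, spearman_scaled)
```

A monotone transformation leaves the ranks intact (or exactly reversed), which is why the rank correlations sit at the extremes while the Pearson coefficient does not.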




Re: [R] Discretize factors?

2010-05-16 Thread Noah Silverman

Update,

I have it working, but now it's producing really ugly labels.  It must need a
small adjustment to the code.  Any ideas?

## Create example data.frame
group <- c("A", "B","B","C","C","C")
a <- c(1,4,3,4,5,6)
b <- c(5,4,5,3,4,5)
d <- data.frame(cbind(a,b,group))

# create new frame with discretized group
cbind(d[,1:2], model.matrix(~0+d[,3]) )
  a b d[, 3]A d[, 3]B d[, 3]C
1 1 5       1       0       0
2 4 4       0       1       0
3 3 5       0       1       0
4 4 3       0       0       1
5 5 4       0       0       1
6 6 5       0       0       1


So, as you can see, it works, but the labels for the groups don't 

I then tried using the column name instead of the number and still got ugly
results:

> cbind(d[,1:2], model.matrix(~0+d[,"group"]) )
  a b d[, "group"]A d[, "group"]B d[, "group"]C
1 1 5             1             0             0
2 4 4             0             1             0
3 3 5             0             1             0
4 4 3             0             0             1
5 5 4             0             0             1
6 6 5             0             0             1



Any ideas?

-N
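
One way to get cleaner labels, sketched here with the same toy data, is to name the variable in the formula and pass the data frame through the data argument, so that the column names are built from the variable name plus the level. Building the frame with data.frame(a, b, ...) instead of data.frame(cbind(...)) also keeps a and b numeric. The names dummies and new_d are illustrative:

```r
group <- c("A", "B", "B", "C", "C", "C")
a <- c(1, 4, 3, 4, 5, 6)
b <- c(5, 4, 5, 3, 4, 5)

# Keep a and b numeric and make group a factor
d <- data.frame(a, b, group = factor(group))

# Refer to the column by name and supply the data frame separately
dummies <- model.matrix(~ 0 + group, data = d)

new_d <- cbind(d[, c("a", "b")], dummies)
colnames(new_d)
```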



On 5/15/10 11:02 AM, Noah Silverman wrote:
> [...]

Re: [R] Splines under tension

2010-05-16 Thread sam.e


Thank you for the helpful direction to the smoothing splines function; it is
exactly what I am trying to do. My data, however, are 3-D:
I have x and y values, which are coordinates for different field sites,
and z values, which are what I am really interested in analysing with
interpolation. This has posed a problem with many of the spline functions in
R. Even if I input my coordinate data as a matrix as the 'x' value and my
site data as the 'y' values, I get the following error:

Error in xy.coords(x, y) : 'x' and 'y' lengths differ

I have made sure that there are the same number of values and that they are
all of the same type, i.e. numeric, but with little luck, and I am a bit lost
as to what to try next. Does anyone have any suggestions?

Thanks, 

Sam

Re: [R] plot for linear discriminant

2010-05-16 Thread Giovanni Azua

Hello Hadley,

Thank you very much for your help! I have just received your book btw :)

On May 16, 2010, at 6:16 PM, Hadley Wickham wrote:
> Hi Giovanni,
>
> Have a look at the classifly package for an alternative approach that
> works for all classification algorithms.  If you provided a small
> reproducible example, I could step through it for you.
>
> Hadley

Please find below a self contained example. I managed to complete the task 
using the graphics package. I would be curious to see how to get one of those 
really nice ggplot2 graphs with decision boundaries and class regions :) 

Thank you!
Best regards,
Giovanni

# =============================================================================
# (1) Generate sample labelled data
# =============================================================================

rm(list=ls())                                  # clear workspace
library(mvtnorm)                               # needed for rmvnorm
set.seed(11)                                   # predictability of results

sigma <- cbind(c(0.5, 0.3), c(0.3, 0.5))       # true covariance matrix
mu <- matrix(0,nrow=3,ncol=2)
mu[1,] <- c(3, 1.5)                            # true mean vectors
mu[2,] <- c(4,   4)
mu[3,] <- c(8.5, 2)
x <- matrix(0, nrow = 300, ncol = 3)
x[,3] <- rep(1:3, each = 100)                  # class labels
x[1  :100,1:2] <- rmvnorm(n = 100, mean = mu[1,], sigma = sigma)   # simulate data
x[101:200,1:2] <- rmvnorm(n = 100, mean = mu[2,], sigma = sigma)
x[201:300,1:2] <- rmvnorm(n = 100, mean = mu[3,], sigma = sigma)

# =============================================================================
# (2) Plot the labelled data
# =============================================================================

##
## Function for plotting the data separated by classes, hacked out of predplot:
## http://stat.ethz.ch/teaching/lectures/FS_2010/CompStat/predplot.R
##
plotclasses <- function(x, main = "", len = 200, ...) {
    xp <- seq(min(x[,1]), max(x[,1]), length=len)
    yp <- seq(min(x[,2]), max(x[,2]), length=len)
    grid <- expand.grid(xp, yp)
    colnames(grid) <- colnames(x)[-3]
    plot(x[,1],x[,2],col=x[,3],pch=x[,3],main=main,xlab='x_1',ylab='x_2')
    text(2.5,4.8,"Class 1",cex=.8)             # class 1
    text(4.2,1.0,"Class 2",cex=.8)             # class 2
    text(8.0,0.5,"Class 3",cex=.8)             # class 3
}

plotclasses(x)

# =============================================================================
# (3) Functions needed: calculate separating hyperplane between two given
# classes and converting hyperplanes to line equations for the p=2 case
# =============================================================================

##
## Returns the coefficients for the hyperplane that separates one class from
## another.
## Computes the coefficients according to the formula:
## $x^T\hat{\Sigma}^{-1}(\hat{\mu}_0-\hat{\mu}_1) - \frac{1}{2}(\hat{\mu}_0 +
## \hat{\mu}_1)^T\hat{\Sigma}^{-1}(\hat{\mu}_0-\hat{\mu}_1)+\log(\frac{p_0}{p_1})$
##
## sigmainv(DxD) - precalculated sigma (covariance matrix) inverse
## mu1(1xD) - precalculated mu mean for class 1
## mu2(1xD) - precalculated mu mean for class 2
## prior1 - precalculated prior probability for class 1
## prior2 - precalculated prior probability for class 2
##
ownldahyperplane <- function(sigmainv,mu1,mu2,prior1,prior2) {
    J <- nrow(mu)                              # number of classes
    b <- sigmainv%*%(mu1 - mu2)
    c <- -(1/2)*t(mu1 + mu2)%*%sigmainv%*%(mu1 - mu2) + log(prior1/prior2)
    return(list(b=b,c=c))
}

##
## Returns linear betas (intercept and slope) for the given hyperplane
## structure.
## The structure is a list that matches the output of the function defined
## above.
##
ownlinearize <- function(sephyp) {
    return(list(beta0=-sephyp$c/sephyp$b[2],   # line intercept and slope
                beta1=-sephyp$b[1]/sephyp$b[2]))
}

# =============================================================================
# (4) Run lda
# =============================================================================

library(MASS)                                  # needed for lda/qda

# read in a function that plots
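
The conversion from a two-dimensional hyperplane to a plottable line, which ownlinearize performs above, can be checked on toy numbers. In this sketch hyperplane_to_line is a hypothetical stand-alone helper, and the coefficients are the b_j = (1, 2), c_j = 3 example from the original question in this thread:

```r
# Hypothetical helper: turn b1*x1 + b2*x2 + c = 0 into x2 = beta0 + beta1*x1
hyperplane_to_line <- function(b, c) {
  stopifnot(length(b) == 2, b[2] != 0)
  list(beta0 = -c / b[2], beta1 = -b[1] / b[2])
}

b <- c(1, 2)   # toy coefficients
c0 <- 3
ln <- hyperplane_to_line(b, c0)

# Any point on the line should satisfy the hyperplane equation
x1 <- 4
x2 <- ln$beta0 + ln$beta1 * x1
residual <- sum(b * c(x1, x2)) + c0
residual

# With a scatter plot already open, the line could be drawn with:
# abline(a = ln$beta0, b = ln$beta1)
```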

Re: [R] Attempt to customise the plotpc() function

2010-05-16 Thread Nikos Alexandris

Nikos Alexandris:
> Among the (R-)tools, I've seen on the net, for (bivariate) Principal
> Component scatter plots (+histograms), plotpc [1] is the one I like
> most.
>
> [...]
>
> I started the modification by attempting first to get a prcomp version of
> plotpc() (named it plotpc.svd()) by altering the following:
>
> [...]
>
> I am bit lost now about where I should continue looking for required
> modifications in the code. Any hints?

Once again I am replying to myself ;-)

I've spent many, many hours searching in manuals and on the net, and trying
crazy things, to understand where this mystical (to me) function un() could
be sourced from. Eventually I decided to replace all of its occurrences with
unit(), as I was sure this was the function meant to be used, but I hit another
wall: another unknown function, my.plot.something(). I was then sure that they
are sourced from somewhere.

At some point I was enlightened and had a look in the source file plotpc.R, where
I found just what I was expecting to find ;-). It may look silly, but to the
inexperienced useR it is not. I was only looking at the print-out of plotpc
(without the parentheses) and was puzzled that the un() function was nowhere
to be found.

  - Question: why isn't the whole source of plotpc.R printed out with
plotpc? Or why isn't any clue given about where this un() function is coming
from? methods() and getAnywhere() say nothing about it.

  * Note to self: look at the Source.R

Anyhow, it was a long learning-night-session and I am good to go. In fact, I
am very close to what I want to do. I've added the option to use either
prcomp or princomp, as well as whether the input dataset should be centered
and/or scaled. I still have a strange issue with how the plotted
histograms (of PC1 and PC2), along with their text, are flipped. Hopefully I'll
fix this too.

I will soon be contacting the author to send him my modifications as
enhancement wishes. If anybody is interested to know or help, please post here.

Regards, Nikos


Re: [R] Attempt to customise the plotpc() function

2010-05-16 Thread Peter Ehlers


Nikos,

I think you can just replace the line

 pc <- princomp(x[,1:2], scores=TRUE, na.action=na.fail)

with

 pc <- prcomp(x[,1:2], retx=TRUE, center=pc.center,
              scale.=pc.scale, na.action=na.fail)

and rename the components of pc

 names(pc) <- c('sdev', 'loadings', 'center', 'scale', 'scores')

and then use the rest of the plotpc() code as is (except for
maybe having to use flip1=TRUE, etc).

As to why other functions used in plotpc() are not printed
when you ask R to print plotpc(): why should they be? Can you
imagine the mess that would result if you got the printouts of
is.na(), pushViewport, popViewport, ...? Egad!

Anyway, as you've discovered, when you want to modify code, look
at the sources.

 -Peter Ehlers
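
The renaming step can be tried on its own with a built-in dataset. This is a sketch using iris as stand-in data, with centering on and scaling off chosen arbitrarily; it only shows that a prcomp result can be relabelled so that code written for princomp finds components named loadings and scores:

```r
# Stand-in two-column data
x <- as.matrix(iris[, 1:2])

pc <- prcomp(x, retx = TRUE, center = TRUE, scale. = FALSE)

# Component names as returned by prcomp
original_names <- names(pc)

# Relabel to the names that princomp-based code expects
names(pc) <- c("sdev", "loadings", "center", "scale", "scores")

dim(pc$scores)     # one row per observation, one column per component
dim(pc$loadings)   # one row per variable, one column per component
```

The standard deviations reported by the two functions need not agree exactly, since they use different divisors, so any code that relies on sdev deserves a second look.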


On 2010-05-16 12:05, Nikos Alexandris wrote:

Nikos Alexandris:

Among the (R-)tools, I've seen on the net, for (bivariate) Principal
Component scatter plots (+histograms), plotpc [1] is the one I like
most.


[...]


I started the modification by attempting first to get a prcomp version of
plotpc() (named it plotpc.svd()) by altering the following:


[...]


I am bit lost now about where I should continue looking for required
modifications in the code. Any hints?


Once again I am replying to myself ;-)

I've spent many, many hours searching in manuals and on the net, and trying
crazy things, to understand where this mystical (to me) function un() could
be sourced from. Eventually I decided to replace all of its occurrences with
unit(), as I was sure this was the function meant to be used, but I hit another
wall: another unknown function, my.plot.something(). I was then sure that they
are sourced from somewhere.

At some point I was enlightened and had a look in the source plotpc.R, where
I found just what I was expecting to find ;-). It may look silly but to the
inexperienced useR it is not. I was only looking at the print-out of plotpc
(without the parentheses) and was puzzled that the un() function was nowhere
to be found.

   - Question: why isn't the whole source of plotpc.R printed out with
plotpc? Or why isn't any clue given where this un() function is coming
from? methods() and getAnywhere() say nothing about it.

   * Note to self: look at the Source.R

Anyhow, it was a long learning-night-session and I am good to go. In fact, I
am very close to what I want to do. I've added the option to use either
prcomp or princomp, as well as whether the input dataset should be centered
and/or scaled. I still have a strange issue with how the plotted
histograms (of PC1 and PC2) along with their text are flipped. Hopefully I'll
fix this too.

I will contact the author soon to send him my modifications as
enhancement requests. If anybody is interested or wants to help, please post here.

Regards, Nikos


Re: [R] Discretize factors?

2010-05-16 Thread Peter Ehlers


On 2010-05-16 11:06, Noah Silverman wrote:

Update,

I have it working, but now it's producing really ugly labels.  Must be a
small adjustment to the code.  Any ideas??

##Create example data.frame
group <- c("A", "B","B","C","C","C")
a <- c(1,4,3,4,5,6)
b <- c(5,4,5,3,4,5)
d <- data.frame(cbind(a,b,group))

#create new frame with discretized group

cbind(d[,1:2], model.matrix(~0+d[,3]) )

  a b d[, 3]A d[, 3]B d[, 3]C
1 1 5       1       0       0
2 4 4       0       1       0
3 3 5       0       1       0
4 4 3       0       0       1
5 5 4       0       0       1
6 6 5       0       0       1


So, as you can see, it works, but the labels for the groups don't

I then tried using the column name instead of number and still got ugly
results:


cbind(d[,1:2], model.matrix(~0+d[,"group"]) )

  a b d[, "group"]A d[, "group"]B d[, "group"]C
1 1 5             1             0             0
2 4 4             0             1             0
3 3 5             0             1             0
4 4 3             0             0             1
5 5 4             0             0             1
6 6 5             0             0             1



Any ideas?



Can't you just use names(...) <- c() on your final dataframe?

 -Peter Ehlers
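
A minimal sketch of that renaming, assuming the example data frame with group
stored as a factor. Rather than typing every name, the unwanted prefix is
replaced in one step with sub(); the replacement prefix "group" is an
arbitrary choice.

```r
d <- data.frame(a = c(1, 4, 3, 4, 5, 6),
                b = c(5, 4, 5, 3, 4, 5),
                group = factor(c("A", "B", "B", "C", "C", "C")))

mm <- model.matrix(~ 0 + d[, 3])
colnames(mm)   # names carry the deparsed expression, e.g. "d[, 3]A"

# Swap the "d[, 3]" prefix for a readable one in all columns at once.
colnames(mm) <- sub("d[, 3]", "group", colnames(mm), fixed = TRUE)

out <- cbind(d[, 1:2], mm)
names(out)
```

Because the prefix is replaced by pattern, the same line works however many
indicator columns there are.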


-N



On 5/15/10 11:02 AM, Noah Silverman wrote:

Hi,

I'm looking for an easy way to discretize factors in R

I've noticed that the lm function does this automatically with a nice
result.

If I have

group <- c("A", "B","B","C","C","C")

and run:

lm(result ~ x1 + group)

The lm function has split the group into separate binary variables {0,1}
before performing the regression.  I now have:
groupA
groupB
groupC

Some of the other models that I want to try won't accept factors, so
they need to be discretized this way.

Is there a command in R for this, or some easy shortcut?  (I tried
digging into the lm code, but couldn't find where this is being done.)

Thanks!

-N




Re: [R] Discretize factors?

2010-05-16 Thread Noah Silverman

I could, but with close to 100 columns, it's messy.



Re: [R] Attempt to customise the plotpc() function

2010-05-16 Thread Nikos Alexandris

Peter Ehlers wrote:
 Nikos,
 
 I think you can just replace the line
 
   pc <- princomp(x[,1:2], scores=TRUE, na.action=na.fail)
 
 with
 
  pc <- prcomp(x[,1:2], retx=TRUE, center=pc.center,
   scale.=pc.scale, na.action=na.fail)
 
 and rename the components of pc
 
  names(pc) <- c('sdev', 'loadings', 'center', 'scale', 'scores')

Right. Yet, it is still not enough. I had to change the definition of the
limits that feed viewport, mainly because of the huge difference between an
unscaled and a scaled dataset before the PC analysis takes place.

Because I want to give myself the option of really informative plots, I've
added an extra grid.points() call that, when the data are transformed
(centered and/or scaled), prints both the original and the transformed point
cloud (with another pch and/or color).
 
 and then use the rest of the plotpc() code as is (except for
 maybe having to use flip1=TRUE, etc).

Hmm... I am _now_ working on it to understand how I could make this 
automatic!.

If I give flip1, flip2 (=TRUE) the histograms are located where they should 
(optically) be printed but the text (rotation angle) that accompanies the 
histogram is I think not correct. It is quite the opposite angle that is being 
printed.

Any ideas?
 
 As to why other functions used in plotpc() are not printed
 when you ask R to print plotpc(): why should they be? Can you
 imagine the mess that would result if you got the printouts of
 is.na(), pushViewport, popViewport, ...? Egad!

Thank you Peter. I understand it now. [ Ignorant me, but if you don't know
something you will probably make mistakes (which is, after all, the learning
process). ]
 
 Anyway, as you've discovered, when you want to modify code, look
 at the sources.

Thank you Peter. Kindest regards, Nikos


Re: [R] Attempt to customise the plotpc() function

2010-05-16 Thread Nikos Alexandris

Peter Ehlers wrote:
  and then use the rest of the plotpc() code as is (except for
  maybe having to use flip1=TRUE, etc).

Nikos:
 Hmm... I am _now_ working on it to understand how I could make this
 automatic!.
 
 If I give flip1, flip2 (=TRUE) the histograms are located where they should
 (optically) be printed but the text (rotation angle) that accompanies the
 histogram is I think not correct. It is quite the opposite angle that is
 being printed.
 
 Any ideas?

I was wrong. flip is about the location only and has nothing to do with the 
rotation angle of the principal components (right?). So it's ok.

Maybe there is still a way to auto-define the flips? But it's not worth 
spending time on it...

Thanks again, Nikos


Re: [R] RODBC-Error-sqlSave

2010-05-16 Thread Johan Lassen

Thank you so much for pointing out this obvious check of the MS Access
database! Inspired, I tried to import the csv file directly into the MS
Access database and I encountered an error saying (freely translated from
Danish): Cannot find search key.
The MS Access database is in MS Access 2000 format and I run MS Office 2007
on my machine. Hence I tried to make a new MS Access database in 2002-2003
format and did the same operations in R. With this new set-up for the
database I had no problems at all saving the large dataframe from R to the
new database. It even saved much larger dataframes quickly.

So somehow, setting the database up in 2002-2003 format solved the problem
for me. Thank you very much!

2010/5/16 Orvalho Augusto orvaq...@gmail.com

 Let us see if it is an R issue.

 Try this:
 Import the CSV into MS Access directly.

 If you succeed, we will check R then.

 Caveman


 On Sun, May 16, 2010 at 11:48 AM, Johan Lassen johanlas...@gmail.com
 wrote:
  Dear R-community,
 
  After repeating the sqlSave command 3 times on a dataframe (of size 13149
  rows * 5 columns) to my MS Access database, I get the following error:

  Error in sqlSave(channel, eksport_transp_acc_2, "transp_acc_scenarier",  :
    unable to append to table transp_acc_scenarier

  This means that the first 2 saves are completed, but the third one
  somehow is not. I suspect that it is due to some out-of-memory
  problem. My PC has 2 CPUs, 1.83 GHz, 0.99 GB RAM.

  Does anyone have an idea of what causes the problem and how to solve it?
  I have also tried the function gc(), but without success.

  Thanks in advance,
  Best regards,
  Johan

  PS:
  I use the following code, where the file eksport_transp_acc_2_rbind.csv
  is of size 13149*5:

  library(RODBC)
  eksport_transp_acc_2 <-
    read.table(file = "results/csv/eksport_transp_acc_2_rbind.csv",
               sep = ";", header = T)
  sqlSave(channel, eksport_transp_acc_2,
          "transp_acc_scenarier", append = T, fast = F, rownames = F)
 
 
 
 
 
 
 




-- 
Johan Lassen

In the cities people live in time -
in the mountains people live in space (Budistisk munk).


Re: [R] how to profile R interpreter?

2010-05-16 Thread Sharpie



Erich Neuwirth wrote:
 
 Look for Rprof in the utils package.
 

This was already suggested, but the original poster clarified that he is
looking to profile the R interpreter itself, not R scripts.

-
Charlie Sharpsteen
Undergraduate-- Environmental Resources Engineering
Humboldt State University

[R] GSA getting ngenes into the list

2010-05-16 Thread Loren Engrav

Greetings

I suppose this is a simple matter for R experts, but for me...

I am using GSA and have the GSA.obj (class GSA), created with GSA, which
contains the value ngenes, visible with GSA.obj$ngenes.

I also have the list object (class list), created with GSA.listsets, which
contains
Table of negative sets with score, p and FDR
Table of positive sets with same

But the list object does not contain ngenes.

How do I get ngenes into the list?

Suggestions appreciated.

Thank you.

-- 
Loren Engrav, MD
Professor and Chief, Plastic Surgery, 1977-2001
Associate Director, Burn Center, 1977-2001
Univ Washington
Seattle



Re: [R] Discretize factors?

2010-05-16 Thread Thomas Stewart

Maybe this will lead you to an acceptable solution.  Note that I changed how
the data set is created.  (In your example, the numeric variables were being
converted to factor variables.  It seems to me that you want something
different.)  The key difference between my code and yours is that I use the
variable name in the model matrix function; that is, I use ~0+grp instead of
~0+d[,3].  As seen below, this change creates non-ugly results.

> grp <- c("A", "B","B","C","C","C")
> a <- c(1,4,3,4,5,6)
> b <- c(5,4,5,3,4,5)
> d <- data.frame(a=a,b=b,grp=grp)
>
> str(d)
'data.frame':   6 obs. of  3 variables:
 $ a  : num  1 4 3 4 5 6
 $ b  : num  5 4 5 3 4 5
 $ grp: Factor w/ 3 levels "A","B","C": 1 2 2 3 3 3
>
> d <- cbind(d,model.matrix(~0+grp,data=d))
>
> d
  a b grp grpA grpB grpC
1 1 5   A    1    0    0
2 4 4   B    0    1    0
3 3 5   B    0    1    0
4 4 3   C    0    0    1
5 5 4   C    0    0    1
6 6 5   C    0    0    1
> str(d)
'data.frame':   6 obs. of  6 variables:
 $ a   : num  1 4 3 4 5 6
 $ b   : num  5 4 5 3 4 5
 $ grp : Factor w/ 3 levels "A","B","C": 1 2 2 3 3 3
 $ grpA: num  1 0 0 0 0 0
 $ grpB: num  0 1 1 0 0 0
 $ grpC: num  0 0 0 1 1 1

If you are trying to automate the process---convert factor variables to
dummy variables without direct user input of variables names---you have
several options.  Here is a quick function I wrote that you may have to
alter for your own needs.

-tgs

grp <- c("A", "B","B","C","C","C")
sex <- c("m","m","m","f","f","f")
educ <- c("none","some","some","grad","law","med")
a <- c(1,4,3,4,5,6)
b <- c(5,4,5,3,4,5)
d <- data.frame(a=a,b=b,grp=grp,sex=sex,educ=educ)

Factors.to.dummies <- function(data){
  Factor.Flag <- sapply(data,is.factor)
  formula <- paste("~0+",paste(colnames(data)[Factor.Flag],collapse="+"),sep="")
  data2 <- model.matrix(as.formula(formula),data=data)
  return(cbind(data,data2))}

Factors.to.dummies(d)
  a b grp sex educ grpA grpB grpC sexm educlaw educmed educnone educsome
1 1 5   A   m none    1    0    0    1       0       0        1        0
2 4 4   B   m some    0    1    0    1       0       0        0        1
3 3 5   B   m some    0    1    0    1       0       0        0        1
4 4 3   C   f grad    0    0    1    0       0       0        0        0
5 5 4   C   f  law    0    0    1    0       1       0        0        0
6 6 5   C   f  med    0    0    1    0       0       1        0        0
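
A hedged variant of the function above, in case the text columns are stored
as character rather than factor (the default for data.frame() from R 4.0.0
onward, where stringsAsFactors = FALSE). The name factors_to_dummies and the
use of reformulate() are choices made for this sketch, not part of the
original code.

```r
factors_to_dummies <- function(data) {
  # Treat both factor and character columns as categorical.
  is_cat <- vapply(data, function(col) is.factor(col) || is.character(col),
                   logical(1))
  if (!any(is_cat)) return(data)

  # Work on a copy so the original columns are returned unchanged.
  tmp <- data
  tmp[is_cat] <- lapply(tmp[is_cat], factor)

  # Build ~ 0 + var1 + var2 + ... from the column names.
  fo <- reformulate(names(tmp)[is_cat], intercept = FALSE)
  cbind(data, model.matrix(fo, data = tmp))
}

d <- data.frame(a = c(1, 4, 3), grp = c("A", "B", "B"), sex = c("m", "f", "m"))
res <- factors_to_dummies(d)
names(res)
```

As in the original, only the first categorical variable gets a full set of
indicator columns; later ones drop their first level because of the default
treatment contrasts.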


Re: [R] Discretize factors?

2010-05-16 Thread Peter Ehlers


And if you do have many variables in one dataframe, you might
wish to construct the formula first using paste():

 nm <- c(0, names(d)[-c(1,2)])
 fo <- as.formula(paste("~", paste(nm, collapse="+")))
 d <- cbind(d, model.matrix(fo, data=d))

 -Peter Ehlers


[R] predict.lda breaks when priors are specified

2010-05-16 Thread Andrew Redd

Dear R help,

What am I doing wrong here? When I don't specify the priors it works
just fine, but when I specify the priors it breaks.  Does anyone know
why and how I can fix it?

> N=2
> ncontrol=ncases=50
> X <- as.matrix(rnorm(N,0,1))
> eta <- -5.3 + X * 1.7
> p <- exp(eta)/(1+exp(eta))
> Y <- rbinom(N,1,p)
> controls <- sample(seq_len(N), ncontrol, prob=!Y)
> cases <- sample(seq_len(N), ncases, prob=Y)
> data <- rbind(
+ data.frame(Y = 0, X = cbind(1,X[controls,])),
+ data.frame(Y = 1, X = cbind(1,X[cases,])))
> head(data)
  Y X.1        X.2
1 0   1  0.6965323
2 0   1 -0.0817520
3 0   1  2.8673412
4 0   1 -0.2351386
5 0   1  0.2653452
6 0   1 -1.2437612
> m <- lda(Y~X,subset=c(controls,cases),priors=c(.95,.05))
> predict(m)
Error in model.frame.default(formula = Y ~ X, priors = c(0.95, 0.05),  :
  variable lengths differ (found for '(priors)')
> predict(m,prior=c(.95,0.05))
Error in model.frame.default(formula = Y ~ X, priors = c(0.95, 0.05),  :
  variable lengths differ (found for '(priors)')
---
Thanks,
Andrew


Re: [R] predict.lda breaks when priors are specified

2010-05-16 Thread Andrew Redd

Never mind. Stupid misplaced 's'.
-Andrew
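
For the record, the argument of lda() in MASS is spelled prior, not priors. A
minimal sketch with the built-in iris data follows (the prior values are
arbitrary illustrations). Judging by the error message in the original post,
the misspelled priors= appears to be accepted silently when fitting and only
trips up the model.frame() call made later by predict().

```r
library(MASS)

# 'prior' (singular) sets the class prior probabilities, in the order of the
# factor levels; the values below are arbitrary.
fit <- lda(Species ~ ., data = iris, prior = c(0.5, 0.25, 0.25))
fit$prior

pred <- predict(fit)        # predictions for the training data
table(pred$class, iris$Species)

# predict() also accepts 'prior' to override the priors stored in the fit.
pred2 <- predict(fit, prior = c(1, 1, 1) / 3)
```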


Re: [R] Vector recycling and zoo

2010-05-16 Thread Sean Carmody

Thanks David,

Your comment made me realise that whereas when x is a data frame, x$a is a
numeric vector, when x is of class zoo, x$a is also of class zoo, so the
following does what I was expecting:

x$a/as.numeric(x$a[1])

Sean.

On Sun, May 16, 2010 at 9:25 PM, David Winsemius dwinsem...@comcast.net wrote:


 On May 16, 2010, at 2:00 AM, Sean Carmody wrote:

  I am a bit confused about the different approaches taken to recycling in
 plain data frames and zoo objects. When carrying out simple arithmetic,
 dataframe seem to recycle single arguments, zoo objects do not. Here is an
 example

  x <- data.frame(a=1:5*2, b=1:5*3)
 x

  a  b
 1  2  3
 2  4  6
 3  6  9
 4  8 12
 5 10 15

 x$a/x$a[1]

 [1] 1 2 3 4 5

 x <- zoo(x)
 x$a/x$a[1]

 1
 1



 I feel understanding this difference would lead me to a greater
 understanding of the zoo module!


 I think you do have misunderstandings about the zoo package but I do not
 think it is in the area of vector recycling. Notice the effect of your
 application of the zoo function to x:

  x$a

  1  2  3  4  5
  2  4  6  8 10
  x$a[1]
 1
 2

 You have in effect transposed the elements in x and are now getting a two
 element column vector when requesting x$a[1].  The term vector recycling is
 applied to situations where short vectors are reused starting with their
 first elements until the necessary length is achieved. For instance if you
 request:

  data.frame(x=1:2, y=letters[1:10])
   x y
 1  1 a
 2  2 b
 3  1 c
 4  2 d
 5  1 e
 6  2 f
 7  1 g
 8  2 h
 9  1 i
 10 2 j

 Or plot(1:10, col=c("red","green"))


 Sean.

 --
 Sean Carmody






-- 
Sean Carmody
Twitter: http://twitter.com/seancarmody
Stable: http://mulestable.net/sean

The Stubborn Mule
Blog: http://www.stubbornmule.net
Forum: http://mulestable.net/


Re: [R] Vector recycling and zoo

2010-05-16 Thread Gabor Grothendieck

Normally that would be written like this using the coredata extraction
function which extracts the data portion of a zoo object:

x$a / coredata( x$a[1] )
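
A small sketch of what is going on, assuming the zoo package is installed:
arithmetic between two zoo objects is carried out only on the index values
they have in common, so dividing by the one-observation series x$a[1] leaves
just that one observation. Stripping the index from the divisor turns it into
a plain number that is recycled as usual.

```r
library(zoo)

x <- zoo(cbind(a = 1:5 * 2, b = 1:5 * 3))

# Both operands are zoo objects: they are aligned on their common index,
# which here is only the first time point.
r1 <- x$a / x$a[1]
length(r1)

# coredata() drops the index, so the divisor is an ordinary numeric value
# and every element of x$a is divided by it.
r2 <- x$a / coredata(x$a[1])
coredata(r2)
```

as.numeric() achieves the same thing here; coredata() is the zoo accessor
intended for extracting the data part.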


Re: [R] Vector recycling and zoo

2010-05-16 Thread Gabor Grothendieck

Or even:

with(x, a / coredata(a[1]) )


On Sun, May 16, 2010 at 7:48 PM, Gabor Grothendieck
ggrothendi...@gmail.com wrote:
 Normally that would be written like this using the coredata extraction
 function which extracts the data portion of a zoo object:

 x$a / coredata( x$a[1] )

 On Sun, May 16, 2010 at 7:32 PM, Sean Carmody seancarm...@gmail.com wrote:
 Thanks David,

 You comment made me realise that whereas when x is a data frame, x$a is a
 numeric vector,
 when x is of class zoo, x$a is also of class zoo, so the following does what
 I was expecting:

 x$a/as.numeric(x$a[1])

 Sean.

 On Sun, May 16, 2010 at 9:25 PM, David Winsemius 
 dwinsem...@comcast.netwrote:


 On May 16, 2010, at 2:00 AM, Sean Carmody wrote:

  I am a bit confused about the different approaches taken to recycling in
 plain data frames and zoo objects. When carrying out simple arithmetic,
 dataframe seem to recycle single arguments, zoo objects do not. Here is an
 example

  x - data.frame(a=1:5*2, b=1:5*3)
 x

  a  b
 1  2  3
 2  4  6
 3  6  9
 4  8 12
 5 10 15

 x$a/x$a[1]

 [1] 1 2 3 4 5

 x - zoo(x)
 x$a/x$a[1]

 1
 1



 I feel understanding this difference would lead me to a greater
 understanding of the zoo module!


 I think you do have misunderstandings about the zoo package but I do not
 think it is in the area of vector recycling. Notice the effect of your
 application of the zoo function to x:

  x$a

  1  2  3  4  5
  2  4  6  8 10
  x$a[1]
 1
 2

 You have in effect transposed the elements in x and are now getting a two
 element column vector when requesting x$a[1].  The term vector recycling is
 applied to situations where short vectors are reused starting with their
 first elements until the necessary length is achieved. For instance if you
 request:

  data.frame(x=1:2, y=letters[1:10])
   x y
 1  1 a
 2  2 b
 3  1 c
 4  2 d
 5  1 e
 6  2 f
 7  1 g
 8  2 h
 9  1 i
 10 2 j

 Or plot(1:10, col=c("red","green"))


 Sean.



[R] sapply code

2010-05-16 Thread Roslina Zakaria

Hi r-users,
 
I have this code here, but I just wonder how do I use 'sapply' to make it more 
efficient

lamda_cor <- eigen(winter_cor)$values
 
 lamda_cor
[1] 1.3459066 1.0368399 0.8958128 0.7214407
 
lamda_cxn <- function(dt)
{ n    <- length(dt)
  term <- vector(length=n, mode="numeric")

  for (i in 1:n)
  { term[i] <- (dt[i]/n)*log(dt[i]/n) }

  #sum(term)
  cxn <- 1 + (1/log(n))*sum(term)
  cxn
}
lamda_cxn(lamda_cor)
 lamda_cxn(lamda_cor)
[1] 0.01861457
 
Thank you so much for all the help given.

 


  

Re: [R] sapply code

2010-05-16 Thread Henrique Dallazuanna

Try this:

1 + (1 / log(length(lamda_cor))) * sum((l <- lamda_cor /
length(lamda_cor)) * log(l))
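
The one-liner above can be unpacked into a function. The sketch below shows two variants: one that uses sapply as asked, and a vectorised one that needs no loop at all, since division, log and multiplication already work element by element in R. The function names lamda_cxn_sapply and lamda_cxn_vec are made up for illustration, and the eigenvalues are typed in from the printed output in the question rather than recomputed from winter_cor.

```r
# Eigenvalues copied from the output shown in the question
lamda_cor <- c(1.3459066, 1.0368399, 0.8958128, 0.7214407)

# sapply variant: one anonymous-function call per element
lamda_cxn_sapply <- function(dt) {
  n <- length(dt)
  term <- sapply(dt, function(d) (d / n) * log(d / n))
  1 + sum(term) / log(n)
}

# Vectorised variant: arithmetic and log() operate on whole vectors
lamda_cxn_vec <- function(dt) {
  n <- length(dt)
  p <- dt / n
  1 + sum(p * log(p)) / log(n)
}

print(lamda_cxn_sapply(lamda_cor))
print(lamda_cxn_vec(lamda_cor))
```

Both should agree with the loop version in the question. sapply is unlikely to be faster than the original for loop here; the vectorised form is usually the more efficient choice.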


On Sun, May 16, 2010 at 10:43 PM, Roslina Zakaria zrosl...@yahoo.com wrote:

 Hi r-users,

 I have this code here, but I just wonder how do I use 'sapply' to make it
 more efficient

 lamda_cor <- eigen(winter_cor)$values

  lamda_cor
 [1] 1.3459066 1.0368399 0.8958128 0.7214407

 lamda_cxn <- function(dt)
 { n    <- length(dt)
   term <- vector(length=n, mode="numeric")

   for (i in 1:n)
   { term[i] <- (dt[i]/n)*log(dt[i]/n) }

   #sum(term)
   cxn <- 1 + (1/log(n))*sum(term)
   cxn
 }
 lamda_cxn(lamda_cor)
  lamda_cxn(lamda_cor)
 [1] 0.01861457

 Thank you so much for all the help given.









-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O


Re: [R] Splines under tension

2010-05-16 Thread William Dunlap

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of sam.e
 Sent: Sunday, May 16, 2010 10:13 AM
 To: r-help@r-project.org
 Subject: Re: [R] Splines under tension

 Thank you for the helpful direction to the smoothing splines function; it was
 very helpful and is exactly what I am trying to do. My data, however, are 3-D,
 i.e. I have x and y values which are coordinates for different field sites
 and z values which are really what I am interested in analysing with
 interpolation.

Look into the 'Tps' (thin plate splines) function in the fields package.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

 This has posed a problem with many of the spline functions in R. Even if I
 input my coordinate data as a matrix as my 'x' value and my site data as my
 'y' values I get the following error:

 Error in xy.coords(x, y) : 'x' and 'y' lengths differ

 I have made sure that there are the same number of values and that they are
 all of the same type, i.e. numeric, but with little luck, and I am a bit lost
 as to what to try next. Does anyone have any suggestions?

 Thanks, 

 Sam
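
For the 3-D case described above (x and y site coordinates with a z value at each site), a thin plate spline fit can be sketched as follows. This assumes the fields package is installed and uses simulated coordinates and values, since the real site data are not shown; the names xy, z, fit, grid and z_hat are illustrative only. Functions such as smooth.spline expect x and y to be plain vectors of equal length, which is the likely source of the xy.coords error when a two-column matrix is passed as x.

```r
library(fields)

set.seed(1)

# Simulated stand-in for the field sites: 50 (x, y) locations
xy <- cbind(x = runif(50), y = runif(50))

# Simulated measurements at those sites (smooth surface plus noise)
z <- sin(2 * pi * xy[, "x"]) + xy[, "y"] + rnorm(50, sd = 0.1)

# Thin plate spline: coordinates as a two-column matrix, values as a vector
fit <- Tps(xy, z)

# Interpolate onto a regular 20 x 20 grid covering the unit square
grid <- as.matrix(expand.grid(x = seq(0, 1, length.out = 20),
                              y = seq(0, 1, length.out = 20)))
z_hat <- predict(fit, grid)
```

The grid predictions can then be reshaped into a 20 x 20 matrix for image() or contour().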
 -- 
 View this message in context: 
 http://r.789695.n4.nabble.com/Splines-under-tension-tp2173887p2218693.html
 Sent from the R help mailing list archive at Nabble.com.


Re: [R] Normalizing plot tick values

2010-05-16 Thread rajesh j

Hi,
I tried using this in my plot and I get an error saying
'at' and 'labels' lengths differ, 8 != 5

~Rajesh
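
A likely cause of this error is that the tick positions and the labels are computed from two different sources (the axis range versus the data), so their lengths need not match. A minimal sketch that avoids the mismatch derives the labels from the tick positions themselves. The divisor of 5 is an assumption chosen to turn 10, 20, 30 into 2, 4, 6, and the names divisor and at are illustrative.

```r
x <- 1:100
divisor <- 5                      # assumed scaling factor: 10, 20, 30 -> 2, 4, 6

plot(x, xaxt = "n")               # draw the plot without the default x axis
at <- axTicks(1)                  # tick positions R would have used on side 1
axis(1, at = at, labels = at / divisor)
```

Because the labels are a function of at, the two arguments always have the same length, whatever the data look like.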

On Sat, May 15, 2010 at 10:47 PM, Henrique Dallazuanna www...@gmail.com wrote:

 Try this:

 x <- 1:100
 plot(x, xaxt = 'n')
 axis(1, axTicks(1), pretty(x) / 10)

 On Sat, May 15, 2010 at 2:10 PM, rajesh j akshay.raj...@gmail.com wrote:

 Hi,

 I have a plot whose tick values along the axis have a certain range 0 - x.
 I need to normalize this range without changing my data files. For example,
 if my plot has tick values at 10, 20, 30, 40, 50, ... I have to make these
 2, 4, 6, etc., but without changing the plot data. I am hoping I can add
 something to the plot command that goes like tick values divided by a quantity.
 Any help is appreciated.

 Rajesh





 --
 Henrique Dallazuanna
 Curitiba-Paraná-Brasil
 25° 25' 40 S 49° 16' 22 O




-- 
Rajesh.J

