Re: [R] Ignoring missing elements in data.frame()

2010-06-05 Thread Joris Meys
Hi,

One possible way around it is the following idea:
X1 <- rnorm(10)
X2 <- rnorm(10)

Names <- c("X1","X2","X3")
Names <- Names[Names %in% ls()]

n <- length(Names)
p <- 10   #length of each object
output <- matrix(NA,ncol=n,nrow=p)

for(i in seq_len(n)){   # seq_len() also copes with n == 0
output[,i] <- get(Names[i])
}
output <- as.data.frame(output)
names(output) <- Names

You can also use an eval-parse construct like this:
## Alternative
Names <- c("X1","X2","X3")
Names <- Names[Names %in% ls()]
Names <- paste(Names,collapse=",")
expr = paste("output <- data.frame(",Names,")",sep="")
eval(parse(text=expr))
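
The same filtering can probably be done without a loop or eval-parse, using base R's exists() and mget(). This is only a minimal sketch, not part of the original reply; the variable env is an assumption added to make the lookup environment explicit.

```r
# Two objects exist, X3 deliberately does not.
X1 <- rnorm(10)
X2 <- rnorm(10)

env <- environment()   # where the objects live (the workspace at top level)

Names <- c("X1", "X2", "X3")
# Keep only the names that refer to an existing object in 'env'.
Names <- Names[vapply(Names, exists, logical(1), envir = env, inherits = FALSE)]

# mget() returns a named list of the objects; equal-length vectors
# convert directly to a data frame with one column per object.
output <- as.data.frame(mget(Names, envir = env))
str(output)
```
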

Neither is really an optimal solution, but both work. It would
be better to create a list or matrix beforehand and then store the
result of each calculation in that list or matrix whenever the
calculation actually gives a result.
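
A rough sketch of that list-based approach follows. The sample data, the use of t.test() and the names samples and results are assumptions made for illustration; the original poster's function is not shown in the thread.

```r
# Hypothetical inputs: the third sample has a single value,
# so t.test() stops with an error for it.
samples <- list(p1s = c(2.1, 3.4, 1.9, 4.2, 3.3),
                p2s = c(-1.2, -0.4, -2.2, -1.9),
                p9s = c(1.5))

results <- list()
for (nm in names(samples)) {
  res <- tryCatch({
    tt <- t.test(samples[[nm]])
    c(statistic = unname(tt$statistic),
      parameter = unname(tt$parameter),
      p.value   = tt$p.value)
  }, error = function(e) NULL)      # a failed calculation yields NULL
  if (!is.null(res)) results[[nm]] <- res   # store only real results
}

# One row per successful calculation; p9s is simply absent.
out <- as.data.frame(do.call(rbind, results))
print(out)
```

Because failed calculations never enter the list, nothing has to be deleted by hand before building the data frame.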

Cheers
Joris

On Sat, Jun 5, 2010 at 1:23 AM, Scott Chamberlain scham...@rice.edu wrote:
 Hello, I am trying to make a data frame from many elements after
 running a function which creates many elements, some of which may not
 end up being real elements due to errors or missing data. For example,
 I have the following three elements p1s, p2s, and p3s. p9s did not
 generate the same data as there was an error in the function for some
 reason. I currently have to delete p9s from the data.frame() command
 to get the data.frame to work.  How can I make a data frame by somehow
 ignoring elements (e.g., p9s) that do not exist, without having to
 delete each missing element from data.frame()? The below is an example
 of the code.

 p1s
  statistic parameter p.value
 [1,] 3.606518  153   0.0004195377
 p2s
  statistic parameter p.value
 [1,] -3.412436 8 0.009190015
 p3s
  statistic parameter p.value
 [1,] 1.543685  599   0.1231928

 t(data.frame(t(p1s),t(p2s),t(p3s),t(p9s)))
 Error in t(p9s) : object 'p9s' not found


 Thanks, Scott Chamberlain
 Rice University
 Houston, TX

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



Re: [R] Wilcoxon test output as a table

2010-06-05 Thread Joris Meys
# not tested
out <- rbind(as.numeric(Wnew),as.numeric(P))
rownames(out) <- c("Wnew","P")

Cheers

On Sat, Jun 5, 2010 at 11:18 PM, Iurie Malai iurie.ma...@gmail.com wrote:
 Hi!

 Some time ago I searched for a way to get the Wilcoxon test results as a
 reasonably well-formatted table. Nobody suggested a solution and I found
 nothing on the Internet. Recently I came across this link (
 http://myowelt.blogspot.com/2008/04/beautiful-correlation-tables-in-r.html),
 which helped me to find a solution.

 Here's the solution (I'm using R Commander):

 W <- as.matrix(lapply(Dataset[2:11], function(x) wilcox.test(x ~ GrFac,
 alternative="two.sided", data=Dataset)$statistic))
 P <- as.matrix(lapply(Dataset[2:11], function(x) wilcox.test(x ~ GrFac,
 alternative="two.sided", data=Dataset)$p.value))
 W <- format(W, digits = 5, nsmall = 2)
 P <- format(P, digits = 1, nsmall = 3)
 Wnew <- matrix(paste(W), ncol=ncol(Dataset[2:11]))
 colnames(Wnew) <- paste(colnames(Dataset[2:11]))
 Wnew
 P

 This is the output (excerpt):
 Wnew
      X1      X2      X3      X4      X5      X6      X7      X8      IA      IV
 [1,] 4582.50 4335.50 4610.50 4008.50 6409.50 6064.50 5126.50 6861.50 4305.50 5769.00
 P
  [1] 0.301 0.100 0.336 0.013 5e-04 0.008 0.756 4e-06 0.089 0.059

 Can anyone share their views? Propose an improvement? For example, how to
 make it appear as a table, not as separate rows? How to remove quotes? How
 to show rows names W and P?

 Regards,
 Iurie Malai, Senior Lecturer
 Department of Psychology
 Faculty of Psychology and Special Education
 Ion Creanga Moldova Pedagogical State University - www.upsm.md
 http://en.wikipedia.org/wiki/Ion_Creang%C4%83_Pedagogical_State_University



Re: [R] (no subject)

2010-06-05 Thread Joris Meys
OK, as you're new:

1) this is a list about R, not about statistics.
2) this looks awfully much like a homework assignment. People tend
not to be keen on solving those.
3) READ THE POSTING GUIDELINES. Seriously, read them.
http://www.R-project.org/posting-guide.html

As a tip: go through the archives and search on "zero inflated
negative binomial" or "ZINB". You'll find tons of discussions about the
code, including very recent ones.

Cheers
Joris

On Sat, Jun 5, 2010 at 3:19 PM, cahyo kristiono
cahyo_kristi...@yahoo.com wrote:
 Dear Sirs

 Let me first introduce myself. My name is Kristiono. I would like to ask for
 your help with ZINB (Zero Inflated Negative Binomial) regression modeling,
 step by step.
 I am having trouble with the following:
 1.    How to get the log-likelihood function of ZINB (step by step)
 2.    How to get the first and second derivatives to obtain the MLE by
 Newton-Raphson (step by step)
 3.    The program syntax

 Please help me solve the problems above, because I really need this
 soon.
  Thank you very much for your help.
  Sincerely,
 Kristiono




Re: [R] R2HTML problem

2010-06-05 Thread Joris Meys
Tinn-R itself uses the R2HTML package to communicate with R.
You could ask JC Faria, who wrote Tinn-R, what exactly is going on
there. You might get more help here:
http://sourceforge.net/projects/tinn-r/support

Personally, I'd just use a different editor in this case. I love
Tinn-R, but it's no use programming in an editor that interferes with
your code.

Cheers
Joris

On Sat, Jun 5, 2010 at 4:57 PM, RGtk2User iagoco...@gmail.com wrote:

 I'm developing an application with R and Gtk+. It's just a simple GUI which
 helps new users interact with R. The thing is, when you run a statistical
 analysis, I also want to provide an HTML report, but HTMLStart doesn't work
 properly when executed from Tinn-R. It does create the file but not empty,
 I've tried some examples from different websites, and it's always the same:
 it works if I execute it from the R prompt, but it doesn't when I
 execute it from Tinn-R. So, any ideas? I'm trying to split up the internal code
 of the HTMLStart function to find out where it crashes, but I haven't found
 it yet. Thanks in advance



Re: [R] Wilcoxon test output as a table

2010-06-05 Thread Joris Meys
Can't reproduce those with your code and your dataset.
I also noticed some other unwanted behaviour when using as.numeric: it
changes the formatting again. You won't get rid of the " as that
indicates it's a character, and you won't be able to format the
numbers per row, as all values in a column of a data frame or in a
matrix share the same formatting.

If you want to generate output for a function or so, you can play
around with cat() (see ?cat ). If it's for a report, think about using
LaTeX or HTML and the xtable package. There are other options, but
that requires a bit more info.
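
As a rough illustration of the console route, the sketch below prints a formatted character matrix without quotes, once with print(quote = FALSE) and once with cat(). The data frame dat and its columns are made up for the example and stand in for the poster's Dataset.

```r
# Hypothetical data: two groups, three measured variables.
set.seed(1)
dat <- data.frame(GrFac = factor(rep(c("a", "b"), each = 10)),
                  X1 = rnorm(20), X2 = rnorm(20), X3 = rnorm(20))
vars <- c("X1", "X2", "X3")

# Run each test once and keep both the statistic and the p-value.
res <- sapply(vars, function(v) {
  x <- dat[[v]]
  g <- dat$GrFac
  wt <- wilcox.test(x ~ g, alternative = "two.sided")
  c(W = unname(wt$statistic), P = wt$p.value)
})

# format() returns character vectors, so each row gets its own format.
out <- rbind(W = format(res["W", ], nsmall = 2),
             P = format(res["P", ], digits = 1, nsmall = 3))

# The quotes are only a printing feature of character matrices.
print(out, quote = FALSE)

# cat() never prints quotes; one tab-separated line per row.
for (r in rownames(out)) cat(r, out[r, ], "\n", sep = "\t")
```
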

And your code is not optimal.

setwd("c:/Temp")
Dataset <- read.table("Dataset.txt",header=T,sep=",")

W <- apply(Dataset[2:11],2, function(x) wilcox.test(x ~ GrFac,
alternative="two.sided", data=Dataset)$statistic)
P <- apply(Dataset[2:11],2, function(x) wilcox.test(x ~ GrFac,
alternative="two.sided", data=Dataset)$p.value)
W <- format(W, digits = 5, nsmall = 2)
P <- format(P, digits = 1, nsmall = 3)

out <- rbind(W,P)
rownames(out) <- c("W","P")
colnames(out) <- colnames(Dataset[2:11])


If you know LaTeX, you can use the xtable package to get the table:
library(xtable)
xtable(out) # latex output

#html output
outtable <- xtable(out)
print(outtable,type="html")

On Sat, Jun 5, 2010 at 11:35 PM, Iurie Malai iurie.ma...@gmail.com wrote:
 Thank you, Joris!

 I received two identical warnings:

 [14] WARNING: Warning in if (nchar(cmd) <= width) return(cmd) :
  the condition has length > 1 and only the first element will be used
 [15] WARNING: Warning in if (nchar(cmd) <= width) return(cmd) :
  the condition has length > 1 and only the first element will be used

 2010/6/6 Joris Meys jorism...@gmail.com

 # not tested
 out <- rbind(as.numeric(Wnew),as.numeric(P))
 rownames(out) <- c("Wnew","P")





Re: [R] Creating a maxtrix from conditional prints

2010-06-05 Thread Joris Meys
Use rbind? Not the most efficient solution, but it should get the job done.

# not tested
Code example:

out <- c()
for (x in 1:10) {
for (y in 1:10) {
qui <- ifelse((mac[,1] == x) & (mac[,5] == y) | (mac[,1] == y) & (mac[,5]
== x), 1, NA)
quo <- cbind(mac,qui)
qua <- subset(quo, qui ==1)
if(nrow(qua) == 2){   # braces, so rbind only runs for matching pairs
print(qua)
out <- rbind(out,qua)
}
}}
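
A small runnable variant of the same idea is sketched below. It collects the matching blocks in a list and binds them once at the end, which avoids growing a matrix inside the loop. The five-row matrix mac is invented for the example; only the roles of columns 1 and 5 are taken from the thread.

```r
# Hypothetical stand-in for 'mac': column 1 = recipient, column 5 = donor.
mac <- cbind(ricevente = c(8, 9, 8, 10, 3),
             b = 0, c = 0, d = 0,
             donatore  = c(9, 8, 10, 8, 4))

hits <- list()
for (x in 1:10) {
  for (y in 1:10) {
    # TRUE for rows where (recipient, donor) is (x, y) or (y, x)
    qui <- (mac[, 1] == x & mac[, 5] == y) | (mac[, 1] == y & mac[, 5] == x)
    qua <- cbind(mac[qui, , drop = FALSE], qui = 1)
    # keep only reciprocal pairs, i.e. exactly two matching rows
    if (nrow(qua) == 2) hits[[length(hits) + 1]] <- qua
  }
}

# One matrix holding all kept blocks, in the order they were found.
out <- do.call(rbind, hits)
print(out)
```

As in the posted loop, each pair is visited twice (once as x-y and once as y-x), so every block appears twice in out.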


On Fri, Jun 4, 2010 at 9:08 PM, EM evilmas...@gmail.com wrote:
 Hi guys :)

 I'm dealing with this problem, perhaps conceptually not that complex, but
 still - I'm stuck.

 Two columns, values 1 to 10, only integers. I want to check when the first
 column's index is identical to the second's (and vice versa). If that's
 true, I want to add a further column with value 1 (if true) or NA (if
 false).
 Thus, I obtain 100 matrices (for each column I will have 1-1, 1-2, 1-3
 etc). Now, I want R to consider only those matrices whose new column has
 value = 1 & whose total number of rows is equal to 2. I can get R to print
 this result inside the for cycle, yet I can't manage to build a single
 matrix to store all the results together, which is what I really want.


 Code example:

 for (x in 1:10) {
 for (y in 1:10) {
 qui <- ifelse((mac[,1] == x) & (mac[,5] == y) | (mac[,1] == y) & (mac[,5]
 == x), 1, NA)
 quo <- cbind(mac,qui)
 qua <- subset(quo, qui ==1)
 if(nrow(qua) == 2)
 print(qua)
 }}

 result (wrong, now):

     ricevente genere_r abo_r classieta_r donatore genere_d abo_d
 classieta_d    eta_d mismatch pra comp       mum qui
 [1,]         8        0     1           3        9        1     1
 4 56.17437        2   1    1 -6.645437   1
 [2,]         9        1     1           2        8        0     1
 3 48.77579        2   1    1 -5.905579   1
     ricevente genere_r abo_r classieta_r donatore genere_d abo_d
 classieta_d    eta_d mismatch pra comp       mum qui
 [1,]         8        0     1           3       10        0     0
 3 48.77579        2   1    1 -5.905579   1
 [2,]        10        0     2           5        8        0     1
 3 48.77579        1   1    1 -5.391579   1
     ricevente genere_r abo_r classieta_r donatore genere_d abo_d
 classieta_d    eta_d mismatch pra comp       mum qui
 [1,]         8        0     1           3        9        1     1
 4 56.17437        2   1    1 -6.645437   1
 [2,]         9        1     1           2        8        0     1
 3 48.77579        2   1    1 -5.905579   1
     ricevente genere_r abo_r classieta_r donatore genere_d abo_d
 classieta_d    eta_d mismatch pra comp       mum qui
 [1,]         9        1     1           2       10        0     0
 3 48.77579        0   1    1 -4.877579   1
 [2,]        10        0     2           5        9        1     1
 4 56.17437        0   1    1 -5.617437   1

 what I'd like to get:

     ricevente genere_r abo_r classieta_r donatore genere_d abo_d
 classieta_d    eta_d mismatch pra comp       mum qui
 [1,]         8        0     1           3        9        1     1
 4 56.17437        2   1    1 -6.645437   1
 [2,]         9        1     1           2        8        0     1
 3 48.77579        2   1    1 -5.905579   1
 [3,]         8        0     1           3       10        0     0
 3 48.77579        2   1    1 -5.905579   1
 [4,]        10        0     2           5        8        0     1
 3 48.77579        1   1    1 -5.391579   1
 [5,]         8        0     1           3        9        1     1
 4 56.17437        2   1    1 -6.645437   1
 [6,]         9        1     1           2        8        0     1
 3 48.77579        2   1    1 -5.905579   1
 [7,]         9        1     1           2       10        0     0
 3 48.77579        0   1    1 -4.877579   1
 [8,]        10        0     2           5        9        1     1
 4 56.17437        0   1    1 -5.617437   1

 (don't mind the values & names, this is just a small part of a longer
 algorithm)

 Thanks for your help, in advance :)



Re: [R] Error Bar Issues

2010-06-05 Thread Joris Meys
You can't refer to one argument from another within the same function
call: liw=uiw makes R look for an object named uiw in your workspace. Try

uiw <- Saline[,3]

plotCI(x=Saline [,1],y=Saline [,2], uiw=uiw, liw=uiw, err="y", pch=21,
pt.bg=par("bg"), cex=1.5, lty=1, type="o", gap=0, sfrac=0.005,
xlim=c(-21,340),xaxp=c(-20,320,11), xlab="Time (min)", ylim=c(0,12),
yaxp=c(0,12,11), ylab="Arterial Plasma Acetaminophen (µg/mL)", las=1,
font.lab=2, add=TRUE)
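
The lookup rule can be seen without any plotting. In the sketch below, bar_limits() is a hypothetical stand-in for plotCI(); it assumes, as the usage line copied in the question suggests, that liw defaults to uiw inside the function definition.

```r
# Hypothetical stand-in for plotCI(): 'liw = uiw' is a default in the
# definition, where referring to another argument is allowed.
bar_limits <- function(y, uiw, liw = uiw) {
  cbind(lower = y - liw, upper = y + uiw)
}

y   <- c(3.2, 5.0, 6.1)
sem <- c(0.7, 1.2, 1.5)

# Fails: in a call, 'uiw' is looked up in the calling environment,
# where no object of that name exists.
msg <- tryCatch(bar_limits(y, uiw = sem, liw = uiw),
                error = function(e) conditionMessage(e))
print(msg)

# Works: pass an existing object ...
lims1 <- bar_limits(y, uiw = sem, liw = sem)
# ... or leave liw out and let the default pick up uiw.
lims2 <- bar_limits(y, uiw = sem)
print(lims1)
```
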

Cheers
Joris

On Sat, Jun 5, 2010 at 6:09 PM, beloitstudent schu...@beloit.edu wrote:

 Hello all,

 I am an undergraduate student who is having syntax issues trying to get
 error bars on my graph.

 This is the data, which I assigned the name Saline to.
      Time    Average     SEM
 1    -20     0.00        0.000
 2
 3    30      0.00        0.000
 4    45      3.227902    0.7462524
 5    60      5.04        1.1623944
 6    80      6.107491    1.5027762
 7    110     6.968231    1.3799637
 8    140     7.325713    1.2282053
 9    200     7.875194    1.1185175
 10   260     6.513927    0.5386359
 11   320     4.204342    0.6855906

 This is the command that I typed in to get my error bars.

 plotCI(x=Saline [,1],y=Saline [,2], uiw=Saline [,3], liw=uiw, err="y", pch=21,
 pt.bg=par("bg"), cex=1.5, lty=1, type="o", gap=0, sfrac=0.005,
 xlim=c(-21,340),xaxp=c(-20,320,11), xlab="Time (min)", ylim=c(0,12),
 yaxp=c(0,12,11), ylab="Arterial Plasma Acetaminophen (µg/mL)", las=1,
 font.lab=2, add=TRUE)

 And this is the error message I keep getting
 Error in plotCI(x = Saline[, 1], y = Saline[, 2], uiw = Saline[, 3], liw =
 uiw,  :
  object 'uiw' not found
 In addition: Warning message:
 In if (err == "y") z <- y else z <- x :
  the condition has length > 1 and only the first element will be used

 Now, to me, the command seems correct.
 I want the error bars to show up where the points on my graph are...so the x
 coordinates should be my time (aka Saline [1]) and the y coordinates should
 be my Averages (aka Saline [2])  and my upper and lower limits to my
 confidence interval should be the SEM from Saline [3], but something is
 wrong with this and I cannot figure out what it is.  If anyone has
 suggestions I would be very grateful.  Thanks for your help!

 beloitstudent
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Error-Bar-Issues-tp2244335p2244335.html
 Sent from the R help mailing list archive at Nabble.com.



Re: [R] ordinal variables

2010-06-04 Thread Joris Meys
Hi,

If you look around a bit, there is some great material on the web
about the powers and quirks of R. I've taught myself most of what I
know about R through reading a lot and trying it out on the console.
The help list is also a darn fine source of efficient code for a set
of general problems.

It won't help any more this year, but I'm working on a guide for R to
bundle valuable information I got from the help list and the internet.
It should be ready in a couple of months, and it will be available for
all to use. In any case, Owen's guide is of great value for an
introduction to the command line and basic statistics:
http://cran.r-project.org/doc/contrib/Owen-TheRGuide.pdf

Also the introduction to R is a must-read for all our students :
http://cran.r-project.org/doc/manuals/R-intro.pdf

Next to that, a couple of websites are great additional sources of code :
Quick-R, a guide for those who come over from SAS/SPSS/Stata. It
contains tons of examples for statistical analyses in about every
field. If you didn't know it yet, you'll love it for sure :
http://www.statmethods.net/

The R graph gallery, to show what exactly can be done with the
graphical power of R :
http://addictedtor.free.fr/graphiques/

The R Graphics gallery, doing the same :
http://research.stowers-institute.org/efg/R/

There are many more to be found; a whole community of users is
contributing to the information in various ways. We give the sources
mentioned here to our students, with the message that they should
never underestimate the power of Google.

Last but not least, there is a specific mailing list regarding
teaching statistics using R:
https://stat.ethz.ch/mailman/listinfo/r-sig-teaching

You might want to take a look at their archives as well.

Cheers
Joris

On Fri, Jun 4, 2010 at 6:39 AM, Iasonas Lamprianou lampria...@yahoo.com wrote:
 Thanks, I'll have a go and will let you know. I guess that the success has to 
 do with how efficiently I help them to demonstrate the efficiency of code 
 over menus. So part of the issue is how I teach them as well...


 Dr. Iasonas Lamprianou


 Assistant Professor (Educational Research and Evaluation)
 Department of Education Sciences
 European University-Cyprus
 P.O. Box 22006
 1516 Nicosia
 Cyprus
 Tel.: +357-22-713178
 Fax: +357-22-590539


 Honorary Research Fellow
 Department of Education
 The University of Manchester
 Oxford Road, Manchester M13 9PL, UK
 Tel. 0044  161 275 3485
 iasonas.lampria...@manchester.ac.uk


 --- On Thu, 3/6/10, S Ellison s.elli...@lgc.co.uk wrote:

 From: S Ellison s.elli...@lgc.co.uk
 Subject: Re: [R] ordinal variables
 To: Joris Meys jorism...@gmail.com, Iasonas Lamprianou 
 lampria...@yahoo.com
 Cc: r-help@r-project.org
 Date: Thursday, 3 June, 2010, 15:44
 If you set them a problem that has them doing the same sort of thing
 five times and compare the time it takes with code pasted from an editor
 (eg Tinn-R) and the time it takes via menus, you may have more luck
 convincing them.

 A command line sequence is harder than menus the first two times but
 easier for any n iterations thereafter.

 Steve ellison

  Iasonas Lamprianou lampria...@yahoo.com
 03/06/2010 14:51 
 Thank you Joris,
 I'll have a look into the commands you sent me. They look convincing. I
 hope my students will also see them in a positive way (although I can
 force them to pretend that they have a positive attitude)!

 Dr. Iasonas Lamprianou







 --- On Thu, 3/6/10, Joris Meys jorism...@gmail.com
 wrote:

 From: Joris Meys jorism...@gmail.com
 Subject: Re: [R] ordinal variables
 To: Iasonas Lamprianou lampria...@yahoo.com
 Cc: r-help@r-project.org

 Date: Thursday, 3 June, 2010, 14:35

 see ?factor and ?as.factor. On ordered factors you can technically do a
 spearman without problem, apart from the fact that a spearman test by
 definition cannot give exact p-values with ties present.

 x <- sample(c("a","b","c","d","e"),100,replace=T)
 y <- sample(c("a","b","c","d","e"),100,replace=T)

 x.ordered <- factor(x,levels=c("e","b","a","d","c"),ordered=T)
 x.ordered
 y.ordered <- factor(y,levels=c("e","b","a","d","c"),ordered=T)
 y.ordered

 cor.test(x.ordered,y.ordered,method="spearman")

 require(pspearman)
 spearman.test(x.ordered,y.ordered)

 R commander has some menu options to deal with factors. R commander
 also provides a scripting window. Please do your students a favor, and
 show them how to use those commands.
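
 A hedged note on the cor.test() call above: in current versions of R, cor.test() insists on numeric input, so the ordered factors may need to be converted to their integer level codes first. The sketch below is an assumption about present-day behaviour, not part of the 2010 message; set.seed() and exact = FALSE are additions for reproducibility and to skip the exact test when ties are present.

```r
set.seed(42)
lev <- c("e", "b", "a", "d", "c")   # the intended order, lowest to highest

x <- sample(lev, 100, replace = TRUE)
y <- sample(lev, 100, replace = TRUE)

x.ordered <- factor(x, levels = lev, ordered = TRUE)
y.ordered <- factor(y, levels = lev, ordered = TRUE)

# as.numeric() on a factor gives the position of each value in 'levels',
# which is exactly the ordinal information Spearman's rho works with.
x.codes <- as.numeric(x.ordered)
y.codes <- as.numeric(y.ordered)

# exact = FALSE: with ties an exact p-value is not available anyway.
res <- cor.test(x.codes, y.codes, method = "spearman", exact = FALSE)
print(res$estimate)
print(res$p.value)
```
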


 Cheers
 Joris


 On Thu, Jun 3, 2010 at 2:25 PM, Iasonas Lamprianou
 lampria...@yahoo.com
 wrote:

 Dear colleagues,



 I teach statistics using SPSS. I want to use R instead. I
 hit on one
 problem and I

Re: [R] Handling of par() with variables

2010-06-04 Thread Joris Meys
I think you misunderstand how par() works. When you set new
parameters, par() returns the old values, so you can store them at the
same time.

Take a look at :

par(no.readonly=T)
oldpar <- par(mar=c(1,1,1,1),tck=0.02)
par(no.readonly=T)
par(oldpar)
par(no.readonly=T)

So your line :
newpar <- par(mar=c(3.1,3.1,0.1,0.1),  # margin for figure area
oma=c(0,0,0,0),  # margin for outer figure area
cex.axis=0.9,  # font size axis
mgp=c(2,0.6,0),  # distance of axis
tck=0.02  # major ticks inside
)
actually stores the OLD parameters in newpar, and not the new ones. If
you want to set them using a variable, you'll need something like :
newmar <- c(3.1,3.1,1.0,1.0) # store the mar values in a variable
oldpar <- par(mar=newmar) # set the mar and store the old values
...
par(oldpar) # back to the old parameters
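
To see the behaviour in isolation, the following sketch can be run on a throwaway device. Opening pdf(NULL) and the variable names current_mar and restored_mar are assumptions added so the example does not depend on an open plot window.

```r
pdf(NULL)   # throwaway device that writes no file

# Setting parameters returns the values they had BEFORE the call,
# and only for the parameters that were set.
oldpar <- par(mar = c(3.1, 3.1, 0.1, 0.1), tck = 0.02)
print(names(oldpar))

current_mar <- par("mar")            # the values now in effect

# To capture the complete new set, query it after setting it.
newpar <- par(no.readonly = TRUE)

par(oldpar)                          # back to the previous values
restored_mar <- par("mar")

par(newpar)                          # re-apply the stored new set later
invisible(dev.off())
```

Storing the result of par(no.readonly = TRUE) after the adjustment gives a full list that can be re-applied once a new device, such as postscript(), has been opened.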

Cheers
Joris
On Fri, Jun 4, 2010 at 11:40 AM, Steffen Uhlig
steffen.uh...@htw-saarland.de wrote:
 Hello!

 In order to plot multiple graphs with the same setup I use the
 following code-structure:

 ###
 # storing old parameter set
 oldpar <- par(no.readonly=T)

 #copying old parameter set
 newpar <- par(no.readonly=T)

 #adjusting parameters
 newpar <- par(mar=c(3.1,3.1,0.1,0.1),  # margin for figure area
             oma=c(0,0,0,0),          # margin for outer figure area
             cex.axis=0.9,          # font size axis
             mgp=c(2,0.6,0),          # distance of axis
             tck=0.02                # major ticks inside
             )

 ...
 ...
 postscript(...)
 par(newpar)
 ...
 dev.off()
 ###

 Calling the variable newpar delivers the old parameter set only (from
 code-line "newpar <- par(no.readonly=T)"). If the code-segment "newpar
 <- par(mar=..." runs a second time, the correct parameter set is
 stored; however, only the 5 adjusted parameters and not the full list.

 My question is, why must the code segment "newpar <- par(mar...)" run
 twice? Is there a better way to handle the graphics output? I would be
 grateful for a pointer to a FAQ section or to an older discussion
 thread in this group!

 Thank you very much in advance!

 Regards,
 /steffen

 --
 Steffen Uhlig, PhD
 Mechatronik und Sensortechnik
 HTW des Saarlandes
 Goebenstraße 40
 66117 Saarbrücken

 Tel.: +49 (0) 681 58 67 274



Re: [R] save in for loop

2010-06-04 Thread Joris Meys
On a side note:

On Thu, May 20, 2010 at 9:43 AM, Ivan Calandra
ivan.calan...@uni-hamburg.de wrote:
 Thanks to all of you for your answers!

 ...

 Tao, I don't understand why you have backslashes before "file" and after
 ".rda". I guess it's something about regular expressions, but I'm still
 very new to it.
 eval(parse(text=paste("save(file", i, ", file=\"file", i, ".rda\")",
 sep="")))

Very simple: you need to give the command as a string. In the save
command, you have to put quotation marks around the filename. Now
within the paste function, a plain quotation mark would make R
believe the string to paste ends there, and you don't want that. So
you escape the " by typing \", then R knows you want to add the symbol
" to the string instead of ending it:

> paste("save(file", i, ", file=\"file", i, ".rda\")",sep="")
[1] "save(file2, file=\"file2.rda\")"

> parse(text=paste("save(file", i, ", file=\"file", i, ".rda\")",sep=""))
expression(save(file2, file="file2.rda"))
attr(,"srcfile")
<text>

> paste("save(file", i, ", file="file", i, ".rda")",sep="")
Error: unexpected symbol in "paste("save(file", i, ", file="file"

Hope it's a bit more clear now.
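
For what it is worth, the escaping can be checked with cat(), and the same file can be written without eval-parse through the list argument of save(). This is a sketch under assumptions: the object file2, its contents and the use of tempdir() are invented for the example.

```r
i <- 2

# The escaped quotes end up as plain quotes inside the string.
cmd <- paste("save(file", i, ", file=\"file", i, ".rda\")", sep = "")
cat(cmd, "\n")   # shows the command as R would parse it

# Alternative without eval(parse()): build the names as strings
# and hand the object name to save() via 'list'.
nm <- paste("file", i, sep = "")          # "file2"
assign(nm, 1:5)                           # hypothetical object to save
target <- file.path(tempdir(), paste(nm, ".rda", sep = ""))
save(list = nm, file = target)

print(file.exists(target))
```
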
Cheers
Joris


Re: [R] ps-output and LaTeX/DVIPS/PS2PDF - Greek letters disappear

2010-06-04 Thread Joris Meys
That's a problem with LaTeX and Ubuntu, not R:

https://bugs.launchpad.net/ubuntu/+source/poppler/+bug/319495

You'll have more luck on an Ubuntu list or forum.

Cheers
Joris

On Fri, Jun 4, 2010 at 11:47 AM, Steffen Uhlig
steffen.uh...@htw-saarland.de wrote:
 Hello!

 My graphs are produced using the postscript-option in R (R version 2.10.1
 (2009-12-14)). When Greek letters are used on the axis, everything looks
 fine in the *.ps-file. If included in a LaTeX-file and (on Ubuntu 10.04,
 fresh install), the Greek letters appear in the DVI- and PS-output, however,
 if converted with ps2pdf they suddenly disappear. Could anyone suggest a
 solution?

 Best regards,
 /steffen

 --
 Steffen Uhlig, PhD
 Mechatronik und Sensortechnik
 HTW des Saarlandes
 Goebenstraße 40
 66117 Saarbrücken

 Tel.: +49 (0) 681 58 67 274



Re: [R] R with Emacs

2010-06-04 Thread Joris Meys
Emacs ESS :
http://ess.r-project.org/

Cheers
Joris

On Fri, Jun 4, 2010 at 12:55 PM, dhanush dhana...@gmail.com wrote:

 I want to know how Emacs works with R. can anyone provide me a link or manual
 to read? Thank you
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/R-with-Emacs-tp2243022p2243022.html
 Sent from the R help mailing list archive at Nabble.com.



Re: [R] Tinn-R keyboard problem

2010-06-04 Thread Joris Meys
Tinn-R works with SDI. Make sure you have both the settings in R and
the Rprofile.site correct. If the bug persists with the latest version
of Tinn-R, look for help on :

http://sourceforge.net/projects/tinn-r/support

Cheers
Joris

On Fri, Jun 4, 2010 at 11:56 AM, dhidh23061972 carsten.giess...@gmx.net wrote:

 I have the same problem. I also installed the older stable version (1.17.2.4,
 compatible version with MDI), but with no success. The keyboard worked fine
 before. I use Windows XP. Is there any solution?

 Many thanks, Carsten


[R] package mgcv inconsistency in help files? cyclic P-spline cs not cyclic?

2010-06-04 Thread Joris Meys
Dear all,

I'm a bit stunned by the behaviour of a gam model using cyclic
P-spline smoothers.  I cannot provide the data, as I have about 61.000
observations from a time series.

I use the following model :
testgam <- gam(NO~s(x)+s(y,bs="cs")+s(DD,bs="cs")+s(TT),data=Final)

The problem lies with the cyclic smoother I use for seasonal trends.
The variable Final$y is a numerical variable, going from 1 to 366,
representing the day of the year. I have hourly data from 2003 until
2009, so each day is represented 168 times in the dataset (apart from
366, that one only 48). DD is the wind direction, going from 1 to
3600, and is also modeled with the same cyclic smoother. Yet, if I
check the predictions, the smoother for y is far from cyclic.

I checked the help files ?smooth.terms, and found about 10 lines apart :

bs="cs" specifies a shrinkage version of "cr".

bs="cs" gives a cyclic version of a P-spline.

When I use the (bs="cc") option, I get the results as I want them, so
I am sticking with the cyclic cubic splines for now. Yet, I find the
behaviour of bs="cs" puzzling, and I'm wondering whether I missed
something, or if this really is an inconsistency in the package.

I currently run mgcv 1.6-1 on R 2.10.1

A small example showing what I experience. Mind you that here x is in
fact NOT cyclic, whereas in my data I'm sure it has to be :
library(mgcv)
y <- rep(1:20,200)
x <- 1:4000
DD <- sample(1:360,4000,replace=TRUE)
TT <- sample(-10:10,4000,replace=TRUE)
NO <- TT^2 + (10-y+2)^2 + 10*sin(DD*2*pi/360) - 0.002*sqrt(x) +rnorm(4000,0,100)

model <- gam(NO~s(x)+s(y,bs="cs")+s(DD,bs="cs")+s(TT))
plot(model)

model <- gam(NO~s(x)+s(y,bs="cc")+s(DD,bs="cc")+s(TT))
plot(model)
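
For what it's worth, the wrap-around can also be checked numerically rather than by eye. This is only a minimal sketch on simulated data: the period boundaries 0.5 and 20.5 passed through the knots argument are my own assumption about where the cycle closes, and the object names are made up for the illustration.

```r
library(mgcv)

set.seed(42)
# Simulated stand-in: y cycles through 1..20 and the signal is periodic in y
y  <- rep(1:20, 200)
NO <- 10 * sin(2 * pi * y / 20) + rnorm(length(y), 0, 5)

# Assumed cycle boundaries: 0.5 and 20.5, so that 20 wraps around to 1
cyc <- gam(NO ~ s(y, bs = "cc"), knots = list(y = c(0.5, 20.5)))

# A cyclic smooth should predict the same value at both ends of the cycle
ends <- predict(cyc, newdata = data.frame(y = c(0.5, 20.5)))
print(ends)
```

If the two printed values differ, the basis in use is not cyclic over that interval. For day of the year the analogous boundaries would presumably be 0.5 and 366.5, but that is a guess on my part.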


Cheers
Joris



Re: [R] function

2010-06-03 Thread Joris Meys
This is what you asked for.

> Prod2007 <- 1:10
> Prod2006 <- Prod2007/1+c(0,diff(Prod2007))
> Prod2005 <- Prod2006+(1+c(0,diff(Prod2006)))
> Prod2004 <- Prod2005+(1+c(0,diff(Prod2005)))

> Prod2006
 [1]  1  3  4  5  6  7  8  9 10 11

> Prod2005
 [1]  2  6  6  7  8  9 10 11 12 13

> Prod2004
 [1]  3 11  7  9 10 11 12 13 14 15

Sure that's what you want?

On Thu, Jun 3, 2010 at 12:30 PM, n.via...@libero.it n.via...@libero.itwrote:


 Dear list,
 I would like to ask you a question. I'm trying to build  the time series'
 production with the Divisia index. The final step would require to do the
 following calculations:
 a)PROD(2006)=PROD(2007)/1+[DELTA_PROD(2007)]
 b)PROD(2005)=PROD(2006)+[1+DELTA_PROD(2006)]
 c)PROD(2004)=PROD(2005)+[1+DELTA_PROD(2005)]
 my question is how can I tell R to take the value generated in the previous
 step (for example is the case of the produciton of 2005 that need the value
 of the production of 2006) in order to generate the time series production??
 (PS:my data.frame is not set as a time series)
 Thanks for your attention!!







-- 
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



Re: [R] gam error

2010-06-03 Thread Joris Meys
Data?

On Thu, Jun 3, 2010 at 1:24 PM, natalieh fbs...@leeds.ac.uk wrote:


 Hi all,

 I'm trying to use a gam (mgcv package) to analyse some data with a roughly
 U
 shaped curve. My model is very simple with just one explanatory variable:

 m1 <- gam(CoT~s(incline))

 However I just keep getting the error message

 Error in smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) :
  A term has fewer unique covariate combinations than specified maximum
 degrees of freedom

 Just wondering if anyone had come across this before/ could offer any
 advice
 on where the problem might lie

 Many thanks,

 Natalie





Re: [R] Continous variables with implausible transformation?

2010-06-03 Thread Joris Meys
library(rms)   # lrm() lives in rms (formerly the Design package)
x <- rnorm(100,10,1)
sqrtx <- sqrt(x)
y <- rbinom(100,1,0.5)

lrm(y~x+sqrtx)

works. What's the problem?

But you wrote linear+ square. Don't you mean:
lrm(Y~x+x^2)

Cheers

On Thu, Jun 3, 2010 at 6:34 AM, zhu yao mailzhu...@gmail.com wrote:

 Dear r users

 I have a question in coding continuous variables in logistic regression.

 When rcs is used in transforming variables, sometimes it gives implausible
 associations with the outcome although the model x2 is high.

 So what are your tips and tricks for coding continuous variables?

 P.S. How do I code variables as linear+square in a formula such as for lrm?
 lrm(y~x+sqrt(x)) doesn't work.

 Many thanks.

 Yao Zhu.



Re: [R] compare results of glms

2010-06-03 Thread Joris Meys
Mailing this twice ain't going to help you. Reading a course on statistics
might.

The test you want to do addresses the following hypothesis: the mean
predicted value of a specific model differs when different datasets are used
to fit it. That seems likely to me if the datasets are not almost identical.
Why test it?

About that Z-test: in your field of research it should be used to test 2
proportions that are not too close to 0 or 1 and that originate from a
binomial distribution with large enough n. Suggesting to use it for
comparing a number of series of around 20 logit-transformed predicted
probabilities is plain shocking.

In case you are interested in the difference of the intercept for these
specific trials, add trial as a fixed effect to your model and do the
appropriate testing. If you want to know whether the relation between state
and days differs in slope, add an interaction term and again use the
appropriate testing. To know what the appropriate testing is, see line 1.
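
To make the two suggestions concrete, here is a rough sketch on simulated data. The data frame dung, the trial labels and the decay parameters are all invented for the illustration; only the variable names STATE and DAYS come from the original post.

```r
set.seed(1)
# Hypothetical stand-in for two trials of about 20 observations each
dung <- data.frame(
  DAYS  = rep(seq(5, 100, by = 5), times = 2),
  trial = factor(rep(c("T1", "T12"), each = 20))
)
# Simulated outcome: probability of STATE == 1 increases with DAYS
dung$STATE <- rbinom(nrow(dung), 1, plogis(-2 + 0.04 * dung$DAYS))

# One common curve for all trials
fit_common    <- glm(STATE ~ DAYS, data = dung, family = binomial(link = "logit"))
# Trial as a fixed effect: the intercept may differ between trials
fit_intercept <- glm(STATE ~ DAYS + trial, data = dung, family = binomial(link = "logit"))
# Interaction: the slope of DAYS may differ between trials as well
fit_slope     <- glm(STATE ~ DAYS * trial, data = dung, family = binomial(link = "logit"))

# Sequential likelihood-ratio comparison of the nested fits
cmp <- anova(fit_common, fit_intercept, fit_slope, test = "Chisq")
print(cmp)
```

Because all trials sit in one data frame, the nested fits can be compared directly. Whether a likelihood-ratio test is adequate with samples this small is a separate question.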

Cheers
Joris

On Thu, Jun 3, 2010 at 10:31 AM, Sacha Viquerat
sacha.v...@googlemail.comwrote:

 dear list!
 i have run several glm analyses to estimate a mean rate of dung decay for
 independent trials. i would like to compare these results statistically but
 can't find any solution. the glm calls are:

 dung.glm1 <- glm(STATE~DAYS, data=o_cov, family=binomial(link="logit"))

 dung.glm2 <- glm(STATE~DAYS, data=o_cov_T12, family=binomial(link="logit"))

 as all the trials have different sample sizes (around 20 each),

 anova(dung.glm1, dung.glm2)

 is not applicable. has anyone an idea?
 thanks in advance!

 ps: my advisor urges me to use the z-test (the common test statistic in my
 field of research), but i reject that due to the small sample size.



Re: [R] ordinal variables

2010-06-03 Thread Joris Meys
see ?factor and ?as.factor. On ordered factors you can technically do a
spearman without problem, apart from the fact that a spearman test by
definition cannot give exact p-values with ties present.

x <- sample(c("a","b","c","d","e"),100,replace=TRUE)
y <- sample(c("a","b","c","d","e"),100,replace=TRUE)

x.ordered <- factor(x,levels=c("e","b","a","d","c"),ordered=TRUE)
x.ordered
y.ordered <- factor(y,levels=c("e","b","a","d","c"),ordered=TRUE)
y.ordered

cor.test(x.ordered,y.ordered,method="spearman")

require(pspearman)
spearman.test(x.ordered,y.ordered)

R commander has some menu options to deal with factors. R commander also
provides a scripting window. Please do your students a favor, and show them
how to use those commands.
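
A small sketch of the three things asked for (frequency table, median category, Spearman), using base R only. The level ordering is arbitrary. Passing the integer codes to cor.test() is my own precaution, on the assumption that your version of R insists on numeric input there.

```r
set.seed(1)
lev <- c("e", "b", "a", "d", "c")   # hypothetical ordering, lowest to highest
x <- factor(sample(lev, 100, replace = TRUE), levels = lev, ordered = TRUE)
y <- factor(sample(lev, 100, replace = TRUE), levels = lev, ordered = TRUE)

# Frequency table and bar chart keep the category labels
table(x)
barplot(table(x))

# Median category: take the median of the integer codes and map it back to
# a label (type = 1 always returns an observed code, hence a valid level)
med_code <- quantile(as.integer(x), probs = 0.5, type = 1)
levels(x)[med_code]

# Spearman on the integer codes; exact = FALSE because ties are present
res <- cor.test(as.integer(x), as.integer(y), method = "spearman", exact = FALSE)
res$estimate
```

The factor keeps the labels for tables and plots, while as.integer() gives the rank codes whenever a numeric method is needed.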

Cheers
Joris


On Thu, Jun 3, 2010 at 2:25 PM, Iasonas Lamprianou lampria...@yahoo.comwrote:

 Dear colleagues,

 I teach statistics using SPSS. I want to use R instead. I hit on one
 problem and I need some quick advice. When I want to work with ordinal
 variables, in SPSS I can compute the median or create a barchart or compute
 a spearman correlation with no problems. In R, if I read the ordinal
 variable as numeric, then I cannot do a barplot because I miss the category
 names. If I read the variables as characters, then I cannot run a spearman.
 How can I read a variable as numeric, still have the chance to assign value
 labels, and be able to get table of frequencies etc? I want to be able to do
 all these things in R commander. My students will probably be scared away if
 I try anything else other than R commander (just writing commands will not
 make them happy).

 I hope I am not asking for too much. Hopefully there is a way






Re: [R] Problems using gamlss to model zero-inflated and overdispersed count data: the global deviance is increasing

2010-06-03 Thread Joris Meys
=
   NBII)
   GAMLSS-RS iteration 1: Global Deviance = 284.5993
   GAMLSS-RS iteration 2: Global Deviance = 281.9548
   ##..##
   GAMLSS-RS iteration 5: Global Deviance = 280.7311
   GAMLSS-RS iteration 15: Global Deviance = 280.6343
  
model_ZINBI <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) +
   cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) +
   cs(LFAP200,df=3),data=data,family=ZINBI)
   GAMLSS-RS iteration 1: Global Deviance = 1672.234
   GAMLSS-RS iteration 2: Global Deviance = 544.742
   GAMLSS-RS iteration 3: Global Deviance = 598.9939
   Error in RS() : The global deviance is increasing
   Try different steps for the parameters or the model maybe inappropriate
  
  
   Thus, in this case, only the Poisson (PO) and Negative Binomial type I
   (NBI) converge whereas all other models fail.

   My first approach was to omit the smoothing factors for each model, or
   further reduce the number of variables, but this does not solve the
   problem and most models fail, often yielding an "Error in RS() : The
   global deviance is increasing" message.

   I would think that, given the fact that the dependent variable is
   zero-inflated and overdispersed, the Zero-Inflated Negative Binomial
   (ZINBI) distribution would be the best fit, but the ZINBI even fails in
   the following very simple examples.
  
model_ZINBI <- gamlss(duck ~ cs(LFAP200,df=3),data=data,family=ZINBI)
   GAMLSS-RS iteration 1: Global Deviance = 3508.533
   GAMLSS-RS iteration 2: Global Deviance = 1117.121
   GAMLSS-RS iteration 3: Global Deviance = 652.5771
   GAMLSS-RS iteration 4: Global Deviance = 632.8885
   GAMLSS-RS iteration 5: Global Deviance = 645.1169
   Error in RS() : The global deviance is increasing
   Try different steps for the parameters or the model maybe inappropriate

model_ZINBI <- gamlss(duck ~ LFAP200,data=data,family=ZINBI)
   GAMLSS-RS iteration 1: Global Deviance = 3831.864
   GAMLSS-RS iteration 2: Global Deviance = 1174.605
   GAMLSS-RS iteration 3: Global Deviance = 562.5428
   GAMLSS-RS iteration 4: Global Deviance = 344.0637
   GAMLSS-RS iteration 5: Global Deviance = 1779.018
   Error in RS() : The global deviance is increasing
   Try different steps for the parameters or the model maybe inappropriate
  
  
   Any suggestions on how to proceed with this?
  
   Many thanks in advance,
  
  
   Diederik
  
  
   Diederik Strubbe
   Evolutionary Ecology Group
   Department of Biology
   University of Antwerp
   Groenenborgerlaan 171
   2020 Antwerpen, Belgium
   tel: +32 3 265 3464
  
  

 --
 %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
  Dr. Gavin Simpson [t] +44 (0)20 7679 0522
  ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
  Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
  Gower Street, London  [w] 
 http://www.ucl.ac.uk/~ucfagls/http://www.ucl.ac.uk/%7Eucfagls/
  UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
 %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%






Re: [R] function

2010-06-03 Thread Joris Meys
That's a bit more clear.
> Prod2007=2
> Delta <- c(4,3,5)
> Delta <- 1+Delta/100
> Series <- Prod2007+cumsum(Delta)
> Series
[1] 3.04 4.07 5.12
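
The cumsum() shortcut works because every step only adds something to the previous value. If the recursion ever becomes something else (a product, say), the same "take the previous result and update it" idea can be written with Reduce(). This is a sketch with made-up names (prod_2007, delta, series), not a replacement for the lines above:

```r
prod_2007 <- 2
delta <- c(4, 3, 5)   # DELTA_PROD for 2007, 2006, 2005, in percent

# Each step takes the previous year's value and adds (1 + delta/100)
series <- Reduce(function(prev, d) prev + (1 + d / 100),
                 delta, accumulate = TRUE, init = prod_2007)
names(series) <- 2007:2004
series
```

The first element is the starting value for 2007; the remaining ones match the cumsum() result.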

On Thu, Jun 3, 2010 at 1:21 PM, n.via...@libero.it n.via...@libero.itwrote:

 What I would like to do is for example:
 Suppose that I have the following value

 a)PROD(2006)=PROD(2007)/1+[DELTA_PROD(2007)]
 b)PROD(2005)=PROD(2006)+[1+DELTA_PROD(2006)]
 c)PROD(2004)=PROD(2005)+[1+DELTA_PROD(2005)]


 where prod(2007)=2

 DELTA_PROD(2007)=4

 DELTA_PROD(2006)=3

 DELTA_PROD(2005)=5

 so prod(2007) is the starting value of production from which the
 construction of the time series starts. So:

 prod(2006)=2+[1+4/100] which is equal to 3.04

 so i will have:

 prod(2005)=3.04+ [1+3/100]

 and so on









Re: [R] gam error

2010-06-03 Thread Joris Meys
This doesn't tell us much either. What does the variable incline
represent, and what does the variable CoT represent?
I could guess your data looks something like:

CoT   Incline
x1    -90
x2    -60
x3    -30
x4      0
x5     30
x6     60
x7     90
x8    -90
...   ...

Or incline could be the number of the sample (going from 1 to 7). There is
no way to know what you did. Please read the posting guide and take the
hints given there into consideration.

That said, you very likely just do not have enough data to use a thin plate
regression spline without limiting k. See ?choose.k and
?null.space.dimension
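
A sketch of what limiting k could look like, on simulated data with 7 distinct inclines. The numbers, the replication (3 per incline) and the choice k = 5 are assumptions for the illustration only; the right value depends on the real data.

```r
library(mgcv)

set.seed(1)
# Hypothetical data: 7 inclines, 3 measurements each, roughly U-shaped response
incline <- rep(seq(-90, 90, by = 30), each = 3)
CoT <- 5 + (incline / 30)^2 + rnorm(length(incline), 0, 0.5)

# The default basis dimension of s() is larger than the 7 unique incline
# values; any error is caught and its message kept for inspection
default_fit <- tryCatch(gam(CoT ~ s(incline)),
                        error = function(e) conditionMessage(e))

# Limiting k to fewer than the number of unique covariate values
m1 <- gam(CoT ~ s(incline, k = 5))
summary(m1)
```

Printing default_fit shows whether the unrestricted call failed and with what message.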

Cheers
Joris

On Thu, Jun 3, 2010 at 2:49 PM, natalieh fbs...@leeds.ac.uk wrote:


 Data?

 The data are measures of energy use (continuous variable) for running on 7
 inclines between -90 and +90 degrees (n=7-21).


Re: [R] Continous variables with implausible transformation?

2010-06-03 Thread Joris Meys
You're right, it is the same. Using I() won't work for the same reason sqrt
doesn't, so:

 x2 <- x^2
 lrm(y~x+x2)
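
A quick way to see Bert's point without fitting anything is to ask R how it expands the formula. This sketch uses only base R's terms(); y and x are placeholder names and no data are needed:

```r
# In a formula, ^ is the crossing operator, so x^2 collapses to x
plain <- attr(terms(y ~ x + x^2), "term.labels")

# I() protects the arithmetic meaning, so the squared term survives
protected <- attr(terms(y ~ x + I(x^2)), "term.labels")

print(plain)
print(protected)
```

Whether a given fitting function accepts I() in its formula is a separate matter, hence the precomputed x2 above.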

Thx for the correction.
Cheers
Joris

On Thu, Jun 3, 2010 at 6:14 PM, Bert Gunter gunter.ber...@gene.com wrote:

 Below. -- Bert

 Bert Gunter
 Genentech Nonclinical Biostatistics
 --
 But you wrote linear+ square. Don't you mean:
 lrm(Y~x+x^2)

 --- I believe this is the same as lrm(Y ~ x).
 You must protect the x^2 via

 lrm(Y ~ x + I(x^2))



Re: [R] Problems using gamlss to model zero-inflated and overdispersed count data: the global deviance is increasing

2010-06-03 Thread Joris Meys
See below.

On Thu, Jun 3, 2010 at 5:35 PM, Gavin Simpson gavin.simp...@ucl.ac.ukwrote:

 On Thu, 2010-06-03 at 17:00 +0200, Joris Meys wrote:
  On Thu, Jun 3, 2010 at 9:27 AM, Gavin Simpson gavin.simp...@ucl.ac.uk
 wrote:
 
  
   vegan is probably not too useful here as the response is univariate;
   counts of ducks.
  
 
  If we assume that only one species is counted and of interest for the
 whole
  research.  I (probably wrongly) assumed that data for multiple species
 was
  available.
 
  Without knowledge about the whole research setup it is difficult to say
  which method is the best, or even which methods are appropriate. VGAM is
  indeed a powerful tool, but :
 
   proportion_non_zero <- (sum(ifelse(data$duck == 0,0,1))/182)
  means 182 observations in the dataset
 
   model_NBI <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) +
  cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) + cs(LFAP200,df=3),data=data,
  family= NBI)
 
  is 5 splines with 3df, an intercept, that's a lot of df for only 182
  observations. using VGAM ain't going to help here.

 How do you know?


I don't. I thought it would be like that because you use essentially the
same splines, and I overlooked the fact that the OP tried to reduce to a
single smooth. I stand corrected.

Cheers
Joris


   I'd reckon that the model
  itself should be reconsidered, rather than the distribution used to fit
 the
  error terms.

 I was going to mention that too, but the OP did reduce this down to a
 single smooth and the problem of increasing deviance remained. Hence
 trying to fit a /similar/ model in other software might give an
 indication whether the problems are restricted to a single software or a
 more general issue of the data/problem?

 At this stage the OP is stuck not knowing what is wrong; (s)he has
 nothing to do model checking on etc. Trying zeroinfl() and fitting a
 parametric model, for example, might be a useful starting point, then
 move on to models with smoothers if required.


He (quite positive on that one :-) ) can indeed try to use VGAM on the model
with one smooth and see if that turns out to give something. That should
give some clarity on the question whether it is the optimization of pscl
that goes wrong, or whether the problem is inherent to the data.

In addition, I'd suggest taking a closer look at the iteration
parameters of the gamlss function itself. Honestly, I've never tried these
out before, but you never know whether it would work.
See ?gamlss.control

Cheers
Joris




Re: [R] Problems using gamlss to model zero-inflated and overdispersed count data: the global deviance is increasing

2010-06-03 Thread Joris Meys
Correction : That should give some clarity on the question whether it is the
optimization of GAMLSS that goes wrong, or whether the problem is inherent
to the data.



Re: [R] cumsum function with data frame

2010-06-03 Thread Joris Meys
See ?split and ?unsplit.

Data <- read.table(textConnection("variable Year value
EC01 2005 5
EC01 2006 10
AAO1 2005 2
AAO1 2006 4"),header=TRUE)

Datalist <- split(Data,Data$variable)
resultlist <- lapply(Datalist,function(x){
x$cumul <- cumsum(x$value)
return(x)
})
result <- unsplit(resultlist,Data$variable)
result

  variable Year value cumul
1     EC01 2005     5     5
2     EC01 2006    10    15
3     AAO1 2005     2     2
4     AAO1 2006     4     6

On a side note: I've used this construction now for a number of problems.
Some could be better solved using more specific functions (e.g. ave() for
adding a column with means). I'm not sure, however, that this is the
best approach to applying a function to subsets of a dataframe and
adding the result of that function as an extra variable. Anybody care to
elaborate on how the R masters had it in mind?
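
For this particular problem, ave() seems to do the job in one line as well, since it accepts any function through FUN and returns a vector in the original row order. A minimal sketch, with the data typed in by hand rather than read from the original file:

```r
Data <- data.frame(
  variable = c("EC01", "EC01", "AAO1", "AAO1"),
  Year     = c(2005, 2006, 2005, 2006),
  value    = c(5, 10, 2, 4)
)

# Cumulative sum of value within each level of variable, kept in row order
Data$cumul <- ave(Data$value, Data$variable, FUN = cumsum)
Data
```

This assumes the rows are already sorted by Year within each variable; if not, sort first.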

Cheers
Joris

On Thu, Jun 3, 2010 at 5:58 PM, n.via...@libero.it n.via...@libero.itwrote:


 Dear list,
 I have a problem with the cumsum function.
 I have a data frame like the following one
 variable Year value
 EC01     2005 5
 EC01     2006 10
 AAO1     2005 2
 AAO1     2006 4
 what I would like to obtain is
 variable Year value cumsum
 EC01     2005 5     5
 EC01     2006 10    15
 AAO1     2005 2     2
 AAO1     2006 4     6


 if I use the by function or the aggregate function the result is a list or
 something else, what I want is a data frame as I showed above...
 anyone knows how to get it???
 THANKS A LOT







Re: [R] Nested ANOVA with covariate using Type III sums of squares

2010-06-03 Thread Joris Meys
Could you copy the data?

Data <- data.frame(C.Mean,Mean.richness,Zoop,Diversity,Phyto)
dput(Data)

I have the feeling something's wrong there. I believe you have 48
observations (47df + 1 for the intercept), 2 levels of Diversity, 4 of Phyto
and 48/(3*4)=4 levels of Zoop. But you don't have 3df for Zoop. Either I'm
way off, or what goes in the lm is not what you think it is.

I tried a small sample with the datastructure I believe you have, but I
couldn't reproduce your error.

## Run
library(car)   # provides Anova()

Phyto <- as.factor(rep(rep(c("A","B","C","D"),each=6),2))
Diversity <- as.factor(rep(c("High","Low"),each=24))
Zoop <- rep(c(1,2,3,4),times=12)

C.Mean <- rnorm(48)
Mean.richness <- rnorm(48)

test <- lm(C.Mean ~ Mean.richness + Diversity + Zoop + Diversity/Phyto +
    Zoop*Diversity/Phyto)

Anova(test,type="III")

Zoop <- as.factor(Zoop)
test <- update(test)   # refit so that the factor version of Zoop is used
Anova(test,type="III")
## End Run

Cheers
Joris

On Thu, Jun 3, 2010 at 10:26 PM, Anita Narwani anitanarw...@gmail.com wrote:

 I would just like to add that when I remove the co-variate of Mean.richness
 from the model (i.e. eliminating the non-orthogonality), the aliasing
 warning is replaced by the following error message:
 Error in t(Z) %*% ip : non-conformable arguments

 That is when I enter this model:
 carbonmean <- lm(C.Mean ~ Diversity + Zoop + Diversity/Phyto +
     Zoop*Diversity/Phyto)







 On Wed, Jun 2, 2010 at 6:05 PM, Joris Meys jorism...@gmail.com wrote:

 that's diversity/phyto, zoop or phyto twice in the formula.


 On Thu, Jun 3, 2010 at 3:00 AM, Joris Meys jorism...@gmail.com wrote:

 That's what one would expect with type III sum of squares. You have Phyto
 twice in your model, but only as a nested factor. To compare the full model
 with a model without diversity or zoop, you have either the combination
 diversity/phyto, zoop/phyto or phyto twice in the formula. That's aliasing.

 Depending on how you stand on type III sum of squares, you could call
 that a bug. Personally, I'd just not use them.

 https://stat.ethz.ch/pipermail/r-help/2001-October/015984.html

 Cheers
 Joris


 On Thu, Jun 3, 2010 at 2:13 AM, Anita Narwani anitanarw...@gmail.com wrote:

 Hello,

 I have been trying to get an ANOVA table for a linear model containing a
 single nested factor, two fixed factors and a covariate:

 carbonmean <- lm(C.Mean ~ Mean.richness + Diversity + Zoop +
     Diversity/Phyto + Zoop*Diversity/Phyto)

 where *Mean.richness* is a covariate, *Zoop* is a categorical variable
 (the species), *Diversity* is a categorical variable (Low or High), and
 *Phyto* (community composition) is also categorical but is nested within
 the level of *Diversity*. Quinn & Keough's statistics text recommends using
 Type III SS for a nested ANOVA with a covariate.

 I get the following output using the Type I SS ANOVA:



 Analysis of Variance Table
 Response: C.Mean
                      Df    Sum Sq  Mean Sq F value    Pr(>F)
 Mean.richness         1  56385326 56385326 23.5855 3.239e-05 ***
 Diversity             1  14476593 14476593  6.0554  0.019634 *
 Zoop                  1  13002135 13002135  5.4387  0.026365 *
 Diversity:Phyto       6 126089387 21014898  8.7904 1.257e-05 ***
 Diversity:Zoop        1    263036   263036  0.1100  0.742347
 Diversity:Zoop:Phyto  6  61710145 10285024  4.3021  0.002879 **
 Residuals            31  74110911  2390675

 I have tried using both the drop1() command and the Anova() command in
 the car package.

 When I use the Anova command I get the following error message:

 Anova(carbonmean,type="III")

 “Error in linear.hypothesis.lm(mod, hyp.matrix, summary.model = sumry, :
 One or more terms aliased in model.”



 I am not sure why this is aliased. There are no missing cells, and the
 cells
 are balanced (aside from for the covariate). Each Phyto by Zoop cross is
 replicated 3 times, and there are four Phyto levels within each level of
 Diversity. When I remove the nested factor (Phyto), I am able to get the
 Type III SS output.



 Then when I use drop1(carbonmean,.~.,Test="F") I get the following
 output:

 drop1(carbonmean,.~.,Test="F")

 Single term deletions

 Model:
 C.Mean ~ Mean.richness + Diversity + Zoop + Diversity/Phyto + Zoop *
     Diversity/Phyto
                 Df Sum of Sq       RSS AIC
 <none>                        74110911 718
 Mean.richness    1  49790403 123901314 741
 Diversity        0         0  74110911 718
 Zoop             0         0  74110911 718
 Diversity:Phyto  6 118553466 192664376 752
 Diversity:Zoop   0

Re: [R] Nested ANOVA with covariate using Type III sums of squares

2010-06-03 Thread Joris Meys
I see where my confusion comes from. I counted 4 levels of Phyto, but
you have 8, being 4 in every level of Diversity. There's your
aliasing.

> table(Diversity,Phyto)
         Phyto
Diversity M1 M2 M3 M4 P1 P2 P3 P4
        H  0  0  0  0  6  6  6  6
        L  6  6  6  6  0  0  0  0

There's no need to code them differently for every level of Diversity.
If you don't, all is fine:

> Phyto <- gsub("M","P",as.character(Phyto))
> Phyto <- as.factor(Phyto)
>
> test <- lm(C.Mean ~ Mean.richness + Diversity + Zoop + Diversity/Phyto +
+ Zoop*Diversity/Phyto)
>
> Anova(test,type="III")
Anova Table (Type III tests)

Response: C.Mean
                        Sum Sq Df F value    Pr(>F)
(Intercept)           23935609  1 10.0121 0.0034729 **
Mean.richness         49790385  1 20.8269 7.471e-05 ***
Diversity             35807205  1 14.9779 0.0005234 ***
Zoop                  10794614  1  4.5153 0.0416688 *
Diversity:Phyto      118553464  6  8.2650 2.184e-05 ***
Diversity:Zoop          261789  1  0.1095 0.7429356
Diversity:Zoop:Phyto  61710162  6  4.3021 0.0028790 **
Residuals             74110938 31
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


You can check with summary(test) that the model is fitted correctly.
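
The recoding step can be tried on a toy version of the design. This is a sketch under assumptions: the factor layout below (2 Diversity levels, 4 communities in each, 6 observations per community) is made up to mirror the table above, and Phyto2 is a hypothetical name for the recoded factor.

```r
# Hypothetical stand-in for the design: 2 Diversity levels, 4 Phyto
# communities per level, 6 observations per community.
Diversity <- factor(rep(c("H", "L"), each = 24))
Phyto <- factor(c(rep(paste0("P", 1:4), each = 6),
                  rep(paste0("M", 1:4), each = 6)))

# Eight uniquely coded levels: half of the cells are structurally empty.
table(Diversity, Phyto)

# Recode M1..M4 to P1..P4 so the same four labels are reused in each level.
Phyto2 <- factor(sub("^M", "P", as.character(Phyto)))
tab <- table(Diversity, Phyto2)
tab
```

After recoding, every Diversity by Phyto cell holds observations, so the nested term no longer produces empty columns in the design matrix.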

On Fri, Jun 4, 2010 at 12:48 AM, Anita Narwani anitanarw...@gmail.com wrote:

 You have everything right except that there are only 2 zooplankton species
 (C & D, which stand for Ceriodaphnia and Daphnia).




Re: [R] Nested ANOVA with covariate using Type III sums of squares

2010-06-03 Thread Joris Meys
Hi Anita,

I have to correct myself too; I've been rambling a bit. Of course you don't
delete the variable from the interaction term when you test the main
effect. What I said earlier didn't really make sense.

That said, testing a main effect without removing the interaction term has a
tricky interpretation. By removing a main effect you test the full model A + B
+ A:B against the model A + A:B. If you remove the main effect Zoop, for
example, you basically nest Zoop within Diversity and test whether that's
not worse than the full model. This explains it very well:

https://stat.ethz.ch/pipermail/r-help/2010-March/230280.html

I'd go for type II, but you're free to test any hypothesis you want.
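
The A + B + A:B versus A + A:B comparison can be looked at on simulated data. This is only a sketch: the names y, A and B and the balanced layout are hypothetical. It shows that R's formula interface recodes A + A:B as "B nested in A", so both formulas span the same fit, which seems to be why drop1() earlier in the thread reports 0 Df for Diversity and Zoop.

```r
set.seed(1)
# Hypothetical balanced two-factor layout (names are illustrative only).
d <- expand.grid(A = factor(c("a1", "a2")),
                 B = factor(c("b1", "b2", "b3")),
                 rep = 1:4)
d$y <- rnorm(nrow(d))

full   <- lm(y ~ A + B + A:B, data = d)
nested <- lm(y ~ A + A:B, data = d)   # B only appears inside A

# Same number of coefficients and the same residual sum of squares:
# the two formulas are different codings of one and the same fit.
c(length(coef(full)), length(coef(nested)))
c(deviance(full), deviance(nested))
```

A type III test has to get around this recoding, which is one reason its result depends on the contrasts in use.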

Cheers
Joris


On Thu, Jun 3, 2010 at 9:59 PM, Anita Narwani anitanarw...@gmail.com wrote:

 Thanks for your response Joris.

 I was aware of the potential for aliasing, although I thought that this was
 only a problem when you have missing cell means. It was interesting to read
 the vehement argument regarding the Type III sums of squares, and although I
 knew that there were different positions on the topic, I had no idea how
 divisive it was. Nevertheless, Type III SS are generally recommended by
 statistical texts in ecology for my type of experimental design.
 Interestingly, despite the aliasing, SPSS has no problems calculating Type
 III SS for this data set. This is simply because I am entering a co-variate,
 which causes non-orthogonality. I would be happier using R and the default
 Type I SS, which are the same as the Type III SS anyway when I omit the
 co-variate of Mean.richness, except that these results are very sensitive to
 the order in which I add the variables into the model when I do enter the
 co-variate. I understand that the order is very important and relates back to
 the scientific hypothesis, but I am equally interested in the main effects
 of Zoop, Diversity, and the nested effect of Phyto, so entering either of
 these variables before the other does not make sense from an ecological
 perspective, and because the results do change, the order cannot be ignored
 from a statistical perspective.
 Finally, I have tried using the Type II SS and received similar warnings.

 Do you have any recommendations?
 Anita.






Re: [R] cumsum function with data frame

2010-06-03 Thread Joris Meys
But then you don't apply cumsum within each factor level. Hence the ddply.
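
The difference is easy to see side by side. A minimal base-R sketch, rebuilding the example data and using ave() in place of ddply() so that no package is needed (that substitution is mine, not from the thread):

```r
Data <- data.frame(
  variable = c("EC01", "EC01", "AAO1", "AAO1"),
  Year     = c(2005, 2006, 2005, 2006),
  value    = c(5, 10, 2, 4)
)

# Running total over the whole column, ignoring the groups.
overall <- transform(Data, CUMSUM = cumsum(value))

# Running total restarted within each level of 'variable'.
grouped <- transform(Data, CUMSUM = ave(value, variable, FUN = cumsum))

overall$CUMSUM
grouped$CUMSUM
```

Only the grouped version restarts the sum at the first AAO1 row, which is what the original question asked for.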

Cheers
Joris

On Thu, Jun 3, 2010 at 9:35 PM, Jorge Ivan Velez
jorgeivanve...@gmail.com wrote:
 Or better yet, you can use transform only (in base):
 transform(Data, CUMSUM = cumsum(value))
 HTH,
 Jorge

 On Thu, Jun 3, 2010 at 3:30 PM, Felipe Carrillo  wrote:

 Better yet, it is shorter using transform instead of summarise:
 Data <- read.table(textConnection("variable Year value
 EC01 2005 5
 EC01 2006 10
 AAO1 2005 2
 AAO1 2006 4"), header=TRUE)

 ddply(Data, .(variable), transform, CUMSUM=cumsum(value))




 - Original Message 
  From: Felipe Carrillo mazatlanmex...@yahoo.com
  To: Joris Meys jorism...@gmail.com; n.via...@libero.it
  n.via...@libero.it
  Cc: r-help@r-project.org
  Sent: Thu, June 3, 2010 11:28:58 AM
  Subject: Re: [R] cumsum function with data frame
 
  You can also use ddply from the plyr package:

 library(plyr)
 Data <- read.table(textConnection("variable Year value
 EC01 2005 5
 EC01 2006 10
 AAO1 2005 2
 AAO1 2006 4"), header=TRUE)
 Data

 ddply(Data, .(variable), summarise, Year=Year, value=value, CUMSUM=cumsum(value))

 Felipe D. Carrillo
 Supervisory Fishery Biologist
 Department of the Interior
 US Fish & Wildlife Service
 California, USA




Re: [R] Subsetting for unwanted values

2010-06-03 Thread Joris Meys
> PcToAdd_ <- Pc[!(Pc %in% Pc.X)]
> PcToAdd_
[1] "Res" "Os"  "Gov" "Rur"
> PcToAdd_ <- subset(Pc, !(Pc %in% Pc.X))
> PcToAdd_
[1] "Res" "Os"  "Gov" "Rur"
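
As a sketch of why the `!=` attempt fails and of one more base-R alternative: `!=` compares position by position and recycles the shorter vector, whereas `%in%` and setdiff() test set membership. The names keep_in and keep_sdiff are only illustrative.

```r
Pc   <- c("Res", "Com", "Ind", "Os", "Mix", "Gov", "Rur")
Pc.X <- c("Com", "Ind", "Mix")

# '!=' compares position by position and recycles the shorter vector,
# so it answers a different question (and warns about the lengths).
suppressWarnings(Pc != Pc.X)

# Set membership is what is wanted here.
keep_in    <- Pc[!(Pc %in% Pc.X)]
keep_sdiff <- setdiff(Pc, Pc.X)
keep_in
```

Note that setdiff() also drops duplicated values, so the two approaches only agree when Pc has no repeats, as is the case here.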



On Fri, Jun 4, 2010 at 1:52 AM, LCOG1 jr...@lcog.org wrote:

 Hi all,
 I have toyed with this for too long today, and in the past I used multiple
 lines of code to get at what I want. Consider the following:

 All I need to do is subset Pc to the values that are not in Pc.X. The
 first attempt doesn't work because the vectors have unequal lengths. The
 second attempt doesn't give me the right answer.


 Pc <- c("Res","Com","Ind","Os","Mix","Gov","Rur")
 Pc.X <- c("Com","Ind","Mix")
 PcToAdd_ <- Pc[Pc != Pc.X]
 # Doesn't work
 AND

 PcToAdd_ <- subset(Pc.X, Pc.X %in% Pc)
 # Works but doesn't get me what I want

 I am looking for PcToAdd_ to return "Res" "Os" "Gov" "Rur"

 This has got to be a simple answer.  Thanks


Re: [R] Nested ANOVA with covariate using Type III sums of squares

2010-06-03 Thread Joris Meys
SPSS uses a different calculation. As far as I understood, they test main
effects without the covariate. Regarding the difference between my results
and yours, did you use sum contrasts?
options(contrasts=c("contr.sum","contr.poly"))
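
What that option changes can be inspected directly. A small sketch with a made-up three-level factor f: after the switch, the coding columns sum to zero instead of using the default treatment (dummy) coding. As far as I know, the car documentation for Anova() warns that type III tests need this kind of coding rather than contr.treatment.

```r
# Switch to sum-to-zero contrasts, remembering the previous setting.
old <- options(contrasts = c("contr.sum", "contr.poly"))

f  <- factor(rep(c("a", "b", "c"), each = 2))
cm <- contrasts(f)   # coding that model-fitting functions will now use
cm
colSums(cm)          # each column sums to zero

# Restore whatever was in effect before.
options(old)
```

The option has to be set before lm() is called; changing it afterwards does not alter a model that was already fitted.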

On Fri, Jun 4, 2010 at 2:19 AM, Anita Narwani anitanarw...@gmail.com wrote:

 Hi Joris,
 That seems to have worked and the contrasts look correct.
 I have tried comparing the results to what SPSS produces for the same
 model. The two programs produce very different results, although the model F
 statistics, R squared and adjusted R squared values are identical. The
 results are so different that I don't know what to trust.

 For the same model you coded I got:

 test <- lm(C.Mean ~ Mean.richness + Diversity + Zoop + Diversity/Phyto +
 + Zoop*Diversity/Phyto)
 Anova(test,type="III")
 Anova Table (Type III tests)

 Response: C.Mean
                        Sum Sq Df F value    Pr(>F)
 (Intercept)          28223311  1 11.8056  0.001701 **
 Mean.richness        49790403  1 20.8269 7.471e-05 ***
 Diversity            31055477  1 12.9903  0.001082 **
 Zoop                  2736238  1  1.1445  0.292953
 Diversity:Phyto      27943313  6  1.9481  0.104103
 Diversity:Zoop         168184  1  0.0703  0.792584
 Diversity:Zoop:Phyto 61710145  6  4.3021  0.002879 **
 Residuals            74110911 31
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 (Also slightly different from your result)

 and

 summary(test)

 Call:
 lm(formula = C.Mean ~ Mean.richness + Diversity + Zoop + Diversity/Phyto +
     +Zoop * Diversity/Phyto)

 Residuals:
      Min       1Q   Median       3Q      Max
 -3555.26  -479.53    49.94   423.49  4073.20

 Coefficients:
                          Estimate Std. Error t value Pr(>|t|)
 (Intercept)               -8562.9     2492.2  -3.436  0.00170 **
 Mean.richness              4605.7     1009.2   4.564 7.47e-05 ***
 DiversityL                 6576.9     1824.8   3.604  0.00108 **
 ZoopD                     -1414.4     1322.1  -1.070  0.29295
 DiversityH:PhytoP2        -4307.5     1824.8  -2.361  0.02472 *
 DiversityL:PhytoP2         -268.4     1262.5  -0.213  0.83300
 DiversityH:PhytoP3        -2233.4     1393.0  -1.603  0.11900
 DiversityL:PhytoP3        -1571.4     1262.5  -1.245  0.22257
 DiversityH:PhytoP4        -7914.8     2647.2  -2.990  0.00543 **
 DiversityL:PhytoP4        -1612.8     1262.5  -1.277  0.21092
 DiversityL:ZoopD            484.9     1828.0   0.265  0.79258
 DiversityH:ZoopD:PhytoP2    683.9     1855.3   0.369  0.71493
 DiversityL:ZoopD:PhytoP2   6346.4     1785.4   3.555  0.00124 **
 DiversityH:ZoopD:PhytoP3   4922.8     1786.3   2.756  0.00971 **
 DiversityL:ZoopD:PhytoP3   1085.4     1785.4   0.608  0.54766
 DiversityH:ZoopD:PhytoP4   3261.8     1985.6   1.643  0.11055
 DiversityL:ZoopD:PhytoP4    681.9     1785.4   0.382  0.70513
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 Residual standard error: 1546 on 31 degrees of freedom
 Multiple R-squared: 0.7858, Adjusted R-squared: 0.6753
 F-statistic: 7.109 on 16 and 31 DF,  p-value: 1.810e-06

 From SPSS I got

 Tests of Between-Subjects Effects
 Dependent Variable: C Mean

 Source                   Type III Sum of Squares  df  Mean Square       F  Sig.
 Corrected Model                        2.719E+08  16    1.700E+07   7.109  .000
 Intercept                              2.394E+07   1    2.394E+07  10.012  .003
 Meanrichness                           4.979E+07   1    4.979E+07  20.827  .000
 Diversity                              3.581E+07   1    3.581E+07  14.978  .001
 Zoop                                   1.079E+07   1    1.079E+07   4.515  .042
 Diversity * Zoop                      261789.172   1   261789.172    .110  .743
 Phyto(Diversity)                       1.186E+08   6    1.976E+07   8.265  .000
 Phyto * Zoop(Diversity)                6.171E+07   6    1.029E+07   4.302  .003
 Error                                  7.411E+07  31    2.391E+06
 Total                                  7.959E+08  48
 Corrected Total                        3.460E+08  47




 Which, gives some similar results, but a completely different F statistic
 and P-value for the main effect of Zoop and the nested effect of Phyto.
 Obviously SPSS is not necessarily the perfect reference, but when using the
 Type I SS, the results did agree. Any thoughts on why this might be? Could
 the two programs be calculating the Type III SS differently? Might it be
 wise to stick to Type I SS?

 Thanks very much for your time and effort. It has been very helpful.
 Anita.



Re: [R] storing output data from a loop that has varying row numbers

2010-06-02 Thread Joris Meys
Hi Ross,

the trick is mainly split() and unsplit(). split() splits up the
data frame based on the combined factors; unsplit() puts the pieces back
together into a data frame. This way you can do the calculation on a set of
mini data frames that each contain only the information for one combination
of the factors.

lapply() is the apply function specifically for lists. As split() gives you a
list of data frames, lapply() loops through those data frames in the
appropriate way. You can see that for yourself by doing str(seal_list).
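
The pattern can be sketched on made-up data. Everything below is hypothetical (the seals data frame, its columns and the centring step); only seal_list echoes the name used in the thread. The groups have different numbers of rows, which the list structure handles without any bookkeeping.

```r
# Hypothetical stand-in data: two grouping factors, unequal group sizes.
seals <- data.frame(
  id    = c("A", "A", "A", "B", "B"),
  day   = c(1, 1, 2, 1, 1),
  value = c(2, 4, 6, 8, 10)
)
groups <- list(seals$id, seals$day)

# One small data frame per observed id/day combination.
seal_list <- split(seals, groups, drop = TRUE)
str(seal_list, max.level = 1)

# Do the per-group calculation; here, centre 'value' on its group mean.
res_list <- lapply(seal_list, function(d) {
  d$centered <- d$value - mean(d$value)
  d
})

# Put the pieces back together in the original row order.
result <- unsplit(res_list, groups, drop = TRUE)
result
```

The same grouping list and the same drop setting are passed to both split() and unsplit() so that the pieces line up again.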

Hope this clears things up a bit.
Cheers
Joris

On Wed, Jun 2, 2010 at 9:55 AM, RCulloch ross.cull...@dur.ac.uk wrote:


 Hi Jorvis,

 Many thanks for sorting that! I haven't seen it done that way before, so
 I'll have to look in to the properties of lapply a bit more to get a full
 appreciation of other approaches to looping data in R.

 Thanks again for your help, it is much appreciated,

 Ross


Re: [R] Seeking help on Vectorize()

2010-06-02 Thread Joris Meys
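Your arguments are not coming through: Vectorize() hands x and y to the
function one pair of elements at a time. The nested apply() calls already
handle vector input, so the wrapper can simply be dropped. (The
Vectorize(dat, ...) call I first left inside the function had no effect on
the returned value, so it is removed here.)

fn <- function(x = 1:3, y = 3:6) {
x <- matrix(x, nrow=1)
y <- matrix(y, ncol=1)
# for every element of x, add it to every element of y
dat <- apply(x, 2, function(xx) {
  apply(y, 1, function(yy) {
  return(xx + yy) } ) })
return(dat)}

> fn(1:3,3:7)
     [,1] [,2] [,3]
[1,]    4    5    6
[2,]    5    6    7
[3,]    6    7    8
[4,]    7    8    9
[5,]    8    9   10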
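
For this particular table of sums, base R's outer() gives the same matrix without any apply() calls. The sketch below also shows the element-by-element pairing that mapply() performs, which is what Vectorize() relies on; the names res and paired are illustrative.

```r
x <- 1:3
y <- 3:7

# outer() builds the full table of y[i] + x[j] in one call.
res <- outer(y, x, "+")
res

# mapply(), which Vectorize() wraps, pairs the arguments element by element
# and recycles the shorter one instead of forming all combinations.
paired <- suppressWarnings(mapply(function(a, b) a + b, x, y))
paired
```

The paired result has one value per position rather than one per combination, which matches the vector of five numbers reported in the original question.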

Cheers
Joris

On Wed, Jun 2, 2010 at 11:25 AM, Megh Dal megh700...@yahoo.com wrote:

 Dear folks, I have written the following function:

 fn <- Vectorize(function(x = 1:3, y = 3:6) {
 x <- matrix(x, nrow=1)
 y <- matrix(y, ncol=1)
 dat <- apply(x, 2, function(xx) {
   apply(y, 1, function(yy) {
   return(xx + yy) } ) })
 return(dat)}, SIMPLIFY = TRUE)

 If I run this function, I get a warning message, and the format of the
 returned object is not correct either, for example:

   fn(x = 1:3, y = 3:7)
 [1] 4 6 8 7 9
 Warning message:
 In mapply(FUN = function (x = 1:3, y = 3:6)  :
   longer argument not a multiple of length of shorter

 However, if I run the individual lines of code:

  x <- 1:3; y = 3:7
  x <- matrix(x, nrow=1)
  y <- matrix(y, ncol=1)
  dat <- apply(x, 2, function(xx) {
 +   apply(y, 1, function(yy) {
 +   return(xx + yy) } ) })
  dat
      [,1] [,2] [,3]
 [1,]    4    5    6
 [2,]    5    6    7
 [3,]    6    7    8
 [4,]    7    8    9
 [5,]    8    9   10


 I get exactly what I want. Where am I going wrong?

 Thanks,





Re: [R] how to label the som notes by the majority vote

2010-06-02 Thread Joris Meys
Hi Changbin,

I looked at your code again, and it appears as if you're using the mapping
plot for something that it isn't meant for. The mapping shows you how many
points you have in every circle, and these points are represented by the
labels. Your first plot gives the majority vote.

This said, you can hack the function using:

classif <- predict(nir.xyf)
tmp <- table(classif$unit.classif,classif$prediction)
label <- colnames(tmp)
label <- apply(tmp!=0,1,function(x){label[x]})[classif$unit.classif]
label[-match(1:16,classif$unit.classif)] <- ""


cl <- colors()
bgcols <- rev(heat.colors(4))
plot(nir.xyf,
type="mapping",bgcol=bgcols[as.numeric(as.factor(temp.predict))],
  main="Mapping plot",labels=label)

It does not calculate the majority vote itself, it just assigns a label to
the category based on the predicted labels. Which is equivalent in this
case.
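
If an explicit majority vote per map unit is wanted, it can be computed in base R from two vectors: the unit each object was mapped to and the class of each object. The sketch below uses made-up values and hypothetical names (unit, temp, n_units, unit_labels); how the labels are then handed to the plotting function is left open.

```r
# Hypothetical data: the map unit each object landed in, and its class.
unit <- c(1, 1, 1, 2, 2, 3)
temp <- c(30, 40, 30, 50, 50, 40)
n_units <- 4   # unit 4 receives no objects in this toy example

# Most frequent class per unit; which.max() takes the first one on ties.
majority <- tapply(temp, unit, function(v) names(which.max(table(v))))

# One label per unit, with an empty string for units without objects.
unit_labels <- rep("", n_units)
unit_labels[as.integer(names(majority))] <- majority
unit_labels
```

With the 30, 40, 30 example from the question, the first unit gets the label 30.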

Cheers
Joris

On Wed, Jun 2, 2010 at 5:08 AM, Changbin Du changb...@gmail.com wrote:


 library(kohonen)
 data(nir)
 attach(nir)

 # SOM, supervised learning: train the map using temperature as the class
 # variable.
 set.seed(13)
 nir.xyf <- xyf(data=spectra, Y=classvec2classmat(temperature), xweight =
 0.9, grid=somgrid(4, 4, "hexagonal"))


 temp.xyf <- predict(nir.xyf)$unit.prediction  # get prediction
 temp.predict <- as.numeric(classmat2classvec(temp.xyf))  # change matrix to vector

 par(mfrow=c(1,2))

 plot(nir.xyf, type="property", property=temp.predict, palette.name=rainbow,
 main="Prediction")


 cl <- colors()
 bgcols <- cl[2:14]
 plot(nir.xyf, type="mapping", labels=nir$temperature,
 bgcol=bgcols[as.integer(temp.predict)],
   main="Mapping plot")
 par(mfrow=c(1,1))



 HI, Joris,

 Thanks so much for your suggestion!   I have modified the above codes, and
 what I want  is to label the notes by the temperature.
 if a note has 3 objects mapped to it (the temperature are 30, 40, 30), then
 I want the 30 be labeled on the note.

 the right plot is the mapping plot, I want it to be labeled by only one
 temperature.

 Thanks so much!











 On Tue, Jun 1, 2010 at 5:36 PM, Joris Meys jorism...@gmail.com wrote:

 Dear Changbin,

 Please provide a self-contained, minimal example, meaning the whole code
 should run and create the plot as it is now, without having to load your
 dataset (which we don't have). Otherwise it's impossible to see what's going
 on and help you.

 Cheers
 Joris

 On Wed, Jun 2, 2010 at 2:21 AM, Changbin Du changb...@gmail.com wrote:

 HI, Dear R community,

 I am using the following codes to do the som. I tried to label the notes
 by
 the majority vote. either through mapping or prediction.
 I attached my output, the left one dont have any labels in the note, the
 right one has  more than one label in each note. I need to have only one
 label for each note either by majority vote or prediction.

 Can anyone give some suggestions or advice? Thanks so much!



 alex <- read.table("/home/cdu/operon/alex2.txt", sep="\t", skip=0,
 header=TRUE, fill=TRUE)
 alex1 <- alex[,c(1:257)]
 levels(alex1$Label)

 alex1$outcome <- as.numeric(alex1$Label)
 alex1$outcome[1:20]


 # self-organizing maps (unsupervised learning)
 library(kohonen)


 # SOM, supervised learning: train the map using outcome as the class
 # variable.
 set.seed(13)
 final.xyf <- xyf(data=as.matrix(alex1[,c(1:256)]),
 Y=classvec2classmat(alex1$outcome), xweight = 0.99, grid=somgrid(20, 30,
 "hexagonal"))


 outcome.xyf <- predict(final.xyf)$unit.prediction  # get prediction
 outcome.predict <- as.numeric(classmat2classvec(outcome.xyf))  # change matrix to vector

 outcome.label <- LETTERS[outcome.predict]  # convert the numeric values to letters


 plot(final.xyf, type="property", property=outcome.predict,
 labels=outcome.label, palette.name=rainbow, main="Prediction")



 cl <- colors()
 bgcols <- cl[2:14]
 plot(final.xyf, type="mapping", labels=outcome.label, col="black",
 bgcol=bgcols[as.integer(outcome.predict)],
   main="Mapping plot")




 --
 Sincerely,
 Changbin
 --


Re: [R] Problems using gamlss to model zero-inflated and overdispersed count data: the global deviance is increasing

2010-06-02 Thread Joris Meys
 GAMLSS-RS iteration 3: Global Deviance = 652.5771
 GAMLSS-RS iteration 4: Global Deviance = 632.8885
 GAMLSS-RS iteration 5: Global Deviance = 645.1169
 Error in RS() : The global deviance is increasing
  Try different steps for the parameters or the model maybe inappropriate

  model_ZINBI <- gamlss(duck ~ LFAP200,data=data,family= ZINBI)
 GAMLSS-RS iteration 1: Global Deviance = 3831.864
 GAMLSS-RS iteration 2: Global Deviance = 1174.605
 GAMLSS-RS iteration 3: Global Deviance = 562.5428
 GAMLSS-RS iteration 4: Global Deviance = 344.0637
 GAMLSS-RS iteration 5: Global Deviance = 1779.018
 Error in RS() : The global deviance is increasing
  Try different steps for the parameters or the model maybe inappropriate



 Any suggestions on how to proceed with this?

 Many thanks in advance,


 Diederik


 Diederik Strubbe
 Evolutionary Ecology Group
 Department of Biology
 University of Antwerp
 Groenenborgerlaan 171
 2020 Antwerpen, Belgium
 tel: +32 3 265 3464




Re: [R] regexpr mystery can not remove trailing spaces

2010-06-02 Thread Joris Meys
Could you provide us with dput(becva$V1[1])?
Cheers
Joris

On Wed, Jun 2, 2010 at 2:07 PM, Petr PIKAL petr.pi...@precheza.cz wrote:

 Dear all

 I encountered a strange problem with regexpr replacement.

 I made this character object:

 > str <- "02.06.10 12:40 "
 > str(str)
  chr "02.06.10 12:40  "

 I read in an object which seems to be quite similar:

 > str(as.character(becva$V1)[1])
  chr "02.06.10 12:40   "

 However, I cannot remove the trailing spaces from it:

 > sub(' +$', '', as.character(becva$V1[1]))
 [1] "02.06.10 12:40   "
 > sub(' +$', '', str)
 [1] "02.06.10 12:40"

 Does somebody have an idea what to do?

 $version.string
 [1] R version 2.12.0 Under development (unstable) (2010-04-25 r51820)

 on Windows

 Regards
 Petr



Re: [R] compute the associate vector of distances between leaves in a binary non-rooted tree

2010-06-02 Thread Joris Meys
Hi,

with a little hack you can use the function cophenetic.phylo from ape. You
just set all branch lengths to 1 :

require(ape)

tree <- rtree(5, rooted=FALSE)
n <- length(tree$edge.length)
tree$edge.length <- rep(1, n)
cophenetic.phylo(tree)

   t3 t1 t2 t4 t5
t3  0  3  3  3  3
t1  3  0  2  4  4
t2  3  2  0  4  4
t4  3  4  4  0  2
t5  3  4  4  2  0
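
For reference, the same edge counts can be checked without ape. This is only a minimal base-R sketch, assuming the edge matrix and tip labels printed in the question quoted below; the object names (d, dist.tips, v) are made up for illustration, and the all-pairs shortest-path loop is a stand-in, not what cophenetic.phylo does internally.

```r
# Edge matrix and tip labels copied from the tree in the question
# (tips are nodes 1-5, internal nodes are 6-8)
edge <- matrix(c(6,7, 7,1, 7,8, 8,2, 8,3, 6,4, 6,5), ncol = 2, byrow = TRUE)
tip.label <- c("t4", "t3", "t2", "t1", "t5")

n.node <- max(edge)

# Node-to-node distances, every edge counted as length 1
d <- matrix(Inf, n.node, n.node)
diag(d) <- 0
d[edge] <- 1
d[edge[, 2:1]] <- 1

# Floyd-Warshall: in a tree the shortest path is the unique path,
# so its length is the number of edges between two nodes
for (k in seq_len(n.node)) {
  d <- pmin(d, outer(d[, k], d[k, ], "+"))
}

# Keep the tips only and label them
tips <- seq_along(tip.label)
dist.tips <- d[tips, tips]
dimnames(dist.tips) <- list(tip.label, tip.label)

# Vector in the order d(t1,t2), d(t1,t3), ..., d(t4,t5)
ord <- paste0("t", 1:5)
m <- dist.tips[ord, ord]
v <- m[lower.tri(m)]
v
```

With the tree from the question this should give back the vector (4,4,3,2,2,3,4,3,4,3) stated there. For large trees the cophenetic.phylo route above is the more sensible choice.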

Cheers


On Wed, Jun 2, 2010 at 2:47 PM, Arnau Mir Torres arnau@uib.es wrote:

 Hello.

 I'd like to compute the associate vector of distances between leaves in a
 binary non-rooted tree. The definition of a distance between two leaves in a
 binary non-rooted tree is the number of edges in the path joining the two
 leaves.
 I've tried the ape package but I'm unable to find this vector.
 For example, using rtree(5,rooted=F) I've obtained the following tree:

 $edge
      [,1] [,2]
 [1,]    6    7
 [2,]    7    1
 [3,]    7    8
 [4,]    8    2
 [5,]    8    3
 [6,]    6    4
 [7,]    6    5

 $tip.label
 [1] "t4" "t3" "t2" "t1" "t5"

 $edge.length
 [1] 0.9126727 0.2765674 0.4996832 0.7904400 0.8508797 0.8174133 0.9027958

 $Nnode
 [1] 3


 My question is: how to compute the vector of distances between the 5
 leaves. This vector is in this case:

 v=(d(t1,t2),d(t1,t3),d(t1,t4),d(t1,t5),d(t2,t3),d(t2,t4),d(t2,t5),d(t3,t4),d(t3,t5),d(t4,t5))=(4,4,3,2,2,3,4,3,4,3).


 Thanks in advance,

 Arnau.
 
 Arnau Mir Torres
 Edifici A. Turmeda
 Campus UIB
 Ctra. Valldemossa, km. 7,5
 07122 Palma de Mca.
 tel: (+34) 971172987
 fax: (+34) 971173003
 email: arnau@uib.es
 URL: http://dmi.uib.es/~arnau http://dmi.uib.es/%7Earnau
 










Re: [R] regexpr mystery can not remove trailing spaces

2010-06-02 Thread Joris Meys
sub("\\s+$", '', bbb, perl=TRUE)

does it for me.
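
To make the difference between the patterns in this thread concrete, here is a small sketch. The vector x is made up for illustration (one value with trailing spaces, one with a tab before the last space, one clean); it is not Petr's data.

```r
# Hypothetical test values, not the data from the thread
x <- c("02.06.10 12:40   ", "02.06.10 12:40\t ", "02.06.10 12:40")

plain    <- sub(" +$", "", x)                  # removes plain spaces only
perl_ws  <- sub("\\s+$", "", x, perl = TRUE)   # removes any trailing whitespace
posix_ws <- sub("[[:space:]]+$", "", x)        # same idea with a POSIX class

plain
perl_ws
posix_ws
```

The plain-space pattern leaves the tab in the second value, while the two whitespace classes strip it. That is one way a string can look as if it ends in spaces and still resist ' +$'.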


On Wed, Jun 2, 2010 at 3:22 PM, Petr PIKAL petr.pi...@precheza.cz wrote:

 Hi

 > dput(bbb)
 c("02.06.10 12:40   ", "02.06.10 12:00   ", "02.06.10 11:00   ",
 "02.06.10 10:00   ", "02.06.10 09:00   ", "02.06.10 08:00   ",
 "02.06.10 07:00   ", "02.06.10 06:00   ", "02.06.10 05:00   ",
 "02.06.10 04:00   ", "02.06.10 03:00   ", "02.06.10 02:00   ",
 "02.06.10 01:00   ", "02.06.10 00:00   ", "01.06.10 23:00   ",
 "01.06.10 22:00   ", "01.06.10 21:00   ", "01.06.10 20:00   ",
 "01.06.10 19:00   ", "01.06.10 18:00   ", "01.06.10 17:00   ",
 "01.06.10 16:00   ", "01.06.10 15:00   ", "01.06.10 14:00   ",
 "01.06.10 13:00   ", "01.06.10 05:00   ", "31.05.10 05:00   ",
 "30.05.10 05:00   ", "29.05.10 05:00   ", "28.05.10 05:00   ",
 "27.05.10 05:00   ")

 For simplicity I changed the name and put it in a single variable.
 I also reinstalled R to a recent R-devel.

 > sub('\\w+$', '', bbb[1])
 [1] "02.06.10 12:40   "
 > sub('[:space:]', '', bbb[1])
 [1] "02.06.10 1240   "
 

 I also tried Matt's suggestion but it did not help.

 Regards
 Petr



Re: [R] regexpr mystery can not remove trailing spaces

2010-06-02 Thread Joris Meys
Hi Petr,

Matt may very well have been right. Because I copied the dput output from the
mail, any whitespace was apparently converted to plain spaces. It is still
possible that the whitespace in your original data consists of tabs or even
newline characters. You can check that easily with

grep("\t", as.character(becva$V1[1]))
grep("\n", as.character(becva$V1[1]))
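
If neither check finds anything, looking at the code points directly shows what the trailing characters really are. The sketch below is hypothetical: it assumes the value ends in no-break spaces (U+00A0), which is common in text copied from web pages but was not confirmed in this thread.

```r
# Hypothetical value: visible text followed by two no-break spaces (U+00A0)
x <- paste0("02.06.10 12:40", intToUtf8(c(160, 160)))

codes <- utf8ToInt(x)   # one integer code point per character
tail(codes, 3)          # 32 = space, 9 = tab, 10 = newline, 160 = no-break space

# Trim by code point, so the result does not depend on the regex engine
ws <- c(9L, 10L, 13L, 32L, 160L)
last <- max(which(!(codes %in% ws)))   # assumes at least one non-blank character
trimmed <- intToUtf8(codes[seq_len(last)])
trimmed
```

On the real data, utf8ToInt(as.character(becva$V1[1])) would be the line to run first; the list of codes in ws is only a guess at what might turn up.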

Cheers
Joris



On Wed, Jun 2, 2010 at 3:54 PM, Petr PIKAL petr.pi...@precheza.cz wrote:

 Hi

 thanks. I am puzzled what was wrong. Now even

 sub(' +$', '', bbb[1])

 works. I am checking water throughput in nearby river and copying data
 from internet. So I wonder if there was some change recently as during
 floods they update it in about 10 minutes interval.

 Regards
 Petr


 jim holtman jholt...@gmail.com napsal dne 02.06.2010 15:44:42:

  You had the wrong case on 'w' and the wrong expression with
  [:space:]';  see below
 
  > bbb <- c("02.06.10 12:40   ", "02.06.10 12:00   ", "02.06.10 11:00   ",
  + "02.06.10 10:00   ", "02.06.10 09:00   ", "02.06.10 08:00   ",
  + "02.06.10 07:00   ", "02.06.10 06:00   ", "02.06.10 05:00   ",
  + "02.06.10 04:00   ", "02.06.10 03:00   ", "02.06.10 02:00   ",
  + "02.06.10 01:00   ", "02.06.10 00:00   ", "01.06.10 23:00   ",
  + "01.06.10 22:00   ", "01.06.10 21:00   ", "01.06.10 20:00   ",
  + "01.06.10 19:00   ", "01.06.10 18:00   ", "01.06.10 17:00   ",
  + "01.06.10 16:00   ", "01.06.10 15:00   ", "01.06.10 14:00   ",
  + "01.06.10 13:00   ", "01.06.10 05:00   ", "31.05.10 05:00   ",
  + "30.05.10 05:00   ", "29.05.10 05:00   ", "28.05.10 05:00   ",
  + "27.05.10 05:00   ")
  > sub('\\W+$', '', bbb[1])
  [1] "02.06.10 12:40"
  > sub('[[:space:]]+$', '', bbb[1])
  [1] "02.06.10 12:40"
  
 
 
  
 
 
 
  --
  Jim Holtman
  Cincinnati, OH
  +1 513 646 9390
 
  What is the problem that you are trying to solve?





Re: [R] glmnet strange error message

2010-06-02 Thread Joris Meys
Could you give us the traceback? (In case you don't know, just type
traceback() right after you get the error message.) I can't reproduce the
error, so it is difficult to solve without having the real data.
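
As an illustration of what a traceback gives you, here is a sketch with two made-up functions (inner and outer_fun are hypothetical, not glmnet code) that fail with the same kind of message. Interactively you would simply call outer_fun() and then traceback(); the code below records the call stack at the moment of the error so it also works in a script.

```r
# Hypothetical functions, only to provoke a "length zero" error
inner <- function(outlist) {
  # outlist$msg is NULL here, so the comparison has length zero
  if (outlist$msg != "Unknown error") return(outlist)
  NULL
}
outer_fun <- function() inner(list())

# Record the call stack when the error is signalled, then catch the error
stack <- NULL
msg <- tryCatch(
  withCallingHandlers(outer_fun(),
                      error = function(e) stack <<- sys.calls()),
  error = function(e) conditionMessage(e))

msg     # the error message
stack   # the calls that were active, outermost first
```

The stack shows which function was running when the condition failed, and that is the piece of information missing from the original report.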

Cheers
Joris

On Wed, Jun 2, 2010 at 6:51 PM, Dave_F friedenbe...@battelle.org wrote:


 Hello fellow R users,

 I have been getting a strange error message when using the cv.glmnet
 function in the glmnet package. I am attempting to fit a multinomial
 regression using the lasso. covars is a matrix with 80 rows and roughly
 4000 columns; all the covariates are binary. resp is an eight-level factor.
 I can fit the model with no errors, but when I try to cross-validate, after
 about 30 seconds I get the following:


 > glmnet.fit = glmnet(covars,resp,family="multinomial")
 > glmnet.cv = cv.glmnet(covars,resp,family="multinomial",type="class")
 Error in if (outlist$msg != "Unknown error") return(outlist) :
   argument is of length zero

 It seems like it makes it through the first couple of folds but trips up
 somewhere in the middle.
 The example in the documentation works perfectly on my machine. Any ideas
 on what the problem may be?

 Thanks!
 Dave
 --
 View this message in context:
 http://r.789695.n4.nabble.com/glmnet-strange-error-message-tp2240458p2240458.html
 Sent from the R help mailing list archive at Nabble.com.



Re: [R] why the dim gave me different results

2010-06-02 Thread Joris Meys


Re: [R] nnet: cannot coerce class c(terms, formula) into a data.frame

2010-06-02 Thread Joris Meys
Without checking R or the rest of the code, the error seems quite clear to
me: R finds a formula where it expects a data frame. cvc_lda is not a
dataframe. Do str(cvc_lda) to check for yourself. You really need to learn
this btw. Whenever you get an error, first thing to do is to check whether
everything you put in the function is what you think it is, and is what R
needs it to be.
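
A minimal sketch of that kind of check. The object fit_like is a hypothetical stand-in, built by hand to resemble a fitted-model list that carries a terms component (which is presumably what lda() handed back here); it is not the poster's cvc_lda.

```r
# Hypothetical stand-in for a fitted-model list with a 'terms' component
fit_like <- list(class = factor(c("a", "b", "a")),
                 terms = terms(y ~ x))

class(fit_like)           # "list", not "data.frame"
is.data.frame(fit_like)   # FALSE
str(fit_like, max.level = 1)

# Trying to use it where a data frame is needed fails in the same way
res <- tryCatch(as.data.frame(fit_like), error = function(e) e)
conditionMessage(res)
```

Running class() and str() on every argument before calling a modelling function usually shows this kind of mismatch right away.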

Before you overload the help list with questions, please take some time to
read the introduction to R thoroughly. You really need to get to understand
the differences between vectors or arrays, matrices, data frames, lists, ...
You struggle with it quite obviously, and that's a problem we cannot solve
for you.

http://cran.r-project.org/doc/manuals/R-intro.pdf

If there is something that is not clear to you, feel free to ask here.

Cheers
Joris

On Wed, Jun 2, 2010 at 8:15 PM, cobbler_squad la.f...@gmail.com wrote:


 Dearest all,

 Objective: I am now learning neural networks. I want to see how well I can
 train an artificial neural network model to discriminate between the two
 files I am attaching with this message.

 http://r.789695.n4.nabble.com/file/n2240582/3dMaskDump.txt 3dMaskDump.txt
 http://r.789695.n4.nabble.com/file/n2240582/test_vowels.txt test_vowels.txt

 Question: when I attempt to run
 cvc_nnet <- nnet(G ~ ., data=cvc_lda, size=1, iter=10, MaxNWts=100)
 I get an error saying:
 Error in as.data.frame.default(x[[i]], optional = TRUE) :
   cannot coerce class c("terms", "formula") into a data.frame

 I have not encountered this error when I was running this script with
 previous lda results, and I am not quite sure what the error means.

 Below is short (and, I hope, reproducible) code.

 library(nnet)

 cvc_nnet <- nnet(G ~ ., data=cvc_lda, size=1, iter=10, MaxNWts=100)

 predict(cvc_nnet, cvc_lda, type="class")
 table(predict(cvc_nnet, cvc_lda, type="class"), cvc_lda$G)

 cvc_nnet.out <- NULL
 all = c(1:52)

 for(n in all){
   cvc_nnet <- nnet(G ~ ., data=cvc_lda[all != n,], CV=TRUE, size=1, iter=10, MaxNWts=100)
   cvc_nnet.out <- c(cvc_nnet.out, predict(cvc_nnet, cvc_lda[all == n,], type="class"))
 }

 table(cvc_nnet.out, cvc_lda$G)

 ===

 to get cvc_lda:

 library(MASS)

 vowel_features <- data.frame(as.matrix(read.table(file = "test_vowels.txt")))
 mask_features <- data.frame(as.matrix(read.table(file = "3dmaskdump.txt")))
 G <- vowel_features[,41]

 cvc_lda <- lda(G ~ ., data=mask_features, na.action=na.omit, CV=TRUE)


 Your insight is very much appreciated!

 --
 View this message in context:
 http://r.789695.n4.nabble.com/nnet-cannot-coerce-class-c-terms-formula-into-a-data-frame-tp2240582p2240582.html
 Sent from the R help mailing list archive at Nabble.com.



Re: [R] Use apply only on non-missing values

2010-06-02 Thread Joris Meys
Not really a direct answer on your question, but:
> system.time(replicate(1,apply(as.matrix(theta), 1, rasch, b_vector)))
   user  system elapsed
   4.51    0.03    4.55

> system.time(replicate(1,theta%*%t(b_vector)))
   user  system elapsed
   0.25    0.00    0.25

It does make a difference on large datasets...
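
Along the same lines, the probabilities can be computed for the observed cells only, without apply(). This is a sketch built on the example data from the question quoted below; the names obs, mat_apply and mat_vec are made up here, and it has not been tried on the real (large, sparse) problem.

```r
# Example data as set up in the question
set.seed(12345)
dat <- matrix(sample(c(0, 1), 110, replace = TRUE), nrow = 11, ncol = 10)
dat[sample(1:110, 5)] <- NA
theta <- rnorm(11)
b_vector <- runif(10, -4, 4)

rasch <- function(theta, b) 1 / (1 + exp(b - theta))

# Original approach: fill every cell, then blank out the missing ones
mat_apply <- apply(as.matrix(theta), 1, rasch, b_vector)   # items x persons
mat_apply[which(is.na(t(dat)))] <- NA

# Alternative: compute only where data were observed
obs <- which(!is.na(t(dat)), arr.ind = TRUE)   # row = item, col = person
mat_vec <- matrix(NA_real_, nrow = length(b_vector), ncol = length(theta))
mat_vec[obs] <- rasch(theta[obs[, "col"]], b_vector[obs[, "row"]])

all.equal(mat_apply, mat_vec)
```

When most cells are missing, the second version evaluates rasch() only on the few observed (item, person) pairs, which is where the saving would come from.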
Cheers
Joris

On Wed, Jun 2, 2010 at 4:44 PM, Doran, Harold hdo...@air.org wrote:

 I have a function that I am currently using very inefficiently. The
 following are needed to illustrate the problem:

 set.seed(12345)
 dat <- matrix(sample(c(0,1), 110, replace = TRUE), nrow = 11, ncol=10)
 mis <- sample(1:110, 5)
 dat[mis] <- NA
 theta <- rnorm(11)
 b_vector <- runif(10, -4,4)
 empty <- which(is.na(t(dat)))

 So, I have a matrix (dat) with some values within the matrix missing. In my
 real world problem, the matrix is huge, and most values are missing. The
 function in question is called derivs() and is below. But, let me step
 through the inefficient portions.

 First, I create a matrix of some predicted probabilities as:

 rasch <- function(theta,b) 1/ (1 + exp(b-theta))
 mat <- apply(as.matrix(theta), 1, rasch, b_vector)

 However, I only need those predicted probabilities in places where the data
 are not missing. So, the next step in the function is

 mat[empty] <- NA

 which manually places NAs in places where the data are missing (notice the
 matrix 'mat' is the transpose of the data matrix and so I get the empty
 positions from the transpose of dat).

 Afterwards, the function computes the gradient and hessians needed to
 complete the MLE estimation.

 All of this works in the sense that it yields the correct answers for my
 problem. But, the glaring problem is that I create predicted probabilities
 for every cell in 'mat' when in many cases they are not needed. I end up
 replacing those values with NAs. In my real world problem, this is horribly
 inefficient and slow.

 My question, then, is whether there is a way to use apply such that it
 computes the necessary predicted probabilities only where the data are not
 missing, to yield the matrix 'mat'. My desired end result is the matrix 'mat'
 as it is after manually placing the NAs in the appropriate cells.

 Thanks
 Harold


 derivs <- function(dat, b_vector, theta){
     mat <- apply(as.matrix(theta), 1, rasch, b_vector)
     mat[empty] <- NA
     gradient <- -(colSums(dat, na.rm = TRUE) - rowSums(mat, na.rm = TRUE))
     hessian <- -(rowSums(mat * (1-mat), na.rm = TRUE))
     list('gradient' = gradient, 'hessian' = hessian)
 }



  sessionInfo()
 R version 2.10.1 (2009-12-14)
 i386-pc-mingw32

 locale:
 [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
 States.1252LC_MONETARY=English_United States.1252
 [4] LC_NUMERIC=C   LC_TIME=English_United
 States.1252

 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base

 loaded via a namespace (and not attached):
 [1] tools_2.10.1
 



Re: [R] Nested ANOVA with covariate using Type III sums of squares

2010-06-02 Thread Joris Meys
That's what one would expect with type III sums of squares. You have Phyto
twice in your model, but only as a nested factor. To compare the full model
with a model without diversity or zoop, you have either the combination
diversity/phyto, zoop/phyto or phyto twice in the formula. That's aliasing.

Depending on how you stand on type III sums of squares, you could call that a
bug. Personally, I'd just not use them.
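
To see the aliasing directly, one can fit the same formula structure to made-up data and count the coefficients lm() cannot estimate. Everything below is a sketch under assumptions: the data are random, the design (8 Phyto levels labelled p1 to p8, four within each Diversity level, 2 Zoop levels, 3 replicates) is only inferred from the description in the question, and the covariate is left out.

```r
set.seed(1)

# Hypothetical balanced design: 8 Phyto x 2 Zoop x 3 replicates = 48 rows
d <- expand.grid(rep = 1:3, Zoop = c("z1", "z2"), Phyto = paste0("p", 1:8))
d$Diversity <- factor(ifelse(d$Phyto %in% paste0("p", 1:4), "Low", "High"))
d$y <- rnorm(nrow(d))

fit <- lm(y ~ Diversity + Zoop + Diversity/Phyto + Zoop*Diversity/Phyto,
          data = d)

length(coef(fit))        # columns in the model matrix
fit$rank                 # how many of them are estimable
sum(is.na(coef(fit)))    # aliased coefficients, reported as NA
```

In this design the rank is smaller than the number of columns, so some coefficients come back as NA. That is the situation the "terms aliased in model" error refers to; alias(fit) lists the dependencies in detail.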

https://stat.ethz.ch/pipermail/r-help/2001-October/015984.html

Cheers
Joris


On Thu, Jun 3, 2010 at 2:13 AM, Anita Narwani anitanarw...@gmail.com wrote:

 Hello,

 I have been trying to get an ANOVA table for a linear model containing a
 single nested factor, two fixed factors and a covariate:

  carbonmean <- lm(C.Mean ~ Mean.richness + Diversity + Zoop + Diversity/Phyto + Zoop*Diversity/Phyto)



 where *Mean.richness* is a covariate, *Zoop* is a categorical variable (the
 species), *Diversity* is a categorical variable (Low or High), and *Phyto*
 (community composition) is also categorical but is nested within the level
 of *Diversity*. Quinn & Keough's statistics text recommends using Type III
 SS for a nested ANOVA with a covariate.

 I get the following output using the Type I SS ANOVA:



 Analysis of Variance Table
 Response: C.Mean
                      Df    Sum Sq   Mean Sq F value    Pr(>F)
 Mean.richness         1  56385326  56385326 23.5855 3.239e-05 ***
 Diversity             1  14476593  14476593  6.0554  0.019634 *
 Zoop                  1  13002135  13002135  5.4387  0.026365 *
 Diversity:Phyto       6 126089387  21014898  8.7904 1.257e-05 ***
 Diversity:Zoop        1    263036    263036  0.1100  0.742347
 Diversity:Zoop:Phyto  6  61710145  10285024  4.3021  0.002879 **
 Residuals            31  74110911   2390675

 I have tried using both the drop1() command and the Anova() command in the
 car package.

 When I use the Anova command I get the following error message:

 Anova(carbonmean, type="III")

 "Error in linear.hypothesis.lm(mod, hyp.matrix, summary.model = sumry, : One
 or more terms aliased in model."



 I am not sure why this is aliased. There are no missing cells, and the cells
 are balanced (aside from the covariate). Each Phyto by Zoop cross is
 replicated 3 times, and there are four Phyto levels within each level of
 Diversity. When I remove the nested factor (Phyto), I am able to get the
 Type III SS output.



 Then when I use drop1(carbonmean, .~., Test="F") I get the following output:

  > drop1(carbonmean, .~., Test="F")

 Single term deletions

 Model:
 C.Mean ~ Mean.richness + Diversity + Zoop + Diversity/Phyto + Zoop * Diversity/Phyto

                      Df  Sum of Sq       RSS AIC
 <none>                              74110911 718
 Mean.richness         1   49790403 123901314 741
 Diversity             0          0  74110911 718
 Zoop                  0          0  74110911 718
 Diversity:Phyto       6  118553466 192664376 752
 Diversity:Zoop        0  -1.49e-08  74110911 718
 Diversity:Zoop:Phyto  6   61710145 135821055 735



 There are zero degrees of freedom for Diversity, Zoop and their interaction,
 and zero sums of squares for Diversity and Zoop. This cannot be correct;
 however, when I do the model simplification by dropping terms from the models
 manually and comparing them using anova(), I get virtually the same results.



 I would appreciate any suggestions for things to try or pointers as to what
 I may be doing incorrectly.



 Thank you.

 Anita Narwani.


Re: [R] Nested ANOVA with covariate using Type III sums of squares

2010-06-02 Thread Joris Meys
Correction: that should read "diversity/phyto, zoop or phyto twice in the formula."

On Thu, Jun 3, 2010 at 3:00 AM, Joris Meys jorism...@gmail.com wrote:

 That's what one would expect with type III sum of squares. You have Phyto
 twice in your model, but only as a nested factor. To compare the full model
 with a model without diversity of zoop, you have either the combination
 diversity/phyto, zoop/phyto or phyto twice in the formula. That's aliasing.

 Depending on how you stand on type III sum of squares, you could call that
 a bug. Personally, I'd just not use them.

 https://stat.ethz.ch/pipermail/r-help/2001-October/015984.html

 Cheers
 Joris



Re: [R] storing output data from a loop that has varying row numbers

2010-06-01 Thread Joris Meys
There's something quite illogical in your code. You have the whole time the
same datafra

On Tue, Jun 1, 2010 at 1:51 PM, RCulloch ross.cull...@dur.ac.uk wrote:


 Hi All,

 I am trying to run a loop that will have varying numbers of rows with each
 output.

 Previously I have had the same number of rows so I would use (and I
 appreciate that this will no doubt achieve some gasps as being thoroughly
 inefficient!):

 xdfrow <- (0)
 xdfrow1 <- (1:32)
 xdfrow2 <- (33:64)
 xdfrow3 <- (65:96)
 xdfrow4 <- (97:128)
 xdfrow5 <- (129:160)
 xdfrow6 <- (161:192)
 xdfrow7 <- (193:224)

 and so on

 xdf <- matrix(999, nrow=1024, ncol=7)
 xdf <- as.data.frame(xdf)
 NAM <- c("NAME","ID2","DAY","BEH","B_FALSE","B_TRUE","TOTAL")
 colnames(xdf) <- NAM

 I then use this matrix and then run the loop and assign the data to each of
 the xdfrows just doing +1 on each loop. (If that makes sense? Not really
 important, just trying to show that I do try and solve some of my own
 problems, albeit perhaps not in the best manner!)

 _

 However, the data I'm working with now has a very varied number of rows
 (0:2500) over a large data set and I can't work out how is best to do this.

 So my loop would be:

 for (i in 1:33){
   SEL_DAY <- seal_dist[seal_dist[,10]==i,]
   print(paste("DAY", i, "of 33"))
   for (s in 1:11){
     SEL_HR <- SEL_DAY[SEL_DAY[,5]==s,]
     print(paste("HR", s, "of 11"))
     indx <- subset(SEL_HR, SEL_HR$DIST == 0)
     SEL_HR$TO_ID <- indx$ID[match(SEL_HR$TO, indx$TO)]
   }
 }

 where i is day and s is the hr within the day, the loop works fine because
 it prints as i expect it too. I have not given any info on the data because
 I assume this is more of a method question and will be very straight
 forward
 to most people on here!? But I am happy to post data if it is needed.

 I assume I need to set up a matrix before the loop,

 e.g. DIST_LOOP <- matrix(NA,1000,ncol=11)

 and then I should be able to put something before the first } that allows
 me to add to the matrix, but everything I have tried doesn't work

 e.g. DIST_LOOP[[i]] <- SEL_HR

 Any help would be much appreciated,

 Best wishes,

 Ross







Re: [R] storing output data from a loop that has varying row numbers

2010-06-01 Thread Joris Meys
Could you just give us the output of dput() for the data you copied in the
mail?

e.g. dput(seal_dist[1:100,])

and an example of how you want your output. I guess I get what you want to
do, but it's not what your code is doing. And it will be difficult to put
that in a matrix, as you have different labels and different numbers of
TO-levels for different days and HR values.
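
When each pass of a loop yields a different number of rows, a list is usually easier to fill than a pre-sized matrix. The following is only a minimal sketch on made-up data: `toy`, `grp`, `val`, `pieces` and `combined` are hypothetical names, not objects from this thread.

```r
# Hypothetical toy data: group 1 has 2 rows, group 2 has none, group 3 has 3.
toy <- data.frame(grp = c(1, 1, 3, 3, 3), val = c(10, 20, 30, 40, 50))

pieces <- vector("list", 3)            # one slot per loop iteration
for (i in 1:3) {
  pieces[[i]] <- toy[toy$grp == i, ]   # pieces may differ in their number of rows
}

# Stack all pieces into one data frame once the loop has finished.
combined <- do.call(rbind, pieces)
```

Each list element keeps its own dimensions, so nothing has to be known about the row counts before the loop starts.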

Cheers

On Tue, Jun 1, 2010 at 3:32 PM, RCulloch ross.cull...@dur.ac.uk wrote:


 Hi Ivan,

 Thanks for your help, your initial suggestion did not work, but that is no
 doubt down to my lack of making sense!

 Here is a short example of my dataset. Basically the loop is set up to
 match
 the ID with the TO column based on DIST = 0. So A1 = 2, A1.1 =1, A2 = 4,
 A2.1 = 3. That is fine for HR 9, but for HR 10 the numbers no longer match
 those IDs so I need to loop the data and store each loop - if that makes
 sense.


    FROM TO     DIST      ID HR DD MM YY ANIMAL DAY
  1    1  1  2.63981    'A1'  9 30  9  7      1   1
  2    1  2      0.0    'A1'  9 30  9  7      1   1
  3    1  3  6.95836    'A1'  9 30  9  7      1   1
  4    1  4  8.63809    'A1'  9 30  9  7      1   1
  5    1  1      0.0  'A1.1'  9 30  9  7      7   1
  6    1  2  2.63981  'A1.1'  9 30  9  7      7   1
  7    1  3  8.03071  'A1.1'  9 30  9  7      7   1
  8    1  4  8.90896  'A1.1'  9 30  9  7      7   1
  9    1  1  8.90896    'A2'  9 30  9  7      1   1
 10    1  2  8.63809    'A2'  9 30  9  7      1   1
 11    1  3  2.85602    'A2'  9 30  9  7      1   1
 12    1  4      0.0    'A2'  9 30  9  7      1   1
 13    1  1  8.03071  'A2.1'  9 30  9  7      7   1
 14    1  2  6.95836  'A2.1'  9 30  9  7      7   1
 15    1  3      0.0  'A2.1'  9 30  9  7      7   1
 16    1  4  2.85602   A2.1'  9 30  9  7      7   1
 17    1  1  3.53695    'A1' 10 30  9  7      1   1
 18    1  2  4.32457    'A1' 10 30  9  7      1   1
 19    1  3      0.0    'A1' 10 30  9  7      1   1
 20    1  4  8.85851    'A1' 10 30  9  7      1   1
 21    1  5 12.09194    'A1' 10 30  9  7      1   1
 22    1  1  7.44743  'A1.1' 10 30  9  7      7   1
 23    1  2      0.0  'A1.1' 10 30  9  7      7   1
 24    1  3  4.32457  'A1.1' 10 30  9  7      7   1
 25    1  4 13.16728  'A1.1' 10 30  9  7      7   1
 26    1  5 16.34761  'A1.1' 10 30  9  7      7   1
 27    1  1  6.13176    'A2' 10 30  9  7      1   1
 28    1  2 13.16728    'A2' 10 30  9  7      1   1
 29    1  3  8.85851    'A2' 10 30  9  7      1   1
 30    1  4      0.0    'A2' 10 30  9  7      1   1
 31    1  5  3.40726    'A2' 10 30  9  7      1   1
 32    1  1  9.03345  'A2.1' 10 30  9  7      7   1
 33    1  2 16.34761  'A2.1' 10 30  9  7      7   1
 34    1  3 12.09194  'A2.1' 10 30  9  7      7   1
 35    1  4  3.40726  'A2.1' 10 30  9  7      7   1
 36    1  5      0.0  'A2.1' 10 30  9  7      7   1
 37    1  1      0.0 'MALE1' 10 30  9  7     12   1
 38    1  2  7.44743 'MALE1' 10 30  9  7     12   1
 39    1  3  3.53695 'MALE1' 10 30  9  7     12   1
 40    1  4  6.13176 'MALE1' 10 30  9  7     12   1
 41    1  5  9.03345 'MALE1' 10 30  9  7     12   1


 So the loop is:

 DIST_LOOP <- matrix(NA,NA,ncol=11)

 for (i in 1:33){
   SEL_DAY <- seal_dist[seal_dist[,10]==i,]
   SEL_DAY[i]=dist[i]
   print(paste("DAY", i, "of 33"))
   for (s in 1:11){
     SEL_HR <- SEL_DAY[SEL_DAY[,5]==s,]
     print(paste("HR", s, "of 11"))
     indx <- subset(SEL_HR, SEL_HR$DIST == 0)
     SEL_HR$TO_ID <- indx$ID[match(SEL_HR$TO, indx$TO)]
     DIST_LOOP[i,] <- SEL_HR
   }
 }

 But storing the data in the DIST_LOOP matrix doesn't work, I am just told
 in
 another post that a list might be better than a matrix?

 I hope this makes more sense!?

 Many thanks,

 Ross


Re: [R] storing output data from a loop that has varying row numbers

2010-06-01 Thread Joris Meys
Is this what you're looking for?

seal_list <- split(seal_dist,sel)

out <- lapply(seal_list,function(x){
  indx <- subset(x, x$DIST == 0)
  x$TO_ID <- indx$ID[match(x$TO, indx$TO)]
  return(x)
})

output <- unsplit(out,sel)

Cheers
Joris



Re: [R] storing output data from a loop that has varying row numbers

2010-06-01 Thread Joris Meys
Sorry, I forgot to define sel. Run this line first, then just run the
rest.

sel <- as.factor(paste(seal_dist[,10],"-",seal_dist[,5],sep=""))
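
To see the whole split / lapply / unsplit sequence run end to end, here is a small sketch. The miniature `seal_dist` below is invented for illustration (8 rows, two hours on one day) and refers to the DAY and HR columns by name rather than by positions 10 and 5; the real data set is larger.

```r
# Hypothetical miniature of seal_dist: one day, hours 9 and 10.
seal_dist <- data.frame(
  TO   = c(1, 2, 1, 2, 1, 2, 1, 2),
  DIST = c(0, 2.6, 2.6, 0, 4.3, 0, 0, 4.3),
  ID   = c("A1", "A1", "A2", "A2", "A1", "A1", "A2", "A2"),
  HR   = c(9, 9, 9, 9, 10, 10, 10, 10),
  DAY  = 1,
  stringsAsFactors = FALSE
)

# One group per day-hour combination.
sel <- as.factor(paste(seal_dist$DAY, "-", seal_dist$HR, sep = ""))
seal_list <- split(seal_dist, sel)

# Within each group, the rows with DIST == 0 tell which ID sits at which TO.
out <- lapply(seal_list, function(x) {
  indx <- subset(x, x$DIST == 0)
  x$TO_ID <- indx$ID[match(x$TO, indx$TO)]
  x
})

# Put the groups back together in the original row order.
output <- unsplit(out, sel)
```

Because unsplit() restores the original row order, `output` lines up row by row with `seal_dist`, with `TO_ID` as the extra column.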

cheers
Joris


Re: [R] Help on aggregate method

2010-06-01 Thread Joris Meys
Take a look at
?split (and unsplit)

eg:
Dur <- rnorm(100)
Attr1=rep(c("A","B"),each=50)
Attr2=rep(c("A","B"),times=50)

ap.dat <- data.frame(Attr1,Attr2,Dur)

split.fact <- paste(ap.dat$Attr1,ap.dat$Attr2)
ap.list <- split(ap.dat,split.fact)
ap.mean <- lapply(ap.list,function(x){
    x$meanDur=rep(mean(x$Dur),dim(x)[1])
    return(x)
})

ap.dat.fast <- unsplit(ap.mean,split.fact)

system.time on 1000 replicates gives:
> system.time(replicate(1000,{
+ split.fact <- paste(ap.dat$Attr1,ap.dat$Attr2)
+ ap.list <- split(ap.dat,split.fact)
+ ap.mean <- lapply(ap.list,functi  [TRUNCATED]
   user  system elapsed
   4.88    0.00    4.88

> system.time(replicate(1000,{
+ avgDur <- aggregate(ap.dat[["Dur"]], by = list(ap.dat[["Attr1"]],
+ ap.dat[["Attr2"]]), FUN=mean)
+ meanDur <- sapp  [TRUNCATED]
   user  system elapsed
  58.00    0.11   58.13


It should be roughly ten times faster.
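
As a side note, base R's ave() gives the same per-row group mean in a single call. It is not the method used above, just a different technique shown for comparison; the sketch assumes the same simulated ap.dat layout, and `via.split` and `via.ave` are names made up here.

```r
set.seed(1)  # only so the simulated data are reproducible
ap.dat <- data.frame(Attr1 = rep(c("A", "B"), each = 50),
                     Attr2 = rep(c("A", "B"), times = 50),
                     Dur   = rnorm(100))

# split / lapply / unsplit, as in the message above
split.fact <- paste(ap.dat$Attr1, ap.dat$Attr2)
ap.mean <- lapply(split(ap.dat, split.fact), function(x) {
  x$meanDur <- mean(x$Dur)   # the single mean is recycled over the group's rows
  x
})
via.split <- unsplit(ap.mean, split.fact)

# ave() returns the group mean already aligned with the original rows;
# its FUN argument defaults to mean.
via.ave <- ave(ap.dat$Dur, ap.dat$Attr1, ap.dat$Attr2)
```

Both routes should agree row by row, so `ap.dat$meanDur <- via.ave` would add the column directly. No timing is claimed for this variant.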

Cheers
Joris


On Tue, Jun 1, 2010 at 4:48 PM, Stella Pachidi stella.pach...@gmail.com wrote:

 Dear R experts,

 I would really appreciate if you had an idea on how to use more
 efficiently the aggregate method:

 More specifically, I would like to calculate the mean of certain
 values on a data frame,  grouped by various attributes, and then
 create a new column in the data frame that will have the corresponding
 mean for every row. I attach part of my code:

 matchMean <- function(ind,dataTable,aggrTable)
 {
    index <- which((aggrTable[,1]==dataTable[["Attr1"]][ind]) &
                   (aggrTable[,2]==dataTable[["Attr2"]][ind]))
    as.numeric(aggrTable[index,3])
 }

 avgDur <- aggregate(ap.dat[["Dur"]], by = list(ap.dat[["Attr1"]],
 ap.dat[["Attr2"]]), FUN=mean)
 meanDur <- sapply((1:length(ap.dat[,1])), FUN=matchMean, ap.dat, avgDur)
 ap.dat <- cbind (ap.dat, meanDur)

 As I deal with very large dataset, it takes long time to run my
 matching function, so if you had an idea on how to automate more this
 matching process I would be really grateful.

 Thank you very much in advance!

 Kind regards,
 Stella



 --
 Stella Pachidi
 Master in Business Informatics student
 Utrecht University


Re: [R] storing output data from a loop that has varying row numbers

2010-06-01 Thread Joris Meys
OK, then I was right. It's exactly what my code does.
Enjoy.
Cheers

On Tue, Jun 1, 2010 at 4:25 PM, RCulloch ross.cull...@dur.ac.uk wrote:


 Hi Joris,

 Thanks for your help!

 The data as requested:

 structure(list(FROM = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
TO = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L,
2L, 3L, 4L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L,
3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L), DIST = c(2.63981,
0, 6.95836, 8.63809, 0, 2.63981, 8.03071, 8.90896, 8.90896,
8.63809, 2.85602, 0, 8.03071, 6.95836, 0, 2.85602, 3.53695,
4.32457, 0, 8.85851, 12.09194, 7.44743, 0, 4.32457, 13.16728,
16.34761, 6.13176, 13.16728, 8.85851, 0, 3.40726, 9.03345,
16.34761, 12.09194, 3.40726, 0, 0, 7.44743, 3.53695, 6.13176,
9.03345), ID = structure(c(12L, 12L, 12L, 12L, 11L, 11L,
11L, 11L, 14L, 14L, 14L, 14L, 13L, 13L, 13L, 143L, 12L, 12L,
12L, 12L, 12L, 11L, 11L, 11L, 11L, 11L, 14L, 14L, 14L, 14L,
14L, 13L, 13L, 13L, 13L, 13L, 94L, 94L, 94L, 94L, 94L), .Label =
 c("'11.1'",
"'15.1'", "'15.5'", "'18.1'", "'24.2'", "'26.1'", "'26.2'",
"'28.3'", "'4.2'", "'7.1'", "'A1.1'", "'A1'", "'A2.1'", "'A2'",
"'B1'", "'C1'", "'D1.1'", "'D1'", "'D2.1'", "'D2'", "'D3.1'",
"'D3'", "'D4.1'", "'D4'", "'D5.1'", "'D5'", "'D6.1'", "'D6'",
"'E1.1'", "'E1'", "'E2.1'", "'E2'", "'E4'", "'E5'", "'F1.1'",
"'F1'", "'F10.1'", "'F10'", "'F11'", "'F2'", "'F3'", "'F4.1'",
"'F4'", "'F5.1'", "'F5'", "'F7'", "'F8.1'", "'F8'", "'G2.1'",
"'G2'", "'G3.1'", "'G3'", "'G4.1'", "'G4'", "'G5.1'", "'G5'",
"'H1.1'", "'H1'", "'H2'", "'H3.1'", "'H3'", "'H8'", "'I1.1'",
"'I1'", "'I2'", "'I4.1'", "'I4'", "'J1.1'", "'J1'", "'J2.1'",
"'J2'", "'J3'", "'J6'", "'J7'", "'JUV'", "'K1.1'", "'K1'",
"'K2'", "'K3'", "'K4.1'", "'K4'", "'L1.1'", "'L1'", "'L2.1'",
"'L2'", "'L4'", "'M1'", "'M2.1'", "'M2'", "'M3.1'", "'M3'",
"'M4.1'", "'M4'", "'MALE1'", "'N1.1'", "'N1'", "'N2'", "'N3'",
"'N4.1'", "'N4'", "'O1'", "'O2'", "'O3.1'", "'O3'", "'O4.1'",
"'O4'", "'O5'", "'P1.1'", "'P1'", "'Q1'", "'Q2'", "'Q3'",
"'R1.1'", "'R1'", "'R2'", "'R3.1'", "'R3'", "'R4.1'", "'R4'",
"'R5.1'", "'R5'", "'S1.1'", "'S1'", "'S2.1'", "'S2'", "'S3.1'",
"'S3'", "'S4.1'", "'S4'", "'T1'", "'U1.1'", "'U1'", "'U2'",
"'U3'", "'UKFEM'", "'UKMAL'", "'UKPUP'", "'V1.1'", "'V1'",
"'W1.1'", "'W1'", "'WR'", "A2.1'"), class = "factor"), HR = c(9L,
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L), DD = c(30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L,
30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L,
30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L,
30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L), MM = c(9L, 9L, 9L,
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L), YY = c(7L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L), ANIMAL = c(1L, 1L, 1L, 1L, 7L, 7L,
7L, 7L, 1L, 1L, 1L, 1L, 7L, 7L, 7L, 7L, 1L, 1L, 1L, 1L, 1L,
7L, 7L, 7L, 7L, 7L, 1L, 1L, 1L, 1L, 1L, 7L, 7L, 7L, 7L, 7L,
12L, 12L, 12L, 12L, 12L), DAY = c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L)), .Names = c("FROM", "TO", "DIST", "ID",
 "HR", "DD", "MM", "YY", "ANIMAL", "DAY"), row.names = c(NA, 41L
 ), class = "data.frame")



 The output should be as the original file is, but it should have an
 additional column for 'TO_ID'

 I hope that makes sense?

 Cheers,

 Ross



Re: [R] R-help spam detection; please help the moderators

2010-06-01 Thread Joris Meys
Hi all,

I also couldn't help but notice that some of my messages are bounced for the
following reason:

   The message headers matched a filter rule

I included the header of one of the messages below, but none of these
messages was sent through Nabble, nor does any mail address have digits in it.
This never happened to me before. Did you change some of the rules somehow?

Cheers
Joris

---

MIME-Version: 1.0
Received: by 10.140.173.9 with HTTP; Fri, 28 May 2010 05:32:32 -0700 (PDT)
In-Reply-To: aanlktim9etuy2efynloh2lyn7m133ytjencdjpkgp...@mail.gmail.com
References: aanlktikgc7v2zbsyrwcwbueezm8d24qj0vqeb2z1n...@mail.gmail.com
aanlktim9etuy2efynloh2lyn7m133ytjencdjpkgp...@mail.gmail.com
Date: Fri, 28 May 2010 14:32:32 +0200
Delivered-To: jorism...@gmail.com
Message-ID: aanlktimg4idyivhe1ek9mk6_rybjcnuu4msvwrvts...@mail.gmail.com
Subject: Re: [R] How to get values out of a string using regular expressions?
From: Joris Meys jorism...@gmail.com
To: Gabor Grothendieck ggrothendi...@gmail.com
Cc: R mailing list r-help@r-project.org
Content-Type: multipart/alternative; boundary=000e0cd2295481515c0487a6b3be

--000e0cd2295481515c0487a6b3be
Content-Type: text/plain; charset=ISO-8859-1



On Tue, Jun 1, 2010 at 3:25 PM, Martin Maechler
maech...@stat.math.ethz.ch wrote:

 Dear readers of R-help

 as most of you will *not* be aware, R-help has continued to work the
 way it does, only thanks to a dozen of volunteers,
 see https://stat.ethz.ch/mailman/listinfo/r-help .

 The volunteers manually moderate e-mails that look like spam (and
 sometimes are and sometimes are not).
 While much more than 90% of the spam is filtered out long before
 a human sees it, with the increasing sophistication of spammers,
 manual intervention has been deemed necessary and has served the
 community very well.

 OTOH, in recent weeks, the amount of work for the volunteers has
 increased, mainly because an increasing number of non-spam postings are
 erroneously tagged as possibly spam.
 We have discussed this and done some analysis and found
 that most of these messages that produce a considerable amount of
 extra work share two properties :
  1) they are posted via Nabble  {which *always* attaches a small
 pro-Nabble spam at the end of the message}
  2) the e-mail address of the sender is from a freemail
provider, quite often 'at gmail dot com', and often the part
*before* the '@' (at-sign) ends with digits.

 We hereby ask those among you who use a freemail account to
 please no longer post via nabble.

 Thank you for your support of R-help, *the* community mailing
 list of the R project since even before that project existed
 formally, namely since 1997-04-01,
 today 13 years and two months.

 Martin Maechler, ETH Zurich
 (and R-help creator and principal manager)



Re: [R] as.date

2010-06-01 Thread Joris Meys
Change this line to :

pose$CREATED.DATE=as.Date(pose$CREATED.DATE,"%d/%m/%Y") # mind the capital Y
> pose
  DESCRIPTION CREATED.DATE QUANITY CLOSING.PRICE
1 COTTON NO.2 Jul/10   2010-05-13   1   81.2000
2 COTTON NO.2 Jul/10   2010-05-13   1   81.2000
3   PALLADIUM Jun/10   2010-05-14  -1  503.6000
4   PALLADIUM Jun/10   2010-05-14  -1  503.6000
5 SUGAR NO.11 Jul/10   2010-05-10   1   13.8900
6 SUGAR NO.11 Jul/10   2010-05-10   1   13.8900
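
The difference between the two format codes can be checked on a single string. A minimal sketch (the names `x`, `wrong` and `right` are made up here): `%y` expects a two-digit year, so it reads only the "20" of "2010", whereas `%Y` reads the full four-digit year.

```r
x <- "13/05/2010"

wrong <- as.Date(x, "%d/%m/%y")  # %y takes only "20", which becomes year 2020
right <- as.Date(x, "%d/%m/%Y")  # %Y takes the whole "2010"

format(wrong)
format(right)
```

That is consistent with the value 18395 in the dput() output quoted below, which is the day count for 13 May 2020 rather than 2010.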

Cheers
Joris

On Tue, Jun 1, 2010 at 5:57 PM, arnaud Gaboury arnaud.gabo...@gmail.com wrote:

 Dear group,

 Here is my df (obtained with a read.csv2()):


 df <-
 structure(list(DESCRIPTION = c("COTTON NO.2 Jul/10", "COTTON NO.2 Jul/10",
 "PALLADIUM Jun/10", "PALLADIUM Jun/10", "SUGAR NO.11 Jul/10",
 "SUGAR NO.11 Jul/10"), CREATED.DATE = c("13/05/2010", "13/05/2010",
 "14/05/2010", "14/05/2010", "10/05/2010", "10/05/2010"), QUANITY = c(1,
 1, -1, -1, 1, 1), CLOSING.PRICE = c("81.2000", "81.2000", "503.6000",
 "503.6000", "13.8900", "13.8900")), .Names = c("DESCRIPTION",
 "CREATED.DATE", "QUANITY", "CLOSING.PRICE"), row.names = c(NA,
 6L), class = "data.frame")

 > str(df)
 'data.frame':   6 obs. of  4 variables:
  $ DESCRIPTION  : chr  "COTTON NO.2 Jul/10" "COTTON NO.2 Jul/10" "PALLADIUM Jun/10" "PALLADIUM Jun/10" ...
  $ CREATED.DATE : chr  "13/05/2010" "13/05/2010" "14/05/2010" "14/05/2010" ...
  $ QUANITY      : num  1 1 -1 -1 1 1
  $ CLOSING.PRICE: chr  "81.2000" "81.2000" "503.6000" "503.6000" ...

 I want to change the class of df$CREATED.DATE from chr to Date:


 pose$CREATED.DATE=as.Date(pose$CREATED.DATE,"%d/%m/%y")

 Here is what I get:

 df <-
 structure(list(DESCRIPTION = c("COTTON NO.2 Jul/10", "COTTON NO.2 Jul/10",
 "PALLADIUM Jun/10", "PALLADIUM Jun/10", "SUGAR NO.11 Jul/10",
 "SUGAR NO.11 Jul/10"), CREATED.DATE = structure(c(18395, 18395,
 18396, 18396, 18392, 18392), class = "Date"), QUANITY = c(1,
 1, -1, -1, 1, 1), CLOSING.PRICE = c("81.2000", "81.2000", "503.6000",
 "503.6000", "13.8900", "13.8900")), .Names = c("DESCRIPTION",
 "CREATED.DATE", "QUANITY", "CLOSING.PRICE"), row.names = c(NA,
 6L), class = "data.frame")

 Where does the problem come from? Maybe from my system date?

 TY for any help



Re: [R] Issue with assigning text to matrix

2010-06-01 Thread Joris Meys
Hi Jessica,

This tells me that your text is stored as a factor.
Try:
names <- read.csv(file="Names.csv",stringsAsFactors=FALSE)
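
A small sketch of why the numbers appear, using a made-up factor instead of the CSV file (`f`, `m_codes` and `m_text` are hypothetical names). Assigning a factor element into a plain matrix stores its underlying integer level code; converting with as.character() first stores the label. Note that since R 4.0.0 read.csv() no longer creates factors by default, so the effect is mainly seen in older versions or when stringsAsFactors=TRUE is set.

```r
f <- factor(c("EQ_Level_UK", "EQ_Level_EUR", "EQ_Level_US"))

m_codes <- matrix(nrow = 3, ncol = 1)
m_text  <- matrix(nrow = 3, ncol = 1)

for (i in seq_along(f)) {
  m_codes[i, 1] <- f[i]                # keeps only the integer level code
  m_text[i, 1]  <- as.character(f[i])  # keeps the label itself
}

m_codes[, 1]   # level codes; levels are sorted alphabetically
m_text[, 1]    # the original text
```

If the data have already been read in as factors, wrapping the right-hand side in as.character() is an alternative to re-reading the file.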


Cheers
Joris

On Tue, Jun 1, 2010 at 11:04 AM, Jessica Queree
j.j.que...@googlemail.com wrote:

 My issue relates to adding text to a matrix and finding that the text is
 converted to a number.



 This is the section of code I'm having trouble with:

 # First, I load in a list of names from a .csv file to 'names'
 names <- read.csv(file("Names.csv"))

 # Then I define a matrix which will be populated with various test
 # statistics, with several rows for each entry in names
 testOutput <- matrix(nrow = 200, ncol = 5)

 for (i in 1:nrow(names)){
    testOutput[i,1] <- names[i,1]
    testOutput[i,2] <- names[i,2]

    # test statistics code here
 }





 If I look at names[,1], I get the following:

 > names[,1]
  [1] EQ_Level_UK       EQ_Level_EUR      EQ_Level_US       EQ_Level_Far East
  [5] IR_PC 1_UK        IR_PC 2_UK        IR_PC 3_UK        Swap_PC 1_UK
  [9] Swap_PC 2_UK      Swap_PC 3_UK      FX_Level_EUR      FX_Level_US
 [13] FX_Level_Far East Infl_PC 1_UK      Infl_PC 2_UK      Infl_PC 3_UK
 [17] Prop_Level_UK     CreditAAA_PC 1_UK CreditAAA_PC 2_UK CreditAAA_PC 3_UK
 [21] CreditAA_PC 1_UK  CreditAA_PC 2_UK  CreditAA_PC 3_UK  CreditA_PC 1_UK
 [25] CreditA_PC 2_UK   CreditA_PC 3_UK   CreditBBB_PC 1_UK CreditBBB_PC 2_UK
 [29] CreditBBB_PC 3_UK
 29 Levels: CreditA_PC 1_UK CreditA_PC 2_UK CreditA_PC 3_UK ... Swap_PC 3_UK

 But if I look at testOutput[,1], I get:

 > testOutput[,1]
   [1] 15 13 16 14 23 24 25 27 28 29 17 19 18 20 21
  [16] 22 26 7  8  9  4  5  6  1  2  3  10 11 12 17
  [31] NA NA 19 18 NA NA NA 20 NA NA 21 NA NA 22 NA
  [46] NA 26 NA NA 7  NA NA 8  NA NA 9  NA NA 4  NA
  [61] NA 5  NA NA 6  NA NA 1  NA NA 2  NA NA 3  NA
  [76] NA 10 NA NA 11 NA NA 12 NA NA NA NA NA NA NA
  [91] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [106] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [121] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [136] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [151] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [166] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [181] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [196] NA NA NA NA NA



 That is, the names are now converted to numbers. I think this might have
 something to do with the way I've defined the testOutput matrix, but
 haven't
 been able to find any information about how to fix it. Can anyone help?



 Many thanks.

[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] any doc to understand arima state space model?

2010-06-01 Thread Joris Meys
Type "Arima R" into Google.

Read the first hit, the third, the fifth, and any other that says tutorial.

Cheers
Joris

On Tue, Jun 1, 2010 at 4:14 PM, shakira M m.shak...@gmail.com wrote:

 I am trying to understand R arima function. Any pointers would be
 appreciated.

 Thank you,
 Shakira.



Re: [R] BreastCancer Dataset for Classification in kknn

2010-06-01 Thread Joris Meys
Hi Nitin,

It can be solved by splitting your data a bit differently. You need more
training data than evaluation data, e.g.:
i1 = 1:400
i2 = 401:d

Then it works on my computer. I have no clue where the error originates from,
though.

Cheers
Joris

On Tue, Jun 1, 2010 at 4:27 PM, Nitin niti...@gmail.com wrote:

 Dear All,

 I'm getting a error while trying to apply the BreastCancer dataset
 (package=mlbench) to kknn (package=kknn) that I don't understand as I'm new
 to R.
 The codes are as follow:

 rm = (list = ls())
 library(mlbench)
 data(BreastCancer)
 library(kknn)

 BCancer = na.omit(BreastCancer)
 d  = dim(BCancer)[1]
 i1 = seq(1, d, 2)
 i2 = seq(2, d, 2)

 t1 = BCancer[i1, ]
 t2 = BCancer[i2, ]
 y2  = BCancer[i2, 11]

 x = 10
 k = array(1:x, dim = c(x,1))
 ker = array(c("rectangular", "triangular", "epanechnikov", "biweight",
 "triweight", "cos", "inv", "gaussian"), dim = c(8,1))

 f = function(x, ker){

 BreastCancer.kknn <- kknn(Class~., train = t1, test = t2, k = x,
 kernel = ker, distance = 1)
 fit = fitted(BreastCancer.kknn)

 z <- (fit==y2)
 z.e <- (100 - (length(y2)-length(z[!z]))/length(y2)*100 )
 }

 err.k = function(ker){
 error.BreastCancer = apply(k,1,function(y) f(y, ker))
 }

 err.ker = apply(ker, 1, err.k)
 colnames(err.ker) = c("rectangular", "triangular", "epanechnikov",
 "biweight", "triweight", "cos", "inv", "gaussian")
 print(err.ker)

 It throws a error: Error in as.matrix(learn[, ind == i]) :
  (subscript) logical subscript too long
 In addition: Warning messages:
 1: In model.matrix.default(mt, mf) : variable 'Id' converted to a factor
 2: In model.matrix.default(mt, test) : variable 'Id' converted to a factor

 I tried the code with other datasets in the mlbench package and most of them
 work. What is the mistake here for this particular dataset and how can I
 solve it?

 Thanks
 Nitin



Re: [R] regexpr help (match.length=0)

2010-06-01 Thread Joris Meys
Dear all,

It sounds as if regexp works according to the same rules as Perl, very
nicely explained in:
http://blob.perl.org/books/beginning-perl/3145_Chap05.pdf

Yet, I couldn't help but wonder if there are also differences in behaviour.
I couldn't find any yet, but there must be some. Anybody care to elaborate
on this?
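
For what it's worth, the zero-length match discussed in the quoted messages
below can be checked with a small base R sketch. The variable names are only
for illustration:

```r
# '*' allows zero repetitions, so the pattern already succeeds at position 1
# of "123abc" with an empty match.
star <- regexpr("[[:alpha:]]*", "123abc")

# '+' requires at least one letter, so the match moves to the first letter.
plus <- regexpr("[[:alpha:]]+", "123abc")

as.integer(star); attr(star, "match.length")
as.integer(plus); attr(plus, "match.length")

# regmatches() extracts the matched text itself.
matched <- regmatches("123abc", plus)
```

A return value of -1 is reserved for "no match at all", which cannot happen
with '*' because the empty string always matches.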

Cheers
Joris

On Wed, Jun 2, 2010 at 1:05 AM, Matt Shotwell shotw...@musc.edu wrote:

 On Tue, 2010-06-01 at 16:43 -0400, Erik Iverson wrote:
 
  McGehee, Robert wrote:
   R-help,
   Sorry if this is more of a regex question than an R question. However,
   help would be appreciated on my use of the regexpr function.
  
   In the first example below, I ask for all characters (a-z) in 'abc123';
   regexpr returns a 3-character match beginning at the first character.
  
   regexpr("[[:alpha:]]*", "abc123")
   [1] 1
   attr(,"match.length")
   [1] 3
  
   However, when the text is flipped regexpr, and I ask for a match of all
   characters in '123abc', regexpr returns a zero-character match
 beginning
   at the first character. Can someone explain what a zero length match
   means (i.e. why not return -1), and why the result isn't 4,
   match.length=3?
 
  It means it matches 0 characters, which is fine since you use *, which
  means match 0 or more occurrences of the regex.  It sounds like you want
  + instead of *.  Also see gregexpr.

 Also, regular expressions try to match as early as possible. That's why
 the match is at position one of length zero, and not at position four of
 length three.

 Matt Shotwell
 Graduate Student
 Division of Biostatistics and Epidemiology
 Medical University of South Carolina

  
   regexpr("[[:alpha:]]*", "123abc")
   [1] 1
   attr(,"match.length")
   [1] 0
  
 


Re: [R] how to label the som notes by the majority vote

2010-06-01 Thread Joris Meys
Dear Changbin,

Please provide a self-contained, minimal example, meaning the whole code
should run and create the plot as it is now, without having to load your
dataset (which we don't have). Otherwise it's impossible to see what's going
on and help you.

Cheers
Joris

On Wed, Jun 2, 2010 at 2:21 AM, Changbin Du changb...@gmail.com wrote:

 HI, Dear R community,

 I am using the following codes to do the som. I tried to label the notes by
 the majority vote. either through mapping or prediction.
 I attached my output, the left one dont have any labels in the note, the
 right one has  more than one label in each note. I need to have only one
 label for each note either by majority vote or prediction.

 Can anyone give some suggestions or advice? Thanks so much!



 alex <- read.table("/home/cdu/operon/alex2.txt", , sep="\t", skip=0,
 header=T, fill=T)
 alex1 <- alex[,c(1:257)]
 levels(alex1$Label)

 alex1$outcome <- as.numeric(alex1$Label)
 alex1$outcome[1:20]


 #self-organizing maps(unsupervised learning)
 library(kohonen)


 #SOM, the supervised learning, train the map using outcome as the class
 #variable.
 set.seed(13)
 final.xyf <- xyf(data=as.matrix(alex1[,c(1:256)]),
 Y=classvec2classmat(alex1$outcome), xweight = 0.99, grid=somgrid(20, 30,
 "hexagonal"))


 outcome.xyf <- predict(final.xyf)$unit.prediction #get prediction
 outcome.predict <- as.numeric(classmat2classvec(outcome.xyf)) #change matrix to vectors.

 outcome.label <- LETTERS[outcome.predict] #convert the numeric value to letters.


 plot(final.xyf, type="property", property=outcome.predict,
 labels=outcome.label, palette.name=rainbow, main="Prediction")



 cl <- colors()
 bgcols <- cl[2:14]
 plot(final.xyf, type="mapping", labels=outcome.label, col="black",
 bgcol=bgcols[as.integer(outcome.predict)],
  main="Mapping plot")




 --
 Sincerely,
 Changbin
 --



Re: [R] Linear Discriminant Analysis in R

2010-05-31 Thread Joris Meys
, 0.052999774, 0.513440813,
 0.402895033, 0.201576687, 0.076826481), V7 = c(0.642136394, 0.099776129,
 0.148801865, 0.603051825, 0.440594157, 0.215038249, 0.531623479,
 0.534920743, 0.45784502, 0.080887221), V8 = c(0.016004048, 0.519115043,
 0.149317949, 0.088362708, 0.705002368, 0.185590863, 0.434963787,
 0.847410734, 0.78777694, 0.443995646, 0.53903599), V9 = c(0.400620271,
 0.918472003, 0.446820588, 0.310981412, 0.734013866, 0.172112916
 ), V10 = c(0.532136091, 0.350028839, 0.40424688, 0.607395545,
 0.392450857, 0.306530929, 0.756277707, 0.63606622, 0.718866192,
 0.258778101)), .Names = c(V1, V2, V3, V4, V5, V6,
 V7, V8, V9, V10), class = data.frame, row.names = c(NA,
 -671L))


 Thank you once more for your help. I really can not say it enough.

 ps. original files i work with are attached.

 Cobbler.

 http://r.789695.n4.nabble.com/file/n2236083/3dMaskDump.txt 3dMaskDump.txt
 http://r.789695.n4.nabble.com/file/n2236083/vowel_features.txt
 vowel_features.txt


 --
 View this message in context:
 http://r.789695.n4.nabble.com/Linear-Discriminant-Analysis-in-R-tp2231922p2236083.html
 Sent from the R help mailing list archive at Nabble.com.



Re: [R] about heatmap

2010-05-31 Thread Joris Meys
Hi,

Take a look at the heatmap.2 function in the gplots package, and at
brewer.pal in the RColorBrewer package. This combination gives you far more
flexibility over the colors and the output, plus a color-coded legend. There
used to be a bug in that function that distorted the legend when breaks with
unequal intervals were used, but I've adapted the function myself to work in
that case as well. If you need it, feel free to contact me.
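
If you want to stay with base R, a minimal sketch is to reverse the color
ramp. This assumes the red-to-yellow heat.colors() palette described in the
question; the matrix m is made-up data, purely for illustration:

```r
set.seed(1)
m <- matrix(rnorm(50), nrow = 10)   # made-up data for the example

# heat.colors() runs from red to pale yellow; rev() flips the ramp so that
# the highest values are drawn in red and the lowest in yellow.
cols <- rev(heat.colors(12))
heatmap(m, col = cols)
```

The same reversed vector can be passed as the col argument of heatmap.2.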

Cheers
Joris

On Mon, May 31, 2010 at 9:54 AM, 孟欣 lm_meng...@163.com wrote:

 Hi all:
 As to the heatmap function, the default style is red and yellow,and red
 refers to low level and yellow refers to high level.
 How can I change the style to the contrary: red refers to high level and
 yellow refers to low level?

 Thanks a lot!
 My best


Re: [R] What does LOESS stand for?

2010-05-31 Thread Joris Meys
This is the paper on which the loess algorithm is based in general:
http://www.econ.pdx.edu/faculty/KPL/readings/cleveland88.pdf

The explanation about the origin of the term LOESS is given on page 597.

Cheers
Joris

On Mon, May 31, 2010 at 11:33 AM, Peter Neuhaus pneuh...@pneuhaus.de wrote:

 Dear R-community,

 maybe someone can help me with this:

 I've been using the loess() smoother for quite a while now, and for
 the matter of documentation I'd like to resolve the acronym LOESS.
 Unfortunately there's no explanation in the help file, and I didn't
 get anything convincing from google either.

 I know that the predecessor LOWESS stands for Locally Weighted
 Scatterplot Smoothing. But what does LOESS stand for, specifically?
 Locally Weighted Exponential Scatterplot Smoothing? As far as
 I understand LOESS is still a local polynomial regression, so that
 would probably make no sense.

 Any help appreciated!

 Thanks in advance,

 Peter



Re: [R] missing values in autocorelation

2010-05-31 Thread Joris Meys
Could you specify the problem and give a minimal example that represents
your data structure and reproduces the error? See also the posting guide:
http://www.R-project.org/posting-guide.html
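
Without your data I can only guess, but here is a minimal sketch of one way
to handle it, assuming the missing values are coded as -99.99 as in the
quoted question. The function name safe_acf and the -999 flag are
hypothetical choices, not something acf provides itself:

```r
safe_acf <- function(x, lag.max = 10, missing = -99.99, flag = -999) {
  # Recode the sentinel value as NA so it is not treated as real data.
  x[which(x == missing)] <- NA

  # A series without any valid value cannot be autocorrelated:
  # return the flag once for every lag, lag 0 included.
  if (all(is.na(x))) return(rep(flag, lag.max + 1))

  # na.pass lets acf() work around the remaining NA values;
  # plot = FALSE avoids the plotting error from the question.
  as.vector(acf(x, lag.max = lag.max, na.action = na.pass, plot = FALSE)$acf)
}

bad  <- safe_acf(rep(-99.99, 50), lag.max = 5)   # all values missing
good <- safe_acf(sin(1:50), lag.max = 5)         # an ordinary series
```

Both calls return a vector of length lag.max + 1, so the number of output
files can stay the same as the number of input files.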

Cheers
Joris

On Mon, May 31, 2010 at 1:12 PM, nuncio m nunci...@gmail.com wrote:

 Hi all,
 I am trying to find the autocorrelation of some time series.  I
 have say 100 files, some files have only missing values(-99.99, say). I
 dont
 want to exclude these files as they represent some points in a grid.  But
 when the acf command is issued i get an error.
 Error in plot.window(...) : need finite 'ylim' values
 In addition: Warning messages:
 1: In min(x) : no non-missing arguments to min; returning Inf
 2: In max(x) : no non-missing arguments to max; returning -Inf

 Is this because of all the values in the time series is the same, if so How
 can I specify a bad value when the acf command is issued.  Also is it
 possible to return a flag(like, -999) of length the maximum lag for acf of
 bad grid points so that I can keep the number of files same for input and
 output

 Thanks
 nuncio

 --
 Nuncio.M
 Research Scientist
 National Center for Antarctic and Ocean research
 Head land Sada
 Vasco da Gamma
 Goa-403804



Re: [R] How to delete the previously saved workspace restored

2010-05-31 Thread Joris Meys
Start R and type:

unlink(".RData")

This deletes the workspace file in the current working directory.
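
A slightly more careful sketch, which checks that the file exists before
deleting it. The helper name remove_workspace is hypothetical, and the
demonstration runs in a temporary directory so that no real workspace is
touched:

```r
remove_workspace <- function(dir = getwd()) {
  ws <- file.path(dir, ".RData")
  if (!file.exists(ws)) return(FALSE)   # nothing to delete
  unlink(ws) == 0                       # unlink() returns 0 on success
}

# Demonstration: create a throwaway workspace file and remove it again.
demo_dir <- tempfile("ws_demo")
dir.create(demo_dir)
x <- 1
save(x, file = file.path(demo_dir, ".RData"))
removed <- remove_workspace(demo_dir)
```

For the file named in the quoted message, the call would be
remove_workspace("/Users/wei").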
Cheers
Joris

On Mon, May 31, 2010 at 11:10 AM, Yanwei Tan t...@nbio.uni-heidelberg.de wrote:

 Dear all,

 I am a new user of R, here I have a question about remove the previous
 restored workspace.  I saved the workspace last time, but R always
 automatically load the workspace when I open it.  I try to remove the object
 and then close R without saving. But next time when I open R, it always load
 the previous workspace. I want to delete the .RData in the directory, but I
 have no clue where is the .RData directory.

 The message is Workspace restored from /Users/wei/.RData

 How could I avoid from this directory? because there is a dot before, I do
 not know where I can find this file.

 Also I already try this command : rm(list=ls())   But R still load the
 previous workspace.

 With many thanks for any advice!!

 Best,
 Wei



Re: [R] Problems with apply

2010-05-31 Thread Joris Meys
Ivan is partly right. However, the Details section of ?apply also says:
"If X is not an array but has a dimension attribute, apply attempts to coerce
it to an array via as.matrix if it is two-dimensional (e.g., data frames) or
via as.array."

The main problem is that what goes into the PromP function is not a
data frame, not even a matrix, but a vector. You can easily see where it
goes wrong if you place

print(str(HistRio))

as the first line of your function. You'll also see that (hopefully) it's a
named vector, meaning you could try to rewrite your function like:
if(length(which(AnaQuim$SecSte==HistRio["SecSte"]))>0){ xx[1] <- 1 }
etc...

I didn't test it, though, but it should work.
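
To make the point concrete, here is a small sketch with a made-up data frame
(the column name SecSte is borrowed from your code, the values are invented):

```r
df <- data.frame(SecSte = c("a", "b"), n = 1:2, stringsAsFactors = FALSE)

# apply() first converts the data frame with as.matrix(), so each row
# arrives in the function as a named character vector and '$' fails.
by_dollar <- tryCatch(apply(df, 1, function(row) row$SecSte),
                      error = function(e) "error")

# Indexing by name works on a named vector.
by_name <- apply(df, 1, function(row) row["SecSte"])

# Looping over row indices instead keeps the original column types.
by_index <- sapply(seq_len(nrow(df)), function(i) df$SecSte[i])
```

Note that as.matrix() turns every column into character here, so dates and
numbers would have to be converted back inside the function.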

Cheers
Joris

On Mon, May 31, 2010 at 5:16 PM, Luis Felipe Parra 
felipe.pa...@quantil.com.co wrote:

 Hello I am tryin to use the apply functions with two data frames I've got
 and I am getting the following error message

 Error en HistRio$SecSte : $ operator is invalid for atomic vectors

  I don't understand why. when I use the apply I am doing:

 PromP - function(HistRio,AnaQuim){
 xx - c(0,0,0)
 if(length(which(AnaQuim$SecSte==HistRio$SecSte))0){ xx[1]-1 }
 if(length(which(as.Date(AnaQuim$AÑO1)=as.Date(HistRio$FinCorte)))0){
 xx[2]
 - 1}
 if( length(which(as.Date(AnaQuim$AÑO1)=as.Date(HistRio$FechaSiembra)))0){
 xx[3]-1 }
 if( length(which(as.Date(AnaQuim$AÑO1)=as.Date(HistRio$FechaSiembra)))0 
 length(which(as.Date(AnaQuim$AÑO1)=as.Date(HistRio$FinCorte)))0 ){ xx[4]
 - 2}
 return(xx)
 }
 zz- apply(HistRio,1,PromP,AnaQuim)
 and if I do exactly the same with a for

 xx - matrix(0,nrow(HistRio),4)
 for(i in 1:nrow(HistRio)){
 if(length(which(AnaQuim$SecSte==HistRio$SecSte[i]))0){ xx[1]-1 }
 if(length(which(as.Date(AnaQuim$AÑO1)=as.Date(HistRio$FinCorte[i])))0){
 xx[2] - 1}
 if(
 length(which(as.Date(AnaQuim$AÑO1)=as.Date(HistRio$FechaSiembra[i])))0){
 xx[3]-1 }
 if(
 length(which(as.Date(AnaQuim$AÑO1)=as.Date(HistRio$FechaSiembra[i])))0
  length(which(as.Date(AnaQuim$AÑO1)=as.Date(HistRio$FinCorte[i])))0 ){
 xx[4] - 2}
 }

 I get no error message. Attached is the data I am using. Any idea of why
 this is happening?

 Thank you

 Felipe Parra



Re: [R] Linear Discriminant Analysis in R

2010-05-29 Thread Joris Meys
It's not your questions, Cobbler, but could you PLEASE just do what we asked
for?
Copy-paste the following in R and copy-paste ALL output you get in your next
mail.

test.vowel <- vowel_features[,1:10]
test.mask <- mask_features[,1:10]
dput(test.vowel)
dput(test.mask)

I don't know whether your vowel_features is a list or a data-frame (which is
technically also a list). But I know for sure that vowel_features[15] is NOT
giving you a column. Probably it has to be vowel_features[,15]. So start
with that one, and I'll take a look at the rest to get your lda running.
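
The difference between the two ways of indexing can be seen in a small
sketch, assuming for the moment that vowel_features is a data frame. The
data frame df below is made up for the illustration:

```r
df <- data.frame(a = 1:3, b = c(0, 1, 1))

one_col_df <- df[2]     # list-style indexing: still a data frame
vec_matrix <- df[, 2]   # matrix-style indexing: a plain numeric vector
vec_list   <- df[[2]]   # list extraction: the same vector

class(one_col_df)
class(vec_matrix)
```

A grouping variable for lda needs to be a vector or factor, so the second
or third form is the one to use.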

Cheers
Joris

On Sat, May 29, 2010 at 6:53 PM, cobbler_squad la.f...@gmail.com wrote:


 Thanks for being patient with me.

 I guess my problem is with understand how grouping in this particular case
 is used:

 one of the sample codes I found online
 (http://www.statmethods.net/advstats/discriminant.html)
 library(MASS)
 fit <- lda(G ~ x1 + x2 + x3, data=mydata, na.action=na.omit, CV=TRUE)

 the mydata file in my case is the 3dmaskdump file with 52 columns and 671
 rows (all values range between 0 and 1 after they're scaled)

 the other file, what I assumed was the grouping file (or the
 vowel_feature) is the file that defines features for the vowels (i.e.
 column 1 of the file is vowel name (a, i, u) and every other column in a
 distinct combination of 0's and 1's defining the vowel (so this file has 26
 columns and 254 rows). Therefore, every column that follows represents a
 particular feature of that vowel.. (hope this makes sense!!)

 So, the reason I wanted to return G <- vowel_feature[15] in my previous
 post is because I need to extract a column that represents backness of the
 vowel (while other columns represent roundedness, nasalization
 features, etc). So what (in my mind) G <- vowel_feature[15] would return is
 1 column which is 254 rows long with 0's and 1's in it.
 i.e.

 1   0
 2   1
 3   1
 4   0
 ...
 ..
 .
 254 1

 I am a novice with R (so I know my questions are pretty dumb!), but I
 really
 hope I clarified my confusion a bit better.  I very much appreciate your
 help.

 Looking forward to your replies.

 Thank you again,
 Cobbler


 --
 View this message in context:
 http://r.789695.n4.nabble.com/Linear-Discriminant-Analysis-in-R-tp2231922p2235777.html
 Sent from the R help mailing list archive at Nabble.com.



Re: [R] difference in sort order linux/Windows (R.2.11.0)

2010-05-28 Thread Joris Meys
Pretty obvious: You use different locales (collate). What happens if you use
the same on both machines?
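
A quick way to check this is to force the same collation on both machines.
The sketch below uses the "C" locale, which should be available on both
Linux and Windows; the vector x is just an invented example:

```r
x <- c("b", "A", "a", "B")

old <- Sys.getlocale("LC_COLLATE")   # remember the current setting
sort(x)                              # result depends on the current locale

Sys.setlocale("LC_COLLATE", "C")
sorted_c <- sort(x)                  # byte order: capitals come first

Sys.setlocale("LC_COLLATE", old)     # restore the previous setting
```

In the "C" locale the order no longer depends on the operating system's
language settings, at the cost of sorting all capitals before lower case.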

Cheers
Joris

On Fri, May 28, 2010 at 10:17 AM, carslaw david.cars...@kcl.ac.uk wrote:


 Dear R users,

 I'm a bit perplexed with the effect sort has here, as it is different on
 ...
  the linux order is perhaps more intuitive.  However, the problem is the
 order is inconsistent between
  the two systems.  Any suggestions?

 sessionInfo()
 R version 2.11.0 (2010-04-22)
 x86_64-pc-linux-gnu

 locale:
  [1] LC_CTYPE=en_GB.utf8  LC_NUMERIC=C
  [3] LC_TIME=en_GB.utf8   LC_COLLATE=en_GB.utf8
  [5] LC_MONETARY=en_GB.utf8   LC_MESSAGES=en_GB.utf8
  [7] LC_PAPER=en_GB.utf8  LC_NAME=en_GB.utf8
  [9] LC_ADDRESS=en_GB.utf8LC_TELEPHONE=en_GB.utf8
 [11] LC_MEASUREMENT=en_GB.utf8LC_IDENTIFICATION=en_GB.utf8
 ...
  sessionInfo()
 R version 2.11.0 (2010-04-22)
 x86_64-pc-mingw32

 locale:
 [1] LC_COLLATE=English_United Kingdom.1252
 [2] LC_CTYPE=English_United Kingdom.1252
 [3] LC_MONETARY=English_United Kingdom.1252
 [4] LC_NUMERIC=C
 [5] LC_TIME=English_United Kingdom.1252
 ...
 Dr David Carslaw
 King's College London
 Environmental Research Group
 Franklin Wilkins Building
 150 Stamford Street
 London
 SE1 9NH
 --
 View this message in context:
 http://r.789695.n4.nabble.com/difference-in-sort-order-linux-Windows-R-2-11-0-tp2234251p2234251.html
 Sent from the R help mailing list archive at Nabble.com.



Re: [R] clustering in R

2010-05-28 Thread Joris Meys
As Tal said.

Next to that, I read that column1 (and column2?) are supposed to be seen as
factors, not as numerical variables. Did you take that into account somehow?

It's easy to reproduce the error:
> n <- NULL
> if(n<2) print("This is OK")
Error in if (n < 2) print("This is OK") : argument is of length zero

In the hclust code, you find the following line:
n <- as.integer(attr(d, "Size"))
where d is the distance object passed to the hclust function. Looking at
the error you get, this means that the Size attribute of your distance is
NULL, which tells me that distA is not a dist object.

> A <- matrix(1:4,ncol=2)
> A
     [,1] [,2]
[1,]    1    3
[2,]    2    4
> hclust(A,method="single")
Error in if (n < 2) stop("must have n >= 2 objects to cluster") :
  argument is of length zero

Did you actually put in a distance object? See also ?dist or ?as.dist.
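
If your three columns are (item, item, distance), a rough sketch of how to
get from there to a dist object could look like this. The numbers are made
up, and treating unlisted pairs as "far apart" is my assumption, not
something that follows from your data:

```r
# Made-up edge list in the same layout as the quoted matrix:
# two item ids and the distance between them.
edges <- data.frame(a = c(402, 402, 424),
                    b = c(675, 696, 470),
                    d = c(1.80, 1.94, 1.81))

ids <- as.character(sort(unique(c(edges$a, edges$b))))
far <- 10 * max(edges$d)   # assumption: unlisted pairs count as far apart
m <- matrix(far, length(ids), length(ids), dimnames = list(ids, ids))
diag(m) <- 0

i <- match(as.character(edges$a), ids)
j <- match(as.character(edges$b), ids)
m[cbind(i, j)] <- edges$d
m[cbind(j, i)] <- edges$d   # keep the matrix symmetric

# Single linkage is the "friends of friends" strategy.
hc <- hclust(as.dist(m), method = "single")
groups <- cutree(hc, h = 2)   # the cut height 2 is an arbitrary choice here
```

With these numbers 402, 675 and 696 end up in one group and 424 and 470 in
another.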

Cheers
Joris




On Fri, May 28, 2010 at 1:41 AM, Ayesha Khan ayesha.diamond...@gmail.com wrote:

 i have a matrix with the following dimensions
 136   3

 and it looks something like

 [,1] [,2] [,3]
  [1,]  402  675 1.802758
  [2,]  402  696 1.938902
  [3,]  402  699 1.994253
  [4,]  402  945 1.898619
  [5,]  424  470 1.812857
  [6,]  424  905 1.816345
  [7,]  470  905 1.871252
  [8,]  504  780 1.958191
  [9,]  504  848 1.997111...

 
 so you get the idea. I want to group similar items in one group/cluster
 following the friends of friends approach. I tried doing

 distclust <- hclust(distA,method="single")
 However, I got the following error.

 Error in if (n < 2) stop("must have n >= 2 objects to cluster") :  argument
 is of length zero
 which probably means there's something wrong with my input here. Is there
 another way of doing this kind of clustering without getting into all the
 looping and ifelse etc.? Basically, if 402 is close to 675, 696, and 699 and
 thus falls in cluster A, then all items close to 675, 696, and 699 should
 also fall into the same cluster A, following a friends-of-friends strategy.
 Any help would be highly appreciated.

 --
 Ayesha Khan

 MS Bioengineering
 Dept. of Bioengineering
 Rice University, TX



Re: [R] Handing significance digits

2010-05-28 Thread Joris Meys
Hi Christofer,

I don't know what .Net is doing, but for R these globals are dependent on
your machine and platform.
?.Machine
?.Platform

Don't know if you can actually hack R into believing otherwise.

Did you consider the possibility that the underlying algorithms differ
between .Net and R?
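
As a small illustration of what R reports about floating point on your
machine, and of why comparing results with a tolerance is usually safer
than forcing a number of digits (the variable names are just examples):

```r
.Machine$double.eps      # smallest x such that 1 + x != 1
.Machine$double.digits   # number of bits in the significand

a <- 0.1 + 0.2
exact_equal  <- (a == 0.3)                  # FALSE in double arithmetic
close_enough <- isTRUE(all.equal(a, 0.3))   # TRUE: compares with a tolerance

# round() works per value; it has to be applied explicitly at each step.
prod_rounded <- round(18.456, 2) * round(20.345, 2)
```

As far as I know, options(digits = ...) only changes how numbers are
printed, not how they are stored or calculated.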
Cheers
Joris

On Fri, May 28, 2010 at 12:51 PM, Christofer Bogaso 
bogaso.christo...@gmail.com wrote:

 Hi folks, recently I was trying evaluation of some complex function having
 exactly same starting values as well as same algorithm in both R and .Net
 environment. However at the end point I notice that there are some
 differences in the reported figures from those two applications (as much as
 0.10%). I feel this is basically due to consideration of different
 significance digits in handling floating point numbers between R and .Net.
 Therefore I want to fix the number of digits that should be there after the
 decimal point in each and every calculation in R. For example, suppose I am
 multiplying two numbers: 18.456 and 20.345. Ideally it should come out as
 375.48732. However, I want R to consider only 2 digits after the decimal
 point, i.e. 18.46 and 20.35, report 375.66, and use this trimmed value for
 subsequent calculations. It would be good if there is any possibility to
 define such behavior once at the beginning of my R session.

 Is there any way to do that?

 Thanks,


[R] How to get values out of a string using regular expressions?

2010-05-28 Thread Joris Meys
Dear all,

I have a vector of filenames which begins like this :
X <- c("OrthoP1_DNA_str.aln", "OrthoP10_DNA_str.aln",
"OrthoP100_DNA_str.aln",
"OrthoP101_DNA_str.aln", "OrthoP102_DNA_str.aln", "OrthoP103_DNA_str.aln",
"OrthoP104_DNA_str.aln", "OrthoP105_DNA_str.aln", "OrthoP106_DNA_str.aln",
"OrthoP107_DNA_str.aln")

using
grep("(\\d+)",X,perl=T,value=T)

I get the complete values back. Yet, I want a vector :

c(1,10,100,101,102,103,104,105,106,107)

In Perl, using the brackets allows for extracting only the numbers (using a
construct with $1 for those who know Perl).

I want to do the same in R, but can't find a way of doing that without
extensive string manipulations. Problem is that the length of the numbers
differ, so I can't use substr.
I tried
> strsplit(X,"\\d+")
[[1]]
[1] "OrthoP"       "_DNA_str.aln"
which gives me exactly what I want to throw away. So :
> strsplit(X,"\\D+")
[[1]]
[1] ""  "1"

[[2]]
[1] ""   "10"
gives something I can use, but it still requires a lot of list manipulation
afterwards to get the right vector. Is there an option or a function I'm
missing somewhere?

Cheers
Joris


Re: [R] anova post hoc tests

2010-05-28 Thread Joris Meys
See :
http://www.statmethods.net/stats/anova.html
?TukeyHSD
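
A minimal sketch of such a post-hoc test, assuming a one-way design. It uses the warpbreaks data that ships with R, not the poster's data, which was not provided.

```r
# One-way ANOVA on a built-in data set
fit <- aov(breaks ~ tension, data = warpbreaks)
summary(fit)

# Tukey's honest significant differences for all pairs of tension levels
posthoc <- TukeyHSD(fit, "tension")
posthoc
```

TukeyHSD() needs a fitted aov object, so a model fitted with lm() has to be refitted with aov() first.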

Cheers
Joris

On Fri, May 28, 2010 at 2:11 PM, Iasonas Lamprianou lampria...@yahoo.com wrote:

 Hi everybody

 does anyone know how I can run ANOVA post-hoc tests using R commander or R
 in general?

 Thank you


 Dr. Iasonas Lamprianou





Re: [R] Linear Discriminant Analysis in R

2010-05-28 Thread Joris Meys
Could you provide us with data to test the code? use dput (and limit the
size!)

eg:
dput(vowel_features)
dput(mask_features)

Without this information, it's impossible to say what's going wrong. It
looks like you're doing something wrong in the selection. What should
vowel_features[15] return? Did you check it's actually what you want? Did
you use str(G) to check the type?
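
To illustrate what I mean with the selection, here is a small sketch on a made-up data frame (df and its columns are hypothetical, not your files). Single brackets with one index return a data frame, which is a list, while double brackets or a row/column index return the column itself.

```r
# Hypothetical data frame standing in for vowel_features
df <- data.frame(a = 1:3, b = c("x", "y", "z"))

one_col_df <- df[2]     # still a data frame, hence a list
as_vector  <- df[[2]]   # the column itself
same_thing <- df[, 2]   # also the column itself

str(one_col_df)
str(as_vector)

# Small reproducible dump to paste into a mail
dput(head(df, 2))
```

If the grouping variable has to be the 15th column, something like G <- vowel_features[[15]] is probably closer to what lda() expects, but I can't check that without the data.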

Cheers
Joris

On Thu, May 27, 2010 at 5:28 PM, cobbler_squad la.f...@gmail.com wrote:


 Joris,

 You are a life saver. Based on two sample files above, I think lda should
 go
 something like this:

 vowel_features <- read.table(file = "mappings_for_vowels.txt")
 mask_features <- data.frame(as.matrix(read.table(file =
 "3dmaskdump_ICA_37_Combined.txt")))
 G <- vowel_features[15]

 cvc_lda <- lda(G~ vowel_features[15], data=mask_features,
 na.action=na.omit, CV=TRUE)

 ERROR: Error in model.frame.default(formula = G ~ vowel_features[15], data
 =
 mask_features,  :
  invalid type (list) for variable 'G'

 I am clearly doing something wrong declaring G (how should I declare
 grouping in R when I need to use one column from vowel_feature file)? Sorry
 for stupid questions and thank you for being so helpful!

 -
 again, sample files that I am working with:

 mappings_for_vowels.txt:

V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
 V21 V22 V23 V24 V25 V26
 1E  0  0  0  0  0  0  0  0   0   0   0   0   1   1   0   0   0   1   0
 0   0   0   0   0   0
 2o  0  0  0  0  0  0  0  0   0   0   0   0   1   0   0   1   0   1   0
 1   0   1   0   0   0
 3I  0  0  0  0  0  0  0  0   0   0   0   0   1   1   0   0   1   0   0
 0   0   0   0   0   0
 4^  0  0  0  0  0  0  0  0   0   0   0   0   1   0   1   0   0   1   0
 0   0   0   0   0   0
 5@  0  0  0  0  0  0  0  0   0   0   0   0   1   0   0   1   0   0   1
 0   0   0   0   0   0

 and the mask_features file is:

  V42  V43  V44  V45  V46
 V47  V48  V49
  [1,]  2.890891625  2.881188521  2.88778 -2.882606612 -2.77341
 2.879834384  2.886483229  2.883815864
  [2,]  2.763404707  2.756198683  2.761863881 -2.756827983 -2.762268531
 2.754305072  2.760017050  2.758399799
  [3,]  0.556614506  0.556377530  0.556247414 -0.556300910 -0.556098321
 0.557495060  0.557383073  0.556867424
  [4,]  0.367065248  0.366962036  0.366870087 -0.366794442 -0.366644148
 0.366613343  0.366537320  0.366953464
  [5,]  0.423692393  0.421835623  0.421741829 -0.421897460 -0.421659824
 0.421567705  0.421465738  0.422407838


Re: [R] How to get values out of a string using regular expressions?

2010-05-28 Thread Joris Meys
Bingo! Thx Gabor.

Thank you too Tal, I looked briefly at the package and it looks like a nice
interface. I keep it in mind for later.

Cheers
Joris

On Fri, May 28, 2010 at 2:25 PM, Gabor Grothendieck ggrothendi...@gmail.com
 wrote:

 Try this:

 as.numeric(gsub("\\D", "", X))
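
 A minimal sketch of this idea on the first three file names from the question. The second variant, with a capture group and a backreference, is an added alternative that mimics Perl's $1; it is not part of the original suggestion.

```r
X <- c("OrthoP1_DNA_str.aln", "OrthoP10_DNA_str.aln", "OrthoP100_DNA_str.aln")

# Drop every non-digit character, then convert to numeric
nums <- as.numeric(gsub("\\D", "", X))

# Alternative: keep only the first run of digits via a backreference
nums2 <- as.numeric(sub("^\\D*(\\d+).*$", "\\1", X))

nums
nums2
```

 The gsub() version glues all digits in a name together, so it only works when each name contains a single number; the sub() version keeps just the first run of digits.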

 On Fri, May 28, 2010 at 8:21 AM, Joris Meys jorism...@gmail.com wrote:
  Dear all,
 
  I have a vector of filenames which begins like this :
  X <- c("OrthoP1_DNA_str.aln", "OrthoP10_DNA_str.aln",
  "OrthoP100_DNA_str.aln",
  "OrthoP101_DNA_str.aln", "OrthoP102_DNA_str.aln",
 "OrthoP103_DNA_str.aln",
  "OrthoP104_DNA_str.aln", "OrthoP105_DNA_str.aln",
 "OrthoP106_DNA_str.aln",
  "OrthoP107_DNA_str.aln")
 
  using
  grep("(\\d+)",X,perl=T,value=T)
 
  I get the complete values back. Yet, I want a vector :
 
  c(1,10,100,101,102,103,104,105,106,107)
 
  In Perl, using the brackets allows for extracting only the numbers (using
 a
  construct with $1 for those who know Perl).
 
  I want to do the same in R, but can't find a way of doing that without
  extensive string manipulations. Problem is that the length of the numbers
  differ, so I can't use substr.
  I tried
  > strsplit(X,"\\d+")
  [[1]]
  [1] "OrthoP"       "_DNA_str.aln"
  which gives me exactly what I want to throw away. So :
  > strsplit(X,"\\D+")
  [[1]]
  [1] ""  "1"
 
  [[2]]
  [1] ""   "10"
  gives something I can use, but it still requires a lot of list
 manipulation
  afterwards to get the right vector. Is there an option or a function I'm
  missing somewhere?
 
  Cheers
  Joris
 

Re: [R] Matrix interesting question!

2010-05-28 Thread Joris Meys
Provide a minimal example to start with. This sounds more like voodoo than
anything else.
Cheers
Joris

On Fri, May 28, 2010 at 6:30 PM, UM usman.muni...@imperial.ac.uk wrote:


 hi,
 I have been trying to do this in R (have implemented it in Excel) but I
 have been using a very inefficient way (loops etc.). I have matrix A (columns
 are years and rows are ages) and matrix B (columns are birth years and rows
 are ages).


 I would like to first turn matrix A into matrix B

 and then convert matrix B back again to the original matrix A. (I have left
 out details of the steps, but this is the gist of what I want to do.) Can
 anyone please give any insights?


 Thanks









 http://r.789695.n4.nabble.com/file/n2234852/untitled.bmp


Re: [R] leave-one-out cross validation

2010-05-28 Thread Joris Meys
see ?cv.glm under the heading Value. The help files tell you what comes
out.
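
A minimal sketch on simulated data (the data frame d and its columns are made up for illustration, and the boot package has to be installed). It shows which parts of the returned list matter: delta holds the raw and the adjusted cross-validation estimate of the prediction error, while seed is only the state of the random number generator at the time of the call.

```r
library(boot)

# Simulated binary response, standing in for the real data
set.seed(1)
d <- data.frame(x = rnorm(100))
d$y <- rbinom(100, 1, plogis(0.5 * d$x))

fit <- glm(y ~ x, data = d, family = binomial("logit"))

# Same cost function as in the question: mean absolute error
cost <- function(obs, pred) mean(abs(obs - pred))

# K defaults to the number of rows, which gives leave-one-out
cv.err <- cv.glm(d, fit, cost)
cv.err$delta
```

Note that the formula here uses plain column names together with data = d. With a formula written as profilesample$SALIC ~ profilesample$wetnessindex, the refits on the subsets most likely still pick up the full columns, so the cross-validation would not do what you expect.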


On Fri, May 28, 2010 at 10:19 PM, azam jaafari azamjaaf...@yahoo.com wrote:

 Hi


 Finally, I did leave-one-out cross validation in R for prediction error of
 logistic regression by cv.glm. But I don't know what the produced
 data (almost 700 values) are. Does delta show me the error estimate?


 cost <- function(a,b) mean(abs(a-b))
 #SALIC=binary response
 salic.lr <- glm(profilesample$SALIC~profilesample$wetnessindex ,
 profilesample, family=binomial('logit'))
 loadpackage(boot)
 > cv.err <- cv.glm(profilesample, salic.lr, cost, K=100)
 > cv.err

 $call
 cv.glm(data = profilesample, glmfit = salic.lr, cost = cost,
 K = 100)

 $K
 [1] 100

 $delta
 1 1
 0.4278 0.4278

 $seed
 [1] 403 133 1654269195 -1877109783 -961256264 1403523942
 [7] 124639233 261424787 1836448066 1034917620 -13630729 468718317
 [13] 1694379396 1559298986 1935866133 -1450855505 2105396150 1802260960
 [19] 1077391651 539731521 122505520 230898510 -1940184647 1223031755
 [25] -1597886342 -1854140036 -1783225921 1484611221 1365746860 -346485118
 [31] 1206044253 1201793367 956757054 350214264 -1324711077
 .
 .
 .
 please help me

 Thanks alot

 --- On Wed, 5/26/10, Joris Meys jorism...@gmail.com wrote:


 From: Joris Meys jorism...@gmail.com
 Subject: Re: [R] validation logistic regression
 To: azam jaafari azamjaaf...@yahoo.com
 Cc: r-help@r-project.org
 Date: Wednesday, May 26, 2010, 5:00 AM


 Hi,

 first of all, you shouldn't backtransform your prediction, use the option
 type="response" instead :

 salichpred <- predict(salic.lr, newdata=profilevalidation, type="response")

 limit <- 0.5
 salichpredcat <- ifelse(salichpred<limit,0,1) # prediction of categories.

 Read up on sensitivity, specificity and ROC curves. By changing the
 limit, you can calculate sensitivity and specificity, and you can construct
 a ROC curve that will tell you how good your predictions are. It all depends
 on how much error you allow on the predictions.
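
 A minimal sketch of that calculation for one limit. The vectors obs and pred are made-up numbers, not the values from your table.

```r
# Hypothetical observed classes and predicted probabilities
obs  <- c(0, 0, 0, 1, 0, 1, 1, 0, 1, 1)
pred <- c(0.10, 0.20, 0.40, 0.35, 0.60, 0.55, 0.70, 0.30, 0.90, 0.45)

limit <- 0.5
pred.cat <- ifelse(pred < limit, 0, 1)

# Cross table of predicted versus observed classes
tab <- table(predicted = factor(pred.cat, levels = 0:1),
             observed  = factor(obs, levels = 0:1))

# Sensitivity: fraction of observed 1s predicted as 1
sens <- as.numeric(tab["1", "1"] / sum(tab[, "1"]))
# Specificity: fraction of observed 0s predicted as 0
spec <- as.numeric(tab["0", "0"] / sum(tab[, "0"]))
c(sensitivity = sens, specificity = spec)
```

 Repeating this for a range of limits and plotting sensitivity against 1 - specificity gives the ROC curve.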

 Cheers
 Joris



 On Wed, May 26, 2010 at 10:04 AM, azam jaafari azamjaaf...@yahoo.com
 wrote:

 Hi

 I did validation for prediction by logistic regression according to
 following:

 validationsize <- 23
 set.seed(1)
 random <- runif(123)
 order(random)
 nrprofilesinsample <- sort(order(random)[1:100])
 profilesample <- data[nrprofilesinsample,]
 profilevalidation <- data[-nrprofilesinsample,]
 salich <- profilesample$SALIC.H.1
 salic.lr <- glm(salich~wetnessindex, profilesample,
 family=binomial('logit'))
 summary(salic.lr)
 salichpred <- predict(salic.lr, newdata=profilevalidation)
 expsalichpred <- exp(salichpred)
 salichprediction <- (expsalichpred/(1+expsalichpred))

 So,
 > table(salichprediction, profilevalidation$SALIC.H.1)

 in result:
 salichprediction0 1
   0.0408806327422231 1 0
   0.094509645033899  1 0
   0.118665480273383  1 0
   0.129685441514168  1 0
   0.135452955695111 0
   0.137580612201769  1 0
   0.197265822234215  1 0
   0.199278585548248  0 1
   0.202436276322278  1 0
   0.211278767985746  1 0
   0.261036846823867  1 0
   0.283792703256058  1 0
   0.362229486187581  0 1
   0.362795636267779  1 0
   0.409067386115694  1 0
   0.410860613509484  0 1
   0.423960962956254  1 0
   0.428164288793652  1 0
   0.448509687866763  0 1
   0.538401659478058  0 1
   0.557282539294224  1 0
   0.603881788227797  0 1
   0.63633478460736   0 1

 So, I have salichprediction between 0 and 1 and a binary variable (observed
 values) that is 0 or 1. I want to compare these data, and I want to know
 whether this model (logistic regression) is OK for prediction or not.

 Please help me.

 Thanks a lot

 Azam





Re: [R] clustering in R

2010-05-28 Thread Joris Meys
errr, forget about the output of dput(q), but keep it in mind for next time.

f = dist(t(q))
hclust(f,method="single")

it's as simple as that.
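
A minimal sketch on random numbers (q here is simulated, not your data). Cutting a single-linkage tree at height 2 with cutree() puts two items in the same group whenever they are linked through a chain of pairs that are each at most 2 apart, which is how I understand the friends-of-friends idea.

```r
# Simulated stand-in for the 10 x 10 subset of the data
set.seed(42)
q <- matrix(rnorm(100), nrow = 10)

# Distances between the columns, kept as a dist object
f <- dist(t(q))

# Single-linkage clustering
hc <- hclust(f, method = "single")

# Group membership when the tree is cut at a distance of 2
groups <- cutree(hc, h = 2)
groups
```

The threshold is applied by cutree(), so there is no need to remove values from the distance object first.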
Cheers
Joris

On Fri, May 28, 2010 at 10:39 PM, Ayesha Khan
ayesha.diamond...@gmail.com wrote:

 v <- dput(x,"sampledata.txt")
 dim(v)
 q <- v[1:10,1:10]
 f =as.matrix(dist(t(q)))

 distB=NULL
 for(k in 1:(nrow(f)-1)) for( m in (k+1):ncol(f)) {
 if(f[k,m] < 2) distB=rbind(distB,c(k,m,f[k,m]))
 }
 #now distB looks like this

 > distB
       [,1] [,2]      [,3]
  [1,]    1    2 1.6275568
  [2,]    1    3 0.5252058
  [3,]    1    4 0.7323116
  [4,]    1    5 1.9966001
  [5,]    1    6 1.6664110
  [6,]    1    7 1.0800540
  [7,]    1    8 1.8698925
  [8,]    1   10 0.5161808
  [9,]    2    3 1.7325811
 [10,]    2    5 0.8267843
 [11,]    2    6 0.5963280
 [12,]    2    7 0.8787230

 #now from this output I want to cluster all 1's, friends of 1 and friends
 of friends of 1 in one cluster. The same goes for 2, 3 and so on.
 But when I do that using hclust, I get the following error. I think what I
 need to do is convert my current matrix somehow into a format that would be
 accepted by the hclust function, but I don't know how to achieve that.
 > distclust <- hclust(distB,method="single")

 Error in if (n < 2) stop("must have n >= 2 objects to cluster") :
   argument is of length zero

 P.S.: Please let me know if this makes things clearer, because I don't know
 how looking at the original data set would help: the matrix under
 consideration right now is the distance matrix and how it can be altered. I
 have tried as.dist; it doesn't work because my matrix, as I mentioned earlier,
 is not a square matrix.
 On Fri, May 28, 2010 at 2:37 PM, Tal Galili tal.gal...@gmail.com wrote:

 Hi Ayesha,
 I wish to help you, but without a simple self contained example that shows
 your issue, I will not be able to help.
 Try using the ?dput command to create some simple data, and let us see
 what you are doing.

 Best,
 Tal
 Contact
 Details:---
 Contact me: tal.gal...@gmail.com |  972-52-7275845
 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
 www.r-statistics.com (English)

 --




   On Fri, May 28, 2010 at 9:04 PM, Ayesha Khan 
 ayesha.diamond...@gmail.com wrote:

 Thanks Tal & Joris!
 I created my distance matrix distA by using the dist() function in R and
 manipulating my output in order to get a matrix.
 distA =as.matrix(dist(t(x2))) # x2 being my original dataset
 as according to the documentation on dist()

 "For the default method, a "dist" object, or a matrix (of distances) or
 an object which can be coerced to such a matrix using as.matrix()"

   On Fri, May 28, 2010 at 6:34 AM, Joris Meys jorism...@gmail.com wrote:

 As Tal said.

 Next to that, I read that column1 (and column2?) are supposed to be seen
 as factors, not as numerical variables. Did you take that into account
 somehow?

 It's easy to reproduce the error code :
 > n <- NULL
 > if(n<2)print("This is OK")
 Error in if (n < 2) print("This is OK") : argument is of length zero

 In the hclust code, you find following line :
 n <- as.integer(attr(d, "Size"))
 where d is the distance object entered in the hclust function. Looking
 at the error you get, this means that the "Size" attribute of your distance
 is NULL. Which tells me that distA is not a dist-object.

 > A <- matrix(1:4,ncol=2)
 > A
      [,1] [,2]
 [1,]    1    3
 [2,]    2    4
 > hclust(A,method="single")

 Error in if (n < 2) stop("must have n >= 2 objects to cluster") :
   argument is of length zero

 Did you actually put in a distance object? see also ?dist or ?as.dist.

 Cheers
 Joris




  On Fri, May 28, 2010 at 1:41 AM, Ayesha Khan 
 ayesha.diamond...@gmail.com wrote:

  i have a matrix with the following dimensions
 136   3

 and it looks something like

 [,1] [,2] [,3]
  [1,]  402  675 1.802758
  [2,]  402  696 1.938902
  [3,]  402  699 1.994253
  [4,]  402  945 1.898619
  [5,]  424  470 1.812857
  [6,]  424  905 1.816345
  [7,]  470  905 1.871252
  [8,]  504  780 1.958191
  [9,]  504  848 1.997111...

 
 so you get the idea. I want to group similar items in one group/cluster
 following the friends of friends approach. I tried doing

 distclust <- hclust(distA,method="single")
 However, I got the following error.

 Error in if (n < 2) stop("must have n >= 2 objects to cluster") :
   argument is of length zero
 which probably means there's something wrong with my input here. Is there
 another way of doing this kind of clustering without getting into all the
 looping and ifelse etc.? Basically, if 402 is close to 675, 696, and 699 and
 thus falls in cluster A, then all items close to 675, 696, and 699 should also
 fall into the same cluster A, following a friends-of-friends strategy.
 Any help would be highly

Re: [R] clustering in R

2010-05-28 Thread Joris Meys
I can't run your code.
Please, just give me whatever comes on your screen when you run:
dput(q)
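
To show what I'm after, a small sketch on a made-up matrix (the values are arbitrary). dput() prints plain R code that rebuilds the object, so whoever reads the mail can paste it straight into a session.

```r
# Hypothetical small matrix
q <- matrix(c(1.5, 2, 3.25, 4), nrow = 2)

# Prints R code that recreates q; this is what should go in the mail
dput(q)

# Round trip: evaluating the printed text gives the object back
txt <- paste(capture.output(dput(q)), collapse = "")
q2 <- eval(parse(text = txt))
identical(q, q2)
```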


On Fri, May 28, 2010 at 10:57 PM, Ayesha Khan
ayesha.diamond...@gmail.com wrote:

 I assume my matrix should look something like this?..

 round(distance, 4)
        P00A   P00B   M02A   M02B   P04A   P04B   M06A   M06B   P08A   P08B   M10A
 P00B 0.9678
 M02A 1.0054 1.0349
 M02B 1.0258 1.0052 1.2106
 P04A 1.0247 0.9928 1.0145 0.9260
 P04B 0.9898 0.9769 0.9875 0.9855 0.6075
 M06A 1.0159 0.9893 1.0175 0.9521 0.9266 0.9660
 M06B 0.9837 0.9912 1.0124 1.0402 1.0272 1.0367 1.5693
 P08A 1.0279 1.0303 0.9865 0.9748 1.0184 1.0452 0.9799 1.0400
 P08B 1.0248 1.0299 0.9717 0.9673 1.0048 1.0329 1.0280 0.9907 0.2158
 M10A 0.9850 0.9603 1.0246 0.9708 1.0231 0.9771 0.9916 1.0168 0.9722 0.9525
 M10B 1.0150 1.0397 0.9754 1.0292 0.9769 1.0229 1.0084 0.9832 1.0278 1.0475 2.




Re: [R] clustering in R

2010-05-28 Thread Joris Meys
Ah OK, I didn't get your question then.

a dist-object is actually a vector of numbers with a couple of attributes.
You can't just cut out values like that. The hclust function needs a perfect
distance matrix to use the calculations.
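
A small sketch of what goes wrong, on three made-up points. Subsetting a dist object drops the attributes, so what is left is a plain numeric vector without the "Size" attribute that hclust looks for.

```r
# Three hypothetical points in the plane
m <- matrix(c(0, 0,
              3, 4,
              6, 8), ncol = 2, byrow = TRUE)
d <- dist(m)

# A dist object is a numeric vector plus attributes such as "Size"
size <- attr(d, "Size")
vals <- as.vector(d)

# Cutting out values drops the class and the attributes
sub <- d[d < 6]
class(sub)
attr(sub, "Size")
```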

shortcut is easy : just do f <- 2*f/max(f), and all values are at most 2.

Otherwise this function could do that for you :

to.dist <- function(x){
  # x: matrix or data frame with columns (item 1, item 2, distance)
  x.names <- sort(unique(c(x[, 1], x[, 2])))
  n <- length(x.names)
  # pairs that are not listed in x keep a distance of 0
  x.dist <- matrix(0,n,n)
  dimnames(x.dist) <- list(x.names,x.names)
  # fill both triangles so the matrix is symmetric
  x.ind <- rbind(cbind(match(x[, 1], x.names), match(x[, 2], x.names)),
                 cbind(match(x[, 2], x.names), match(x[, 1], x.names)))
  x.dist[x.ind] <- rep(x[, 3], 2)
  x.dist <- as.dist(x.dist)
  return(x.dist)
}

> d <- to.dist(distB)
> hclust(d)


Cheers
Joris



On Sat, May 29, 2010 at 12:04 AM, Ayesha Khan
ayesha.diamond...@gmail.com wrote:

 Yes Joris. I did try that and it does produce the results. I am now
 wondering why I wanted a matrix-like structure in the first place. However,
 I do want 'f' to contain values less than 2 only. But when I try to get rid
 of values greater than 2 by doing N <- f[f<2], the structure of f is disrupted
 and hclust doesn't want to recognize it anymore, because obviously the data
 structure changes with that. Any ideas on how to do that?



Re: [R] data frame manipulation change elements meeting criteria

2010-05-27 Thread Joris Meys
The loop is due to the switch statement, not the condition. Without
condition it would become:

for (i in 1:length(Y)){
new.vect[i] <- switch(
  EXPR = X[i],
  "Sell"="Buy",
  "Buy"="Sell",
  X[i])
}
You can make an sapply construct too of course :

new.vect <- sapply(X[which(Y=="DEL")],switch,"Sell"="Buy","Buy"="Sell")

This will speed up things a little bit, but the effect is marginal.
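
If the goal is to avoid both the loop and switch, a nested ifelse is one option. This is a minimal sketch on the same toy vectors X and Y as in my earlier mail, not on the trade data frame itself.

```r
X <- rep(c("Buy", "Sell", "something else"), each = 5)
Y <- rep(c("DEL", "INS", "DEL"), 5)

# Swap Buy and Sell only where the status is DEL, keep everything else
new.vect <- ifelse(Y == "DEL" & X == "Sell", "Buy",
            ifelse(Y == "DEL" & X == "Buy",  "Sell", X))

cbind(new.vect, X, Y)
```

On the data frame, the same expression with trade$Trade.Status in place of Y and trade$Buy.Sell..Cleared. in place of X should do the job, assuming the column is character and not a factor.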
Cheers
Joris

On Thu, May 27, 2010 at 8:33 AM, arnaud Gaboury arnaud.gabo...@gmail.com wrote:

 Thank you for the answer.
 Is there any way to combine if() and switch() in one line? In my case,
 something like :

 if(trade$Trade.Status==DEL)switch(.)

 I would like to avoid the loop .



 From: Joris Meys [mailto:jorism...@gmail.com]
 Sent: Wednesday, May 26, 2010 9:15 PM
 To: arnaud Gaboury
 Cc: r-help@r-project.org
 Subject: Re: [R] data frame manipulation change elements meeting criteria

 see ?switch

 X <- rep(c("Buy","Sell","something else"),each=5)
 Y <- rep(c("DEL","INS","DEL"),5)


 new.vect <- X
 for (i in which(Y=="DEL")){
 new.vect[i] <- switch(
   EXPR = X[i],
   Sell="Buy",
   Buy="Sell",
   X[i])
 }
 cbind(new.vect,X,Y)
 On Wed, May 26, 2010 at 7:43 PM, arnaud Gaboury arnaud.gabo...@gmail.com
 wrote:
 Dear group,

 Here is my df :

 trade <-
 structure(list(Trade.Status = c("DEL", "INS", "INS"), Instrument.Long.Name =
 c("SUGAR NO.11",
 "CORN", "CORN"), Delivery.Prompt.Date = c("Jul/10", "Jul/10",
 "Jul/10"), Buy.Sell..Cleared. = c("Sell", "Buy", "Buy"), Volume = c(1L,
 2L, 1L), Price = c(15.2500, 368., 368.5000), Net.Charges..sum. =
 c(4.01,
 -8.64, -4.32)), .Names = c("Trade.Status", "Instrument.Long.Name",
 "Delivery.Prompt.Date", "Buy.Sell..Cleared.", "Volume", "Price",
 "Net.Charges..sum."), row.names = c(NA, 3L), class = "data.frame")

 Here is what I want :

 If trade$Trade.Status=="DEL": then if trade$Buy.Sell..Cleared.=="Sell",
 change it to "Buy"; if trade$Buy.Sell..Cleared.=="Buy", change it to "Sell".
 If trade$Trade.Status=="INS", do nothing.
 I tried to work around this with ifelse, but don't know how to deal with so
 many conditions.

 Any help is appreciated.

 TY

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



 --
 Joris Meys
 Statistical Consultant

 Ghent University
 Faculty of Bioscience Engineering
 Department of Applied mathematics, biometrics and process control

 Coupure Links 653
 B-9000 Gent

 tel : +32 9 264 59 87
 joris.m...@ugent.be
 ---
 Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php






Re: [R] data frame manipulation change elements meeting criteria

2010-05-27 Thread Joris Meys
Of course. You passed a whole data frame to sapply, so switch is applied to
each column instead of to each entry of one vector. You want to apply the
switch command to every entry of the vector trades$Buy.Sell..Cleared. for
which trades$Trade.Status equals "DEL". Why do you put in all variables for
the observations where the status is DEL?

You should have done :

tradesnew <- sapply(trades$Buy.Sell..Cleared.[which(trades$Trade.Status=="DEL")],
 switch,Sell="Buy",Buy="Sell")

Check the help files, and keep track of what goes in and out of a function.
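
To see the difference, here is a small sketch on a cut-down, hypothetical
version of the data frame (three columns only). It contrasts handing sapply
the data frame rows with handing it the single column:

```r
# Cut-down, hypothetical version of the poster's data frame.
trades <- data.frame(
  Trade.Status       = c("DEL", "INS", "INS"),
  Buy.Sell..Cleared. = c("Sell", "Buy", "Buy"),
  Volume             = c(1L, 2L, 1L),
  stringsAsFactors   = FALSE
)
del <- which(trades$Trade.Status == "DEL")

# A data frame is a list of columns, so sapply() visits each column in turn.
# Character values without a match give NULL; a numeric value selects an
# alternative by position, which is why Volume = 1 also yields "Buy".
by_column <- sapply(trades[del, ], switch, Sell = "Buy", Buy = "Sell")
str(by_column)

# Subsetting one column first gives one result per DEL entry.
by_entry <- sapply(trades$Buy.Sell..Cleared.[del], switch,
                   Sell = "Buy", Buy = "Sell")
by_entry
```

The first call explains the list of NULLs in the output quoted below; the
second one is the form suggested above.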

Cheers
Joris

On Thu, May 27, 2010 at 9:41 AM, arnaud Gaboury arnaud.gabo...@gmail.com wrote:

 Joris,

 If I pass this line :

 tradesnew<-sapply(trades[which(trades$Trade.Status=="DEL"),],switch,Sell="Buy",Buy="Sell")

 Here is what I get :

 > tradesnew
 $Trade.Status
 NULL

 $Instrument.Long.Name
 NULL

 $Delivery.Prompt.Date
 NULL

 $Buy.Sell..Cleared.
 [1] "Buy"

 $Volume
 [1] "Buy"

 $Price
 NULL

 $Net.Charges..sum.
 NULL

 That's certainly not what I want.






Re: [R] data frame manipulation change elements meeting criteria

2010-05-27 Thread Joris Meys
Ah, OK. sapply evidently only gives an output for every case that goes in,
which is only one here, as there is only one DEL case. You can use that
output to change the corresponding value in the data frame, like :

tradenews <- trades
tradenews$Buy.Sell..Cleared.[which(trades$Trade.Status=="DEL")] <-
 sapply(trades$Buy.Sell..Cleared.[which(trades$Trade.Status=="DEL")],
 switch,Sell="Buy",Buy="Sell")

Also take a look at these help files and the examples mentioned in there.
?switch
?sapply
?which

And please, give your variables some decent names. All those dots make
your code very error-prone.
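
As an illustration of both points, a sketch with shorter names and a named
lookup vector instead of switch. The column names status and side and the
helper flip are made up for this example:

```r
# Hypothetical data with shorter column names.
trades <- data.frame(
  status = c("DEL", "INS", "INS", "DEL"),
  side   = c("Sell", "Buy", "Buy", "Buy"),
  stringsAsFactors = FALSE
)

flip <- c(Sell = "Buy", Buy = "Sell")   # named lookup vector
del  <- trades$status == "DEL"          # logical index of the DEL rows

trades_new <- trades
trades_new$side[del] <- unname(flip[trades$side[del]])
trades_new
```

Note that any side value other than "Buy" or "Sell" in a DEL row would come
out as NA with this lookup, so it assumes those are the only two values.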

Cheers
Joris

On Thu, May 27, 2010 at 10:47 AM, arnaud Gaboury
arnaud.gabo...@gmail.com wrote:

 Sorry Joris, but I am totally lost on this issue!!



 tradenews<-sapply(trades$Buy.Sell..Cleared[which(trades$Trade.Status=="DEL")],switch,Sell="Buy",Buy="Sell")

 > tradenews
  Sell
 "Buy"

 Not really what I want !!


Re: [R] cluster analysis and supervised classification: an alternative to knn1?

2010-05-27 Thread Joris Meys
Hi Abanero,

First, I have to correct myself. Knn1 is a supervised learning algorithm, so
my comment wasn't completely correct. In any case, if you want to do a
clustering prior to a supervised classification, the function daisy() can
handle any kind of variable. The resulting distance matrix can be used with
a number of different methods.
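
A minimal sketch of that idea on made-up data. The data frame dat and the
choice of three clusters are assumptions for illustration only; daisy() and
pam() come from the cluster package:

```r
library(cluster)  # daisy() and pam(); a recommended package shipped with R

# Hypothetical mixed-type data: two numeric columns and one factor.
set.seed(42)
dat <- data.frame(
  height = rnorm(20, 170, 10),
  weight = rnorm(20, 70, 8),
  group  = factor(sample(c("a", "b", "c"), 20, replace = TRUE))
)

# Gower dissimilarity copes with numeric and factor columns together.
d <- daisy(dat, metric = "gower")

# The dissimilarities can feed hierarchical clustering or pam().
hc  <- hclust(as.dist(d), method = "average")
fit <- pam(d, k = 3)
table(fit$clustering)
```

The cluster labels in fit$clustering could then serve as class labels for a
supervised method afterwards.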

And you're right, randomForest doesn't handle categorical variables either.
So I haven't been of great help here...
Cheers
Joris

On Thu, May 27, 2010 at 1:25 PM, abanero gdevi...@xtel.it wrote:


 Hi,

 thank you Joris and Ulrich for your answers.

 Joris Meys wrote:

 see the library randomForest for example


 I'm trying to find some example in randomForest with categorical variables
 but I haven't found anything. Do you know any example with both categorical
 and numerical variables? Anyway I don't have any class labels yet. How
 could
 I  find clusters with randomForest?


 Ulrich wrote:

 Probably the simplest way is Affinity Propagation[...] All you need is a
 way of measuring the similarity of samples which is straightforward both
 for numerical and categorical variables.

 I had a look at the documentation of the package apcluster. That's
 interesting but do you have any example using it with both categorical and
 numerical variables? I'd like to test it with a large dataset..

 Thanks a lot!
 Cheers

 Giuseppe

 --
 View this message in context:
 http://r.789695.n4.nabble.com/cluster-analysis-and-supervised-classification-an-alternative-to-knn1-tp2231656p2232950.html
 Sent from the R help mailing list archive at Nabble.com.







Re: [R] cluster analysis and supervised classification: an alternative to knn1?

2010-05-27 Thread Joris Meys
I'm confusing myself :-)

randomForest cannot handle character vectors as predictors (which is why I,
to my surprise, found that a categorical variable could not be used in
the function). It can handle categorical variables as predictors IF they are
put in as a factor.

And obviously it handles a categorical variable as the response.
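
A small sketch of the conversion step on made-up data. The data frame and its
column names are hypothetical, and the randomForest call only runs if that
package happens to be installed:

```r
# Hypothetical data with one character predictor.
set.seed(7)
dat <- data.frame(
  y    = factor(rep(c("yes", "no"), each = 10)),
  x1   = c(rnorm(10, 1), rnorm(10, -1)),
  site = rep(c("north", "south"), times = 10),
  stringsAsFactors = FALSE
)

# Convert every character column to a factor before fitting.
is_chr <- vapply(dat, is.character, logical(1))
dat[is_chr] <- lapply(dat[is_chr], factor)
str(dat)

# Fit only when the package is available (assumption: default settings).
if (requireNamespace("randomForest", quietly = TRUE)) {
  fit <- randomForest::randomForest(y ~ x1 + site, data = dat)
  print(fit)
}
```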

I hope I'm not going to pile up more mistakes, it's been enough for the
day...
Cheers
Joris

On Thu, May 27, 2010 at 2:08 PM, steve_fried...@nps.gov wrote:

 Joris,

 I've been following this thread for a few days as I am beginning to use
 randomForest in my work.  I am confused by your last email.

 What do you mean that randomForest does not handle categorical variables ?

 It can be used in either regression or classification analysis.  Do you
 mean that categorical predictors are not suitable? Certainly they are as
 the response.
 Would you be so kind, and clarify what you were suggesting.

 Thanks,

 Steve Friedman Ph. D.
 Spatial Statistical Analyst
 Everglades and Dry Tortugas National Park
 950 N Krome Ave (3rd Floor)
 Homestead, Florida 33034

 steve_fried...@nps.gov
 Office (305) 224 - 4282
 Fax (305) 224 - 4147




Re: [R] summary of arima model in R

2010-05-26 Thread Joris Meys
I reckon you misunderstand the function arima. If you're interested in the
significance of any regressor, you should use the proper fitting tools.
Check all the code examples from the book I recommended before at:

http://www.stat.pitt.edu/stoffer/tsa2/index.html

There's a nice tutorial that explains quite well how to proceed. In the code
for chapter 1-5 they give you examples for the functions you need to do
formal testing of your regressors.
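
For orientation only, a sketch on simulated data of one common approximation:
z statistics computed from the coefficient estimates and their estimated
covariance matrix. The series, the step regressor and the AR(1) order are all
made up, and this is not necessarily the procedure the book follows:

```r
# Simulated series with a step intervention after t = 60 (hypothetical data).
set.seed(123)
n    <- 100
step <- as.numeric(seq_len(n) > 60)
y    <- 2 * step + arima.sim(list(ar = 0.5), n = n)

fit <- arima(y, order = c(1, 0, 0), xreg = cbind(step = step))

# Approximate z statistics and two-sided p-values for each coefficient.
est <- coef(fit)
se  <- sqrt(diag(vcov(fit)))
z   <- est / se
p   <- 2 * pnorm(-abs(z))
round(cbind(estimate = est, se = se, z = z, p = p), 4)

# Residual checks: little remaining autocorrelation suggests the ARMA part fits.
acf(residuals(fit))
pacf(residuals(fit))
```

These p-values rely on a normal approximation, so treat them as a rough
guide rather than an exact test.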

Cheers
Joris


On Wed, May 26, 2010 at 1:28 AM, Jianyun Wu jianyun.fred...@gmail.com wrote:

 Thanks for your reply. But what I want is to see the significance of the
 intervention regressors, rather than the goodness of fit of the time
 series itself.

 Thanks

 On 5/26/10, Joris Meys jorism...@gmail.com wrote:
  Check  http://cran.r-project.org/doc/contrib/Ricci-refcard-ts.pdf for
 some
  ideas on testing time series in R.  I'd go with the acf() and pacf() on
 the
  residuals of the arima model. If arima works, both plots will indicate
  absence of autocorrelation.
 
  also check ?tsdiag
 
  And if you're really going to use those more often, I really can
 recommend
  this book :
 
 http://www.amazon.com/Time-Analysis-Its-Applications-Statistics/dp/0387293175
 
  Cheers
  Joris
  On Tue, May 25, 2010 at 9:34 AM, Fred jianyun.fred...@gmail.com wrote:
 
  Hi,
 
  I want to give a summary or anova for arima model in R, as
  summary, and anova for lm.
 
  As including various intervention factors in arima(xreg = ) part, I
  want to assess the significancy of thse factors.
 
  I can do it using interrupted analysis of time series by linear
  regression, but want to see whether arima model works for the data
  first.
 
  summary, anova do not work for arima, any alternatives ???
 
  Thank you very much.
 
  Fred
 
 

 --
 Sent from my mobile device






Re: [R] validation logistic regression

2010-05-26 Thread Joris Meys
Hi,

First of all, you shouldn't back-transform your prediction; use the option
type="response" instead :

salichpred <- predict(salic.lr, newdata=profilevalidation, type="response")

limit <- 0.5
salichpredcat <- ifelse(salichpred<limit,0,1) # prediction of categories.

Read up on sensitivity, specificity and ROC curves. By changing the limit,
you can calculate sensitivity and specificity at each value, and you can
construct a ROC curve that tells you how good your predictions are. It all
depends on how much error you allow on the predictions.
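
A sketch of that calculation in base R. The vectors obs and pred are
simulated stand-ins for the observed 0/1 values and the predicted
probabilities, and sens_spec is a helper made up for this example:

```r
# Hypothetical observed outcomes and predicted probabilities.
set.seed(1)
obs  <- rbinom(50, 1, 0.4)
pred <- plogis(-0.5 + 1.5 * obs + rnorm(50))

# Sensitivity and specificity at a single cutoff.
sens_spec <- function(pred, obs, limit) {
  predcat <- ifelse(pred < limit, 0, 1)
  c(sensitivity = sum(predcat == 1 & obs == 1) / sum(obs == 1),
    specificity = sum(predcat == 0 & obs == 0) / sum(obs == 0))
}
sens_spec(pred, obs, 0.5)

# Sweep the cutoff to trace out the ROC curve.
limits <- seq(0, 1, by = 0.05)
roc <- t(sapply(limits, sens_spec, pred = pred, obs = obs))
plot(1 - roc[, "specificity"], roc[, "sensitivity"], type = "b",
     xlab = "1 - specificity", ylab = "sensitivity")
abline(0, 1, lty = 2)   # reference line for a useless classifier
```

The further the curve bends away from the dashed diagonal, the better the
model separates the two classes.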

Cheers
Joris


On Wed, May 26, 2010 at 10:04 AM, azam jaafari azamjaaf...@yahoo.com wrote:

 Hi

 I did validation for prediction by logistic regression according to
 following:

 validationsize <- 23
 set.seed(1)
 random <- runif(123)
 order(random)
 nrprofilesinsample <- sort(order(random)[1:100])
 profilesample <- data[nrprofilesinsample,]
 profilevalidation <- data[-nrprofilesinsample,]
 salich <- profilesample$SALIC.H.1
 salic.lr <- glm(salich~wetnessindex, profilesample,
 family=binomial('logit'))
 summary(salic.lr)
 salichpred <- predict(salic.lr, newdata=profilevalidation)
 expsalichpred <- exp(salichpred)
 salichprediction <- (expsalichpred/(1+expsalichpred))

 So,
 > table(salichprediction, profilevalidation$SALIC.H.1)

 in result:
 salichprediction     0 1
   0.0408806327422231 1 0
   0.094509645033899  1 0
   0.118665480273383  1 0
   0.129685441514168  1 0
   0.135452955695111 0
   0.137580612201769  1 0
   0.197265822234215  1 0
   0.199278585548248  0 1
   0.202436276322278  1 0
   0.211278767985746  1 0
   0.261036846823867  1 0
   0.283792703256058  1 0
   0.362229486187581  0 1
   0.362795636267779  1 0
   0.409067386115694  1 0
   0.410860613509484  0 1
   0.423960962956254  1 0
   0.428164288793652  1 0
   0.448509687866763  0 1
   0.538401659478058  0 1
   0.557282539294224  1 0
   0.603881788227797  0 1
   0.63633478460736   0 1

 So, I have salichprediction between 0 and 1 and a binary variable (observed
 values) of 0 or 1. I want to compare these data and to know whether this
 model (logistic regression) is OK for prediction or not.

 Please help me.

 Thanks a lot

 Azam










Re: [R] Calculation time of isoMDS and the optimal number of dimensions

2010-05-26 Thread Joris Meys
Hi Michael,

Thanks for your answer. Indeed, with a 100x100 matrix it runs pretty
fast, even with k=30. But as with a lot of things in R, there is a
disproportionate rise in the calculation time once you exceed a certain size
limit on your matrices. In the end, it ran about 8 hours for my complete
matrix.

Thanks for the suggestion, that saves me quite a bit of time. For the full
story: I'm applying a new distance measure for comparing phylogenetic trees.
The space this is calculated in has to be mapped back onto a Euclidean
space, which I'm trying out right now. I noticed that the 2D solution seems
a bad representation of the real distances, so I increased the
dimensionality. I use the dimensions to get a medoid and a centroid in the
Euclidean space, but those results obviously depend on the number of
dimensions used in the MDS. So I'm trying to figure out when these location
measures become more or less stable. Graphical representation is of course
bound to 2 dimensions; 3D plots tend to be confusing with over 800 points.
I tried color coding, but that doesn't really help...
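
One informal way to look at this is to plot the stress against the number of
dimensions and see where it levels off. A sketch on simulated data (the
points and the range of k are made up, and this is much smaller than the
800x800 case):

```r
library(MASS)  # isoMDS(); a recommended package shipped with R

# Hypothetical points that truly live in 6 dimensions.
set.seed(1)
x <- matrix(rnorm(60 * 6), nrow = 60)
d <- dist(x)

# Fit a nonmetric MDS for each k and record the final stress (in percent).
ks <- 2:6
stress <- sapply(ks, function(k) isoMDS(d, k = k, trace = FALSE)$stress)

plot(ks, stress, type = "b",
     xlab = "number of dimensions k", ylab = "stress (%)")
```

Where the curve flattens, adding dimensions buys little, which gives at least
a rough indication of how many are needed.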

Cheers
Joris

On Wed, May 26, 2010 at 5:32 AM, Michael Denslow
michael.dens...@gmail.com wrote:

 Hi Joris,

 On Tue, May 25, 2010 at 1:00 PM, Joris Meys jorism...@gmail.com wrote:
  Dear all,
 
  I'm running a set of nonparametric MDS analyses, using a wrapper for
 isoMDS,
  on a 800x800 distance matrix. I noticed that setting the parameter k to
  larger numbers seriously increases the calculation time. Actually, with
 k=10
  it calculates already longer than for k=2 and k=5 together. It's now
  calculating for 6 hours, and counting...

 Seems like a long time, I have a 100x100 matrix that takes about 40
 secs to run with k=10. What is the wrapper function doing?

 
  There is quite a difference between the results using k=2 or k=5 when
  looking at the first 2 dimensions (logically...). I suspect the same when
  k=10. Yet, I start asking myself whether this makes sense if I'm only
 using
  the first 2 dimensions. And I can't think of a formal method to check in
 a
  nMDS framework how much dimensions are enough. Anybody an idea?

 You might want to look at the nmds.min() function in the ecodist
 package, which seeks to minimize stress. Out of curiosity, do you
 often use 10 dimensional solutions in your field of study?

 Hope this helps,
 Michael

  I use metaMDS from the vegan package, although it's not really meant to
 be
  used on these data.
 
  Cheers
  Joris
 
 



 --
 Michael Denslow

 I.W. Carpenter Jr. Herbarium [BOON]
 Department of Biology
 Appalachian State University
 Boone, North Carolina U.S.A.
 -- AND --
 Communications Manager
 Southeast Regional Network of Expertise and Collections
 sernec.org

 36.214177, -81.681480 +/- 3103 meters






Re: [R] Calculation time of isoMDS and the optimal number of dimensions

2010-05-26 Thread Joris Meys
Hi Gavin,

Thank you for the answer. I am aware of the fact that with nMDS it's about
the configuration, and that's exactly my problem: the configuration changes
quite a lot when I increase the number of dimensions. As I am trying to go
from a CAT(0) space of trees (see Billera et al. on geodesic distance) to a
Euclidean space, the required number of dimensions is not easily determined.
I have to restrict my Euclidean space for practical reasons, but I want to
stay as close as possible to the original configuration of the trees.
Hence my playing with the dimensions in the nMDS.

I merely commented on metaMDS as "not really meant for this kind of
data" because of the object that's returned. As the species component is
missing from the data, you get warning messages when using procrustes() or
other functions in the vegan package. But you're right. It might be written
for community data, but it is perfectly valid for any kind of distance
matrix.

Thanks again for your insights.

Cheers
Joris

On Wed, May 26, 2010 at 9:34 AM, Gavin Simpson gavin.simp...@ucl.ac.uk wrote:

 On Tue, 2010-05-25 at 19:00 +0200, Joris Meys wrote:
  Dear all,
 
  I'm running a set of nonparametric MDS analyses, using a wrapper for
 isoMDS,
  on a 800x800 distance matrix. I noticed that setting the parameter k to
  larger numbers seriously increases the calculation time. Actually, with
 k=10
  it calculates already longer than for k=2 and k=5 together. It's now
  calculating for 6 hours, and counting...

 metaMDS will try 'trymax' random starts of isoMDS in an attempt to see
 if convergent solutions are reached. The 10d computation is clearly much
 more complex than fitting rank distances in 2 or even 5 d.

  There is quite a difference between the results using k=2 or k=5 when
  looking at the first 2 dimensions (logically...). I suspect the same when
  k=10. Yet, I start asking myself whether this makes sense if I'm only
 using
  the first 2 dimensions. And I can't think of a formal method to check in
 a
  nMDS framework how much dimensions are enough. Anybody an idea?

 In nMDS the configuration counts, not the axes (as they are themselves
 arbitrary directions --- having one or the other of a x or y
 geographical coordinate isn't much use without the other coordinate if
 you want to find your way to that location - you need both). It makes no
 sense what so ever to compute a 10d nMDS solution if you only want a 2d
 solution for later computations; there is no guarantee that the first
 two axes of a 10d nMDS solution will be as good as those from the 2d
 solution. If you only want a 2d solution, concentrate on finding the
 best 2d solution you can using metaMDS.

  I use metaMDS from the vegan package, although it's not really meant to
  be used on these data.

 Why do you say that? As long as you turn off a couple of the
 ecological helper bits in metaMDS, all it is doing is handling random
 starts of the isoMDS algorithm.

 
  Cheers
  Joris
 

 HTH

 G

 --
 %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
  Dr. Gavin Simpson [t] +44 (0)20 7679 0522
  ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
  Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
  Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
  UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
 %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%




-- 
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] (no subject)

2010-05-26 Thread Joris Meys
What exactly are you trying to do? If you want to know which position is
wrong, try :

if (sum(u$POSITION==0)>0) cat("WARNING: POSITION IS WRONG FOR",
which(u$POSITION==0),"\n")

or even :
wrong <- which(u$POSITION==0)
if(length(wrong)>0) cat("WARNING: POSITION IS WRONG FOR",
u$DESCRIPTION[wrong],"\n")

Gives you the exact location of wrong positions. If you do that, make sure
u$DESCRIPTION is a character vector and not a factor.
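
A minimal sketch of the second variant on a made-up data frame (the toy values and the helper name report_positions are assumptions for illustration, not something from this thread). With stringsAsFactors = FALSE the descriptions stay character, so the message shows names rather than factor codes. The comparison is passed in as a function, so it can be flipped to != 0 if the non-zero positions are the ones to flag.

```r
# Toy data standing in for the poster's data frame `u` (hypothetical values).
u <- data.frame(
  DESCRIPTION = c("CORN Jul/10", "GOLD Jun/10", "WHEAT May/10"),
  POSITION    = c(0, -1, 0),
  stringsAsFactors = FALSE   # keep DESCRIPTION as character, not factor
)

# Hypothetical helper: build the warning text, or NULL if nothing is flagged.
report_positions <- function(df, flag = function(p) p == 0) {
  wrong <- which(flag(df$POSITION))
  if (length(wrong) == 0) return(NULL)
  paste("WARNING: POSITION IS WRONG FOR",
        paste(df$DESCRIPTION[wrong], collapse = ", "))
}

msg <- report_positions(u)
if (!is.null(msg)) cat(msg, "\n")

# Same helper, flagging the non-zero positions instead.
msg_nonzero <- report_positions(u, flag = function(p) p != 0)
```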

Cheers
Joris

On Wed, May 26, 2010 at 2:31 PM, arnaud Gaboury arnaud.gabo...@gmail.com wrote:

 Dear group,

 Here is my data frame:

 > dput(u)
 structure(list(DESCRIPTION = structure(c(2L, 5L, 6L, 7L, 9L,
 11L, 12L, 15L, 14L, 16L, 1L, 10L, 3L, 4L, 13L, 8L, 17L), .Label = c("COFFEE C Jul/10",
 "COPPER May/10", "CORN Jul/10", "CORN May/10", "COTTON NO.2 Jul/10",
 "CRUDE OIL miNY May/10", "GOLD Jun/10", "HENRY HUB NATURAL GAS May/10",
 "ROBUSTA COFFEE (10) Jul/10", "SILVER May/10", "SOYBEANS Jul/10",
 "SPCL HIGH GRADE ZINC USD", "STANDARD LEAD USD", "SUGAR NO.11 Jul/10",
 "SUGAR NO.11 May/10", "WHEAT Jul/10", "WHEAT May/10"), class = "factor"),
    PL = c(3500, -1874.999, -2612.503, -2169.998,
    -680, 425, 1025, 1008.000, -3057.599, 3212.5,
    -1781.251, -2265.0, 75, -387.5, 2950, 490.0013,
    0), POSITION = c(-2, 3, 2, 2, 18, 3, -1, -1, 5, 5, 0, 0,
    0, 0, 0, 0, 0)), .Names = c("DESCRIPTION", "PL", "POSITION"
 ), class = "data.frame", row.names = c(NA, -17L))

 I want to give a warning message if one of the element of the POSITION
 column is different from zero.

 I tried using mapply with some line like this :

 > mapply("if",u$POSITION,"==0",print("WARNING:POSITIONS ARE WRONG",quote=F))
 But it seems it is not the correct way to pass the various arguments.

 Any help is appreciated




 ***
 Arnaud Gaboury
 Mobile: +41 79 392 79 56
 BBM: 255B488F







Re: [R] condition apply to elements of a data frame column

2010-05-26 Thread Joris Meys
Arnaud,

check the vector :
 u$POSITION0
[1]  TRUE TRUE ...

what I do is putting u$POSITION==0
[1] FALSE FALSE ...
When you apply the sum() function to that vector, FALSE becomes 0 and TRUE
becomes 1. So this actually gives you a way of counting the number of
positions that are zero. If you have one element -2 and another 2, then
 c(-2,0,2)==0
[1] FALSE  TRUE FALSE

sum(c(-2,0,2)==0)
[1] 1

Which gives you exactly the number of elements that are 0.
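
To make the difference with sum(u$POSITION) concrete, here is a small sketch on the same three-element vector (the variable names are mine, chosen only for illustration). Summing the values lets -2 and 2 cancel, whereas summing a logical comparison counts elements, and any() answers the yes/no question directly.

```r
position <- c(-2, 0, 2)

total       <- sum(position)        # values cancel; says nothing about non-zero entries
n_zero      <- sum(position == 0)   # number of entries equal to zero
n_nonzero   <- sum(position != 0)   # number of entries different from zero
has_nonzero <- any(position != 0)   # TRUE as soon as one entry is non-zero

if (has_nonzero) cat("WARNING: at least one POSITION is not zero\n")
```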

Cheers
Joris


On Wed, May 26, 2010 at 3:14 PM, arnaud Gaboury arnaud.gabo...@gmail.com wrote:

 Joris,

 I want to add a line in a  function with a print warning if one element
 of
 the column is 0.
 I could use if(sum(u$POSITION)0) as a condition, but I can imagine having
 one element equal to -2, and another one to 2. So in this case, sum=0, but
 the condition is false in fact (minimum of one element different from
 zero).












[R] stress function in isoMDS

2010-05-26 Thread Joris Meys
Dear all,

as far as my understanding goes, isoMDS uses the Kruskal definition of
stress, i.e. : the square root of the ratio of the sum of squared
differences between the input distances and those of the configuration to
the sum of configuration distances squared. (as stated in the help files).
Now the definition of Kruskal also includes weights. I checked the isoMDS
code, but they call to C routines that I can't really read.

Anybody an idea about whether or not isoMDS applies those weights?

Next to that, The input distances are allowed a monotonic transformation. 
How do I have to see that transformation within isoMDS?
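
One way to look at both points without reading the C code might be the Shepard() function in MASS, which returns the input dissimilarities (x), the configuration distances (y) and the monotone fit to them (yf). The sketch below uses the swiss data from the isoMDS help page and recomputes an unweighted stress from those pieces. That this agrees with the value isoMDS reports (in percent) is my reading of the documented formula, so treat it as a check to run rather than a statement about the internals.

```r
library(MASS)

# Example data taken from the isoMDS help page.
swiss.x    <- as.matrix(swiss[, -1])
swiss.dist <- dist(swiss.x)
fit <- isoMDS(swiss.dist, k = 2, trace = FALSE)

# Shepard() returns: x  = input dissimilarities (sorted),
#                    y  = configuration distances,
#                    yf = monotone regression fit to y.
sh <- Shepard(swiss.dist, fit$points)

# Unweighted stress, sqrt( sum (y - yf)^2 / sum y^2 ), expressed in percent.
stress_by_hand <- 100 * sqrt(sum((sh$y - sh$yf)^2) / sum(sh$y^2))
print(c(by_hand = stress_by_hand, reported = fit$stress))

# The monotone transformation is the step function x -> yf.
plot(sh$x, sh$y, pch = ".",
     xlab = "input dissimilarity", ylab = "configuration distance")
lines(sh$x, sh$yf, type = "S", col = "red")
```

If the two numbers agree, no weights are involved in the reported stress, and the red step line is the transformation applied to the input distances.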

Kind regards
Joris



Re: [R] cluster analysis and supervised classification: an alternative to knn1?

2010-05-26 Thread Joris Meys
Not a direct answer, but from your description it looks like you are better
off with supervised classification algorithms instead of unsupervised
clustering. See the package randomForest for example. Alternatively, you can
try a logistic regression or a multinomial regression approach, but these
are parametric methods and put requirements on the data. randomForest is
completely non-parametric.

Cheers
Joris

On Wed, May 26, 2010 at 3:45 PM, abanero gdevi...@xtel.it wrote:


 Hi,
 I have 1,000 observations with 10 attributes (of different types:
 numeric, dichotomous, categorical, etc.) and a measure M.

 I need to cluster these observations in order to assign a new observation
 (with the same 10 attributes but not the measure) to a cluster.

 I want to calculate for the new observation a measure as the average of the
 measures M of the observations in the assigned cluster.

 I would use cluster analysis ( “Clara” algorithm?) and then “knn1” (in
 package class) to assign the new observation to a cluster.

 The problem is: I’m not able to use “knn1” because some of attributes are
 categorical.

 Do you know  something like “knn1” that works with categorical variables
 too? Do you have any suggestion?

 --
 View this message in context:
 http://r.789695.n4.nabble.com/cluster-analysis-and-supervised-classification-an-alternative-to-knn1-tp2231656p2231656.html
 Sent from the R help mailing list archive at Nabble.com.







Re: [R] how to avoid a subset of a matrix to become a column vector

2010-05-26 Thread Joris Meys
What exactly are you trying to do?
An example (which you should have provided)

> A <- matrix(1:100,nrow=10,ncol=10)
> B <- A[10,1:3]
> B
[1] 10 20 30
> is.matrix(B)
[1] FALSE

> matrix(B)
     [,1]
[1,]   10
[2,]   20
[3,]   30

This is logical: when you convert a vector to a matrix, R assumes you
want one column. If you want a single row instead, you should do :
> matrix(B,ncol=3)
     [,1] [,2] [,3]
[1,]   10   20   30

Or use drop=FALSE when subsetting :

> C <- A[10,1:3,drop=FALSE]
> C
     [,1] [,2] [,3]
[1,]   10   20   30
> is.matrix(C)
[1] TRUE
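
For the situation in the question, where the number of selected rows is only known at run time, a small sketch could look like this (the matrix and the selection test are made up for illustration). With drop=FALSE the result keeps three columns whether the test selects several rows, one row or none.

```r
A <- matrix(1:30, nrow = 10, ncol = 3)

# Hypothetical run-time test: here only row 10 passes.
keep <- A[, 1] > 9

TMP_dropped <- A[keep, ]                 # collapses to a plain vector
TMP         <- A[keep, , drop = FALSE]   # stays a 1 x 3 matrix

# The same call also behaves when no row passes the test.
TMP_empty <- A[A[, 1] > 100, , drop = FALSE]

dim(TMP)
dim(TMP_empty)
```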


On Wed, May 26, 2010 at 5:58 PM, mau...@alice.it wrote:

 I am assigning subset of a matrix A [n,3]  where n>1  to a temporary matrix
 TMP
 I do not know how many rows of A will be assigned to TMP because this is
 established by a
 run-time test.
 I expect TMP to be a matrix [m,3], m >=1
 But when 1 row only is transferred from A to TMP then TMP becomes [3,1]
 rather than [1,3]
 How can I avoid this unwanted transpose operation ?

 THank you in advance,
 Maura








Re: [R] data frame manipulation change elements meeting criteria

2010-05-26 Thread Joris Meys
see ?switch

X <- rep(c("Buy","Sell","something else"),each=5)
Y <- rep(c("DEL","INS","DEL"),5)


new.vect <- X
for (i in which(Y=="DEL")){
    new.vect[i] <- switch(
      EXPR = X[i],
      Sell="Buy",
      Buy="Sell",
      X[i])
}
cbind(new.vect,X,Y)
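
If you prefer to avoid the loop, the same swap can be sketched with nested ifelse() calls. The small data frame below is a cut-down, made-up stand-in for the poster's trade data; only the two relevant columns are kept.

```r
# Hypothetical stand-in for the poster's data frame, two columns only.
trade <- data.frame(
  Trade.Status       = c("DEL", "INS", "INS", "DEL"),
  Buy.Sell..Cleared. = c("Sell", "Buy", "Buy", "Buy"),
  stringsAsFactors = FALSE
)

is_del <- trade$Trade.Status == "DEL"
side   <- trade$Buy.Sell..Cleared.

# Swap Buy/Sell on DEL rows only; everything else is left untouched.
trade$Buy.Sell..Cleared. <- ifelse(is_del & side == "Sell", "Buy",
                            ifelse(is_del & side == "Buy",  "Sell", side))
trade
```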

On Wed, May 26, 2010 at 7:43 PM, arnaud Gaboury arnaud.gabo...@gmail.com wrote:

 Dear group,

 Here is my df :

 trade <-
 structure(list(Trade.Status = c("DEL", "INS", "INS"), Instrument.Long.Name =
 c("SUGAR NO.11",
 "CORN", "CORN"), Delivery.Prompt.Date = c("Jul/10", "Jul/10",
 "Jul/10"), Buy.Sell..Cleared. = c("Sell", "Buy", "Buy"), Volume = c(1L,
 2L, 1L), Price = c(15.2500, 368., 368.5000), Net.Charges..sum. =
 c(4.01,
 -8.64, -4.32)), .Names = c("Trade.Status", "Instrument.Long.Name",
 "Delivery.Prompt.Date", "Buy.Sell..Cleared.", "Volume", "Price",
 "Net.Charges..sum."), row.names = c(NA, 3L), class = "data.frame")

 Here is what I want :

 If trade$Trade.Status=="DEL": then if trade$buy.Sell..Cleared=="Sell",
 change it to "Buy", if trade$buy.Sell..Cleared=="Buy", change it to "Sell".
 If trade$Trade.Status=="INS", do nothing
 I tried to work around with ifelse, but don't know how to deal with so many
 conditions.

 Any help is appreciated.

 TY







Re: [R] error variable names are limited to 256 bytes when sourcing code

2010-05-26 Thread Joris Meys
        col="red",
        lty=3)
    ## Add HIGH ERROR BAR
    lines(hh.wealth.plot.ss$Year,       # x var
        hh.wealth.plot.ss$EB.High,      # y var (EB HIGH)
        type="l",                       # line graph
        col="red",
        lty=3)
}
## Add QUANTILES if Argument 'n.quantiles' is >= 3
if (n.quantiles >= 3) {
    # Cycle through column numbers and draw quantile lines
    for (qcol in 1:length(cols.q)) {
        lines(hh.wealth.quantiles$Year, # Plot quantile lines
            hh.wealth.quantiles[,cols.q[qcol]],
            type="l",
            col="orange",
            lty=3)
    }
    # Add MEDIAN line
    if (plot.med == TRUE) {
        lines(hh.wealth.quantiles$Year, # Plot median
            hh.wealth.quantiles[,col.med],
            type="l",
            col="orange",
            lty=3,
            lwd=2)
    }
}
## Add COUNT
points(hh.wealth.plot.ss$Year,          # x var
    hh.wealth.plot.ss$Count,            # y var (COUNT)
    type="p",                           # line graph
    pch=16,
    col="blue",
    cex=0.5)
## Add LEGEND
legend(x="topright",
    leg.txt.ss,
    lty=leg.lty.ss,
    lwd=leg.lwd.ss,
    pch=leg.pch.ss,
    col=leg.col.ss,
    cex=1)

dev.off()


##*
## Plot wealth for individual households
##*
png(filename=output.png.wlth, width=10, height=7, units="in", res=300)

## Create an empty plot
if (log.plot == FALSE) {
    plot(hh.wealth.plot$Year, hh.wealth.plot$Wealth,
        type="n",
        xlim=c(0,max.yr),
        main=ttl.hh,
        xlab="",
        ylab="Wealth")
} else {
    plot(hh.wealth.plot$Year, hh.wealth.plot$Wealth,
        log="y",
        type="n",
        xlim=c(0,max.yr),
        main=ttl.hh,
        xlab="",
        ylab="Wealth (log scale)")
}

#legend(x=leg.x.coord, y=leg.y.coord,   # Sets the location for the legend
legend(x="topright",
    leg.txt.hh,             # text in the legend
    col=c("red", "red"),    # sets the line colors in the legend
    lty=c(1,3),             # draws lines
    lwd=c(1,1),             # sets line thickness
#   bty="n",                # no border on the legend
    ncol=2,                 # makes it a 2-column legend
    cex=0.8)                # sets the legend text size

## Loop through IDs and add a line for each
for (id in 1:length(uniq.hh.ids)) {
    ## Get the current HH ID
    this.id <- uniq.hh.ids[id]

    ## Extract the records for the current ID
    this.sub <- hh.wealth.plot[hh.wealth.plot$HHID00 == this.id,]

    if (dim(this.sub)[1] > 0) {

        ## Set line type
        if (mean(this.sub$Status) == 0) {
            ltype <- 1
        } else {
            ltype <- 3
        }

        ## Add the line for this ID
        lines(this.sub$Year, this.sub$Wealth,
            type="l",
            col=colors.id[id],
            lwd=1,
            lty=ltype)
    }

}

dev.off()

 }







Re: [R] R editor

2010-05-26 Thread Joris Meys
I'm not Erik, but what the heck.

What platform, Linux or Windows? On Windows, I use Tinn-R, which is great
for using with R as you get full control over the console. You need to take
into account that you should install R with the SDI option, and that you
have to configure Tinn-R the first time from the menu
(R > Configure > Permanent).

SciTE can be used with R as well. On how to set up SciTE for R :
http://tolstoy.newcastle.edu.au/R/e6/help/09/03/6695.html

Cheers
Joris

On Wed, May 26, 2010 at 9:51 PM, b...@email.unc.edu wrote:

 Erik,

 What R editor do you use? I've tried SciTE but it won't color the code.

 Brian







Re: [R] Linear Discriminant Analysis in R

2010-05-26 Thread Joris Meys
Why exactly do you need lda and not another method? For lda to be
applicable, you should check :
1) whether the regressors are normally distributed within the classes
2) whether the variance-covariance matrices are equal for all classes

Essentially, this means that the boundary between both classes is a
hyperplane (or in 2 dimensions, a straight line). Otherwise you can try qda,
or go to other supervised learning methods.

How to use lda is explained rather well in the help files. if it doesn't
work, provide us with self-contained code (i.e. code that can be run without
need of extra information like data frames) that reproduces the error.

Cheers
Joris

PS : There's an error in your code.
scaled_features <- scale(mask_features, center = FALSE, scale =
apply(abs(mask_features, 2, median)))

should be
scaled_features <- scale(mask_features, center = FALSE, scale =
apply(abs(mask_features), 2, median))
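
As a rough sketch of how the pieces could fit together, here is the corrected scaling followed by an lda() call from MASS. The iris data stands in for the poster's files, which I don't have, so the object names and the choice of grouping variable are assumptions for illustration only.

```r
library(MASS)

# iris stands in for the real data: 4 numeric features, 1 class label.
features <- as.matrix(iris[, 1:4])
classes  <- iris$Species

# Scale every column by the median of its absolute values (no centering).
scaled <- scale(features, center = FALSE,
                scale = apply(abs(features), 2, median))

# Matrix interface of lda(): predictors in x, class labels in grouping.
fit  <- lda(x = scaled, grouping = classes)
pred <- predict(fit, scaled)$class

# Proportion of training observations that are classified correctly.
accuracy <- mean(pred == classes)
accuracy
```

Training accuracy is optimistic by construction; lda() also has a CV = TRUE argument for leave-one-out predictions.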


On Wed, May 26, 2010 at 5:55 PM, cobbler_squad la.f...@gmail.com wrote:


 Dear R gurus,

 Thank you all for continuous support and guidance -- learning without you
 would not be efficient.

 I have a question regarding LD analysis and how to best code it up in R.

 I have a file of (V52 and 671 time points across all columns) and another
 file of phonetic features (each vowel is aligned with a distinct binary
 sequence, i.e.
 E 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 and so on). I need to
 run lda (at first for one of the features, meaning one column only
 extracted
 from the binary file mentioned above). In code so far I have very little,
 but here the short examples of both files:
 V57 file:

  V27   V28   V29   V30   V31   V32
 V33   V34
 1   -2.515000e-03 -0.203858  6.531000e-03  0.248686  6.76e-04  0.084677
 -1.262000e-03
 2   -2.406000e-03 -0.194943  6.248000e-03  0.237851  6.47e-04  0.081001
 -1.207000e-03
 3   -4.86e-04 -0.039288  1.263000e-03  0.047980  1.30e-04  0.016292
 -2.43e-04

 and binary file

   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26
 1  E  0  0  0  0  0  0  0  0   0   0   0   0   1   1   0   0   0   1   0   0   0   0   0   0   0
 2  o  0  0  0  0  0  0  0  0   0   0   0   0   1   0   0   1   0   1   0   1   0   1   0   0   0
 3  I  0  0  0  0  0  0  0  0   0   0   0   0   1   1   0   0   1   0   0   0   0   0   0   0   0

 thus in code I have the following:

 library(MASS)

 vowel_features <- read.table(file = "mappings_for_vowels.txt")
 mask_features <- read.table(file = "3dmaskdump_ICA_37_Combined.txt")

 #scale the mask_features file

 scaled_features <- scale(mask_features, center = FALSE, scale =
 apply(abs(mask_features, 2, median)))

 #input vowel feature, lda

 lda(ROI_values ~ mappings_for_vowels[15]...)

 not sure what is the correct approach to use for lda

 any pointers would be greatly appreciated

 thanks again all!

 Cobbler

 --
 View this message in context:
 http://r.789695.n4.nabble.com/Linear-Discriminant-Analysis-in-R-tp2231922p2231922.html
 Sent from the R help mailing list archive at Nabble.com.







Re: [R] More efficient way to use ifelse()? - A follow up

2010-05-26 Thread Joris Meys
Remove the "" around the options. You also have to put it in a sapply, as
switch only works on single values. But I wouldn't call this optimal...

elevation.DM <- sapply(Population, switch, CO = 2169, CN = 1121,
 Ga = 500, KO = 2500, Mw = 625, Ng = 300)
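
A lookup in a named vector is one alternative that avoids both ifelse() and switch(). The sketch below uses a made-up Population vector and checks that it gives the same values as the sapply() call; the object names elev and Population are assumptions for illustration.

```r
# Elevation per population code, stored once as a named vector.
elev <- c(CO = 2169, CN = 1121, Ga = 500, KO = 2500, Mw = 625, Ng = 300)

# Hypothetical input: one population code per observation.
Population <- c("Ga", "CO", "Ng", "CO", "Mw")

# Index by name; unname() drops the labels carried over from the lookup.
elevation.lookup <- unname(elev[Population])

# Same result via sapply() and switch(), for comparison.
elevation.switch <- unname(sapply(Population, switch, CO = 2169, CN = 1121,
                                  Ga = 500, KO = 2500, Mw = 625, Ng = 300))

elevation.lookup
```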

Cheers
Joris


On Wed, May 26, 2010 at 9:04 PM, Ian Dworkin idwor...@msu.edu wrote:


 # Dennis Murphy suggested switch.. I have not gotten it working yet..

 elevation.DM <- switch(Population, "CO" = 2169, "CN" = 1121, "Ga" =
 500, "KO" = 2500, "Mw" = 625, "Ng" = 300 )







Re: [R] Fill a matrix using logical arguments?

2010-05-26 Thread Joris Meys
Hi Alistair,

?match will help, but you need to extract the site names first. Quick and
dirty :

seed <- substr(rownames(bank.plot),1,5)
site <- rownames(site.veg)

for(i in 1:length(seed)){
  bank.plot[i,] <- site.veg[match(seed[i],site),]
}
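
Since match() is vectorised, the loop can also be replaced by a single row-indexing step. A self-contained sketch using the example matrices from the question (the vector name plots is mine):

```r
# Presence/absence of 3 species at 3 sites, as in the question.
site.veg <- matrix(c(1, 0, 1, 1, 0, 1, 0, 1, 1), 3, 3,
                   dimnames = list(c("AB 01", "AB 02", "AB 03"), NULL))

# Seed bank sample names; the first 5 characters identify the site.
plots <- c("AB 01 01", "AB 01 02", "AB 02 01", "AB 03 01", "AB 03 02")

# One site row per sample, duplicated as often as needed.
idx       <- match(substr(plots, 1, 5), rownames(site.veg))
bank.plot <- site.veg[idx, , drop = FALSE]
rownames(bank.plot) <- plots

bank.plot
```

Any sample whose site is missing from site.veg shows up as NA in idx, which is worth checking with anyNA(idx) before indexing.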

Cheers
Joris
On Wed, May 26, 2010 at 2:50 PM, Alistair Auffret 
alistair.auff...@natgeo.su.se wrote:

 Hello all,
 I am going slightly mad trying to create a table for running
 co-correspondence analysis.

 What I have is seed bank and vegetation data, and my aim  is to see if the
 vegetation found in a site (containing several seed bank samples) can
 predict the composition of a seed bank sample within that site. So for this
 I need two tables with matching rows.

 I have created an empty matrix, where the rows correspond to the seed bank
 samples

 bank.plot<-matrix(,5,3,dimnames=list(c("AB 01 01", "AB 01 02", "AB 02 01",
 "AB 03 01","AB 03 02"),c(1:3)))
 bank.plot

 And I have a matrix where I have presence/absence of species in the
 vegetation at each site.

 site.veg<-matrix((c(1,0,1,1,0,1,0,1,1)),3,3,dimnames=list(c("AB 01", "AB 02",
 "AB 03")))
 site.veg

 Is there a way to fill the bank.plot matrix with the results from the
 vegetation survey, duplicating them appropriately to match sites to plots,
 even when the number or sites per plot are unequal? i.e. in my example, the
 row AB 01 in site.veg would be duplicatied for the first two rows, AB 02
 only once, and AB 03 twice.

 Hope you can help!

 Many thanks.


 --

 Alistair Auffret
 PhD Student

 Department of Physical Geography and Quaternary Geology
 Stockholm University
 106 91 Stockholm
 Sweden

 +46(0)8 674 7568
 +46(0)76 7158975







Re: [R] Need Help! Poor performance about randomForest for large data

2010-05-25 Thread Joris Meys
Hi Jia,

without seeing the actual data, it's difficult to give solid options. But
it's quite normal that this runs for hours : it has to make a whole lot of
decisions, and it can grow tremendously large trees with that amount of data.
The error is also quite logical : you just can't store all those huge trees.

Try to set the following options in randomForest :
mtry : number of variables tried at each split. A smaller number speeds
things up, but this effect will not be too big.
nodesize : this is the minimum node size. By default, it is 1 for
classification, meaning that you build a tree until every observation is in
a separate leaf. In your case, this should be set way higher.
maxnodes : this is the maximum number of nodes. Again, with the amount of
data you have, this number skyrockets and thus produces huge trees (you
can have more than 200.000 nodes... ). No need to do that, so you should set
it to a reasonably low amount.

Try this for example :
res <- randomForest(x=sdata1,y=sdata2,ntree=500,
mtry=5, nodesize=100,maxnodes=60)

These trees assume that the minimum size of a group with similar
observations is 100. Sounds reasonable, it still gives you over 2800 groups
for a full tree. I chose the maximum number of nodes to allow every
variable to occur once in the tree, although it doesn't have to be this way.
If you still get errors, play a bit more with those numbers.

Actually, you should do that anyway, regardless of memory and computation
time. randomForest is known to have the danger of overfitting. Restricting
the tree size avoids this and gives you a more general fit.
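
A small sketch of these options on the iris data, which only stands in for the real 286,730-row data set, so the particular numbers are arbitrary. One detail worth checking in your own call: the argument is spelled ntree. As far as I can tell a misspelled ntrees is swallowed by the ... argument and the default of 500 trees is grown anyway, which the last two lines try to show.

```r
library(randomForest)   # CRAN package, assumed to be installed

set.seed(1)
x <- iris[, -5]
y <- iris$Species       # a factor, so this is classification

fit <- randomForest(x = x, y = y,
                    ntree    = 50,   # number of trees ('ntree', not 'ntrees')
                    mtry     = 2,    # variables tried at each split
                    nodesize = 10,   # minimum size of terminal nodes
                    maxnodes = 8)    # cap on terminal nodes per tree
fit$ntree

# Misspelled argument: not matched to 'ntree', so the default is used.
fit_typo <- randomForest(x = x, y = y, ntrees = 10)
fit_typo$ntree
```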

Cheers
Joris

On Tue, May 25, 2010 at 11:51 AM, Jia ZJ Zou jia...@cn.ibm.com wrote:

 Hi, dears,

 I am processing some data with 60 columns, and 286,730 rows.
 Most columns are numerical value, and some columns are categorical value.

 It turns out that: when ntree sets to the default value (500), it says can
 not allocate a vector of 1.1 GB size; And when I set ntree to be a very
 small number like 10, it will run for hours.
 I use the (x,y) rather than the (formula,data).

 My code:

  sdata<-read.csv("D://zSignal Dump////.csv")
  sdata1<-subset(sdata,select=-38)
  sdata2<-subset(sdata,select=38)
  res<-randomForest(x=sdata1,y=sdata2,ntrees=10)


 Am I doing anything wrong? Or do you have other suggestions? Are there any
 other packages to do the same thing?
 I will appreciate if anyone can help me out, thanks!


 Thanks and Best regards,
 
 Jia, Zou (邹嘉), Ph.D.
 IBM Research -- China
 Diamond Building, #19 Zhongguancun Software Park, 8 Dongbeiwang West Road,
 Haidian District, Beijing 100193, P.R. China
 Tel: +86 (10) 58748518
 E-mail: jia...@cn.ibm.com






Re: [R] R eat my data

2010-05-25 Thread Joris Meys
Without any clue about your data file this is definitely unsolvable. But
some things to consider : Where is the dataset coming from? Did you check
for special characters? Is there an apostrophe somewhere in a string? (That
messed up things for me once.) Is the delimiter placed correctly everywhere?

Did you check what the data frame looks like? If you see which is the last
observation read in, you can jump to that line number in the txt file and
check yourself what goes wrong.
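
To illustrate the apostrophe and comment problems on a file you can see, here is a sketch with made-up tab-delimited text (the content of txt is invented; the real file will differ). With the read.table defaults, a line starting with # is skipped and an apostrophe opens a quoted string that runs on until the next apostrophe, possibly several lines later. Switching both off with quote="" and comment.char="" reads one row per line.

```r
# Made-up tab-delimited content: one line starts with '#',
# two fields contain an apostrophe.
txt <- paste(
  "id1\tgene a",
  "#id2\tgene b",
  "id3\t5' region",
  "id4\tgene d",
  "id5\t3' region",
  "id6\tgene f",
  sep = "\n"
)

# Defaults: '#' starts a comment and ' opens a quoted string.
default_read <- read.table(text = txt, sep = "\t", header = FALSE, fill = TRUE)

# Quoting and comments switched off: every line becomes one row.
plain_read <- read.table(text = txt, sep = "\t", header = FALSE, fill = TRUE,
                         quote = "", comment.char = "")

nrow(default_read)
nrow(plain_read)
```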


On Tue, May 25, 2010 at 6:15 PM, Changbin Du changb...@gmail.com wrote:

 c...@nuuk:~/operon$ grep '^#' id_name_gh5.txt
 c...@nuuk:~/operon$

 no lines starts with #



 On Tue, May 25, 2010 at 9:11 AM, Barry Rowlingson 
 b.rowling...@lancaster.ac.uk wrote:

  On Tue, May 25, 2010 at 4:42 PM, Changbin Du changb...@gmail.com
 wrote:
   HI, Dear R community,
  
   My original file has 1932 lines, but when I read into R, it changed to
  1068
   lines, how comes?
  
  
   c...@nuuk:~/operon$ wc -l id_name_gh5.txt
   1932 id_name_gh5.txt
  
  
   > gene_name<-read.table("/home/cdu/operon/id_name_gh5.txt", sep="\t",
   skip=0, header=F, fill=T)
   > dim(gene_name)
   [1] 1068    3
  
  
 
   Do any of your lines start with a #?
 
   > read.table("test.txt",sep="\t")
         V1
   1 line 1
   2 line 2
   3 line 3
   4 line 4

   > read.table("test.txt",comment.char="",sep="\t")
                  V1
   1          line 1
   2      #commented
   3          line 2
   4          line 3
   5 #nother comment
   6          line 4
 
   just a guess. hard to tell without the file...
 
  Barry
 



 --
 Sincerely,
 Changbin
 --

 Changbin Du
 DOE Joint Genome Institute
 Bldg 400 Rm 457
 2800 Mitchell Dr
 Walnut Creet, CA 94598
 Phone: 925-927-2856






