Re: [R] plot.hclust point to older version

2014-11-26 Thread Martin Maechler

 Thanks! That worked

Of course: As in about 99.99% of all cases where Bill Dunlap  helps.


 You probably have a local copy of an old version of plot.hclust or 
 plot.dendrogram in your global environment, or in another package, that masks 
 the one in package:stats.  E.g., I fired up R-2.14.2, copied those 2 plot 
 methods to .GlobalEnv, and saved my workspace when quitting R.  I then 
 fired up R-3.1.1, which loaded the workspace saved by the older version of R.  
 I get:

  > objects()
  [1] "plot.dendrogram" "plot.hclust"
  > plot(hclust(dist(c(2,3,5,7,11,13,17,19))))
  Error in .Internal(dend.window(n, merge, height, hang, labels, ...)) :
    there is no .Internal function 'dend.window'
  > traceback()
  2: plot.hclust(hclust(dist(c(2, 3, 5, 7, 11, 13, 17, 19))))
  1: plot(hclust(dist(c(2, 3, 5, 7, 11, 13, 17, 19))))

 Note how calling traceback() after an error gives more information about the 
 source of the error.

 To fix this, get rid of the .RData file that is being loaded when R starts.
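
A minimal sketch of how one might confirm this kind of masking in a running session before deleting anything. It uses only base R (assign, exists, rm); the stand-in function body is hypothetical and merely simulates a stale copy restored from an old workspace.

```r
# Stand-in for a stale method restored from an old .RData
# (hypothetical body; a real stale copy would hold the old R-2.x source).
assign("plot.hclust",
       function(x, ...) stop("stale copy restored from an old workspace"),
       envir = globalenv())

# inherits = FALSE restricts the lookup to the global environment itself,
# so TRUE here means a workspace copy is shadowing the method in stats.
masked <- exists("plot.hclust", envir = globalenv(), inherits = FALSE)

# Remove the stale copy; S3 dispatch then falls back to the registered method.
if (masked) rm(list = "plot.hclust", envir = globalenv())
still_masked <- exists("plot.hclust", envir = globalenv(), inherits = FALSE)
```

Removing the object only cleans the current session; the .RData file still has to be deleted (or R started with --no-restore) to keep the copy from coming back.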

In the spirit of the old -- now politically incorrect -- sayings
 ``Real men don't ...''
I'd like to emphasize my own view that
 Real useRs don't use .RData

in other words, experienced R users do not let their workspace
be saved automatically (to '.RData') and hence do not load any
.RData automatically at startup.

Consequently, use R with the '--no-save' command line argument
(maybe also with '--no-restore').

ESS (Emacs Speaks Statistics) users can put

(custom-set-variables
 '(inferior-R-args "--no-restore-history --no-save ")
)

into their ~/.emacs
{and I'd like to see a way to do this easily with RStudio...}

Martin Maechler,
ETH Zurich and R Core Team 

 Bill Dunlap
 TIBCO Software
 wdunlap tibco.com

 On Tue, Nov 25, 2014 at 12:18 PM, Rolf Turner <r.tur...@auckland.ac.nz> wrote:
 On 26/11/14 08:53, Michael Mason wrote:
 Here you are. I expect most folks won't get the error.

 N   = 100; M = 1000
 mat = matrix(1:(N*M) + rnorm(N*M,0,.5),N,M)
 h   = hclust(as.dist(1-cor(mat)))
 plot(h)

 Error in .Internal(dend.window(n, merge, height2, hang, labels, ...)) :
there is no .Internal function 'dend.window'



 Thanks again


 On 11/25/14 11:29 AM, Rolf Turner <r.tur...@auckland.ac.nz> wrote:



 Reproducible example???

 (I know from noddink about hclust, but I tried the example from the help
 page and it plotted without any problem.)

 cheers,

 Rolf Turner

 On 26/11/14 06:13, Michael Mason wrote:
 Hello fellow R users,

 I have recently updated to R 3.1.2. When trying to plot an hclust
 object to generate the dendrogram I get the following error:

 Error in .Internal(dend.window(n, merge, height2, hang, labels, ...)) :
 there is no .Internal function 'dend.window'


 I am indeed using R3.1.2 but my understanding is that the .Internal API
 to the C code is no longer used. I have tried detaching the stats
 package and restarting R to no avail.
 I would love any help from any wiser guRus.

 Please keep communications on-list; there are others on the list far more 
 likely to be able to help you than I.  I am cc-ing this reply to the list.

 For what it's worth, I can run your example without error.

 As to how to track down what is going wrong on your system, I'm afraid I have 
 no idea.  Someone on the list may have some thoughts.

 cheers,

 Rolf Turner

 --
 Rolf Turner
 Technical Editor ANZJS

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 


Re: [R] plot.hclust point to older version

2014-11-26 Thread Pascal Oettli
 into their ~/.emacs
 {and I'd like to see a way to do this easily with RStudio...}


In RStudio:

Tools -> Global Options -> General: uncheck "Restore .RData into
workspace at startup" and choose "Never" for "Save workspace to .RData
on exit".


-- 
Pascal Oettli
Project Scientist
JAMSTEC
Yokohama, Japan



Re: [R] plot.hclust point to older version

2014-11-26 Thread PIKAL Petr
Hi

You say

 in other words, experienced R users do not let their workspace be saved
 automatically (to '.RData') and hence do not load any .RData
 automatically at startup.

I have saved and loaded .RData for years without any issues (except for 
packages not being installed when working on different PCs). I usually keep 
each project in a separate .RData (and a separate folder, together with 
everything belonging to that project), which prevents things from getting 
mixed up.

There is no warning such as "do not use .RData" in the books I have available.

I wonder how experienced useRs keep track of several projects without 
loading .RData at startup?
What would you recommend for keeping track of commands and created objects 
instead of .RData?

Petr
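
One common alternative, offered here only as a sketch and not something stated in the thread: keep the commands in a script file per project and save selected objects explicitly with saveRDS(), so that nothing is restored silently at startup. The object and file names below are hypothetical.

```r
# Hypothetical result that is expensive to recompute.
fit_summary <- list(n = 100L, mean_height = 1.73)

# Save exactly one named object to an explicit file (a temp file here).
path <- file.path(tempdir(), "fit_summary.rds")
saveRDS(fit_summary, path)

# In a later session, load it under a name chosen at the call site.
restored <- readRDS(path)
```

Because readRDS() returns the object instead of dropping it into the workspace, it cannot mask anything by accident.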



Re: [R] Presentation tables in R (knitr)

2014-11-26 Thread Franzini, Gabriele [Nervianoms]

I also found knitr + HTML + the ReporteRs package to be a good combination,
and less intimidating than LaTeX. Have a look at its FlexTable tool.

HTH,
Gabriele  


-Original Message-
From: Tom Wright [mailto:t...@maladmin.com] 
Sent: Tuesday, November 25, 2014 9:12 PM
To: r-help@r-project.org
Subject: [R] Presentation tables in R (knitr)

Hi,
This problem has me stumped so I thought I'd ask the experts. I'm trying
to create a pretty summary table of some data (which patients have had
what tests at what times). Ideally I'd like to knitr this into a pretty
PDF for presentation.
If anyone has pointers I'll be grateful.

require(tables)
require(reshape2)

data <- data.frame('ID'=paste0('pat',c(rep(1,8),rep(2,8))),
 'Time'=c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),
 'Eye'=rep(c('OS','OS','OD','OD'),4),
 'Measure'=rep(c('Height','Weight'),8))

tabular(Measure~factor(ID)*factor(Time)*factor(Eye),data)
#All levels of Time are repeated for all IDs, I'd prefer to just show
#the relevant times.

tabular(Measure~factor(ID)*Time*factor(Eye),data)
#Time is getting collapsed by ID

data$value=1
dcast(data,Measure~ID+Time+Eye)
#close but not very pretty
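
A base-R sketch of one way to show only the ID/Time/Eye combinations that actually occur, using interaction() with drop = TRUE. It does not produce a polished PDF table by itself; it only builds the counts, which could then be handed to a table-formatting tool. The names dat, combo and tab are illustrative.

```r
# Same example data as in the post.
dat <- data.frame(ID      = paste0("pat", c(rep(1, 8), rep(2, 8))),
                  Time    = c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),
                  Eye     = rep(c("OS", "OS", "OD", "OD"), 4),
                  Measure = rep(c("Height", "Weight"), 8))

# drop = TRUE keeps only the ID/Time/Eye combinations present in the data.
combo <- interaction(dat$ID, dat$Time, dat$Eye, drop = TRUE)

# Count tests per measure and observed combination.
tab <- table(Measure = dat$Measure, Visit = combo)
print(tab)
```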



Re: [R] Error Missing values where true/false needed

2014-11-26 Thread Michael Dewey

Comments in-line below

On 26/11/2014 06:27, Frederic Ntirenganya wrote:

Hi PIKAL,


Actually I am Michael, Petr is one of the other respondents.


The error seems strange to me because I access the indices of the NAs.
Indices can't be non-applicable.


But you are not testing the indexes, see below


This is the output of the indices of the NAs in my dataset. My dataset is
very big; that's why I did not provide it.

> indicNAs <- which(data$Rain %in% NA)
> indicNAs
 [1]   426   792  1158  1890  2256  2622  3354  3720  4086  4818  5184  5550  6282  6648  7014  7746  8112
[18]  8478  9210  9576  9942 10674 11040 11406 12138 12504 12870 13602 13968 14334 15066 15432 15798 16530
[35] 16896 17262 17994 18360 18726 19458 19824 20190

Regards,
Frederic.

Frederic Ntirenganya
Maseno University,
African Maths Initiative,
Kenya.
Mobile:(+254)718492836
Email: fr...@aims.ac.za
https://sites.google.com/a/aims.ac.za/fredo/

On Tue, Nov 25, 2014 at 3:51 PM, Michael Dewey <i...@aghmed.fsnet.co.uk> wrote:

You do not tell us what you are trying to do but I think there is
something wrong in the logic of your thinking as on the one hand you
are selecting just precisely those elements of data$Rain which are
NA and then testing whether any of them equals 60.



My comments on your code are preceded by ## to make them clear




On 25/11/2014 12:19, Frederic Ntirenganya wrote:

Dear All,

I am getting this error and don't know why it comes. can you
please help ?

Error in if (data$Rain[i_NA] == 60) { :
missing value where TRUE/FALSE needed

The loop is :

indicNAs <- which(data$Rain %in% NA)

## so at this point indicNAs is the indexes of all the NA
## values in data$Rain

ind_nonleap = c() # NAs due to non leap years
ind_nonrecord = c() # NAs due to non recording values
for (i_NA in indicNAs ){ ## step through those indexes
  if(data$Rain[i_NA] == 60){

## since i_NA is the index of a value of data$Rain which
## you know to be NA this evaluates to NA and if() complains
## I expect you really meant some other variable in data
## incidentally it is better not to call your data data

    ind_nonleap <- append(ind_nonleap,i_NA)
  }
  else {
    ind_nonrecord <- append(ind_nonrecord,i_NA)
  }
 #cat(ind_nonrecord)
 #cat( ind_nonleap)
}
ind_nonleap
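
The point made in the ## comments can be sketched on a toy vector. The data and the column name day_of_year are hypothetical; which other variable the poster really meant to test is not stated in the thread.

```r
# Hypothetical stand-in data; `day_of_year` is an assumed column name.
rain        <- c(1.2, NA, 0.0, NA)
day_of_year <- c(59, 60, 61, 200)

na_idx <- which(is.na(rain))          # positions of the missing values

# Comparing the missing values themselves always yields NA,
# which is what if() rejects ...
cmp <- rain[na_idx] == 60

# ... so test a different variable at those positions instead.
ind_nonleap   <- na_idx[day_of_year[na_idx] == 60]
ind_nonrecord <- na_idx[day_of_year[na_idx] != 60]
```

The vectorised subsetting also replaces the explicit loop and the repeated append() calls.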



--
Michael
http://www.dewey.myzen.co.uk



[R] ggplot facet and subsetting

2014-11-26 Thread PIKAL Petr
Dear all

I encountered strange behaviour of ggplot with a combination of faceting and 
subsetting. I sometimes use a for loop to create plots, something like this

for (i in n:m) { p <- ggplot(data, aes(x=x, y=data[,i], colour=f)) ... }

However, I found a strange result with this combination.

This is OK but only in BW
p <- ggplot(vec.c, aes(x=fi, y=nad1mi))
p+geom_point(size=5)+geom_line()+facet_grid(.~stroj)

this is OK with colour
p <- ggplot(vec.c, aes(x=fi, y=nad1mi, colour=as.factor(cas)))
p+geom_point(size=5)+geom_line()+facet_grid(.~stroj)

Here results in facets are mismatched
p <- ggplot(vec.c, aes(x=fi, y=vec.c[,2], colour=as.factor(cas)))
p+geom_point(size=5)+geom_line()+facet_grid(.~stroj)

and this is mismatched too
p <- ggplot(vec.c, aes(x=fi, y=vec.c[,2]))
p+geom_point(size=5)+geom_line()+facet_grid(.~stroj)

Does anybody know what I am doing wrong?

> dput(vec.c)
structure(list(cas = c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
0L, 1L, 2L, 0L, 1L, 2L, 0L, 1L, 2L, 0L, 1L, 2L, 0L, 1L, 2L, 0L,
1L, 2L, 0L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L),
nad1mi = c(3, 2.7, 0.3, 0.5, 1.9, 5.3, 0.4, 3, 5.4, 0.7,
20.6, 16.7, 16.6, 20.7, 16.1, 15.2, 20.5, 16.4, 14.8, 24.6,
19.3, 15.2, 26.9, 21.3, 20.6, 22.6, 16.3, 15.7, 19.3, 16.5,
15.5, 3.6, 3.4, 5.9, 4.6, 5.4, 4.2, 5.3, 5.6, 5.1, 5), stroj = 
structure(c(3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("mastersizer",
"odstredivka", "zetasizer"), class = "factor"), fi = c(341L,
341L, 285L, 285L, 401L, 401L, 231L, 231L, 190L, 190L, 341L,
341L, 341L, 285L, 285L, 285L, 401L, 401L, 401L, 231L, 231L,
231L, 190L, 190L, 190L, 167L, 167L, 167L, 161L, 161L, 161L,
341L, 341L, 285L, 285L, 401L, 401L, 231L, 231L, 190L, 190L
)), .Names = c("cas", "nad1mi", "stroj", "fi"), class = "data.frame", 
row.names = c(1L,
2L, 6L, 7L, 11L, 12L, 16L, 17L, 21L, 22L, 26L, 27L, 28L, 32L,
33L, 34L, 38L, 39L, 40L, 44L, 45L, 46L, 50L, 51L, 52L, 56L, 57L,
58L, 62L, 63L, 64L, 68L, 69L, 73L, 74L, 78L, 79L, 83L, 84L, 88L,
89L))
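
A possible explanation, not confirmed by anyone in the thread: a vector such as vec.c[,2] inside aes() is evaluated outside the data frame, so ggplot2 cannot keep it aligned with the rows once the data are split into facets. Referring to the column by name avoids that. The sketch below uses the .data pronoun, which assumes a recent ggplot2 (with ggplot2 1.0.0, as in this post, aes_string() played that role); the small data frame is a hypothetical subset of vec.c.

```r
# Hypothetical four-row subset of vec.c.
vec <- data.frame(fi     = c(341, 285, 341, 285),
                  nad1mi = c(3, 0.3, 20.6, 20.7),
                  stroj  = factor(c("zetasizer", "zetasizer",
                                    "mastersizer", "mastersizer")))

# Columns to plot on the y axis, selected by name instead of by position.
y_cols <- setdiff(names(vec), c("fi", "stroj"))

plots <- list()
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  for (col in y_cols) {
    # .data[[col]] is looked up inside the data, so it follows the faceting.
    plots[[col]] <- ggplot(vec, aes(x = fi, y = .data[[col]])) +
      geom_point(size = 5) + geom_line() + facet_grid(. ~ stroj)
  }
}
```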


Regards
Petr

> sessionInfo(package = NULL)
R Under development (unstable) (2014-07-16 r66175)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=Czech_Czech Republic.1250  LC_CTYPE=Czech_Czech Republic.1250
[3] LC_MONETARY=Czech_Czech Republic.1250 LC_NUMERIC=C
[5] LC_TIME=Czech_Czech Republic.1250

attached base packages:
[1] stats datasets  utils grDevices graphics  methods   base

other attached packages:
[1] ggplot2_1.0.0   lattice_0.20-29 fun_1.0

loaded via a namespace (and not attached):
 [1] colorspace_1.2-4 digest_0.6.4     grid_3.2.0       gtable_0.1.2
 [5] labeling_0.2     MASS_7.3-33      munsell_0.4.2    plyr_1.8.1
 [9] proto_0.3-10     Rcpp_0.11.2      reshape2_1.4     scales_0.2.4
[13] stringr_0.6.2    tools_3.2.0




Re: [R] Checking the proportional odds assumption holds in an ordinal logistic regression using polr function

2014-11-26 Thread Rune Haubo
Dear Charlie,

I admit that I haven't read your email closely, but here is a way to
test for non-proportional odds using the ordinal package (warning:
self-promotion) using the wine data set also from the ordinal package.
There is more information in the package vignettes

Hope this is something you can use.
Cheers,
Rune

> library(ordinal)
> ## Fit model:
> fm <- clm(rating ~ temp + contact, data=wine)
> summary(fm)
formula: rating ~ temp + contact
data:    wine

 link  threshold nobs logLik AIC    niter max.grad cond.H
 logit flexible  72   -86.49 184.98 6(0)  4.64e-15 2.7e+01

Coefficients:
           Estimate Std. Error z value Pr(>|z|)
tempwarm     2.5031     0.5287   4.735 2.19e-06 ***
contactyes   1.5278     0.4766   3.205  0.00135 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Threshold coefficients:
    Estimate Std. Error z value
1|2  -1.3444     0.5171  -2.600
2|3   1.2508     0.4379   2.857
3|4   3.4669     0.5978   5.800
4|5   5.0064     0.7309   6.850
> ## Model with non-proportional odds for contact:
> fm2 <- clm(rating ~ temp, nominal=~contact, data=wine)
> ## Likelihood ratio test of non-proportional odds:
> anova(fm, fm2)
Likelihood ratio tests of cumulative link models:

    formula:                nominal: link: threshold:
fm  rating ~ temp + contact ~1       logit flexible
fm2 rating ~ temp           ~contact logit flexible

    no.par    AIC  logLik LR.stat df Pr(>Chisq)
fm       6 184.98 -86.492
fm2      9 190.42 -86.209  0.5667  3      0.904
> ## Automatic tests of non-proportional odds for all variables:
> nominal_test(fm)
Tests of nominal effects

formula: rating ~ temp + contact
        Df  logLik    AIC    LRT Pr(>Chi)
<none>     -86.492 184.98
temp     3 -84.904 187.81 3.1750   0.3654
contact  3 -86.209 190.42 0.5667   0.9040

On 25 November 2014 at 17:21, Charlotte Whitham
<charlotte.whit...@gmail.com> wrote:
 Dear list,

 I have used the ‘polr’ function in the MASS package to run an ordinal 
 logistic regression for an ordinal categorical response variable with 15 
 continuous explanatory variables.
 I have used the code (shown below) to check that my model meets the 
 proportional odds assumption following advice provided at 
 (http://www.ats.ucla.edu/stat/r/dae/ologit.htm) – which has been extremely 
 helpful, thank you to the authors! However, I’m a little worried about the 
 output implying that not only are the coefficients across various cutpoints 
 similar, but they are exactly the same (see graphic below).

 Here is the code I used (and see attached for the output graphic)

 FGV1b <- data.frame(FG1_val_cat=factor(FGV1b[,"FG1_val_cat"]),scale(FGV1[,c("X","Y","Slope","Ele","Aspect","Prox_to_for_FG","Prox_to_for_mL","Prox_to_nat_border","Prox_to_village","Prox_to_roads","Prox_to_rivers","Prox_to_waterFG","Prox_to_watermL","Prox_to_core","Prox_to_NR","PCA1","PCA2","PCA3")]))

 b <- polr(FGV1b$FG1_val_cat ~ FGV1b$X + FGV1b$Y + FGV1b$Slope + FGV1b$Ele + 
 FGV1b$Aspect + FGV1b$Prox_to_for_FG + FGV1b$Prox_to_for_mL + 
 FGV1b$Prox_to_nat_border + FGV1b$Prox_to_village + FGV1b$Prox_to_roads + 
 FGV1b$Prox_to_rivers + FGV1b$Prox_to_waterFG + FGV1b$Prox_to_watermL + 
 FGV1b$Prox_to_core + FGV1b$Prox_to_NR, data = FGV1b, Hess=TRUE)

 #Checking the assumption. So the following code will estimate the values to 
 #be graphed. First it shows us the logit transformations of the probabilities 
 #of being greater than or equal to each value of the target variable

 FGV1b$FG1_val_cat <- as.numeric(FGV1b$FG1_val_cat)

 sf <- function(y) {
   c('VC>=1' = qlogis(mean(FGV1b$FG1_val_cat >= 1)),
     'VC>=2' = qlogis(mean(FGV1b$FG1_val_cat >= 2)),
     'VC>=3' = qlogis(mean(FGV1b$FG1_val_cat >= 3)),
     'VC>=4' = qlogis(mean(FGV1b$FG1_val_cat >= 4)),
     'VC>=5' = qlogis(mean(FGV1b$FG1_val_cat >= 5)),
     'VC>=6' = qlogis(mean(FGV1b$FG1_val_cat >= 6)),
     'VC>=7' = qlogis(mean(FGV1b$FG1_val_cat >= 7)),
     'VC>=8' = qlogis(mean(FGV1b$FG1_val_cat >= 8)))
 }

 (t <- with(FGV1b, summary(as.numeric(FGV1b$FG1_val_cat) ~ FGV1b$X + FGV1b$Y 
 + FGV1b$Slope + FGV1b$Ele + FGV1b$Aspect + FGV1b$Prox_to_for_FG + 
 FGV1b$Prox_to_for_mL + FGV1b$Prox_to_nat_border + FGV1b$Prox_to_village + 
 FGV1b$Prox_to_roads + FGV1b$Prox_to_rivers + FGV1b$Prox_to_waterFG + 
 FGV1b$Prox_to_watermL + FGV1b$Prox_to_core + FGV1b$Prox_to_NR, fun=sf)))



 #The table displays the (linear) predicted values we would get if we 
 #regressed our dependent variable on our predictor variables one at a time, 
 #without the parallel slopes assumption. So now, we can run a series of 
 #binary logistic regressions with varying cutpoints on the dependent 
 #variable to check the equality of coefficients across cutpoints

 par(mfrow=c(1,1))

 plot(t, which=1:8, pch=1:8, xlab='logit', main=' ', xlim=range(s[,7:8]))
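
 One possible reason for the identical values, not confirmed in the thread: the function sf above never uses its argument y, so every group receives the statistic of the full column. The base-R sketch below reproduces that effect on hypothetical two-group data; sf_global and sf_local are illustrative names.

```r
# Hypothetical data: 40% of group 1 and 70% of group 2 have y >= 2.
y <- c(rep(1, 60), rep(2, 40), rep(1, 30), rep(2, 70))
g <- gl(2, 100)

# Ignores its argument and always uses the full vector `y`.
sf_global <- function(v) qlogis(mean(y >= 2))

# Uses the subset that is passed in.
sf_local <- function(v) qlogis(mean(v >= 2))

same   <- tapply(y, g, sf_global)   # identical value in both groups
differ <- tapply(y, g, sf_local)    # one value per group
```

 If this is the cause, writing the body of sf in terms of y instead of FGV1b$FG1_val_cat would be the corresponding change.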



 Apologies that I am no statistics expert and perhaps I am missing something 
 obvious here. However, I have spent a long time trying to figure out if there 
 is a problem in how I tested the model assumption and also trying to figure 
 out other ways to run the same 

Re: [R] ggplot facet and subsetting

2014-11-26 Thread Jeff Newmiller
I am not quite sure what you want to achieve here, but you only have one factor 
column so shouldn't you be using facet_wrap(~stroj), perhaps with nrow or ncol 
parameters?
---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.


Re: [R] list.files() not compatible with all Unicode characters; file.exists() is compatible.

2014-11-26 Thread Prof Brian Ripley

On 25/11/2014 06:53, Prof Brian Ripley wrote:

On 25/11/2014 01:25, MacQueen, Don wrote:

Sorry, your email was undecipherable because you sent HTML formatted
email.
Please send plain text



Also, the 'at a minimum' information requested by the posting guide is
essential here (which OS and locale, in particular).  In general file
names not in the locale's encoding are unsupported.


An off-list reply indicated this was Windows XP.  Although the message 
body was unreadable, the gist is in the subject line.


From ?list.files under Windows

  path must specify paths which can be represented in the current
  codepage.

whereas ?file.exists says

  Most of these functions accept UTF-8 filepaths not valid in the
  current locale.

So this is documented behaviour.

[For anyone curious as to why list.files is different: note that it does 
regexp pattern matching.  Adding support for Unicode file paths would 
not be impossible but it would require hundreds of lines of Windows-only 
code.]


--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK



[R] How to use ggplot2

2014-11-26 Thread jarod...@libero.it
Dear All!!
I am trying to draw a barplot using ggplot2

> head(alt)
  as.factor.data...7..    Col ColMat  Fastq  miseq
1                  189    158    158    158    104
2                  190  54272  54272  54272  32122
3                  191 301574 301574 301574 152625
4                  192 161620 161620 161620 100469
5                  193  61263  61263  61263  38109
6                  194  83800  83800  83800  40095

p <- ggplot(data = alt, aes(y = alt[,2]))  +  geom_bar() 

Error : Mapping a variable to y and also using stat="bin".
  With stat="bin", it will attempt to set the y value to the count of cases in 
each group.
  This can result in unexpected behavior and will not be allowed in a future 
version of ggplot2.
  If you want y to represent counts of cases, use stat="bin" and don't map a 
variable to y.
  If you want y to represent values in the data, use stat="identity".
  See ?geom_bar for examples. (Defunct; last used in version 0.9.2)
How can I resolve this problem?
My data are in columns: each column is a condition and each row represents a 
sample.
Thanks for your help!
M




Re: [R] ggplot facet and subsetting

2014-11-26 Thread John Kane
Below

John Kane
Kingston ON Canada
> This is OK but only in BW
> p <- ggplot(vec.c, aes(x=fi, y=nad1mi))
> p+geom_point(size=5)+geom_line()+facet_grid(.~stroj)
Perhaps:
p <- ggplot(vec.c, aes(x=fi, y=nad1mi, colour = stroj))
p+geom_point(size=5)+geom_line()+facet_grid(.~stroj)


> and this is mismatched too
> p <- ggplot(vec.c, aes(x=fi, y=vec.c[,2]))
> p+geom_point(size=5)+geom_line()+facet_grid(.~stroj)

I don't understand what you want here, so I cannot suggest anything.




Re: [R] How to use ggplot2

2014-11-26 Thread John Kane
It is useful to have a reproducible example:
https://github.com/hadley/devtools/wiki/Reproducibility
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

However, is this something like what you want?  Note that I changed the variable
names, removed the capitals to make life easier, and renamed the dataset to dat1
(just handier for me).  I suspect that "Col" is best avoided as a column name;
it seemed to be causing a problem.

library(ggplot2)
dat1 <- structure(list(aa = 189:194, bb = c(158L, 54272L, 301574L, 161620L,
61263L, 83800L), colmat = c(158L, 54272L, 301574L, 161620L, 61263L,
83800L), fastq = c(158L, 54272L, 301574L, 161620L, 61263L, 83800L
), miseq = c(104L, 32122L, 152625L, 100469L, 38109L, 40095L)), .Names = c("aa",
"bb", "colmat", "fastq", "miseq"), class = "data.frame", row.names = c(NA,
-6L))

p1 <- ggplot(dat1, aes(as.factor(aa), y = bb, fill = as.factor(aa)))
p1 <- p1 + geom_bar(stat = "identity")
p1
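
Since the original poster has one column per condition, it may also help to
reshape the data to "long" form so that several conditions can be shown in one
bar plot. The following is only a sketch: the names `sample`, `count` and
`condition` are made up for illustration, and only two of the posted columns
are used.

```r
# Values copied from the posted data; column names are illustrative.
dat1 <- data.frame(
  sample = factor(189:194),
  colmat = c(158L, 54272L, 301574L, 161620L, 61263L, 83800L),
  miseq  = c(104L, 32122L, 152625L, 100469L, 38109L, 40095L)
)

# stack() puts the two condition columns on top of each other and records
# the column of origin in a factor.
long <- data.frame(sample = rep(dat1$sample, times = 2),
                   stack(dat1[c("colmat", "miseq")]))
names(long) <- c("sample", "count", "condition")

# Plot only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = sample, y = count, fill = condition)) +
    geom_bar(stat = "identity", position = "dodge")
  print(p)
}
```

With stat = "identity" the bar heights are the values in `count`, which is
what the error message asks for when y is mapped to a data column.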


John Kane
Kingston ON Canada




Re: [R] plot.hclust point to older version

2014-11-26 Thread William Dunlap
How disruptive would it be if R were changed so the startup line
   [Previously saved workspace restored]
were changed to show the complete name, from normalizePath(), of the
saved workspace file?  E.g.,
   [Previously saved workspace restored from 'C:\Program Files\R\.RData']

(It is bad enough that the file name starts with a dot so it is hidden from
'ls', but on Windows lots of people don't know what directory R is starting in.
On my Windows PC R-3.1.2 starts in C:/Program Files/R, the parent of its RHOME
directory.)


Bill Dunlap
TIBCO Software
wdunlap tibco.com

  On Tue, Nov 25, 2014 at 12:18 PM, Rolf Turner r.tur...@auckland.ac.nz
 mailto:r.tur...@auckland.ac.nz wrote:
  On 26/11/14 08:53, Michael Mason wrote:
  Here you are. I expect most folks won't get the error.

  N   = 100; M = 1000
  mat = matrix(1:(N*M) + rnorm(N*M,0,.5),N,M)
  h   = hclust(as.dist(1-cor(mat)))
  plot(h)

  Error in .Internal(dend.window(n, merge, height2, hang, labels, ...)) :
 there is no .Internal function 'dend.window'
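
  An error like this is what one would expect if an outdated copy of
  plot.hclust in the global environment masks the method in package:stats,
  which is the diagnosis given elsewhere in this thread. The following is a
  minimal sketch of how such copies might be found; the helper name
  masked_by_globalenv is hypothetical and not part of R.

```r
# Hypothetical helper: names of functions in 'env' that also exist in the
# namespace of 'pkg' (default: stats), i.e. candidates for masking.
masked_by_globalenv <- function(pkg = "stats", env = globalenv()) {
  nms <- ls(env, all.names = TRUE)
  is_fun <- vapply(
    nms,
    function(n) is.function(get(n, envir = env, inherits = FALSE)),
    logical(1)
  )
  nms <- nms[is_fun]
  in_pkg <- vapply(nms, exists, logical(1),
                   envir = asNamespace(pkg), inherits = FALSE)
  nms[in_pkg]
}
```

  Calling masked_by_globalenv() in the affected session should list
  plot.hclust and plot.dendrogram if they were restored from an old .RData;
  they could then be removed with rm(), and the .RData file deleted so that
  they do not come back.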



  Thanks again


  On 11/25/14 11:29 AM, Rolf Turner r.tur...@auckland.ac.nzmailto:
 r.tur...@auckland.ac.nz wrote:



  Reproducible example???

  (I know from noddink about hclust, but I tried the example from the help
  page and it plotted without any problem.)

  cheers,

  Rolf Turner

  On 26/11/14 06:13, Michael Mason wrote:
  Hello fellow R users,

  I have recently updated to R 3.1.2. When trying to plot an hclust
  object to generate the dendrogram I get the following error:

  Error in .Internal(dend.window(n, merge, height2, hang, labels, ...)) :
  there is no .Internal function 'dend.window'


  I am indeed using R3.1.2 but my understanding is that the .Internal API
  to the C code is no longer used. I have tried detaching the stats
  package and restarting R to no avail.
  I would love any help from any wiser guRus.

  Please keep communications on-list; there are others on the list far
 more likely to be able to help you than I.  I am cc-ing this reply to the
 list.

  For what it's worth, I can run your example without error.

  As to how to track down what is going wrong on your system, I'm afraid I
 have no idea.  Someone on the list may have some thoughts.

  cheers,

  Rolf Turner

  --
  Rolf Turner
  Technical Editor ANZJS


Re: [R] Checking the proportional odds assumption holds in an ordinal logistic regression using polr function

2014-11-26 Thread Rune Haubo
On 26 November 2014 at 17:55, Charlotte Whitham
charlotte.whit...@gmail.com wrote:
> Dear Rune,
>
> Thank you for your prompt reply and it looks like the ordinal package could
> be the answer I was looking for!
>
> If you don't mind, I'd also like to know please what to do if the tests show
> the proportional odds assumption is NOT met. (Unfortunately I notice effects
> from almost all variables that breach the proportional odds assumption in my
> dataset)

That depends almost entirely on the purpose of the analysis and is not
a topic fit for email - consulting a local statistician is probably
sound advice... Yet: With enough data these tests can be sensitive
beyond practical significance; if the 'proportional' part of the
effect explains the majority of the deviance, perhaps the proportional
odds model provides a reasonably good description of the main
structures in the data anyway. On the other hand, if the magnitude
(not significance!) of the non-proportional effects is large, perhaps
a cumulative link model is not the right kind of model structure and
you should be looking at alternative approaches in your analysis.

Cheers,
Rune


> Would you recommend a multinomial logistic model? Or re-scaling of the data?
>
> Thank you for your time,
> Best wishes,
>
> Charlie




Re: [R] plot.hclust point to older version

2014-11-26 Thread Jeff Newmiller
Short answer to your question is R files and original data from external 
sources.

I tend to keep my projects in separate directories. I make a core R file that I 
can run from beginning to end using source() to generate my primary analysis 
objects. I then make another file to keep my source() function call in, as well 
as a few exploratory plot commands. Recently I have been also sourcing the 
analysis script in Rmd or Rnw files to knit my observations with the output.

Some people complain that their analysis takes too long to be sourcing it all 
the time. When I have that problem I set up a variable outside my analysis 
script that I test in my analysis script. If the variable indicates it is time 
to recalculate, then I do all of that and then save the data in an rds or rda 
file. If the variable indicates that I should reuse the cached data, then it 
skips the calculations and just loads the data. This way I always load the 
right libraries along with the data, and I don't accidentally save data that I 
changed outside the analysis script... keeping my results reproducible. (Rds 
files can be convenient if I have several different slow analyses to compare 
and I want to only work on one at a time. I set up one control variable for 
each analysis.)
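
The caching approach described above could be sketched as follows. The names
recalc, cache_file and slow_analysis are placeholders chosen for this
illustration, and the "analysis" is a trivial stand-in for a slow computation.

```r
# Control variable, set outside the analysis script (placeholder name).
recalc <- TRUE

# Placeholder cache location; a project would normally use its own folder.
cache_file <- file.path(tempdir(), "slow_analysis.rds")

# Stand-in for an expensive analysis step.
slow_analysis <- function() {
  colMeans(matrix(seq_len(20), nrow = 4))
}

if (recalc || !file.exists(cache_file)) {
  # Recompute from scratch and refresh the cache.
  result <- slow_analysis()
  saveRDS(result, cache_file)
} else {
  # Reuse the cached object without recomputing.
  result <- readRDS(cache_file)
}
```

Because the cache is only ever written by the script itself, the saved object
can always be traced back to the code that created it.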

Some people (smarter than me?) like to build their analysis into an Sweave or 
knitr file. They can then strip out an analysis R file to use the way I have 
described if they choose to do so (literate programming) but I have not 
picked up that habit yet.

The key is keeping a record of how every object that is in your save file was 
originally created. If you tolerate auto saving and loading of the environment 
then you lose that record, and pernicious errors can creep into your 
environment from who knows where, and you might as well be using Excel if that 
is how you work. (Note that this means I hardly ever copy data straight from 
Excel via the clipboard as that is not reproducible. Usually this means Save As 
CSV in Excel to start my R analysis if that is the data source.)
---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

On November 26, 2014 2:05:54 AM PST, PIKAL Petr petr.pi...@precheza.cz wrote:
Hi

You say

> in other words, experienced R users do not let their workspace be saved
> automatically (to '.RData') and hence do not load any .RData
> automatically at startup.

I have saved and loaded .RData for years without any issues (except for
packages that were not installed when working on different PCs). I usually
keep each project in a separate .RData (and a separate folder, together with
everything belonging to that project), which prevents things from getting
mixed together.

None of the books I have available warns against using .RData.

I wonder how experienced useRs keep track of several projects without
loading .RData at startup?
What would you recommend for keeping track of commands and created
objects instead of .RData?

Petr



Re: [R] plot.hclust point to older version

2014-11-26 Thread David Winsemius

On Nov 26, 2014, at 9:49 AM, William Dunlap wrote:

> How disruptive would it be if R were changed so the startup line
>    [Previously saved workspace restored]
> were changed to show the complete name, from normalizePath(), of the
> saved workspace file?  E.g.,
>    [Previously saved workspace restored from 'C:\Program Files\R\.RData']
>
> (It is bad enough that the file name starts with a dot so it is hidden from
> 'ls', but on Windows lots of people don't know what directory R is starting
> in. On my Windows PC R-3.1.2 starts in C:/Program Files/R, the parent of its
> RHOME directory.)

On the Mac GUI that happens with no effort, as does a message saying where 
the GUI history file resides. I just checked my .Rprofile file to make sure it 
wasn't doing that. I also have a line that prints the date and time:

utils:::timestamp(stamp = Sys.Date() )

Couldn't you just create a template .Rprofile with the appropriate message 
printed to console?
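
A template along those lines might look like the sketch below. It is only an
assumption about what such an .Rprofile fragment could contain; the function
name report_workspace is hypothetical. It relies on the profile being
processed before any .RData is restored, so it can only report that a
workspace file is present in the startup directory, not that it was actually
loaded (that also depends on options such as --no-restore).

```r
# Possible ~/.Rprofile fragment (illustrative only).
report_workspace <- function(dir = getwd()) {
  f <- file.path(dir, ".RData")
  if (file.exists(f)) {
    msg <- sprintf("[Workspace file found: '%s']", normalizePath(f))
  } else {
    msg <- sprintf("[No .RData in '%s']", normalizePath(dir))
  }
  message(msg)     # shown on the console at startup
  invisible(msg)   # returned so the text can be inspected if needed
}

report_workspace()
```

In a real .Rprofile one would probably wrap this in local() so that the
helper itself does not end up in the workspace.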

-- 
david.
 
 

[R] Using grid.layout inside grid.layout with grid package: naming of the viewports affects plotting

2014-11-26 Thread Helske Satu
R version 3.1.1 (2014-07-10)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] C

attached base packages:
[1] grid  stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.1.1


I have a plotting function that produces stacked plots (for simplicity, 
here two rectangles).

library(grid)
stackedplot <- function(main=""){
  top.vp <- viewport(
    layout=grid.layout(2, 1))
  p1 <- viewport(layout.pos.col=1, layout.pos.row=1, name="plot1")
  p2 <- viewport(layout.pos.col=1, layout.pos.row=2, name="plot2")
  splot <- vpTree(top.vp, vpList(p1,p2))
  pushViewport(splot)
  seekViewport("plot1")
  grid.rect(width=unit(0.9, "npc"), height=unit(0.9, "npc"))
  seekViewport("plot2")
  grid.rect(width=unit(0.9, "npc"), height=unit(0.9, "npc"))
}

For creating a 2x2 grid with four stacked plots I tried to use the following 
code:

grid.newpage()
multitop.vp <- viewport(layout=grid.layout(2,2))
pl1 <- viewport(layout.pos.col=1, layout.pos.row=1, name="A")
pl2 <- viewport(layout.pos.col=1, layout.pos.row=2, name="B")
pl3 <- viewport(layout.pos.col=2, layout.pos.row=1, name="C")
pl4 <- viewport(layout.pos.col=2, layout.pos.row=2, name="D")
vpall <- vpTree(multitop.vp, vpList(pl1,pl2,pl3,pl4))
pushViewport(vpall)
seekViewport("A")
stackedplot(main="A")
seekViewport("B")
stackedplot(main="B")
seekViewport("C")
stackedplot(main="C")
seekViewport("D")
stackedplot(main="D")

This does not work as all the plots are plotted in the same cell of the grid 
(viewport A). However, if I plot them in a reversed order, the plots arrange as 
was supposed to: D to D, C to C and so on.

seekViewport("D")
stackedplot(main="D")
seekViewport("C")
stackedplot(main="C")
seekViewport("B")
stackedplot(main="B")
seekViewport("A")
stackedplot(main="A")

I tried with different names and found out that if I plot in reversed 
alphabetical order everything works fine. Once I try to plot in a viewport with 
a name earlier in alphabetical order, all other plots thereafter are plotted in 
the same viewport.

Why is this happening?
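
One plausible explanation, not confirmed in this message, is that
seekViewport() searches the whole viewport tree from the top, so once several
cells contain viewports called "plot1" and "plot2" the search can land in the
wrong cell. The sketch below avoids that by navigating relative to the
current viewport with downViewport() and upViewport(). The names stackedplot2
and draw_four are made up for this illustration.

```r
library(grid)

# Variant of the stacked plot that never searches outside its own cell.
stackedplot2 <- function() {
  top.vp <- viewport(layout = grid.layout(2, 1))
  p1 <- viewport(layout.pos.col = 1, layout.pos.row = 1, name = "plot1")
  p2 <- viewport(layout.pos.col = 1, layout.pos.row = 2, name = "plot2")
  pushViewport(vpTree(top.vp, vpList(p1, p2)))  # current viewport: plot2
  upViewport()                                   # back to top.vp
  downViewport("plot1")                          # searches below top.vp only
  grid.rect(width = unit(0.9, "npc"), height = unit(0.9, "npc"))
  upViewport()
  downViewport("plot2")
  grid.rect(width = unit(0.9, "npc"), height = unit(0.9, "npc"))
  upViewport(2)                                  # back to the caller's viewport
}

# 2x2 arrangement; the cell names A to D are unique, so seekViewport()
# is unambiguous at this level.
draw_four <- function() {
  grid.newpage()
  multitop.vp <- viewport(layout = grid.layout(2, 2))
  pl1 <- viewport(layout.pos.col = 1, layout.pos.row = 1, name = "A")
  pl2 <- viewport(layout.pos.col = 1, layout.pos.row = 2, name = "B")
  pl3 <- viewport(layout.pos.col = 2, layout.pos.row = 1, name = "C")
  pl4 <- viewport(layout.pos.col = 2, layout.pos.row = 2, name = "D")
  pushViewport(vpTree(multitop.vp, vpList(pl1, pl2, pl3, pl4)))
  for (nm in c("A", "B", "C", "D")) {
    seekViewport(nm)
    stackedplot2()
  }
  invisible(current.viewport()$name)
}
```

Calling draw_four() plots in alphabetical order. An alternative would be to
keep seekViewport() and give every inner viewport a unique name.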

Regards,
Satu Helske



Re: [R] Converting list to character

2014-11-26 Thread Massimiliano Tripoli
Thanks David,
that's what I was looking for.
Thanks to Chel too.

Massimiliano

- Original Message -
From: David L Carlson dcarl...@tamu.edu
To: Chel Hee Lee chl...@mail.usask.ca, Massimiliano Tripoli 
mtrip...@istat.it, r-help@r-project.org
Sent: Tuesday, 25 November 2014 19:40:51
Subject: RE: [R] Converting list to character

Or just modify your aggregate() command:

> TAB <- aggregate(mydata$CODE, by=list(ID=mydata$ID, 
+    YEAR=mydata$YEAR), FUN=paste0, collapse=", ")
> TAB
     ID YEAR              x
1   986 2008         GR.3.8
2  1251 2008 GR.3.1, GR.3.8
3  1801 2008         GR.3.8
4    11 2009         GR.3.7
5   986 2009         GR.3.8
6  1251 2009 GR.3.1, GR.3.8
7  1801 2009         GR.3.8
8    11 2010         GR.3.7
9   460 2010         GR.3.1
10  986 2010         GR.3.8
11 1251 2010 GR.3.1, GR.3.8
12 1801 2010         GR.3.8
13  460 2011         GR.3.1
14  986 2011         GR.3.8
15 1251 2011 GR.3.1, GR.3.8
16 1801 2011         GR.3.8
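
The effect of the collapse argument can be checked on a small example. This
is a sketch using three rows in the style of the posted data and the formula
interface of aggregate(); because each group is reduced to a single string,
the resulting column is an ordinary character vector rather than a list.

```r
# Three rows in the style of the posted data (subset chosen for illustration).
mydata <- data.frame(
  ID   = c(1251, 1251, 986),
  YEAR = c(2008, 2008, 2008),
  CODE = c("GR.3.1", "GR.3.8", "GR.3.8"),
  stringsAsFactors = FALSE
)

# paste(..., collapse = ", ") returns one string per ID/YEAR group.
TAB <- aggregate(CODE ~ ID + YEAR, data = mydata,
                 FUN = paste, collapse = ", ")

str(TAB$CODE)
```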

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Lee, Chel Hee
Sent: Tuesday, November 25, 2014 11:23 AM
To: Massimiliano Tripoli; r-help@r-project.org
Subject: Re: [R] Converting list to character

  do.call(rbind, TAB$x)
[,1] [,2]
1  GR.3.8 GR.3.8
2  GR.3.1 GR.3.8
4  GR.3.8 GR.3.8
5  GR.3.7 GR.3.7
6  GR.3.8 GR.3.8
7  GR.3.1 GR.3.8
9  GR.3.8 GR.3.8
10 GR.3.7 GR.3.7
11 GR.3.1 GR.3.1
12 GR.3.8 GR.3.8
13 GR.3.1 GR.3.8
15 GR.3.8 GR.3.8
16 GR.3.1 GR.3.1
17 GR.3.8 GR.3.8
18 GR.3.1 GR.3.8
20 GR.3.8 GR.3.8
 

Is this what you are looking for?  I hope this helps.

Chel Hee Lee

On 11/25/2014 6:07 AM, Massimiliano Tripoli wrote:


 Dear all,

 I can't convert the result of the aggregate function into a data frame with
 a character column. My data look like:

 mydata <- structure(list(ID = c(11, 11, 460, 460, 986, 986, 986, 986, 1251,
 1251, 1251, 1251, 1251, 1251, 1251, 1251, 1801, 1801, 1801, 1801
 ), YEAR = c(2009, 2010, 2010, 2011, 2008, 2009, 2010, 2011, 2008,
 2008, 2009, 2009, 2010, 2010, 2011, 2011, 2008, 2009, 2010, 2011
 ), Y = c(158126, 153015, 3701, 5880, 718663, 661112, 527233,
 558281, 450, 131714, 427, 124648, 425, 116500, 434, 123853, 17400,
 16493, 8057, 8329), CODE = c("GR.3.7", "GR.3.7", "GR.3.1", "GR.3.1",
 "GR.3.8", "GR.3.8", "GR.3.8", "GR.3.8", "GR.3.1", "GR.3.8", "GR.3.1",
 "GR.3.8", "GR.3.1", "GR.3.8", "GR.3.1", "GR.3.8", "GR.3.8", "GR.3.8",
 "GR.3.8", "GR.3.8")), .Names = c("ID", "YEAR", "Y", "CODE"), row.names = c(NA,
 20L), class = "data.frame")

 and by using the aggregate function

 TAB <- 
 aggregate(mydata$CODE,by=list(ID=mydata$ID,YEAR=mydata$YEAR),FUN=paste0)

 What I want is a data frame that looks like the printed TAB:
 TAB
      ID YEAR              x
 1   986 2008         GR.3.8
 2  1251 2008 GR.3.1, GR.3.8
 3  1801 2008         GR.3.8
 4    11 2009         GR.3.7
 5   986 2009         GR.3.8
 6  1251 2009 GR.3.1, GR.3.8
 7  1801 2009         GR.3.8
 8    11 2010         GR.3.7
 9   460 2010         GR.3.1
 10  986 2010         GR.3.8
 11 1251 2010 GR.3.1, GR.3.8
 12 1801 2010         GR.3.8
 13  460 2011         GR.3.1
 14  986 2011         GR.3.8
 15 1251 2011 GR.3.1, GR.3.8
 16 1801 2011         GR.3.8

 str(TAB)[1:10]
 'data.frame':   16 obs. of  3 variables:
  $ ID  : num  986 1251 1801 11 986 ...
  $ YEAR: num  2008 2008 2008 2009 2009 ...
  $ x   :List of 16
   ..$ 1 : chr "GR.3.8"
   ..$ 2 : chr  "GR.3.1" "GR.3.8"
   ..$ 4 : chr "GR.3.8"
   ..$ 5 : chr "GR.3.7"
   ..$ 6 : chr "GR.3.8"
   ..$ 7 : chr  "GR.3.1" "GR.3.8"
   ..$ 9 : chr "GR.3.8"
   ..$ 10: chr "GR.3.7"
   ..$ 11: chr "GR.3.1"
   ..$ 12: chr "GR.3.8"
   ..$ 13: chr  "GR.3.1" "GR.3.8"
   ..$ 15: chr "GR.3.8"
   ..$ 16: chr "GR.3.1"
   ..$ 17: chr "GR.3.8"
   ..$ 18: chr  "GR.3.1" "GR.3.8"
   ..$ 20: chr "GR.3.8"
 NULL

 As you can see, the x column is a list and I would like to change it to a
 character variable.
 Can anyone help me?
 Thanks,

 Massimiliano



-- 
Massimiliano Tripoli 
Collaboratore T.E.R. scado il 31/12/2014 
ISTAT - DCCN - Direzione Centrale della Contabilità Nazionale 
U.O. Contabilità dei flussi di materia del sistema economico - CSA/C
Via Depretis, 74/B 00184 Roma 
Tel. 06.4673.3132 
E-mail: mtrip...@istat.it 



[R] rJava Package

2014-11-26 Thread Krishna Bhargava S K
Hi All,

I am a beginner in R. I have installed and tried a sample of JRI 
using Rengine and Rserve.
I found normalization and sqrt functions in some sample code.
Is there a link to a list of the functions provided in R 
that I can use to process data in Java programs?

Regards
KB

L&T Technology Services Ltd



Re: [R] Checking the proportional odds assumption holds in an ordinal logistic regression using polr function

2014-11-26 Thread Charlotte Whitham
Dear Rune,

Thank you for your prompt reply and it looks like the ordinal package could be 
the answer I was looking for!

If you don't mind, I'd also like to know please what to do if the tests show 
the proportional odds assumption is NOT met. (Unfortunately I notice effects 
from almost all variables that breach the proportional odds assumption in my 
dataset) 

Would you recommend a multinomial logistic model? Or re-scaling of the data?

Thank you for your time,
Best wishes,

Charlie

On 26 Nov 2014, at 14:08, Rune Haubo rune.ha...@gmail.com wrote:

 Dear Charlie,
 
 I admit that I haven't read your email closely, but here is a way to
 test for non-proportional odds using the ordinal package (warning:
 self-promotion) using the wine data set also from the ordinal package.
 There is more information in the package vignettes
 
 Hope this is something you can use.
 Cheers,
 Rune
 
> library(ordinal)
> ## Fit model:
> fm <- clm(rating ~ temp + contact, data=wine)
> summary(fm)
formula: rating ~ temp + contact
data:    wine

 link  threshold nobs logLik AIC    niter max.grad cond.H
 logit flexible  72   -86.49 184.98 6(0)  4.64e-15 2.7e+01

Coefficients:
           Estimate Std. Error z value Pr(>|z|)
tempwarm     2.5031     0.5287   4.735 2.19e-06 ***
contactyes   1.5278     0.4766   3.205  0.00135 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Threshold coefficients:
    Estimate Std. Error z value
1|2  -1.3444     0.5171  -2.600
2|3   1.2508     0.4379   2.857
3|4   3.4669     0.5978   5.800
4|5   5.0064     0.7309   6.850
> ## Model with non-proportional odds for contact:
> fm2 <- clm(rating ~ temp, nominal=~contact, data=wine)
> ## Likelihood ratio test of non-proportional odds:
> anova(fm, fm2)
Likelihood ratio tests of cumulative link models:

    formula:                nominal: link: threshold:
fm  rating ~ temp + contact ~1       logit flexible
fm2 rating ~ temp           ~contact logit flexible

    no.par    AIC  logLik LR.stat df Pr(>Chisq)
fm       6 184.98 -86.492
fm2      9 190.42 -86.209  0.5667  3      0.904
> ## Automatic tests of non-proportional odds for all variables:
> nominal_test(fm)
Tests of nominal effects

formula: rating ~ temp + contact
        Df  logLik    AIC    LRT Pr(>Chi)
<none>     -86.492 184.98
temp     3 -84.904 187.81 3.1750   0.3654
contact  3 -86.209 190.42 0.5667   0.9040
 
 On 25 November 2014 at 17:21, Charlotte Whitham
 charlotte.whit...@gmail.com wrote:
 Dear list,
 
 I have used the ‘polr’ function in the MASS package to run an ordinal 
 logistic regression for an ordinal categorical response variable with 15 
 continuous explanatory variables.
 I have used the code (shown below) to check that my model meets the 
 proportional odds assumption following advice provided at 
 (http://www.ats.ucla.edu/stat/r/dae/ologit.htm) – which has been extremely 
 helpful, thank you to the authors! However, I’m a little worried about the 
 output implying that not only are the coefficients across various cutpoints 
 similar, but they are exactly the same (see graphic below).
 
 Here is the code I used (and see attached for the output graphic)
 
 FGV1b <- data.frame(FG1_val_cat = factor(FGV1b[, "FG1_val_cat"]),
                     scale(FGV1[, c("X", "Y", "Slope", "Ele", "Aspect",
                                    "Prox_to_for_FG", "Prox_to_for_mL",
                                    "Prox_to_nat_border", "Prox_to_village",
                                    "Prox_to_roads", "Prox_to_rivers",
                                    "Prox_to_waterFG", "Prox_to_watermL",
                                    "Prox_to_core", "Prox_to_NR",
                                    "PCA1", "PCA2", "PCA3")]))
 
 b <- polr(FGV1b$FG1_val_cat ~ FGV1b$X + FGV1b$Y + FGV1b$Slope + FGV1b$Ele +
           FGV1b$Aspect + FGV1b$Prox_to_for_FG + FGV1b$Prox_to_for_mL +
           FGV1b$Prox_to_nat_border + FGV1b$Prox_to_village + FGV1b$Prox_to_roads +
           FGV1b$Prox_to_rivers + FGV1b$Prox_to_waterFG + FGV1b$Prox_to_watermL +
           FGV1b$Prox_to_core + FGV1b$Prox_to_NR, data = FGV1b, Hess=TRUE)
 
 # Checking the assumption. The following code will estimate the values to
 # be graphed. First it shows us the logit transformations of the
 # probabilities of being greater than or equal to each value of the target
 # variable
 
 FGV1b$FG1_val_cat <- as.numeric(FGV1b$FG1_val_cat)
 
 sf <- function(y) {
   c('VC>=1' = qlogis(mean(FGV1b$FG1_val_cat >= 1)),
     'VC>=2' = qlogis(mean(FGV1b$FG1_val_cat >= 2)),
     'VC>=3' = qlogis(mean(FGV1b$FG1_val_cat >= 3)),
     'VC>=4' = qlogis(mean(FGV1b$FG1_val_cat >= 4)),
     'VC>=5' = qlogis(mean(FGV1b$FG1_val_cat >= 5)),
     'VC>=6' = qlogis(mean(FGV1b$FG1_val_cat >= 6)),
     'VC>=7' = qlogis(mean(FGV1b$FG1_val_cat >= 7)),
     'VC>=8' = qlogis(mean(FGV1b$FG1_val_cat >= 8)))
 }
 
 (t <- with(FGV1b, summary(as.numeric(FGV1b$FG1_val_cat) ~ FGV1b$X + FGV1b$Y +
   FGV1b$Slope + FGV1b$Ele + FGV1b$Aspect + FGV1b$Prox_to_for_FG +
   FGV1b$Prox_to_for_mL + FGV1b$Prox_to_nat_border + FGV1b$Prox_to_village +
   FGV1b$Prox_to_roads + FGV1b$Prox_to_rivers + FGV1b$Prox_to_waterFG +
   FGV1b$Prox_to_watermL + FGV1b$Prox_to_core + FGV1b$Prox_to_NR, fun=sf)))
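
 Editorial note on the code above: one possible reason for the identical
 values is that sf() as posted never uses its argument y; it always refers
 to the full column FGV1b$FG1_val_cat, so any per-group summary that calls
 it would return the same numbers for every group. The sketch below
 illustrates this with hypothetical toy data and base R's tapply() as a
 stand-in for the grouped summary; the names d, sf_global and sf_local are
 made up for the illustration and are not the poster's.

```r
# Hypothetical toy data (not the poster's): ordinal response y, grouping g
d <- data.frame(y = c(1, 1, 1, 2, 1, 2, 2, 2),
                g = rep(c("a", "b"), each = 4))

# Same pattern as the posted sf(): ignores the argument, uses the whole column
sf_global <- function(y) qlogis(mean(d$y >= 2))

# Variant that uses the responses actually passed in for each group
sf_local <- function(y) qlogis(mean(y >= 2))

res_global <- tapply(d$y, d$g, sf_global)  # one shared value for both groups
res_local  <- tapply(d$y, d$g, sf_local)   # one value per group

print(rbind(res_global, res_local))
```

 If that is what is happening, replacing FGV1b$FG1_val_cat by y inside
 sf() would be the change to try; whether the proportional odds assumption
 then looks reasonable is a separate question.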
 
 
 
 #The table displays the (linear) predicted values we would get if we 
 regressed our
 
 #dependent variable on our 

[R] How can I run a TSP program inside R

2014-11-26 Thread Yousri Fanous
I have the following TSP code:

options memory = 6;
options crt;
in 'mydat.tlb' ;
?
? Create 2 new variables
?
age20 = age -20;
lwage = log(wage);
?
?
olsq lwage c f edy tenure age20 pu;

How can I run it inside R?
Where can I get more explanation on how to code for TSP?
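
Editorial note: R does not interpret TSP syntax, so one option is to
rewrite the steps in R. The following is only a sketch of what an R
version of the program above might look like. It assumes that the TSP
"c" regressor stands for the constant term and that the databank
variables are available in a data frame; the data frame mydat and its
simulated contents are hypothetical stand-ins, since 'mydat.tlb' cannot
be read here.

```r
# Hypothetical stand-in data; the real variables live in the TSP databank
set.seed(42)
n <- 50
mydat <- data.frame(
  wage   = exp(rnorm(n, mean = 2, sd = 0.3)),
  age    = sample(20:60, n, replace = TRUE),
  f      = rbinom(n, 1, 0.5),
  edy    = sample(8:18, n, replace = TRUE),
  tenure = sample(0:30, n, replace = TRUE),
  pu     = rbinom(n, 1, 0.3)
)

# Create 2 new variables, as in the TSP program
mydat$age20 <- mydat$age - 20
mydat$lwage <- log(mydat$wage)

# Counterpart of: olsq lwage c f edy tenure age20 pu;
# lm() includes the intercept by default, so no explicit constant is needed
fit <- lm(lwage ~ f + edy + tenure + age20 + pu, data = mydat)
print(coef(fit))
```

Calling summary(fit) would then give standard errors and t statistics
comparable to a regression printout.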


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R-es] Question on how to analyze a factorial experiment with feature extraction, clustering, and classification algorithms as factors

2014-11-26 Thread Daniel Carrillo Zapata
 Hello Isidro,

 Let me explain in more detail: I have a database with information on
10 drivers over a 30-minute car journey. For each driver, biomedical
parameters such as body temperature, electrocardiogram, etc. were
measured throughout the journey; 22 parameters in total.

 My main goal is to determine, given those parameters, the different
states a driver can be in over the course of the journey. However, my
dataset is unlabeled, that is, I do not know a priori the response
variable (the driver's state) for each combination; I have to discover
it.

 What I wanted to do first is transform the parameters, since this is
usually recommended to avoid overfitting and to reduce the
dimensionality of the data. For this I want to try two techniques: ICA
and PCA.

 After that, I planned to try different clustering algorithms to see
how they group the data. For each one I can measure how well it assigns
an element to a cluster with, for example, the silhouette coefficient or
some other internal/external index. With each clustering algorithm I
try, I will label my training data by assigning each observation a
cluster (to which I will later try to give a semantic explanation of the
state it represents).
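
 Editorial note: the feature extraction, clustering, and silhouette steps
described above could be sketched in R roughly as follows. This is only
an illustration under stated assumptions: the matrix X is simulated as a
hypothetical stand-in for the 22 parameters, k = 3 clusters and 3
principal components are arbitrary choices, and pam() from the cluster
package is used as the k-medoids implementation. ICA is left out because
it needs an add-on package.

```r
library(cluster)  # recommended package shipped with R: pam(), silhouette()

set.seed(1)
# Hypothetical stand-in for the 22 biomedical parameters: 3 shifted groups
X <- rbind(matrix(rnorm(40 * 22, mean = 0), ncol = 22),
           matrix(rnorm(40 * 22, mean = 3), ncol = 22),
           matrix(rnorm(40 * 22, mean = 6), ncol = 22))

# Feature extraction: keep the first 3 principal components (arbitrary)
scores <- prcomp(X, scale. = TRUE)$x[, 1:3]
d <- dist(scores)

# Two clustering algorithms, each asked for k = 3 clusters (an assumption)
lab_pam <- pam(scores, k = 3)$clustering
lab_hc  <- cutree(hclust(d), k = 3)

# Average silhouette width as the internal quality index
avg_sil <- c(pam    = mean(silhouette(lab_pam, d)[, "sil_width"]),
             hclust = mean(silhouette(lab_hc, d)[, "sil_width"]))
print(avg_sil)
```

 The vectors lab_pam and lab_hc are the cluster labels that would then be
handed to the classifiers.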

 For each resulting (now labeled) dataset obtained by applying one
feature extraction technique and one clustering algorithm, I want to try
different classifiers to see how they behave with that grouping. I will
therefore obtain several classification errors, because I will use
cross-validation.

 So, if I try 2 feature extraction algorithms, 3 clustering algorithms,
and 4 classifiers, I have a 2x3x4 factorial experiment, right?
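
 Editorial note: the 2x3x4 layout can be written down explicitly with
expand.grid(), which lists every combination of factor levels. The
clustering names follow the original post; the classifier names clf1 to
clf4 are hypothetical placeholders.

```r
# All combinations of the three factors: 2 * 3 * 4 = 24 cells
design <- expand.grid(
  extraction = c("PCA", "ICA"),
  clustering = c("k-medoids", "EM", "hierarchical"),
  classifier = paste0("clf", 1:4)   # hypothetical classifier names
)
print(nrow(design))
print(head(design))
```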

 What I would then like to obtain is the best combination of feature
extraction technique, clustering algorithm, and classifier, taking into
account both the classification errors and how well the clustering
algorithms group the data.

 Hence my question is how to analyze the results. I had thought of
applying a three-way ANOVA with interaction, but I do not know whether
that is correct. I also do not know whether it would make sense, because
I want to take into account the quality of the clustering as well, not
only the classification errors. That is, I would need to analyze the
pairs (samples of the classification error, clustering quality) for each
combination of feature extraction algorithm, clustering algorithm, and
classification algorithm.
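
 Editorial note: purely as an illustration of the mechanics, a three-way
ANOVA with interaction can be fitted in R with aov(). The sketch below
uses simulated error rates, a hypothetical number of 5 cross-validation
folds per cell, and placeholder classifier names. It does not settle
whether such an analysis is appropriate here, for instance because the
class labels themselves change with the clustering algorithm.

```r
set.seed(123)
folds <- 5  # hypothetical number of cross-validation folds per combination
res <- expand.grid(extraction = c("PCA", "ICA"),
                   clustering = c("k-medoids", "EM", "hierarchical"),
                   classifier = paste0("clf", 1:4),
                   fold = seq_len(folds))

# Simulated error rates; real values would come from the experiment
res$error <- runif(nrow(res), min = 0.05, max = 0.40)

# Main effects plus all two-way and three-way interactions
fit <- aov(error ~ extraction * clustering * classifier, data = res)
tab <- summary(fit)[[1]]
print(tab)
```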

 I hope that clarifies things :)

 Many thanks.

 Best regards,
 DANI


On 26/11/14 01:02, Isidro Hidalgo Arellano wrote:
 Hello, Daniel:
 Perhaps you should be more explicit, because from the information you
 provide all I can say is that I do not see the relationship between the
 3 types of algorithms you mention:
 - a principal component analysis can be a preliminary step for the
 other two
 - clustering is a form of unsupervised learning, whereas a classifier
 is normally used in supervised learning, because the model is built
 knowing the dependent variable
 For that reason, I do not see how to set up an ANOVA to analyze 3
 procedures that seem to me to be used for completely different things...
 I imagine I have not been much help, but... why don't you tell us
 exactly what you want to do, so we can see whether we can help a bit more?
 Best regards,
 Isidro Hidalgo



  On 25/11/2014, at 22:09, Daniel Carrillo Zapata wrote:

  Hello colleagues,

  I am Daniel Carrillo, and I am writing because I have a question about
  whether I can treat clustering algorithms as a factor in an experiment.
  Specifically, I have an unlabeled dataset, and I want to try the
  following algorithms on it:

  1) Feature extraction by PCA and by ICA.
  2) Once the features are extracted, for each of the two transformed
  datasets I would like to try 3 different clustering algorithms:
  k-medoids, EM, and hierarchical clustering.
  3) Finally, for each labeled dataset I would like to try 4 or 5
  classifiers.

  As you can see, I am designing a factorial experiment to find the best
  classifier by trying different feature extraction, clustering, and
  classification techniques.

  My final goal is to train the best classifier, based on the best
  clustering, classification, and feature extraction algorithms, so that
  it can label future data.

  However, I have doubts about how to analyze the results: I do not know
  whether a three-way ANOVA with interaction can be applied, with the 3
  factors being the feature extraction algorithm, the clustering
  algorithm, and the classification algorithm. My questions are therefore:

  1) Does it make sense to apply a three-way ANOVA with interaction?
  2) If

Re: [R-es] Question on how to analyze a factorial experiment with feature extraction, clustering, and classification algorithms as factors

2014-11-26 Thread Julio Alejandro Di Rienzo
I THINK THIS TYPE OF QUESTION EXCEEDS THE PURPOSE OF THIS FORUM.





[R-es] Question on how to analyze a factorial experiment with feature extraction, clustering, and classification algorithms as factors

2014-11-26 Thread Daniel Carrillo Zapata

Hello colleagues :)

I am Daniel Carrillo, and I am writing because I have a question about
whether I can treat clustering algorithms as a factor in an experiment.
Specifically, I have an unlabeled dataset, and I want to try the
following algorithms on it:


1) Feature extraction by PCA and by ICA.
2) Once the features are extracted, for each of the two transformed
datasets I would like to try 3 different clustering algorithms:
k-medoids, EM, and hierarchical clustering.
3) Finally, for each labeled dataset I would like to try 4 or 5
classifiers.


As you can see, I am designing a factorial experiment to find the best
classifier by trying different feature extraction, clustering, and
classification techniques.


However, I have doubts about how to analyze the results: I do not know
whether a three-way ANOVA with interaction can be applied, with the 3
factors being the feature extraction algorithm, the clustering
algorithm, and the classification algorithm. My questions are therefore:


1) Can I apply a three-way ANOVA with interaction?
2) If not, what would be the best way to analyze the results of the
experiment?


My doubts arise from the fact that I think the classification
algorithms depend entirely on the clustering algorithms (which label the
data for them).


I trust your experience to shed some light on this :)

Many thanks!

Best regards,
DANI

___
R-help-es mailing list
R-help-es@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-help-es


Re: [R-es] Question on how to analyze a factorial experiment with feature extraction, clustering, and classification algorithms as factors

2014-11-26 Thread eric
Hello Daniel, don't be discouraged; there are surely forums where you
can raise questions that are more about statistics than about R itself.

Best regards and good luck with everything,

Eric.





On 26/11/14 11:16, DANIEL CARRILLO ZAPATA wrote:
 Hello again everyone,
 
 I would like to apologize for the emails I have sent. I sent them
 because I thought this was also a forum where statistical questions
 could be raised, not only questions about R specifically.
 
 It is always important to learn something from everything you do, so
 what I take away is that here I can only raise questions about
 implementation in R, and that is what I will do from now on, since I
 work with it every day.
 
 Again, my sincerest apologies if I have bothered anyone. My intention
 was never to ask you to do my project for me, far from it. I will keep
 studying more and more each day to learn as much as I can, so that it
 does not come across that way ;)
 
 Thank you all!
 
 Best regards,
 DANI
 
 On 26 November 2014 12:53:32 CET, Jorge I Velez jorgeivanve...@gmail.com 
 wrote:
 I agree with Prof. Di Rienzo.

 By the way, this question reminds me of

 R> require(fortunes)
 R> fortune('brain')

 I wish to perform brain surgery this afternoon at 4pm and don't know
 where to start. My background is the history of great statistician
 sports legends but I am willing to learn. I know there are courses and
 numerous books on brain surgery but I don't have the time for those.
 Please direct me to the appropriate HowTos, and be on standby for
 solving any problem I may encounter while in the operating room. Some
 of you might ask for specifics of the case, but that would require my
 following the posting guide and spending even more time than I am
 already taking to write this note.
   -- I. Ben Fooled (aka Frank Harrell)
      R-help (April 1, 2005)

 Regards,
 Jorge.-



 2014-11-26 22:34 GMT+11:00 Julio Alejandro Di Rienzo 
 dirienzo.ju...@gmail.com:

 I THINK THIS TYPE OF QUESTION EXCEEDS THE PURPOSE OF THIS FORUM.





[R-es] forum http://stats.stackexchange.com

2014-11-26 Thread Marcuzzi, Javier Rubén
Dear all,

Branching off from an earlier question to this mailing list (about
statistics without R), and in response to Rubén Casal's question.

I used to use http://stats.stackexchange.com. Some things were useful to
me, with good ideas; others contained errors or were written
differently, so that on my computer they did not give the same result.
The difference, apart from the language, is speed: if someone has
already written something, it is easy to copy and paste, compared with
waiting for a reply by email.

I started writing down the tricks that helped me with R, but it takes
time and is complex: organizing ideas and code, examples and ... ,
statistics for non-statisticians, programming for non-programmers,
statisticians for other sciences. That is R: it brings together everyone
from students to professors of the highest level.

Javier Marcuzzi

