Re: [R] Conversion of Matlab code to an R code

2015-03-23 Thread Marc Schwartz

 On Mar 23, 2015, at 10:10 AM, Abhinaba Roy abhinabaro...@gmail.com wrote:
 
 Hi,
 
 Can a Matlab code be converted to R code?
 
 I am finding it difficult to do so.
 
 Could you please help me out with it.
 
 Your help will be highly appreciated.
 
 Here comes the Matlab code

snip of code


Hi,

Nothing will do the conversion automatically, certainly.

I don't know that anyone will volunteer here to convert such a large volume of 
code, though I could be wrong of course.

That being said, there are two R/Matlab references that you should leverage, if 
you have not already:

  http://www.math.umaine.edu/~hiebeler/comp/matlabR.pdf

  http://mathesaurus.sourceforge.net/octave-r.html


That might make your job a bit easier.
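
As a tiny illustration of the kind of one-to-one mappings those references 
cover (the pairings below are generic examples, not taken from your code):

  x <- seq(0, 1, length.out = 5)   # Matlab: linspace(0, 1, 5)
  A <- matrix(1:6, nrow = 2)       # Matlab: reshape(1:6, 2, 3)
  rowMeans(A)                      # Matlab: mean(A, 2)
  dim(A)                           # Matlab: size(A)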

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Superimposing 2 curves on the same graph with par(new=TRUE)

2015-03-23 Thread Marc Schwartz
Hi,

If he wants the two sets of data plotted on the same y axis scale, with the 
range of the y axis adjusted to the data, an alternative to the use of plot() 
and points() is:

  matplot(Date, cbind(MORTSFr, MORTSBu), type = "l")


See ?matplot
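
For instance, using the Date, MORTSFr and MORTSBu vectors from the post quoted 
below, a rough sketch (the axis label and legend placement are illustrative 
only):

  matplot(Date, cbind(MORTSFr, MORTSBu), type = "l", lty = 1:2, col = 1:2,
          ylab = "Counts")
  legend("topright", legend = c("MORTSFr", "MORTSBu"), lty = 1:2, col = 1:2)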

Regards,

Marc Schwartz


 On Mar 23, 2015, at 12:04 PM, Boris Steipe boris.ste...@utoronto.ca wrote:
 
 ... which is exactly what he shouldn't do, because now the plot falsely 
 asserts that both curves are plotted to the same scale.
 
 
 B.
 
 
 
 On Mar 23, 2015, at 12:34 PM, Clint Bowman cl...@ecy.wa.gov wrote:
 
 Try:
 plot(Date, MORTSBu, lwd=2, lty="dashed", axes=F, xlab="", ylab="")
 
 
 
 Clint Bowman INTERNET:   cl...@ecy.wa.gov
 Air Quality Modeler  INTERNET:   cl...@math.utah.edu
 Department of EcologyVOICE:  (360) 407-6815
 PO Box 47600 FAX:(360) 407-7534
 Olympia, WA 98504-7600
 
   USPS:   PO Box 47600, Olympia, WA 98504-7600
   Parcels:300 Desmond Drive, Lacey, WA 98503-1274
 
 On Mon, 23 Mar 2015, varin sacha wrote:
 
 Dear R-Experts,
 
 I try to superimpose/present 2 curves/plots on the same graph. I would like 
 the result/graph to be readable.
 For that, I use the par(new=TRUE) argument but on the Y-axis there is a 
 superposition of writings and the Y-axis becomes unreadable.
 How can I solve this problem ?
 
 Here is a reproducible example :
 Date <- c(1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010)
 
 MORTSFr <- c(16445,17671,18113,17043,14738,14355,15028,14283,13229,13603,13672,13547,13527,13021,12737,11388,11947,10742,11497,11476,11215,10483,9900,9568,9019,8891,8541,8444,8918,8487,8079,8160,7655,6058,5593,5318,4709,4620,4275,4273,3992)
 
 MORTSBu <- c(838,889,934,946,960,1030,1021,1040,1153,1149,1199,1219,1229,1123,1119,1113,1070,1153,1153,1280,1567,1114,1299,1307,1390,1264,1014,915,1003,1047,1012,1011,959,960,943,957,1043,1006,1061,901,776)
 
 plot(Date, MORTSFr, type="l")
 par(new=TRUE)
 
 plot(Date, MORTSBu, lwd=2, lty="dashed")
 
 Thanks for your time.
 Best,
 S

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Installing R on Linux Red Hat Server

2015-03-12 Thread Marc Schwartz

 On Mar 12, 2015, at 3:39 PM, Axel Urbiz axel.ur...@gmail.com wrote:
 
 Hello,
 
 My apologies if this is not the right place to post this question.
 
 I need to get R installed on a Linux Red Hat server. I have very limited
 exposure to R and would appreciate some basic guidance if you could point
 me to resources describing the process, requirements, etc.
 
 Thank you in advance for any help.
 
 Best,
 Axel.


Hi,

Pointers to some references:

1. The EPEL, which is how you would obtain pre-compiled binary RPMs of R. You 
will need to have root permissions on the server in order to do this. Once 
their yum repos are configured on your server, 'sudo yum install R' is 
essentially what you would need.

  https://fedoraproject.org/wiki/EPEL


2. The R Installation and Administration Manual, which will provide some 
guidance, in the Linux section and in the appendices (primarily A) for 
additional items that may be relevant:

  http://cran.r-project.org/manuals.html


3. The R-SIG-Fedora list, which is focused on the use of R on RH and derivative 
(eg. Fedora) Linux distributions. Follow up questions should be posted there, 
ideally after you subscribe, lest you be subject to on-going moderation 
(speaking as a co-moderator of that list).

  https://stat.ethz.ch/mailman/listinfo/r-sig-fedora


Regards,

Marc Schwartz

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SPSS command match files for merging one-to-many (hierarchical) equivalent in R?

2015-03-09 Thread Marc Schwartz

 On Mar 9, 2015, at 1:53 PM, Duncan Murdoch murdoch.dun...@gmail.com wrote:
 
 On 09/03/2015 1:40 PM, Kristina Loderer wrote:
 Dear R community,
 
 to combine data sets of hierarchical, nested nature (i.e., data sets
 linked by, for example, the variable study ID and then also by
 outcome_variable_1 and outcome_variable_2) I can use the match files
 command in SPSS. What is the equivalent command / function in R? Is it
 the merge function, or the match function? The more I read, the more
 confused I become..
 
 
 I don't know SPSS at all, so I can't help you.  If nobody else does, you 
 might try putting together a tiny example in R showing what you're starting 
 with, and what you want to produce.  From what you wrote, I'd guess merge(), 
 not match(), but you might really be asking for something completely 
 different.
 
 Duncan Murdoch


Based upon the info here:

  http://www.ats.ucla.edu/stat/spss/modules/merge.htm

I would go with ?merge, since the desired functionality appears to be a 
relational join operation.
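
As a tiny illustration with made-up data (not the poster's), joining 
one-to-many on a common study ID:

  studies  <- data.frame(study_id = c(1, 2, 3), year = c(2001, 2005, 2010))
  outcomes <- data.frame(study_id = c(1, 1, 2, 2, 3),
                         outcome  = c(0.2, 0.4, 0.1, 0.3, 0.5))
  merge(studies, outcomes, by = "study_id")              # inner join
  merge(studies, outcomes, by = "study_id", all = TRUE)  # full outer join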

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] numbering consecutive rows based on length criteria

2015-03-02 Thread Marc Schwartz
On Mar 2, 2015, at 11:43 AM, Morway, Eric emor...@usgs.gov wrote:
 
 Using this dataset:
 
  dat <- read.table(textConnection("day  noRes.Q   wRes.Q
 1  237074.41 215409.41
 2 2336240.20 164835.16
 3   84855.42 357062.72
 4   76993.48 386326.78
 5   73489.47 307144.09
 6   70246.96  75885.75
 7   69630.09  74054.33
 8   66714.78  70071.80
 9  122296.90  66579.08
 10   63502.71  65811.37
 11   63401.84  64795.12
 12   63387.84  64401.14
 13   63186.10  64163.95
 14   63160.74  63468.25
 15   60471.15  60719.15
 16   58235.63  57655.14
 17   58089.73  58061.34
 18   57846.39  57357.89
 19   57839.42  56495.69
 20   57740.06  56219.97
 21   58068.57  55810.91
 22   58358.34  56437.81
 23   76284.90  73722.92
 24  105138.31 100729.00
 25  147203.03 178079.38
 26  109996.02 13.95
 27   91424.20  87391.56
 28   89065.91  87196.69
 29   86628.74  84809.07
  30   79357.60  77555.62"), header=TRUE)
 
 I'm attempting to generate a column that continuously numbers consecutive
 rows where wRes.Q is greater than noRes.Q.  To that end, I've come up with
 the following:
 
  dat$flg <- dat$wRes.Q > dat$noRes.Q
  dat$cnt <- with(dat, ave(integer(length(flg)), flg, FUN=seq_along))
 
 The problem with dat$cnt is that it doesn't start over with 1 when a 'new'
 group of either true or false is encountered.  Thus, row 9's cnt value
 should start over at 1, as should dat$cnt[10], and dat$cnt[11]==2, etc.
 (the desired result is shown below)
 
 In the larger dataset I'm working with (6,000 rows), there are blocks of
 rows where the number of consecutive rows with dat$cnt==TRUE exceeds 100.
 My goal is to plot these blocks of rows as polygons in a time series plot.
 If, for the small example provided, the number of consecutive rows with
 dat$cnt==TRUE is greater than or equal to 5 (the 2 blocks of rows
 satisfying this criteria in this small example are rows 3-8 and 10-15), is
 there a way to add a column that uniquely numbers these blocks of rows? I'd
 like to end up with the following, which shows the correct cnt column and
 a column called plygn that is my ultimate goal:
 
 dat
  #  day    noRes.Q    wRes.Q   flg cnt  plygn
 #   1  237074.41 215409.41 FALSE   1 NA
 #   2 2336240.20 164835.16 FALSE   2 NA
 #   3   84855.42 357062.72  TRUE   1  1
 #   4   76993.48 386326.78  TRUE   2  1
 #   5   73489.47 307144.09  TRUE   3  1
 #   6   70246.96  75885.75  TRUE   4  1
 #   7   69630.09  74054.33  TRUE   5  1
 #   8   66714.78  70071.80  TRUE   6  1
 #   9  122296.90  66579.08 FALSE   1 NA
 #  10   63502.71  65811.37  TRUE   1  2
 #  11   63401.84  64795.12  TRUE   2  2
 #  12   63387.84  64401.14  TRUE   3  2
 #  13   63186.10  64163.95  TRUE   4  2
 #  14   63160.74  63468.25  TRUE   5  2
 #  15   60471.15  60719.15  TRUE   6  2
 #  16   58235.63  57655.14 FALSE   1 NA
 #  17   58089.73  58061.34 FALSE   2 NA
 #  18   57846.39  57357.89 FALSE   3 NA
 #  19   57839.42  56495.69 FALSE   4 NA
 #  20   57740.06  56219.97 FALSE   5 NA
 #  21   58068.57  55810.91 FALSE   6 NA
 #  22   58358.34  56437.81 FALSE   7 NA
 #  23   76284.90  73722.92 FALSE   8 NA
 #  24  105138.31 100729.00 FALSE   9 NA
 #  25  147203.03 178079.38  TRUE   1 NA
 #  26  109996.02 13.95  TRUE   2 NA
 #  27   91424.20  87391.56 FALSE   1 NA
 #  28   89065.91  87196.69 FALSE   2 NA
 #  29   86628.74  84809.07 FALSE   3 NA
 #  30   79357.60  77555.62 FALSE   4 NA
 
 Thanks, Eric


Hi,

See ?rle

 unlist(sapply(rle(with(dat, wRes.Q > noRes.Q))$lengths, seq))
 [1] 1 2 1 2 3 4 5 6 1 1 2 3 4 5 6 1 2 3 4 5 6 7 8 9 1 2 1 2 3 4

cbind() the result above to your data frame.
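
Should you also want the 'plygn' column you described, one possible extension 
of the same rle() idea is sketched below (assuming 'dat' as read in above; not 
tested against your full data set):

  r <- rle(with(dat, wRes.Q > noRes.Q))
  dat$cnt <- unlist(sapply(r$lengths, seq))
  # a run qualifies for a polygon if it is TRUE and spans at least 5 rows
  qual <- r$values & r$lengths >= 5
  # number the qualifying runs 1, 2, ... and expand back to one value per row
  dat$plygn <- rep(ifelse(qual, cumsum(qual), NA), r$lengths)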

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Convert windows source package for Mac use

2015-02-24 Thread Marc Schwartz

 On Feb 24, 2015, at 7:19 AM, Warthog arjarvis.wart...@gmail.com wrote:
 
 Hi, 
 I am on a Mac. 
 Is there a way to convert a Windows source package so it can be installed on 
 a Mac?
 
 I have a package in zip form from a friend who runs Windows. 
 I THINK that it is in compiled format for Windows. 
 The Description says: 
 Built: R 3.1.2; x86_64-w64-mingw32; windows 
 
 I tried to convert it to a tgz then Install/Load on Mac R, but I get the 
 error message: 
 Error: package 'package' was built for x86_64-w64-mingw32 
 
 I can run Windows on Parallels Desktop, and the original zip format installs 
 and loads OK. 
 
 I'd prefer to run R on my Mac. 
 Sorry if this is a stupid question: I read the R-exts and it doesn't say if 
 you can or cannot do this. 
 
 Thanks, 
 Alan 


Hi,

Just as an FYI, there is a Mac specific SIG list:

  https://stat.ethz.ch/mailman/listinfo/r-sig-mac

Next, the Windows .zip file is a *binary*, not source package, specifically 
compiled for Windows, as you hint at above. If the package contains any 
C/C++/FORTRAN code, then that code is also compiled for Windows and is not 
portable.

The source package would/should have a .tar.gz extension and you would want 
your friend to provide that version of his/her package, presuming that he/she 
created this package and that it is not otherwise available (eg. from CRAN or a 
third party location). 

If you can get that version of the package, then you may be able to install it 
on OS X, using:

  install.packages("PackageFileName", repos = NULL, type = "source")

That presumes that there is no C/C++/FORTRAN code that requires compilation. If 
so, you would also need to install required development related tools which are 
referenced in the R FAQ for OSX and the Installation and Admin manual.

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Change error bar length in barplot2

2015-02-17 Thread Marc Schwartz
On Feb 17, 2015, at 10:46 AM, Joule Madinga jmadi...@yahoo.fr wrote:
 
 Hi, I'm new to R. I would like to make a barplot of parasite infection 
 prevalence (with 95% confidence interval) by age group. I have 4 parasite 
 species and 5 age-groups and the example by Marc Schwartz (barplot2) fits 
 very well to my data. However, I would like to plot my own 95% CI (as 
 calculated with my own data) instead of the faked 95% CI provided in the 
 example. How can I proceed?
 Thank you in advance. Joule 


Joule,

Please see my reply to your offlist e-mail to me this morning.

It looks like there was a delay in your post here coming through, perhaps as a 
result of moderation.

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Database connection query

2015-02-09 Thread Marc Schwartz

 On Feb 9, 2015, at 4:33 AM, Lalitha Kristipati 
 lalitha.kristip...@techmahindra.com wrote:
 
 Hi,
 
 I would like to know when to use drivers and when to use packages to connect 
 to databases in R
 
 Regards,
 Lalitha Kristipati
 Associate Software Engineer


In general, you will need both.
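
For example, with the DBI interface package plus the RSQLite back end/driver 
package (both on CRAN; the choice of back end here is purely illustrative):

  library(DBI)
  con <- dbConnect(RSQLite::SQLite(), ":memory:")  # the driver supplies the connection
  dbWriteTable(con, "mtcars", mtcars)              # copy a data frame into the DB
  dbGetQuery(con, "SELECT cyl, COUNT(*) AS n FROM mtcars GROUP BY cyl")
  dbDisconnect(con)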

There is more information in the R Data Import/Export manual:

  
http://cran.r-project.org/doc/manuals/r-release/R-data.html#Relational-databases

and there is a SIG list for R and DB specific subject matter:

  https://stat.ethz.ch/mailman/listinfo/r-sig-db

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] odbcConnectAccess2007 errors with Access databases on new PC

2015-02-02 Thread Marc Schwartz
On Feb 2, 2015, at 12:00 PM, utz.ryan utz.r...@gmail.com wrote:
 
 Hello,
 
 I've connected R to Microsoft Access databases for years now
 using odbcConnectAccess2007. I recently got a new computer and R is
 absolutely refusing to connect to any Access database with the following
 error message:
 
 Warning messages:
 1: In odbcDriverConnect(con, ...) :
  [RODBC] ERROR: state IM002, code 0, message [Microsoft][ODBC Driver
 Manager] Data source name not found and no default driver specified
 2: In odbcDriverConnect(con, ...) : ODBC connection failed
 
 It's definitely not a path name problem-I've checked a dozen times. A few
 things online have mentioned something about 32-bit and 64-bit systems
 causing problems. I've tried opening both the 64-bit and 32-bit versions of
 R with zero luck. My Office is running a 32-bit system.
 
 Is there anything else I can try? I really would hate to lose the ability
 to connect R to my Access databases due to some intractable problem.
 
 Thanks,
 Ryan


Take a look at the RODBC vignette:

  vignette("RODBC")

or

  http://cran.r-project.org/web/packages/RODBC/vignettes/RODBC.pdf

and see the footnote (16) at the bottom of page 22 regarding the creation of 32 
bit DSNs and the following from page 20:

"32-bit Windows drivers for Access 2007 and Excel 2007 are bundled with Office 
2007 but can be installed separately via the installer AccessDatabaseEngine.exe 
available from http://www.microsoft.com/en-us/download/details.aspx?id=23734."


The entire tool chain needs to be of the same architecture. So 32 bit Office, 
32 bit ODBC drivers, 32 bit DSN and 32 bit R.

BTW, as you may be aware, there is a DB SIG list specifically for these types 
of questions:

  https://stat.ethz.ch/mailman/listinfo/r-sig-db

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problems installing jpeg package

2015-01-27 Thread Marc Schwartz

 On Jan 27, 2015, at 6:05 AM, Jorge Fernández García jorfeg...@hotmail.com 
 wrote:
 
 I need help installing the jpeg package. The simple command install.packages("jpeg") 
 produces the following result. My OS is Fedora 21. Thanks in advance for your 
 help.
 
 install.packages("jpeg")
 Installing package into ‘/home/cgg/R/x86_64-redhat-linux-gnu-library/3.1’
 (as ‘lib’ is unspecified)
 trying URL 'http://cran.rstudio.com/src/contrib/jpeg_0.1-8.tar.gz'
 Content type 'application/x-gzip' length 18046 bytes (17 Kb)
 opened URL
 ==
 downloaded 17 Kb
 
 * installing *source* package ‘jpeg’ ...
 ** package ‘jpeg’ successfully unpacked and MD5 sums checked
 ** libs
 gcc -m64 -std=gnu99 -I/usr/include/R -DNDEBUG  -I/usr/local/include  -fpic  
 -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 
 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
 -grecord-gcc-switches  -m64 -mtune=generic  -c read.c -o read.o
 In file included from read.c:1:0:
 rjcommon.h:11:21: fatal error: jpeglib.h: No such file or directory
 #include <jpeglib.h>
 ^
 compilation terminated.
 /usr/lib64/R/etc/Makeconf:133: recipe for target 'read.o' failed
 make: *** [read.o] Error 1
 ERROR: compilation failed for package ‘jpeg’
 * removing ‘/home/cgg/R/x86_64-redhat-linux-gnu-library/3.1/jpeg’
 Warning in install.packages :
  installation of package ‘jpeg’ had non-zero exit status
 
 The downloaded source packages are in
‘/tmp/RtmpYpmDcb/downloaded_packages’


Hi, 

You are missing the header file jpeglib.h, which is required for compiling the 
package from source.

On Fedora, such files are typically contained in a *-devel RPM, where the '*' 
is the prefix for the Fedora RPM that provides the binary and related files.

Specifically in this case, libjpeg is contained in the libjpeg-turbo RPM, thus 
you need, as root:

  yum install libjpeg-turbo-devel

or 

  sudo yum install libjpeg-turbo-devel

from the CLI. The R Installation and Administration manual covers this in:

  
http://cran.r-project.org/doc/manuals/r-release/R-admin.html#Essential-programs-and-libraries

As an aside, there is a SIG list specifically for R on RH/Fedora based Linux 
distros:

  https://stat.ethz.ch/mailman/listinfo/r-sig-fedora

Regards,

Marc Schwartz


  
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] nonmonotonic glm?

2015-01-11 Thread Marc Schwartz

 On Jan 11, 2015, at 4:00 PM, Ben Bolker bbol...@gmail.com wrote:
 
 Stanislav Aggerwal stan.aggerwal at gmail.com writes:
 
 
 I have the following problem.
 DV is binomial p
 IV is quantitative variable that goes from negative to positive values.
 
 The data look like this (need nonproportional font to view):
 
 
  [snip to make gmane happy]
 
 If these data were symmetrical about zero, 
 I could use abs(IV) and do glm(p ~ absIV).
 I suppose I could fit two glms, one to positive and one to negative IV
 values. Seems a rather ugly approach.
 
 
 [snip]
 
 
  What's wrong with a GLM with quadratic terms in the predictor variable?
 
 This is perfectly respectable, well-defined, and easy to implement:
 
  glm(y~poly(x,2),family=binomial,data=...)
 
 or   y~x+I(x^2)  or y~poly(x,2,raw=TRUE)
 
 (To complicate things further, this is within-subjects design)
 
 glmer, glmmPQL, glmmML, etc. should all support this just fine.


As an alternative to Ben's recommendation, consider using a piecewise cubic 
spline on the IV. This can be done using glm():

  # splines is part of the Base R distribution
  # I am using 'df = 5' below, but this can be adjusted up or down as may be 
apropos
  require(splines)
  glm(DV ~ ns(IV, df = 5), family = binomial, data = YourDataFrame)


and as Ben's notes, is more generally supported in mixed models.

If this was not mixed model, another logistic regression implementation is in 
Frank's rms package on CRAN, using his lrm() instead of glm() and rcs() instead 
of ns():

# after installing rms from CRAN
require(rms)
lrm(DV ~ rcs(IV, 5), data = YourDataFrame)
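
By way of illustration, a small simulated example of the glm()/ns() call above 
(the data are purely artificial, not the poster's):

  set.seed(1)
  IV <- runif(200, -3, 3)
  DV <- rbinom(200, size = 1, prob = plogis(-1 + 1.5 * IV^2))  # non-monotone in IV
  require(splines)
  fit <- glm(DV ~ ns(IV, df = 5), family = binomial)
  summary(fit)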


Regards,

Marc Schwartz

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] i need help for var.test()

2015-01-08 Thread Marc Schwartz

 On Jan 8, 2015, at 5:12 AM, sait k sa...@hotmail.de wrote:
 
 Dear Sir or Madam,
 I want to use var.test() (the F test) for n samples.
 But in R, var.test() can only be used for the variances of two samples. The 
 documentation states: "Performs an F test to compare the variances of two 
 samples from normal populations."
 I need a variance test for n samples. It would be great if you could tell me 
 which test I can use in R for this problem.
 Thank you for the help.
 Yours sincerely,
 Sait Polat


You can take a look at ?bartlett.test (which is listed in the See Also section 
of ?var.test) or perhaps ?fligner.test for a non-parametric method.

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to group by then count?

2015-01-06 Thread Marc Schwartz

 On Jan 6, 2015, at 3:29 PM, Monnand monn...@gmail.com wrote:
 
 Thank you, all! Your replies are very useful, especially Don's explanation!
 
 One complaint I have is: the function name (table) is really not very
 informative.


Why not? You used the word 'table' in your original post, except as Don noted, 
you were overthinking the problem.

The basic concept is a tabulation of discrete values in a vector, which is a 
basic analytic method.

Using commands like:

  ??table
  ??frequency

would have led you to the table() function, as well as others.

Believe it or not, taking a few minutes to have read/searched "An Introduction 
to R", which is the basic R manual, would have led you to the same solution:

  
http://cran.r-project.org/doc/manuals/r-release/R-intro.html#Frequency-tables-from-factors
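
For completeness, a short sketch using your own example vector:

  x <- c("1", "1", "2", "1", "5", "2")
  tab <- table(x)
  tab
  names(tab)[which.max(tab)]   # the most frequent value, here "1"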

Regards,

Marc Schwartz


 
 On Sun Jan 04 2015 at 5:03:47 PM MacQueen, Don macque...@llnl.gov wrote:
 
 This seems to me to be a case where thinking in terms of computer
 programming concepts is getting in the way a bit. Approach it as a data
 analysis task; the S language (upon which R is based) is designed in part
 for data analysis so there is a function that does most of the job for you.
 
 (I changed your vector of strings to make the result more easily
 interpreted)
 
  x = c("1", "1", "2", "1", "5", "2", "3", "5", "5", "2", "2")
  tmp <- table(x)  ## counts the number of appearances of each element
  tmp[tmp == max(tmp)]   ## finds which one occurs most often
 2
 4
 
 Meaning that the element '2' appears 4 times.  The table() function should
 be fast even with long vectors. Here's an example with a vector of length
 1 million:
 
 foo - table( sample(letters, 1e6, replace=TRUE) )
 
 
  One of the seminal books on the S language is John M Chambers' "Programming
  with Data" -- and I would emphasize the "with Data" part of that title.
 
 --
 
 Don MacQueen
 
 Lawrence Livermore National Laboratory
 7000 East Ave., L-627
 Livermore, CA 94550
 925-423-1062
 
 
 
 
 
 On 1/4/15, 1:02 AM, Monnand monn...@gmail.com wrote:
 
 Hi all,
 
 I thought this was a very naive problem but I have not found any solution
 which is idiomatic to R.
 
 The problem is like this:
 
 Assuming we have vector of strings:
  x = c("1", "1", "2", "1", "5", "2")
 
  We want to count the number of appearances of each string, i.e. in vector x,
  string "1" appears 3 times; "2" appears twice and "5" appears once. Then I
  want to know which string is the majority. In this case, it is "1".
 
 For imperative languages like C, C++ Java and python, I would use a hash
 table to count each strings where keys are the strings and values are the
 number of appearance. For functional languages like clojure, there're
 higher order functions like group-by.
 
 However, for R, I can hardly find a good solution to this simple problem.
 I
 found a hash package, which implements hash table. However, installing a
 package simple for a hash table is really annoying for me. I did find
 aggregate and other functions which operates on data frames. But in my
 case, it is a simple vector. Converting it to a data frame may be not
 desirable. (Or is it?)
 
 Could anyone suggest me an idiomatic way of doing such job in R? I would
 be
 appreciate for your help!
 
 -Monnand

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] move date-values from one line to several lines

2014-12-02 Thread Marc Schwartz

 On Dec 2, 2014, at 9:29 AM, Matthias Weber matthias.we...@fntsoftware.com 
 wrote:
 
 Hello together,
 
 I have a data.frame with date-values. What I want is a data.frame with 
 several lines for each date.
 
 My current data.frame looks like this one:
 
 ID   FROM         TO           REASON
 1    2015-02-27   2015-02-28   Holiday
 1    2015-03-15   2015-03-20   Illness
 2    2015-05-20   2015-02-23   Holiday
 2    2015-06-01   2015-06-03   Holiday
 2    2015-07-01   2015-07-01   Illness
 
 The result looks like this one:
 
 ID   DATE         REASON
 1    2015-02-27   Holiday
 1    2015-02-28   Holiday
 1    2015-03-15   Illness
 1    2015-03-16   Illness
 1    2015-03-17   Illness
 1    2015-03-18   Illness
 1    2015-03-19   Illness
 1    2015-03-20   Illness
 2    2015-05-20   Holiday
 2    2015-05-21   Holiday
 2    2015-05-22   Holiday
 2    2015-05-23   Holiday
 2    2015-06-01   Holiday
 2    2015-06-02   Holiday
 2    2015-06-02   Holiday
 2    2015-07-01   Illness
 
 Maybe anyone can help me, how I can do this.
 
 Thank you.
 
 Best regards.
 
 Mat


A quick and dirty approach.

First, note that in your source data frame, the TO value in the third row is 
incorrect. I changed it here:

 DF
  ID   FROM TO  REASON
1  1 2015-02-27 2015-02-28 Holiday
2  1 2015-03-15 2015-03-20 Illness
3  2 2015-05-20 2015-05-23 Holiday
4  2 2015-06-01 2015-06-03 Holiday
5  2 2015-07-01 2015-07-01 Illness
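
For reference, a sketch of how such a DF might be constructed; the original 
post did not show this step. The FROM and TO columns need to be of class Date, 
which the seq(..., by = "day") calls below require:

DF <- data.frame(ID     = c(1, 1, 2, 2, 2),
                 FROM   = as.Date(c("2015-02-27", "2015-03-15", "2015-05-20",
                                    "2015-06-01", "2015-07-01")),
                 TO     = as.Date(c("2015-02-28", "2015-03-20", "2015-05-23",
                                    "2015-06-03", "2015-07-01")),
                 REASON = c("Holiday", "Illness", "Holiday", "Holiday", "Illness"))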

With that in place, you can use R's recycling of values to create multiple data 
frame rows from the date sequences and the single ID and REASON entries:

i <- 1

 data.frame(ID = DF$ID[i], DATE = seq(DF$FROM[i], DF$TO[i], by = "day"), 
  REASON = DF$REASON[i])
  ID   DATE  REASON
1  1 2015-02-27 Holiday
2  1 2015-02-28 Holiday


So just put that into an lapply() based loop, which returns a list:

 DF.TMP <- lapply(seq(nrow(DF)), 
                  function(i) data.frame(ID = DF$ID[i], 
                                         DATE = seq(DF$FROM[i], DF$TO[i], by = "day"), 
                                         REASON = DF$REASON[i]))

 DF.TMP
[[1]]
  ID   DATE  REASON
1  1 2015-02-27 Holiday
2  1 2015-02-28 Holiday

[[2]]
  ID   DATE  REASON
1  1 2015-03-15 Illness
2  1 2015-03-16 Illness
3  1 2015-03-17 Illness
4  1 2015-03-18 Illness
5  1 2015-03-19 Illness
6  1 2015-03-20 Illness

[[3]]
  ID   DATE  REASON
1  2 2015-05-20 Holiday
2  2 2015-05-21 Holiday
3  2 2015-05-22 Holiday
4  2 2015-05-23 Holiday

[[4]]
  ID   DATE  REASON
1  2 2015-06-01 Holiday
2  2 2015-06-02 Holiday
3  2 2015-06-03 Holiday

[[5]]
  ID   DATE  REASON
1  2 2015-07-01 Illness


Then use do.call() on the result:

 do.call(rbind, DF.TMP)
   ID   DATE  REASON
1   1 2015-02-27 Holiday
2   1 2015-02-28 Holiday
3   1 2015-03-15 Illness
4   1 2015-03-16 Illness
5   1 2015-03-17 Illness
6   1 2015-03-18 Illness
7   1 2015-03-19 Illness
8   1 2015-03-20 Illness
9   2 2015-05-20 Holiday
10  2 2015-05-21 Holiday
11  2 2015-05-22 Holiday
12  2 2015-05-23 Holiday
13  2 2015-06-01 Holiday
14  2 2015-06-02 Holiday
15  2 2015-06-03 Holiday
16  2 2015-07-01 Illness


See ?seq.Date for the critical step.

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] install R without texlive dependencies

2014-11-20 Thread Marc Schwartz

 On Nov 20, 2014, at 8:00 AM, Duncan Murdoch murdoch.dun...@gmail.com wrote:
 
 On 20/11/2014 3:31 AM, Muraki Kazutaka wrote:
 Hi all
 I'm trying to install R from the EPEL repo on Scientific Linux. It pulls in
 texlive rpm dependencies from the repo as well, but I already have
 installed TexLive with tlmgr from a CTAN mirror and I don't want the texlive
 rpms from the linux repo.
 So... Question is... How can I install R without these deps and at the
 same time have both R and TexLive work without any problems.
 
 You might get an answer to your question here, but this is more a question 
 about your Linux distribution, and you will likely have better luck asking on 
 a forum dedicated to it.  R should be perfectly happy working with the 
 TexLive you already have.
 
 Duncan Murdoch


As Duncan noted, since this is Linux distro specific, you would be better off 
posting to R-SIG-Fedora, which is for R on RH based distros:

  https://stat.ethz.ch/mailman/listinfo/r-sig-fedora

A number of the R related RH/Fedora package maintainers monitor that list and 
you can discuss the nuances of some of the dependencies for R there.

That being said and while it has been a number of years for me on Linux, a 
Google search did not turn up any CLI options for 'yum' to be able to install 
without the hard coded RPM dependencies.

However, there would be an option using 'rpm' at the command line, along the 
lines of:

  rpm -ivh --nodeps RPMName.rpm

where you can either download the R RPM and install it locally, or include the 
full URL to the RPM on the EPEL server.

Of course, the above incantation can leave you without other needed 
dependencies, so use with caution.

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Adapt sweave function to produce an automatic pdf.

2014-11-20 Thread Marc Schwartz

 On Nov 20, 2014, at 4:03 AM, Frederic Ntirenganya ntfr...@gmail.com wrote:
 
 Hi All,
 
 I want to make a climate method (sweave_function). The aim is to be able
 to adapt Sweave code that produces an automatic pdf so that it works
 for my climate object, i.e. instead of compiling the pdf, I call:
 data_obj$sweave_function() and get the pdf. For instance, I have
 boxplot_method() and I want to output it in this way. Thanks for the help!!!
 
 Ex: This how I started but I don't understand how I can proceed.
 
 climate$methods(sweave_function = function(climate_data_objs_str,
 climate_data_objs_ind
 ) {
 
 #-
  #  This function returns the pdf using sweave function in R
  #  The required arguments are:
  #   climate_data_objs_str  : list of the names of climate data objects in
 the climate object
  #   climate_data_objs_ind  : list of the indices of climate data object
 in the climate object
  #   note that one of the above arguments is enough.
 
 #
 
 
  # get_climate_data_objects returns a list of the climate_data objects
 specified
  # in the arguements.
  # If no objects specified then all climate_data objects will be taken by
 default
 
  climate_data_objs = get_climate_data_objects(climate_data_objs_str,
 climate_data_objs_ind)
 })


You need to either dynamically generate the entire final .tex file including 
the preamble and so forth via your function(s)

or:

alternatively use a .Rnw Sweave master file template that contains the static 
content and then in that file, insert the dynamic content using one or more 
directives along the lines of \input{InsertContentHere.tex}, where 
InsertContentHere.tex contains raw TeX content to be inserted at that point in 
the master file when processed by Sweave. 

The content of the *.tex files can be generated via various R functions 
including ?cat for raw text and others like the xtable() function in the CRAN 
package of the same name or the latex() function in Hmisc, which can be used to 
create tables, etc.

Note that the content of InsertContentHere.tex will not itself be processed by 
Sweave, as if it was a child .Rnw file. If you want that type of 
functionality, you would need to use \SweaveInput{InsertContentHere.Rnw}.

Once you have your final .tex file, you can then run pdflatex on that file via 
?system.

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sweave package for R version 3.02

2014-11-11 Thread Marc Schwartz

 On Nov 11, 2014, at 5:25 AM, Frederic Ntirenganya ntfr...@gmail.com wrote:
 
 Hi All,
 
 I would like to install the package sweave but got the following warning:
 
 install.packages("sweave")
 Installing package into ‘/home/fredo/R/x86_64-pc-linux-gnu-library/3.0’
 (as ‘lib’ is unspecified)
 Warning in install.packages :
  package ‘sweave’ is not available (for R version 3.0.2)
 
 I tried to download its zip file but could not get it.
 
 Anyone help me on how I can do it. Thanks
 
 Regards,
 Frederic.


Sweave is part of the 'utils' package, which is a part of base R and does not 
need to be installed, it already is.

BTW, 3.1.2 is the current version of R. 3.0.2 is over a year old already.

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Inverse Student t-value

2014-09-30 Thread Marc Schwartz
FWIW, I get:

  4.117456652 in Excel 2011 on OS X

and:

  4.117457 in R 3.1.1 on OS X

There is a KB article on the TINV function here, suggesting that the threshold 
for the iterative algorithm in Excel has been tightened in recent versions:

  http://support.microsoft.com/kb/828340
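
For reference, the R computation behind the figure quoted above, using the 
two-tailed probability 0.05/1223 from the thread below:

  p <- 0.05 / 1223           # 0.0000408831
  abs(qt(p / 2, df = 1221))  # 4.117457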

Regards,

Marc Schwartz

On Sep 30, 2014, at 1:49 PM, jlu...@ria.buffalo.edu wrote:

  My Excel (2013) returns exactly what R does.   I used both T.INV and 
  T.INV.2T. There is no TINV.  Has Excel been updated?
 
 
 
 
 
 Duncan Murdoch murdoch.dun...@gmail.com 
 Sent by: r-help-boun...@r-project.org
 09/30/2014 02:36 PM
 
 To
 Andre geomodel...@gmail.com, 
 cc
 r-help@r-project.org
 Subject
 Re: [R] Inverse Student t-value
 
 
 
 
 
 
 On 30/09/2014 2:26 PM, Andre wrote:
 Hi Duncan,
 
  Actually, I am trying to trace the formula for the Critical value of Z 
  and the manual formula is 
 =(I7-1)/SQRT(I7)*SQRT((TINV(0.05/I7,I7-2))^2/(I7-2+TINV(0.05/I7,I7-2)))
 
  So, I got a new problem with the TINV formula. I just need a manual equation 
  for TINV.
 
 Sorry, can't help.  I'm not sure I understand what you want, but if it's 
 a simple formula for quantiles of the t distribution, it doesn't exist.
 
 Duncan Murdoch
 
 
 Hope solve this problem.
 
 Cheers!
 
 
 On Wed, Oct 1, 2014 at 1:20 AM, Duncan Murdoch 
 murdoch.dun...@gmail.com mailto:murdoch.dun...@gmail.com wrote:
 
On 30/09/2014 2:11 PM, Andre wrote:
 
Hi Duncan,
 
No, that's correct. Actually, I have data set below;
 
 
Then it seems Excel is worse than I would have expected.  I
confirmed R's value in two other pieces of software,
OpenOffice and some software I wrote a long time ago based on an
algorithm published in 1977 in Applied Statistics. (They are
probably all using the same algorithm.  I wonder what Excel is 
 doing?)
 
N= 1223
alpha= 0.05
 
Then
 probability = 0.05/1223 = 0.0000408831
 degrees of freedom = 1223-2 = 1221
 
 So, TINV(0.0000408831, 1221) returns 4.0891672
 
 
 Could you show me a manual equation in more detail? I would really
 appreciate it if you could give more detail.
 
 
 I already gave you the expression:  abs(qt(0.0000408831/2,
df=1221)). For more detail, I suppose you could look at the help
page for the qt function, using help(qt).
 
Duncan Murdoch
 
 
Cheers!
 
 
On Wed, Oct 1, 2014 at 1:01 AM, Duncan Murdoch
murdoch.dun...@gmail.com mailto:murdoch.dun...@gmail.com
mailto:murdoch.dun...@gmail.com
mailto:murdoch.dun...@gmail.com wrote:
 
On 30/09/2014 1:31 PM, Andre wrote:
 
Dear Sir/Madam,
 
            I am trying to use a calculation for the two-tailed inverse of the
            Student's t-distribution function presented by Excel functions
            like =TINV(probability, deg_freedom).
 
            For instance: The Excel function
            =TINV(0.0000408831, 1221) returns
            4.0891672.
 
Would you like to show me a manual calculation for this?
 
Appreciate your helps in advance.
 
 
        That number looks pretty far off the true value. Have you
        got a typo in your example?
 
You can compute the answer to your question as
        abs(qt(0.0000408831/2, df=1221)), but you'll get 4.117.
 
Duncan Murdoch
 
 
 
 
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error : '.path.package' is defunct.

2014-09-25 Thread Marc Schwartz
On Sep 25, 2014, at 1:02 PM, Yuan, Rebecca rebecca.y...@bankofamerica.com 
wrote:

 Hello all,
 
 After this reinstallation of R 3.1.1 and Rstudio 0.98.1028, I have the 
 following error messages whenever I tried to load a library to it:
 
 library('zoo')
 Error : '.path.package' is defunct.
 Use 'path.package' instead.
 See help(Defunct)
 
 Attaching package: 'zoo'
 
 The following objects are masked from 'package:base':
 
as.Date, as.Date.numeric
 
 Could you please help on this?
 
 Thanks!
 
 Rebecca


Check the version of 'zoo' that you have installed by using:

  library(help = zoo)

More than likely, you have an old version of the package installed (current is 
1.7-11) that still uses the now defunct function, hence the error message.

You can run:

  update.packages(checkBuilt = TRUE)

to update all of your installed packages and be sure that they are built for 
the current version of R that you now have running.

There may be other nuances here, such as OS, having Admin access and where the 
CRAN packages are installed, but at least checking the version of zoo will be a 
good starting point.
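
A quick sketch of both checks:

  packageVersion("zoo")                            # installed version of zoo
  update.packages(checkBuilt = TRUE, ask = FALSE)  # rebuild-aware update of all packages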

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to combine character month and year columns into one column

2014-09-23 Thread Marc Schwartz
On Sep 23, 2014, at 10:41 AM, Kuma Raj pollar...@gmail.com wrote:

 Dear R users,
 
 I have a data with  month and year columns which are both characters
 and wanted to create a new column like Jan-1999
 with the following code. The result is all NA for the month part. What
 is wrong with the and what is the right way to combine the two?
 
 ddf$MonthDay <- paste(month.abb[ddf$month], ddf$Year, sep="-")
 
 
 Thanks
 
 dput(ddf)
 structure(list(month = c("01", "02", "03", "04", "05", "06",
 "07", "08", "09", "10", "11", "12"), Year = c("1999", "1999",
 "1999", "1999", "1999", "1999", "1999", "1999", "1999", "1999",
 "1999", "1999"), views = c(42, 49, 44, 38, 37, 35, 38, 39, 38,
 39, 38, 46), MonthDay = c("NA-1999", "NA-1999", "NA-1999", "NA-1999",
 "NA-1999", "NA-1999", "NA-1999", "NA-1999", "NA-1999", "NA-1999",
 "NA-1999", "NA-1999")), .Names = c("month", "Year", "views",
 "MonthDay"), row.names = 109:120, class = "data.frame")
 
 



Since you are trying to use ddf$month as an index into month.abb, you will 
either need to coerce ddf$month to numeric in your code, or adjust how the data 
frame is created.

In the case of the former approach:

 paste(month.abb[as.numeric(ddf$month)], ddf$Year, sep="-")
 [1] "Jan-1999" "Feb-1999" "Mar-1999" "Apr-1999" "May-1999" "Jun-1999"
 [7] "Jul-1999" "Aug-1999" "Sep-1999" "Oct-1999" "Nov-1999" "Dec-1999"


Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to combine character month and year columns into one column

2014-09-23 Thread Marc Schwartz
Two things:

1. You need to convert the result of the paste() to a Date related class.

2. R's standard Date classes require a full date, so you would have to add in 
some default day of the month:

See ?as.Date

NewDate <- as.Date(paste(month.abb[as.numeric(ddf$month)], "01", ddf$Year, 
                         sep = "-"), 
                   format = "%b-%d-%Y")

or without using month.abb, which is not really needed. Note the difference in 
the format argument:

NewDate <- as.Date(paste(as.numeric(ddf$month), "01", ddf$Year, sep = "-"), 
                   format = "%m-%d-%Y")

 class(NewDate)
[1] "Date"

 str(NewDate)
 Date[1:12], format: "1999-01-01" "1999-02-01" "1999-03-01" "1999-04-01" ...


You can then format the output of NewDate as you might require:

 format(NewDate, format = "%b-%d-%Y")
 [1] "Jan-01-1999" "Feb-01-1999" "Mar-01-1999" "Apr-01-1999"
 [5] "May-01-1999" "Jun-01-1999" "Jul-01-1999" "Aug-01-1999"
 [9] "Sep-01-1999" "Oct-01-1999" "Nov-01-1999" "Dec-01-1999"


Note that the output of the last step is a character vector:

 str(format(NewDate, format = "%b-%d-%Y"))
 chr [1:12] "Jan-01-1999" "Feb-01-1999" "Mar-01-1999" ...

which is fine for formatting/printing, even though NewDate is a Date class 
object.


Alternatively, I believe that Gabor's 'zoo' package on CRAN has a 'yearmon' 
class for this type of partial date.
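
A small sketch of that alternative (assuming zoo is installed; ddf as in your 
dput() above):

  library(zoo)
  ym <- as.yearmon(paste(ddf$Year, ddf$month, sep = "-"), "%Y-%m")
  ym
  format(ym, "%b-%Y")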

Regards,

Marc


On Sep 23, 2014, at 12:04 PM, Kuma Raj pollar...@gmail.com wrote:

 Many thanks for your quick answer which has created what I wished. May
 I ask followup question on the same issue. I failed to convert the new
 column into date format with this code. The class of MonthDay is still
 character
 
 df$MonthDay <- format(df$MonthDay, format = c("%b %Y"))
 I would appreciate if you could suggest a working solution
 Thanks
 
 
 On 23 September 2014 18:03, Marc Schwartz marc_schwa...@me.com wrote:
 On Sep 23, 2014, at 10:41 AM, Kuma Raj pollar...@gmail.com wrote:
 
 Dear R users,
 
 I have a data with  month and year columns which are both characters
 and wanted to create a new column like Jan-1999
 with the following code. The result is all NA for the month part. What
 is wrong with the and what is the right way to combine the two?
 
  ddf$MonthDay <- paste(month.abb[ddf$month], ddf$Year, sep="-")
 
 
 Thanks
 
 dput(ddf)
  structure(list(month = c("01", "02", "03", "04", "05", "06",
  "07", "08", "09", "10", "11", "12"), Year = c("1999", "1999",
  "1999", "1999", "1999", "1999", "1999", "1999", "1999", "1999",
  "1999", "1999"), views = c(42, 49, 44, 38, 37, 35, 38, 39, 38,
  39, 38, 46), MonthDay = c("NA-1999", "NA-1999", "NA-1999", "NA-1999",
  "NA-1999", "NA-1999", "NA-1999", "NA-1999", "NA-1999", "NA-1999",
  "NA-1999", "NA-1999")), .Names = c("month", "Year", "views",
  "MonthDay"), row.names = 109:120, class = "data.frame")
 
 
 
 
 
 Since you are trying to use ddf$month as an index into month.abb, you will 
 either need to coerce ddf$month to numeric in your code, or adjust how the 
 data frame is created.
 
 In the case of the former approach:
 
  paste(month.abb[as.numeric(ddf$month)], ddf$Year, sep="-")
  [1] "Jan-1999" "Feb-1999" "Mar-1999" "Apr-1999" "May-1999" "Jun-1999"
  [7] "Jul-1999" "Aug-1999" "Sep-1999" "Oct-1999" "Nov-1999" "Dec-1999"
 
 
 Regards,
 
 Marc Schwartz
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to combine character month and year columns into one column

2014-09-23 Thread Marc Schwartz
Hi David,

My initial reaction (not that the decision is mine to make), is that from a 
technical perspective, obviously indexing by name is common.

There are two considerations, off the top of my head:

1. There would be a difference, of course, between:

 month.abb["1"]
<NA> 
  NA 

and

 month.abb["01"]
   01 
"Jan" 


Thus, is this approach overly fragile and potentially going to create more 
problems (bugs, head scratching, etc.) than it solves?


2. From a consistency standpoint, I don't see an indication that other built-in 
constants have similar name attributes, not that I did an exhaustive review. So 
I suspect that if there were reasonable justification for it here, it would 
also need to at least be considered for other constants, which increases the 
scope of work a good bit.


If there is a desire for this, one could file an RFE at 
https://bugs.r-project.org to gauge the reactions from R Core, unless they 
comment here first.

Regards,

Marc


On Sep 23, 2014, at 12:47 PM, David Winsemius dwinsem...@comcast.net wrote:

 Marc;
 
 Feature request:
 
 Would it make sense to construct month.abb as a named vector so that the 
 operation that was attempted would have succeeded? Adding alphanumeric names 
  c("01", "02", "03", "04", "05", "06",
  "07", "08", "09", "10", "11", "12") would allow character extraction from 
 substring or regex extracted month values which are always character-class.
 
 Example:
 
  names(month.abb) <- c("01", "02", "03", "04", "05", "06",
  + "07", "08", "09", "10", "11", "12")
  month.abb
     01    02    03    04    05    06    07    08    09    10    11    12 
  "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec" 
  
  
  month.abb[ substr(Sys.Date(), 6, 7) ]
     09 
  "Sep" 
 
 -- 
 David.
 
 On Sep 23, 2014, at 9:03 AM, Marc Schwartz wrote:
 
 On Sep 23, 2014, at 10:41 AM, Kuma Raj pollar...@gmail.com wrote:
 
 Dear R users,
 
 I have a data with  month and year columns which are both characters
 and wanted to create a new column like Jan-1999
 with the following code. The result is all NA for the month part. What
 is wrong with the and what is the right way to combine the two?
 
  ddf$MonthDay <- paste(month.abb[ddf$month], ddf$Year, sep="-")
 
 
 Thanks
 
 dput(ddf)
  structure(list(month = c("01", "02", "03", "04", "05", "06",
  "07", "08", "09", "10", "11", "12"), Year = c("1999", "1999",
  "1999", "1999", "1999", "1999", "1999", "1999", "1999", "1999",
  "1999", "1999"), views = c(42, 49, 44, 38, 37, 35, 38, 39, 38,
  39, 38, 46), MonthDay = c("NA-1999", "NA-1999", "NA-1999", "NA-1999",
  "NA-1999", "NA-1999", "NA-1999", "NA-1999", "NA-1999", "NA-1999",
  "NA-1999", "NA-1999")), .Names = c("month", "Year", "views",
  "MonthDay"), row.names = 109:120, class = "data.frame")
 
 
 
 
 
 Since you are trying to use ddf$month as an index into month.abb, you will 
 either need to coerce ddf$month to numeric in your code, or adjust how the 
 data frame is created.
 
 In the case of the former approach:
 
  paste(month.abb[as.numeric(ddf$month)], ddf$Year, sep="-")
  [1] "Jan-1999" "Feb-1999" "Mar-1999" "Apr-1999" "May-1999" "Jun-1999"
  [7] "Jul-1999" "Aug-1999" "Sep-1999" "Oct-1999" "Nov-1999" "Dec-1999"
 
 
 Regards,
 
 Marc Schwartz
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 David Winsemius
 Alameda, CA, USA
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] To Add a variable from Df1 to Df2 which have a same common variable

2014-09-19 Thread Marc Schwartz
 Cadre, 61-Homme-Non Cadre,
 61-Homme-Non Cadre, 61-Homme-Non Cadre, 61-Homme-Non Cadre,
 61-Homme-Non Cadre, 61-Homme-Non Cadre, 62-Femme-Non Cadre,
 62-Femme-Non Cadre, 62-Femme-Non Cadre, 62-Femme-Non Cadre,
 62-Femme-Non Cadre, 62-Femme-Non Cadre, 62-Homme-Non Cadre,
 62-Homme-Non Cadre, 62-Homme-Non Cadre, 62-Homme-Non Cadre,
 62-Homme-Non Cadre, 62-Homme-Non Cadre, 62-Homme-Non Cadre,
 62-Homme-Non Cadre, 62-Homme-Non Cadre, 62-Homme-Non Cadre,
 62-Homme-Non Cadre, 62-Homme-Non Cadre, 62-Homme-Non Cadre,
 62-Homme-Non Cadre, 62-Homme-Non Cadre, 62-Homme-Non Cadre,
 62-Homme-Non Cadre, 62-Homme-Non Cadre, 62-Homme-Non Cadre,
 62-Homme-Non Cadre, 62-Homme-Non Cadre, 62-Homme-Non Cadre,
 62-Homme-Non Cadre, 63-Femme-Non Cadre, 63-Femme-Non Cadre,
 63-Femme-Non Cadre, 63-Femme-Non Cadre, 63-Femme-Non Cadre,
 63-Femme-Non Cadre, 63-Homme-Non Cadre, 63-Homme-Non Cadre,
 63-Homme-Non Cadre, 63-Homme-Non Cadre, 63-Homme-Non Cadre,
 63-Homme-Non Cadre, 63-Homme-Non Cadre, 63-Homme-Non Cadre,
 63-Homme-Non Cadre, 63-Homme-Non Cadre, 64-Femme-Non Cadre,
 64-Femme-Non Cadre, 64-Femme-Non Cadre, 64-Femme-Non Cadre,
 64-Femme-Non Cadre, 64-Femme-Non Cadre, 64-Femme-Non Cadre,
 64-Femme-Non Cadre, 64-Femme-Non Cadre, 64-Homme-Non Cadre,
 64-Homme-Non Cadre, 64-Homme-Non Cadre, 64-Homme-Non Cadre,
 64-Homme-Non Cadre, 64-Homme-Non Cadre, 64-Homme-Non Cadre,
 64-Homme-Non Cadre, 64-Homme-Non Cadre, 64-Homme-Non Cadre,
 64-Homme-Non Cadre, 64-Homme-Non Cadre, 64-Homme-Non Cadre,
 64-Homme-Non Cadre, 64-Homme-Non Cadre, 64-Homme-Non Cadre,
 65-Homme-Non Cadre, 65-Homme-Non Cadre, 65-Homme-Non Cadre,
 65-Homme-Non Cadre, 65-Homme-Non Cadre, 65-Homme-Non Cadre,
 65-Homme-Non Cadre, 65-Homme-Non Cadre, 65-Homme-Non Cadre,
 65-Homme-Non Cadre, 65-Homme-Non Cadre, 66-Femme-Non Cadre,
 66-Femme-Non Cadre, 66-Femme-Non Cadre, 66-Homme-Non Cadre,
 66-Homme-Non Cadre, 66-Homme-Non Cadre, 66-Homme-Non Cadre,
 66-Homme-Non Cadre, 66-Homme-Non Cadre, 66-Homme-Non Cadre,
 66-Homme-Non Cadre, 66-Homme-Non Cadre, 66-Homme-Non Cadre,
 66-Homme-Non Cadre, 67-Homme-Non Cadre, 67-Homme-Non Cadre,
 67-Homme-Non Cadre, 68-Homme-Non Cadre, 68-Homme-Non Cadre,
 68-Homme-Non Cadre, 68-Homme-Non Cadre, 68-Homme-Non Cadre,
 68-Homme-Non Cadre, 69-Homme-Non Cadre, 69-Homme-Non Cadre
 )), .Names = c(Matricule, AgeSexeCadNCad), class = data.frame, 
 row.names = c(37,
 58, 79, 104, 163, 220, 263, 276, 333, 422,
 442, 587, 653, 684, 21, 25, 35, 42, 45, 47,
 73, 76, 93, 100, 118, 133, 137, 138, 158, 174,
 176, 179, 204, 208, 231, 249, 254, 312, 325,
 439, 491, 500, 825, 928, 954, 1093, 1116, 1128,
 1136, 1141, 1143, 1212, 1232, 1270, 1396, 14,
 56, 66, 106, 148, 153, 226, 308, 717, 720,
 1046, 1287, 36, 41, 54, 124, 144, 188, 197,
 198, 201, 206, 242, 262, 377, 598, 611, 633,
 683, 714, 742, 919, 980, 993, 1000, 1071, 1073,
 1127, 1223, 32, 121, 456, 458, 462, 1013, 27,
 43, 53, 59, 65, 67, 75, 77, 83, 97, 103,
 107, 109, 110, 328, 412, 516, 698, 715, 740,
 1122, 1267, 1824, 16, 452, 540, 557, 870, 1086,
 5, 82, 94, 115, 123, 209, 339, 341, 862, 2211,
 20, 61, 152, 358, 685, 760, 803, , 1134,
 11, 22, 33, 49, 92, 193, 241, 394, 396, 463,
 522, 595, 896, 1097, 1129, 1302, 7, 9, 18,
 26, 81, 85, 185, 728, 884, 1029, 1155, 297,
 479, 842, 3, 13, 15, 23, 51, 55, 63, 199,
 574, 655, 1119, 48, 668, 1125, 1, 6, 10, 24,
 40, 154, 2, 117))


Hi,

Thanks for including data.

See ?merge, which performs an SQL-like join.

Since you have non-matching values between Df1 and Df2, you will need to decide 
if you want non-matching rows included in the resultant data frame or not (eg. 
a right/left outer or inner join). See the all, all.x and all.y arguments to 
merge(). 

The default (all = FALSE) is an inner join on matching rows only:

  Df3 <- merge(Df1, Df2, by = "AgeSexeCadNCad")

If you include non-matching values in the resultant data frame (eg. all = 
TRUE), Pourcent will contain NA's in those rows.

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] plot

2014-09-19 Thread Marc Schwartz
On Sep 19, 2014, at 10:48 AM, IZHAK shabsogh ishaqb...@yahoo.com wrote:

 Hi,
 kindly give me some guide on how to plot the following data in a single line 
 graph that is ( y1,y2,y3,y4 against x) including title and key
 
  y1 <- c(0.84, 1.03, 0.96)
  y2 <- c(1.30, 1.46, 1.48)
  y3 <- c(1.32, 1.47, 1.5)
  y4 <- c(0.07, 0.07, 0.07)
  x  <- c(500, 1000, 2000)
 
 Thanks
 Ishaq


See ?matplot and ?legend

matplot(x, cbind(y1, y2, y3, y4), type = "l", 
        main = "Plot Title", ylab = "Y Vals", 
        xlab = "X Vals")

legend("right", lty = 1:4, col = 1:4, 
       legend = c("y1", "y2", "y3", "y4"))


Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] X11/Intrinsic.h preventing build on rhel

2014-09-19 Thread Marc Schwartz

On Sep 19, 2014, at 1:28 PM, Gaurav Chakravorty gc...@circulumvite.com wrote:

 I am trying to build R-3.1.0 on RHEL
 But configure returns with an error due to X11/Intrinsic.h missing
 
 Is there a workaround ?
 


In most Linuxen, the header files are contained in *-dev[el] packages. For 
RHEL, this is likely to be libX11-devel, so you will need to install that RPM.

Note that a pre-compiled binary RPM for R is available from the EPEL for RHEL:

  https://fedoraproject.org/wiki/EPEL

Also note that there is the R-SIG-Fedora list, which covers support for RH 
based distros specifically:

  https://stat.ethz.ch/mailman/listinfo/r-sig-fedora

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using R in our commercial business application

2014-09-18 Thread Marc Schwartz
On Sep 18, 2014, at 4:36 AM, Pasu pasupat...@gmail.com wrote:

 Hi
 
 I would like to know how to use R in our commercial business application
 which we plan to host in cloud or deploy on customer's premise.
 
 1. Using R and its package, does it enforce that my commercial business
 application should be distributed under GPL, as the statistical derivation
 (output) by using R will be presented to the end users as part of of our
 commercial business application
 2. Whom to contact to get commercial license if required for using R?
 
 Rgds
 Pasupathy


You will not get a definitive legal opinion here and my comments below do not 
represent any formal opinion on the part of any organization.

There is nothing preventing you or your company from using R as an end user. 
There are many of us who use R in commercial settings and in general, the 
output of a GPL'd application (text or binary) is not considered to be also 
GPL'd.

The subtleties get into the distribution of R (which you seem to plan to do), 
the nature of any additional functionality/code that you or your company may 
write/distribute, how that code interacts with R and/or modifies R source code 
copyrighted by the R Foundation and others. If you distribute R to clients, you 
will need to make R's source code available to them in some manner along with 
any modifications to that same code, while preserving appropriate copyrights.

A proprietary (closed source) application cannot be licensed under the GPL, but 
your company's application/code may be forced to be GPL (the so called viral 
aspect of the GPL) depending upon how your application is implemented as I 
noted in the prior paragraph. Thus, you may be forced to make your source code 
available to your clients as well.

If you plan to move forward, you should consult with an attorney well educated 
in software licensing and distribution issues, especially as they pertain to 
the GPL. The risks are not inconsequential of falling on the wrong side of the 
GPL.

The official R distribution is not available via a commercial or developer 
license, but there are commercial vendors of R and a Google search will point 
you in their direction, if desired. However, since their products are founded 
upon the official R distribution and the GPL, they will have similar issues 
with respect to any enhancements that they have created and therefore, your 
concerns do not necessarily go away. They will have also consulted legal 
counsel on these issues because the viability of their business depends upon it.

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using R in our commercial business application

2014-09-18 Thread Marc Schwartz

On Sep 18, 2014, at 3:42 PM, Duncan Murdoch murdoch.dun...@gmail.com wrote:

 On 18/09/2014 2:35 PM, Marc Schwartz wrote:
 On Sep 18, 2014, at 4:36 AM, Pasu pasupat...@gmail.com wrote:
 
  Hi
 
  I would like to know how to use R in our commercial business application
  which we plan to host in cloud or deploy on customer's premise.
 
  1. Using R and its package, does it enforce that my commercial business
  application should be distributed under GPL, as the statistical derivation
  (output) by using R will be presented to the end users as part of of our
  commercial business application
  2. Whom to contact to get commercial license if required for using R?
 
  Rgds
  Pasupathy
 
 
 You will not get a definitive legal opinion here and my comments below do 
 not represent any formal opinion on the part of any organization.
 
 There is nothing preventing you or your company from using R as an end user. 
 There are many of us who use R in commercial settings and in general, the 
 output of a GPL'd application (text or binary) is not considered to be also 
 GPL'd.
 
 The subtleties get into the distribution of R (which you seem to plan to 
 do), the nature of any additional functionality/code that you or your 
 company may write/distribute, how that code interacts with R and/or modifies 
 R source code copyrighted by the R Foundation and others. If you distribute 
 R to clients, you will need to make R's source code available to them in 
 some manner along with any modifications to that same code, while preserving 
 appropriate copyrights.
 
 A proprietary (closed source) application cannot be licensed under the GPL, 
 but your company's application/code may be forced to be GPL (the so called 
 viral aspect of the GPL) depending upon how your application is implemented 
 as I noted in the prior paragraph. Thus, you may be forced to make your 
 source code available to your clients as well.
 
 If you plan to move forward, you should consult with an attorney well 
 educated in software licensing and distribution issues, especially as they 
 pertain to the GPL. The risks are not inconsequential of falling on the 
 wrong side of the GPL.
 
 The official R distribution is not available via a commercial or developer 
 license, but there are commercial vendors of R and a Google search will 
 point you in their direction, if desired. However, since their products are 
 founded upon the official R distribution and the GPL, they will have similar 
 issues with respect to any enhancements that they have created and 
 therefore, your concerns do not necessarily go away. They will have also 
 consulted legal counsel on these issues because the viability of their 
 business depends upon it.
 
 I agree with all of that but for one thing:  not all distributions are built 
 on the GPL'd original.  I believe Tibco is selling an independent 
 implementation.
 
 Duncan Murdoch


Thanks Duncan, I stand corrected. 

A quick Google search supports the point that the Tibco TERR system is an 
independent, closed-source, re-implementation of R, not based upon GPL R.

Regards,

Marc

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] frequencies of a discrete numeric variable, including zeros

2014-09-02 Thread Marc Schwartz
, 5L, 6L, 6L,
 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L,
 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 9L, 9L, 10L,
 11L, 12L, 12L, 16L, 19L)


Micheal,

Coerce the vector to be tabulated to a factor that contains all of the levels 
0:19, then use barplot():

art.fac <- factor(art, levels = 0:19)

 table(art.fac)
art.fac
  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17 
275 246 178  84  67  27  17  12   1   2   1   1   2   0   0   0   1   0 
 18  19 
  0   1 


barplot(table(art.fac), cex.names = 0.5)


Thanks for providing the data above.

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] split a string a keep the last part

2014-08-28 Thread Marc Schwartz

On Aug 28, 2014, at 12:41 PM, Jun Shen jun.shen...@gmail.com wrote:

 Hi everyone,
 
 I believe I am not the first one to have this problem but couldn't find a
 relevant thread on the list.
 
 Say I have a string (actually it is the whole column in a data frame) in a
 format like this:
 
 test <- 'AF14-485-502-89-00235'
 
 I would like to split the test string and keep the last part. I think I can
 do the following
 
 sub('.*-.*-.*-.*-(.*)','\\1', test)
 
 to keep the fifth part of the string. But this won't work if other strings
 have more or fewer parts separated by '-'. Is there a general way to do it?
 Thanks.
 
 Jun


Try this:

test <- 'AF14-485-502-89-00235'

 sub("^.*-(.*)$", "\\1", test)
[1] 00235


test <- 'AF14-485-502-89-00235-1234'

 sub("^.*-(.*)$", "\\1", test)
[1] 1234


Another option:

 tail(unlist(strsplit(test, "-")), 1)
[1] 1234
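
Since your real input is a whole data frame column rather than a single string, 
note that sub() above is already vectorized, while the strsplit()/tail() variant 
needs an sapply() wrapper. A quick sketch (the vector 'tests' is just illustrative):

tests <- c("AF14-485-502-89-00235", "AF14-485-502-89-00235-1234")

 sub("^.*-(.*)$", "\\1", tests)
[1] "00235" "1234" 

 sapply(strsplit(tests, "-"), tail, 1)
[1] "00235" "1234" 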


Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Installing RODBC

2014-08-20 Thread Marc Schwartz
On Aug 20, 2014, at 5:43 PM, William Deese williamde...@gmail.com wrote:

 I tried installing RODBC but got the following message:
 
 Checks were yes until the following
 
 checking sql.h usability... no
 checking sql.h presence... no
 checking for sql.h... no
 checking sqlext.h usability... no
 checking sqlext.h presence... no
 checking for sqlext.h... no
 configure: error: ODBC headers sql.h and sqlext.h not found
 ERROR: configuration failed for package ‘RODBC’
 * removing ‘/home/bill/R/x86_64-pc-linux-gnu-library/3.1/RODBC’
 
 Apparently RODBC was there when R was installed, but library() shows
 it is not there now, although the DBI package is. Best ideas for
 installing RODBC?
 
 Bill


You are missing the indicated header files, which are required if you are 
building the package from source.

As per the extensive vignette that Prof. Ripley has provided:

  http://cran.r-project.org/web/packages/RODBC/vignettes/RODBC.pdf

in Appendix A, which describes Installation, you will find:

For other systems the driver manager of choice is likely to be unixODBC, part 
of almost all Linux distributions and with sources downloadable from 
http://www.unixODBC.org. In Linux binary distributions it is likely that 
package unixODBC-devel or unixodbc-dev or some such will be needed.

Thus, for whatever Linux distribution you are using, install the relevant RPMs 
or Debs or ...
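
As a sketch of the sequence on such a system (assuming the relevant unixODBC 
development package has already been installed with the system package manager), 
the source install should then succeed from within R:

install.packages("RODBC", type = "source")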

Also, for future reference, there is a specific mailing list for DB related 
queries:

  https://stat.ethz.ch/mailman/listinfo/r-sig-db

and a search of the list archives, for example using rseek.org, would likely 
result in your finding queries and answers to this same issue over the years.

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] regex pattern assistance

2014-08-15 Thread Marc Schwartz
On Aug 15, 2014, at 11:18 AM, Tom Wright t...@maladmin.com wrote:

 Hi,
 Can anyone please assist.
 
 given the string 
 
 x <- "/mnt/AO/AO Data/S01-012/120824/"
 
 I would like to extract S01-012
 
 require(stringr)
 str_match(x, "\\/mnt\\/AO\\/AO Data\\/(.+)\\/+")
 str_match(x, "\\/mnt\\/AO\\/AO Data\\/(\\w+)\\/+")
 
 both nearly work. I expected I would use something like:
 str_match(x, "\\/mnt\\/AO\\/AO Data\\/([\\w -]+)\\/+")
 
 but I don't seem able to get the square bracket grouping to work
 correctly. Can someone please show me where I am going wrong?
 
 Thanks,
 Tom


Is the desired substring always in the same relative position in the path?

If so:

 strsplit(x, "/")
[[1]]
[1] mnt AO  AO Data S01-012 120824 

 unlist(strsplit(x, "/"))[5]
[1] S01-012



Alternatively, again, presuming the same position:

 gsub("/mnt/AO/AO Data/([^/]+)/.+", "\\1", x)
[1] S01-012


You don't need all of the double backslashes in your regex above. The '/' 
character is not a special regex character, whereas '\' is and needs to be 
escaped.

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] regex pattern assistance

2014-08-15 Thread Marc Schwartz

On Aug 15, 2014, at 11:56 AM, Tom Wright t...@maladmin.com wrote:

 WOW!!!
 
 What can I say 4 answers in less than 4 minutes. Thank you everyone. If
 I can't make it work now I don't deserve to. 
 
 btw. the strsplit approach wouldn't work for me as:
 a) I wanted to play with regex and 
 b) the location isn't consistent.


Tom,

If not in the same relative position, is the substring pattern always the same? 
That is 3 characters, a hyphen, then 3 characters? If so, would any other part 
of the path follow the same pattern or is it unique?

If the pattern is the same and is unique in the path:

 gsub(".*([[:alnum:]]{3}-[[:alnum:]]{3}).*", "\\1", x)
[1] S01-012


is another possible alternative and more flexible:

y <- "/mnt/AO/AO Data/Another Level/Yet Another Level/S01-012/120824/"

 gsub(".*([[:alnum:]]{3}-[[:alnum:]]{3}).*", "\\1", y)
[1] S01-012


z <- "/mnt/AO/AO Data/Another Level/Yet Another Level/S01-012/One More Level/120824/"

 gsub(".*([[:alnum:]]{3}-[[:alnum:]]{3}).*", "\\1", z)
[1] S01-012


 
 Nice to see email support still works, not everything has moved to
 linkedin and stackoverflow.


Stackoverflow?  ;-)

Regards,

Marc


 
 
 Thanks again,
 Tom
 
 
 On Fri, 2014-08-15 at 12:18 -0400, Tom Wright wrote:
 Hi,
 Can anyone please assist.
 
 given the string 
 
 x-/mnt/AO/AO Data/S01-012/120824/
 
 I would like to extract S01-012
 
 require(stringr)
 str_match(x,\\/mnt\\/AO\\/AO Data\\/(.+)\\/+)
 str_match(x,\\/mnt\\/AO\\/AO Data\\/(\\w+)\\/+)
 
 both nearly work. I expected I would use something like:
 str_match(x,\\/mnt\\/AO\\/AO Data\\/([\\w -]+)\\/+)
 
 but I don't seem able to get the square bracket grouping to work
 correctly. Can someone please show me where I am going wrong?
 
 Thanks,
 Tom

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] generating a sequence of seconds

2014-08-12 Thread Marc Schwartz

On Aug 12, 2014, at 1:51 PM, Erin Hodgess erinm.hodg...@gmail.com wrote:

 Hello!
 
 If I would like to generate a sequence of seconds for a date, I would do
 the following:
 
 x <- seq(from=as.POSIXct("2014-08-12 00:00:00"),
 to=as.POSIXct("2014-08-12 23:59:59"), by="secs")
 
 What if I just want the seconds vector without the date, please?  Is there
 a convenient way to create such a vector, please?
 
 thanks,
 Erin


Erin,

Do you want just the numeric vector of seconds, with the first value being 0, 
incrementing by 1 to the final value?

x <- seq(from = as.POSIXct("2014-08-12 00:00:00"), 
         to = as.POSIXct("2014-08-12 23:59:59"), 
         by = "secs")

 head(x)
[1] 2014-08-12 00:00:00 CDT 2014-08-12 00:00:01 CDT
[3] 2014-08-12 00:00:02 CDT 2014-08-12 00:00:03 CDT
[5] 2014-08-12 00:00:04 CDT 2014-08-12 00:00:05 CDT

 tail(x)
[1] 2014-08-12 23:59:54 CDT 2014-08-12 23:59:55 CDT
[3] 2014-08-12 23:59:56 CDT 2014-08-12 23:59:57 CDT
[5] 2014-08-12 23:59:58 CDT 2014-08-12 23:59:59 CDT


 head(as.numeric(x - x[1]))
[1] 0 1 2 3 4 5

 tail(as.numeric(x - x[1]))
[1] 86394 86395 86396 86397 86398 86399


Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] generating a sequence of seconds

2014-08-12 Thread Marc Schwartz
Erin,

Is a sequential resolution of seconds required, as per your original post?

If so, then using my approach and specifying the start and end dates and times 
will work, with the coercion of the resultant vector to numeric as I included. 
The method I used (subtracting the first value) will also give you the starting 
second as 0, or you can alter the math to adjust the origin of the vector as 
you desire.
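
Restating that compactly (a sketch, using the same example date as before):

x <- seq(from = as.POSIXct("2014-08-12 00:00:00"),
         to = as.POSIXct("2014-08-12 23:59:59"),
         by = "secs")

secs <- as.numeric(difftime(x, x[1], units = "secs"))

head(secs)        # starts at 0
head(secs + 1)    # origin shifted to start at 1 instead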

As Bill notes, there will be some days where the number of seconds in the day 
will be something other than 86,400. In Bill's example, it is due to his 
choosing the start and end dates of daylight savings time in a relevant time 
zone. Thus, his second date is short an hour, while the third has an extra hour.

Regards,

Marc


On Aug 12, 2014, at 2:26 PM, Erin Hodgess erinm.hodg...@gmail.com wrote:

 What I would like to do is to look at several days and determine activities
 that happened at times on those days.  I don't really care which days, I
 just care about what time.
 
 Thank you!
 
 
 
 
 On Tue, Aug 12, 2014 at 3:14 PM, William Dunlap wdun...@tibco.com wrote:
 
 What if I just want the seconds vector without the date, please?  Is
 there
 a convenient way to create such a vector, please?
 
 Why do you want such a thing?  E.g., do you want it to print the time
 of day without the date?  Or are you trying to avoid numeric problems
 when you do regressions with the seconds-since-1970 numbers around
 1414918800?  Or is there another problem you want solved?
 
 Note that the number of seconds in a day depends on the day and the
 time zone.  In US/Pacific time I get:
 
  length(seq(from=as.POSIXct("2014-08-12 00:00:00"),
  to=as.POSIXct("2014-08-12 23:59:59"), by="secs"))
   [1] 86400
  length(seq(from=as.POSIXct("2014-03-09 00:00:00"),
  to=as.POSIXct("2014-03-09 23:59:59"), by="secs"))
   [1] 82800
  length(seq(from=as.POSIXct("2014-11-02 00:00:00"),
  to=as.POSIXct("2014-11-02 23:59:59"), by="secs"))
   [1] 90000
 
 Bill Dunlap
 TIBCO Software
 wdunlap tibco.com
 
 
 On Tue, Aug 12, 2014 at 11:51 AM, Erin Hodgess erinm.hodg...@gmail.com
 wrote:
 Hello!
 
 If I would like to generate a sequence of seconds for a date, I would do
 the following:
 
  x <- seq(from=as.POSIXct("2014-08-12 00:00:00"),
  to=as.POSIXct("2014-08-12 23:59:59"), by="secs")
 
 What if I just want the seconds vector without the date, please?  Is
 there
 a convenient way to create such a vector, please?
 
 thanks,
 Erin

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] generating a sequence of seconds

2014-08-12 Thread Marc Schwartz

On Aug 12, 2014, at 2:49 PM, John McKown john.archie.mck...@gmail.com wrote:

 And some people wonder why I absolutely abhor daylight saving time.
 I'm not really fond of leap years and leap seconds either. Somebody
 needs to fix the Earth's rotation and orbit!


I have been a longtime proponent of slowing the rotation of the Earth on its 
axis, so that we could have longer days to be more productive.

Unfortunately, so far, my wish has gone unfulfilled...at least as it is 
relevant within human lifetimes.

;-)

Regards,

Marc


 
 On Tue, Aug 12, 2014 at 2:14 PM, William Dunlap wdun...@tibco.com wrote:
 What if I just want the seconds vector without the date, please?  Is there
 a convenient way to create such a vector, please?
 
 Why do you want such a thing?  E.g., do you want it to print the time
 of day without the date?  Or are you trying to avoid numeric problems
 when you do regressions with the seconds-since-1970 numbers around
 1414918800?  Or is there another problem you want solved?
 
 Note that the number of seconds in a day depends on the day and the
 time zone.  In US/Pacific time I get:
 
  length(seq(from=as.POSIXct("2014-08-12 00:00:00"),
  to=as.POSIXct("2014-08-12 23:59:59"), by="secs"))
   [1] 86400
  length(seq(from=as.POSIXct("2014-03-09 00:00:00"),
  to=as.POSIXct("2014-03-09 23:59:59"), by="secs"))
   [1] 82800
  length(seq(from=as.POSIXct("2014-11-02 00:00:00"),
  to=as.POSIXct("2014-11-02 23:59:59"), by="secs"))
   [1] 90000
 
 Bill Dunlap
 TIBCO Software
 wdunlap tibco.com
 
 
 On Tue, Aug 12, 2014 at 11:51 AM, Erin Hodgess erinm.hodg...@gmail.com 
 wrote:
 Hello!
 
 If I would like to generate a sequence of seconds for a date, I would do
 the following:
 
  x <- seq(from=as.POSIXct("2014-08-12 00:00:00"),
  to=as.POSIXct("2014-08-12 23:59:59"), by="secs")
 
 What if I just want the seconds vector without the date, please?  Is there
 a convenient way to create such a vector, please?
 
 thanks,
 Erin

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Better use with gsub

2014-08-01 Thread Marc Schwartz
On Aug 1, 2014, at 9:46 AM, Doran, Harold hdo...@air.org wrote:

 I have done an embarrassingly bad job using a mixture of gsub and strsplit to 
 solve a problem. Below is sample code showing what I have to start with (the 
 vector xx) and I want to end up with two vectors x and y that contain only 
 the digits found in xx.
 
 Any regex users with advice most welcome
 
 Harold
 
 xx <- c("S24:57",   "S24:86",   "S24:119",  "S24:129",  "S24:138",  "S24:163")
 yy <- gsub("S", "\\1", xx)
 a1 <- gsub(":", " ", yy)
 a2 <- sapply(a1, function(x) strsplit(x, ' '))
 x <- as.numeric(sapply(a2, function(x) x[1]))
 y <- as.numeric(sapply(a2, function(x) x[2]))


If a matrix is a satisfactory result, rather than two separate vectors:

 sapply(strsplit(gsub("S", "", xx), split = ":"), as.numeric)
 [,1] [,2] [,3] [,4] [,5] [,6]
[1,]   24   24   24   24   24   24
[2,]   57   86  119  129  138  163
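
If you do still want the two separate vectors, they are simply the rows of that 
matrix (a small sketch building on the above):

m <- sapply(strsplit(gsub("S", "", xx), split = ":"), as.numeric)
x <- m[1, ]
y <- m[2, ]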


Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to randomly extract a number of rows in a data frame

2014-08-01 Thread Marc Schwartz
On Aug 1, 2014, at 1:58 PM, Stephen HK Wong hon...@stanford.edu wrote:

 Dear ALL,
 
 I have a dataframe contains 4 columns and several 10 millions of rows like 
 below! I want to extract out randomly say 1 millions of rows, can you tell 
 me how to do that in R using base packages? Many Thanks
 
 Col_1 Col_2   Col_3   Col_4
 chr1  3000215 3000250 -
 chr1  3000909 3000944 +
 chr1  3001025 3001060 +
 chr1  3001547 3001582 +
 chr1  3002254 3002289 +
 chr1  3002324 3002359 -
 chr1  3002833 3002868 -
 chr1  3004565 3004600 -
 chr1  3004945 3004980 +
 chr1  3004974 3005009 -
 chr1  3005115 3005150 +
 chr1  3005124 3005159 +
 chr1  3005240 3005275 -
 chr1  3005558 3005593 -
 chr1  3005890 3005925 +
 chr1  3005929 3005964 +
 chr1  3005913 3005948 -
 chr1  3005913 3005948 -
 
 Stephen HK Wong


If your data frame is called 'DF':

  DF.Rand <- DF[sample(nrow(DF), 100), ]

See ?sample which will generate a random sample from a uniform distribution.

In the above, nrow(DF) returns the number of rows in DF and defines the sample 
space of 1:nrow(DF), from which 100 random integer values will be selected 
and used as indices to return the rows.

Using the built in 'iris' dataset, select 20 random rows from the 150 total:

 iris[sample(nrow(iris), 20), ]
Sepal.Length Sepal.Width Petal.Length Petal.WidthSpecies
122  5.6 2.8  4.9 2.0  virginica
79   6.0 2.9  4.5 1.5 versicolor
109  6.7 2.5  5.8 1.8  virginica
106  7.6 3.0  6.6 2.1  virginica
49   5.3 3.7  1.5 0.2 setosa
125  6.7 3.3  5.7 2.1  virginica
15.1 3.5  1.4 0.2 setosa
68   5.8 2.7  4.1 1.0 versicolor
84   6.0 2.7  5.1 1.6 versicolor
110  7.2 3.6  6.1 2.5  virginica
113  6.8 3.0  5.5 2.1  virginica
64   6.1 2.9  4.7 1.4 versicolor
102  5.8 2.7  5.1 1.9  virginica
71   5.9 3.2  4.8 1.8 versicolor
69   6.2 2.2  4.5 1.5 versicolor
65   5.6 2.9  3.6 1.3 versicolor
74   6.1 2.8  4.7 1.2 versicolor
99   5.1 2.5  3.0 1.1 versicolor
135  6.1 2.6  5.6 1.4  virginica
41   5.0 3.5  1.3 0.3 setosa



Regards,

Marc Schwartz
 
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] separate numbers from chars in a string

2014-07-31 Thread Marc Schwartz

On Jul 31, 2014, at 3:17 AM, Uwe Ligges lig...@statistik.tu-dortmund.de wrote:

 
 
 On 31.07.2014 04:46, carol white wrote:
 There are some level of variation either chars followed by numbers or chars, 
 numbers, chars
 
 
 Perhaps, I should use gsub as you suggested all and if the string is 
 composed of chars followed by numbers, it will return the 3rd part empty?
 
 
  Please read about regular expressions and describe your problem accurately. 
  If the last strings are not always present, use * rather than + at the very 
  end of the regular expression.
 
 Best,
 Uwe Ligges


Carol,

As Uwe notes, reviewing the documentation for ?regex and the examples in ?gsub 
can be helpful. There are also online regex resources such as:

  http://www.regular-expressions.info

The question is how much variation might be present. If it will always be up to 
3 possible components, then as Uwe indicated, using the '*' instead of '+' will 
allow for the possibility that one or more patterns will not be present. '*' 
means that 0 or more of the patterns must be present, whereas '+' requires that 
at least one or more matches are present.

 strsplit(gsub("([a-z]*)([0-9]*)([a-z]*)", "\\1 \\2 \\3", "absdfds0213451ab"), 
  " ")
[[1]]
[1] absdfds 0213451 ab   

 strsplit(gsub("([a-z]*)([0-9]*)([a-z]*)", "\\1 \\2 \\3", "absdfds0213451"), 
  " ")
[[1]]
[1] absdfds 0213451

 strsplit(gsub("([a-z]*)([0-9]*)([a-z]*)", "\\1 \\2 \\3", "0213451ab"), " ")
[[1]]
[1] 0213451 ab  


Using the 3 back references in the regex above will limit the parsing to up to 
3 possible components. If you may have more than 3 you can increase the back 
reference sequence to some maximum number. However that can get tedious, so you 
may want to consider multiple passes using strsplit() to extract letters during 
one pass and then numbers during a second, or write a function to encapsulate 
that process.

Here are examples using strsplit():

# Get the numbers, using letters as the split
 strsplit("absdfds0213451ab", split = "[a-z]+")
[[1]]
[1] 0213451

 strsplit("absdfds0213451ab4567", split = "[a-z]+")
[[1]]
[1] 0213451 4567   


# Get the letters, using numbers as the split
 strsplit("absdfds0213451ab", split = "[0-9]+")
[[1]]
[1] absdfds ab 

 strsplit("0213451ab", split = "[0-9]+")
[[1]]
[1]ab

 strsplit("0213451ab123xyz789lmn", split = "[0-9]+")
[[1]]
[1] ab  xyz lmn
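
One possible way to encapsulate the two-pass idea into a small helper (just a 
sketch; the function name is made up and it assumes lower case letters and 
digits only):

splitAlphaNum <- function(x) {
  chars <- unlist(strsplit(x, split = "[0-9]+"))
  nums  <- unlist(strsplit(x, split = "[a-z]+"))
  list(letters = chars[nzchar(chars)], numbers = nums[nzchar(nums)])
}

 splitAlphaNum("absdfds0213451ab")
$letters
[1] "absdfds" "ab"     

$numbers
[1] "0213451"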


Regards,

Marc


 
 
 Regards,
 
 Carol
 
 
 On Wednesday, July 30, 2014 10:52 PM, Marc Schwartz marc_schwa...@me.com 
 wrote:
 
 
 
 On Jul 30, 2014, at 3:13 PM, carol white wht_...@yahoo.com wrote:
 
 Hi,
 If I have a string of consecutive chars followed by consecutive numbers and 
 then chars, like absdfds0213451ab, how to separate the consecutive chars 
 from consecutive numbers?
 
 grep doesn't seem to be helpful
 
 grep([a-z],absdfds0213451ab, ignore.case=T)
 [1] 1
 
 
   grep([0-9],absdfds0213451ab, ignore.case=T)
 [1] 1
 
 Thanks
 
 Carol
 
 
 grep() will only tell you that a pattern is present. You want to use gsub() 
 or similar with back references to return parts of the vector.
 
 Will they ALWAYS appear in that pattern (letters, numbers, letters) or is 
 there some level of variation?
 
 If they will always appear as in your example, then one approach is:
 
 strsplit(gsub(([a-z]+)([0-9]+)([a-z]+), \\1 \\2 \\3, 
 absdfds0213451ab),  )
 
 [[1]]
 [1] absdfds 0213451 ab
 
 
 The initial gsub() returns the 3 parts separated by a space, which is then 
 used as the split argument to strsplit().
 
 If there will be some variation, you can use multiple calls to gsub() or 
 similar, each getting either the letters or the numbers.
 
 Regards,
 
 Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] separate numbers from chars in a string

2014-07-30 Thread Marc Schwartz
On Jul 30, 2014, at 3:13 PM, carol white wht_...@yahoo.com wrote:

 Hi,
 If I have a string of consecutive chars followed by consecutive numbers and 
 then chars, like absdfds0213451ab, how to separate the consecutive chars 
 from consecutive numbers?
 
 grep doesn't seem to be helpful
 
  grep("[a-z]", "absdfds0213451ab", ignore.case=T)
 [1] 1
 
 
  grep("[0-9]", "absdfds0213451ab", ignore.case=T)
 [1] 1
 
 Thanks
 
 Carol


grep() will only tell you that a pattern is present. You want to use gsub() or 
similar with back references to return parts of the vector.

Will they ALWAYS appear in that pattern (letters, numbers, letters) or is there 
some level of variation?

If they will always appear as in your example, then one approach is:

 strsplit(gsub("([a-z]+)([0-9]+)([a-z]+)", "\\1 \\2 \\3", "absdfds0213451ab"), 
  " ")
[[1]]
[1] absdfds 0213451 ab


The initial gsub() returns the 3 parts separated by a space, which is then used 
as the split argument to strsplit().

If there will be some variation, you can use multiple calls to gsub() or 
similar, each getting either the letters or the numbers.

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SASxport function read.xport gives error object 'w' not found

2014-07-25 Thread Marc Schwartz
On Jul 25, 2014, at 8:26 AM, Jocelyn Ireson-Paine p...@j-paine.org wrote:

 The subject line says it. I've just tried converting an SAS .xpt file with 
 this call:
   read.xport( 'formats.xpt' )
 
 I get the message
  Error in read.xport(formats.xpt) : object 'w' not found
 but there's no explanation about what 'w' is or how I should make it known to 
 R. I certainly wasn't expecting to have to provide such a variable, unless 
 I've overlooked something in the SASxport documentation.
 
 This is using R version 3.1.0 under Windows 7, and SASxport installed this 
 morning: version 1.5.0 (2014-07-21).
 
 Any idea what this 'w' is that read.xport wants me to give it?
 
 The .xpt file is confidential, so I can't make it available, and I don't know 
 exactly what's in it. I suspect, however, that it may contain quite a bit of 
 text. However, I'm pretty sure that its author used SAS correctly when 
 generating it. Other files containing purely numeric data from the same 
 author converted OK, using analogous calls to read.xport .
 
 Googling the error didn't find anything.
 
 Thanks,
 
 Jocelyn Ireson-Paine
 07768 534 091
 http://www.jocelyns-cartoons.co.uk


Have you tried reading the file with the read.xport() function in the foreign 
package, which is part of the default R installation?

require(foreign)
?read.xport

I would start a fresh R session, just to be sure that the SASxport version is 
not used unintentionally.

There might be a bug in the SASxport version of the function, which apparently 
uses some of the foreign package version's code, or there might be something 
about your xpt file that is causing a problem. You may need to contact Greg, 
who is the package maintainer for SASxport, to get a sense from him as to what 
would trigger the error you are experiencing. Otherwise, you may have to trace 
through the code (see ?debug, for example) with your file and see if you can 
identify a trigger.
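
A minimal sketch of that suggestion, using the file name from your post:

library(foreign)
lookup.xport("formats.xpt")    # summarize the contents of the transport file
fmts <- read.xport("formats.xpt")
str(fmts)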

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Norton Virus program indicates that R3.1.1 is not reliable

2014-07-13 Thread Marc Schwartz
Jim,

You can file a Type I error on the file here:

  https://submit.symantec.com/false_positive/

rather than waiting.

I had seen similar reports, not on R, but elsewhere with this particular 
Symantec community based detection. I am not a user, but since it appears to be 
a community based system for this detection, it will take the Symantec user 
community to file reports and get it removed from detection.

Regards,

Marc Schwartz

On Jul 13, 2014, at 10:30 AM, jim holtman jholt...@gmail.com wrote:

 Glad to see that I am not the only one seeing the error.  I was
 getting it on my other (company) computer that has Symantec and it
 will not allow me to override and do the install.  Guess I will check
 again in a couple of days and see it it clears up.  Will also check to
 see if I can contact Norton and Symantec about the problem.
 
 Jim Holtman
 Data Munger Guru
 
 What is the problem that you are trying to solve?
 Tell me what you want to do, not how you want to do it.
 
 
 On Sun, Jul 13, 2014 at 11:17 AM, Jeff Newmiller
 jdnew...@dcn.davis.ca.us wrote:
 Have seen it. Had to override Norton and tell it to ignore the threat. Been 
 awhile, don't off the top of my head remember how I did that.
 ---
 Jeff NewmillerThe .   .  Go Live...
 DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
 Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
 /Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
 ---
 Sent from my phone. Please excuse my brevity.
 
 On July 13, 2014 8:03:37 AM PDT, jim holtman jholt...@gmail.com wrote:
 I was downloading the latest version of R from the CMU mirror and got
 the following message (also tried the MTU mirror and got the same).
 Has anyone else seen this?
 
 
 Filename: r-3.1.1-win.exe
 Threat name: WS.Reputation.1
 Full Path: c:\users\owner\downloads\r-3.1.1-win.exe
 
 
 
 
 
 Details
 Unknown Community Usage,  Unknown Age,  Risk Medium
 
 
 
 
 
 Origin
 Downloaded from
 http://lib.stat.cmu.edu/R/CRAN/bin/windows/base/R-3.1.1-win.exe
 
 
 
 
 
 Activity
 Actions performed: Actions performed: 1
 
 
 
 
 
 
 
 On computers as of
 Not Available
 
 
 Last Used
 7/13/14 at 10:56:56
 
 
 Startup Item
 No
 
 
 Launched
 No
 
 
 
 
 
 Unknown
 It is unknown how many users in the Norton Community have used this
 file.
 
 Unknown
 This file release is currently not known.
 
 Medium
 This file risk is medium.
 
 Threat type: Insight Network Threat. There are many indications that
 this file is untrustworthy and therefore not safe
 
 
 
 
 
 
 http://lib.stat.cmu.edu/R/CRAN/bin/windows/base/R-3.1.1-win.exe
 
 Downloaded File r-3.1.1-win.exe Threat name: WS.Reputation.1
 from cmu.edu
 
 Source: External Media
 
 
 
 
 
 File Actions
 
 File: c:\users\owner\downloads\ r-3.1.1-win.exe Removed
 
 
 
 File Thumbprint - SHA:
 ce6fb76612aefc482583fb92f4f5c3cb8e8e3bf1a8dda97df7ec5caf746e53fe
 File Thumbprint - MD5:
 Not available
 
 
 Jim Holtman
 Data Munger Guru
 
 What is the problem that you are trying to solve?
 Tell me what you want to do, not how you want to do it.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problems with read.table and data structure

2014-07-11 Thread Marc Schwartz

On Jul 11, 2014, at 9:15 AM, Tim Richter-Heitmann trich...@uni-bremen.de 
wrote:

 Hi there!
 
 I have huge datafile of 600 columns 360 samples:
 
  data <- read.table("small.txt", header = TRUE, sep = "\t", dec = ".", 
  row.names=1)
 
 The txt.file (compiled with excel) is showing me only numbers, however R 
 gives me the structure of ANY column as factor.
 
 When i try stringsAsFactors=FALSE in the read command, the structure 
 of the dataset becomes character.
 
 When i try as.numeric(data), i get
 
 Error: (list) object cannot be coerced to type 'double'
 
 
 even, if i try to subset columns with [].
 
 
 When i try as.numeric on single columns with $, i am successful, but the 
 numbers dont make any sense at all, as the factors are not converted by their 
 levels:
 
 
 Factor w/ 358 levels 0,123111694,..: 11 14 50 12 38 44 13 76 31 30
 
 
 becomes
 
 
 num  11 14 50 12 38 44 13 76 31 30
 
 
 whereas i would need the levels, though!
 
 
 I suspect excel to mess up the save as tab-delimited text, but the text 
 file seems fine with me on surface (i dont know how the numbers are stored  
 internally). I just see correct numbers, also the View command
 yields the correct content.
 
 
 
 Anyone knows help? Its pretty annoying.
 
 
 
 Thank you!


Hi,

See:

  
http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-do-I-convert-factors-to-numeric_003f
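
For reference, the idiom recommended there is along these lines (a small made 
up example):

f <- factor(c("10", "2.5", "7"))

 as.numeric(f)              # not what you want: the internal integer codes
[1] 1 2 3

 as.numeric(levels(f))[f]   # the FAQ idiom: the original values
[1] 10.0  2.5  7.0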

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problems with read.table and data structure

2014-07-11 Thread Marc Schwartz

On Jul 11, 2014, at 2:36 PM, Marc Schwartz marc_schwa...@me.com wrote:

 
 On Jul 11, 2014, at 9:15 AM, Tim Richter-Heitmann trich...@uni-bremen.de 
 wrote:
 
 Hi there!
 
 I have huge datafile of 600 columns 360 samples:
 
 data - read.table(small.txt, header = TRUE, sep = \t, dec = ., 
 row.names=1)
 
 The txt.file (compiled with excel) is showing me only numbers, however R 
 gives me the structure of ANY column as factor.
 
 When i try stringsAsFactors=FALSE in the read command, the structure 
 of the dataset becomes character.
 
 When i try as.numeric(data), i get
 
 Error: (list) object cannot be coerced to type 'double'
 
 
 even, if i try to subset columns with [].
 
 
 When i try as.numeric on single columns with $, i am successful, but the 
 numbers dont make any sense at all, as the factors are not converted by 
 their levels:
 
 
 Factor w/ 358 levels 0,123111694,..: 11 14 50 12 38 44 13 76 31 30
 
 
 becomes
 
 
 num  11 14 50 12 38 44 13 76 31 30
 
 
 whereas i would need the levels, though!
 
 
 I suspect excel to mess up the save as tab-delimited text, but the text 
 file seems fine with me on surface (i dont know how the numbers are stored  
 internally). I just see correct numbers, also the View command
 yields the correct content.
 
 
 
 Anyone knows help? Its pretty annoying.
 
 
 
 Thank you!
 
 
 Hi,
 
 See:
 
  
 http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-do-I-convert-factors-to-numeric_003f
 
 Regards,
 
 Marc Schwartz


Sorry, I just noted that you defined dec = "." in your call to read.table(), 
whereas it appears that a comma (",") is being used as the decimal separator in 
your source data.

Modify dec = "." to dec = "," and that should prevent the numeric values from 
being converted to factors during import. They should be read as numerics right 
away.

For example:

 str(read.table(textConnection("0,1234"), dec = "."))
'data.frame':   1 obs. of  1 variable:
 $ V1: Factor w/ 1 level 0,1234: 1

 str(read.table(textConnection("0,1234"), dec = ","))
'data.frame':   1 obs. of  1 variable:
 $ V1: num 0.123


Regards,

Marc

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] table over a matrix dimension...

2014-07-10 Thread Marc Schwartz

On Jul 10, 2014, at 12:03 PM, Jonathan Greenberg j...@illinois.edu wrote:

 R-helpers:
 
 I'm trying to determine the frequency of characters for a matrix
 applied to a single dimension, and generate a matrix as an output.
 I've come up with a solution, but it appears inelegant -- I was
 wondering if there is an easier way to accomplish this task:
 
 # Create a matrix of factors (characters):
 random_characters=matrix(sample(letters[1:4],1000,replace=TRUE),100,10)
 
 # Applying with the table() function doesn't work properly, because not all 
 rows
 # have ALL of the factors, so I get a list output:
 apply(random_characters,1,table)
 
 # Hacked solution:
 unique_values = letters[1:4]
 
  countsmatrix <- t(apply(random_characters,1,function(x,unique_values)
 {
 counts=vector(length=length(unique_values))
 for(i in seq(unique_values))
 {
 counts[i] = sum(x==unique_values[i])
 }
 return(counts)
 },
 unique_values=unique_values
 ))
 
 # Gets me the output I want but requires two nested loops (apply and
 for() ), so
 # not efficient for very large datasets.
 
 ###
 
 Is there a more elegant solution to this?
 
 --j
 


If I am correctly understanding your issue, you simply need to coerce the input 
to table() to a factor with a common set of levels, since the matrix will be 
'character' by default:


set.seed(1)
random_characters <- matrix(sample(factor(letters[1:4]), 1000, replace = TRUE), 
                            100, 10)

 random_characters 
   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
  [1,] b  c  b  c  c  c  d  d  d  d  
  [2,] b  b  a  a  a  c  d  d  a  d  
  [3,] c  b  c  b  d  c  a  d  d  b  
  [4,] d  d  b  b  d  c  c  c  c  a  
  [5,] a  c  a  b  d  b  d  c  b  a  
  [6,] d  a  c  d  c  d  d  a  c  a  
  [7,] d  a  c  a  b  b  b  b  b  a  
  [8,] c  b  a  d  d  d  b  c  d  a  
  [9,] c  d  b  a  a  d  d  d  b  a  
 [10,] a  c  c  b  d  c  a  c  a  a  
 [11,] a  d  d  a  d  d  d  c  b  c  
 [12,] a  c  a  a  b  b  b  b  b  d  
 [13,] c  b  d  d  c  a  c  a  b  c  
 [14,] b  b  d  c  d  c  c  d  d  a  
 [15,] d  a  d  b  c  c  c  b  b  a  
 [16,] b  a  b  b  b  a  b  b  c  b  
 [17,] c  c  c  a  b  c  a  a  d  a  
 [18,] d  a  d  b  b  c  b  a  d  c 
 ...


RES <- t(apply(random_characters, 1, 
               function(x) table(factor(x, levels = letters[1:4]))))

 RES
   a b c d
  [1,] 0 2 4 4
  [2,] 4 2 1 3
  [3,] 1 3 3 3
  [4,] 1 2 4 3
  [5,] 3 3 2 2
  [6,] 3 0 3 4
  [7,] 3 5 1 1
  [8,] 2 2 2 4
  [9,] 3 2 1 4
 [10,] 4 1 4 1
 [11,] 2 1 2 5
 [12,] 3 5 1 1
 [13,] 2 2 4 2
 [14,] 1 2 3 4
 [15,] 2 3 3 2
 [16,] 2 7 1 0
 [17,] 4 1 4 1
 [18,] 2 3 2 3
 ...



Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] symbols in a data frame

2014-07-09 Thread Marc Schwartz
On Jul 9, 2014, at 12:19 PM, Sam Albers tonightstheni...@gmail.com wrote:

 Hello,
 
 I have recently received a dataset from a metal analysis company. The
 dataset is filled with "less than" symbols. What I am looking for is an
 efficient way to subset for any whole numbers from the dataset. The column
 is automatically formatted as a factor because of the '<' symbols, making it
 difficult to deal with the numbers in a useful way.
 
 So in sum any ideas on how I could subset the example below for only whole
 numbers?
 
 Thanks in advance!
 
 Sam
 
 #code
 
  metals <-
 
 
 structure(list(Parameter = structure(c(1L, 2L, 3L, 4L, 6L, 7L,
 8L, 9L, 10L, 11L, 12L, 13L, 15L, 16L, 17L, 18L, 19L, 20L, 1L), .Label
 = c(Antimony,
 Arsenic, Barium, Beryllium, Boron (Hot Water Soluble),
 Cadmium, Chromium, Cobalt, Copper, Lead, Mercury,
 Molybdenum, Nickel, pH 1:2, Selenium, Silver, Thallium,
 Tin, Vanadium, Zinc), class = factor), Cedar.Creek = structure(c(3L,
 3L, 7L, 3L, 2L, 4L, 3L, 34L, 36L, 2L, 5L, 7L, 3L, 7L, 3L, 45L,
 4L, 4L, 3L), .Label = c(1, 10, 100, 1000, 200,
 5, 500, 0.1, 0.13, 0.5, 0.8, 1.07, 1.1, 1.4,
 1.5, 137, 154, 163, 165, 169, 178, 2.3, 2.4,
 22, 24, 244, 27.2, 274, 3, 3.1, 40.2, 43, 50,
 516, 53.3, 550, 569, 65, 66.1, 68, 7.6, 72,
 77, 89, 951), class = factor)), .Names = c(Parameter,
 Cedar.Creek), row.names = c(NA, 19L), class = data.frame)


Sam,

You can use ?gsub to remove the '<' characters from the column and then use 
?subset to select the records you wish.

Note that gsub() returns a character vector, so you want to coerce to numeric.

 as.numeric(gsub("<", "", metals$Cedar.Creek))
 [1]  100  100  500  100   10 1000  100  516  550   10  200  500  100
[14]  500  100  951 1000 1000  100


For example:

 subset(metals, as.numeric(gsub("<", "", Cedar.Creek)) == 100)
   Parameter Cedar.Creek
1   Antimony100
2Arsenic100
4  Beryllium100
7 Cobalt100
13  Selenium100
15  Thallium100
19  Antimony100


 subset(metals, as.numeric(gsub("<", "", Cedar.Creek)) <= 500)
Parameter Cedar.Creek
1Antimony100
2 Arsenic100
3  Barium500
4   Beryllium100
5 Cadmium 10
7  Cobalt100
10Mercury 10
11 Molybdenum200
12 Nickel500
13   Selenium100
14 Silver500
15   Thallium100
19   Antimony100


You can also just create a new column that is numeric and go from there:

metals$CC.Num <- as.numeric(gsub("<", "", metals$Cedar.Creek))

 str(metals)
'data.frame':   19 obs. of  3 variables:
 $ Parameter  : Factor w/ 20 levels Antimony,Arsenic,..: 1 2 3 4 6 7 8 9 10 
11 ...
 $ Cedar.Creek: Factor w/ 45 levels 1,10,100,..: 3 3 7 3 2 4 3 34 36 2 
...
 $ CC.Num : num  100 100 500 100 10 1000 100 516 550 10 ...


 metals
Parameter Cedar.Creek CC.Num
1Antimony100100
2 Arsenic100100
3  Barium500500
4   Beryllium100100
5 Cadmium 10 10
6Chromium   1000   1000
7  Cobalt100100
8  Copper 516516
9Lead 550550
10Mercury 10 10
11 Molybdenum200200
12 Nickel500500
13   Selenium100100
14 Silver500500
15   Thallium100100
16Tin 951951
17   Vanadium   1000   1000
18   Zinc   1000   1000
19   Antimony100100



Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] odd behavior of seq()

2014-07-03 Thread Marc Schwartz
On Jul 3, 2014, at 1:28 PM, Matthew Keller mckellerc...@gmail.com wrote:

 Hi all,
 
 A bit stumped here.
 
 z <- seq(.05,.85,by=.1)
 z==.05 #good
 [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 
 z==.15  #huh
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 
 More generally:
 sum(z==.25)
 [1] 1
 sum(z==.35)
 [1] 0
 sum(z==.45)
 [1] 1
 sum(z==.55)
 [1] 1
 sum(z==.65)
 [1] 0
 sum(z==.75)
 [1] 0
 sum(z==.85)
 [1] 1
 
 Does anyone have any ideas what is going on here?


See the MFAQ[1]:

  
http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f
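
The usual workaround described there is to compare using a tolerance rather 
than with ==, for example (a sketch):

z <- seq(.05, .85, by = .1)

 which(abs(z - .15) < 1e-8)
[1] 2

 sapply(z, function(v) isTRUE(all.equal(v, .15)))
[1] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE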

Regards,

Marc Schwartz

[1] Most Frequently Asked Question

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] access an element of a list without looping

2014-07-03 Thread Marc Schwartz
On Jul 3, 2014, at 2:35 PM, carol white wht_...@yahoo.com wrote:

 Hi,
 Is there any way to access an element of a list without looping over the list 
 nor using unlist? Just to avoid parsing a very long list.
 
 
 For ex, how to find a vector of a length 2 in a list without using a loop?
 
 l = list (c(1), c(2,3), c(1,2,3))
 for (i in 1:length(l))
 if(length(l[[i]]) == 2){
 print (i)
 break
 }
 
 Thanks
 
 Carol


You can use one of the *apply() functions, albeit, it is still effectively 
looping through the list. It may or may not be faster in some cases than an 
explicit for() loop, but it can be easier to read, depending upon the 
complexity of the function being utilized within the call.

For example:

 which(sapply(l, function(x) length(x) == 2))
[1] 2

This presumes that you only have a single level of list elements to scan. If 
you have sub-levels within the list, you might want to look at ?rapply, which 
is a recursive version.
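
For completeness, a tiny sketch of the rapply() variant on a nested version of 
your list:

l2 <- list(c(1), list(c(2, 3), c(1, 2, 3)))

 rapply(l2, function(x) length(x) == 2, how = "unlist")
[1] FALSE  TRUE FALSE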

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Output levels of categorical data to Excel using with()

2014-06-19 Thread Marc Schwartz

On Jun 18, 2014, at 11:31 PM, Daniel Schwartz das2...@gmail.com wrote:

 I have coded qualitative data with many (20+) different codes from a survey
 in an excel file. I am using the with() function to output the codes so we
 know what's there. Is it possible to direct the output from with() to an
 excel file? If not, what's another function that has the same, er,
 functionality?! Thanks, R World!


It is not clear from your description, that the use of with() is really 
relevant here. with() is typically used as a convenience wrapper to be able to 
evaluate the names of data frame columns in the environment of the data frame, 
rather than having to repeat the 'DataFrameName$' prefix over and over.

To export data from R to Excel files, there are various options which are 
listed both in the R wiki:

  http://rwiki.sciviews.org/doku.php?id=tips:data-io:ms_windows

and in the R Data Import/Export manual:

  
http://cran.r-project.org/doc/manuals/r-release/R-data.html#Reading-Excel-spreadsheets

Worst case, you can use ?write.csv to dump the data to a CSV file, which can 
then be opened with Excel.

The option you may prefer will depend upon your operating system, how 
comfortable you may or may not be relative to installing additional software, 
do you want to create a new Excel file with each export or be able to append to 
existing worksheets and how you may want to structure or format the 
worksheet(s) in Excel.
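
For the worst case option, a one line sketch (here 'codes' stands in for your 
data frame of coded responses and the file name is arbitrary):

write.csv(codes, file = "survey_codes.csv", row.names = FALSE)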

Regards,

Marc Schwartz

P.S. I have a cousin Daniel, but different gmail e-mail address.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] glm.fit: fitted probabilities numerically 0 or 1 occurred for a continuous variable?

2014-06-16 Thread Marc Schwartz

On Jun 16, 2014, at 2:34 PM, Nwinters nicholas.wint...@mail.mcgill.ca wrote:

 I have gotten the this error before: glm.fit: fitted probabilities
 numerically 0 or 1 occurred
 
 and the problem was usually solved by combining one or more categories were
 there were no observations.
 
 I am now having this error show up for a variable that is continuous (not
 categorical).
 
 What could be the cause of this for a continuous variable??
 
 Thanks, 
 Nick


Presuming that this is logistic regression (family = binomial), the error is 
suggestive of complete or near complete separation in the association between 
your continuous IV and your binary response. This can occur if there is a 
breakpoint within the range of your IV where the dichotomous event is present 
on one side of the break and is absent on the other side of the break.

The resolution for the problem will depend upon first confirming the etiology 
of it and then, within the context of subject matter expertise, making some 
decisions on how to proceed. 

If you Google logistic regression separation, you will get some resources 
that can be helpful.
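
As a purely artificial illustration of that behavior (made up data, with the 
event absent below a breakpoint and always present above it):

x <- c(1, 2, 3, 4, 6, 7, 8, 9)
y <- c(0, 0, 0, 0, 1, 1, 1, 1)

fit <- glm(y ~ x, family = binomial)
## typically warns: "fitted probabilities numerically 0 or 1 occurred"
## (and possibly that the algorithm did not converge)

summary(fit)   # note the inflated coefficient and standard error for x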

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help with arrays

2014-05-29 Thread Marc Schwartz

On May 29, 2014, at 11:02 AM, Olivier Charansonney 
olivier.charanson...@orange.fr wrote:

 Hello,
 
 I would like to extract the value in row 1 corresponding to the maximum in
 row 2
 
 
 
 Array W
 
       [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]     [,9]     [,10]
 [1,] 651.0     651.0     651.0     651.0     651.0     651.0     651.0     119.0     78.0      78.0
 [2,]  13.24184  13.24184  13.24184  13.24184  13.24184  13.24184  13.24184  16.19418 15.47089  15.47089
 
 valinit <- max(W[2,])
 
 valinit
 
 [1] 16.19418
 
 How to obtain ‘119’
 
 Thanks,


Hi,

Using ?dput can help make it easier for others to recreate your object to test 
code:

 dput(W)
structure(c(651, 13.24184, 651, 13.24184, 651, 13.24184, 651, 
13.24184, 651, 13.24184, 651, 13.24184, 651, 13.24184, 119, 16.19418, 
78, 15.47089, 78, 15.47089), .Dim = c(2L, 10L))


W <- structure(c(651, 13.24184, 651, 13.24184, 651, 13.24184, 651, 
 13.24184, 651, 13.24184, 651, 13.24184, 651, 13.24184, 119, 
16.19418, 
 78, 15.47089, 78, 15.47089), .Dim = c(2L, 10L))


See ?which.max, which returns the index of the *first* maximum in the vector 
passed to it:

 W[1, which.max(W[2, ])]
[1] 119


You should consider what happens if there is more than one of the maximum value 
in the first row and if it might correspond to non-unique values in the second 
row.
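
If more than one column attains the maximum in row 2, which() rather than 
which.max() returns all of them (a sketch using the same W):

 W[1, which(W[2, ] == max(W[2, ]))]
[1] 119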

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help with arrays

2014-05-29 Thread Marc Schwartz

On May 29, 2014, at 11:22 AM, Marc Schwartz marc_schwa...@me.com wrote:

 
 On May 29, 2014, at 11:02 AM, Olivier Charansonney 
 olivier.charanson...@orange.fr wrote:
 
 Hello,
 
 I would like to extract the value in row 1 corresponding to the maximum in
 row 2
 
 
 
 Array W
 
 [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]
 [,8] [,9][,10]
 
 [1,] 651.0 651.0 651.0 651.0 651.0 651.0 651.0
 119.0 78.0 78.0
 
 [2,]  13.24184  13.24184  13.24184  13.24184  13.24184  13.24184  13.24184
 16.19418 15.47089 15.47089
 
 valinit-max(W[2,])
 
 valinit
 
 [1] 16.19418
 
 How to obtain ‘119’
 
 Thanks,
 
 
 Hi,
 
 Using ?dput can help make it easier for others to recreate your object to 
 test code:
 
 dput(W)
 structure(c(651, 13.24184, 651, 13.24184, 651, 13.24184, 651, 
 13.24184, 651, 13.24184, 651, 13.24184, 651, 13.24184, 119, 16.19418, 
 78, 15.47089, 78, 15.47089), .Dim = c(2L, 10L))
 
 
 W - structure(c(651, 13.24184, 651, 13.24184, 651, 13.24184, 651, 
 13.24184, 651, 13.24184, 651, 13.24184, 651, 13.24184, 119, 
 16.19418, 
 78, 15.47089, 78, 15.47089), .Dim = c(2L, 10L))
 
 
 See ?which.max, which returns the index of the *first* maximum in the vector 
 passed to it:
 
 W[1, which.max(W[2, ])]
 [1] 119
 
 
 You should consider what happens if there is more than one of the maximum 
 value in the first row and if it might correspond to non-unique values in the 
 second row.


Correction in the above sentence, it should be:

You should consider what happens if there is more than one of the maximum value 
in the second row and if it might correspond to non-unique values in the first 
row.

Marc

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] looking at C code from the stats package

2014-05-29 Thread Marc Schwartz

On May 29, 2014, at 7:12 PM, Erin Hodgess erinm.hodg...@gmail.com wrote:

 Dear R People:
 
 How are you?
 
 I would like to look at the underlying C code from the program C_ARIMA_Like
 in the stats package.
 
 However, since that is a base package, I'm not entirely sure how to access
 this.
 
 When I used the .C(C_ARIMA_Like)
 
 it says that the C_ARIMA_Like cannot be found.
 
 This is on Windows 7, R version 3.0.2.
 
 Thank you for any help!
 Sincerely,
 Erin


Hi Erin,

If you are working from a binary install of R, you won't able to see the 
sources for C or FORTRAN based functions.

If it is a base package in the '../library' tree like 'stats', in the source 
tarball from CRAN or in the R SVN repo, there will be a 'src' directory for the 
package where relevant C and/or FORTRAN code will be contained.

As an example for arima.c, in the SVN repo for the 3.0 branch tree:

  
https://svn.r-project.org/R/branches/R-3-0-branch/src/library/stats/src/arima.c

For R-Devel, it will be in 'trunk':

  https://svn.r-project.org/R/trunk/src/library/stats/src/arima.c

Scroll down or search in the arima.c source for the function name ARIMA_Like.

If you know something about SVN repo trees, the path will make sense.

Other common C and/or FORTRAN code that is not part of the base packages may be 
in the ../src/main directory:

  https://svn.r-project.org/R/branches/R-3-0-branch/src/main/

and there is a file 'names.c' that can be helpful in locating specific C 
functions and their associated declared C names.

For Recommended packages, there is also a separate SVN repo at:

  https://svn.r-project.org/R-packages/

but it may be easier to download the tarball for each package from CRAN.
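
If working with SVN is unfamiliar, you can also fetch and unpack the released 
source tarball from within R (a sketch; it assumes internet access, and you 
should adjust the version to match your installation):

url <- "http://cran.r-project.org/src/base/R-3/R-3.0.2.tar.gz"
download.file(url, destfile = "R-3.0.2.tar.gz")
untar("R-3.0.2.tar.gz", files = "R-3.0.2/src/library/stats/src/arima.c")
file.show("R-3.0.2/src/library/stats/src/arima.c")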

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] reference category in binomal glm

2014-05-27 Thread Marc Schwartz

On May 27, 2014, at 3:51 AM, Xebar Saram zelt...@gmail.com wrote:

 Hi all
 
 i know this is probably a silly question but im wondering what is the
 'reference' category when you run a binomal glm. that is my outcome/DV is
 0,1 and i run a regression and get coefficients. do the coefficients refer
 to the probability to get 0 or 1?
 
 thanks so much in advance
 
 Z


As per the Details section of ?glm:

A typical predictor has the form response ~ terms where response is the 
(numeric) response vector and terms is a series of terms which specifies a 
linear predictor for response. For binomial and quasibinomial families the 
response can also be specified as a factor (when the first level denotes 
failure and all others success) or as a two-column matrix with the columns 
giving the numbers of successes and failures.


Thus, if you have a numeric 0/1 response, you are predicting 1's and if you use 
a two level factor, you are predicting the second level of the factor.
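
A tiny made up example of the two equivalent specifications (simulated data):

set.seed(42)
x <- rnorm(50)
y <- rbinom(50, 1, plogis(x))

coef(glm(y ~ x, family = binomial))    # models Pr(y == 1)

yf <- factor(y, levels = c(0, 1), labels = c("No", "Yes"))
coef(glm(yf ~ x, family = binomial))   # models Pr(yf == "Yes"); identical coefficients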

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Login

2014-05-27 Thread Marc Schwartz

On May 27, 2014, at 4:46 AM, Andy Siddaway andysidda...@googlemail.com wrote:

 Dear R help,
 
 I cannot login to my account. I am keen to remove the posting I made to R
 help from google web searches - see
 http://r.789695.n4.nabble.com/R-software-installation-problem-td4659556.html
 
 
 Thanks,
 
 Andy


You cannot. From the bottom of the R Posting Guide 
(http://www.r-project.org/posting-guide.html):

Posters should be aware that the R lists are public discussion lists and 
anything you post will be archived and accessible via several websites for many 
years.


There are a plethora of list archives on the web and there is no provision for 
removing specific posts from all sites that might possibly have a copy of your 
post. Google, by the way, is not the only search engine that will include and 
archive your post in searches.

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Summary to data frame in R!!

2014-05-07 Thread Marc Schwartz
On May 7, 2014, at 5:15 AM, Abhinaba Roy abhinabaro...@gmail.com wrote:

 Hi R-helpers,
 
 sumx <- summary(mtcars[, c("mpg", "disp")])
 sumx
  mpg disp
 Min.   :10.40   Min.   : 71.1
 1st Qu.:15.43   1st Qu.:120.8
 Median :19.20   Median :196.3
 Mean   :20.09   Mean   :230.7
 3rd Qu.:22.80   3rd Qu.:326.0
 Max.   :33.90   Max.   :472.0
 
 I want a dataframe as
 
 mpgdisp
 Min.  10.40   71.1
 1st Qu. 15.43  120.8
 Median 19.20  196.3
 Mean20.09  230.7
 3rd Qu. 22.80  326.0
 Max.  33.90  472.0
 
 How can it be done in R?
 -- 
 Regards
 Abhinaba Roy


summary(), when applied to multiple columns, as you are doing, returns a 
character table object:

 str(sumx)
 'table' chr [1:6, 1:2] "Min.   :10.40  " "1st Qu.:15.43  " ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:6] ...
  ..$ : chr [1:2] "     mpg" "    disp"


Note that the actual table elements contain both character and numeric values 
that have been formatted.

If you use:

 sapply(mtcars[, c("mpg", "disp")], summary)
  mpg  disp
Min.10.40  71.1
1st Qu. 15.42 120.8
Median  19.20 196.3
Mean20.09 230.7
3rd Qu. 22.80 326.0
Max.33.90 472.0

this applies the summary() function to each column separately, returning a 
numeric matrix:

 str(sapply(mtcars[, c("mpg", "disp")], summary))
 num [1:6, 1:2] 10.4 15.4 19.2 20.1 22.8 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:6] "Min." "1st Qu." "Median" "Mean" ...
  ..$ : chr [1:2] "mpg" "disp"


If you actually want a data frame, you can coerce the result:

 as.data.frame(sapply(mtcars[, c("mpg", "disp")], summary))
  mpg  disp
Min.10.40  71.1
1st Qu. 15.42 120.8
Median  19.20 196.3
Mean20.09 230.7
3rd Qu. 22.80 326.0
Max.33.90 472.0


 str(as.data.frame(sapply(mtcars[, c("mpg", "disp")], summary)))
'data.frame':   6 obs. of  2 variables:
 $ mpg : num  10.4 15.4 19.2 20.1 22.8 ...
 $ disp: num  71.1 120.8 196.3 230.7 326 ...


Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] get element of list with default?

2014-04-15 Thread Marc Schwartz
On Apr 15, 2014, at 10:53 AM, Spencer Graves 
spencer.gra...@structuremonitoring.com wrote:

 Hello:
 
 
  Do you know of a simple function to return the value of a named element 
 of a list if that exists, and return a default value otherwise?
 
 
  It's an easy function to write (e.g., below).  I plan to add this to the 
 Ecfun package unless I find it in another CRAN package.
 
 
  Thanks,
  Spencer
 
 
getElement <- function(element, default, list){
#   get element of list;  return the default if absent
    El <- list[[element]]
    if(is.null(El)){
        El <- default
    }
    El
}


Hi Spencer,

I don't know of a function elsewhere, but you can probably simplify the above 
with:

   getElement <- function(element, default, list) {
     ifelse(is.null(list[[element]]), default, list[[element]])
   }


MyList <- list(L1 = 1, L2 = 2) 

 MyList
$L1
[1] 1

$L2
[1] 2



 getElement("L1", 5, MyList) 
[1] 1

 getElement("L2", 5, MyList) 
[1] 2

 getElement("L3", 5, MyList) 
[1] 5


You might want to think about the ordering of the function arguments, given 
typical use, for ease of calling it. For example:

  getElement <- function(list, element, default = SomeValue)

Another consideration is that the above function will only get the element if 
it is a 'first level' element in the list. If it is in a sub-list of the main 
list, you would need to think about a recursive approach of some type, along 
the lines of what ?rapply does.

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] get element of list with default?

2014-04-15 Thread Marc Schwartz

On Apr 15, 2014, at 11:22 AM, Marc Schwartz marc_schwa...@me.com wrote:

 On Apr 15, 2014, at 10:53 AM, Spencer Graves 
 spencer.gra...@structuremonitoring.com wrote:
 
 Hello:
 
 
 Do you know of a simple function to return the value of a named element 
 of a list if that exists, and return a default value otherwise?
 
 
 It's an easy function to write (e.g., below).  I plan to add this to the 
 Ecfun package unless I find it in another CRAN package.
 
 
 Thanks,
 Spencer
 
 
    getElement <- function(element, default, list){
  #   get element of list;  return the default if absent
    El <- list[[element]]
    if(is.null(El)){
    El <- default
    }
    El
    }
 
 
 Hi Spencer,
 
 I don't know of a function elsewhere, but you can probably simplify the above 
 with:
 
  getElement <- function(element, default, list) {
    ifelse(is.null(list[[element]]), default, list[[element]])
  }
 
 
  MyList <- list(L1 = 1, L2 = 2) 
 
 MyList
 $L1
 [1] 1
 
 $L2
 [1] 2
 
 
 
  getElement("L1", 5, MyList) 
  [1] 1
  
  getElement("L2", 5, MyList) 
  [1] 2
  
  getElement("L3", 5, MyList) 
  [1] 5
 
 
 You might want to think about the ordering of the function arguments, given 
 typical use, for ease of calling it. For example:
 
  getElement <- function(list, element, default = SomeValue)
 
 Another consideration is that the above function will only get the element if 
 it is a 'first level' element in the list. If it is in a sub-list of the main 
 list, you would need to think about a recursive approach of some type, along 
 the lines of what ?rapply does.
 
 Regards,
 
 Marc Schwartz


Spencer,

A quick heads up here. I forgot, that there is already a function called 
getElement() in base R which appears to be designed to handle S4 objects and 
slots, lacking the default return value however, where it returns NULL if the 
'name' element is not present:

 getElement
function (object, name) 
{
if (isS4(object)) 
slot(object, name)
else object[[name, exact = TRUE]]
}
bytecode: 0x100905870
environment: namespace:base


Thus, I would suggest calling your variant something else, or wrap the default 
function in your version, if you need/want to handle S4 objects and slots.

Regards,

Marc

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error logistic analysis

2014-04-08 Thread Marc Schwartz

On Apr 8, 2014, at 7:20 AM, lghansse lise.hanss...@ugent.be wrote:

 I'm trying to conduct a single level logistic analysis (as a beginning step
 for a more advanced Multi-level analysis). However, when I try to run it, I
 get following error:
 
 Warning messages:
  1: In model.response(mf, "numeric") :
   using type = "numeric" with a factor response will be ignored
  2: In Ops.factor(y, z$residuals) : '-' not meaningful for factors
 
 I haven't got a clue why I'm getting this because I used the exact same
 syntax (same data preparation etc...) for a similar analysis (same
 datastructure, different country). 
 
 Syntax: 
  Single_model1 <- lm(openhrs1 ~ genhealt1 + age + sexpat1 + hhincome1 +
 edupat1 
 + etniciteit1, data=Slovakije)
 
 My Missing data are coded as such, I already tried to run the analysis in a
 data frame without the missing cases, but that didn't work either. 


You are using the lm() function above, which is a regular least squares linear 
regression for a continuous response variable.

If you want to run a logistic regression, you need to use glm() with 'family = 
binomial':

Single_model1 <- glm(openhrs1 ~ genhealt1 + age + sexpat1 + hhincome1 +
 edupat1 + etniciteit1, family = binomial, data = Slovakije)


Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] moses extreme reaction test

2014-04-08 Thread Marc Schwartz

On Apr 8, 2014, at 12:37 PM, José Trujillo Carmona truji...@unex.es wrote:

 Is there a package that contains moses extreme reaction test?
 
 Thank's


A search using rseek.org indicates that the DescTools package on CRAN contains 
a function called  
MosesTest() that appears to implement it.

  http://cran.r-project.org/web/packages/DescTools/
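
As a rough, untested sketch of the call (argument names taken from the DescTools 
documentation; the data below are made up):

  # install.packages("DescTools")
  library(DescTools)

  x <- c(0.80, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46)
  y <- c(1.15, 0.88, 0.90, 0.74, 1.21)

  # Moses test of extreme reactions for two independent samples
  MosesTest(x, y)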

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Strange sprintf Behavior

2014-04-02 Thread Marc Schwartz

On Apr 2, 2014, at 6:32 AM, Michael Smith my.r.h...@gmail.com wrote:

 All,
 
 I'm getting this:
 
  sprintf("%.17f", 0.8)
  [1] "0.80000000000000004"
 
 Where does the `4` at the end come from? Shouldn't it be zero at the
 end? Maybe I'm missing something.


Hi,

First, please start a new thread when posting, do not just reply to an existing 
thread and change the subject line. Your post gets lost in the archive and is 
improperly linked to other posts.

Second, see the Most Frequently Asked Question:

  
http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f
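
In short, 0.8 has no exact binary double representation, so asking for 17 decimal 
digits exposes the stored approximation:

  sprintf("%.17f", 0.8)              # "0.80000000000000004"
  0.8 == 0.7 + 0.1                   # FALSE, for the same reason
  isTRUE(all.equal(0.8, 0.7 + 0.1))  # TRUE - compare with a tolerance instead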

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] CORDIF test

2014-04-02 Thread Marc Schwartz
On Apr 2, 2014, at 8:09 AM, Elizabeth Caron-Gamache babeth_...@icloud.com 
wrote:

 Hi, 
 
 I search on your website for a definition of the CORDIF test, but it wasn’t 
 successful. I’m analyzing an article that use that test and it’s not really 
 documented on the net. The article refers to your website, so I pretend that 
 you will be able to give me a brief explanation of this test. Here is the 
 quote that talks about this test in my article : 
 
 ‘' To compare these regressions and to see which—either body height or LLL—is 
 best related to performance (Pearson correlation coefficients comparison), a 
 CORDIF test (R software [www.r-project.org], multilevel package, ver- sion 
 2.12.1) was performed.
 
 Does it use parametric or non-parametric values ?
 Is it a test to compare 2 groups only or it can be used for a comparison of 
 more than two groups ?
 Why is it so hard to find information on that test on the net ?
 
 Thanks for your time
 Have a nice day 
 
 Elizabeth Caron
 Physical therapist student, Laval University, Qc, Canada


Thanks for including the citation, which indicates that the CORDIF test is part 
of the 'multilevel' package, which is on CRAN:

  http://cran.r-project.org/web/packages/multilevel/index.html

The reason that it is likely difficult to find is that 'cordif' is an abbreviation 
for 'correlation difference', not the proper name of a test.

If you review the provided documentation for the package:

  http://cran.r-project.org/web/packages/multilevel/multilevel.pdf

you will see that there is a description of the cordif() function and a 
reference given:

Cohen, J. & Cohen, P. (1983). Applied multiple regression/correlation analysis 
for the behavioral sciences (2nd Ed.). Hillsdale, NJ: Lawrence Erlbaum 
Associates.
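
As a rough, untested sketch of the call (argument names per the package manual; 
the correlations and sample sizes below are made up), it compares exactly two 
correlations obtained from independent samples:

  # install.packages("multilevel")
  library(multilevel)

  # Compare r = 0.51 (n = 54) with r = 0.71 (n = 54) -- hypothetical values
  cordif(rvalue1 = 0.51, rvalue2 = 0.71, n1 = 54, n2 = 54)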

I would review the package documentation and reference and if you have further 
questions, contact the authors of the paper.

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] <NA> to NA

2014-03-31 Thread Marc Schwartz

On Mar 31, 2014, at 1:29 PM, eliza botto eliza_bo...@hotmail.com wrote:

 Dear useRs,
 Sorry for such a ridiculous question but i really need to know what is 
 the difference between <NA> and NA and how to convert <NA> to NA.
 Thankyou very much in advance
 Eliza


<NA> is the printed output that you would typically get when NA is an element 
in a factor:

 factor(NA)
[1] <NA>

 is.na(factor(NA))
[1] TRUE

 NA
[1] NA

 is.na(NA)
[1] TRUE


See ?factor for additional details.

It is, other than the displayed output, the same as a 'normal' NA, which is to 
say that the value is missing and otherwise undefined.

The behavior appears to evolve from the use of ?encodeString, which is called 
within print.factor:

 encodeString(NA)
[1] "NA"

 encodeString(NA, na.encode = FALSE)
[1] NA


The default for the 'na.encode' argument is TRUE, so you get the formatting of 
the NA as you observe for factors.
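
A small illustration, using a throwaway factor, of the difference being purely 
one of display:

  f <- factor(c("a", NA))
  f                # prints:  a    <NA>
  as.character(f)  # prints:  "a"  NA
  is.na(f)         # FALSE TRUE -- the missing value is the same either way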

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] numeric to factor via lookup table

2014-03-28 Thread Marc Schwartz

On Mar 28, 2014, at 3:38 PM, Jonathan Greenberg j...@illinois.edu wrote:

 R-helpers:
 
 Hopefully this is an easy one.  Given a lookup table:
 
 mylevels <- data.frame(ID=1:10, code=letters[1:10])
 
 And a set of values (note these do not completely cover the mylevels range):
 
 values <- c(1,2,5,5,10)
 
 How do I convert values to a factor object, using the mylevels to
 define the correct levels (ID matches the values), and code is the
 label?
 
 --j


One approach would be to use ?merge and specify the 'by.*' arguments using 
column indices, where 'values' is column 1 and you want to match that with 
mylevels$ID, which is also column 1. Hence:

 merge(values, mylevels, by.x = 1, by.y = 1)
   x code
1  1    a
2  2    b
3  5    e
4  5    e
5 10    j
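
If the end result really needs to be a factor rather than a merged data frame, a 
hedged alternative is to let factor() do the lookup directly:

  f <- factor(values, levels = mylevels$ID, labels = as.character(mylevels$code))
  f
  # [1] a b e e j
  # Levels: a b c d e f g h i j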



Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R mail list archive Google search function not work

2014-03-27 Thread Marc Schwartz

On Mar 27, 2014, at 11:58 AM, Martin Maechler maech...@stat.math.ethz.ch 
wrote:

 Marc Schwartz marc_schwa...@me.com
on Wed, 26 Mar 2014 16:25:08 -0500 writes:
 
 On Mar 26, 2014, at 4:14 PM, David Winsemius dwinsem...@comcast.net wrote:
 
 
 On Mar 25, 2014, at 5:31 PM, Rolf Turner wrote:
 
 On 26/03/14 12:51, David Winsemius wrote:
 
 On Mar 25, 2014, at 9:52 AM, Luo Weijun wrote:
 
 Dear Robert and R project team,
 I notice that the Google search function on the R mail list archives 
 page has stopped working for quite a while, 
 http://tolstoy.newcastle.edu.au/R/.
 Is there any solution on this or this has been move to another webpage? 
 I know Google advance search can be used but the query is more 
 complicated. This simple function could help R users greatly. Thank you!
 Weijun
 
 Why not use MarkMail:
 http://markmail.org/search/?q=list%3Aorg.r-project.r-help
 
 Or Gmane
 http://dir.gmane.org/gmane.comp.lang.r.general/search/list:org.r-project.r-help
 
 
 Why not?  Well, for one thing the first link that the R web site points at 
 is the tolstoy.newcastle.edu.au/R/ link.  Which is *still there* but 
 terminates at 31 March 2012.
 
 
 I'm a bit confused about what is meant by the R web site. Are you 
 pointing out deficiencies in MarkMail or GMane?
 
 
 David,
 
 If you go to:
 
 http://www.r-project.org
 
 and look at the left hand navigation frame, there is a Search link there, 
 which brings up, in the right hand frame, a list of search sites 
 (http://www.r-project.org/search.html), which still includes Robert's site 
 at http://tolstoy.newcastle.edu.au/R/.
 
 The archives there seem to stop at 2012 and the Google search box there 
 seems to be non-functional from a quick check.
 
 I have not used his site in years and will use rseek.org these days.
 
 It seems to me that I recall discussion in the past about the status of 
 Robert's site, but cannot seem to locate anything at the moment. I am not 
 sure who maintains the R web site these days, but presumably that link 
 should be removed if the search engine is no longer actively maintained.
 
 Regards,
 Marc Schwartz
 
 Thank you, Marc (and the other posters).
 R core has always been responsible for that, it is also an svn
 repos, mirrored daily to the web server.
 I have commented Robert King's mirror (and also added a bit
 about Nabble).
 You should be able to see the result within 24 hours.
 
 Martin


Thanks Martin!

Regards,

Marc

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Clinical significance - Equivalence test

2014-03-27 Thread Marc Schwartz

On Mar 27, 2014, at 8:53 AM, Manuel Carona unku...@gmail.com wrote:

 Hi,
 
 I have implemented a therapeutic intervention on two groups (one is a
 control group) and tested them in two moments using some assessment
 tools (with normative data). Now I want to compare the experimental
 group with the control group using clinical equivalence testing. To do
 this I need to specify a range of closeness (One for each assessment
 tool according to the specificity of this same tool) and do two
 one-tailed tests to test if the two groups are considered clinically
 equivalent in the first moment and on the end I want to compare the
 experimental group with the normative data (Here I have to add the mean
 and standard deviation of the normative sample because I don't have the
 normative sample).
 
 I know that R has a package named equivalence but I don't know how to do
 this kind of calculations with it. Is it even possible with the actual
 packages?
 
 Thanks in advance


I have not used it, but a quick review of the documentation for the 'equivalence' 
package suggests that at least the tost() function might be what you need.
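
Purely as a rough, untested sketch of the mechanics (made-up data and a made-up 
margin of +/- 0.5; the real equivalence margin has to come from clinical 
considerations):

  # install.packages("equivalence")
  library(equivalence)

  set.seed(42)
  control      <- rnorm(30, mean = 10,   sd = 2)
  experimental <- rnorm(30, mean = 10.2, sd = 2)

  # Two one-sided tests (TOST) with an equivalence region of +/- 0.5
  tost(experimental, control, epsilon = 0.5)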

That being said, you may need to seek the assistance of a local statistician 
familiar with the methods and underlying theory to guide you, beyond simply 
performing the analyses. 

It is not clear from your description above if the two time points are a 
baseline and post-treatment pair for each subject, or if they represent 2 time 
points beyond baseline (3 measures per subject), which would make this a 
repeated measures scenario and more complicated. In addition, with multiple 
assessment tools, what multiple testing adjustments may be required to control 
the likelihood of Type I errors?

If this is a formal study, with a powered a priori hypothesis, all of this 
should have been pre-specified in the study protocol in the statistical 
analysis section (and possibly in a stand alone statistical analysis plan) by 
someone familiar with study designs of this type and the appropriate analytic 
methods.

There are also regulatory guidance documents (eg. FDA) and books that cover the 
design and analysis of bioequivalence studies and those should have served as a 
reference for such a study design.

Again, seeking local expertise would seem apropos here.

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] completely different results for shapiro.test and ks.test

2014-03-27 Thread Marc Schwartz
On Mar 27, 2014, at 8:14 AM, Hermann Norpois hnorp...@gmail.com wrote:

 Hello,
 
 My main question is wheter my data is distributed normally. As the
 shapiro.test doesnt work for large
 data sets I prefer the ks.test.
 But I have some problems to understand the completely different p-values:
 
 ks.test (test, pnorm, mean (test), sd (test))
 
One-sample Kolmogorov-Smirnov test
 
 data:  test
 D = 0.0434, p-value = 0.1683
 alternative hypothesis: two-sided
 
  Warning message:
  In ks.test(test, pnorm, mean(test), sd(test)) :
   ties should not be present for the Kolmogorov-Smirnov test
 shapiro.test (test)
 
Shapiro-Wilk normality test
 
 data:  test
 W = 0.9694, p-value = 1.778e-10
 
 
 Generating some random data the difference is acceptable:
 
 nt - rnorm (200, mean=5, sd=1)
 ks.test (nt, pnorm, mean=5, sd=1)
 
One-sample Kolmogorov-Smirnov test
 
 data:  nt
 D = 0.0641, p-value = 0.3841
 alternative hypothesis: two-sided
 
 shapiro.test (nt)
 
Shapiro-Wilk normality test
 
 data:  nt
 W = 0.9933, p-value = 0.5045
 
 
 Thanks
 hermann

snip

The discussion here (and other similar ones) might be helpful:

  
http://stats.stackexchange.com/questions/362/what-is-the-difference-between-the-shapiro-wilk-test-of-normality-and-the-kolmog

You may also be served by searching the R-Help list archives for prior 
discussions on using normality tests and why they are essentially useless in 
practice.

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R mail list archive Google search function not work

2014-03-26 Thread Marc Schwartz

On Mar 26, 2014, at 4:14 PM, David Winsemius dwinsem...@comcast.net wrote:

 
 On Mar 25, 2014, at 5:31 PM, Rolf Turner wrote:
 
 On 26/03/14 12:51, David Winsemius wrote:
 
 On Mar 25, 2014, at 9:52 AM, Luo Weijun wrote:
 
 Dear Robert and R project team,
 I notice that the Google search function on the R mail list archives page 
 has stopped working for quite a while, http://tolstoy.newcastle.edu.au/R/.
 Is there any solution on this or this has been move to another webpage? I 
 know Google advance search can be used but the query is more complicated. 
 This simple function could help R users greatly. Thank you!
 Weijun
 
 Why not use MarkMail:
 http://markmail.org/search/?q=list%3Aorg.r-project.r-help
 
 Or Gmane
 http://dir.gmane.org/gmane.comp.lang.r.general/search/list:org.r-project.r-help
 
 
 Why not?  Well, for one thing the first link that the R web site points at 
 is the tolstoy.newcastle.edu.au/R/ link.  Which is *still there* but 
 terminates at 31 March 2012.
 
 
 I'm a bit confused about what is meant by the R web site. Are you pointing 
 out deficiencies in MarkMail or GMane?


David,

If you go to:

  http://www.r-project.org

and look at the left hand navigation frame, there is a Search link there, 
which brings up, in the right hand frame, a list of search sites 
(http://www.r-project.org/search.html), which still includes Robert's site at 
http://tolstoy.newcastle.edu.au/R/.

The archives there seem to stop at 2012 and the Google search box there seems 
to be non-functional from a quick check.

I have not used his site in years and will use rseek.org these days.

It seems to me that I recall discussion in the past about the status of 
Robert's site, but cannot seem to locate anything at the moment. I am not sure 
who maintains the R web site these days, but presumably that link should be 
removed if the search engine is no longer actively maintained.

Regards,

Marc Schwartz


 
 
 cheers,
 
 Rolf
 
 David Winsemius
 Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Duplicate of columns when merging two data frames

2014-03-13 Thread Marc Schwartz

On Mar 13, 2014, at 10:19 AM, Stefano Sofia stefano.so...@regione.marche.it 
wrote:

 Dear list users,
 I have two data frames df1 and df2, where the columns of df1 are
 
 Sensor_RM Place_RM Station_RM Y_init_RM M_init_RM D_init_RM Y_fin_RM M_fin_RM 
 D_fin_RM
 
 and the columns of df2 are
 
 Sensor_RM Station_RM Place_RM Province_RM Region_RM Net_init_RM 
 GaussBoaga_EST_RM GaussBoaga_NORD_RM Gradi_Long_RM Primi_Long_RM 
 Secondi_Long_RM Gradi_Lat_RM Primi_Lat_RM Secondi_Lat_RM Long_Cent_RM 
 Lat_Cent_RM Height_RM
 
 When I merge the two data frames through
 
  df3 <- merge(df1, df2, by = c("Sensor_RM", "Station_RM"))
 
 I get a new data frame with columns
 
 Sensor_RM Station_RM Place_RM.x Y_init_RM M_init_RM D_init_RM Y_fin_RM 
 M_fin_RM D_fin_RM Place_RM.y Province_RM Region_RM Net_init_RM 
 GaussBoaga_EST_RM GaussBoaga_NORD_RM Gradi_Long_RM Primi_Long_RM 
 Secondi_Long_RM Gradi_Lat_RM Primi_Lat_RM Secondi_Lat_RM Long_Cent_RM 
 Lat_Cent_RM Height_RM
 
 I am sure that df1$Place_RM and df2$Place_RM are equal. I checked it from the 
 shell using awk and diff.
 Why then I have a duplicate of Place_RM, i.e. Place_RM.x and Place_RM.y, and 
 only of them?
 
 Thank you for your help
 Stefano
 


From the Details section of ?merge:

"If the columns in the data frames not used in merging have any common names, 
these have suffixes (".x" and ".y" by default) appended to try to make the 
names of the result unique. If this is not possible, an error is thrown."


If you don't want both columns in the resultant data frame, use them in the 
'by' argument or remove one of them prior to merge()ing. If you use them in the 
'by' argument, be sure that they will be compared as exactly equal, which can 
be problematic if they are floating point values. If so, you would be better off 
subsetting one of the source data frames to remove the column first:

  df3 <- merge(df1, 
               subset(df2, select = -Place_RM),
               by = c("Sensor_RM", "Station_RM"))
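
The first option, putting Place_RM into the 'by' argument (only appropriate here 
because you have already verified that the values are identical), would simply be:

  df3 <- merge(df1, df2, by = c("Sensor_RM", "Station_RM", "Place_RM"))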
  

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Generalizing a regex for retrieving numbers with and without scientific notation

2014-02-19 Thread Marc Schwartz

On Feb 19, 2014, at 12:26 PM, Morway, Eric emor...@usgs.gov wrote:

 I'm trying to extract all of the values from edm in the example below.
 However, the first attempt only retrieves the final number in the sequence
 since it is recorded using scientific notation.  The second attempt
 retrieves all of the numbers, but omits the scientific notation component
 of the final number.  How can I make the regular expression more general
 such that I get every value AND its corresponding E-value (i.e.,
 ...E-06), where pertinent?   I've spent time reading through ?regex, but
 my attempts to use the * character, where the preceding item will be
 matched zero or more times, have so far proven syntactically incorrect or
 generally unsuccessful.  .Appreciate the help, Eric
 
 edm <-
 c("", "param_value", "6.301343", "6.366305", "6.431268", "6.496230", "6.561192", "6.626155", "9.091117E-06")
 
 param_values <- strapply(edm, "\\d+\\.\\d+E[-+]?\\d+", as.numeric,
 simplify=cbind)
 param_values
 #[1,] 9.091117e-06
 
 param_values <- strapply(edm, "\\d+\\.\\d+", as.numeric, simplify=cbind)
 param_values
 #[1,] 6.301343 6.366305 6.431268 6.49623 6.561192 6.626155 9.091117


If the individual elements of the vector are either numeric or non-numeric, why 
not just use:

 as.numeric(edm)
[1]   NA   NA 6.301343e+00 6.366305e+00 6.431268e+00
[6] 6.496230e+00 6.561192e+00 6.626155e+00 9.091117e-06
Warning message:
NAs introduced by coercion 


The non-numeric elements are returned as NA's, which you can remove by using 
?na.omit.

The only reason to use a regex would be if the individual elements themselves 
contained both numeric and non-numeric characters. If you then want to 
explicitly format numeric output (which would yield a character vector), you 
can use ?sprintf or ?format. Keep in mind the difference between how R *PRINTS* 
a numeric value and how R *STORES* a numeric value internally.
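
If a regex really is needed (for example, if the numbers were embedded inside 
longer strings), one hedged option is to make the exponent part optional rather 
than reaching for '*':

  pat <- "\\d+\\.\\d+([eE][-+]?\\d+)?"
  as.numeric(regmatches(edm, regexpr(pat, edm)))
  # [1] 6.301343e+00 6.366305e+00 6.431268e+00 6.496230e+00 6.561192e+00
  # [6] 6.626155e+00 9.091117e-06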

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grep for multiple pattern?

2014-02-13 Thread Marc Schwartz

On Feb 13, 2014, at 8:43 AM, Rainer M Krug rai...@krugs.de wrote:

 Hi
 
 I want to search for multiple pattern as grep is doing for a single
 pattern, but this obviously not work:
 
 grep("an", month.name)
 [1] 1
 grep("em", month.name)
 [1]  9 11 12
 grep("eb", month.name)
 [1] 2
 grep(c("an", "em", "eb"), month.name)
 [1] 1
 Warning message:
 In grep(c("an", "em", "eb"), month.name) :
  argument 'pattern' has length > 1 and only the first element will be used
 
 
 Is there an equivalent which returns the positions as grep is doing, but
 not using the strict full-string matching of match()?
 
 I could obviously do:
 
 unlist( sapply(pat, grep, month.name ) )
 an em1 em2 em3  eb
  1   9  11  12   2
 
 but is there a more compact command I am missing?
 
 Thanks,
 
 Rainer


The vertical bar '|' acts as a logical 'or' operator in regex expressions:

 grep("an|em|eb", month.name)
[1]  1  2  9 11 12

 grep("an|em|eb", month.name, value = TRUE)
[1] "January"   "February"  "September" "November"  "December" 
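
And if the patterns are already held in a vector, the alternation can be built 
programmatically:

  pat <- c("an", "em", "eb")
  grep(paste(pat, collapse = "|"), month.name)
  # [1]  1  2  9 11 12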


Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sensitivity analysis - minimum effect size detectable by a binomial test

2014-02-05 Thread Marc Schwartz
On Feb 5, 2014, at 10:05 AM, Simone misen...@hotmail.com wrote:

 Hi all,
 
 I have performed a binomial test to verify if the number of males in a study 
 is significantly different from a null hypothesis (say, H0:p of being a male= 
 0.5).
 For instancee:
 binom.test(10, 30, p=0.5, alternative="two.sided", conf.level=0.95)
 
Exact binomial test
 
 data:  10 and 30
 number of successes = 10, number of trials = 30, p-value = 0.09874
 alternative hypothesis: true probability of success is not equal to 0.5
 95 percent confidence interval:
 0.1728742 0.5281200
 sample estimates:
 probability of success 
 0.333 
 
 This way I get the estimated proportion of males (in this case p of success) 
 that is equal to 0.33 and an associated p-value (this is not significant at 
 alpha=0.05 with respect to the H0:P=0.5).
 
 Now, I want to know, given a power of, say, 0.8, alpha=0.05 and the above 
 sample size (30), what is the minimum proportion of males as low or as high 
 (two sided) like to be significantly detected with respect to a H0 (not 
 necessarily H0:P=0.5 - I am interested also in other null hypotheses). In 
 other words, I would have been able to detect a significant deviation from 
 the H0 for a given power, alpha and sample size if the proportion of males 
 would have been more than Xhigh or less than Xlow.
 
 I have had a look at the pwr package but it seems to me it doesn't allow to 
 calculate this.
 I would appreciate very much any suggestion.


Take a look at ?power.prop.test, where you can specify that one of the 
proportions is NULL, yielding the value you seek:


 power.prop.test(n = 30, p1 = 0.5, p2 = NULL, power = 0.8, sig.level = 0.05) 

 Two-sample comparison of proportions power calculation 

              n = 30
             p1 = 0.5
             p2 = 0.834231
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number in *each* group


The value for 'p2' is your high value for the detectible difference from a 
proportion of 0.5, given the other parameters. 1 - p2 would be your low value.
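
Since you mentioned the pwr package: as a hedged, untested sketch for the strictly 
one-sample case, pwr.p.test() can solve for the detectable effect size h, which can 
then be back-transformed to a proportion:

  library(pwr)

  # Solve for the detectable effect size h at n = 30, power = 0.8
  h <- pwr.p.test(n = 30, sig.level = 0.05, power = 0.8)$h

  # Back-transform, since h = 2*asin(sqrt(p2)) - 2*asin(sqrt(p1)) with p1 = 0.5
  p_high <- sin(asin(sqrt(0.5)) + h / 2)^2
  p_low  <- sin(asin(sqrt(0.5)) - h / 2)^2
  c(p_low, p_high)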


Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] creating an equivalent of r-help on r.stackexchange.com ?

2014-02-04 Thread Marc Schwartz

On Feb 3, 2014, at 8:54 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote:

 On Mon, Feb 3, 2014 at 8:41 PM, Marc Schwartz marc_schwa...@me.com wrote:
 
 Hi All,
 
 As I have noted in a prior reply in this thread, which began last November, 
 I don't post in SO, but I do keep track of the traffic there via RSS feeds. 
 However, the RSS feeds are primarily for new posts and do not seem to update 
 with follow ups to the initial post.
 
 I do wish that they would provide an e-mail interface, which would help to 
 address some of the issues raised here today. They do provide notifications 
 on comments to posts, as do many other online fora. However, there is no 
 routine mailing of new posts with a given tag (eg. 'R'), at least as far as 
 I can see, as I had searched there previously for that functionality. That 
 would be a nice push based approach, as opposed to having to go to the web 
 site.
 
 
 You can set up email subscriptions for specific tags.  See the
 preferences section of your account.  I get regular emails of the
 r_filter.
  Here are the first few lines of an email I just received (I have
  pasted it into this plain text email, but they are received as HTML and
  there are links to the specific questions).

snip

Thanks for the pointer Gabor. I did not have an account on SE/SO and had only 
searched the various help resources there attempting to find out what kind of 
e-mail push functionality was available. A number of posts had suggested a non 
real time e-mail ability, which indeed seems to be the case.

I went ahead and created an account to get a sense of what was available. As 
you note, you can sign up for e-mail subscriptions based upon various tag 
criteria. However, it would seem that you need to specify time intervals for 
the frequency of the e-mails. These can be daily, every 3 hours or every 15 
minutes. So there seems to be a polling/digest based process going on.

I created an e-mail subscription last evening and selected every 15 minutes. 
What appears to be happening is that the frequency of the e-mails actually 
varies. Overnight and this morning, I have e-mails coming in every 20 to 30 
minutes or more apart. It is not entirely clear what the trigger is, given the 
inconsistency in frequency. Perhaps the infrastructure is not robust enough to 
support a more consistent polling/digest e-mail capability yet.

The e-mails contain snippets of new questions only and not responses 
(paralleling the RSS feed content). I need to actually go to the web site to 
see the full content of the question and to see if the question has been 
answered. In most cases, by the time that I get to the site, even right away 
after getting the e-mail, there are numerous replies already present. There is, 
of course, no way to respond via e-mail.

I would say that if one is looking for an efficient e-mail based interface to 
SE/SO, it does not exist at present. It is really designed as a web site only 
interaction, where you are likely going to need to have a browser continuously 
open to the respective site or sites in order to be able to interact 
effectively, if it is your intent to monitor and to respond in a timely fashion 
to queries. 

Alternatively, perhaps a real-time or near real-time updating RSS feed reader 
might make more sense for the timeliness of knowing about new questions. It is 
not clear to me how those who respond quickly (eg. within minutes) are 
interacting otherwise.

There appear to be some browser extensions to support notifications (eg. for 
Chrome), but again, you need to have your browser open. There also appear to be 
some desktop apps in alpha/beta stages that might be helpful. However, they 
seem to track new comments to questions that are specifically being followed 
(eg. questions that you have posted), rather than all new questions, thus 
paralleling the SE/SO Inbox content.

That being said, obviously, a lot of people are moving in that direction given 
the traffic decline here and the commensurate increase there.

Regards,

Marc

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] creating an equivalent of r-help on r.stackexchange.com ?

2014-02-03 Thread Marc Schwartz
Hi All,

As I have noted in a prior reply in this thread, which began last November, I 
don't post in SO, but I do keep track of the traffic there via RSS feeds. 
However, the RSS feeds are primarily for new posts and do not seem to update 
with follow ups to the initial post.

I do wish that they would provide an e-mail interface, which would help to 
address some of the issues raised here today. They do provide notifications on 
comments to posts, as do many other online fora. However, there is no routine 
mailing of new posts with a given tag (eg. 'R'), at least as far as I can see, 
as I had searched there previously for that functionality. That would be a nice 
push based approach, as opposed to having to go to the web site.

I appreciate Don's comments regarding too many web site logins and too many 
passwords. Slight digression. The reality of constant security breaches of web 
sites has led me to use 1Password, such that I have a unique, randomly 
generated, strong password for almost every site that I login to (where I can 
control the password and login). I don't have to remember user IDs and 
passwords. With the multiple browser plug-ins for the application on the 
desktop and mobile app support with cross platform syncing, this has become, 
operationally, a non-issue for me.

I think that Barry makes a good distinction here. Notwithstanding the 
gamification of posting on SO, the formalisms on SO are pretty well ingrained.

I do also think that the marketplace (aka R users) in many respects, is 
speaking with its fingers, in that traffic on R-Help continues to decline.

I am attaching an updated PDF of the list traffic from 1997-2013, which at the 
time that I posted it last year, was not yet complete for 2013, albeit, my 
projection for the year was fairly close.

You can see that since the peak in 2010 of 41,048 posts for the year, traffic 
in 2013 declined to 20,538, or roughly a 50% decline. Much of that decline was 
from 2012 to 2013, which I postulate, is a direct outcome of the snowballing 
use of SO primarily.

Not in the plot for this year, January of 2014 had 1,129 posts, as compared to 
January of 2013 with 2,182 posts, or roughly a 50% decline. So the trend 
continues this year. If January's relative decline holds for the remainder of 
the year, or worse, perhaps accelerates, we could end the year at a level of 
activity (~10k posts) on R-Help not seen since circa 2002.

I honestly don't know the answer to the question and don't know that SO is the 
singular solution, as Barry has noted. However, as a long time member of the 
community, do feel that discussion of the future of these lists is warranted.

Perhaps Duncan's prophecy of R-Help just passively fading away will indeed 
happen. If the current rate of decline in posts here continues, it will become 
a self-fulfilling prophecy, or at minimum, R-Help will be supporting a 
declining minority of R users. Is it then worth the time, energy and costs to 
maintain and host, or are those resources better directed elsewhere to yield 
greater value to the community?

Should this simply continue to be a passive process as the marketplace moves 
elsewhere, or should there be a proactive discussion and plan put in place to 
modify infrastructure and behavior to retain traffic here? I suspect that this 
year may very well be important temporally to the implications for whatever 
decisions are made.

Regards,

Marc Schwartz









R-Help-Annual.pdf
Description: Adobe PDF document



On Feb 3, 2014, at 6:34 PM, Barry Rowlingson b.rowling...@lancaster.ac.uk 
wrote:

 As one of the original ranters of hey lets move to StackOverflow a
 few years back (see my UseR! lightning talk from Warwick) I should
 probably stick my oar in.
 
 I don't think the SO model is a good model for all the discussions
 that go on on R-help.
 
 I think SO is a good model for questions that have fairly precise
 answers that are demonstrably 'correct'.
 
 I think a mailing list is a bad model for questions that have answers.
 Reasons? Well, I see an email thread, start reading it, eight messages
 in, somewhere in a mix of top-posted and bottom-posted content, I
 discover the original poster has said Yes thanks Rolf that works!.
 Maybe I've learnt something in that process, but maybe I had the
 answer too and I've just wasted my time reading that thread. With
 StackOverflow questioners accept an answer and you needn't waste
 time reading it. I've given up reading R-help messages with
 interesting question titles if there's more than two contributors and
 six messages, since its either wandered off-topic or been answered. I
 suspect that heuristic is less efficient than SO's answer accepted
 flag.
 
 SO questions are tagged. I can look at only the ggplot-tagged
 questions, or the 'spatial'-tagged questions, or ignore anything with
 'finance' in it. Mailing lists are a bit coarse-grained and rigid for
 that, and subject lines are often uninformative of the content

Re: [R] Handling large SAS file in R

2014-01-28 Thread Marc Schwartz
Dennis,

The key difference is that with R, you are, as always, dependent upon 
volunteers providing software at no charge to you, most of whom have full time 
(and then some) jobs. Those jobs (and in many cases, family) will be their 
priority, as I am sure is the case with Matt. 

Unless they are in a position where their employer specifically allows them to 
allocate a percentage of their work time to voluntary projects, like R, you are 
at the inevitable mercy of that volunteer's time and priorities.

In the case of Stat/Transfer, they are a profit motivated business with revenue 
tied directly to the sales of the application. Thus, they have a very different 
perspective on serving their paying customers and can allocate dedicated 
resources to the functionality in their application.

An alternative here would be for one of the for profit companies that sell and 
support R versions, to take on the task of providing some of these facilities 
and providing them back to the community as a service. But, that is up to them 
to consider in their overall business plan and the value that they perceive it 
brings to their products.

Regards,

Marc Schwartz

On Jan 28, 2014, at 9:59 AM, Dennis Fisher fis...@plessthan.com wrote:

 Colleagues
 
 Frank Harrell wrote that “you need to purchase Stat/Transfer”, which I did 
 many years ago and continue to use.  
 
 But I don’t understand why the sas7bdat package (or something equivalent) 
 cannot reverse engineer the SAS procedures so that R users can read sas7bdat 
 files as well as StatTransfer.  I have been in contact with the maintainer, 
 Matt Shotwell, regarding bugs in the present version (0.4) and he wrote:
   it tends to languish just one or two items from the top of my TODO... I 
 hope to get back to it soon.
 I have also written to this bulletin board about the foreign package not 
 being able to process certain SAS XPT files (which StatTransfer handled 
 without any problem).
 
 I am a strong advocate of R and I have arranged work-arounds (using 
 StatTransfer) in these cases.  However, R users would benefit from the 
 ability of R to read any SAS file without intermediate software.   I would 
 offer to participate in any efforts to accomplish this but I think that it is 
 beyond my capabilities.  
 
 Dennis
 
 Message: 23
 Date: Mon, 27 Jan 2014 13:25:54 -0800 (PST)
 From: Frank Harrell f.harr...@vanderbilt.edu
 To: r-help@r-project.org
 Subject: Re: [R] Handlig large SAS file in R
 Message-ID: 1390857954542-4684250.p...@n4.nabble.com
 Content-Type: text/plain; charset=us-ascii
 
 For that you need to purchase Stat/Transfer.
 Frank
 
 
 hans012 wrote
 Hey Guys
 I have a .sas7bdat file of 1.79gb that i want to read.
 I am using the .sas7bdat package to read the file and after i typed the
 command read.sas7bdat('filename.sas7bdat') it has been 3 hours with no
 result so far.
 Is there a way that i can see the progress of the read? 
 Or is there another way to read the file with less computing time?
 I do not have access to SAS, the file was sent to me.
 
 Let me know what you guys think
 KR
 Hans
 
 
 Dennis Fisher MD
 P  (The P Less Than Company)
 Phone: 1-866-PLessThan (1-866-753-7784)
 Fax: 1-866-PLessThan (1-866-753-7784)
 www.PLessThan.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to read this data correctly

2014-01-24 Thread Marc Schwartz
Hi,

I don't know that it is a problem in R reading the file per se. It is more of 
an issue, as far as I can see, that read.xls() is not written to deal with some 
aspects of cell formatting of certain types. In this case, the cell is 
formatted using a financial format with Japanese Yen. I did not take the time 
to look through the Perl script included.

The intermediate CSV file that is created by the Perl script that opens and 
reads the Excel file contains:

-0.419547704894512
-[$¥-411]0.42

I captured this while running read.xls() under debug(), since the CSV is 
created as a temp file that is deleted upon function exit. It would seem that 
the financial cell data is not simply read as a numeric value.

The CSV file will then be directly converted to a data frame in R as is using 
read.csv() to result in:

 read.xls("Book1.xlsx", 1, header = FALSE)
  V1
1 -0.419547704894512
2  -[$¥-411]0.42



You may need to use alternative Excel file importing functions, such as 
XLConnect or similar, that provide more robust functionality. Of course, R 
itself does not have financial data types, thus there may yet need to be some 
form of post import data clean up, even with the other options depending upon 
how they function.
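
As a rough, untested sketch with XLConnect (it requires a working Java 
installation), where the numeric cell should come back as a number regardless of 
the Yen display formatting:

  # install.packages("XLConnect")
  library(XLConnect)

  wb  <- loadWorkbook("Book1.xlsx")
  dat <- readWorksheet(wb, sheet = 1, header = FALSE)
  str(dat)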

Regards,

Marc Schwartz



On Jan 24, 2014, at 2:49 PM, Christofer Bogaso bogaso.christo...@gmail.com 
wrote:

 Hi Rui,
 
 Thanks for your reply.
 
 However why you said, 'shouldn't read properly in R'?
 
 Basically I was looking for some way so that I would get -0.419547704894512
 value in R against cell F4  F7. Because F7 is linked with F4.
 
 Ofcourse I can open Excel file then format that cell accordingly. However I
 am looking for some way in R so to avoid any manual process.
 
 Thanks and regards,
 
 
 On Sat, Jan 25, 2014 at 1:21 AM, Rui Barradas ruipbarra...@sapo.pt wrote:
 
 Hello,
 
 Cell F7 has a formula, =F4, and when I open the file in excel, I get
 -¥0.42, which shouldn't read properly in R.
 
 The problem seems to be in the file, not in read.xls.
 
 Hope this helps,
 
 Rui Barradas
 
 Em 24-01-2014 19:22, Christofer Bogaso escreveu:
 
 Hi again,
 
 I need to read below xlsx file correctly (available here:
 http://snk.to/f-ch3exae5), and used following code (say, file is saved in
 F: drive)
 
 
 library(gdata)
  read.xls("f:/Book1.xlsx", 1, header = F)
 
   V1
 1 -0.419547704894512
 2 -[$¥-411]0.42
 
 
 
 However please notice that, in my original excel file the cells F4 and F7
 have essentially the same values. Therefore I should get
 -0.419547704894512, for either cases above.
 
 Any idea on how to achieve that, without opening the xlsx file manually
 and
 then formatting the cell before reading it in R?
 
 Thanks for your help

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] summary() and the mode

2014-01-23 Thread Marc Schwartz

On Jan 23, 2014, at 2:27 PM, Ruhil, Anirudh ru...@ohio.edu wrote:

 A student asked: Why does R's summary() command yield the Mean and the 
 Median, quartiles, min, and max but was written to exclude the Mode?
 
 I said I had no clue, googled the question without much luck, and am now 
 posting it to see if anybody knows why.
 
 Ani


It has been discussed various times over the years. Presuming that there is 
interest in knowing it, the problem is how to estimate the mode, depending upon 
the nature of the data. 

That is, if the data are discrete (eg. a factor), a simple tabulation using 
table() can yield the one or perhaps more than one, most frequently occurring 
value. In this case:

set.seed(1)
x <- sample(letters, 500, replace = TRUE)
tab <- table(x)

# Get the first maximum value
tab[which.max(tab)]




If the data are continuous, then strictly speaking the mode is not well defined 
and you need to utilize something along the lines of a density estimation. In 
that case:


set.seed(1)
x <- rnorm(500)

# Get the density estimates
dx <- density(x)

# Which value is at the peak
dx$x[which.max(dx$y)]

Visual inspection is also helpful in this case:

  plot(dx)
  abline(v = dx$x[which.max(dx$y)])


See ?table, ?density and ?which.max
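
If a single convenience function is wanted, a rough sketch along these lines could 
be used; note that the cutoff between the discrete and continuous cases below is 
arbitrary, and ties return only the first maximum:

  Mode <- function(x) {
    if (is.numeric(x) && length(unique(x)) > 20) {
      # treat as continuous: mode of the kernel density estimate
      d <- density(x)
      d$x[which.max(d$y)]
    } else {
      # treat as discrete: most frequent value
      names(which.max(table(x)))
    }
  }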

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Doubt in simple merge

2014-01-17 Thread Marc Schwartz

On Jan 16, 2014, at 11:14 PM, kingsly ecoking...@yahoo.co.in wrote:

 Thank you dear friends.  You have cleared my first doubt.  
 
 My second doubt:
 I have the same data sets Elder and Younger. Elder <- data.frame(
   ID=c("ID1","ID2","ID3"),
   age=c(38,35,31))
 Younger <- data.frame(
   ID=c("ID4","ID5","ID3"),
   age=c(29,21,"NA"))
 
 
  Row ID3 comes in both data set. It has a value (31) in Elder while NA in 
 Younger.
 
 I need output like this.
 
 IDage
 ID1  38
 ID2  35
 ID3  31
 ID4  29
 ID5  21 
 
 Kindly help me.


First, there is a problem with the way in which you created Younger, where you 
have the NA as "NA", which is a character and coerces the entire column to a 
factor, rather than a numeric:

 str(Younger)
'data.frame':   3 obs. of  2 variables:
 $ ID : Factor w/ 3 levels "ID3","ID4","ID5": 2 3 1
 $ age: Factor w/ 3 levels "21","29","NA": 2 1 3

It then causes problems in the default merge():

DF <- merge(Elder, Younger, by = c("ID", "age"), all = TRUE)

 str(DF)
'data.frame':   6 obs. of  2 variables:
 $ ID : Factor w/ 5 levels "ID1","ID2","ID3",..: 1 2 3 3 4 5
 $ age: chr  "38" "35" "31" "NA" ...


Note that 'age' becomes a character vector, again rather than numeric.

Thus:

Younger <- data.frame(ID = c("ID4", "ID5", "ID3"), age = c(29, 21, NA))

Now, when you merge as before, you get:

 str(merge(Elder, Younger, by = c("ID", "age"), all = TRUE))
'data.frame':   6 obs. of  2 variables:
 $ ID : Factor w/ 5 levels "ID1","ID2","ID3",..: 1 2 3 3 4 5
 $ age: num  38 35 31 NA 29 21


 merge(Elder, Younger, by = c("ID", "age"), all = TRUE)
   ID age
1 ID1  38
2 ID2  35
3 ID3  31
4 ID3  NA
5 ID4  29
6 ID5  21


Presuming that you want to consistently remove any NA values that may arise 
from either data frame:

 na.omit(merge(Elder, Younger, by = c("ID", "age"), all = TRUE))
   ID age
1 ID1  38
2 ID2  35
3 ID3  31
5 ID4  29
6 ID5  21


See ?na.omit

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] barplot: segment-wise shading

2014-01-17 Thread Marc Schwartz

On Jan 16, 2014, at 9:09 PM, Martin Weiser weis...@natur.cuni.cz wrote:

 Jim Lemon píše v Pá 17. 01. 2014 v 13:21 +1100:
 On 01/17/2014 10:59 AM, Marc Schwartz wrote:
 
 ...
 Arggh.
 
 No, this is my error for not actually looking at the plot and presuming 
 that it would work.
 
 Turns out that it does work for a non-stacked barplot:
 
   barplot(VADeaths, angle = 1:20 * 10, density = 10, beside = TRUE)
 
 However, internally within barplot(), actually barplot.default(), the 
 manner in which the matrix is passed to an internal function called 
 xyrect() to draw the segments, is that entire columns are passed, rather 
 than the individual segments (counts), when the bars are stacked.
 
 As a result, due to the vector based approach used, only the first 5 values 
 of 'angle' are actually used, since there are 5 columns, rather than all 
 20. The same impact will be observed when using the default legend that is 
 created.
 
 Thus, I don't believe that there will be an easy (non kludgy) way to do 
 what you want, at least with the default barplot() function.
 
 You could fairly easily create/build your own function using ?rect, which 
 is what barplot() uses to draw the segments. I am not sure if lattice based 
 graphics can do this or perhaps using Hadley's ggplot based approach would 
 offer a possibility.
 
 Apologies for the confusion.
 
 Regards,
 
 Marc
 
 Hi Marc and Martin,
 When I saw the original message I tried to look at the code for the 
 barplot function to see if I could call the rectFill function from 
 plotrix into it. Unfortunately barplot is one of those internal 
 functions that are not at all easy to hack and I have never gotten 
 around to adding stacked bars to the barp function. I thought that 
 rectFill would allow you to use more easily discriminated fills than 
 angles that only differed by 18 degrees.
 
 Jim
 
 Hi,
 
 after Marc pointed me out where to look for, I hacked barplot.default a
 bit, so now it does what I want (I added segmentwise argument).
 Unfortunately, it works well with segmentwise = TRUE, but not with
 segmentwise = FALSE (the default).
 With segmentwise = FALSE, the density argument works only in 1/n-th of the
 segments, where n is the number of columns (it seems like it refuses to
 recycle automatically, but I do not know why).
 Any ideas?
 
 Martin
 
 Here is my hack of barplot:

code snipped

Martin,

This would be a good time to learn how to use the ?debug function and related 
tools to step through your code to see where it is failing. Roger Peng also has 
some good notes here:

  http://www.biostat.jhsph.edu/~rpeng/docs/R-debug-tools.pdf

Note that when 'segmentwise = TRUE' and there are no 'angle' or 'density' 
arguments provided, it also does not work correctly. You may want to set some 
defaults in that case, or issue an error message.

I suspect that something in the indexing/expansion code that you added is not 
working as desired, but you will need to step through the code to see where.
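
For example, using 'mybarplot' as a stand-in name for your hacked copy of the 
function:

  debug(mybarplot)    # the next call will then single-step through the body
  mybarplot(VADeaths, angle = 15 + 10 * 1:20, density = 20, segmentwise = FALSE)
  undebug(mybarplot)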

One thing that you might want to consider, if the situation that you have is 
rather specialized, is to create your own function as you have done, but if 
'segmentwise = FALSE', then pass the arguments to barplot() so that the 
default function is used in that situation.

Regards,

Marc

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Doubt in simple merge

2014-01-16 Thread Marc Schwartz
Not quite:

 rbind(Elder, Younger)
   ID age
1 ID1  38
2 ID2  35
3 ID3  31
4 ID4  29
5 ID5  21
6 ID3  31

Note that ID3 is duplicated.


Should be:

 merge(Elder, Younger, by = c("ID", "age"), all = TRUE)
   ID age
1 ID1  38
2 ID2  35
3 ID3  31
4 ID4  29
5 ID5  21


He wants to do a join on both "ID" and "age" to avoid duplications of rows when 
the same ID and age occur in both data frames. If the same column name (eg 
"Var") appears in both data frames and is not part of the 'by' argument, you 
end up with "Var.x" and "Var.y" in the result.

In the case of two occurrences of the same ID but two different ages, if that 
is possible, both rows would be added to the result using the above code.

Regards,

Marc Schwartz


On Jan 16, 2014, at 9:04 AM, Frede Aakmann Tøgersen fr...@vestas.com wrote:

 Ups, sorry that should have been
 
 mer <- rbind(Elder, Younger)
 
 /frede
 
 
  Oprindelig meddelelse 
 Fra: Frede Aakmann Tøgersen
 Dato:16/01/2014 15.54 (GMT+01:00)
 Til: Adams, Jean ,kingsly
 Cc: R help
 Emne: Re: [R] Doubt in simple merge
 
 No I think the OP wants
 
 mer <- merge(Elder, Younger)
 
 Br. Frede
 
 
  Oprindelig meddelelse 
 Fra: Adams, Jean
 Dato:16/01/2014 15.45 (GMT+01:00)
 Til: kingsly
 Cc: R help
 Emne: Re: [R] Doubt in simple merge
 
 You are telling it to merge by ID only.  But it sounds like you would like
 it to merge by both ID and age.
 
 merge(Elder, Younger, all=TRUE)
 
 Jean
 
 
 On Thu, Jan 16, 2014 at 6:25 AM, kingsly ecoking...@yahoo.co.in wrote:
 
 Dear R community
 
 I have a two data set called Elder and Younger.
 This is my code for simple merge.
 
 Elder <- data.frame(
  ID=c("ID1","ID2","ID3"),
  age=c(38,35,31))
 Younger <- data.frame(
  ID=c("ID4","ID5","ID3"),
  age=c(29,21,31))
 
 mer <- merge(Elder, Younger, by="ID", all=T)
 
 Output I am expecting:
 
 ID    age
 ID1  38
 ID2  35
 ID3  31
 ID4  29
 ID5  21
 
 It looks very simple.  But I need help.
 When I run the code it gives me age.x and age.y.
 thank you

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] barplot: segment-wise shading

2014-01-16 Thread Marc Schwartz

On Jan 16, 2014, at 12:45 PM, Martin Weiser weis...@natur.cuni.cz wrote:

 Dear listers,
 
 I would like to make stacked barplot, and to be able to define shading
 (density or angle) segment-wise, i.e. NOT like here:
 # Bar shading example
 barplot(VADeaths, angle = 15+10*1:5, density = 20, col = "black",
 legend = rownames(VADeaths))
 
 The example has 5 different angles of shading, I would like to have as
 many possible angle values as there are segments (i.e. 20 in the
 VADeaths example).
 I was not successful using web search.
 Any advice?
 
 Thank you for your patience.
 With the best regards,
 Martin Weiser


You could do something like this:

# Get the dimensions of VADeaths
 dim(VADeaths)
[1] 5 4

# How many segments?
 prod(dim(VADeaths))
[1] 20


Then use that value in the barplot() arguments as you desire, for example:

  barplot(VADeaths, angle = 15 + 10 * 1:prod(dim(VADeaths)), 
  density = 20, col = "black", legend = rownames(VADeaths))


or wrap the barplot() function in your own, which pre-calculates the values and 
then passes them to the barplot() call in the function.

See ?dim and ?prod

Be aware that a vector (eg. 1:5) will be 'dim-less', thus if you are going to 
use this approach for a vector based data object, you would want to use ?length
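
For instance (a quick check, not part of the plotting code itself):

  v <- 1:5
  dim(v)               # NULL, a plain vector carries no dim attribute
  length(v)            # 5
  prod(dim(VADeaths))  # 20 for the matrix case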

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] barplot: segment-wise shading

2014-01-16 Thread Marc Schwartz

On Jan 16, 2014, at 5:03 PM, Martin Weiser weis...@natur.cuni.cz wrote:

 Marc Schwartz wrote on Thu, 16. 01. 2014 at 16:46 -0600:
 On Jan 16, 2014, at 12:45 PM, Martin Weiser weis...@natur.cuni.cz wrote:
 
 Dear listers,
 
 I would like to make stacked barplot, and to be able to define shading
 (density or angle) segment-wise, i.e. NOT like here:
 # Bar shading example
barplot(VADeaths, angle = 15+10*1:5, density = 20, col = "black",
legend = rownames(VADeaths))
 
 The example has 5 different angles of shading, I would like to have as
 many possible angle values as there are segments (i.e. 20 in the
 VADeaths example).
 I was not successful using web search.
 Any advice?
 
 Thank you for your patience.
 With the best regards,
 Martin Weiser
 
 
 You could do something like this:
 
 # Get the dimensions of VADeaths
 dim(VADeaths)
 [1] 5 4
 
 # How many segments?
 prod(dim(VADeaths))
 [1] 20
 
 
 Then use that value in the barplot() arguments as you desire, for example:
 
  barplot(VADeaths, angle = 15 + 10 * 1:prod(dim(VADeaths)), 
  density = 20, col = "black", legend = rownames(VADeaths))
 
 
 or wrap the barplot() function in your own, which pre-calculates the values 
 and then passes them to the barplot() call in the function.
 
 See ?dim and ?prod
 
 Be aware that a vector (eg. 1:5) will be 'dim-less', thus if you are going 
 to use this approach for a vector based data object, you would want to use 
 ?length
 
 Regards,
 
 Marc Schwartz
 
 
 Hello,
 
 thank you for your attempt, but this does not work (for me).
 This produces 5 angles of shading, not 20.
 Maybe because of my R version (R version 2.15.1 (2012-06-22); Platform:
 i486-pc-linux-gnu (32-bit))?
 
 Thank you.
 
 Regards,
 Martin Weiser


Arggh.

No, this is my error for not actually looking at the plot and presuming that it 
would work.

Turns out that it does work for a non-stacked barplot:

  barplot(VADeaths, angle = 1:20 * 10, density = 10, beside = TRUE)

However, internally within barplot(), actually barplot.default(), the manner in 
which the matrix is passed to an internal function called xyrect() to draw the 
segments, is that entire columns are passed, rather than the individual 
segments (counts), when the bars are stacked.

As a result, due to the vector based approach used, only the first 5 values of 
'angle' are actually used, since there are 5 columns, rather than all 20. The 
same impact will be observed when using the default legend that is created.

Thus, I don't believe that there will be an easy (non kludgy) way to do what 
you want, at least with the default barplot() function. 

You could fairly easily create/build your own function using ?rect, which is 
what barplot() uses to draw the segments. I am not sure if lattice based 
graphics can do this or perhaps using Hadley's ggplot based approach would 
offer a possibility.
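
If it helps, here is a rough sketch of that rect() based idea (the function name and defaults are mine, it is not tested against barplot()'s exact spacing or axis handling; angles are recycled segment by segment, column by column):

  segbarplot <- function(height, angle, density = 20, width = 0.8, ...) {
    stopifnot(is.matrix(height))
    angle <- rep(angle, length.out = length(height))
    tops <- apply(height, 2, cumsum)          # cumulative segment tops per bar
    plot(NA, xlim = c(0.5, ncol(height) + 0.5), ylim = c(0, max(tops)),
         xlab = "", ylab = "", xaxt = "n")
    axis(1, at = seq_len(ncol(height)), labels = colnames(height))
    k <- 0
    for (j in seq_len(ncol(height))) {
      bottom <- 0
      for (i in seq_len(nrow(height))) {
        k <- k + 1
        rect(j - width / 2, bottom, j + width / 2, tops[i, j],
             angle = angle[k], density = density, ...)
        bottom <- tops[i, j]
      }
    }
  }

  # segbarplot(VADeaths, angle = 15 + 10 * 1:20)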

Apologies for the confusion.

Regards,

Marc

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Different output for lm Mac vs PC

2014-01-15 Thread Marc Schwartz
Devin,

You should find out when and how that option was altered from the default, lest 
you find that virtually any modeling that you do on the Mac will be affected by 
that change, fundamentally altering the interpretation of the model results.

Regards,

Marc

On Jan 15, 2014, at 7:17 AM, CASENHISER, DEVIN M de...@uthsc.edu wrote:

 Yes that's it!
 
 My mac has:
 
options('contrasts')
   $contrasts
   [1] "contr.sum"  "contr.poly"
 
 
 whereas the PC has
 
   $contrasts
             unordered           ordered
   "contr.treatment"      "contr.poly"
 
 
 I've changed the mac with
 
   options(contrasts=c('contr.treatment','contr.poly'))
 
 
 and that has solved the issue.
 
 Thanks Greg and Marc!
 
 Cheers!
 Devin
 
 
 On 1/14/14 5:35 PM, Marc Schwartz marc_schwa...@me.com wrote:
 
 Good catch Greg.
 
 The Mac output observed can result from either:
 
 options(contrasts = c("contr.helmert", "contr.poly"))
 
 or
 
 options(contrasts = c("contr.sum", "contr.poly"))
 
 being run first, before calling the model code.
 
 I checked the referenced tutorial and did not see any steps pertaining to
 altering the default contrasts. So either code along the lines of the
 above was manually entered on the Mac at some point or perhaps there is a
 change to the defaults on Devin's Mac system? The latter perhaps in
 ~/.Rprofile to mimic S-PLUS' behavior, in the case of Helmert contrasts?
 
 Devin, note that the model output lines for both the intercept and sex,
 beyond the way in which 'sex' is displayed (sex1 versus sexmale), are
 rather different and are consistent with the use of non-default contrasts
 on the Mac, as Greg noted.
 
 Regards,
 
 Marc
 
 
 On Jan 14, 2014, at 3:55 PM, Greg Snow 538...@gmail.com wrote:
 
 I would suggest running the code:
 
 options('contrasts')
 
 on both machines to see if there is a difference.  Having the default
 contrasts set differently would be one explanation.
 
 On Tue, Jan 14, 2014 at 2:28 PM, Marc Schwartz marc_schwa...@me.com
 wrote:
 
 On Jan 14, 2014, at 2:23 PM, CASENHISER, DEVIN M de...@uthsc.edu
 wrote:
 
 I've noticed that I get different output when running a linear model
 on my Mac versus on my PC. Same effect, but the Mac assumes the
 predictor as a 0 level whereas the PC uses the first category
 (alphabetically).
 
 So for example (using Bodo Winter's example from his online linear
 models tutorial):
 
 pitch = c(233,204,242,130,112,142)
 sex=c(rep("female",3),rep("male",3))
 
 summary(lm(pitch~sex))
 
 My Mac, running R 3.0.2, outputs:
 
 Residuals:
1   2   3   4   5   6
 6.667 -22.333  15.667   2.000 -16.000  14.000
 
 Coefficients:
  Estimate Std. Error t value Pr(>|t|)
 (Intercept)  177.167  7.201  24.601 1.62e-05 ***
 sex1  49.167  7.201   6.827  0.00241 **
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
 Residual standard error: 17.64 on 4 degrees of freedom
 Multiple R-squared:  0.921, Adjusted R-squared:  0.9012
 F-statistic: 46.61 on 1 and 4 DF,  p-value: 0.002407
 
 But my PC, running R 3.0.2, outputs:
 
 Residuals:
1   2   3   4   5   6
 6.667 -22.333  15.667   2.000 -16.000  14.000
 
 Coefficients:
  Estimate Std. Error t value Pr(>|t|)
 (Intercept)   226.33  10.18  22.224 2.43e-05 ***
 sexmale   -98.33  14.40  -6.827  0.00241 **
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
 Residual standard error: 17.64 on 4 degrees of freedom
 Multiple R-squared:  0.921, Adjusted R-squared:  0.9012
 F-statistic: 46.61 on 1 and 4 DF,  p-value: 0.002407
 
 
 I understand that these are the same (correct) answer, but it does
 make it a little more challenging to follow examples (when learning or
 teaching) given that the coefficient outputs are calculated
 differently.
 
 I don't suppose that there is a way to easily change either output so
 that they correspond (some setting I've overlooked perhaps)?
 
 Thanks and Cheers!
 Devin
 
 
 On my Mac with R 3.0.2, I get the same output as you get on your
 Windows machine.
 
 Something on your Mac is amiss, resulting in the recoding of 'sex'
 into a factor with presumably 0/1 levels rather than the default
 textual factor levels. If you try something like:
 
 model.frame(pitch ~ sex)
 
 the output should give you an indication of the actual data that is
 being used for your model in each case.
 
 Either you have other code on your Mac that you did not include above,
 which is modifying the contents of 'sex', or you have some other
 behavior going on in the default workspace.
 
 I would check for other objects in your current workspace on the Mac,
 using ls() for example, that might be conflicting. If you are running
 some type of GUI on your Mac (eg. the default R.app or perhaps
 RStudio), try running R from a terminal session, using 'R --vanilla'
 from the command line, to be sure that you are not loading a default
 workspace containing objects that are resulting in the altered

Re: [R] Subsetting on multiple criteria (AND condition) in R

2014-01-14 Thread Marc Schwartz
On Jan 14, 2014, at 1:38 PM, Jeff Johnson mrjeffto...@gmail.com wrote:

 I'm running the following to get what I would expect is a subset of
 countries that are not equal to US AND COUNTRY is not in one of my
 validcountries values.
 
 non_us <- subset(mydf, (COUNTRY %in% validcountries) & COUNTRY != "US",
 select = COUNTRY, na.rm=TRUE)
 
 however, when I then do table(non_us) I get:
 table(non_us)
 non_us
   AE AN AR AT AU BB BD BE BH BM BN BO BR BS CA CH CM CN CO CR CY DE DK DO
 EC ES
 0  3  0  2  1 31  4  1  1  1 45  1  1  4  5 86  3  1  8  1  2  1  8  2  1
 2  4
 FI FR GB GR GU HK ID IE IL IN IO IT JM JP KH KR KY LU LV MO MX MY NG NL NO
 NZ PA
 2  4 35  3  3 14  3  5  2  5  1  2  1 15  1 11  2  2  1  1 23  7  1  6  1
 3  1
 PE PG PH PR PT RO RU SA SE SG TC TH TT TW TZ US ZA
 2  1  1  8  1  1  1  1  1 18  1  1  2 11  1  0  3
 
 
 Notice US appears as the second to last. I expected it to NOT appear.
 
 Do you know if I'm using incorrect syntax? Is the & symbol equivalent to
 AND (notice I have 2 criteria for subsetting)? Also, is COUNTRY != "US"
 valid syntax? I don't get errors, but then again I don't get what I expect
 back.
 
 Thanks in advance!
 
 
 
 -- 
 Jeff


Review the Details section of ?subset, where you will find the following:

Factors may have empty levels after subsetting; unused levels are not 
automatically removed. See droplevels for a way to drop all unused levels from 
a data frame.


Your syntax is fine and the behavior is as expected.
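
A small illustration (made-up data, not your COUNTRY values):

  df <- data.frame(COUNTRY = factor(c("US", "CA", "GB")))
  sub <- subset(df, COUNTRY != "US")
  table(sub$COUNTRY)               # US still shown, with a count of 0
  table(droplevels(sub)$COUNTRY)   # unused level removed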

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Different output for lm Mac vs PC

2014-01-14 Thread Marc Schwartz

On Jan 14, 2014, at 2:23 PM, CASENHISER, DEVIN M de...@uthsc.edu wrote:

 I've noticed that I get different output when running a linear model on my 
 Mac versus on my PC. Same effect, but the Mac assumes the predictor as a 0 
 level whereas the PC uses the first category (alphabetically).
 
 So for example (using Bodo Winter's example from his online linear models 
 tutorial):
 
 pitch = c(233,204,242,130,112,142)
 sex=c(rep("female",3),rep("male",3))
 
 summary(lm(pitch~sex))
 
 My Mac, running R 3.0.2, outputs:
 
 Residuals:
  1   2   3   4   5   6
  6.667 -22.333  15.667   2.000 -16.000  14.000
 
 Coefficients:
    Estimate Std. Error t value Pr(>|t|)
 (Intercept)  177.167  7.201  24.601 1.62e-05 ***
 sex1  49.167  7.201   6.827  0.00241 **
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
 Residual standard error: 17.64 on 4 degrees of freedom
 Multiple R-squared:  0.921, Adjusted R-squared:  0.9012
 F-statistic: 46.61 on 1 and 4 DF,  p-value: 0.002407
 
 But my PC, running R 3.0.2, outputs:
 
 Residuals:
  1   2   3   4   5   6
  6.667 -22.333  15.667   2.000 -16.000  14.000
 
 Coefficients:
    Estimate Std. Error t value Pr(>|t|)
 (Intercept)   226.33  10.18  22.224 2.43e-05 ***
 sexmale   -98.33  14.40  -6.827  0.00241 **
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
 Residual standard error: 17.64 on 4 degrees of freedom
 Multiple R-squared:  0.921, Adjusted R-squared:  0.9012
 F-statistic: 46.61 on 1 and 4 DF,  p-value: 0.002407
 
 
 I understand that these are the same (correct) answer, but it does make it a 
 little more challenging to follow examples (when learning or teaching) given 
 that the coefficient outputs are calculated differently.
 
 I don't suppose that there is a way to easily change either output so that they 
 correspond (some setting I've overlooked perhaps)?
 
 Thanks and Cheers!
 Devin


On my Mac with R 3.0.2, I get the same output as you get on your Windows 
machine. 

Something on your Mac is amiss, resulting in the recoding of 'sex' into a 
factor with presumably 0/1 levels rather than the default textual factor 
levels. If you try something like:

  model.frame(pitch ~ sex)

the output should give you an indication of the actual data that is being used 
for your model in each case.

Either you have other code on your Mac that you did not include above, which is 
modifying the contents of 'sex', or you have some other behavior going on in 
the default workspace.

I would check for other objects in your current workspace on the Mac, using 
ls() for example, that might be conflicting. If you are running some type of 
GUI on your Mac (eg. the default R.app or perhaps RStudio), try running R from 
a terminal session, using 'R --vanilla' from the command line, to be sure that 
you are not loading a default workspace containing objects that are resulting 
in the altered behavior. Then re-try the example code. If that resolves the 
issue, you may want to delete, or at least rename/move the .RData file 
contained in your default working directory.

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Different output for lm Mac vs PC

2014-01-14 Thread Marc Schwartz
Good catch Greg.

The Mac output observed can result from either:

  options(contrasts = c("contr.helmert", "contr.poly"))

or

  options(contrasts = c("contr.sum", "contr.poly"))

being run first, before calling the model code.

I checked the referenced tutorial and did not see any steps pertaining to 
altering the default contrasts. So either code along the lines of the above was 
manually entered on the Mac at some point or perhaps there is a change to the 
defaults on Devin's Mac system? The latter perhaps in ~/.Rprofile to mimic 
S-PLUS' behavior, in the case of Helmert contrasts?

Devin, note that the model output lines for both the intercept and sex, beyond 
the way in which 'sex' is displayed (sex1 versus sexmale), are rather different 
and are consistent with the use of non-default contrasts on the Mac, as Greg 
noted.
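
If it helps to see the coding difference directly, something along these lines (a sketch, rebuilding the 'sex' factor from the example) shows the design matrix under each contrast setting:

  sex <- factor(c(rep("female", 3), rep("male", 3)))
  model.matrix(~ sex, contrasts.arg = list(sex = "contr.treatment"))  # sexmale coded 0/1
  model.matrix(~ sex, contrasts.arg = list(sex = "contr.sum"))        # sex1 coded +1/-1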

Regards,

Marc


On Jan 14, 2014, at 3:55 PM, Greg Snow 538...@gmail.com wrote:

 I would suggest running the code:
 
 options('contrasts')
 
 on both machines to see if there is a difference.  Having the default
 contrasts set differently would be one explanation.
 
 On Tue, Jan 14, 2014 at 2:28 PM, Marc Schwartz marc_schwa...@me.com wrote:
 
 On Jan 14, 2014, at 2:23 PM, CASENHISER, DEVIN M de...@uthsc.edu wrote:
 
 I've noticed that I get different output when running a linear model on my 
 Mac versus on my PC. Same effect, but the Mac assumes the predictor as a 0 
 level whereas the PC uses the first category (alphabetically).
 
 So for example (using Bodo Winter's example from his online linear models 
 tutorial):
 
 pitch = c(233,204,242,130,112,142)
 sex=c(rep("female",3),rep("male",3))
 
 summary(lm(pitch~sex))
 
 My Mac, running R 3.0.2, outputs:
 
 Residuals:
 1   2   3   4   5   6
 6.667 -22.333  15.667   2.000 -16.000  14.000
 
 Coefficients:
    Estimate Std. Error t value Pr(>|t|)
 (Intercept)  177.167  7.201  24.601 1.62e-05 ***
 sex1  49.167  7.201   6.827  0.00241 **
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
 Residual standard error: 17.64 on 4 degrees of freedom
 Multiple R-squared:  0.921, Adjusted R-squared:  0.9012
 F-statistic: 46.61 on 1 and 4 DF,  p-value: 0.002407
 
 But my PC, running R 3.0.2, outputs:
 
 Residuals:
 1   2   3   4   5   6
 6.667 -22.333  15.667   2.000 -16.000  14.000
 
 Coefficients:
    Estimate Std. Error t value Pr(>|t|)
 (Intercept)   226.33  10.18  22.224 2.43e-05 ***
 sexmale   -98.33  14.40  -6.827  0.00241 **
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
 Residual standard error: 17.64 on 4 degrees of freedom
 Multiple R-squared:  0.921, Adjusted R-squared:  0.9012
 F-statistic: 46.61 on 1 and 4 DF,  p-value: 0.002407
 
 
 I understand that these are the same (correct) answer, but it does make it 
 a little more challenging to follow examples (when learning or teaching) 
 given that the coefficient outputs are calculated differently.
 
 I don't suppose that there is a way to easily change either output so that 
 they correspond (some setting I've overlooked perhaps)?
 
 Thanks and Cheers!
 Devin
 
 
 On my Mac with R 3.0.2, I get the same output as you get on your Windows 
 machine.
 
 Something on your Mac is amiss, resulting in the recoding of 'sex' into a 
 factor with presumably 0/1 levels rather than the default textual factor 
 levels. If you try something like:
 
  model.frame(pitch ~ sex)
 
 the output should give you an indication of the actual data that is being 
 used for your model in each case.
 
 Either you have other code on your Mac that you did not include above, which 
 is modifying the contents of 'sex', or you have some other behavior going on 
 in the default workspace.
 
 I would check for other objects in your current workspace on the Mac, using 
 ls() for example, that might be conflicting. If you are running some type of 
 GUI on your Mac (eg. the default R.app or perhaps RStudio), try running R 
 from a terminal session, using 'R --vanilla' from the command line, to be 
 sure that you are not loading a default workspace containing objects that 
 are resulting in the altered behavior. Then re-try the example code. If that 
 resolves the issue, you may want to delete, or at least rename/move the 
 .RData file contained in your default working directory.
 
 Regards,
 
 Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] replace to NA.

2014-01-06 Thread Marc Schwartz

On Jan 6, 2014, at 5:57 AM, vikram ranga babuaw...@gmail.com wrote:

 Dear All,
 
 I am a bit stuck on a problem of replacing "" with NA.
 I have big data set but here is the toy example:-
 
 test <- data.frame(
 test1=c("","Hi","Hello"),
 test2=c("Hi","","Bye"),
 test3=c("Hello","",""))
 
 If the data is as above, I could change all "" to NA with this code:-
 
 for(i in 1:3){
 for(j in 1:3){
 if(test[j,i]==""){
 test[j,i]=NA
 }
 }
 }
 
 but the problem arises if data frame has NA at some places
 
 test <- data.frame(
 test1=c("","Hi","Hello"),
 test2=c("Hi",NA,"Bye"),
 test3=c("Hello","",""))
 
 the above loop script does not work on this data frame, as NA has
 logical class and does not return TRUE/FALSE.
 
 Can anyone provide some help?

snip


See ?is.na, which is used to test for NA values and is the canonical way to 
replace values with NA:

 test
  test1 test2 test3
1          Hi Hello
2    Hi  <NA>
3 Hello   Bye


# Where test == "", replace with NA
is.na(test) <- test == ""


 test
  test1 test2 test3
1  <NA>    Hi Hello
2    Hi  <NA>  <NA>
3 Hello   Bye  <NA>


Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] why there is no quarters?

2013-12-16 Thread Marc Schwartz

On Dec 15, 2013, at 6:11 AM, Duncan Murdoch murdoch.dun...@gmail.com wrote:

 On 13-12-15 6:43 AM, 水静流深 wrote:
 seq(as.Date("2001/1/1"),as.Date("2010/1/1"),"years")
 seq(as.Date("2001/1/1"),as.Date("2010/1/1"),"weeks")
 seq(as.Date("2001/1/1"),as.Date("2010/1/1"),"days")
 
 why there is no
 seq(as.Date("2001/1/1"),as.Date("2010/1/1"),"quarters")  ?
 
 There's no need for it.  Just use months, and take every 3rd one:
   
 x <- seq(as.Date("2001/1/1"),as.Date("2010/1/1"),"months")
 x[seq_along(x) %% 3 == 1]


Alternatively, ?cut.Date has quarter for the 'breaks' argument:

x <- seq(as.Date("2001/1/1"), as.Date("2010/1/1"), "months")

xq <- cut(x, breaks = "quarter")

 head(xq, 10)
 [1] 2001-01-01 2001-01-01 2001-01-01 2001-04-01 2001-04-01 2001-04-01
 [7] 2001-07-01 2001-07-01 2001-07-01 2001-10-01
37 Levels: 2001-01-01 2001-04-01 2001-07-01 2001-10-01 ... 2010-01-01


If you want to change the values to use 2001-Q2 or variants, you can do 
something like:

S <- c("01-01", "04-01", "07-01", "10-01")

xqq <- paste(substr(xq, 1, 5), "Q", match(substr(xq, 6, 10), S), sep = "") 

 head(xqq, 10)
 [1] "2001-Q1" "2001-Q1" "2001-Q1" "2001-Q2" "2001-Q2" "2001-Q2"
 [7] "2001-Q3" "2001-Q3" "2001-Q3" "2001-Q4"



See ?match, ?substr and ?paste


Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] why there is no quarters?

2013-12-16 Thread Marc Schwartz
That will only work if your starting date happens to be the first day of the 
year:

x <- seq(as.Date("2001/1/1"), as.Date("2010/1/1"), "3 months")

 head(x)
[1] "2001-01-01" "2001-04-01" "2001-07-01" "2001-10-01" "2002-01-01"
[6] "2002-04-01"


Compare that to:

x2 <- seq(as.Date("2001/2/3"), as.Date("2010/1/1"), "3 months")

 head(x2, 10)
 [1] "2001-02-03" "2001-05-03" "2001-08-03" "2001-11-03" "2002-02-03"
 [6] "2002-05-03" "2002-08-03" "2002-11-03" "2003-02-03" "2003-05-03"


The "3 months" is literally 3 months from the defined start date, not 3 months 
from the first of the year. So you are not going to get calendar quarter 
starting dates in that case.


On the other hand:

 cut(x2, breaks = "quarter")
 [1] 2001-01-01 2001-04-01 2001-07-01 2001-10-01 2002-01-01 2002-04-01
 [7] 2002-07-01 2002-10-01 2003-01-01 2003-04-01 2003-07-01 2003-10-01
[13] 2004-01-01 2004-04-01 2004-07-01 2004-10-01 2005-01-01 2005-04-01
[19] 2005-07-01 2005-10-01 2006-01-01 2006-04-01 2006-07-01 2006-10-01
[25] 2007-01-01 2007-04-01 2007-07-01 2007-10-01 2008-01-01 2008-04-01
[31] 2008-07-01 2008-10-01 2009-01-01 2009-04-01 2009-07-01 2009-10-01
36 Levels: 2001-01-01 2001-04-01 2001-07-01 2001-10-01 ... 2009-10-01


Regards,

Marc Schwartz


On Dec 16, 2013, at 6:35 AM, Dániel Kehl ke...@ktk.pte.hu wrote:

 Hi,
 
 try
 
 x <- seq(as.Date("2001/1/1"),as.Date("2010/1/1"),"3 months")
 
 best,
 daniel
 
 From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] ; 
 on behalf of: Pancho Mulongeni [p.mulong...@namibia.pharmaccess.org]
 Sent: 16 December 2013 13:05
 To: 1248283...@qq.com
 Cc: r-help@r-project.org
 Subject: Re: [R] why there is no quarters?
 
 Hi,
 I also would like to use quarters. I think a work around would be to just 
 label each record in the dataframe by its quarter.
 i.e. you add a factor called 'Quarter' with four levels (Q1 to Q4) for each 
 row and you assign the level based on the month of the date.
 You can easily do this with as.Date and as.character.
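
 A sketch of that labelling idea (the dates below are made up); note that base R also has quarters(), which gives the Q1 to Q4 labels directly:

   d <- as.Date(c("2013-02-10", "2013-05-01", "2013-11-30"))
   quarters(d)                                 # "Q1" "Q2" "Q4"
   paste0(format(d, "%Y"), "-", quarters(d))   # "2013-Q1" "2013-Q2" "2013-Q4"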
 
 Pancho Mulongeni
 Research Assistant
 PharmAccess Foundation
 1 Fouché Street
 Windhoek West
 Windhoek
 Namibia
 
 Tel:   +264 61 419 000
 Fax:  +264 61 419 001/2
 Mob: +264 81 4456 286

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Exporting R graphics into Word without losing graph quality

2013-12-16 Thread Marc Schwartz

On Dec 16, 2013, at 8:39 AM, David Carlson dcarl...@tamu.edu wrote:

 This will create a simple plot using Windows enhanced metafile
 format:
 
 win.metafile("TestFigure.emf")
 plot(rnorm(25), rnorm(25))
 dev.off()
 null device 
  1 
 
 
 Windows does not read pdf.


This is correct for Office on Windows, not for Office on OSX. However, if you 
share the Office document created on OSX that has a PDF embedded with Windows 
Office users, they will see a bitmapped version of the graphic, rather than the 
PDF.


 It will offer to import an eps
 (encapsulated postscript) file, but it only imports the bitmap
 thumbnail image of the figure so it is completely useless.


Regarding EPS imports, this is NOT correct.

Word and the other Office apps will import the EPS file. It cannot render the 
postscript however, thus it will **display** a bitmapped preview image.

If you print the Word document using a PS compatible printer driver, you will 
get the full high quality vector based graphic output. If you print to a non-PS 
compatible printer, the bitmapped preview is what will be printed.

You may need to install EPS import filters for Office if they were not 
installed during the initial Office installation.

That being said, while it has been years since I was on Windows, I used to use 
the WMF/EMF format to import or just copy/paste into Word, when I needed a 
document containing an R plot that could be shared with others. In most cases, 
the image quality was fine.
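
If a bitmap is unavoidable, one option (a sketch, not from the original thread) is to set the png() size in inches together with res=, so the text scales with the figure rather than needing cex adjustments:

  png("R.graph.png", width = 6, height = 3.5, units = "in", res = 300)
  plot(rnorm(25), rnorm(25))
  dev.off()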

Regards,

Marc Schwartz


 You
 can edit a metafile in Word, but different versions seem to have
 different issues. Earlier versions would lose clipping if you
 tried to edit the file, but World 2013 works reasonably well.
 Text labels can jump if you edit the figure in Word (especially
 rotated text) although it is simple to drag them back to where
 you want them. I haven't tried 2010 or 2007 recently.
 
 -
 David L Carlson
 Department of Anthropology
 Texas A&M University
 College Station, TX 77840-4352
 
 
 
 
 -Original Message-
 From: r-help-boun...@r-project.org
 [mailto:r-help-boun...@r-project.org] On Behalf Of Duncan
 Murdoch
 Sent: Sunday, December 15, 2013 5:24 PM
 To: david hamer; r-help@r-project.org
 Subject: Re: [R] Exporting R graphics into Word without losing
 graph quality
 
 On 13-12-15 6:00 PM, david hamer wrote:
 Hello,
 
 My x-y scatterplot produces a very ragged best-fit line when
 imported into
 Word.
 
 Don't use a bitmap format (png).
 
 Don't produce your graph in one format (screen display), then
 convert to 
 another (png).  Open the device in the format you want for the
 final file.
 
 Use a vector format for output.  I don't know what kinds Word
 supports, 
 but EPS or PDF would likely be best; if it can't read those,
 then 
 Windows metafile (via windows() to open the device) would be
 best. 
 (Don't trust the preview to tell you the quality of the graph,
 try 
 printing the document.  Word isn't quite as bad as it appears.)
 
 Don't use Word.
 
 Duncan Murdoch
 
 
 
 
 * plot (data.file$x, data.file$y, type = "p", las=1, pch=20,
 ylab =
 expression("Cover of Species y" ~ (m^{2}~ha^{-1} )),
 xlab =
 expression("Cover of Species x" ~ (m^{2}~ha^{-1}))  )
 lines  (
 data.file$x,   fitted ( model.x )  )*
 
  A suggestion from the internet is to use .png at high (1200)
 resolution.
* dev.print  ( device = png,  file = "R.graph.png",
 width = 1200,
 height = 700)*
 This gives a high-quality graph, but the titles and tick-mark
 labels become
 very tiny when exported into Word.
 
 I therefore increased the size of the titles and tick-mark
 labels with cex.
* plot (..cex =1.8, cex.lab = 1.8, cex.axis =
 1.25,)*
 But this causes the x-axis title to lie on top of the
 tick-mark labels.
 (This problem does not occur with the y-axis, where the title
 lies well
 away from the y-axis tick-mark labels.)
 Changing margins * par ( mai = c ( 1.3, 1.35, 1, .75 ) )*
 does not
 seem to have any effect on this.
 
 A suggestion from the internet is to delete the titles from
 plot, and use
 mtext with line=4 to drop the title lower on the graph.
 
 * plot (...  ylab = "", xlab = "" .)mtext(side
 = 1, "Cover
 of Species x (superscripts??)", line = 4)*
 This works, but with mtext I have now lost the ability to have
 the
 superscripts in the axis title.
 
 And I am back full circle, having to lower the resolution of
 the graph to
 keep the x-axis title away from the axis, and thus reverting
 to a ragged,
 segmented line when exported to Word..
 
 Final note:  The R graphics window version of the graph
 becomes very
 distorted, even though the graph may be of high quality (other
 than the
 problem of the x-axis title overlaying the x-axis tick-mark
 labels) once in
 Word.  I guess this is because of using tricks to try to get
 a desired
 end-product in Word
 
 Thanks for any suggestions,
  David.

__
R-help@r

Re: [R] Should there be an R-beginners list?

2013-11-25 Thread Marc Schwartz
On Nov 25, 2013, at 7:56 AM, PIKAL Petr petr.pi...@precheza.cz wrote:

 Hi
 
 I doubt if people start to search answers if they often do not search them in 
 help pages and documentation provided. 
 
 I must agree with Duncan that if Stackoverflow was far more better than this 
 help list most people would seek advice there then here. Is there any 
 evidence in decreasing traffic here? 
 
 Anyway, similar discussion went in 2003 with outcome that was not in favour 
 for separate beginner list 
 http://tolstoy.newcastle.edu.au/R/help/03b/7944.html
 
 Petr
 
 BTW it is a pity that the r help archive does not extend beyond year 2012. I found 
 that *Last message date: Tue 31 Jan 2012 - 12:19:21 GMT


Petr,

I may be confusing your final statement above, but the **main** R-Help archive 
is current to today:

  https://stat.ethz.ch/pipermail/r-help/

That being said, as one who has been interacting on R-Help (and other R-* 
lists) for a dozen years or so, I would have to say that one would need to have 
their head in the sand to not be cognizant of the dramatic decline in the 
traffic on R-Help in recent years. Simply keeping subjective track of the 
declining daily traffic ought to be sufficient.

Due to work related time constraints, my posting here in recent times has 
dropped notably. I do still read many of the R-Help posts and along with 
Martin, am co-moderator on R-Devel. So am still involved in that capacity.

I do follow SO and SE via RSS feed, so am aware of the increasing traffic 
there, albeit, I have not posted there.

In addition, there are a multitude of other online locations where R related 
posts have begun to accumulate. These include various LinkedIn groups, R 
related blogs, ResearchGate and others. I do believe, however, that SO is the 
dominant force in the shift of traffic.

To answer Petr's question above, I updated and re-ran some code that I had used 
some years ago to estimate the traffic on various lists/fora:

  https://stat.ethz.ch/pipermail/r-help/2009-January/184196.html

To that end, I am attaching a PDF file that contains a barplot of the annual 
R-Help traffic volume since 1997, through this month. The grey bars represent 
the actual annual traffic volumes of posts to R-Help.

For 2013, I added a red segment to the bar, which shows the projected number of 
posts for the full year, albeit, it is simply based upon the mean number of 
posts per day, averaged over the YTD volume, projected over the remaining days 
in the year, without any seasonal adjustments. So it may be optimistic, as we 
are coming into the holiday season for many.

Bottom line, while the trend was dramatically positive through 2010, peaking at 
a little over 41,000 total posts, the volume has just as dramatically declined 
in 2013 to a projected ~21,400. This means that the volume for 2013 has dropped 
back to the approximate volume of 2005.

Only time will tell if the dramatic decline will continue, or reach some new 
reasonable asymptote that is simply reflective of the distribution of traffic 
on various other online resources.

To the original query posted by Bert, I would say no, there is not a need for a 
beginner's list.

Regards,

Marc Schwartz




R-Help.pdf
Description: Adobe PDF document



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] convert one digit numbers to two digits one

2013-11-06 Thread Marc Schwartz
On Nov 6, 2013, at 10:25 AM, Alaios ala...@yahoo.com wrote:

 Hi all,
 the following returns the hour and the minutes
 
 paste(DataSet$TimeStamps[selectedInterval$start,4], 
 DataSet$TimeStamps[selectedInterval$start,5],sep=":")
 [1] "12:3"
 
 the problem is that from these two I want to create a time stamp so 12:03. 
 The problem is that the number 3 is not converted to 03. Is there an easy way 
 when I have one digit integer to add a zero in the front? Two digits integers 
 are working fine so far, 12:19, or 12:45 would appear correctly
 
 I would like to thank you in advance for your help
 
 Regards
 Alex


This is an example where using ?sprintf gives you more control:

 sprintf("%02d:%02d", 12, 3)
[1] "12:03"

 sprintf("%02d:%02d", 9, 3)
[1] "09:03"


The syntax '%02d' tells sprintf to print the integer and pad with leading 
zeroes to two characters where needed.
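
Note too that sprintf() is vectorised, so if the hours and minutes are held in vectors (the names here are hypothetical), all of the stamps can be built in one call:

  hrs  <- c(12, 9, 12)
  mins <- c(3, 3, 19)
  sprintf("%02d:%02d", hrs, mins)
  # [1] "12:03" "09:03" "12:19"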

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Basic question: why does a scatter plot of a variable against itself works like this?

2013-11-06 Thread Marc Schwartz

On Nov 6, 2013, at 10:40 AM, Tal Galili tal.gal...@gmail.com wrote:

 Hello all,
 
 I just noticed the following behavior of plot:
 x - c(1,2,9)
 plot(x ~ x) # this is just like doing:
 plot(x)
 # when maybe we would like it to give this:
 plot(x ~ c(x))
 # the same as:
 plot(x ~ I(x))
 
 I was wondering if there is some reason for this behavior.
 
 
 Thanks,
 Tal


Hi Tal,

In your example:

  plot(x ~ x)

the formula method of plot() is called, which essentially does the following 
internally:

 model.frame(x ~ x)
  x
1 1
2 2
3 9

Note that there is only a single column in the result. Thus, the plot is based 
upon 'y' = c(1, 2, 9), while 'x' = 1:3, which is NOT the row names for the 
resultant data frame, but the indices of the vector elements in the 'x' column. 

This is just like:

  plot(c(1, 2, 9))


On the other hand:

 model.frame(x ~ c(x))
  x c(x)
1 1    1
2 2    2
3 9    9

 model.frame(x ~ I(x))
  x I(x)
1 1    1
2 2    2
3 9    9


In both of the above cases, you get two columns of data back, thus the result 
is essentially:

  plot(c(1, 2, 9), c(1, 2, 9))


Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Questions about R

2013-11-06 Thread Marc Schwartz
On Nov 6, 2013, at 11:09 AM, Silvia Espinoza siles...@gmail.com wrote:

 Good morning. I am interested in downloading R.  I would appreciate if you
 can help me with the following questions, please.
 
 1.   Is R free, or do I have to pay for support/maintenance, or does it depend
 on the version? Is there a paid version?
 


Yes, it is free, although there are commercial versions of R available, if you 
decide that you do need/want commercial support.

Some additional info on commercial versions here:

  http://cran.r-project.org/doc/FAQ/R-FAQ.html#What-is-R_002dplus_003f


None of this has any effect on your ability to use R in a commercial setting, 
though there are some CRAN packages that do have such limitations: 

  
http://cran.r-project.org/doc/FAQ/R-FAQ.html#Can-I-use-R-for-commercial-purposes_003f



 2.   How safe is it to work with data using R? Is there any risk that
 someone else can have access to the information?


That is outside of the scope of R and is dependent upon the security of the 
computer system(s) and possibly networks, upon and over which R is running and 
where your data is stored and managed.

Regards,

Marc Schwartz


 Thanks in advance for your attention and for any help you can provide me.
 
 Silvia Espinoza

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fail to install packages in R3.0.2 running in Redhat linux

2013-11-05 Thread Marc Schwartz

On Nov 5, 2013, at 4:38 AM, Mao Jianfeng jianfeng@gmail.com wrote:

 Dear R-helpers,
 
 Glad to write to you.
 
 I would like to have your helps to install packages through internet, in a
 linux computer. Could you please share any of your expertise with me on
 this problem?
 
 Thanks in advance.
 
 Best
 Jian-Feng,
 
 
 # check the problem here.
 install.packages(pkgs="ggplot2", repos='http://ftp.ctex.org/mirrors/CRAN/
 ')
 Installing package into ‘/checkpoints/home/jfmao/bin/R_library’
 (as ‘lib’ is unspecified)
 Error: Line starting 'html ...' is malformed!


The error suggests that there is a problem with the CRAN mirror that you have 
specified. I would try a different CRAN mirror and see if that resolves the 
problem.

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fail to install packages in R3.0.2 running in Redhat linux

2013-11-05 Thread Marc Schwartz
Can you use those programs to get to the package tar file directly:

  http://ftp.ctex.org/mirrors/CRAN/src/contrib/ggplot2_0.9.3.1.tar.gz

If so, you might want to download it and then install as a local package 
installation on the remote server from the CLI (eg. using R CMD INSTALL ...).

You might also want to look at ?download.file for some additional hints on 
download methods which can be specified in the install.packages() call 
('methods' argument) and possible options to deal with proxies if that is the 
issue.
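
For example, something along these lines (a sketch; the destination file name is arbitrary, and method = "wget" assumes wget is on the PATH, as you indicated it is):

  download.file("http://ftp.ctex.org/mirrors/CRAN/src/contrib/ggplot2_0.9.3.1.tar.gz",
                destfile = "ggplot2_0.9.3.1.tar.gz", method = "wget")
  install.packages("ggplot2_0.9.3.1.tar.gz", repos = NULL, type = "source")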

If you cannot use those programs to get to the tar file directly, it is 
possible that the remote linux server is blocked from accessing the CRAN mirror 
network. If so, check with a SysAdmin to see if there is something on the 
remote server that needs to be configured to allow you access to CRAN mirrors.

Regards,

Marc


On Nov 5, 2013, at 6:59 AM, Mao Jianfeng jianfeng@gmail.com wrote:

 Hi Marc,
 
 Thanks a lot for your reply.
 
 In fact, I am running R on a remote linux server. I am wondering whether there are 
 some special settings for Internet access on this server. I have tried 
 to use different CRAN mirrors, and failed. I can use "lftp", "wget", "curl" 
 to link to the internet on this server.
 
 So, do you have any ideas/tools/scripts on how to track the real problem, in 
 my case? 
 
 Best
 Jian-Feng,
 
 
 2013/11/5 Marc Schwartz marc_schwa...@me.com
 
 On Nov 5, 2013, at 4:38 AM, Mao Jianfeng jianfeng@gmail.com wrote:
 
  Dear R-helpers,
 
  Glad to write to you.
 
  I would like to have your helps to install packages through internet, in a
  linux computer. Could you please share any of your expertise with me on
  this problem?
 
  Thanks in advance.
 
  Best
  Jian-Feng,
 
  
  # check the problem here.
  install.packages(pkgs="ggplot2", repos='http://ftp.ctex.org/mirrors/CRAN/
  ')
  Installing package into ‘/checkpoints/home/jfmao/bin/R_library’
  (as ‘lib’ is unspecified)
  Error: Line starting 'html ...' is malformed!
 
 
 The error suggests that there is a problem with the CRAN mirror that you have 
 specified. I would try a different CRAN mirror and see if that resolves the 
 problem.
 
 Regards,
 
 Marc Schwartz
 
 
 
 
 
 -- 
 Jian-Feng, Mao
 
 Post doc
 Forest Sciences Center
 University of British Columbia
 Vancouver, Canada


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] alternative for shell() in Mac

2013-11-04 Thread Marc Schwartz
On Nov 4, 2013, at 11:52 AM, Nicolas Gutierrez nicolas.gutier...@msc.org 
wrote:

 Hi All,
 
 I'm trying to run an ADMB function on R for Mac and need to find a substitute 
 for the Windows command shell(). I tried system() but I get the following 
 message:
 
 system(ADMBFile)
 /bin/sh: /Users/nicolas/Desktop/SPE/LBSPR_ADMB/L_AFun.exe: cannot execute 
 binary file
 
 Any hints please?
 
 Cheers,
 
 N


Why would you expect a Windows executable file (L_AFun.exe) to run on a 
non-Windows operating system?

This is not related to the system call, but that you are trying to run the 
wrong executable.

ADMB is presumably associated with AD Model Builder and you may be better off 
posting to the r-sig-mixed-models list:

  https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models


Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] 'yum install R' failing with tcl/tk issue

2013-10-25 Thread Marc Schwartz

On Oct 25, 2013, at 1:29 AM, Prof Brian Ripley rip...@stats.ox.ac.uk wrote:

 On 25/10/2013 02:33, Michael Stauffer wrote:
 Hi,
 
 I'm trying to install R on CentOS 6.4.
 
 This is not the right list.  But
 
 - As the posting guide says, we only support current R here.  R 2.10.0 is 
 ancient, and other people seem to have found 3.0.1 RPMs for Centos 6.3.
 
 - It seems your RPM is linked against Tcl/Tk 8.4, also ancient.  Tcl/Tk 8.6 
 is current.
 
 I suggest you install R 3.0.2 from the sources, in which case R-devel would 
 be the right list.  For binary installations on CentOS, R-sig-Fedora is.
 

There are several inconsistencies in the output, as 3.0.1 is available as an 
RPM from the EPEL repos:

  http://dl.fedoraproject.org/pub/epel/6/x86_64/repoview/R.html

In addition, the output below shows that the R rpm being installed is from 
'el5', rather than 'el6'. If this was CentOS 5, rather than 6, R 2.15.2 is 
available:

  http://dl.fedoraproject.org/pub/epel/5/x86_64/repoview/R.html

Something seems to be amiss with the configuration not getting the right yum 
repo paths.

A Google search came up with this link:

  http://lancegatlin.org/tech/centos-6-clear-the-yum-cache

which might be helpful, as it suggests a similar issue of yum picking up 
incorrect versions. You may need to reinstall the EPEL repo RPM after these 
steps.

Regards,

Marc Schwartz


 
 Following some instructions online, I've done this:
 
 rpm -Uvh
 http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm
 
 yum install R
 
 But yum fails, with this (full output below):
 
 Error: Package: R-core-2.10.0-2.el5.x86_64 (Rocks-6.1)
Requires: libtcl8.4.so()(64bit)
 Error: Package: R-core-2.10.0-2.el5.x86_64 (Rocks-6.1)
Requires: libtk8.4.so()(64bit)
 
 I have tcl/tk 8.5 already installed. Does anyone have any suggestion?
 Thanks!
 
 Full output:
 
 [root@picsl-cluster ~]# yum install R
 Repository base is listed more than once in the configuration
 Rocks-6.1
   | 1.9 kB 00:00
 base
| 3.7 kB 00:00
 Setting up Install Process
 Resolving Dependencies
 -- Running transaction check
 --- Package R.x86_64 0:2.10.0-2.el5 will be installed
 -- Processing Dependency: libRmath-devel = 2.10.0-2.el5 for package:
 R-2.10.0-2.el5.x86_64
 -- Processing Dependency: R-devel = 2.10.0-2.el5 for package:
 R-2.10.0-2.el5.x86_64
 -- Running transaction check
 --- Package R-devel.x86_64 0:2.10.0-2.el5 will be installed
 -- Processing Dependency: R-core = 2.10.0-2.el5 for package:
 R-devel-2.10.0-2.el5.x86_64
 --- Package libRmath-devel.x86_64 0:2.10.0-2.el5 will be installed
 -- Processing Dependency: libRmath = 2.10.0-2.el5 for package:
 libRmath-devel-2.10.0-2.el5.x86_64
 -- Running transaction check
 --- Package R-core.x86_64 0:2.10.0-2.el5 will be installed
 -- Processing Dependency: libtk8.4.so()(64bit) for package:
 R-core-2.10.0-2.el5.x86_64
 -- Processing Dependency: libtcl8.4.so()(64bit) for package:
 R-core-2.10.0-2.el5.x86_64
 -- Processing Dependency: libgfortran.so.1()(64bit) for package:
 R-core-2.10.0-2.el5.x86_64
 --- Package libRmath.x86_64 0:2.10.0-2.el5 will be installed
 -- Running transaction check
 --- Package R-core.x86_64 0:2.10.0-2.el5 will be installed
 -- Processing Dependency: libtk8.4.so()(64bit) for package:
 R-core-2.10.0-2.el5.x86_64
 -- Processing Dependency: libtcl8.4.so()(64bit) for package:
 R-core-2.10.0-2.el5.x86_64
 --- Package compat-libgfortran-41.x86_64 0:4.1.2-39.el6 will be installed
 -- Finished Dependency Resolution
 Error: Package: R-core-2.10.0-2.el5.x86_64 (Rocks-6.1)
Requires: libtcl8.4.so()(64bit)
 Error: Package: R-core-2.10.0-2.el5.x86_64 (Rocks-6.1)
Requires: libtk8.4.so()(64bit)
  You could try using --skip-broken to work around the problem
 ** Found 57 pre-existing rpmdb problem(s), 'yum check' output follows:
 foundation-git-1.7.11.4-0.x86_64 has missing requires of perl(SVN::Client)
 foundation-git-1.7.11.4-0.x86_64 has missing requires of perl(SVN::Core)
 foundation-git-1.7.11.4-0.x86_64 has missing requires of perl(SVN::Delta)
 foundation-git-1.7.11.4-0.x86_64 has missing requires of perl(SVN::Ra)
 1:guestfish-1.7.17-26.el6.x86_64 has missing requires of libguestfs = ('1',
 '1.7.17', '26.el6')
 opt-perl-AcePerl-1.92-0.el6.x86_64 has missing requires of
 perl(Ace::Browser::LocalSiteDefs)
 opt-perl-BioPerl-1.6.901-0.el6.noarch has missing requires of
 perl(Apache::DBI)
 opt-perl-BioPerl-1.6.901-0.el6.noarch has missing requires of
 perl(Bio::ASN1::EntrezGene)
 opt-perl-BioPerl-1.6.901-0.el6.noarch has missing requires of
 perl(Bio::Expression::Contact)
 opt-perl-BioPerl-1.6.901-0.el6.noarch has missing requires of
 perl(Bio::Expression::DataSet)
 opt-perl-BioPerl-1.6.901-0.el6.noarch has missing requires of
 perl(Bio::Expression::Platform)
 opt-perl-BioPerl-1.6.901-0.el6.noarch has missing requires of
 perl(Bio::Expression::Sample)
 opt-perl-BioPerl-1.6.901-0.el6.noarch has

Re: [R] Use R to plot a directory tree

2013-10-24 Thread Marc Schwartz
One R package that might be of interest would be 'diagram':

  http://cran.r-project.org/web/packages/diagram/

I would also agree with Bert here and would point you in the direction of 
PSTricks, which can handle these sorts of complex figures. It would of course 
require learning LaTeX, but that is a good thing. :-)

More info here:

  http://tug.org/PSTricks/main.cgi/

and lots of examples with code here:

 http://tug.org/PSTricks/main.cgi?file=examples


I use PSTricks for creating things like subject disposition flow charts for 
clinical study reports.

Regards,

Marc Schwartz

  
On Oct 24, 2013, at 8:47 AM, Bert Gunter gunter.ber...@gene.com wrote:

 A wild guess -- take a look at the CRAN phylogenetics task view, as
 that sounds like the sort of thing that might have tree generation and
 manipulation functions.
 
 ... but you may do better with some non-R tool out there.
 
 (Hopefully, you'll get a better response, though).
 
 Cheers,
 Bert
 
 On Thu, Oct 24, 2013 at 6:13 AM, Thaler,Thorn,LAUSANNE,Applied
 Mathematics thorn.tha...@rdls.nestle.com wrote:
 Dear all,
 
 I was wondering whether (or better: how) I can use R to read recursively a 
 directory to get all the sub-folders and files located in the root folder 
 and put it into a tree like structure where the leaves are files and 
 intermediate nodes are the directories? The idea is that I'd like to plot 
 the structure of a certain root folder to be able to restructure the file 
 system.
 
 Any ideas on that? I was googling a lot but apparently I did not use the 
 right terms ("R tree folder" or "R tree directory" takes me mainly to pages 
 about the R-tree a structure for spatial access methods [at least I learnt 
 something new ;)])
 
 Any pointer to the right function is highly appreciated.
 
 Cheers,
 
 Thorn Thaler
 NRC Lausanne
 Applied Mathematics
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 
 
 -- 
 
 Bert Gunter
 Genentech Nonclinical Biostatistics
 
 (650) 467-7374
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] installing package from source

2013-10-24 Thread Marc Schwartz

On Oct 24, 2013, at 11:38 AM, David Winsemius dwinsem...@comcast.net wrote:

 
 On Oct 23, 2013, at 7:53 PM, Long Vo wrote:
 
 Hi R users,
 Currently I want to fit a FIGARCH model to a dataset. The only package that
 allow for it that I could find is fGarch. However it seems that the FIGARCH
 model class fitting of this package has been moved to Oxmetrics. I tried to
 install the old versions of it using 'tar.gz' files from CRAN archive 
 http://cran.r-project.org/src/contrib/Archive/fGarch/
 http://cran.r-project.org/src/contrib/Archive/fGarch/   but not sure how
 it works. I tried
 
 install.packages("myfilepath\fGarch_260.71.tar.gz", repos = NULL,
 type="source")
 
 And received this error:
 
 Warning: invalid package './I:_R filesGarch_260.71.tar.gz'
 Error: ERROR: no packages specified
 Warning messages:
 1: running command 'I:/01_RFI~1/INSTAL~1/R-30~1.1/bin/i386/R CMD INSTALL
 -l I:\01_R files\installment\R-3.0.1\library ./I:_R files
 Garch_260.71.tar.gz' had status 1 
 2: In install.packages(I:\001_R files\fGarch_260.71.tar.gz, repos = NULL, 
 :
 installation of package ‘./I:_R filesGarch_260.71.tar.gz’ had non-zero
 exit status
 
 Any helps on this?
 
 
 I've aways specified the package names and their locations separately in my 
 call to install.packages, but I don't know if that is always needed. It also 
 appears that you have no / separator between your path and the file name.


Long is trying to install a rather old version of the source R package that 
contains FORTRAN code on Windows.

Besides the immediate error in the way the path was constructed in the 
install.packages() call, using a single backslash, which needs to be escaped:

  
http://cran.r-project.org/bin/windows/base/rw-FAQ.html#R-can_0027t-find-my-file

there are likely to be issues from trying to install an old version of the 
package on a newer version of R, perhaps the lack of the requisite development 
tools for compiling FORTRAN:

  
http://cran.r-project.org/bin/windows/base/rw-FAQ.html#Can-I-install-packages-into-libraries-in-this-version_003f

and other issues as well.
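
On the path point specifically, a corrected call would look something like one of these (a sketch; the drive and folder are taken from the warning output and may need adjusting):

  install.packages("I:\\01_R files\\fGarch_260.71.tar.gz", repos = NULL, type = "source")
  # or, equivalently, with forward slashes:
  install.packages("I:/01_R files/fGarch_260.71.tar.gz", repos = NULL, type = "source")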

Depending upon how far back you need to go in package versions, there may be 
pre-compiled Windows binaries (.zip files) available in directories here:

  http://cran.r-project.org/bin/windows/contrib/



Regards,

Marc Schwartz


 
 Regards,
 Long

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

