[R] Spencer 15-point weighted moving average
I am trying to apply Spencer's 15-point weighted moving average filter to the time series "shampoo" using the filter command, but I am not sure if I am using the filter correctly:

library(fma)
sma15 <- c(-.009, -.019, -.016, .009, .066, .144, .209, .231,
           .209, .144, .066, .009, -.016, -.019, -.009)
(s1 <- filter(shampoo, sma15))

This result does not match the "spence.15" command from package locfit:

library(locfit)
spence.15(shampoo)

Any help understanding why these are different (or what I am doing wrong with filter) would be appreciated.

Thanks,
Sam Thomas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
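For comparison, here is a minimal sketch using the exact Spencer weights, commonly given as (-3, -6, -5, 3, 21, 46, 67, 74, 67, 46, 21, 3, -5, -6, -3)/320. The rounded weights in the post sum to 0.999 rather than 1, so small differences from any exact implementation are to be expected. The series `x` is made up for illustration (it is not the shampoo data), and `stats::filter` is called with its namespace in case another attached package masks `filter`. How `spence.15()` treats the first and last seven points is not checked here; `stats::filter` simply returns NA there.

```r
# Exact Spencer 15-point weights (assumed from the standard definition).
spencer_w <- c(-3, -6, -5, 3, 21, 46, 67, 74, 67, 46, 21, 3, -5, -6, -3) / 320

# Rounded weights as given in the original post.
sma15 <- c(-.009, -.019, -.016, .009, .066, .144, .209, .231,
           .209, .144, .066, .009, -.016, -.019, -.009)

# Hypothetical test series: a pure quadratic trend.
x <- ts((1:40)^2)

# Centred (two-sided) moving average; the ends are NA because the
# 15-point window does not fit there.
smoothed <- stats::filter(x, spencer_w, sides = 2)

# Compare the weight totals: the exact weights sum to 1, the rounded ones do not.
print(c(exact = sum(spencer_w), rounded = sum(sma15)))
print(smoothed)
```

Because the exact weights are symmetric, sum to 1 and have zero second moment, a quadratic trend passes through the interior of the filter unchanged, which makes a convenient sanity check before applying the filter to real data.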
Re: [R] pdf files in loops
Have you tried plot(pic)?

-Original Message-
From: Berwin A Turlach
Sent: Wednesday, March 31, 2010 10:15 PM
To: James Rome
Cc: r-help@r-project.org
Subject: Re: [R] pdf files in loops

On Wed, 31 Mar 2010 22:06:48 -0400 James Rome wrote:

> print() did not help, and I get strange messages about mode(onefile)
> not being changed:

Either:

R> pic <- histogram(...)
R> print(pic)

or

R> print(histogram(...))

> I would actually like all the histograms in one pdf file too...

Then move the pdf() and the dev.off() commands outside the loop.

HTH.

Cheers,

Berwin
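A minimal sketch of the advice above: open the PDF device once, print() each lattice histogram inside the loop, and close the device once at the end. The iris columns and the temporary output file are stand-ins chosen for illustration; they are not from the original thread.

```r
library(lattice)

# Hypothetical output file; replace with a real path as needed.
out_file <- tempfile(fileext = ".pdf")

pdf(out_file)                      # open the device once, outside the loop
for (v in c("Sepal.Length", "Petal.Length")) {
  pic <- histogram(as.formula(paste("~", v)), data = iris, xlab = v)
  print(pic)                       # lattice objects must be printed explicitly in a loop
}
dev.off()                          # close the device once, after the loop
```

Each print() call starts a new page, so all histograms end up in the same file.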
Re: [R] Mosaic
I've typically used mosaic plots with categorical data - not sure what the numerical values represent. This fits a mosaic plot by rounding the values:

library(vcd)
t <- xtabs(resp ~ sexo + regiao + ano, data = dados)
ft <- ftable(round(t))
cotabplot(ft, panel = cotab_coindep, type = "mosaic")

Sam Thomas

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Silvano
Sent: Wednesday, March 24, 2010 11:01 AM
To: r-help@r-project.org
Subject: [R] Mosaic

Hi,

I have this data set:

obitoss = c(
  5.8,17.4,5.9,17.6,5.8,17.5,4.7,15.8,
  3.8,13.4,3.8,13.5,3.7,13.4,3.4,13.6,
  4.4,17.3,4.3,17.4,4.2,17.5,4.3,17.0,
  4.4,13.6,5.1,14.6,5.7,13.5,3.6,13.3,
  6.5,19.6,6.4,19.4,6.3,19.5,6.0,19.7)

(dados = data.frame(
  regiao = factor(rep(c('Norte', 'Nordeste', 'Sudeste', 'Sul', 'Centro-Oeste'), each=8)),
  ano = factor(rep(c('2000','2001','2002','2003'), each=2)),
  sexo = factor(rep(c('F','M'), 4)),
  resp=obitoss))

I would like to make a mosaic plot that represents the numeric variable as a function of the 3 other variables. Does anyone know how to do this?

--
Silvano Cesar da Costa
Departamento de Estatística
Universidade Estadual de Londrina
Fone: 3371-4346
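As an alternative sketch that avoids the vcd dependency, the same rounded table can be passed to base R's mosaicplot(). This swaps in a different plotting function from the one used in the reply above. The object name `tab` and the temporary PDF file are choices made for this illustration; the data are copied from the question.

```r
obitoss <- c(
  5.8, 17.4, 5.9, 17.6, 5.8, 17.5, 4.7, 15.8,
  3.8, 13.4, 3.8, 13.5, 3.7, 13.4, 3.4, 13.6,
  4.4, 17.3, 4.3, 17.4, 4.2, 17.5, 4.3, 17.0,
  4.4, 13.6, 5.1, 14.6, 5.7, 13.5, 3.6, 13.3,
  6.5, 19.6, 6.4, 19.4, 6.3, 19.5, 6.0, 19.7)

dados <- data.frame(
  regiao = factor(rep(c("Norte", "Nordeste", "Sudeste", "Sul", "Centro-Oeste"), each = 8)),
  ano    = factor(rep(c("2000", "2001", "2002", "2003"), each = 2)),
  sexo   = factor(rep(c("F", "M"), 4)),
  resp   = obitoss)

# Sum resp over each sexo x regiao x ano cell (one value per cell here).
# Named 'tab' rather than 't' to avoid masking base::t().
tab <- xtabs(resp ~ sexo + regiao + ano, data = dados)

# Written to a temporary PDF so the sketch also runs non-interactively.
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)
mosaicplot(round(tab), main = "resp by sexo, regiao and ano", color = TRUE)
dev.off()
```

Rounding treats the numeric response as if it were a count, which is an assumption about the data rather than something the question states.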
[R] ctree (party) changing font sizes in plots
When plotting Binary Trees (ctree) from the party package, is there a way to adjust the font sizes of the leaves?

require(party)
irisct <- ctree(Species ~ ., data = iris)
plot(irisct)

I want to adjust the font sizes for "Node 2", "Node 5", etc. I'd also like to be able to adjust the font sizes for the x-axis and y-axis labels of the histograms.

Thanks,

Sam Thomas
Revelant Technologies, LLC.
sam.tho...@revelanttech.com
317-696-9214
Re: [R] package "tm" fails to remove "the" with remove stopwords
Mark,

It looks like removeWords removed "the" in all instances except when "the" was the first word in your text. Maybe there is a parameter that needs to be set? I couldn't find anything on the help page.

Here's an example of what I am seeing using the "crude" dataset:

#function removeWords does not appear to remove the first word
require(tm)
data("crude")
crude[[1]]
removeWords(crude[[1]], "Shamrock") #Second word removed
removeWords(crude[[1]], "Diamond") #First word not removed

Sam Thomas

From: Mark Kimpel [mailto:mwkim...@gmail.com]
Sent: Friday, November 13, 2009 11:47 AM
To: Sam Thomas
Cc: r-help@r-project.org; feine...@logic.at
Subject: Re: package "tm" fails to remove "the" with remove stopwords

Sam,

Thanks for the example. Removing stop words after the DocumentTermMatrix has been created works fine if one is working with single words, but what if one is creating a dtm of possible combinations of words? Wouldn't one want to remove them from the corpus?

Mark

Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine
15032 Hunter Court, Westfield, IN 46074
(317) 490-5129 Work, & Mobile & VoiceMail
(317) 399-1219 Skype No Voicemail please

On Thu, Nov 12, 2009 at 12:04 PM, Sam Thomas wrote:

I'm not sure what's wrong with your approach, but this seems to strip "the":

require(tm)
params <- list(minDocFreq = 1,
               removeNumbers = TRUE,
               stemming = TRUE,
               stopwords = TRUE,
               weighting = weightTf)
myDocument <- c("the rain in Spain", "falls mainly on the plain",
                "jack and jill ran up the hill", "to fetch a pail of water")
text.corp <- Corpus(VectorSource(myDocument))
dtm <- DocumentTermMatrix(text.corp, control = params)
dtm
dtm.mat <- as.matrix(dtm)
dtm.mat

From: Mark Kimpel [mailto:mwkim...@gmail.com]
Sent: Thursday, November 12, 2009 11:30 AM
To: r-help@r-project.org; feine...@logic.at; Sam Thomas
Subject: package "tm" fails to remove "the" with remove stopwords

I am using code that previously worked to remove stopwords using package "tm". Even manually adding "the" to the list does not work to remove "the". This package has undergone extensive redevelopment with changes to the function syntax, so perhaps I am just missing something. Please see my simple example, output, and sessionInfo() below.

Thanks!
Mark

require(tm)
myDocument <- c("the rain in Spain", "falls mainly on the plain",
                "jack and jill ran up the hill", "to fetch a pail of water")
text.corp <- Corpus(VectorSource(myDocument))
#
text.corp <- tm_map(text.corp, stripWhitespace)
text.corp <- tm_map(text.corp, removeNumbers)
text.corp <- tm_map(text.corp, removePunctuation)
## text.corp <- tm_map(text.corp, stemDocument)
text.corp <- tm_map(text.corp, removeWords, c("the", stopwords("english")))
dtm <- DocumentTermMatrix(text.corp)
dtm
dtm.mat <- as.matrix(dtm)
dtm.mat

> dtm.mat
    Terms
Docs falls fetch hill jack jill mainly pail plain rain ran spain the water
   1     0     0    0    0    0      0    0     0    1   0     1   1     0
   2     1     0    0    0    0      1    0     1    0   0     0   0     0
   3     0     0    1    1    1      0    0     0    0   1     0   0     0
   4     0     1    0    0    0      0    1     0    0   0     0   0     1

R version 2.10.0 Patched (2009-10-27 r50222)
x86_64-unknown-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices datasets utils methods base

other attached packages:
[1] chron_2.3-33 RWeka_0.3-23 tm_0.5-1

loaded via a namespace (and not attached):
[1] grid_2.10.0 rJava_0.8-1 slam_0.1-6 tools_2.10.0

Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine
15032 Hunter Court, Westfield, IN 46074
(317) 490-5129 Work, & Mobile & VoiceMail
(317) 399-1219 Skype No Voicemail please
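The symptom described in this thread (a word is removed everywhere except at the very start of the text) can be reproduced in base R with a pattern that requires a space before the word. The sketch below is only an illustration of how such behaviour can arise; it makes no claim about how tm 0.5-1 actually implemented removeWords. The variable names are invented for this example.

```r
docs <- c("the rain in Spain", "falls mainly on the plain")

# Pattern that needs a space on both sides: a leading "the" has no
# space in front of it, so the first document is left untouched.
naive <- gsub(" the ", " ", docs, fixed = TRUE)

# Word-boundary pattern: matches "the" at the start, middle or end.
robust <- gsub("\\bthe\\b", "", docs, perl = TRUE)
# Collapse the doubled spaces left behind and trim the ends.
robust <- trimws(gsub("\\s+", " ", robust))

print(naive)
print(robust)
```

If a package version shows this behaviour, applying a word-boundary substitution like the second one to the raw character vector before building the corpus is one possible workaround.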
Re: [R] package "tm" fails to remove "the" with remove stopwords
I'm not sure what's wrong with your approach, but this seems to strip "the":

require(tm)
params <- list(minDocFreq = 1,
               removeNumbers = TRUE,
               stemming = TRUE,
               stopwords = TRUE,
               weighting = weightTf)
myDocument <- c("the rain in Spain", "falls mainly on the plain",
                "jack and jill ran up the hill", "to fetch a pail of water")
text.corp <- Corpus(VectorSource(myDocument))
dtm <- DocumentTermMatrix(text.corp, control = params)
dtm
dtm.mat <- as.matrix(dtm)
dtm.mat

From: Mark Kimpel [mailto:mwkim...@gmail.com]
Sent: Thursday, November 12, 2009 11:30 AM
To: r-help@r-project.org; feine...@logic.at; Sam Thomas
Subject: package "tm" fails to remove "the" with remove stopwords

I am using code that previously worked to remove stopwords using package "tm". Even manually adding "the" to the list does not work to remove "the". This package has undergone extensive redevelopment with changes to the function syntax, so perhaps I am just missing something. Please see my simple example, output, and sessionInfo() below.

Thanks!
Mark

require(tm)
myDocument <- c("the rain in Spain", "falls mainly on the plain",
                "jack and jill ran up the hill", "to fetch a pail of water")
text.corp <- Corpus(VectorSource(myDocument))
#
text.corp <- tm_map(text.corp, stripWhitespace)
text.corp <- tm_map(text.corp, removeNumbers)
text.corp <- tm_map(text.corp, removePunctuation)
## text.corp <- tm_map(text.corp, stemDocument)
text.corp <- tm_map(text.corp, removeWords, c("the", stopwords("english")))
dtm <- DocumentTermMatrix(text.corp)
dtm
dtm.mat <- as.matrix(dtm)
dtm.mat

> dtm.mat
    Terms
Docs falls fetch hill jack jill mainly pail plain rain ran spain the water
   1     0     0    0    0    0      0    0     0    1   0     1   1     0
   2     1     0    0    0    0      1    0     1    0   0     0   0     0
   3     0     0    1    1    1      0    0     0    0   1     0   0     0
   4     0     1    0    0    0      0    1     0    0   0     0   0     1

R version 2.10.0 Patched (2009-10-27 r50222)
x86_64-unknown-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices datasets utils methods base

other attached packages:
[1] chron_2.3-33 RWeka_0.3-23 tm_0.5-1

loaded via a namespace (and not attached):
[1] grid_2.10.0 rJava_0.8-1 slam_0.1-6 tools_2.10.0

Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine
15032 Hunter Court, Westfield, IN 46074
(317) 490-5129 Work, & Mobile & VoiceMail
(317) 399-1219 Skype No Voicemail please
Re: [R] reading and frequency analysis of Spanish text
I used the readDOC function in tm, after storing the document locally on a Windows PC:

langren.sp.path <- "C:\\text\\" #store file by itself in this directory
langren.corpus <- (Corpus(DirSource(langren.sp.path),
    readerControl = list(reader = readDOC(AntiwordOptions = "-t"),
                         language = "spa",
                         load = TRUE)))
(langren.sp.file <- langren.corpus[[1]])[1:10]

I think the default encoding for antiword is latin1, but the antiword -m option can handle other mappings.

Sam Thomas

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Michael Friendly
Sent: Wednesday, August 05, 2009 2:19 PM
To: R-Help
Subject: [R] reading and frequency analysis of Spanish text

For an historical paper I'm working on, I have some Spanish plaintext, presently in the form of a Word .doc file,

http://euclid.psych.yorku.ca/SCS/Gallery/images/Private/Langren/Verdadera-spanish-stripped.doc

and also some ciphered text from the same original source. The ultimate goal is to use some frequency analysis of letters and word lengths in the plaintext to help decode the ciphered text.

For now, I'm stuck on how to read the Spanish plaintext into R as a text string, given that it is in a Word .doc file using some form of latin1 encoding. From Word, I can Save As .. plain text (.txt), but I'm worried about losing character encoding information and I don't see anything in the list of Other encodings presented that seems helpful.

A naive attempt to read the .doc file directly gives:

> langren.sp.file <- "http://euclid.psych.yorku.ca/SCS/Gallery/images/Private/Langren/Verdadera-spanish-stripped.doc";
>
> langren.txt <- scan(langren.sp.file, encoding="latin1")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  scan() expected 'a real', got 'ÐÏࡱá'
>

Can someone help?

--
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    http://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT M3J 1P3 CANADA
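Once the text is available as an R character string, the letter and word-length frequencies mentioned in the question can be tabulated with base R. This is a minimal sketch: the sample sentence is a short ASCII stand-in invented for this example, not text read from the .doc file. For real Spanish text saved as latin1, something like readLines() with encoding = "latin1" would presumably be the first step, and whether accented letters match [[:alpha:]] depends on the locale.

```r
# Hypothetical sample text (ASCII only, so the result is locale-independent).
txt <- "La verdadera longitud por mar y tierra"

# Letter frequencies: drop everything that is not a letter, lower-case,
# split into single characters and tabulate.
letters_only <- tolower(gsub("[^[:alpha:]]", "", txt))
letter_freq <- sort(table(strsplit(letters_only, "")[[1]]), decreasing = TRUE)

# Word-length frequencies: split on runs of non-letters and count characters.
words <- strsplit(tolower(txt), "[^[:alpha:]]+")[[1]]
words <- words[nzchar(words)]
word_len <- table(nchar(words))

print(letter_freq)
print(word_len)
```

The same two tables computed on the ciphered text would give the profiles to compare against.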
Re: [R] Reading data entered within an R program
threeftmetered

Sent from my Windows Mobile® phone.

-Original Message-
From: Muenchen, Robert A (Bob)
Sent: Saturday, July 11, 2009 4:43 PM
To: R-help@r-project.org
Subject: [R] Reading data entered within an R program

Dear R-helpers,

I know of two ways of reading data within an R program, using textConnection and stdin (demo program below). I've Googled about and looked in several books for comparisons of the two approaches but haven't found anything. Are there any particular advantages or disadvantages to these two approaches? If you were teaching R beginners, which would you present?

Thanks,
Bob

http://RforSASandSPSSusers.com

# R Program to Read Data Within a Program.
# Very similar to SAS datalines or cards statements,
# and SPSS BEGIN DATA / END DATA commands.

# This stores the data as one long text string.
mystring <-
"workshop,gender,q1,q2,q3,q4
01,1,f,1,1,5,1
02,2,f,2,1,4,1
03,1,f,2,2,4,3
04,2, ,3,1, ,3
05,1,m,4,5,2,4
06,2,m,5,4,5,5
07,1,m,5,3,4,4
08,2,m,4,5,5,5"

# The textConnection function allows read.csv to
# read data from the text string just as it would
# from a file.
# The leading zero on first column helps show that
# R is storing row names as a character vector.
mydata <- read.csv( textConnection(mystring) )
mydata

mydata <- read.csv( stdin() )
workshop,gender,q1,q2,q3,q4
01,1,f,1,1,5,1
02,2,f,2,1,4,1
03,1,f,2,2,4,3
04,2, ,3,1, ,3
05,1,m,4,5,2,4
06,2,m,5,4,5,5
07,1,m,5,3,4,4
08,2,m,4,5,5,5

#The blank line above tells R to stop reading.
mydata

# Read it again stripping out blanks and setting
# "nothing" to be missing for gender.
mydata <- read.csv( stdin(), strip.white=TRUE, na.strings="" )
workshop,gender,q1,q2,q3,q4
01,1,f,1,1,5,1
02,2,f,2,1,4,1
03,1,f,2,2,4,3
04,2, ,3,1, ,3
05,1,m,4,5,2,4
06,2,m,5,4,5,5
07,1,m,5,3,4,4
08,2,m,4,5,5,5

mydata
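A third option that may be worth comparing is the text argument of read.csv(), which takes the string directly and reads it through a text connection internally. The sketch below reuses the data from the post; the object names from_conn, from_text and cleaned are made up for this illustration. Unlike the stdin() approach, it does not depend on how the script is run (pasting, source(), or Rscript), which is one reason it might suit beginners.

```r
mystring <-
"workshop,gender,q1,q2,q3,q4
01,1,f,1,1,5,1
02,2,f,2,1,4,1
03,1,f,2,2,4,3
04,2, ,3,1, ,3
05,1,m,4,5,2,4
06,2,m,5,4,5,5
07,1,m,5,3,4,4
08,2,m,4,5,5,5"

# Approach from the post: wrap the string in a text connection.
from_conn <- read.csv(textConnection(mystring))

# Same data via the 'text' argument; no explicit connection needed.
from_text <- read.csv(text = mystring)

# As in the post: strip blanks and treat empty fields as missing.
cleaned <- read.csv(text = mystring, strip.white = TRUE, na.strings = "")

print(from_text)
print(cleaned)
```

Because the header has one field fewer than the data rows, read.csv() uses the first column ("01" to "08") as row names in all three calls.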