[R] Spencer 15-point weighted moving average

2010-10-11 Thread Sam Thomas
I am trying to apply Spencer's 15-point weighted moving average filter
to the time series "shampoo," using the filter command, but I am not
sure if I am using the filter correctly:

 

library(fma)

sma15 <- c(-.009, -.019, -.016, .009, .066, .144, .209, .231,
           .209, .144, .066, .009, -.016, -.019, -.009)

(s1 <- filter(shampoo, sma15))

 

This result does not match the output of spence.15() from package locfit:

library(locfit)

spence.15(shampoo)

 

Any help understanding why these are different (or what I am doing wrong
with filter) would be appreciated.
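
Two things may be worth checking here, offered as a sketch rather than a confirmed diagnosis. First, the weights above are rounded to three decimals and sum to 0.999, whereas Spencer's 15-point weights are exactly c(-3, -6, -5, 3, 21, 46, 67, 74, 67, 46, 21, 3, -5, -6, -3) / 320 and sum to 1. Second, stats::filter() with its default sides = 2 returns NA for the first and last 7 points, so any function that extends the series to smooth the ends will differ there. The series x below is made up for illustration (it is not the shampoo data), and filter() is called as stats::filter() in case another attached package masks it.

```r
# Exact Spencer 15-point weights (they sum to exactly 1)
spencer15 <- c(-3, -6, -5, 3, 21, 46, 67, 74, 67, 46, 21, 3, -5, -6, -3) / 320

# Rounded weights as given in the post
sma15 <- c(-.009, -.019, -.016, .009, .066, .144, .209, .231,
           .209, .144, .066, .009, -.016, -.019, -.009)

sum(spencer15)  # 1
sum(sma15)      # 0.999

# Hypothetical test series: a straight line, which a symmetric filter
# whose weights sum to 1 leaves unchanged in the interior
x <- ts(seq(10, 300, by = 10))

s_exact   <- stats::filter(x, spencer15)  # sides = 2 (centred) is the default
s_rounded <- stats::filter(x, sma15)

# The first and last 7 values are NA because the window runs off the series
which(is.na(s_exact))

# Interior values: exact weights reproduce x, rounded weights are 0.1% low
cbind(x, s_exact, s_rounded)[8:12, ]
```

If the two results still disagree in the interior after switching to the exact weights, the difference lies somewhere other than rounding.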

 

Thanks, 

 

Sam Thomas 



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] pdf files in loops

2010-04-01 Thread Sam Thomas
Have you tried plot(pic)?

-----Original Message-----
From: Berwin A Turlach 
Sent: Wednesday, March 31, 2010 10:15 PM
To: James Rome 
Cc: r-help@r-project.org 
Subject: Re: [R] pdf files in loops

On Wed, 31 Mar 2010 22:06:48 -0400
James Rome  wrote:

> print() did not help, and I get strange messages about mode(onefile)
> not being changed:

Either:

R> pic <- histogram(...)
R> print(pic)

or

R> print(histogram(...))

> I would actually like all the histograms in one pdf file too...

Then move the pdf() and the dev.off() commands outside the loop.

HTH.

Cheers,

Berwin
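
The two suggestions above (print the lattice object explicitly, and open and close the PDF device outside the loop) can be combined as in the following minimal sketch. The data frame dat, its columns, and the output file name are hypothetical, chosen only for illustration.

```r
library(lattice)  # recommended package shipped with R

# Hypothetical data: one histogram per level of a grouping variable
set.seed(1)
dat <- data.frame(group = rep(c("a", "b", "c"), each = 50),
                  value = rnorm(150))

out_file <- file.path(tempdir(), "histograms.pdf")

pdf(out_file)                        # open the device once, before the loop
for (g in unique(dat$group)) {
  pic <- histogram(~ value, data = dat[dat$group == g, ], main = g)
  print(pic)                         # lattice objects are not auto-printed inside a loop
}
invisible(dev.off())                 # close the device once, after the loop
```

Each print() call starts a new page, so the resulting file should contain one histogram per group.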



Re: [R] Mosaic

2010-03-24 Thread Sam Thomas
I've typically used mosaic plots with categorical data, so I'm not sure what
the numerical values represent.

This draws a mosaic plot after rounding the values:

library(vcd)
# named tab rather than t, to avoid masking base::t()
tab <- xtabs(resp ~ sexo + regiao + ano, data = dados)
ft <- ftable(round(tab))

cotabplot(ft, panel = cotab_coindep, type = "mosaic")
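
For comparison, here is a sketch of the same idea using only base graphics. It assumes, as above, that it is reasonable to round resp and treat it as a count, which may not be what the values mean. The data are copied from the original question below; mosaicplot() is the base function in package graphics.

```r
obitoss <- c(
  5.8, 17.4, 5.9, 17.6, 5.8, 17.5, 4.7, 15.8,
  3.8, 13.4, 3.8, 13.5, 3.7, 13.4, 3.4, 13.6,
  4.4, 17.3, 4.3, 17.4, 4.2, 17.5, 4.3, 17.0,
  4.4, 13.6, 5.1, 14.6, 5.7, 13.5, 3.6, 13.3,
  6.5, 19.6, 6.4, 19.4, 6.3, 19.5, 6.0, 19.7)

dados <- data.frame(
  regiao = factor(rep(c("Norte", "Nordeste", "Sudeste", "Sul",
                        "Centro-Oeste"), each = 8)),
  ano    = factor(rep(c("2000", "2001", "2002", "2003"), each = 2)),
  sexo   = factor(rep(c("F", "M"), 4)),
  resp   = obitoss)

# With resp on the left-hand side, xtabs() sums resp within each cell
tab <- xtabs(resp ~ sexo + regiao + ano, data = dados)
dim(tab)  # 2 sexes x 5 regions x 4 years

# Tile areas are proportional to the rounded cell values
mosaicplot(round(tab), main = "resp by sexo, regiao and ano", las = 2)
```

Each cell of tab holds exactly one observation here, so the "sum" is simply the original value.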



Sam Thomas

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Silvano
Sent: Wednesday, March 24, 2010 11:01 AM
To: r-help@r-project.org
Subject: [R] Mosaic

Hi,

I have this data set:

obitoss = c(
5.8,17.4,5.9,17.6,5.8,17.5,4.7,15.8,
3.8,13.4,3.8,13.5,3.7,13.4,3.4,13.6,
4.4,17.3,4.3,17.4,4.2,17.5,4.3,17.0,
4.4,13.6,5.1,14.6,5.7,13.5,3.6,13.3,
6.5,19.6,6.4,19.4,6.3,19.5,6.0,19.7)

(dados = data.frame(
regiao = factor(rep(c('Norte', 'Nordeste', 'Sudeste', 'Sul',
'Centro-Oeste'), each=8)),
ano = factor(rep(c('2000','2001','2002','2003'), each=2)),
sexo = factor(rep(c('F','M'), 4)), resp=obitoss))

I would like to make a mosaic plot to represent the numeric
variable as a function of the 3 factors. Does anyone know how to
do this?

--
Silvano Cesar da Costa
Departamento de Estatística
Universidade Estadual de Londrina
Fone: 3371-4346



[R] ctree (party) changing font sizes in plots

2009-11-20 Thread Sam Thomas
When plotting Binary Trees (ctree) from the party package, is there a
way to adjust the font sizes of the leaves?  

 

require(party)

irisct <- ctree(Species ~ ., data = iris)
plot(irisct)

 

I want to adjust the font sizes for "Node 2", "Node 5", etc.  I'd also
like to be able to adjust the font sizes for the x-axis and y-axis
labels of the histograms.  

 

Thanks,

 

Sam Thomas

Revelant Technologies, LLC.

sam.tho...@revelanttech.com <mailto:sam.tho...@revelanttech.com> 

317-696-9214

 




Re: [R] package "tm" fails to remove "the" with remove stopwords

2009-11-13 Thread Sam Thomas
Mark,

 

It looks like removeWords removed "the" in all instances except when
"the" was the first word in your text.   Maybe there is a parameter that
needs to be set?  I couldn't find anything on the help page.  

 

Here's an example of what I am seeing using the "crude" dataset:

# removeWords does not appear to remove the first word

require(tm)
data("crude")
crude[[1]]

removeWords(crude[[1]], "Shamrock")  # second word: removed
removeWords(crude[[1]], "Diamond")   # first word: not removed
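
For comparison, here is a base R sketch of word removal that uses a word-boundary regular expression. remove_words() is a hypothetical helper written for this illustration; it is not tm's implementation, and I have not checked how removeWords builds its pattern. The point is only that a pattern anchored on \b does not depend on a preceding space, so a word at the very start of the text is matched too.

```r
# Hypothetical helper: remove whole words using \b word boundaries
remove_words <- function(x, words) {
  pattern <- sprintf("\\b(%s)\\b", paste(words, collapse = "|"))
  gsub(pattern, "", x, perl = TRUE)
}

docs <- c("the rain in Spain", "falls mainly on the plain")

cleaned <- remove_words(docs, "the")
# Collapse the doubled spaces left behind and trim the ends
cleaned <- trimws(gsub("\\s+", " ", cleaned))
cleaned
```

The match is case-sensitive as written, so "The" would need to be listed separately or the text lower-cased first.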

 

Sam Thomas

 

 

From: Mark Kimpel [mailto:mwkim...@gmail.com] 
Sent: Friday, November 13, 2009 11:47 AM
To: Sam Thomas
Cc: r-help@r-project.org; feine...@logic.at
Subject: Re: package "tm" fails to remove "the" with remove stopwords

 

Sam,

 

Thanks for the example. Removing stop words after the DocumentTermMatrix
has been created works fine if one is working with single words, but
what if one is creating a dtm of possible combinations of words?
Wouldn't one want to remove them from the corpus?

 

Mark


Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine

15032 Hunter Court, Westfield, IN  46074

(317) 490-5129 Work, & Mobile & VoiceMail
(317) 399-1219 Skype No Voicemail please



On Thu, Nov 12, 2009 at 12:04 PM, Sam Thomas wrote:

I'm not sure what's wrong with your approach, but this seems to strip
"the":

require(tm)

params <- list(minDocFreq = 1,
               removeNumbers = TRUE,
               stemming = TRUE,
               stopwords = TRUE,
               weighting = weightTf)

myDocument <- c("the rain in Spain", "falls mainly on the plain",
                "jack and jill ran up the hill", "to fetch a pail of water")

text.corp <- Corpus(VectorSource(myDocument))
dtm <- DocumentTermMatrix(text.corp, control = params)
dtm

dtm.mat <- as.matrix(dtm)
dtm.mat

 

 

From: Mark Kimpel [mailto:mwkim...@gmail.com] 
Sent: Thursday, November 12, 2009 11:30 AM
To: r-help@r-project.org; feine...@logic.at; Sam Thomas
Subject: package "tm" fails to remove "the" with remove stopwords

 

I am using code that previously worked to remove stopwords using package
"tm". Even manually adding "the" to the list does not work to remove
"the". This package has undergone extensive redevelopment with changes
to the function syntax, so perhaps I am just missing something. 

 

Please see my simple example, output, and sessionInfo() below.

 

Thanks!

Mark

 

require(tm)

myDocument <- c("the rain in Spain", "falls mainly on the plain",
                "jack and jill ran up the hill", "to fetch a pail of water")

text.corp <- Corpus(VectorSource(myDocument))

#

text.corp <- tm_map(text.corp, stripWhitespace)
text.corp <- tm_map(text.corp, removeNumbers)
text.corp <- tm_map(text.corp, removePunctuation)
## text.corp <- tm_map(text.corp, stemDocument)
text.corp <- tm_map(text.corp, removeWords, c("the", stopwords("english")))

dtm <- DocumentTermMatrix(text.corp)
dtm

dtm.mat <- as.matrix(dtm)
dtm.mat

 

> dtm.mat
    Terms
Docs falls fetch hill jack jill mainly pail plain rain ran spain the water
   1     0     0    0    0    0      0    0     0    1   0     1   1     0
   2     1     0    0    0    0      1    0     1    0   0     0   0     0
   3     0     0    1    1    1      0    0     0    0   1     0   0     0
   4     0     1    0    0    0      0    1     0    0   0     0   0     1
 

R version 2.10.0 Patched (2009-10-27 r50222) 

x86_64-unknown-linux-gnu 

 

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

 

attached base packages:

[1] stats graphics  grDevices datasets  utils methods   base


 

other attached packages:

[1] chron_2.3-33 RWeka_0.3-23 tm_0.5-1

 

loaded via a namespace (and not attached):

[1] grid_2.10.0  rJava_0.8-1  slam_0.1-6   tools_2.10.0

 

 

Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine

15032 Hunter Court, Westfield, IN  46074

(317) 490-5129 Work, & Mobile & VoiceMail
(317) 399-1219 Skype No Voicemail please

 




Re: [R] package "tm" fails to remove "the" with remove stopwords

2009-11-12 Thread Sam Thomas
I'm not sure what's wrong with your approach, but this seems to strip
"the"

 

require(tm)

params <- list(minDocFreq = 1,
               removeNumbers = TRUE,
               stemming = TRUE,
               stopwords = TRUE,
               weighting = weightTf)

myDocument <- c("the rain in Spain", "falls mainly on the plain",
                "jack and jill ran up the hill", "to fetch a pail of water")

text.corp <- Corpus(VectorSource(myDocument))
dtm <- DocumentTermMatrix(text.corp, control = params)
dtm

dtm.mat <- as.matrix(dtm)
dtm.mat

 

 

From: Mark Kimpel [mailto:mwkim...@gmail.com] 
Sent: Thursday, November 12, 2009 11:30 AM
To: r-help@r-project.org; feine...@logic.at; Sam Thomas
Subject: package "tm" fails to remove "the" with remove stopwords

 

I am using code that previously worked to remove stopwords using package
"tm". Even manually adding "the" to the list does not work to remove
"the". This package has undergone extensive redevelopment with changes
to the function syntax, so perhaps I am just missing something. 

 

Please see my simple example, output, and sessionInfo() below.

 

Thanks!

Mark

 

require(tm)

myDocument <- c("the rain in Spain", "falls mainly on the plain",
                "jack and jill ran up the hill", "to fetch a pail of water")

text.corp <- Corpus(VectorSource(myDocument))

#

text.corp <- tm_map(text.corp, stripWhitespace)
text.corp <- tm_map(text.corp, removeNumbers)
text.corp <- tm_map(text.corp, removePunctuation)
## text.corp <- tm_map(text.corp, stemDocument)
text.corp <- tm_map(text.corp, removeWords, c("the", stopwords("english")))

dtm <- DocumentTermMatrix(text.corp)
dtm

dtm.mat <- as.matrix(dtm)
dtm.mat

 

> dtm.mat
    Terms
Docs falls fetch hill jack jill mainly pail plain rain ran spain the water
   1     0     0    0    0    0      0    0     0    1   0     1   1     0
   2     1     0    0    0    0      1    0     1    0   0     0   0     0
   3     0     0    1    1    1      0    0     0    0   1     0   0     0
   4     0     1    0    0    0      0    1     0    0   0     0   0     1

 

R version 2.10.0 Patched (2009-10-27 r50222) 

x86_64-unknown-linux-gnu 

 

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

 

attached base packages:

[1] stats graphics  grDevices datasets  utils methods   base


 

other attached packages:

[1] chron_2.3-33 RWeka_0.3-23 tm_0.5-1

 

loaded via a namespace (and not attached):

[1] grid_2.10.0  rJava_0.8-1  slam_0.1-6   tools_2.10.0

 

 

Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine

15032 Hunter Court, Westfield, IN  46074

(317) 490-5129 Work, & Mobile & VoiceMail
(317) 399-1219 Skype No Voicemail please




Re: [R] reading and frequency analysis of Spanish text

2009-08-05 Thread Sam Thomas
I used the readDOC function in tm.

After storing the document locally on a Windows PC:

langren.sp.path <- "C:\\text\\"  # store the file by itself in this directory

langren.corpus <- (Corpus(DirSource(langren.sp.path),
                          readerControl = list(reader = readDOC(AntiwordOptions = "-t"),
                                               language = "spa",
                                               load = TRUE)))

(langren.sp.file <- langren.corpus[[1]])[1:10]


I think the default encoding for antiword is latin1, but antiword's -m option
can handle other mappings.
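
Once the text is in R as a character string, the letter and word-length frequencies mentioned in the original question can be tabulated with base R. The following is only a sketch: txt is a made-up sample sentence, not the Langren text, and the Unicode class \p{L} (any letter) is used with perl = TRUE so that accented letters are kept.

```r
# Hypothetical sample text; \u00f1 is the letter n with a tilde
txt <- "El ni\u00f1o come una manzana en la ma\u00f1ana"

# Letter frequencies: lower-case, keep only letters, split into characters
letters_only <- gsub("[^\\p{L}]", "", tolower(txt), perl = TRUE)
letter_freq  <- sort(table(strsplit(letters_only, "")[[1]]), decreasing = TRUE)
letter_freq

# Word-length frequencies: split on runs of non-letters, then count lengths
words <- strsplit(txt, "[^\\p{L}]+", perl = TRUE)[[1]]
words <- words[nzchar(words)]
word_len_freq <- table(nchar(words))
word_len_freq
```

For text read from a latin1 source, converting it first with iconv(x, "latin1", "UTF-8") should make the character handling above more predictable.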

Sam Thomas

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Michael Friendly
Sent: Wednesday, August 05, 2009 2:19 PM
To: R-Help
Subject: [R] reading and frequency analysis of Spanish text

For an historical paper I'm working on, I have some Spanish plaintext,
presently in the form of a Word .doc file,
http://euclid.psych.yorku.ca/SCS/Gallery/images/Private/Langren/Verdadera-spanish-stripped.doc
and also some ciphered text from the same original source. The ultimate
goal is to use some frequency analysis of letters and word lengths in the
plaintext to help decode the ciphered text.

For now, I'm stuck on how to read the Spanish plaintext into R as a text
string, given that it is in a Word .doc file using some form of latin1
encoding. From Word, I can Save As... plain text (.txt), but I'm worried
about losing character encoding information, and I don't see anything in
the list of Other encodings presented that seems helpful.

A naive attempt to read the .doc file directly gives:

 > langren.sp.file <-
 "http://euclid.psych.yorku.ca/SCS/Gallery/images/Private/Langren/Verdadera-spanish-stripped.doc"
 >
 > langren.txt <- scan(langren.sp.file, encoding="latin1")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  scan() expected 'a real', got 'ÐÏࡱá'
 >

Can someone help?

-- 
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    http://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT  M3J 1P3 CANADA



Re: [R] Reading data entered within an R program

2009-07-12 Thread Sam Thomas
threeftmetered

Sent from my Windows Mobile® phone.

-----Original Message-----
From: Muenchen, Robert A (Bob) 
Sent: Saturday, July 11, 2009 4:43 PM
To: R-help@r-project.org 
Subject: [R] Reading data entered within an R program

Dear R-helpers,

I know of two ways to read data within an R program: using
textConnection and using stdin (demo program below). I've Googled about
and looked in several books for comparisons of the two approaches but
haven't found anything. Are there any particular advantages or
disadvantages to these two approaches? If you were teaching R beginners,
which would you present?

Thanks,
Bob
http://RforSASandSPSSusers.com 


# R Program to Read Data Within a Program.
# Very similar to SAS datalines or cards statements,
# and SPSS BEGIN DATA / END DATA commands.

# This stores the data as one long text string.

mystring <-
"workshop,gender,q1,q2,q3,q4
01,1,f,1,1,5,1
02,2,f,2,1,4,1
03,1,f,2,2,4,3
04,2, ,3,1, ,3
05,1,m,4,5,2,4
06,2,m,5,4,5,5
07,1,m,5,3,4,4
08,2,m,4,5,5,5"

# The textConnection function allows read.csv to 
# read data from the text string just as it would 
# from a file.
# The leading zero on first column helps show that
# R is storing row names as a character vector.

mydata <- read.csv( textConnection(mystring) )
mydata


mydata <- read.csv( stdin() )
workshop,gender,q1,q2,q3,q4
01,1,f,1,1,5,1
02,2,f,2,1,4,1
03,1,f,2,2,4,3
04,2, ,3,1, ,3
05,1,m,4,5,2,4
06,2,m,5,4,5,5
07,1,m,5,3,4,4
08,2,m,4,5,5,5

#The blank line above tells R to stop reading.
mydata

# Read it again stripping out blanks and setting
# "nothing" to be missing for gender.

mydata <- read.csv( stdin(), strip.white=TRUE, na.strings="" )
workshop,gender,q1,q2,q3,q4
01,1,f,1,1,5,1
02,2,f,2,1,4,1
03,1,f,2,2,4,3
04,2, ,3,1, ,3
05,1,m,4,5,2,4
06,2,m,5,4,5,5
07,1,m,5,3,4,4
08,2,m,4,5,5,5

mydata
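
As a point of comparison, recent versions of R also let read.csv() take the string directly through its text argument, which avoids creating the connection by hand. The sketch below reuses the data from the demo program above. One practical difference that may matter for teaching: stdin() refers to the console, so the stdin() version works when the lines are pasted or piped into R but not when the script is run with source(), whereas the string-based versions work either way.

```r
mystring <-
"workshop,gender,q1,q2,q3,q4
01,1,f,1,1,5,1
02,2,f,2,1,4,1
03,1,f,2,2,4,3
04,2, ,3,1, ,3
05,1,m,4,5,2,4
06,2,m,5,4,5,5
07,1,m,5,3,4,4
08,2,m,4,5,5,5"

# text = plays the role of textConnection(); blanks are stripped and
# empty fields are treated as missing
mydata <- read.csv(text = mystring, strip.white = TRUE, na.strings = "")
mydata

# The header has one fewer field than the data rows, so the first
# column is used as (character) row names
rownames(mydata)
```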
