[R] twitteR and wordcloud()

Doran, Harold Wed, 11 Mar 2015 16:50:31 -0700

I am trying to replicate the Twitter word cloud example found here:

https://sites.google.com/site/miningtwitter/questions/talking-about/wordclouds/wordcloud1


When I run that code verbatim, I reproduce the results and everything works.
But when I make a slight modification to the code, it fails when creating the
term-document matrix (tdm). I found only one other question on this topic, on
Stack Overflow, and none of its answers led to a solution.

Here is my code as a reproducible example, though you would need your own
twitteR authentication tokens etc. to run it.

Any idea why the tdm step fails?

library(twitteR)
library(tm)
library(wordcloud)
library(RColorBrewer)

mach_tweets = searchTwitter("#machine", n=50, lang="en")
mach_text = sapply(mach_tweets, function(x) x$getText())
mach_corpus = Corpus(VectorSource(mach_text))

# create a term-document matrix, applying some transformations
tdm = TermDocumentMatrix(mach_corpus,
   control = list(removePunctuation = TRUE,
   #stopwords = c(stopwords()),
   removeNumbers = TRUE, tolower = TRUE))

# convert the tdm to an ordinary matrix
m = as.matrix(tdm)
# get word counts in decreasing order
word_freqs = sort(rowSums(m), decreasing=TRUE)
# create a data frame with words and their frequencies
dm = data.frame(word=names(word_freqs), freq=word_freqs)

wordcloud(dm$word, dm$freq, random.order=FALSE, colors=brewer.pal(8, "Dark2"))
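The error message is not included above, so the cause is uncertain. A commonly reported problem with tweet text (an assumption here, not confirmed by the post) is characters such as emoji that tolower() cannot handle in some locales, and tolower = TRUE is part of the control list. The following is a minimal base-R sketch that approximates the tdm steps (lower-casing, removing punctuation and numbers, counting words) with an extra, assumed iconv() step that drops non-ASCII characters first. The sample tweets are invented and stand in for mach_text, so it runs without Twitter access or the tm package; it is not a drop-in replacement for TermDocumentMatrix().

```r
# Invented sample tweets standing in for mach_text (no Twitter access needed)
mach_text <- c("Machine learning is FUN! #machine",
               "New #machine shop opened in 2015 \u2764",
               "machine, learning, machine")

# Assumption: non-ASCII characters (e.g. emoji) cause the failure, so drop
# every character iconv() cannot convert to ASCII before lower-casing.
clean_text <- iconv(mach_text, from = "UTF-8", to = "ASCII", sub = "")

# Rough base-R equivalents of the control options passed to TermDocumentMatrix()
clean_text <- tolower(clean_text)                    # tolower = TRUE
clean_text <- gsub("[[:punct:]]+", "", clean_text)   # removePunctuation = TRUE
clean_text <- gsub("[[:digit:]]+", "", clean_text)   # removeNumbers = TRUE

# Split into words; keep terms of 3+ characters, like tm's default wordLengths
words <- unlist(strsplit(clean_text, "[[:space:]]+"))
words <- words[nchar(words) >= 3]

# Word counts in decreasing order, then the data frame used by wordcloud()
word_freqs <- sort(table(words), decreasing = TRUE)
dm <- data.frame(word = names(word_freqs),
                 freq = as.integer(word_freqs),
                 stringsAsFactors = FALSE)
print(dm)

# With wordcloud and RColorBrewer loaded, the plotting call would follow:
# wordcloud(dm$word, dm$freq, random.order = FALSE, colors = brewer.pal(8, "Dark2"))
```

If the real mach_text makes the original TermDocumentMatrix() call succeed after the same iconv() line is applied, that would point to an encoding problem; if not, the cause lies elsewhere.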



In fact, earlier today, on a different computer from the one I am using now, I
wrote the following function, and it works perfectly:

tweets <- function(string, n, min){
    tweets <- searchTwitter(as.character(string), n=n)
    tweets_text <- sapply(tweets, function(x) x$getText())
    tweets_text_corpus <- Corpus(VectorSource(tweets_text))
    tweets_text_corpus <- tm_map(tweets_text_corpus, removePunctuation)
    tweets_text_corpus <- tm_map(tweets_text_corpus, function(x) removeWords(x, stopwords()))
    #wordcloud(tweets_text_corpus)
    myDtm <- TermDocumentMatrix(tweets_text_corpus, control = list(minWordLength = 1))
    m <- as.matrix(myDtm)
    v <- sort(rowSums(m), decreasing=TRUE)
    #wordcloud(names(v), v, scale = c(4,2), min.freq= min )
    v
}

v <- tweets('#beer', n = 20)

But when I run it on my Mac at home, it also fails at the tdm step.
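Because the same function works on one computer and fails on the Mac, comparing the two environments may narrow things down. A minimal diagnostic sketch, assuming the difference is encoding- or locale-related: the hypothetical helper find_non_ascii() reports which texts contain characters that iconv() cannot convert to ASCII, and the remaining lines print the character-type locale, R version, and tm version to compare between machines. The sample strings are invented stand-ins for tweets_text.

```r
# Hypothetical helper: indices of texts containing characters that cannot be
# represented in ASCII (e.g. emoji or accented letters). Without a 'sub'
# argument, iconv() returns NA for elements it cannot convert.
find_non_ascii <- function(x) {
  which(is.na(iconv(x, from = "UTF-8", to = "ASCII")))
}

# Invented sample tweets standing in for tweets_text
tweets_text <- c("Craft #beer tonight",
                 "Cheers \U0001F37A #beer",
                 "Caf\u00e9 #beer")

print(find_non_ascii(tweets_text))   # prints [1] 2 3

# Compare these between the working computer and the Mac
print(Sys.getlocale("LC_CTYPE"))
print(R.version.string)
if (requireNamespace("tm", quietly = TRUE)) print(packageVersion("tm"))
```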


______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
