Re: [R] Warning message when starting RStudio
On 23/04/15 13:41, Albin Blaschka wrote:

Hello

On 23.04.2015 at 09:57, Berend Hasselman wrote:

On 23-04-2015, at 08:45, Sun Shine wrote:

Hi list

Recently, when starting up RStudio, the following warning is being displayed:

"Error in tools:::httpdPort <= 0L : comparison (4) is possible only for atomic and list types"

I think that this is specific to RStudio because starting R in a terminal window doesn't produce this message.

Does anyone have an idea on how to clear the conditions that are giving rise to this warning?

Upgrade RStudio. It is a problem in the interaction between R and RStudio, which was solved with the new version of RStudio... I had the same problem...

HTH,
Albin

Hi Albin

That did help - thanks so much!

Cheers
Sun

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] Warning message when starting RStudio
Hi list

Recently, when starting up RStudio, the following warning is being displayed:

"Error in tools:::httpdPort <= 0L : comparison (4) is possible only for atomic and list types"

I think that this is specific to RStudio because starting R in a terminal window doesn't produce this message.

Does anyone have an idea on how to clear the conditions that are giving rise to this warning?

Many thanks
Sun
Re: [R] Removing words and initials with tm
Hi Jim

The name's come up on my radar, but that's about it. I'll look into it. Thanks for the reference.

All the best
S

On 10/04/15 23:36, Jim Lemon wrote:
> Hi Sun,
> No, I was thinking of something like hunspell, which seems to fit into
> the sort of work that you are doing.
>
> Jim
>
> On Fri, Apr 10, 2015 at 11:42 PM, Sun Shine wrote:
>> Thanks Jeff.
>>
>> I'll add that to the ever-growing list my current studies are
>> generating daily. :-)
>>
>> Cheers
>> S
>>
>> On 10/04/15 14:32, Jeff Newmiller wrote:
>>> "I suspect that it might have something to do with regular
>>> expressions, but to be honest, I'm (currently) pretty crap
>>> with those."
>>>
>>> I cannot think of a better incentive to take action on this
>>> hole in your education and buckle down to learn regular
>>> expressions. There are many books and tutorials available.
>>>
>>> Jeff Newmiller
>>> Research Engineer (Solar/Batteries/Software/Embedded Controllers)
>>> Sent from my phone. Please excuse my brevity.
>>>
>>> On April 10, 2015 3:19:51 AM PDT, Sun Shine wrote:
>>>> Hi list
>>>>
>>>> Using the tm package, part of the pre-processing work is to remove
>>>> words, etc. from the corpus.
>>>>
>>>> I wish to remove people's names and also their initials which are
>>>> peppered throughout the corpus. But, because some people's initials
>>>> are the same as parts of common words - e.g. 'am' = 'became' =>
>>>> 'bec e' or 'ec' = 'because' => 'b ause' or 'ar' = 'arrival' =>
>>>> 'rival' (which has a completely different meaning).
>>>>
>>>> Is there any way of doing this without leaving a trail of nonsense
>>>> half-terms behind? I suspect that it might have something to do with
>>>> regular expressions, but to be honest, I'm (currently) pretty crap
>>>> with those.
>>>>
>>>> Would it make a difference if I removed initials and names *prior*
>>>> to converting all text to lower case, so I remove 'AM' and because
>>>> 'became' is lower case, it should remain unaffected?
>>>>
>>>> Any recommendations on how best to proceed with this?
>>>>
>>>> Thanks as always.
>>>> Sun
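For what it's worth, the half-word problem described in this thread is the kind of thing regular-expression word boundaries are meant for. Below is a minimal base R sketch, not tm-specific: the sample text and the `initials` vector are made up for illustration, and the initials are removed before lower-casing, as the original post suggests.

```r
# Hypothetical sample data, not taken from the original corpus.
txt <- c("AM became chair because EC left", "arrival of AR was noted")
initials <- c("AM", "EC", "AR")

# \\b anchors the match at word boundaries, so only whole words are removed.
pattern <- paste0("\\b(", paste(initials, collapse = "|"), ")\\b")

# Remove the initials while the text is still in its original case ...
cleaned <- gsub(pattern, "", txt)
# ... then tidy up the whitespace and convert to lower case afterwards.
cleaned <- tolower(trimws(gsub("\\s+", " ", cleaned)))
print(cleaned)
```

Because the pattern only matches whole words, 'became', 'because' and 'arrival' are left intact even if the match were made case-insensitive.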
[R] Cluster analysis using term frequencies
Hi list

I am using the 'tm' package to review meeting notes at a school to identify terms frequently associated with 'learning', 'sports', and 'extra-mural' activities, and then to sort any terms according to these three headers in a way that could be supported statistically (as opposed to, say, my own bias, etc.).

To accomplish this, I have done the following:

(1) After the usual pre-processing of the text data, loading it as a corpus and then converting it into a document term matrix (called 'allTerms'), I have identified the 20 most frequently occurring terms in the meeting notes and extracted these into a named vector called 'freqTerms'. Many of the terms returned have nothing to do with any of the three themes of 'learning', 'sports', or 'extra-mural'.

(2) Therefore, I have also manually generated a list of terms and synonyms for 'learning' and 'sports', etc. (e.g. 'football', 'soccer', 'drama', 'chess', etc.) and then tested for the occurrence of each of these terms in the corpus, e.g.:

> allTerms['soccer']

and have come up with a list of some 30 terms together with their frequencies. I manually sorted these according to three headers 'learning', 'sports', and 'extra-mural' and dropped these into a table in a word processing document. Some of these terms are also in the freqTerms vector.

What I want to do now is to use cluster analysis (hclust, from the 'cluster' library) to plot a dendrogram of the terms I have manually checked and put into the table, in order to see how closely similar the terms are and whether they cluster in ways similar to how I manually sorted them under the table column headers of 'learning', 'sports', and 'extra-mural'.

To do this, I dropped these manually sorted terms into a data frame together with the associated values (which I called 'tes.df') and then tried plotting this as follows:

> dtes <- dist(tes.df, method = 'euclidean')
> dtesFreq <- hclust(dtes, method = 'ward.D')
> plot(dtesFreq, labels = names(tes.df))

However, I get an error message when trying to plot this:

"Error in graphics:::plotHclust(n1, merge, height, order(x$order), hang, : invalid dendrogram input".

I'm clearly screwing something up, either in my source data.frame or in my setting hclust up, but don't know which, nor how.

More than just identifying the error however, I am interested in finding a smart (efficient/ elegant) way of checking the occurrence and frequency value of the terms that may be associated with 'sports', 'learning', and 'extra-mural' and extracting these into a matrix or data frame so that I can analyse and plot their clustering to see if how I associated these terms is actually supported statistically.

I'm sure that there must be a way of doing this in R, but I'm obviously not going about it correctly. Can anyone shine a light please?

Thanks for any help/ guidance.

Regards,
Sun
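One possible cause of the "invalid dendrogram input" error, offered as a guess since 'tes.df' itself was not posted: dist() clusters the rows of the data frame, while names(tes.df) returns the column names, so the number of labels may not match the number of clustered items. Note also that hclust() lives in the 'stats' package rather than 'cluster'. A minimal sketch with made-up frequencies, keeping the terms as row names:

```r
# Hypothetical term frequencies; the real 'tes.df' was not posted.
tes.df <- data.frame(
  freq = c(42, 37, 5, 8, 21, 19),
  row.names = c("football", "soccer", "chess", "drama", "reading", "homework")
)

dtes <- dist(tes.df, method = "euclidean")   # distances between rows (terms)
dtesFreq <- hclust(dtes, method = "ward.D")  # hclust() is in the 'stats' package

# Labels need one entry per clustered row, hence rownames() rather than names().
plot(dtesFreq, labels = rownames(tes.df))

groups <- cutree(dtesFreq, k = 3)  # cut the tree into three groups
print(groups)
```

The result of cutree() could then be compared with the manual 'learning' / 'sports' / 'extra-mural' assignment, for example with table().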
Re: [R] Using a text file as a removeWord dictionary in tm_map
Hi again

I've now had the chance to try this out, and using scan() doesn't seem to work either. This is what I used:

1) I generated a plain text file called stopDict.txt. This file is of the format: "a, bunch, of, words, to, use"

2) I invoked scan(), like this:

> userStopList <- scan(text = '~/path/to/stopDict.txt', what = " ", sep = ",")

3) Then I used the externally generated list as stop words:

> docs <- tm_map(docs, removeWords, userStopList)

4) When I go to inspect the document, at least two of the user-defined stop words are in the text

Is there a further argument I should be passing to scan(), or is the stopDict.txt file not set up the correct way? I tried each term separated by ' ' and ',', (e.g. 'all', 'the', 'text') but that didn't work, neither does it seem to work when the whole list is enclosed within quotes (e.g. "all, the, text").

While not critical to have the capacity to read in an externally generated list, it sure would be helpful.

Thanks.
Sun
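A likely explanation for the scan() call above not working: the `text =` argument treats its value as the data itself, so the path string is what gets scanned, whereas `file =` reads the file at that path. A minimal sketch, using a temporary file with made-up contents so that it is self-contained:

```r
# Write a small stop-word file; the path and contents are made up for illustration.
stop_file <- tempfile(fileext = ".txt")
writeLines("a, bunch, of, words, to, use", stop_file)

# text = treats its argument as the data itself, so the path is scanned literally.
wrong <- scan(text = stop_file, what = "", sep = ",", quiet = TRUE)

# file = reads the file; strip.white drops the blanks that follow each comma.
userStopList <- scan(file = stop_file, what = "", sep = ",",
                     strip.white = TRUE, quiet = TRUE)
print(userStopList)
```

Without strip.white = TRUE the entries would keep their leading spaces (" bunch", " of", ...), which would presumably not match the words in the corpus.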
Re: [R] Using a text file as a removeWord dictionary in tm_map
Thanks Jim.

I thought that I was passing a vector, not realising I had converted this to a list object. I haven't come across the scan() function so far, so this is good to know.

Good explanation - I'll give this a go when I can get back to that piece of work later today.

Thanks again.

Regards,
Sun

On 01/03/15 21:13, jim holtman wrote:

The 'read.table' was creating a data.frame (not a vector) and applying 'c' to it converted it to a list. You should always look at the object you are creating. You probably want to use 'scan'.

==
> testFile <- "Although,this,query,applies,specifically,to,the,tm,package"
> # read in with read.table create a data.frame
> df_words <- read.table(text = testFile, sep = ',')
> df_words  # not a vector
        V1   V2    V3      V4           V5 V6  V7 V8      V9
1 Although this query applies specifically to the tm package
> c(df_words)  # this results in a list
$V1
[1] Although
Levels: Although

$V2
[1] this
Levels: this

$V3
[1] query
Levels: query

$V4
[1] applies
Levels: applies

$V5
[1] specifically
Levels: specifically

$V6
[1] to
Levels: to

$V7
[1] the
Levels: the

$V8
[1] tm
Levels: tm

$V9
[1] package
Levels: package

> # now read with 'scan'
> scan_words <- scan(text = testFile, what = '', sep = ',')
Read 9 items
> scan_words
[1] "Although"     "this"         "query"        "applies"      "specifically" "to"
[7] "the"          "tm"           "package"

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
[R] Using a text file as a removeWord dictionary in tm_map
Hi list

Although this query applies specifically to the tm package, perhaps it's something that others might be able to lend a thought to.

Using tm to do some initial text mining, I want to include an external (to R) generated dictionary of words that I want removed from the corpus.

I have created a comma separated list of terms in " " marks in a stopList.txt plain UTF-8 file. I want to read this into R, so do:

> stopDict <- read.table('~/path/to/file/stopList.txt', sep=',')

When I want to load it as part of the removeWords function in tm, I do:

> docs <- tm_map(docs, removeWords, stopDict)

which has no effect. Neither does:

> docs <- tm_map(docs, removeWords, c(stopDict))

What am I not seeing/ doing? How do I pass a text file with pre-defined terms to the removeWords transform of tm?

Thanks for any ideas.

Cheers
Sun
[R] Rgraphviz and NA indices error
Hi list

Can someone help me debug the following please:

Having downloaded and installed the bioconductor packages and Rgraphviz, I am attempting to plot a network graph showing the relation among chosen words in the corpus of text data.

I first did this:

> plot(dtm, terms=findFreqTerms(dtm, lowfreq=100) [1:30], corThreshold=0.75)

and received the error message:

Error in `[.simple_triplet_matrix`(m, , terms) : NA indices not allowed.

My next step was to remove any NA indices (although to be honest, this is more of a stab in the dark because there shouldn't be any NA values in the corpus):

> docsNA <- (docs[!is.na(docs)])

Then redid the DTM with the NA values removed:

> dtmNA <- DocumentTermMatrix(docsNA)

Then re-ran Rgraphviz with the new set:

> plot(dtmNA, terms=findFreqTerms(dtmNA, lowfreq=100) [1:10], corThreshold=0.5)

But, still get an error:

Error in `[.simple_triplet_matrix`(m, , terms) : NA indices not allowed.

I have not been successful in finding out why this error persists nor what to do about it.

Anyone have any ideas to progress past this issue?

Thanks
Sun
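A possible source of the NA indices, offered as a guess since the data was not posted: if findFreqTerms() returns fewer terms than the requested 1:30 (or 1:10), the subscript pads the result with NA, and those NA values are then used as indices into the matrix. The behaviour of the subscript itself can be seen in base R with a stand-in vector:

```r
# Stand-in for the result of findFreqTerms(); the real terms were not posted.
freq_terms <- c("school", "learning", "sports", "meeting", "pupils")

padded <- freq_terms[1:10]     # indexing past the end pads with NA
print(sum(is.na(padded)))      # number of NA entries introduced

safe <- head(freq_terms, 10)   # head() returns at most 10 items, never NA padding
print(length(safe))
```

If that is what is happening, wrapping the call as head(findFreqTerms(dtm, lowfreq=100), 30) in place of the [1:30] subscript might avoid the error; this has not been tested against the original corpus.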
[R] Solved: Re: text miner error: Error in UseMethod("meta", x)
Hi list

Closing this one off myself, this is what I did:

The error seems to concern the update of tm to version 0.6: the conversion to lower case text should now be:

> docs <- tm_map(docs, content_transformer(tolower))

Everything else seems to work fine thereafter.

The issue in the tutorial concerns section 3.1. wherein Graham creates a function toSpace. This seems to introduce an additional term that tm_map and later DocumentTermMatrix do not seem to know how to handle. This is probably an incorrect interpretation of what's going on, but the fix appears to be to use the above line earlier in the preparation stage.

If anyone has more informed insight, please share.

Cheers
Sun
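As background to the 'meta' error in this thread: the class "try-error" in the message suggests that an earlier step failed for some documents and left error objects behind in place of the text, which later functions then cannot handle. The base R sketch below only imitates that mechanism with try(); it does not use tm, and the function and documents are made up for illustration.

```r
# A stand-in transformation that fails on non-character input (hypothetical).
fragile_lower <- function(x) {
  if (!is.character(x)) stop("not a character vector")
  tolower(x)
}

docs <- list("Meeting Notes", 42, "Sports DAY")  # made-up documents

# try() mimics what a parallel worker hands back when user code errors.
results <- lapply(docs, function(d) try(fragile_lower(d), silent = TRUE))

# Find the positions that hold an error object instead of a result.
failed <- vapply(results, inherits, logical(1), what = "try-error")
print(which(failed))
```

Checking for such objects right after each transformation step may help narrow down which step is the one that actually fails.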
[R] text miner error: Error in UseMethod("meta", x)
Hi list

I've been working my way through a tutorial on text mining ( http://onepager.togaware.com/TextMiningO.pdf ) and all was well until I came across this problem using tm (text miner):

++code+++
> docs <- tm_map(docs, content_transformer(tolower))
Warning messages:
1: In mclapply(x$content[i], function(d) tm_reduce(d, x$lazy$maps)) :
  all scheduled cores encountered errors in user code
2: In mclapply(content(x), FUN, ...) :
  all scheduled cores encountered errors in user code
++end-code

After some searching, it appears the best fix for this problem was to pass an explicit lazy=TRUE argument to tm, like this:

> docs <- tm_map(docs, content_transformer(tolower), lazy=TRUE)

However, a little further on in the tutorial to set up the text matrix, a related (?) error was returned:

++code+++
> dtm <- DocumentTermMatrix(docs)
Error in UseMethod("meta", x) :
  no applicable method for 'meta' applied to an object of class "try-error"
In addition: Warning message:
In mclapply(unname(content(x)), termFreq, control) :
  all scheduled cores encountered errors in user code
++end-code

I tried applying the explicit lazy=TRUE again, but doesn't change things.

I have gone over the tutorial again and have followed all of the steps (including loading the requisite libraries). Moreover, searching on the web seems to return several contradictory suggestions and I'm no wiser than I was before.

The closest I came to an answer was at Stack Overflow http://stackoverflow.com/questions/24771165/r-project-no-applicable-method-for-meta-applied-to-an-object-of-class-charact and that answer suggested using the latest tm (v 0.6) and claimed that the earlier tolower step was wrong. However, my code used the recommended:

corpus <- tm_map(corpus, content_transformer(tolower))

Is there anyone on the list who could either sign-post me to a solution or assist in debugging this please?

I'm running R version 3.1.2 and tm is 0.6

Many thanks
Sun
Re: [R] Noob question re: writing while loops on one line
Thanks John: understanding it as a line return makes sense!

Cheers
Sun

On 15/02/15 14:59, John Kane wrote:

Hi Sun,

Can you check the code in the one line command in RStudio? I tried it and got the expected error. Or to put it another way, it should not have run for you :)

The semi-colon is functioning as a line return

John Kane
Kingston ON Canada
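To illustrate the point made in this thread, here is a small base R sketch: the first part checks that the one-line version without a separator does not even parse, and the second runs the version with a semicolon. The variable names `one_line` and `parses` are only for illustration.

```r
# The loop body on one line without a separator is a syntax error.
one_line <- "while (count < 10) { print(count) count <- count + 1 }"
parses <- !inherits(try(parse(text = one_line), silent = TRUE), "try-error")
print(parses)

# A semicolon separates the two statements that would otherwise need a line break.
count <- 0
while (count < 10) { print(count); count <- count + 1 }
print(count)
```

The same rule applies in a script or in the console: statements end at a newline or a semicolon, so the semicolon is only needed when two statements share a line.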
Re: [R] Noob question re: writing while loops on one line
Brilliant Jim - that does the trick!!

I guess then that the semi-colon rule works for any program or function that is being written on one line? Any reason why when writing this out in the RStudio source editor no semi-colon is required, but it is when written in the interactive console?

Thanks again
Sun

On 15/02/15 10:41, Jim Lemon wrote:

Hi Sun,

Try including a semicolon.

while(count < 10) { print(count); count<-count+1 }

Jim
[R] Noob question re: writing while loops on one line
Hi list

I'm working through some exercises and did a while loop which raised an issue for me:

I can write out the while loop so:

> count <- 0
while(count < 10) {
    print(count)
    count <- count + 1
}

And this works fine.

Trying to do the same thing all on one line however gives this error:

"Error: unexpected symbol in "while(count < 10) { print(count) count""

My question: How can one write out a while loop all in one line? Is there a symbol or something that I should be including?

Thanks for any suggestions.

Sun
Re: [R] Updating to R 3.1.1. - impacts on existing packages
Thanks Jeff/ Henrik

Jeff - that's what I needed: so far the update seems to be painless.

Many thanks
Sun

On 03/02/15 01:45, Jeff Newmiller wrote:
> I think you missed the question, Henrik, which was directed at updating
> the local 3.1 library with all of the packages that were in the 3.0
> library.
>
> The usual advice for this is to copy your 3.0 library onto your 3.1
> library (duplicate directory structure) so R knows what packages you
> want to use and then use update.packages(). In general the copied
> directories will not work directly, but R can update them. Note that
> some packages are dropped due to better support in different packages
> or lack of maintainer activity, so not all packages thus copied may end
> up usable.
>
> On Mon, 2 Feb 2015, Henrik Bengtsson wrote:
>> On Mon, Feb 2, 2015 at 4:49 PM, Sun Shine wrote:
>>> Hi list
>>>
>>> I've signed up for a Coursera course on exploratory data analysis,
>>> and the recommendation is to update to R base 3.1.1.
>>>
>>> I'm currently on 3.0.2. If I do upgrade, what is the best way for me
>>> to upgrade all my packages for compatibility? Would this be
>>> accomplished through the command:
>>>
>>> update.packages()
>>>
>>> Also, any ideas what percentage of the packages have been updated to
>>> work with 3.1.1. ? I'm just wanting to do a risk evaluation because I
>>> don't want to lose access to packages such as ggplot2, sna, statnet,
>>> FactoMineR, and several others through upgrading.
>>
>> All packages on CRAN should be up-to-date (that's almost the
>> definition of CRAN; if a package is not updated in time it's likely to
>> be archived due to lack of maintenance). When in doubt, have a look at
>> their individual CRAN pages, e.g.
>> http://cran.r-project.org/package=ggplot2. Look for the "r-release".
>> Note that "r-release" always refers to the latest stable official R
>> release, which currently is R 3.1.2. You should upgrade to that
>> version and not 3.1.1.
>>
>> It's pretty safe to always install the most recent stable release
>> version of R. If you're using an old version of R, like you do, it's
>> more likely that you run into problems in general than if you use the
>> most recent version. So, avoid sticking with an old version and make
>> sure to upgrade whenever a new release comes out.
>>
>> /Henrik
>>
>>> Thanks for any steers
>>> Sun
>
> Jeff Newmiller
> Research Engineer (Solar/Batteries/Software/Embedded Controllers)
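As a variation on the copy-the-library advice above (not the method described in the thread), one could compare the package names in the old and new library trees and reinstall whatever is missing. This is only a sketch: the helper name `missing_after_upgrade` and the example path are hypothetical, and the install and update calls are commented out because they need a CRAN mirror.

```r
# Sketch only: 'old_lib' is a hypothetical path to the previous library tree.
missing_after_upgrade <- function(old_lib, new_lib = .libPaths()[1]) {
  old_pkgs <- rownames(installed.packages(lib.loc = old_lib))
  new_pkgs <- rownames(installed.packages(lib.loc = new_lib))
  setdiff(old_pkgs, new_pkgs)  # packages present before but not yet installed
}

# Not run here, since both steps need a CRAN mirror:
# install.packages(missing_after_upgrade("~/R/old-library/3.0"))
# update.packages(checkBuilt = TRUE, ask = FALSE)
```

The checkBuilt = TRUE argument asks update.packages() to also treat packages built under an earlier R version as out of date, which is the situation after copying a library across versions.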
[R] Updating to R 3.1.1. - impacts on existing packages
Hi list

I've signed up for a Coursera course on exploratory data analysis, and the recommendation is to update to R base 3.1.1.

I'm currently on 3.0.2. If I do upgrade, what is the best way for me to upgrade all my packages for compatibility? Would this be accomplished through the command:

> update.packages()

Also, any ideas what percentage of the packages have been updated to work with 3.1.1. ? I'm just wanting to do a risk evaluation because I don't want to lose access to packages such as ggplot2, sna, statnet, FactoMineR, and several others through upgrading.

Thanks for any steers
Sun
Re: [R] Working with data frames
Hello William, Ivan and Jim

I appreciate your replies.

I did suppress the factors using stringsAsFactors=FALSE and in that way was able to make some more progress in getting a sense of the data set, so thanks for that suggestion. I had previously overlooked it.

Also thanks William, I never understood what those thick line segments were - now I do. That had been about the best I could get by that point, and still without the names on the x axis.

Unfortunately, using William's suggestion of 'with' gave me errors:

> with(MHP.def, {plot(as.integer(MHP.def$Names),cH.E, axes=FALSE, xlab='Area') axis(side=2) axis(side=1, at=seq_along(levels(MHP.def$Names)), lab=levels(MHP.def$Names))})
Error: unexpected symbol in "with(MHP.def, {plot(as.integer(MHP.def$Names), MHP.def$cH.E, axes=FALSE, xlab='Area') axis"

This may have something to do with the period between cH and E or perhaps from the $ to access data from a column?

I have now installed ggplot2 and with the help of the graphics cookbook will see if I can make some headway like this, at least for now.

I think William's suggestion about learning to work with factors is fundamentally sound and something I will need to get my head around. For now though, I think I'll stick to exploring ggplot2 so that I can visualise this data set more easily.

Thanks again.

Best
Sun

On 11/12/14 16:06, William Dunlap wrote:
> Here is a reproducible example
>   > d <- read.csv(text="Name,Age\nBob,2\nXavier,25\nAdam,1")
>   > str(d)
>   'data.frame': 3 obs. of 2 variables:
>    $ Name: Factor w/ 3 levels "Adam","Bob","Xavier": 2 3 1
>    $ Age : int 2 25 1
> Do you get something similar? If not, show us what you have (you
> could trim it down to a few columns).
>
> Let's try some plots.
>   > plot(d$Age)
> This shows a plot of d$Age (on y axis) vs "Index", where Index is
> 1:length(d$Age). The points are at (1,2), (2,25), and (3,1). You gave
> plot() no information about what should be on the x axis so it gave
> you the index numbers.
>
> Now asking for d$Name on the x axis and d$Age on the y.
>   > plot(d$Name, d$Age)
> This put the names, in alphabetical order, on the x axis. The y axis
> ranges from about 0 to 25 and neither axis is labelled. There are
> thick horizontal line segments where you expect the points to
> be. These are degenerate boxplots - when you ask to plot a
> 'factor' variable on the x axis and numbers on the y you get such
> a plot.
>
> Some folks suggested you avoid factors by adding stringsAsFactors=FALSE
> (or as.is=TRUE) to your call to read.csv. Let's try that
>   > d2 <- read.csv(stringsAsFactors=FALSE,
>       text="Name,Age\nBob,2\nXavier,25\nAdam,1")
>   > plot(d2$Name, d2$Age)
>   Error in plot.window(...) : need finite 'xlim' values
>   In addition: Warning messages:
>   1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
>   2: In min(x) : no non-missing arguments to min; returning Inf
>   3: In max(x) : no non-missing arguments to max; returning -Inf
> You get no plot at all.
>
> You can get closer to what I think you want with
>   with(d, {
>     plot(as.integer(Name), Age, axes=FALSE, xlab="Name")
>     axis(side=2) # draw the usual y axis
>     axis(side=1, at=seq_along(levels(Name)), lab=levels(Name))
>   })
> If you want the names in a different order on the x axis, then reconstruct
> the factor object d$Name with a different order of levels. E.g.,
>   d$Name <- factor(d$Name, levels=c("Xavier", "Bob", "Adam"))
> and replot.
>
> There are various plotting packages, e.g., ggplot2, that can make this
> sort of thing easier, but I think the recommendation not to use factors
> is wrong. You do need to learn how to use them to your advantage.
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Thu, Dec 11, 2014 at 5:00 AM, Sun Shine <phaedr...@gmail.com> wrote:
> > Hello
> >
> > I am struggling with data frames and would appreciate some help
> > please.
> >
> > I have a data set of 13 observations and 80 variables. The first
> > column is the names of different political area boundaries (e.g.
> > MHad, LBNW, etc), the first row is a vector of variable names
> > concerning various census data (e.g. age.T, hse.Unk, etc.). The
> > first cell [1,1] is blank.
> >
> > I have loaded this via read.csv('path.to/data.set.csv'), and now
> > want to run some analyses on this data frame. If I want to get a
> > list of the names
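A possible reading of the "unexpected symbol" error reported at the top of this message: R's parser needs a newline or a semicolon between the statements inside { }, and the call as pasted has plot(...), axis(...) and axis(...) on one line with only spaces between them. Neither the period in cH.E nor the $ is illegal in R. The sketch below uses a small stand-in data frame (the values are invented and "AreaC" is a made-up area name) and assumes Names is a factor, which it will not be if the file was read with stringsAsFactors=FALSE.

```r
# Stand-in data: values are invented and "AreaC" is a made-up area name;
# only MHP.def, Names and cH.E come from the thread.
MHP.def <- data.frame(
  Names = factor(c("MHad", "LBNW", "AreaC")),
  cH.E  = c(120, 85, 97)
)

# Statements on one line with nothing between them do not parse ...
one_line <- tryCatch(parse(text = "{plot(1) axis(side=2)}"),
                     error = function(e) e)
# ... but the same statements separated by semicolons (or newlines) do.
separated <- parse(text = "{plot(1); axis(side=2)}")

# The plotting call with one statement per line. Inside with(), columns
# can be named directly, so the MHP.def$ prefix is not needed.
with(MHP.def, {
  plot(as.integer(Names), cH.E, axes = FALSE, xlab = "Area")
  axis(side = 2)   # draw the usual y axis
  axis(side = 1, at = seq_along(levels(Names)), labels = levels(Names))
})
```

If the names were read in as character, wrapping the column in factor() first gives the integer codes and levels that this call relies on.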
[R] Working with data frames
Hello

I am struggling with data frames and would appreciate some help please.

I have a data set of 13 observations and 80 variables. The first column is the names of different political area boundaries (e.g. MHad, LBNW, etc), the first row is a vector of variable names concerning various census data (e.g. age.T, hse.Unk, etc.). The first cell [1,1] is blank.

I have loaded this via read.csv('path.to/data.set.csv'), and now want to run some analyses on this data frame.

If I want to get a list of the names of the political areas (i.e. the first column), the result is a vector of numbers which appear to correspond to the factor levels, but I don't get the text names, just the corresponding number.

So, if I want to plot something basic, like the area that uses the most gas for central heating, for example:

> plot(data.set$ch.Gas)

The result is that the y-axis gives the gas usage for the areas, but the x-axis gives only the numbers of the areas, not the names of the areas (which is preferred).

So, two questions:

(1) Have I set up my csv file correctly to be read as a data frame, i.e. with the variable names in the first row and, in each remaining row, the values for one political area under the corresponding variable name? So far, looking through tutorials and books seems to suggest yes, but at this point I'm no longer sure.

(2) How can I access the names of the political areas when plotting so that these are given on the x-axis instead of the numbers?

Thanks for any help.

Cheers
Sun
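One way the layout described above might be read so that the area names stay available as text is to use the first column as row names. This is only a sketch: the CSV text is a tiny invented stand-in for the real file, "AreaC" is a made-up area name, and row.names = 1 is an assumption about what is wanted rather than the only option.

```r
# Invented stand-in for the file described above: the first cell is blank,
# the first column holds area names, the first row holds variable names.
csv_text <- ",ch.Gas,age.T\nMHad,310,1200\nLBNW,275,980\nAreaC,402,1510"

# row.names = 1 tells read.csv to use column 1 as row names, not as data
data.set <- read.csv(text = csv_text, row.names = 1)

# The area names, as text
areas <- rownames(data.set)
print(areas)

# A bar chart puts the names on the x axis directly
barplot(data.set$ch.Gas, names.arg = areas,
        xlab = "Area", ylab = "ch.Gas")

# Name of the area with the highest gas usage
top_area <- areas[which.max(data.set$ch.Gas)]
print(top_area)
```

With a real file, the text = csv_text argument would be replaced by the path to the CSV.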
Re: [R] analyzing qualitative data sets
Thanks for the link. I had not been aware of that.

On 29/07/14 15:27, Bert Gunter wrote:
> 1. If you are asking about statistics, this is the wrong list. Post here instead: stats.stackexchange.com.
>
> 2. If you are asking about what sorts of statistical analyses are available in R, check the CRAN task views here: http://cran.r-project.org/web/views/
>
> 3. If you are asking about how to program in R and have not already done so, please read "An Introduction to R" or R web tutorial of your choice before posting here further.
>
> Bert
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
> (650) 467-7374
>
> "Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
> Clifford Stoll
>
> On Tue, Jul 29, 2014 at 6:01 AM, Sun Shine wrote:
> > Hello list
> >
> > I'm just beginning my PhD and am likely to be using lots of surveys in my data collection, and am wanting to get my head around the ideas about how best to approach the tasks in R.
> >
> > The data sets I have collected so far for some preliminary practise with are made up of the following survey data:
> >
> > (1) 25 observations x 15 variables of dichotomous nominal (categorical) data [basically, yes/no responses with a couple of missing values]
> > (2) 25 obs x 14 var of ordinal rank data [5 item Likert-scale, with some missing values], and
> > (3) 23 observations of free text, typically in the form of one sentence or statement, and I will be using RQDA for that part.
> >
> > So far, I have been able to piece together that I can use the Spearman method of the wilcox.test for #2 (ordinal data), but have yet to find anything that I can do for the nominal data. I was thinking of using frequency tables, but I don't seem to be able to find out too much info on it/ how to do that.
> >
> > Anyway, I have three questions that I'd appreciate members of this list taking a swing at for ideas please.
> >
> > (a) What types of analyses are available to apply to the data types above? I have been thinking about MCA using FactoMineR as well as MDS using MASS to visualise the data in high dimensional space, but I think that I haven't (yet!) figured out how to properly prepare my data sets for these, and most texts and tutorials seem to focus mostly on quantitative data analysis.
> >
> > (b) Is there any way that I can automate the Spearman process so that it iterates across the set? Otherwise it looks like I may have to manually take the two columns and keep comparing pairs until I have correlated all of the columns with all of the other columns - so is there any way that I can automate this and get the test statistics and p values dumped in a table for summarising?
> >
> > (c) After using RQDA to code the statements, is it feasible to reintroduce those codes back into the data set to explore correlations among the other columns and the units of coded text to see what variables co-occur?
> >
> > Well, thanks for taking the time to read this - and I look forward to any thoughts/ suggestions that might help.
> >
> > Cheers
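On the frequency-table idea raised in the quoted post: base R's table() may be enough to get started with yes/no items. The sketch below uses invented responses (q1 and q2 are hypothetical column names, not from the poster's data) and shows one-way counts per question plus a two-way cross-tabulation, with missing values counted rather than silently dropped.

```r
# Invented yes/no responses with one missing value; q1 and q2 are
# hypothetical question names used only for illustration.
survey <- data.frame(
  q1 = c("yes", "no", "yes", NA, "yes"),
  q2 = c("no", "no", "yes", "yes", "no")
)

# One-way frequency table for every question; NA gets its own count
one_way <- lapply(survey, table, useNA = "ifany")
print(one_way)

# Two-way cross-tabulation of q1 (rows) against q2 (columns)
xt <- table(q1 = survey$q1, q2 = survey$q2, useNA = "ifany")
print(xt)

# Row proportions: how q2 answers split within each q1 answer
print(prop.table(xt, margin = 1))
```

Whether a formal test on such a table is appropriate is a statistical question that depends on the study design, so the sketch stops at counting.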
Re: [R] analyzing qualitative data sets
Thanks, I'll look into that further.

Cheers

On 29/07/14 16:44, Richard M. Heiberger wrote:
> For your item 2,
>
>   (2) 25 obs x 14 var of ordinal rank data [5 item Likert-scale, with some missing values], and
>
> I recommend the likert function in the HH package
>
>   install.packages("HH")
>   library(HH)
>   ?likert
>
> Rich
>
> On Tue, Jul 29, 2014 at 9:01 AM, Sun Shine wrote:
> > Hello list
> >
> > I'm just beginning my PhD and am likely to be using lots of surveys in my data collection, and am wanting to get my head around the ideas about how best to approach the tasks in R.
> >
> > The data sets I have collected so far for some preliminary practise with are made up of the following survey data:
> >
> > (1) 25 observations x 15 variables of dichotomous nominal (categorical) data [basically, yes/no responses with a couple of missing values]
> > (2) 25 obs x 14 var of ordinal rank data [5 item Likert-scale, with some missing values], and
> > (3) 23 observations of free text, typically in the form of one sentence or statement, and I will be using RQDA for that part.
> >
> > So far, I have been able to piece together that I can use the Spearman method of the wilcox.test for #2 (ordinal data), but have yet to find anything that I can do for the nominal data. I was thinking of using frequency tables, but I don't seem to be able to find out too much info on it/ how to do that.
> >
> > Anyway, I have three questions that I'd appreciate members of this list taking a swing at for ideas please.
> >
> > (a) What types of analyses are available to apply to the data types above? I have been thinking about MCA using FactoMineR as well as MDS using MASS to visualise the data in high dimensional space, but I think that I haven't (yet!) figured out how to properly prepare my data sets for these, and most texts and tutorials seem to focus mostly on quantitative data analysis.
> >
> > (b) Is there any way that I can automate the Spearman process so that it iterates across the set? Otherwise it looks like I may have to manually take the two columns and keep comparing pairs until I have correlated all of the columns with all of the other columns - so is there any way that I can automate this and get the test statistics and p values dumped in a table for summarising?
> >
> > (c) After using RQDA to code the statements, is it feasible to reintroduce those codes back into the data set to explore correlations among the other columns and the units of coded text to see what variables co-occur?
> >
> > Well, thanks for taking the time to read this - and I look forward to any thoughts/ suggestions that might help.
> >
> > Cheers
[R] analyzing qualitative data sets
Hello list

I'm just beginning my PhD and am likely to be using lots of surveys in my data collection, and am wanting to get my head around the ideas about how best to approach the tasks in R.

The data sets I have collected so far for some preliminary practise with are made up of the following survey data:

(1) 25 observations x 15 variables of dichotomous nominal (categorical) data [basically, yes/no responses with a couple of missing values]
(2) 25 obs x 14 var of ordinal rank data [5 item Likert-scale, with some missing values], and
(3) 23 observations of free text, typically in the form of one sentence or statement, and I will be using RQDA for that part.

So far, I have been able to piece together that I can use the Spearman method of the wilcox.test for #2 (ordinal data), but have yet to find anything that I can do for the nominal data. I was thinking of using frequency tables, but I don't seem to be able to find out too much info on it/ how to do that.

Anyway, I have three questions that I'd appreciate members of this list taking a swing at for ideas please.

(a) What types of analyses are available to apply to the data types above? I have been thinking about MCA using FactoMineR as well as MDS using MASS to visualise the data in high dimensional space, but I think that I haven't (yet!) figured out how to properly prepare my data sets for these, and most texts and tutorials seem to focus mostly on quantitative data analysis.

(b) Is there any way that I can automate the Spearman process so that it iterates across the set? Otherwise it looks like I may have to manually take the two columns and keep comparing pairs until I have correlated all of the columns with all of the other columns - so is there any way that I can automate this and get the test statistics and p values dumped in a table for summarising?

(c) After using RQDA to code the statements, is it feasible to reintroduce those codes back into the data set to explore correlations among the other columns and the units of coded text to see what variables co-occur?

Well, thanks for taking the time to read this - and I look forward to any thoughts/ suggestions that might help.

Cheers
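Question (b) above can be sketched in base R. Spearman's rank correlation is available through cor.test() with method = "spearman" (wilcox.test() is a different, rank-sum test), and combn() can enumerate every pair of columns. The data below are randomly generated stand-ins for the Likert items, and the names likert, item1 to item4 and results are hypothetical.

```r
set.seed(1)

# Invented 5-point Likert responses: 25 respondents, 4 items
likert <- as.data.frame(matrix(sample(1:5, 100, replace = TRUE), ncol = 4))
names(likert) <- paste0("item", 1:4)
likert[3, 2] <- NA   # one missing value, as in real survey data

# Every unordered pair of column names, one pair per matrix column
pairs <- combn(names(likert), 2)

# Run one Spearman test per pair and stack the results into a table
results <- do.call(rbind, lapply(seq_len(ncol(pairs)), function(i) {
  a <- pairs[1, i]
  b <- pairs[2, i]
  # exact = FALSE: tied ranks rule out an exact p-value anyway;
  # cor.test() drops incomplete pairs of observations itself
  ct <- cor.test(likert[[a]], likert[[b]], method = "spearman", exact = FALSE)
  data.frame(var1 = a, var2 = b,
             rho = unname(ct$estimate), p.value = ct$p.value)
}))

print(results)
```

With 14 items this produces 91 tests, so some adjustment for multiple comparisons (for example p.adjust() on the p.value column) is probably worth considering before reading much into individual p values.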
Re: [R] Translating a basic Python script into R
On 29/12/13 10:45, YuHong wrote:
> In my opinion, the best usages of Python and R should be for different types of tasks respectively. For example, Python is good for automating miscellaneous tasks, while R is good for list data processing and statistical modelling. Therefore when you become more familiar with Python and R, you shall not use the two for exactly the same thing.

That makes sense. The point I was trying to make really concerned the structure of conditional statements, as well as statements such as "print" and the declaration of the variables. I wasn't actually referring to any programming similarities. In any event, this opinion was made by someone with little experience in programming, so from the "outside" the similarities are probably more apparent than they are to someone with a more sophisticated awareness of the "inside" dissimilarities.

My original intent had been to use Meadows' models to try to get my hand in for modelling. Since I know a (very) little about Python I started off with that, which gave me a rough idea of how I could approach such tasks, and it worked. R is something else I'd like to learn, so I thought that I would try to do the same thing in R, which is where I ran aground, and hence I was very appreciative of Ista taking the time to demonstrate how to do such things in R.
Re: [R] Translating a basic Python script into R
Hi Ista

On 28/12/13 23:06, Ista Zahn wrote:
> Hi,
>
> I don't see any nested conditions in the python code... A direct translation in R looks almost the same, except that you need to group using parentheses and brackets instead of whitespace, and there is no += in R (at least not that I'm aware of). Making those changes gives
>
> stock = 50
> time = 1
> inflow_a = 0
> inflow_b = 5
> outflow = 5
> x = stock
> y = time
> print ("Model of inflow and outflow rates of water")
> print ("version 3")
> print (stock)
> # first five minutes: drain only (same condition as the Python loop)
> while (time <= 5) {
>   stock = (stock - outflow) + inflow_a
>   time = time + 1
>   y = c(y, time)
>   x = c(x, stock)
>   print (stock)
>   if (stock == 30) {
>     print ("Faucet turned on")
>   }
> }
> # minutes 6 to 9: faucet on, inflow matches outflow
> while (time >= 6 & time <= 9) {
>   stock = (stock - outflow) + inflow_b
>   time = time + 1
>   y = c(y, time)
>   x = c(x, stock)
>   print (stock)
> }
> sprintf("Volume in tub stabilises at %d gallons over %d minutes", stock, time)
> print (x)
> print (y)
>
> > I'm sure that there must be some very elegant way to do this, but I cannot find out how to do so in any of the books I have, nor do my web searches throw back anything useful (I suspect that I'm not phrasing the question properly).
>
> In both python and R you can of course use if/else instead of the two separate while loops. An R version is
>
> stock = 50
> time = 1
> inflow_a = 0
> inflow_b = 5
> outflow = 5
> x = stock
> y = time
> print ("Model of inflow and outflow rates of water")
> print ("version 3\n")
> print (stock)
> while (time <= 9) {
>   if(time <= 5) {
>     stock = (stock - outflow) + inflow_a
>   } else {
>     stock = (stock - outflow) + inflow_b
>   }
>   time = time + 1
>   y = c(y, time)
>   x = c(x, stock)
>   print (stock)
>   if (stock == 30) {
>     print ("Faucet turned on")
>   }
> }
> sprintf("Volume in tub stabilises at %d gallons over %d minutes", stock, time)
> print (x)
> print (y)
> plot(y, x)
>
> > Can someone please offer a few suggestions about ways that I could translate the Python script into R so that I can then run a plot as well?
>
> You can plot in python, e.g.,
>
> from matplotlib.pyplot import *
> plot(y, x)
> show()
>
> Best,
> Ista

This was *very* helpful: I learned about both R and Python and am pleased to see that the structures of the two - for this script at least - are so similar.

Thank you for taking the time to explain and demonstrate rather than to just tell me to RTFM. Your reply has given me a lot of ideas to play around with in experimenting, so I can envisage an enjoyable afternoon testing some of this on the other models Meadows described.

Many thanks for your clear explanations.

Best wishes
Sun
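For a model this small, the loops can also be avoided altogether. The following is a sketch of one vectorised alternative, not something proposed in the thread: it builds the per-minute net change up front and accumulates it with cumsum(). The variable names follow the scripts above; net is a hypothetical helper name.

```r
inflow_a <- 0   # faucet off
inflow_b <- 5   # faucet on, gallons per minute
outflow  <- 5   # drain, gallons per minute

time <- 1:10    # same time points the looped versions record in y

# Net change applied in each of the nine steps: drain only for the
# first five steps, drain plus faucet for the remaining four
net <- c(rep(inflow_a - outflow, 5), rep(inflow_b - outflow, 4))

# Start from 50 gallons and accumulate the changes
stock <- cumsum(c(50, net))

print(stock)
plot(time, stock, type = "b", xlab = "Minute", ylab = "Gallons in tub")
```

The looped versions remain easier to extend when the flows depend on the current stock, which is the case for many of the later models in Meadows' book.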
[R] Translating a basic Python script into R
Hi

I am attempting to translate some of the models that Donella Meadows wrote about in her book "Thinking in systems" into code. Originally, I had wanted to do this in Python, but thought that it would be fun to see if it is feasible to do so in R, especially given the plotting capacity of R.

Meadows describes a very simple example of a stock and flow: 50 gallons of water in a bath tub - drain out at a rate of 5 gal/minute and then turn on the faucet after five minutes which flows at 5 gal/min. The outcome is obviously that after 5 minutes, the bath tub will maintain a steady stock of 25 gal thereafter.

My basic code in Python looks like this:

== Python code ==
stock = 50
time = 1
inflow_a = 0
inflow_b = 5
outflow = 5
x = [stock]
y = [time]
print "Model of inflow and outflow rates of water"
print "version 3"
print
print stock
while time <= 5:
    stock = (stock - outflow) + inflow_a
    time += 1
    y += [time]
    x += [stock]
    print stock
    if stock == 30:
        print "Faucet turned on"
while time >= 6 and time <= 9:
    stock = (stock - outflow) + inflow_b
    time += 1
    y += [time]
    x += [stock]
    print stock
print "Volume in tub stabilises at %d gallons over %d minutes" % (stock, time)
print x
print y
== end code ==

I want to translate this into an equivalent script in R. After some searching around, I found how to set up a while loop, and constructed the first section, like this:

== R code ==
while(time <= 10) {
if time <= 5 stock < time <- time + 1 print(time)
}
== end code ==

However, what I would like to learn how to do is to nest the if conditions in a way similar to that given in the Python code.

I'm sure that there must be some very elegant way to do this, but I cannot find out how to do so in any of the books I have, nor do my web searches throw back anything useful (I suspect that I'm not phrasing the question properly).

Can someone please offer a few suggestions about ways that I could translate the Python script into R so that I can then run a plot as well?

Many thanks in anticipation.

Sun