Please keep messages on the list so others can pitch in. _Which_ words do you want to consider identical for the purpose of frequency count? _What_ do you want to plot?
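If the goal is a count for each distinct word form beginning with "whal", something along these lines might be a starting point. It is only a sketch: the toy vector below is a stand-in for the real moby.word.v, which is assumed here to be a lower-cased character vector with one word per element, and whale.hits.v and whale.freqs.t are hypothetical names. Note that grep() without value = TRUE returns the positions of the matches rather than the words themselves, which is probably why the result is hard to display.

```r
# Toy stand-in for moby.word.v (assumption: lower-cased, one word per element)
moby.word.v <- c("the", "whale", "whales", "whaling", "whale",
                 "ship", "whaler", "whale's", "whales")

# value = TRUE returns the matching words, not their index positions
whale.hits.v <- grep("^whal", moby.word.v, value = TRUE)

# Tabulate each distinct word form and sort, most frequent first
whale.freqs.t <- sort(table(whale.hits.v), decreasing = TRUE)

print(whale.freqs.t)
```

With the real word vector, barplot(whale.freqs.t, las = 2) should then draw one bar per word form. Whether forms such as "whale" and "whales" ought to be merged first (for example by stemming) depends on the answer to the question above.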
B.

> On Aug 3, 2017, at 4:36 PM, Riaan Van Der Walt <riaan.vanderw...@nwu.ac.za> wrote:
>
> Hello Boris,
> I've loaded the Rstem, Snowball.
> But I am clueless how to get a list, e.g. whal* (whale, whales, whaling,
> whaler, whalers, whaleman, whalemen, whale-ship, whale-boat, whale's),
> in the book Moby Dick and the frequency of each of the different words.
> I am using this script:
>
> whales1.v <- grep("^whal.*", moby.word.v)
> whales1.v
>
> The total occurrence for whal* is 1699.
> But I can't display it or plot it.
>
> I am new to R and the learning curve is steep!!
>
> Thx!
> Riaan
>
>
> Riaan van der Walt
> Tel / Phone / Mogala : 27+72+2172429
> Email / Epos / Emeile: riaan.vanderw...@nwu.ac.za
> Url: http://www.nwu.ac.za/
>
> >>> Boris Steipe <boris.ste...@utoronto.ca> 31 Jul 2017 23:37 >>>
> You need a stemming algorithm. See here:
> https://cran.r-project.org/web/views/NaturalLanguageProcessing.html
>
> Myself, I've had good experience with Rstem.
>
> B.
>
> > On Jul 31, 2017, at 4:47 PM, Riaan Van Der Walt
> > <riaan.vanderw...@nwu.ac.za> wrote:
> >
> > I am new to R.
> > Busy with Text Analysis.
> >
> > Need a script to find e.g.
> >
> > whale, whales, whale's, whaler, whalers, whaling,... in Moby Dick
> >
> > Riaan
> >
> > ______________________________________________
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.