Re: [R] How to understand the mentality behind tidyverse and ggplot2?
Personally I liked two workshops Thomas Lin Pedersen gave:

https://www.youtube.com/watch?v=h29g21z0a68
https://www.youtube.com/watch?v=0m4yywqNPVY&t=5219s

-Roy

> On Nov 18, 2020, at 3:24 PM, John via R-help wrote:
>
> Learning the syntax is a chore, but the output tends to be fine,
> especially for publications and final graphics.

**
"The contents of this message do not reflect any position of the U.S. Government or NOAA."
**
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
***Note new street address***
110 McAllister Way
Santa Cruz, CA 95060
Phone: (831)-420-3666
Fax: (831) 420-3980
e-mail: roy.mendelss...@noaa.gov
www: https://www.pfeg.noaa.gov/

"Old age and treachery will overcome youth and skill."
"From those who have been given much, much will be expected"
"the arc of the moral universe is long, but it bends toward justice" -MLK Jr.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to understand the mentality behind tidyverse and ggplot2?
On Tue, 17 Nov 2020 12:43:21 -0500 C W wrote:

> With ggplot2, I just don't understand how it is organized. Take this
> code:
>
>   ggplot(diamonds, aes(x=carat, y=price)) +
>     geom_point(aes(color=cut)) +
>     geom_smooth()
>
> There are three plus signs. How do you know when to "add" and what to
> "add"? I've seen more plus signs.
>
> So, how does ggplot2 work? Could someone explain this for an
> old-school R user?

A really short form is to consider that ggplot2 syntax defines an object, and each addition simply adds to it, which is what all the plus signs are. Ideally, you can start a ggplot call with a designation of a target.

Instead of:

    ggplot(diamonds, aes(x=carat, y=price)) + ...

use something like:

    fig1 <- ggplot(diamonds, aes(x=carat, y=price)) + ...

This creates a plot object that can then be further modified. Learning the syntax is a chore, but the output tends to be fine, especially for publications and final graphics. On the other hand, it's slower and fussier than some of the more traditional approaches, which are what I would prefer for EDA.

JWDougherty
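The "object plus additions" idea in the message above can be sketched in a few lines. This is only an illustration, assuming ggplot2 is installed (the diamonds data ships with it); the names fig1, fig2 and fig3 and the alpha value are made up for the example. It also touches the aes() question from the original post: inside aes() a visual property is mapped to a data column, outside aes() it is set to a constant.

```r
# A sketch only; assumes the ggplot2 package is installed.
library(ggplot2)

# Start with the data and the default mappings. No layers exist yet.
fig1 <- ggplot(diamonds, aes(x = carat, y = price))

# Each "+" appends one component to the plot object.
# Inside aes(): color is mapped to the data column `cut`.
# Outside aes(): alpha is set to one constant for every point.
fig2 <- fig1 + geom_point(aes(color = cut), alpha = 0.3)

# A second layer; it inherits x and y from the ggplot() call.
fig3 <- fig2 + geom_smooth()

length(fig1$layers)  # 0: nothing to draw yet
length(fig3$layers)  # 2: the points, then the smoother
```

Nothing is drawn until the object is printed, for example by typing fig3 at the console, so intermediate objects like fig2 can be kept and extended in different directions.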
Re: [R] Tutorial/vignette on modified Kneser Ney smoothing
Hi Gayathri,

Maybe the cmscu package?

https://github.com/jasonkdavis/r-cmscu

Jim

On Thu, Nov 19, 2020 at 6:30 AM Gayathri Nagarajan <gayathri.nagara...@gmail.com> wrote:

> Can someone point me to a vignette or tutorial that will help me learn this?
Re: [R] How to understand the mentality behind tidyverse and ggplot2?
On 17/11/2020 12:43 p.m., C W wrote:
> Dear R list,
>
> I am an old-school R user. I use apply(), with(), and which() in base
> package instead of filter(), select(), separate() in Tidyverse. The
> idea of pipeline (i.e. %>%) my code was foreign to me for a while. It
> makes the code shorter, but sometimes less readable?

Think of the pipe as pure syntactic sugar. It doesn't really do anything; it just lets you write "f(g(x))" as "x %>% g() %>% f()" (where the parens "()" are optional). Read it as "Take x and pass it to g(); take the result and pass it to f()", which is exactly how you'd read "f(g(x))". The pipe presents it in the same order as in English, which sometimes makes it a bit easier to read than the mathematical notation.

There's a lot more to tidyverse ideas besides the pipe. The overview is in the "Tidyverse Manifesto" (a vignette in the tidyverse package), and details are in Grolemund and Wickham's book "R for Data Science".

> With ggplot2, I just don't understand how it is organized. Take this
> code:

ggplot2 is much harder to understand, but Wickham's book "ggplot2: Elegant Graphics for Data Analysis" gives a really readable yet thorough description.

>   ggplot(diamonds, aes(x=carat, y=price)) + geom_point(aes(color=cut)) +
>     geom_smooth()
>
> There are three plus signs. How do you know when to "add" and what to
> "add"? I've seen more plus signs.
>
> To me, aes() stands for aesthetic, meaning looks. So, anything
> related to looks like points and smooth should be in aes().
> Apparently, it's not the case.

Yes, "aesthetic" was a really bad choice of word.

> So, how does ggplot2 work? Could someone explain this for an
> old-school R user?

Not in one email, but hopefully the references (which are both available online for free, or in a bookstore at some cost) can help.

Duncan Murdoch
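The "syntactic sugar" point above can be checked directly. The sketch below is an illustration, not part of the original reply: it assumes R >= 4.1 and uses the native pipe |> so that no package is needed. With library(magrittr) or dplyr loaded, %>% can be put in the same positions. The vector x and the names nested and piped are made up for the example.

```r
# Assumes R >= 4.1 (native pipe). x, nested and piped are illustrative names.
x <- c(1, 4, 9, 16)

# Nested form: read from the inside out.
nested <- round(mean(sqrt(x)), 1)

# Piped form: read left to right, "take x, then sqrt, then mean, then round".
piped <- x |> sqrt() |> mean() |> round(1)

identical(nested, piped)  # TRUE: both are the same call, written differently
```

One difference worth knowing: the native pipe always needs the parentheses (x |> sqrt()), whereas %>% also accepts a bare function name.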
Re: [R] How to understand the mentality behind tidyverse and ggplot2?
I'd recommend two places to get started:

* https://r4ds.had.co.nz/data-visualisation.html for a quick intro to ggplot2 (and the rest of the book explains the general tidyverse philosophy)
* https://ggplot2-book.org for the full details of ggplot2.

Hadley

On Wed, Nov 18, 2020 at 11:37 AM C W wrote:
> So, how does ggplot2 work? Could someone explain this for an old-school R
> user?

--
http://hadley.nz
Re: [R] How to understand the mentality behind tidyverse and ggplot2?
I should have said: Have you worked through the vignettes and examples?

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Wed, Nov 18, 2020 at 9:37 AM C W wrote:
> So, how does ggplot2 work? Could someone explain this for an old-school R
> user?
Re: [R] Tutorial/vignette on modified Kneser Ney smoothing
Wrong list! Google "kneser Ney smoothing algorithm" for possibilities.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Wed, Nov 18, 2020 at 11:30 AM Gayathri Nagarajan <gayathri.nagara...@gmail.com> wrote:
> Can someone point me to a vignette or tutorial that will help me learn this?
[R] Tutorial/vignette on modified Kneser Ney smoothing
Hi Team,

I am a new learner trying to build n-gram models from a text corpus, and I am trying to understand the modified Kneser-Ney smoothing algorithm so that I can code and build my word prediction model.

Can someone point me to a vignette or tutorial that will help me learn this?

Thanks in advance for your help.

Regards
Gayathri
Re: [R] How to understand the mentality behind tidyverse and ggplot2?
RTFM, perhaps? Or even worse, buy his book?

el

--
Sent from Dr Lisse’s iPad Mini 5

On 18 Nov 2020, 20:39 +0200, Ben Tupper wrote:
> One resource I found most helpful when I started is
> https://evamaerey.github.io/ggplot_flipbook/ggplot_flipbook_xaringan.html#1.
> This is a terrific resource for getting the feel of layering-up.
Re: [R] How to understand the mentality behind tidyverse and ggplot2?
Hi,

I feel your pain. As you have likely discovered yourself, there are just about 10^14 tutorials/posts/tips out there on ggplot2. See https://rseek.org/?q=+ggplot2+tutorial for example. Yikes!

One resource I found most helpful when I started is https://evamaerey.github.io/ggplot_flipbook/ggplot_flipbook_xaringan.html#1. This is a terrific resource for getting the feel of layering-up.

Hope you find it helpful.

Cheers,
Ben

On Wed, Nov 18, 2020 at 12:37 PM C W wrote:
> So, how does ggplot2 work? Could someone explain this for an old-school R
> user?

--
Ben Tupper
Bigelow Laboratory for Ocean Science
East Boothbay, Maine
http://www.bigelow.org/
https://eco.bigelow.org
Re: [R] How to understand the mentality behind tidyverse and ggplot2?
This is not the place for tutorials (although I recognize that many responses and discussions do intersect tutoriality).

If you do a web search on ggplot tutorials you will find many good ones. Or go to the RStudio website, which links to resources, including Hadley Wickham's book, which is probably the most authoritative.

Incidentally, ggplot is based on Leland Wilkinson's book "The Grammar of Graphics", which provided the blueprint for Wickham's software (his PhD project at Iowa State, I believe).

Cheers,
Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Wed, Nov 18, 2020 at 9:37 AM C W wrote:
> So, how does ggplot2 work? Could someone explain this for an old-school R
> user?
[R] How to understand the mentality behind tidyverse and ggplot2?
Dear R list,

I am an old-school R user. I use apply(), with(), and which() in base package instead of filter(), select(), separate() in Tidyverse. The idea of pipeline (i.e. %>%) my code was foreign to me for a while. It makes the code shorter, but sometimes less readable?

With ggplot2, I just don't understand how it is organized. Take this code:

    ggplot(diamonds, aes(x=carat, y=price)) +
      geom_point(aes(color=cut)) +
      geom_smooth()

There are three plus signs. How do you know when to "add" and what to "add"? I've seen more plus signs.

To me, aes() stands for aesthetic, meaning looks. So, anything related to looks like points and smooth should be in aes(). Apparently, it's not the case.

So, how does ggplot2 work? Could someone explain this for an old-school R user?

Thank you!
Re: [R] - Trying to replicate VLOOKUP in R - help needed
I will do that... Thanks again Jeff.

r/
Gregg Powell

‐‐‐ Original Message ‐‐‐
On Wednesday, November 18, 2020 8:36 AM, Jeff Newmiller wrote:

> Instead, learn how to use the merge function, or perhaps the dplyr::left_join
> function. VLOOKUP is really not necessary.
Re: [R] - Trying to replicate VLOOKUP in R - help needed
Instead, learn how to use the merge function, or perhaps the dplyr::left_join function. VLOOKUP is really not necessary.

On November 18, 2020 7:11:49 AM PST, Gregg via R-help wrote:
> Since I have to do this type of thing often, and since there is no
> existing package/function (yet) that makes this easy, if ever I get to
> the point where I develop enough skill to build and submit a new
> package, a simple little VLOOKUP(like) function contained in a package
> would be of great use.
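A minimal sketch of the merge() suggestion follows. It is an illustration under stated assumptions, not the poster's actual data: the two small data frames (installations, lookup) and the extra "FORT BLISS" row without a match are made up, and the lookup table is assumed to be keyed on the full NAME value.

```r
# Hypothetical data modelled on the question; both tables are assumptions.
installations <- data.frame(
  NAME = c("ABERDEEN PROVING GROUND", "ADELPHI LABORATORY CENTER", "FORT BLISS"),
  TOTALAUTH = 1,
  stringsAsFactors = FALSE
)

# The lookup table plays the role of the VLOOKUP range.
lookup <- data.frame(
  NAME = c("ABERDEEN PROVING GROUND", "ADELPHI LABORATORY CENTER"),
  ASSIGNED_COMPANY = c("NEC Aberdeen", "NEC Adelphi"),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every row of installations (a left join).
# Rows with no match in lookup get NA, much like VLOOKUP's #N/A.
result <- merge(installations, lookup, by = "NAME", all.x = TRUE)
result
```

With dplyr loaded, dplyr::left_join(installations, lookup, by = "NAME") should give the same columns while keeping the original row order, whereas merge() sorts by the key by default.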
Re: [R] - Trying to replicate VLOOKUP in R - help needed
Thanks Andrew and Mitch for your help. With your assistance, I was able to sort this out.

Since I have to do this type of thing often, and since there is no existing package/function (yet) that makes this easy, if I ever get to the point where I develop enough skill to build and submit a new package, a simple little VLOOKUP-like function contained in a package would be of great use.

r/
Gregg

‐‐‐ Original Message ‐‐‐
On Monday, November 16, 2020 1:56 PM, Gregg via R-help wrote:

> PROBLEM: I am trying to replicate something like a VLOOKUP in R but am having
> no success - need a bit of help.
>
> GIVEN DATA SET (data.table): (looks something like this, but much bigger)
>
> NAME                        TOTALAUTH  ASSIGNED_COMPANY
> ABERDEEN PROVING GROUND     1          NA
> ADELPHI LABORATORY CENTER   1          NA
> CARLISLE BARRACKS           1          NA
> DETROIT ARSENAL             1          NA
> DUGWAY PROVING GROUND       1          NA
> FORT A P HILL               1          NA
> FORT BELVOIR                1          NA
> FORT BENNING                1          NA
> FORT BLISS                  1          NA
> FORT BRAGG                  1          NA
> FORT BUCHANAN               1          NA
>
> I am trying to update the values in the ASSIGNED_COMPANY column from NAs to a
> value that matches based on the "key" word like below.
>
> NAME                        TOTALAUTH  ASSIGNED_COMPANY
> ABERDEEN PROVING GROUND     1          NEC Aberdeen
> ADELPHI LABORATORY CENTER   1          NEC Adelphi
> CARLISLE BARRACKS           1          NEC Carlisle
> DETROIT ARSENAL             1          NEC Detroit
> DUGWAY PROVING GROUND       1          NEC Dugway
> FORT A P HILL               1          NEC AP Hill
> FORT BELVOIR                1          NEC Belvoir
> FORT BENNING                1          NEC Benning
> FORT BLISS                  1          NEC Bliss
> FORT BRAGG                  1          NEC Bragg
> FORT BUCHANAN               1          NEC Buchanan
>
> In a nutshell, for instance...
>
> I want to search for the keyword "ABERDEEN" in the NAME column, and for every
> row where it exists, I want to update the NA in the ASSIGNED_COMPANY column
> to "NEC Aberdeen"
>
> I want to search for the keyword "ADELPHI" in the NAME column, and for every
> row where it exists, I want to update the NA in the ASSIGNED_COMPANY column
> to "NEC Adelphi"
>
> ... and so on for every value in the NAME column - so in the end I have
> matching names in the ASSIGNED_COMPANY column.
>
> I can't use an if statement because it is not vectorized.
>
> If I use an ifelse statement, the "else" overwrites any earlier changes with ""
>
> Something so simple should not be difficult.
>
> Some of the methods I attempted to use are below along with the errors I
> get...
>
> ###CODE###
>
> library(data.table)
> library(dplyr)
> library(stringr)
>
> VLOOKUP_inR <- data.table::fread("DATASET_TESTINGONLY.csv")
>
> #METHOD 1 FAILS
> VLOOKUP_inR %>% dplyr::rename_if(grepl("ADELPHI", VLOOKUP_inR$NAME, useBytes = TRUE), "NEC Adelphi")
>
> Error in get(.x, .env, mode = "function") :
>   object 'NEC Adelphi' of mode 'function' was not found
>
> #METHOD 2 FAILS
> if(stringr::str_detect(VLOOKUP_inR$NAME, "ADELPHI")) {
>   VLOOKUP_inR$ASSIGNED_COMPANY == "NEC Adelphi"
> }
>
> Warning message:
> In if (stringr::str_detect(VLOOKUP_inR$NAME, "ADELPHI")) { :
>   the condition has length > 1 and only the first element will be used
>
> #METHOD 3 FAILS
> ifelse(stringr::str_detect(ASIP_combined_location_tally$NAME, "ADELPHI"),
>   ASIP_combined_location_tally$ASSIGNED_COMPANY == ASIP_combined_location_tally$ASSIGNED_COMPANY)
>
> Error in ifelse(stringr::str_detect(ASIP_combined_location_tally$NAME, :
>   argument "no" is missing, with no default
>
> #METHOD 4 FAILS
> VLOOKUP_inR_matching <- VLOOKUP_inR %>% mutate(ASSIGNED_COMPANY = ifelse(grepl(pattern = 'ABERDEEN', x = NAME), 'NEC Aberdeen', ''))
> VLOOKUP_inR_matching <- VLOOKUP_inR_matching %>% mutate(ASSIGNED_COMPANY = ifelse(grepl(pattern = 'ADELPHI', x = NAME), 'NEC Adelphi', ''))
> VLOOKUP_inR_matching <- VLOOKUP_inR_matching %>% mutate(ASSIGNED_COMPANY = ifelse(grepl(pattern = 'CARLISLE', x = NAME), 'NEC Carlisle Barracks', ''))
> VLOOKUP_inR_matching <- VLOOKUP_inR_matching %>% mutate(ASSIGNED_COMPANY = ifelse(grepl(pattern = 'DETROIT', x = NAME), 'NEC Detroit Arsenal', ''))
> VLOOKUP_inR_matching <- VLOOKUP_inR_matching %>% mutate(ASSIGNED_COMPANY = ifelse(grepl(pattern = 'BELVOIR', x = NAME), 'NEC Fort Belvoir', ''))
>
> ---the 4th method just overwrites all previous changes back to ""
>
> ##
>
> Any help offered would be so very greatly appreciated.
>
> Thank you.
>
> r/
> gregg powell
> AZ
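For readers landing on this thread: the suggestions from Andrew and Mitch are not reproduced here, so the following is only a minimal base-R sketch of one common way to get VLOOKUP-like behaviour, not necessarily what they proposed. The toy data frame `dat` and the named vector `lookup` are assumptions made for illustration; only the column names and the keyword/company pairs come from the original post. The idea is to assign into the matching rows only, so that later keywords never touch rows filled by earlier ones.

```r
# Toy data standing in for DATASET_TESTINGONLY.csv (assumed layout).
dat <- data.frame(
  NAME = c("ABERDEEN PROVING GROUND", "ADELPHI LABORATORY CENTER",
           "CARLISLE BARRACKS", "FORT BELVOIR", "FORT BLISS"),
  TOTALAUTH = 1,
  ASSIGNED_COMPANY = NA_character_,
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: keyword to search for -> value to assign.
lookup <- c(ABERDEEN = "NEC Aberdeen",
            ADELPHI  = "NEC Adelphi",
            CARLISLE = "NEC Carlisle Barracks",
            BELVOIR  = "NEC Fort Belvoir")

# For each keyword, find the rows whose NAME contains it and update
# only those rows; all other rows keep whatever value they already have.
for (key in names(lookup)) {
  hit <- grepl(key, dat$NAME, fixed = TRUE)
  dat$ASSIGNED_COMPANY[hit] <- lookup[[key]]
}

print(dat)
```

Rows with no matching keyword (here FORT BLISS) simply stay NA. Method 4 in the quoted post can also be made to work along the same lines by passing the existing ASSIGNED_COMPANY column, rather than '', as the "no" argument of ifelse().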
Re: [R] analyzing results from Tuesday's US elections
Maybe this could be interesting to verify against found anomalies?

"A second memory card with uncounted votes was found during an audit in Fayette County, Georgia, containing 2,755 votes"

https://www.zerohedge.com/political/second-memory-card-2755-votes-found-during-georgia-election-audit-decreasing-biden-lead
Re: [R] counting duplicate items that occur in multiple groups
Thanks, everyone!

Quoting Jim Lemon:

Oops, I sent this to Tom earlier today and forgot to copy to the list:

VendorID=rep(paste0("V",1:10),each=5)
AcctID=paste0("A",sample(1:5,50,TRUE))
Data<-data.frame(VendorID,AcctID)
table(Data)
# get multiple vendors for each account
dupAcctID<-colSums(table(Data)>0)
Data$dupAcct<-NA
# fill in the new column
for(i in 1:length(dupAcctID))
 Data$dupAcct[Data$AcctID == names(dupAcctID[i])]<-dupAcctID[i]

Jim

On Wed, Nov 18, 2020 at 8:20 AM Tom Woolman wrote:

Hi everyone. I have a dataframe that is a collection of Vendor IDs plus a bank account number for each vendor. I'm trying to find a way to count the number of duplicate bank accounts that occur in more than one unique Vendor_ID, and then assign the count value for each row in the dataframe in a new variable.

I can do a count of bank accounts that occur within the same vendor using dplyr and group_by and count, but I can't figure out a way to count duplicates among multiple Vendor_IDs.

Dataframe example code:

#Create a sample data frame:
set.seed(1)
Data <- data.frame(Vendor_ID = sample(1:1), Bank_Account_ID = sample(1:1))

Thanks in advance for any help.
Re: [R] counting duplicate items that occur in multiple groups
Oops, I sent this to Tom earlier today and forgot to copy to the list:

VendorID=rep(paste0("V",1:10),each=5)
AcctID=paste0("A",sample(1:5,50,TRUE))
Data<-data.frame(VendorID,AcctID)
table(Data)
# get multiple vendors for each account
dupAcctID<-colSums(table(Data)>0)
Data$dupAcct<-NA
# fill in the new column
for(i in 1:length(dupAcctID))
 Data$dupAcct[Data$AcctID == names(dupAcctID[i])]<-dupAcctID[i]

Jim

On Wed, Nov 18, 2020 at 8:20 AM Tom Woolman wrote:

> Hi everyone. I have a dataframe that is a collection of Vendor IDs
> plus a bank account number for each vendor. I'm trying to find a way
> to count the number of duplicate bank accounts that occur in more than
> one unique Vendor_ID, and then assign the count value for each row in
> the dataframe in a new variable.
>
> I can do a count of bank accounts that occur within the same vendor
> using dplyr and group_by and count, but I can't figure out a way to
> count duplicates among multiple Vendor_IDs.
>
> Dataframe example code:
>
> #Create a sample data frame:
> set.seed(1)
> Data <- data.frame(Vendor_ID = sample(1:1), Bank_Account_ID = sample(1:1))
>
> Thanks in advance for any help.
Re: [R] counting duplicate items that occur in multiple groups
On Wed, Nov 18, 2020 at 5:40 AM Bert Gunter wrote:
>
> z <- with(Data2, tapply(Vendor,Account, I))
> n <- vapply(z,length,1)
> data.frame (Vendor = unlist(z),
>    Account = rep(names(z),n),
>    NumVen = rep(n,n)
> )
>
> ## which gives:
>
>     Vendor Account NumVen
> A1      V1      A1      1
> A21     V2      A2      3
> A22     V3      A2      3
> A23     V1      A2      3
> A3      V4      A3      1
> A4      V2      A4      1
>
> Of course this also works for Data1
>
> Bill may be able to come up with a slicker version, however.

Perhaps

transform(Data2, nshare = as.vector(table(Account)[Account]))

(or dplyr::mutate() instead of transform(), if you prefer.)

-Deepayan

>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Tue, Nov 17, 2020 at 3:34 PM Tom Woolman wrote:
>
> > Yes, good catch. Thanks
> >
> >
> > Quoting Bert Gunter :
> >
> > > Why 0's in the data frame? Shouldn't that be 1 (vendor with that account)?
> > >
> > > Bert
> > > Bert Gunter
> > >
> > > "The trouble with having an open mind is that people keep coming along and
> > > sticking things into it."
> > > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> > >
> > >
> > > On Tue, Nov 17, 2020 at 3:29 PM Tom Woolman wrote:
> > >
> > >> Hi Bill. Sorry to be so obtuse with the example data, I was trying
> > >> (too hard) not to share any actual values so I just created randomized
> > >> values for my example; of course I should have specified that the
> > >> random values would not provide the expected problem pattern. I should
> > >> have just used simple dummy codes as Bill Dunlap did.
> > >>
> > >> So per Bill's example data for Data1, the expected (hoped for) output
> > >> should be:
> > >>
> > >>   Vendor Account Num_Vendors_Sharing_Bank_Acct
> > >> 1     V1      A1                             0
> > >> 2     V2      A2                             3
> > >> 3     V3      A2                             3
> > >> 4     V4      A2                             3
> > >>
> > >>
> > >> Where the new calculated variable is Num_Vendors_Sharing_Bank_Acct.
> > >> The value is 3 for V2, V3 and V4 because they each share bank account
> > >> A2.
> > >>
> > >>
> > >> Likewise, in the Data2 frame, the same logic applies:
> > >>
> > >>   Vendor Account Num_Vendors_Sharing_Bank_Acct
> > >> 1     V1      A1                             0
> > >> 2     V2      A2                             3
> > >> 3     V3      A2                             3
> > >> 4     V1      A2                             3
> > >> 5     V4      A3                             0
> > >> 6     V2      A4                             0
> > >>
> > >>
> > >> Thanks!
> > >>
> > >>
> > >> Quoting Bill Dunlap :
> > >>
> > >> > What should the result be for
> > >> >    Data1 <- data.frame(Vendor=c("V1","V2","V3","V4"),
> > >> >       Account=c("A1","A2","A2","A2"))
> > >> > ?
> > >> >
> > >> > Must each vendor have only one account? If not, what should the result be
> > >> > for
> > >> >    Data2 <- data.frame(Vendor=c("V1","V2","V3","V1","V4","V2"),
> > >> >       Account=c("A1","A2","A2","A2","A3","A4"))
> > >> > ?
> > >> >
> > >> > -Bill
> > >> >
> > >> > On Tue, Nov 17, 2020 at 1:20 PM Tom Woolman wrote:
> > >> >
> > >> >> Hi everyone. I have a dataframe that is a collection of Vendor IDs
> > >> >> plus a bank account number for each vendor. I'm trying to find a way
> > >> >> to count the number of duplicate bank accounts that occur in more than
> > >> >> one unique Vendor_ID, and then assign the count value for each row in
> > >> >> the dataframe in a new variable.
> > >> >>
> > >> >> I can do a count of bank accounts that occur within the same vendor
> > >> >> using dplyr and group_by and count, but I can't figure out a way to
> > >> >> count duplicates among multiple Vendor_IDs.
> > >> >>
> > >> >>
> > >> >> Dataframe example code:
> > >> >>
> > >> >>
> > >> >> #Create a sample data frame:
> > >> >>
> > >> >> set.seed(1)
> > >> >>
> > >> >> Data <- data.frame(Vendor_ID = sample(1:1), Bank_Account_ID = sample(1:1))
> > >> >>
> > >> >>
> > >> >> Thanks in advance for any help.
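As a rough, self-contained check of the one-liner suggested above, here is a sketch that runs it on Bill Dunlap's Data2. Note that table(Account) counts rows per account, which equals the number of vendors only when no vendor/account pair is repeated; the second column, `n_vendors`, is a hypothetical addition (not from the thread) that counts distinct vendors per account instead.

```r
# Bill Dunlap's second example data set, as quoted in the thread.
Data2 <- data.frame(Vendor  = c("V1", "V2", "V3", "V1", "V4", "V2"),
                    Account = c("A1", "A2", "A2", "A2", "A3", "A4"),
                    stringsAsFactors = FALSE)

# The one-liner from the reply: rows per account, looked up by account name.
res <- transform(Data2, nshare = as.vector(table(Account)[Account]))

# Hypothetical variant: number of *distinct* vendors per account,
# which stays correct even if a vendor/account pair appears twice.
per_account <- tapply(Data2$Vendor, Data2$Account,
                      function(v) length(unique(v)))
res$n_vendors <- as.vector(per_account[Data2$Account])

print(res)
```

On this data the two columns agree (3 for every A2 row, 1 elsewhere), matching the NumVen column in Bert's output.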