[R] A simple string alienation question.
Hello dear R community,

I would like to ask for a little help, please. I have a dataset in a CSV file, which I have read into R as follows:

  V1
1 tropical fruit"
2 whole milk"
3 pip fruit"
4 other vegetables"
5 whole milk"
6 rolls/buns"

The issue is that the data in the CSV file also contain the quotation mark ". I can't get rid of the quotation marks, and I want to do it in R. The quotes only appear at the end of each string. The dataset has many rows; this is just a sample.

My intention is to remove the quotes and then split the strings at the '/', i.e. rolls/buns should become rolls in one column and buns in another. I know I am missing something very simple; could you please show me how to do this?

I read the data in with a simple read.csv statement:

calc <- read.csv("Fight.csv", stringsAsFactors = FALSE, header = FALSE)
str(calc)

Output:

> str(calc)
'data.frame': 38765 obs. of 1 variable:
 $ V1: chr "tropical fruit\"" "whole milk\"" "pip fruit\"" "other vegetables\"" ...

Many thanks in advance for your help.

Kind regards,
Sam.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
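One possible way to do what the post asks, offered only as a sketch: strip the trailing quote with sub() and then split on '/' with strsplit(). The three-row data frame below is a made-up stand-in for the real Fight.csv, and the column names item1 and item2 are hypothetical; the sketch also assumes at most one '/' per string.

```r
# Hypothetical sample mirroring the str() output in the post
calc <- data.frame(
  V1 = c('tropical fruit"', 'whole milk"', 'rolls/buns"'),
  stringsAsFactors = FALSE
)

# Remove a double quote only when it is the last character of the string
calc$V1 <- sub('"$', "", calc$V1)

# Split on a literal "/" (fixed = TRUE turns off regex interpretation)
parts <- strsplit(calc$V1, "/", fixed = TRUE)

# First piece always exists; second piece is NA when there was no "/"
calc$item1 <- vapply(parts, function(p) p[1], character(1))
calc$item2 <- vapply(
  parts,
  function(p) if (length(p) >= 2) p[2] else NA_character_,
  character(1)
)

print(calc)
```

Rows without a '/' keep their full text in item1 and get NA in item2, so no rows are lost.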
[R] Data Frame Organization
There was some issue with plain text vs. HTML, so please find the answer again below. If it is illegible, kindly see the attached picture.

Best wishes,
s.

x <- c('A', 'B', 'C', 'A', 'B', 'C')
y <- c(10, 5, 9, 5, 15, 20)
df <- data.frame(x, y)
df
f <- reshape(df, v.names = "y", idvar = "x", timevar = "y", direction = "wide")

RESULT:

> f
  x y.10 y.5 y.9 y.15 y.20
1 A   10   5  NA   NA   NA
2 B   NA   5  NA   15   NA
3 C   NA  NA   9   NA   20
Re: [R] Data frame organization
Dear Arnaud,

I just played around with your data a bit and found this to be useful. But kindly note that I am no expert like the other people in the group; my knowledge of R is limited, and my answer is offered purely to help. I used the reshape function and arrived at something. I am sure others will arrive at a better and crisper answer than I have. Again, please note: I am only a novice.

x <- c('A', 'B', 'C', 'A', 'B', 'C')
y <- c(10, 5, 9, 5, 15, 20)
df <- data.frame(x, y)
df
f <- reshape(df, v.names = "y", idvar = "x", timevar = "y", direction = "wide")

RESULT:

> f
  x y.10 y.5 y.9 y.15 y.20
1 A   10   5  NA   NA   NA
2 B   NA   5  NA   15   NA
3 C   NA  NA   9   NA   20

I hope this is of some use.

Kind regards,
s.

On Monday, 26 August 2019, 11:37:13 pm GMT+5:30, Arnaud Mosnier wrote:

Hi,

I have a really simple question. I need to convert a data.frame with the following format

A 10
B 5
C 9
A 5
B 15
C 20

in this format

A 10 5
B 5 15
C 9 20

Thanks !!!
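For comparison, here is a sketch of a variant that gives one row per letter with the values side by side, as in the requested layout. Instead of using y itself as the time variable, it numbers the occurrences of each letter and reshapes on that counter. The helper column occ is a hypothetical name introduced here, not something from the thread.

```r
x <- c('A', 'B', 'C', 'A', 'B', 'C')
y <- c(10, 5, 9, 5, 15, 20)
df <- data.frame(x, y, stringsAsFactors = FALSE)

# Number the occurrences within each group of x: 1 for the first A, 2 for the second, ...
df$occ <- ave(seq_along(df$x), df$x, FUN = seq_along)

# Reshape to wide, using the occurrence counter (not y) as the time variable
wide <- reshape(df, v.names = "y", idvar = "x", timevar = "occ", direction = "wide")

print(wide)
```

The resulting columns are named x, y.1 and y.2; they can be renamed afterwards if plain headings are preferred.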
Re: [R] Help with numeric and character separation
Hi Bert and Jeff,

Thanks a lot for your responses. I have been able to solve the problem with the pointers from the people who responded. My original dataset was in the form of a vector, and my goal was to convert it into a data.frame, which I have been able to do. Thanks again for all your help. 'regex' is still something I am digging into...

sam.

On Sunday, 16 June 2019, 10:28:17 pm GMT+5:30, Bert Gunter wrote:

1. Post in **plain text** not html, please.
2. This looks like a vector, not a data frame.
3. Use ?strsplit:

> ex <- "a.1"
> strsplit(ex, "\\.")
[[1]]
[1] "a" "1"

See ?regex for info on regular expressions in R.

If you are confused about the data structures of input or output, time to spend time with basic R tutorials.

-- Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Sun, Jun 16, 2019 at 8:47 AM Sam Charya via R-help wrote:

Dear All,

I need help with splitting a string. My data frame is in the following format:

   V1
1  DD Pack0.00
2  FTA English News0.00
3  FTA Complimentary0.00
4  WB1.18
5  WION1.18
6  Al Jazeera0.00
7  Animal Planet2.36
8  Asianet Movies17.70
9  Calcutta News0.00
10 Comedy Central5.90

I read the file from a CSV and set header = FALSE, hence the name V1. The data consist of names of TV channels and their prices. For example, in row 1 the name of the channel is 'DD Pack' and the price is 0.00. Similarly, in row 5 the channel is 'WION' and the price is 1.18, and in row 8 the channel is 'Asianet Movies' and the price is 17.70.

My question is: how would I separate the data into two columns, one for the channel name and one for the price? For example, the heading for column 1 should be 'Channel Name' and for column 2 'Price'. The data under 'Channel Name' should be 'DD Pack' and under 'Price' should be 0.00, and so on.

The letters and the numbers appear together, so I am not able to use a separator, and I have not been able to figure this out. Kindly help me with this.

Many thanks in advance for your help. This is my first question ever to the community, so apologies if I have sent it to the wrong group; kindly redirect me if that is the case.

sam.
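As a small illustration of the strsplit() hint above: the split argument is treated as a regular expression by default, which is why the dot is escaped. A sketch of the two equivalent ways to split on a literal dot, with the variable names by_regex and by_fixed made up for this example:

```r
ex <- "a.1"

# Escape the dot so it is a literal character rather than the regex wildcard
by_regex <- strsplit(ex, "\\.")

# Alternatively, fixed = TRUE disables regex interpretation of the split string
by_fixed <- strsplit(ex, ".", fixed = TRUE)

# strsplit() always returns a list, one element per input string
print(by_regex[[1]])
print(by_fixed[[1]])
```

Note that this only helps when an actual separator character exists; the channel data in the question has none between name and price, so a capturing regex is one of the alternatives.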
Re: [R] Help with a third ggplot error
Thanks a lot Boris,

I tried out your worked solution and it works perfectly. No doubt I need a lot of practice with regex and the pattern you gave; I will work on that now. Thanks a lot for pointing me in the right direction. I appreciate it a lot.

Sam.

On Monday, 17 June 2019, 2:39:14 pm GMT+5:30, Boris Steipe wrote:

(Technically you are now thread-hijacking. But here goes:)

mydf <- data.frame(V11 = c("DD Pack0.002",
                           "FTA English News0.003",
                           "FTA Complimentary0.004"),
                   stringsAsFactors = FALSE)

# regex matching start-of-string(letters or blanks)(numbers, a decimal
# point, more numbers)end-of-string: "^([a-zA-Z ]+)([0-9]+\\.[0-9]+)$"

# first check that all elements are matched by the regex. If not, an assumption
# of how the strings are patterned is not true ...
all(grepl("^([a-zA-Z ]+)([0-9]+\\.[0-9]+)$", mydf$V11)) # must be true!

mydf$`Channel name` <- gsub("^([a-zA-Z ]+)([0-9]+\\.[0-9]+)$", "\\1", mydf$V11)
mydf$Price <- gsub("^([a-zA-Z ]+)([0-9]+\\.[0-9]+)$", "\\2", mydf$V11)

mydf
#                      V11      Channel name Price
# 1           DD Pack0.002           DD Pack 0.002
# 2  FTA English News0.003  FTA English News 0.003
# 3 FTA Complimentary0.004 FTA Complimentary 0.004

Note this _will_ give wrong results if channel names like "ABC4" exist.

B.

> On 2019-06-15, at 15:40, Sam Charya via R-help wrote:
>
> Hello All,
>
> I need help with splitting a string. My data frame is in the following format:
>
>    V1
> 1  DD Pack0.00
> 2  FTA English News0.00
> 3  FTA Complimentary0.00
> 4  WB1.18
> 5  WION1.18
> 6  Al Jazeera0.00
> 7  Animal Planet2.36
> 8  Asianet Movies17.70
> 9  Calcutta News0.00
> 10 Comedy Central5.90
>
> I read the file from a CSV and set header = FALSE, hence it is named V1.
> The data consist of names of channels and their prices. For example, in row 1
> the name of the channel is 'DD Pack' and the price is 0.00. Similarly, in row 5
> the channel is 'WION' and the price is 1.18, and in row 8 the channel is
> 'Asianet Movies' and the price is 17.70.
>
> My question is: how would I separate the data into two columns, one for the
> channel name and one for the price? For example, the heading for column 1
> should be 'Channel Name' and for column 2 'Price'. The data under
> 'Channel Name' should be 'DD Pack' and under 'Price' should be 0.00, and so on.
>
> The letters and the numbers appear together, so there is no separator, and I
> have not been able to figure this out. Kindly help me resolve this.
>
> Many thanks in advance for your help. This is my first question ever to the
> community, so apologies if I have sent it to the wrong group; kindly redirect
> me if that is the case.
>
> sam.
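A sketch of the same capture-group approach applied to values as the question describes them in words (row 5 is 'WION' at 1.18, row 8 is 'Asianet Movies' at 17.70), with the price converted to numeric. The assumption here, labelled as such, is that the leading numbers in the posted listing are row indices rather than part of the price. The vector channels and the data frame tv are hypothetical names.

```r
# Sample values taken from the examples spelled out in the question
# (assumption: row numbers are not part of the strings)
channels <- c("DD Pack0.00", "WION1.18", "Asianet Movies17.70", "Comedy Central5.90")

# Group 1: letters and blanks; group 2: digits, a decimal point, more digits
pattern <- "^([a-zA-Z ]+)([0-9]+\\.[0-9]+)$"

# Stop early if any element does not follow the assumed pattern
stopifnot(all(grepl(pattern, channels)))

tv <- data.frame(
  `Channel Name` = sub(pattern, "\\1", channels),
  Price = as.numeric(sub(pattern, "\\2", channels)),
  check.names = FALSE,       # keep the space in the column name
  stringsAsFactors = FALSE
)

print(tv)
```

As noted in the reply above, a channel name that itself ends in a digit cannot be separated reliably from the price by this pattern; such rows would fail the stopifnot() check rather than be split silently.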
[R] Help with numeric and character separation
Dear All,

I need help with splitting a string. My data frame is in the following format:

   V1
1  DD Pack0.00
2  FTA English News0.00
3  FTA Complimentary0.00
4  WB1.18
5  WION1.18
6  Al Jazeera0.00
7  Animal Planet2.36
8  Asianet Movies17.70
9  Calcutta News0.00
10 Comedy Central5.90

I read the file from a CSV and set header = FALSE, hence the name V1. The data consist of names of TV channels and their prices. For example, in row 1 the name of the channel is 'DD Pack' and the price is 0.00. Similarly, in row 5 the channel is 'WION' and the price is 1.18, and in row 8 the channel is 'Asianet Movies' and the price is 17.70.

My question is: how would I separate the data into two columns, one for the channel name and one for the price? For example, the heading for column 1 should be 'Channel Name' and for column 2 'Price'. The data under 'Channel Name' should be 'DD Pack' and under 'Price' should be 0.00, and so on.

The letters and the numbers appear together, so I am not able to use a separator, and I have not been able to figure this out. Kindly help me with this.

Many thanks in advance for your help. This is my first question ever to the community, so apologies if I have sent it to the wrong group; kindly redirect me if that is the case.

sam.
Re: [R] Help with a third ggplot error
Hello All,

I need help with splitting a string. My data frame is in the following format:

   V1
1  DD Pack0.00
2  FTA English News0.00
3  FTA Complimentary0.00
4  WB1.18
5  WION1.18
6  Al Jazeera0.00
7  Animal Planet2.36
8  Asianet Movies17.70
9  Calcutta News0.00
10 Comedy Central5.90

I read the file from a CSV and set header = FALSE, hence it is named V1. The data consist of names of channels and their prices. For example, in row 1 the name of the channel is 'DD Pack' and the price is 0.00. Similarly, in row 5 the channel is 'WION' and the price is 1.18, and in row 8 the channel is 'Asianet Movies' and the price is 17.70.

My question is: how would I separate the data into two columns, one for the channel name and one for the price? For example, the heading for column 1 should be 'Channel Name' and for column 2 'Price'. The data under 'Channel Name' should be 'DD Pack' and under 'Price' should be 0.00, and so on.

The letters and the numbers appear together, so there is no separator, and I have not been able to figure this out. Kindly help me resolve this.

Many thanks in advance for your help. This is my first question ever to the community, so apologies if I have sent it to the wrong group; kindly redirect me if that is the case.

sam.