Re: [R] Dput Help in R
Shivi,

It looks like you have copied and pasted with errors. When you use dput() on a data frame, the output is a structure(list(...)) call (see the example that follows). I think you have cut off the beginning of the output and manually added the assignment "ab<-". It is also clear that read.csv is interpreting your numbers as character strings: note the quotation marks around each value.

> sample_df <- data.frame(nums = 1:10, rand_num = runif(10))
> dput(sample_df)
structure(list(nums = 1:10, rand_num = c(0.346553232055157, 0.996620914200321,
0.412795166717842, 0.250930240144953, 0.809035068377852, 0.782051374204457,
0.0857722735963762, 0.938713687239215, 0.819887164747342, 0.870529293781146
)), .Names = c("nums", "rand_num"), row.names = c(NA, -10L), class = "data.frame")

> On Dec 31, 2015, at 1:26 AM, SHIVI BHATIA wrote:
>
> Hi Duncan,
> Please find the dput from the data.
>
> ab<-read.csv("collection_last.csv",header=TRUE)
> y<-ab[1:10,]
>
> ab<- "2,458", "2,461", "2,462", "2,463", "2,464", "2,465", "2,468",
> "2,469", "2,470", "2,473", "2,474", "2,475", "2,476", "2,477",
> "2,478", "2,479", "2,480", "2,483", "2,484,267", "2,485",
> [remainder of quoted output snipped]
Re: [R] Dput Help in R
Hi Duncan, Please find the dput from the data. ab<-read.csv("collection_last.csv",header=TRUE) y<-ab[1:10,] ab<- "2,458", "2,461", "2,462", "2,463", "2,464", "2,465", "2,468", "2,469", "2,470", "2,473", "2,474", "2,475", "2,476", "2,477", "2,478", "2,479", "2,480", "2,483", "2,484,267", "2,485", "2,486", "2,487", "2,490", "2,491", "2,491,176", "2,492", "2,494", "2,495,976", "2,496", "2,497", "2,498", "2,499", "2,500", "2,500,001", "2,501", "2,502", "2,503", "2,504", "2,506", "2,507", "2,508", "2,509", "2,510", "2,511", "2,512", "2,513", "2,514", "2,515", "2,516", "2,517", "2,519", "2,520", "2,521", "2,523", "2,523,257", "2,524", "2,525", "2,526", "2,527", "2,528", "2,529", "2,530", "2,531", "2,535", "2,536", "2,538", "2,539", "2,540", "2,542", "2,543", "2,544", "2,545", "2,546", "2,547", "2,549", "2,550", "2,551", "2,553", "2,554", "2,556", "2,557", "2,558", "2,559", "2,560", "2,561", "2,563", "2,564", "2,565", "2,566", "2,567", "2,570", "2,572", "2,574", "2,576", "2,577", "2,578", "2,579", "2,580", "2,582", "2,583", "2,584", "2,585", "2,588", "2,589", "2,590", "2,591", "2,592", "2,593", "2,594", "2,596", "2,599", "2,600", "2,601", "2,602", "2,604", "2,605", "2,606", "2,607", "2,609", "2,611", "2,612", "2,613", "2,614", "2,615", "2,616", "2,617", "2,618", "2,619", "2,620", "2,621", "2,622", "2,624", "2,626", "2,627", "2,628", "2,628,385", "2,629", "2,630", "2,633", "2,634", "2,636", "2,637", "2,638", "2,640", "2,642", "2,644", "2,647", "2,648", "2,649", "2,650", "2,651", "2,654", "2,655", "2,656", "2,658", "2,660", "2,661", "2,663", "2,665", "2,666", "2,667", "2,668", "2,670", "2,671", "2,672", "2,673", "2,674", "2,676", "2,677", "2,678", "2,679", "2,679,505", "2,680", "2,682", "2,684", "2,687", "2,688", "2,689", "2,690", "2,692", "2,694", "2,695", "2,696", "2,697", "2,699", "2,700", "2,702", "2,703", "2,705", "2,706", "2,707", "2,708", "2,709", "2,710", "2,711", "2,712", "2,713", "2,714", "2,715", "2,716", "2,717", "2,718", "2,720", "2,721", "2,722", "2,723", 
"2,727", "2,728", "2,730", "2,732", "2,733", "2,736", "2,737", "2,738", "2,739", "2,742", "2,744", "2,747", "2,748", "2,749", "2,750", "2,752", "2,753", "2,754", "2,758", "2,759", "2,760", "2,761", "2,765,189", "2,766", "2,767", "2,770", "2,772", "2,773", "2,774", "2,776", "2,778", "2,780", "2,783", "2,784", "2,785", "2,786", "2,787", "2,789", "2,790", "2,792", "2,793", "2,794", "2,795", "2,797", "2,798", "2,799", "2,800", "2,803", "2,804", "2,806", "2,808", "2,808,003", "2,809", "2,810", "2,812", "2,813", "2,814", "2,816", "2,818", "2,820", "2,824", "2,825", "2,826", "2,832", "2,835", "2,839,850", "2,840", "2,841", "2,842", "2,844", "2,845", "2,846", "2,847", "2,848", "2,850", "2,851", "2,852", "2,853", "2,855", "2,856", "2,857", "2,858", "2,859", "2,861", "2,864", "2,865", "2,866", "2,867", "2,869", "2,870", "2,873", "2,874", "2,875", "2,876", "2,877", "2,878", "2,879", "2,880", "2,883", "2,884", "2,885", "2,886", "2,890", "2,891", "2,892", "2,893", "2,894", "2,895", "2,896", "2,897", "2,898", "2,899", "2,902", "2,903", "2,904", "2,905", "2,907", "2,908", "2,908,956", "2,910", "2,911", "2,912", "2,913", "2,916", "2,917", "2,922", "2,923", "2,924", "2,925", "2,926", "2,929", "2,930", "2,931", "2,932", "2,933", "2,934", "2,936", "2,937", "2,939", "2,941", "2,943", "2,944", "2,946", "2,947", "2,948", "2,950", "2,951", "2,953", "2,954", "2,955", "2,957", "2,959", "2,960", "2,961", "2,965", "2,967", "2,968", "2,970", "2,971", "2,973", "2,975", "2,976", "2,979", "2,981", "2,982", "2,983", "2,987", "2,988", "2,989", "2,990", "2,991", "2,992", "2,993", "2,994", "2,996", "2,997", "2,998", "2,999", "20", "20,000", "20,001", "20,003", "20,004", "20,028", "20,035", "20,054", "20,066", "20,071", "20,077", "20,079", "20,088", "20,101", "20,104", "20,116", "20,120", "20,126", "20,151", "20,153", "20,157", "20,174", "20,176", "20,190", "20,191", "20,196", "20,199", "20,213", "20,214", "20,217", "20,225", "20,257", "20,262", "20,263", "20,288", "20,294", "20,307", "20,320", 
"20,325", "20,349", "20,356", "20,375", "20,385", "20,387", "20,401", "20,405", "20,412", "20,425", "20,443", "20,462", "20,517", "20,525", "20,526", "20,532", "20,542", "20,547", "20,557", "20,576,265", "20,601", "20,612", "20,623", "20,625", "20,641", "20,657", "20,690", "20,691", "20,693", "20,700", "20,712", "20,725", "20,728", "20,752,792", "20,754", "20,773", "20,779", "20,780", "20,784", "20,792", "20,821", "20,830", "20,873", "20,882", "20,890", "20,900", "20,906", "20,947", "20,956", "20,964", "20,979", "20,980", "200", "200,000", "200,014", "200,023", "200,058", "200,105", "200,372", "200,476", "200,736", "201", "201,583", "201,759", "201,844", "202", "202,093", "202,278", "202,414", "202,753", "203", "203,380", "203,388", "204", "204,184", "204,403", "204,760", "204,846", "204,922", "205", "205,000", "205,307", "205,559", "206", "206,705", "206,916", "207", "207,367", "208", "208,096", "208,267", "208,284", "209", "209,075", "209,355", "209,653", "21", "21,010", "21,042",
Re: [R] Dput Help in R
> On Dec 30, 2015, at 11:26 PM, SHIVI BHATIA <shivi.bha...@safexpress.com> wrote:
>
> Hi Duncan,
> Please find the dput from the data.
>
> ab<-read.csv("collection_last.csv",header=TRUE)
> y<-ab[1:10,]
>

This is (possibly) partial output from a dput call. I am unable to repair it at any rate.

> ab<- "2,458", "2,461", "2,462", "2,463", "2,464", "2,465", "2,468",
> "2,469", "2,470", "2,473", "2,474", "2,475", "2,476", "2,477",
> "2,478", "2,479", "2,480", "2,483", "2,484,267", "2,485",
> snipped
> "99,581", "99,834", "990", "992", "992,489", "993", "994",
> "994,195", "995", "996", "998", "999"), class = "factor"),

It is useful in showing that these items (presumably the column named "Final") are factors. Notice the commas in the values that you might have thought were numeric. You will need to remove the commas (probably with `gsub`) before using `as.numeric`. I haven't quite figured out how a data frame could have a factor column that is so much longer than the adjacent columns named "Month" and "Year". I would suggest redoing the read.csv with stringsAsFactors=FALSE so that you can then work on "pure" text before the coercion to numeric.

-- 
David.

> Month = structure(c(11L, 11L, 7L, 2L, 2L, 12L, 11L, 11L,
> 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L,
> 11L,
> 11L, 11L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L),
> .Label = c("Apr",
> "Aug", "Dec", "Feb", "Jan", "Jul", "Jun", "Mar", "May", "Nov",
> "Oct", "Sep"), class = "factor"), Year = c(2010L, 2010L,
> 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L,
> 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L,
> 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L,
> 2011L)), .Names = c("DOC_TYPE", "DOC_NO", "DOC_DT", "SFX_CODE",
> "CUSTOMER", "DOC_AMOUNT", "OS_ASON_RPT_DT", "OS_DAYS", "BILLING_BRANCH",
> "COLL_BR", "RECEIPT_NO", "RECEIPT_DT", "Applied.Date", "RECEIPT_AMT",
> "TDS_AMT", "REBATE", "Final", "Month", "Year"), row.names = c(NA,
> 30L), class = "data.frame")
>
> Not sure if this would help.
>
> -----Original Message-----
> From: Duncan Murdoch [mailto:murdoch.dun...@gmail.com]
> Sent: Wednesday, December 30, 2015 10:23 PM
> To: SHIVI BHATIA <shivi.bha...@safexpress.com>; r-help@r-project.org
> Subject: Re: [R] Dput Help in R
>
> On 30/12/2015 5:56 AM, SHIVI BHATIA wrote:
>> Dear Team,
>>
>> I am facing an error while performing a manipulation using a dplyr package.
>> In the code below, I am using mutate to build a new calculated column:
>>
>> kp<-read.csv("collection_last.csv",header=TRUE)
>> mutate(kp,dif=DOC_AMOUNT-RECEIPT_AMT+TDS_AMT+REBATE)
>>
>> However it gives an error:-
>>
>> Warning messages:
>> 1: In Ops.factor(c(28831L, 28831L, 17504L, 4184L, 36187L, 25819L, 699L, :
>> '-' not meaningful for factors
>> 2: In Ops.factor(c(28831L, 28831L, 17504L, 4184L, 36187L, 25819L, 699L, :
>> '+' not meaningful for factors
>> 3: In Ops.factor(c(28831L, 28831L, 17504L, 4184L, 36187L, 25819L, 699L, :
>> '+' not meaningful for factors
>>
>> This is an error when some of my variables are factors hence I have
>> tried to change these to numeric so used the expression as:
>>
>> kp$DOC_TYPE=as.numeric(kp$DOC_TYPE).
>>
>> this now shows as variable type of as "double". So expedite help on
>> this one i was trying to create a reproducible example and i am highly
>> struggling to
>> create one. the data i have is approx. around 1 million rows with 21
>> columns hence when i use a dput option it does not capture the entire
>> detailing and row level info required to share and even
>> dput(head(kp$DOC_TYPE) does not help
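A minimal sketch of the suggestion made earlier in this message: read the file as plain text, strip the commas with `gsub`, then coerce with `as.numeric`. The three-row CSV is a made-up stand-in for collection_last.csv; only the column names "Final" and "Year" come from the thread, and the values are illustrative.

```r
# Made-up stand-in for collection_last.csv (illustrative values only).
csv_text <- 'Final,Year
"2,458",2010
"20,349",2011
"2,484,267",2011'

# Keep text columns as character instead of converting them to factors.
kp <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(kp$Final)   # character: the embedded commas prevent numeric parsing

# Strip every comma, then coerce what is left to numeric.
kp$Final <- as.numeric(gsub(",", "", kp$Final, fixed = TRUE))
str(kp)
```

Once the column is numeric, arithmetic such as the `mutate` call in the original question should no longer hit `Ops.factor`, provided the other columns in the expression are cleaned the same way.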
Re: [R] Dput Help in R
> On Dec 30, 2015, at 2:56 AM, SHIVI BHATIA wrote:
>
> Dear Team,
>
> I am facing an error while performing a manipulation using a dplyr package.
> In the code below, I am using mutate to build a new calculated column:
>
> kp<-read.csv("collection_last.csv",header=TRUE)

Given the material below, I suspect that columns which you expected to be 'numeric' were actually found to have some values that could not be converted to that class, and so were read in as factors. Converting such a set of factor values back to their intended numeric values is not as simple as coercing to numeric. I would instead suggest that you learn how to use the colClasses argument of the `read.*` functions. If every column is numeric, then it could be as simple as:

  kp <- read.csv("collection_last.csv",  # header=TRUE is the default for read.csv
                 colClasses = "numeric")

If it is not that simple, then this might succeed:

  kp[ , c('DOC_AMOUNT', 'RECEIPT_AMT', 'TDS_AMT', 'REBATE')] <-
      lapply( kp[ , c('DOC_AMOUNT', 'RECEIPT_AMT', 'TDS_AMT', 'REBATE')],
              function(x) as.numeric(as.character(x)) )

The care and fixing of factors is just one of the items covered in the R-FAQ which, like "An Introduction to R", should be read by all R-noobs.

-- 
David.

> mutate(kp,dif=DOC_AMOUNT-RECEIPT_AMT+TDS_AMT+REBATE)
>
> However it gives an error:-
>
> Warning messages:
> 1: In Ops.factor(c(28831L, 28831L, 17504L, 4184L, 36187L, 25819L, 699L, :
> '-' not meaningful for factors
> 2: In Ops.factor(c(28831L, 28831L, 17504L, 4184L, 36187L, 25819L, 699L, :
> '+' not meaningful for factors
> 3: In Ops.factor(c(28831L, 28831L, 17504L, 4184L, 36187L, 25819L, 699L, :
> '+' not meaningful for factors
>
> This is an error when some of my variables are factors hence I have tried to
> change these to numeric so used the expression as:
>
> kp$DOC_TYPE=as.numeric(kp$DOC_TYPE).
>
> this now shows as variable type of as "double".
> So expedite help on this one
> i was trying to create a reproducible example and i am highly struggling to
> create one. the data i have is approx. around 1 million rows with 21 columns
> hence when i use a dput option it does not capture the entire detailing and
> row level info required to share and even dput(head(kp$DOC_TYPE) does not
> help either.
>
> I have seen many stack overflow & r help column before composing this email.
> Hence i need help to create this reproducible example to share with the
> experts in the community. Apologies if this is a repeat.
>
> PLEASE HELP AS I AM HIGHLY STRUGGLING TO BUILD ANY OUTCOME.
>
> Regards, Shivi

David Winsemius
Alameda, CA, USA

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
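To illustrate why coercing a factor directly with `as.numeric` is "not as simple" as it looks, here is a small sketch. The factor `f` and the two-row CSV are made up for illustration; only the column names DOC_NO and DOC_AMOUNT come from the thread. Calling `as.numeric` on a factor returns the internal integer codes, which is probably what happened with `kp$DOC_TYPE=as.numeric(kp$DOC_TYPE)` in the original question (an inference, since the actual data is not available).

```r
# A factor whose labels look like numbers (made-up values).
f <- factor(c("150", "20", "150", "3000"))

codes  <- as.numeric(f)                 # internal level codes, not the values
values <- as.numeric(as.character(f))   # the values themselves
faq    <- as.numeric(levels(f))[f]      # equivalent form given in the R FAQ

print(codes)
print(values)

# colClasses avoids the factor detour when the columns are clean.
# A named vector sets the class per column.
csv_text <- "DOC_NO,DOC_AMOUNT\nA1,150\nA2,20"
kp <- read.csv(text = csv_text,
               colClasses = c(DOC_NO = "character", DOC_AMOUNT = "numeric"))
str(kp)
```

Note that `colClasses = "numeric"` would fail on values such as "2,458", because the embedded comma cannot be parsed as a number; those columns need the text-cleaning route first.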
Re: [R] Dput Help in R
On 30/12/2015 5:56 AM, SHIVI BHATIA wrote:
> Dear Team,
>
> I am facing an error while performing a manipulation using a dplyr package.
> In the code below, I am using mutate to build a new calculated column:
>
> kp<-read.csv("collection_last.csv",header=TRUE)
> mutate(kp,dif=DOC_AMOUNT-RECEIPT_AMT+TDS_AMT+REBATE)
>
> However it gives an error:-
>
> Warning messages:
> 1: In Ops.factor(c(28831L, 28831L, 17504L, 4184L, 36187L, 25819L, 699L, :
> '-' not meaningful for factors
> 2: In Ops.factor(c(28831L, 28831L, 17504L, 4184L, 36187L, 25819L, 699L, :
> '+' not meaningful for factors
> 3: In Ops.factor(c(28831L, 28831L, 17504L, 4184L, 36187L, 25819L, 699L, :
> '+' not meaningful for factors
>
> This is an error when some of my variables are factors hence I have tried to
> change these to numeric so used the expression as:
>
> kp$DOC_TYPE=as.numeric(kp$DOC_TYPE).
>
> this now shows as variable type of as "double". So expedite help on this one
> i was trying to create a reproducible example and i am highly struggling to
> create one. the data i have is approx. around 1 million rows with 21 columns
> hence when i use a dput option it does not capture the entire detailing and
> row level info required to share and even dput(head(kp$DOC_TYPE) does not
> help either.
>
> I have seen many stack overflow & r help column before composing this email.
> Hence i need help to create this reproducible example to share with the
> experts in the community. Apologies if this is a repeat.
>
> PLEASE HELP AS I AM HIGHLY STRUGGLING TO BUILD ANY OUTCOME.

If you are working with a data frame or matrix named x, just use

  y <- x[1:10,]

to extract the first 10 rows. The error will probably occur with this subset as well, and dput() will give you a reasonably sized amount of output. If the error doesn't happen, just take a bigger subset, and possibly leave off the beginning, e.g.

  y <- x[101:110,]

for 10 lines starting at line 101.

Duncan Murdoch
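A sketch of the subsetting advice above, with one addition that is not from the thread: `droplevels()`. Subsetting a data frame keeps every level of its factor columns, so `dput()` of 10 rows can still print thousands of level labels. That is one plausible explanation (an inference, not confirmed by anyone in the thread) for the very long "Final" output seen elsewhere in this discussion. The data frame `x` below is randomly generated for illustration.

```r
set.seed(1)

# Made-up data: 1000 distinct comma-formatted amounts stored as a factor.
x <- data.frame(
  Final = factor(format(sample(1e6, 1000), big.mark = ",", trim = TRUE)),
  Year  = sample(2010:2011, 1000, replace = TRUE)
)

# First 10 rows, as suggested above.
y <- x[1:10, ]
n_before <- nlevels(y$Final)     # the subset still carries all 1000 levels

# Drop unused factor levels before sharing.
y_small <- droplevels(y)
n_after <- nlevels(y_small$Final)

dput(y_small)                    # compact enough to paste into an email
```

The complete `dput()` output starts with `structure(list(` and can be pasted back into R as is, for example as `y <- structure(list(...), ...)`.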