Re: [R] Dput Help in R

2015-12-31 Thread Mark Sharp
Shivi,

It looks like you have copied and pasted with errors. When you use dput() on a 
data frame, the output is a structure(list(...)) call (see the example that 
follows). I think you have cut off the beginning of the output and manually 
added the assignment "ab<-". It is also clear that read.csv is interpreting 
your numbers as character strings. Note the quotation marks.

> sample_df <- data.frame(nums = 1:10, rand_num = runif(10))
> dput(sample_df)
structure(list(nums = 1:10, rand_num = c(0.346553232055157, 0.996620914200321, 
0.412795166717842, 0.250930240144953, 0.809035068377852, 0.782051374204457, 
0.0857722735963762, 0.938713687239215, 0.819887164747342, 0.870529293781146
)), .Names = c("nums", "rand_num"), row.names = c(NA, -10L),
class = "data.frame")

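As a rough illustration of the point above (a sketch on made-up data, not the real file), the complete text printed by dput() begins with structure(list( and can be evaluated to rebuild the object. The names dput_text and rebuilt are only illustrative.

```r
# Small stand-in data frame (column names are illustrative only)
sample_df <- data.frame(nums = 1:10, rand_num = runif(10))

# Capture the text that dput() would otherwise print to the console
dput_text <- paste(capture.output(dput(sample_df)), collapse = "\n")

# For a data frame, a complete dput() result starts with "structure(list("
starts_ok <- startsWith(dput_text, "structure(list(")

# Evaluating the complete text rebuilds the object.
# all.equal() is used rather than identical() because dput() prints
# doubles with 15 significant digits by default.
rebuilt <- eval(parse(text = dput_text))
same <- isTRUE(all.equal(rebuilt, sample_df))
```

If the leading structure(list( part is cut off, as in the pasted fragment, the remaining text no longer parses as R code.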

> On Dec 31, 2015, at 1:26 AM, SHIVI BHATIA  wrote:
> 
> Hi Duncan,
> Please find the dput from the data.
> 
> ab<-read.csv("collection_last.csv",header=TRUE)
> y<-ab[1:10,]
> 
> 
> ab<- "2,458", "2,461", "2,462", "2,463", "2,464", "2,465", "2,468",
> "2,469", "2,470", "2,473", "2,474", "2,475", "2,476", "2,477",
> [remainder of the quoted output snipped]

Re: [R] Dput Help in R

2015-12-31 Thread SHIVI BHATIA
Hi Duncan,
Please find the dput from the data.

ab<-read.csv("collection_last.csv",header=TRUE)
y<-ab[1:10,]


ab<- "2,458", "2,461", "2,462", "2,463", "2,464", "2,465", "2,468",
"2,469", "2,470", "2,473", "2,474", "2,475", "2,476", "2,477",
"2,478", "2,479", "2,480", "2,483", "2,484,267", "2,485",
"2,486", "2,487", "2,490", "2,491", "2,491,176", "2,492",
"2,494", "2,495,976", "2,496", "2,497", "2,498", "2,499",
"2,500", "2,500,001", "2,501", "2,502", "2,503", "2,504",
"2,506", "2,507", "2,508", "2,509", "2,510", "2,511", "2,512",
"2,513", "2,514", "2,515", "2,516", "2,517", "2,519", "2,520",
"2,521", "2,523", "2,523,257", "2,524", "2,525", "2,526",
"2,527", "2,528", "2,529", "2,530", "2,531", "2,535", "2,536",
"2,538", "2,539", "2,540", "2,542", "2,543", "2,544", "2,545",
"2,546", "2,547", "2,549", "2,550", "2,551", "2,553", "2,554",
"2,556", "2,557", "2,558", "2,559", "2,560", "2,561", "2,563",
"2,564", "2,565", "2,566", "2,567", "2,570", "2,572", "2,574",
"2,576", "2,577", "2,578", "2,579", "2,580", "2,582", "2,583",
"2,584", "2,585", "2,588", "2,589", "2,590", "2,591", "2,592",
"2,593", "2,594", "2,596", "2,599", "2,600", "2,601", "2,602",
"2,604", "2,605", "2,606", "2,607", "2,609", "2,611", "2,612",
"2,613", "2,614", "2,615", "2,616", "2,617", "2,618", "2,619",
"2,620", "2,621", "2,622", "2,624", "2,626", "2,627", "2,628",
"2,628,385", "2,629", "2,630", "2,633", "2,634", "2,636",
"2,637", "2,638", "2,640", "2,642", "2,644", "2,647", "2,648",
"2,649", "2,650", "2,651", "2,654", "2,655", "2,656", "2,658",
"2,660", "2,661", "2,663", "2,665", "2,666", "2,667", "2,668",
"2,670", "2,671", "2,672", "2,673", "2,674", "2,676", "2,677",
"2,678", "2,679", "2,679,505", "2,680", "2,682", "2,684",
"2,687", "2,688", "2,689", "2,690", "2,692", "2,694", "2,695",
"2,696", "2,697", "2,699", "2,700", "2,702", "2,703", "2,705",
"2,706", "2,707", "2,708", "2,709", "2,710", "2,711", "2,712",
"2,713", "2,714", "2,715", "2,716", "2,717", "2,718", "2,720",
"2,721", "2,722", "2,723", "2,727", "2,728", "2,730", "2,732",
"2,733", "2,736", "2,737", "2,738", "2,739", "2,742", "2,744",
"2,747", "2,748", "2,749", "2,750", "2,752", "2,753", "2,754",
"2,758", "2,759", "2,760", "2,761", "2,765,189", "2,766",
"2,767", "2,770", "2,772", "2,773", "2,774", "2,776", "2,778",
"2,780", "2,783", "2,784", "2,785", "2,786", "2,787", "2,789",
"2,790", "2,792", "2,793", "2,794", "2,795", "2,797", "2,798",
"2,799", "2,800", "2,803", "2,804", "2,806", "2,808", "2,808,003",
"2,809", "2,810", "2,812", "2,813", "2,814", "2,816", "2,818",
"2,820", "2,824", "2,825", "2,826", "2,832", "2,835", "2,839,850",
"2,840", "2,841", "2,842", "2,844", "2,845", "2,846", "2,847",
"2,848", "2,850", "2,851", "2,852", "2,853", "2,855", "2,856",
"2,857", "2,858", "2,859", "2,861", "2,864", "2,865", "2,866",
"2,867", "2,869", "2,870", "2,873", "2,874", "2,875", "2,876",
"2,877", "2,878", "2,879", "2,880", "2,883", "2,884", "2,885",
"2,886", "2,890", "2,891", "2,892", "2,893", "2,894", "2,895",
"2,896", "2,897", "2,898", "2,899", "2,902", "2,903", "2,904",
"2,905", "2,907", "2,908", "2,908,956", "2,910", "2,911",
"2,912", "2,913", "2,916", "2,917", "2,922", "2,923", "2,924",
"2,925", "2,926", "2,929", "2,930", "2,931", "2,932", "2,933",
"2,934", "2,936", "2,937", "2,939", "2,941", "2,943", "2,944",
"2,946", "2,947", "2,948", "2,950", "2,951", "2,953", "2,954",
"2,955", "2,957", "2,959", "2,960", "2,961", "2,965", "2,967",
"2,968", "2,970", "2,971", "2,973", "2,975", "2,976", "2,979",
"2,981", "2,982", "2,983", "2,987", "2,988", "2,989", "2,990",
"2,991", "2,992", "2,993", "2,994", "2,996", "2,997", "2,998",
"2,999", "20", "20,000", "20,001", "20,003", "20,004", "20,028",
"20,035", "20,054", "20,066", "20,071", "20,077", "20,079",
"20,088", "20,101", "20,104", "20,116", "20,120", "20,126",
"20,151", "20,153", "20,157", "20,174", "20,176", "20,190",
"20,191", "20,196", "20,199", "20,213", "20,214", "20,217",
"20,225", "20,257", "20,262", "20,263", "20,288", "20,294",
"20,307", "20,320", "20,325", "20,349", "20,356", "20,375",
"20,385", "20,387", "20,401", "20,405", "20,412", "20,425",
"20,443", "20,462", "20,517", "20,525", "20,526", "20,532",
"20,542", "20,547", "20,557", "20,576,265", "20,601", "20,612",
"20,623", "20,625", "20,641", "20,657", "20,690", "20,691",
"20,693", "20,700", "20,712", "20,725", "20,728", "20,752,792",
"20,754", "20,773", "20,779", "20,780", "20,784", "20,792",
"20,821", "20,830", "20,873", "20,882", "20,890", "20,900",
"20,906", "20,947", "20,956", "20,964", "20,979", "20,980",
"200", "200,000", "200,014", "200,023", "200,058", "200,105",
"200,372", "200,476", "200,736", "201", "201,583", "201,759",
"201,844", "202", "202,093", "202,278", "202,414", "202,753",
"203", "203,380", "203,388", "204", "204,184", "204,403",
"204,760", "204,846", "204,922", "205", "205,000", "205,307",
"205,559", "206", "206,705", "206,916", "207", "207,367",
"208", "208,096", "208,267", "208,284", "209", "209,075",
"209,355", "209,653", "21", "21,010", "21,042", 

Re: [R] Dput Help in R

2015-12-31 Thread David Winsemius

> On Dec 30, 2015, at 11:26 PM, SHIVI BHATIA <shivi.bha...@safexpress.com> 
> wrote:
> 
> Hi Duncan,
> Please find the dput from the data.
> 
> ab<-read.csv("collection_last.csv",header=TRUE)
> y<-ab[1:10,]
> 

This is (possibly) partial output from a dput call. Unable to repair at any 
rate.
> 
> ab<- "2,458", "2,461", "2,462", "2,463", "2,464", "2,465", "2,468",
> "2,469", "2,470", "2,473", "2,474", "2,475", "2,476", "2,477",
> "2,478", "2,479", "2,480", "2,483", "2,484,267", "2,485",
> 
snipped
> "99,581", "99,834", "990", "992", "992,489", "993", "994",
> "994,195", "995", "996", "998", "999"), class = "factor"),


It is useful in showing that these items (presumably the column named "Final") 
are factors. Notice the commas in the values you might think were numeric. You 
will need to remove the commas (probably with `gsub`) before using `as.numeric`.

I haven't quite figured out how a data frame could have a factor column that was 
so much longer than the adjacent columns named "Month" and "Year". I would 
suggest redoing the read.csv with stringsAsFactors=FALSE so that you can then 
work on "pure" text before the coercion to numeric.
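
A minimal sketch of that suggestion, using a few made-up rows in place of collection_last.csv. The inline CSV text and its values are assumptions for illustration; only the column names Final, Month and Year come from the output quoted here.

```r
# Made-up stand-in for the real file; values contain thousands separators
csv_text <- 'Final,Month,Year
"2,458",Oct,2010
"20,576,265",Nov,2011
"999",Sep,2011'

# Keep text columns as character instead of factor
kp <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class_before <- class(kp$Final)   # "character"

# Strip the commas, then coerce the cleaned text to numeric
kp$Final <- as.numeric(gsub(",", "", kp$Final, fixed = TRUE))
kp$Final                          # 2458 20576265 999
```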

-- David.



> Month = structure(c(11L, 11L, 7L, 2L, 2L, 12L, 11L, 11L,
> 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L,
> 11L,
> 11L, 11L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L),
> .Label = c("Apr",
> "Aug", "Dec", "Feb", "Jan", "Jul", "Jun", "Mar", "May", "Nov",
> "Oct", "Sep"), class = "factor"), Year = c(2010L, 2010L,
> 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L,
> 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L,
> 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L,
> 2011L)), .Names = c("DOC_TYPE", "DOC_NO", "DOC_DT", "SFX_CODE",
> "CUSTOMER", "DOC_AMOUNT", "OS_ASON_RPT_DT", "OS_DAYS", "BILLING_BRANCH",
> "COLL_BR", "RECEIPT_NO", "RECEIPT_DT", "Applied.Date", "RECEIPT_AMT",
> "TDS_AMT", "REBATE", "Final", "Month", "Year"), row.names = c(NA,
> 30L), class = "data.frame")
> 
> 
> Not sure if this would help.

Re: [R] Dput Help in R

2015-12-30 Thread David Winsemius

> On Dec 30, 2015, at 2:56 AM, SHIVI BHATIA  wrote:
> 
> Dear Team, 
> 
> 
> 
> I am facing an error while performing a manipulation using a dplyr package.
> In the code below, I am using mutate to build a new calculated column:
> 
> 
> 
> kp<-read.csv("collection_last.csv",header=TRUE)

Given the material below, I suspect that columns which you expected to be 
'numeric' actually contain some values that could not be converted to that 
class and so were read in as factors. Converting such a set of factor values 
back to their intended numeric values is not as simple as coercing to numeric. 
I would instead suggest that you learn how to use the colClasses argument of 
the `read.*` functions. If all of the values are numeric then it could be as 
simple as:

kp<-read.csv("collection_last.csv",  # header=TRUE is the default for read.csv
   colClasses="numeric" )
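
When only some of the columns are numeric, colClasses can also be given as a named vector so that just those columns are forced. A rough sketch on made-up inline data follows; the column names are borrowed from the output in this thread, while the values and the classes chosen are assumptions.

```r
# Made-up stand-in for the real file
csv_text <- "DOC_NO,DOC_AMOUNT,COLL_BR\n101,2458.50,DEL\n102,999,BOM"

# Named colClasses: listed columns get the stated class,
# unlisted columns (DOC_NO here) are still guessed by read.csv
kp <- read.csv(text = csv_text,
               colClasses = c(DOC_AMOUNT = "numeric", COLL_BR = "character"))

col_classes <- sapply(kp, class)
col_classes
```

This only works if the listed columns really hold plain numbers; values with thousands separators would have to be read as text and cleaned first.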

If it is not that simple, then this might succeed:

kp[ , c('DOC_AMOUNT', 'RECEIPT_AMT', 'TDS_AMT', 'REBATE')] <- 
 lapply( kp[ , c('DOC_AMOUNT', 'RECEIPT_AMT', 'TDS_AMT', 'REBATE')], 
 function(x) as.numeric(as.character(x))
)

The care and fixing of factor variables is just one of the items covered in the 
R FAQ, which, like "An Introduction to R", should be read by all R-noobs.
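
To make the factor pitfall concrete, here is a small sketch on a made-up factor (the values are illustrative). Calling as.numeric() directly on a factor returns the internal level codes, which is why the result can show as "double" and still be wrong.

```r
# A factor whose labels look like numbers (made-up values)
f <- factor(c("250", "10", "250", "75"))

# Direct coercion returns the level codes, not the values.
# Levels sort as text: "10", "250", "75"
codes <- as.numeric(f)                    # 2 1 2 3

# Going through character recovers the intended numbers
values <- as.numeric(as.character(f))     # 250 10 250 75

# Equivalent form that converts each level only once
values2 <- as.numeric(levels(f))[f]       # 250 10 250 75
```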

-- 
David.

> 
> mutate(kp,dif=DOC_AMOUNT-RECEIPT_AMT+TDS_AMT+REBATE)
> 
> 
> 
> However it gives an error:-
> 
> Warning messages:
> 
> 1: In Ops.factor(c(28831L, 28831L, 17504L, 4184L, 36187L, 25819L, 699L,  :
> 
>  '-' not meaningful for factors
> 
> 2: In Ops.factor(c(28831L, 28831L, 17504L, 4184L, 36187L, 25819L, 699L,  :
> 
>  '+' not meaningful for factors
> 
> 3: In Ops.factor(c(28831L, 28831L, 17504L, 4184L, 36187L, 25819L, 699L,  :
> 
>  '+' not meaningful for factors
> 
> 
> 
> This error occurs when some of my variables are factors, so I tried to
> change these to numeric using the expression:
> 
> kp$DOC_TYPE=as.numeric(kp$DOC_TYPE)
> 
> 
> 
> This now shows the variable type as "double". To expedite help on this one,
> I was trying to create a reproducible example, and I am struggling to
> create one. The data I have is approximately 1 million rows with 21 columns,
> hence when I use dput() it does not capture the entire detail and
> row-level info required to share, and even dput(head(kp$DOC_TYPE)) does not
> help either. 
> 
> I have read many Stack Overflow and R-help posts before composing this email.
> Hence I need help to create this reproducible example to share with the
> experts in the community. Apologies if this is a repeat.
> 
> 
> 
> PLEASE HELP AS I AM HIGHLY STRUGGLING TO BUILD ANY OUTCOME. 
> 
> Regards, Shivi
> 
> 
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA


Re: [R] Dput Help in R

2015-12-30 Thread Duncan Murdoch

On 30/12/2015 5:56 AM, SHIVI BHATIA wrote:

Dear Team,



I am facing an error while performing a manipulation using a dplyr package.
In the code below, I am using mutate to build a new calculated column:



kp<-read.csv("collection_last.csv",header=TRUE)

mutate(kp,dif=DOC_AMOUNT-RECEIPT_AMT+TDS_AMT+REBATE)



However it gives an error:-

Warning messages:

1: In Ops.factor(c(28831L, 28831L, 17504L, 4184L, 36187L, 25819L, 699L,  :

   '-' not meaningful for factors

2: In Ops.factor(c(28831L, 28831L, 17504L, 4184L, 36187L, 25819L, 699L,  :

   '+' not meaningful for factors

3: In Ops.factor(c(28831L, 28831L, 17504L, 4184L, 36187L, 25819L, 699L,  :

   '+' not meaningful for factors



This error occurs when some of my variables are factors, so I tried to
change these to numeric using the expression:

kp$DOC_TYPE=as.numeric(kp$DOC_TYPE)



This now shows the variable type as "double". To expedite help on this one,
I was trying to create a reproducible example, and I am struggling to
create one. The data I have is approximately 1 million rows with 21 columns,
hence when I use dput() it does not capture the entire detail and
row-level info required to share, and even dput(head(kp$DOC_TYPE)) does not
help either.

I have read many Stack Overflow and R-help posts before composing this email.
Hence I need help to create this reproducible example to share with the
experts in the community. Apologies if this is a repeat.



PLEASE HELP AS I AM HIGHLY STRUGGLING TO BUILD ANY OUTCOME.


If you are working with a dataframe or matrix named x, just use

y <- x[1:10,]

to extract the first 10 rows.  The error will probably occur with this 
subset as well, and dput() will give you a reasonably sized amount of 
output.  If the error doesn't happen, just take a bigger subset, and 
possibly leave off the beginning, e.g.


y <- x[101:110,]

for 10 lines starting at line 101.
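
One possible reason a dput() of a 10-row subset can still be very long: subsetting a data frame keeps every level of a factor column, and dput() prints them all. A hedged sketch on made-up data follows (x and Final here are stand-ins, not the real file); droplevels() is part of base R.

```r
# Made-up data frame with a factor column that has 500 distinct levels
x <- data.frame(id = 1:500,
                Final = factor(paste0(1:500, ",458")))

# Take the first 10 rows, as suggested above
y <- x[1:10, ]
levels_before <- nlevels(y$Final)   # all 500 levels are still attached

# Drop the unused levels so dput() only has to print the ones in use
y <- droplevels(y)
levels_after <- nlevels(y$Final)    # 10

dput(y)
```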

Duncan Murdoch
