Re: [R] Mystery Error in midnightStandard
>>>> "TB" == Ted Byers <r.ted.by...@gmail.com>
>>>> on Tue, 27 Jan 2009 16:00:27 -0500

TB> I wasn't even aware I was using midnightStandard. You won't
TB> find it in my script.
TB>
TB> Here is the relevant loop:
TB>
TB> date1 = timeDate(charvec = Sys.Date(), format = "%Y-%m-%d")
TB> date1
TB> dow = 3;
TB> for (i in 1:length(V4) ) {
TB>   x = read.csv(as.character(V4[[i]]), header = FALSE,
TB>                na.strings="");
TB>   y = x[,1];
TB>   year = V2[[i]];
TB>   week = V3[[i]];
TB>   dtstr = sprintf("%i-%i-%i",year,week,dow);
TB>   date2 = timeDate(dtstr, format = "%Y-%U-%w");
TB>   resultsdataframe$dt[[i]] <- difftimeDate(date1,date2,units = "weeks");
TB>   fp = fitdistr(y,"exponential");
TB>   print(c(V1[[i]],V2[[i]],V3[[i]],fp$estimate,fp$sd));
TB>   print(c(year,week,date2,resultsdataframe$dt[[i]]));
TB>   resultsdataframe$estimate[[i]] <- fp$estimate;
TB>   resultsdataframe$sd[[i]] <- fp$sd;
TB> }
TB>
TB> It fails with a little more than 100 records left in V4.
TB>
TB> The full error message is:
TB>
TB> Error in midnightStandard(charvec, format) :
TB>   'charvec' has non-NA entries of different number of characters

timeDate() uses the midnight standard. The function
'midnightStandard' assumes that all entries in 'charvec' have the
same 'format'.

Can you please check if this is the case? This is all I can say from
the information you provided. Please give us a reproducible example.
We can continue this discussion off-list.

regards,
Yohan

TB> Until it fails, date2 and resultsdataframe$dt[[i]] get correct
TB> values.
TB>
TB> str() produces no surprises:
TB>
TB> str(resultsdataframe);
TB> 'data.frame':   303 obs. of  6 variables:
TB>  $ mid     : int  171 206 206 206 206 206 206 206 206 218 ...
TB>  $ year    : int  2008 2008 2008 2008 2008 2008 2008 2008 2008 2008 ...
TB>  $ week    : int  16 17 18 19 21 26 31 35 51 40 ...
TB>  $ dt      : num  39.9 38.9 37.9 36.9 34.9 ...
TB>  $ estimate: num  Inf 0.25 Inf 0.0408 0.2 ...
TB>  $ sd      : num  Inf 0.1768 Inf 0.0289 0.1414 ...
TB>
TB> I would assume the error is related to my new code that
TB> manipulates dates, as it doesn't occur in the earlier version
TB> that did not manipulate dates (the relevant work being done,
TB> albeit very slowly, within the DB).
TB>
TB> FTR: The year and week values are generated by MySQL using the
TB> YEAR and WEEK functions applied to timestamps. I do not know if
TB> it is relevant, but the week value, at the point of failure, is
TB> 0 (a value that does not occur earlier in the dataset, but
TB> several times subsequently), and I do not see how a value of 0
TB> for the week (legitimate in posix date formats) could produce
TB> the error message I get.
TB>
TB> Any thoughts on what is really wrong, and how to fix it?
TB>
TB> Thanks
TB>
TB> Ted

--
PhD student
Swiss Federal Institute of Technology
Zurich

www.ethz.ch

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Mystery Error in midnightStandard
Hi Yohan,

Thanks.

On Wed, Jan 28, 2009 at 4:57 AM, Yohan Chalabi <chal...@phys.ethz.ch> wrote:

> >>>> "TB" == Ted Byers <r.ted.by...@gmail.com>
> >>>> on Tue, 27 Jan 2009 16:00:27 -0500
>
> TB> I wasn't even aware I was using midnightStandard. You won't
> TB> find it in my script.
> TB>
> TB> Here is the relevant loop:
> TB>
> TB> date1 = timeDate(charvec = Sys.Date(), format = "%Y-%m-%d")
> TB> date1
> TB> dow = 3;
> TB> for (i in 1:length(V4) ) {
> TB>   x = read.csv(as.character(V4[[i]]), header = FALSE,
> TB>                na.strings="");
> TB>   y = x[,1];
> TB>   year = V2[[i]];
> TB>   week = V3[[i]];
> TB>   dtstr = sprintf("%i-%i-%i",year,week,dow);
> TB>   date2 = timeDate(dtstr, format = "%Y-%U-%w");
> TB>   resultsdataframe$dt[[i]] <- difftimeDate(date1,date2,units = "weeks");
> TB>   fp = fitdistr(y,"exponential");
> TB>   print(c(V1[[i]],V2[[i]],V3[[i]],fp$estimate,fp$sd));
> TB>   print(c(year,week,date2,resultsdataframe$dt[[i]]));
> TB>   resultsdataframe$estimate[[i]] <- fp$estimate;
> TB>   resultsdataframe$sd[[i]] <- fp$sd;
> TB> }
> TB>
> TB> It fails with a little more than 100 records left in V4.
> TB>
> TB> The full error message is:
> TB>
> TB> Error in midnightStandard(charvec, format) :
> TB>   'charvec' has non-NA entries of different number of characters
>
> timeDate() uses the midnight standard. The function
> 'midnightStandard' assumes that all entries in 'charvec' have the
> same 'format'.
>
> Can you please check if this is the case?

It is certain that all entries have the same format, but I'm starting
to think that the error message is something of a red herring.
Consider this:

> year = 2009
> week = 0
> day = 3
> datestr = sprintf("%i-%i-%i",year,week,day);datestr
[1] "2009-0-3"
> date1 = timeDate(datestr, format = "%Y-%U-%w"); date1
GMT
[1] [NA]
> day = 4
> datestr = sprintf("%i-%i-%i",year,week,day);datestr
[1] "2009-0-4"
> date1 = timeDate(datestr, format = "%Y-%U-%w"); date1
GMT
[1] [2009-01-01]
> datestr = sprintf("%i-%i-%i",year,week,3);datestr
[1] "2009-0-3"
> date2 = timeDate(datestr, format = "%Y-%U-%w");date2
GMT
[1] [NA]
> difftimeDate(date2,date1, units = "weeks")
Error in midnightStandard(charvec, format) :
  'charvec' has non-NA entries of different number of characters
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf

The first values for year, week and day are the values on which my
loop dies. It returns 'NA' here. It seems clear that it is returning
NA because the date that data corresponds to is 2008-12-31.

The error is being produced by difftimeDate rather than timeDate (as
shown by the above session). But that represents a flaw in the
function design. It should fail when taking the elapsed time between a
null and the present, but if I wrote such a function, I'd have it
return null (perhaps with a warning) rather than just die.

A bigger issue is that timeDate ought never give null here (which is
what I assume 'NA' means), since all the data comes from transaction
data with real dates, so the elapsed time, measured in weeks, ought to
always be a valid real number that is positive semidefinite. I have
not yet come to any conclusions as to how it ought to behave (whether
to return New Year's Day, along with a warning, or to return the date
requested by reinvoking itself with the year and week adjusted so a
valid date is returned).

On a practical side, how would I test date2 to see if it is null, so I
can give it a sensible default value?
A more troubling thought is that with this handling of dates in this
combination of SQL (my GROUP BY clause uses
YEAR(transaction_date),WEEK(transaction_date)) to get the data and R
to process it, the week containing New Year's Day will ALWAYS be split
in two at the first second of the new year. I'm going to have to
either figure out a way to correct this, or ignore it (as it doesn't
actually make things wrong, but rather it splits a sample into two
unequal parts).

Thoughts?

Thanks

Ted
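One possible way around both the NA and the split week, sketched in base R rather than timeDate, is to compute the date arithmetically from the year, week and weekday. The helper names `week_to_date` and `week_start` are hypothetical, and the sketch assumes the week numbers follow the `%U` convention used in the session above (weeks start on Sunday, and days before the first Sunday of the year belong to week 0). Whether MySQL's WEEK() matches that convention depends on its mode setting, so that should be checked before relying on this.

```r
# Hypothetical helper, base R only (not part of timeDate).
# Assumes %U-style week numbers: weeks start on Sunday and the first
# Sunday of the year opens week 1, so earlier days fall in week 0.
week_to_date <- function(year, week, dow) {
  jan1 <- as.Date(sprintf("%d-01-01", as.integer(year)))
  jan1_wday <- as.integer(format(jan1, "%w"))  # 0 = Sunday ... 6 = Saturday
  first_sunday <- jan1 + (7 - jan1_wday) %% 7
  # Week 0 gives a negative offset, so the result can roll back into
  # December of the previous year instead of becoming NA.
  first_sunday + (week - 1) * 7 + dow
}

d_wed <- week_to_date(2009, 0, 3)  # the Wednesday that produced NA above
d_thu <- week_to_date(2009, 0, 4)
print(d_wed)
print(d_thu)

# Sunday that starts the week containing a date: a grouping key that
# does not change at the turn of the year.
week_start <- function(d) d - as.integer(format(d, "%w"))
print(week_start(d_wed) == week_start(d_thu))
```

Under these assumptions, week 0, day 3 of 2009 comes out as 2008-12-31 and day 4 as 2009-01-01, and both map to the same week start, so grouping on that key would keep the New Year week in one piece.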
Re: [R] Mystery Error in midnightStandard
>>>> "TB" == Ted Byers <r.ted.by...@gmail.com>
>>>> on Wed, 28 Jan 2009 09:30:58 -0500

TB> It is certain that all entries have the same format, but I'm
TB> starting to think that the error message is something of a red
TB> herring. Consider this:
TB>
TB> > year = 2009
TB> > week = 0
TB> > day = 3
TB> > datestr = sprintf("%i-%i-%i",year,week,day);datestr
TB> [1] "2009-0-3"
TB> > date1 = timeDate(datestr, format = "%Y-%U-%w"); date1
TB> GMT
TB> [1] [NA]
TB> > day = 4
TB> > datestr = sprintf("%i-%i-%i",year,week,day);datestr
TB> [1] "2009-0-4"
TB> > date1 = timeDate(datestr, format = "%Y-%U-%w"); date1
TB> GMT
TB> [1] [2009-01-01]
TB> > datestr = sprintf("%i-%i-%i",year,week,3);datestr
TB> [1] "2009-0-3"
TB> > date2 = timeDate(datestr, format = "%Y-%U-%w");date2
TB> GMT
TB> [1] [NA]
TB> > difftimeDate(date2,date1, units = "weeks")
TB> Error in midnightStandard(charvec, format) :
TB>   'charvec' has non-NA entries of different number of characters
TB> In addition: Warning messages:
TB> 1: In min(x) : no non-missing arguments to min; returning Inf
TB> 2: In max(x) : no non-missing arguments to max; returning -Inf
TB>
TB> The first values for year, week and day are the values on which
TB> my loop dies. It returns 'NA' here. It seems clear that it is
TB> returning NA because the date that data corresponds to is
TB> 2008-12-31.
TB>
TB> The error is being produced by difftimeDate rather than timeDate
TB> (as shown by the above session). But that represents a flaw in
TB> the function design.

This is not a flaw in timeDate. It behaves the same way as
'as.POSIXct':

strptime(datestr, format = "%Y-%U-%w")

Instead of claiming that there is a flaw in the function you could
have suggested an 'is.na' method for 'timeDate'.

I will add an 'is.na' method in the dev version of 'timeDate'.

regards,
Yohan

TB> It should fail when taking the elapsed time between a null and
TB> the present, but if I wrote such a function, I'd have it return
TB> null (perhaps with a warning) rather than just die.
TB>
TB> A bigger issue is that timeDate ought never give null here
TB> (which is what I assume 'NA' means), since all the data comes
TB> from transaction data with real dates, so the elapsed time,
TB> measured in weeks, ought to always be a valid real number that
TB> is positive semidefinite. I have not yet come to any
TB> conclusions as to how it ought to behave (whether to return new
TB> years day, along with a warning, or to return the date requested
TB> by reinvoking itself with the year and week adjusted so a valid
TB> date is returned).
TB>
TB> On a practical side, how would I test date2 to see if it is
TB> null, so I can give it a sensible default value?
TB>
TB> A more troubling thought is that with this handling of dates in
TB> this combination of SQL (my group by clause uses
TB> YEAR(transaction_date),WEEK(transaction_date)) to get the data
TB> and R to process it, the week containing new years day will
TB> ALWAYS be split in two at the first second of the new year. I'm
TB> going to have to either figure out a way to correct this, or
TB> ignore it (as it doesn't actually make things wrong, but rather
TB> it splits a sample into two unequal parts).

--
PhD student
Swiss Federal Institute of Technology
Zurich

www.ethz.ch
Re: [R] Mystery Error in midnightStandard
Hi Yohan,

On Wed, Jan 28, 2009 at 10:28 AM, Yohan Chalabi <chal...@phys.ethz.ch> wrote:

> >>>> "TB" == Ted Byers <r.ted.by...@gmail.com>
> >>>> on Wed, 28 Jan 2009 09:30:58 -0500
>
> TB> It is certain that all entries have the same format, but I'm
> TB> starting to think that the error message is something of a red
> TB> herring. Consider this:
> TB>
> TB> > year = 2009
> TB> > week = 0
> TB> > day = 3
> TB> > datestr = sprintf("%i-%i-%i",year,week,day);datestr
> TB> [1] "2009-0-3"
> TB> > date1 = timeDate(datestr, format = "%Y-%U-%w"); date1
> TB> GMT
> TB> [1] [NA]
> TB> > day = 4
> TB> > datestr = sprintf("%i-%i-%i",year,week,day);datestr
> TB> [1] "2009-0-4"
> TB> > date1 = timeDate(datestr, format = "%Y-%U-%w"); date1
> TB> GMT
> TB> [1] [2009-01-01]
> TB> > datestr = sprintf("%i-%i-%i",year,week,3);datestr
> TB> [1] "2009-0-3"
> TB> > date2 = timeDate(datestr, format = "%Y-%U-%w");date2
> TB> GMT
> TB> [1] [NA]
> TB> > difftimeDate(date2,date1, units = "weeks")
> TB> Error in midnightStandard(charvec, format) :
> TB>   'charvec' has non-NA entries of different number of characters
> TB> In addition: Warning messages:
> TB> 1: In min(x) : no non-missing arguments to min; returning Inf
> TB> 2: In max(x) : no non-missing arguments to max; returning -Inf
> TB>
> TB> The first values for year, week and day are the values on which
> TB> my loop dies. It returns 'NA' here. It seems clear that it is
> TB> returning NA because the date that data corresponds to is
> TB> 2008-12-31.
> TB>
> TB> The error is being produced by difftimeDate rather than timeDate
> TB> (as shown by the above session). But that represents a flaw in
> TB> the function design.
>
> This is not a flaw in timeDate. it behaves the same way as
> 'as.POSIXct'

That the two behave the same doesn't change the assessment that the
design is flawed. That doesn't mean that the function is wrong. It
means only that the behaviour can be made more useful. For example, in
SQL, if a given calculation returns NULL, and the result is
subsequently used in another calculation, the result that returns is
also NULL. That is quite useful, and admits algorithms that can react
appropriately to NULLs when necessary. That is arguably better than
forcing the code to fail the moment a NULL is used in a secondary
calculation. In C++, OTOH, one can catch the problem earlier using,
e.g., exceptions, again allowing the program to complete even when
problems arise for certain values or combinations thereof.

As a software engineer, I understand the issues involved in creating
libraries. If I want to incorporate the functionality of a given
standard suite of functions (e.g. ANSI C standard library functions,
or POSIX functions), my first step would be to ensure I can duplicate
how they behave. But I would not stop there. There are, for example,
serious design flaws in many ANSI C functions that, ignored, introduce
serious security defects in applications that use them. I would
therefore refactor them to eliminate the security defects. If they
cannot be eliminated, I would replace the function in question by a
similar function that does not have that security defect.

POSIX is a useful, but old, standard, and I am merely suggesting that
once you have duplicated it, look beyond it to ways it can be improved
upon. There is more to the design of a function than whether or not it
gives the right result with good input. There is how it behaves when
there is a problem with the inputs and whether or not you force the
calling code to die when a problem arises or you give the calling code
a way to react to such problems. When I add functions to my own C++ or
Java libraries, I normally include more bad input data in the unit
tests than good data (though the latter is sufficient to ensure
correct results are invariably obtained), precisely so I can document
how it behaves when there is a problem and give coders who use it a
variety of options to use to deal with them.

> strptime(datestr, format = "%Y-%U-%w")
>
> Instead of claiming that there is a flaw in the function you could
> have suggested an 'is.na' method for 'timeDate'.

At the time, I did not know about is.na. I have spent the past hour
trying is.na, but to no avail. I guess that is no surprise to you, but
that it would fail is not reflected in the R documentation of is.na.
That mentions S3, but not S4. As I just recently started using R, I
have not yet looked at what S3 and S4 are, so that is a few more hours
of study before I get this problem solved.

> I will add an 'is.na' method in the dev version of 'timeDate'.

Thanks. I'll benefit from that once it makes it into the production
release. In the meantime, I need to find a way to make something
similar now, in my script.

Thanks

Ted
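For comparison, base R already offers both behaviours described above: arithmetic on NA yields NA, much like NULL in SQL, and tryCatch() lets a loop survive a call that signals an error. The following is only a minimal sketch. `risky_diff` is a hypothetical stand-in for any call that errors on some inputs, not a timeDate function.

```r
# NA propagates through arithmetic, much like NULL in SQL.
print(NA_real_ + 1)

# Hypothetical stand-in for a call that errors on some inputs.
risky_diff <- function(x) {
  if (is.na(x)) stop("cannot take a difference with a missing date")
  x - 1
}

inputs <- c(5, NA, 3)
results <- numeric(length(inputs))
for (i in seq_along(inputs)) {
  # On error, record NA and carry on instead of aborting the loop.
  results[i] <- tryCatch(risky_diff(inputs[i]),
                         error = function(e) NA_real_)
}
print(results)
```

Wrapping the failing call this way keeps the remaining records flowing through the loop, and the NA entries can be inspected afterwards with is.na().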
Re: [R] Mystery Error in midnightStandard
>>>> "TB" == Ted Byers <r.ted.by...@gmail.com>
>>>> on Wed, 28 Jan 2009 11:25:55 -0500

TB> That the two behave the same doesn't change the assessment that
TB> the design is flawed. That doesn't mean that the function is
TB> wrong. It means only that the behaviour can be made more
TB> useful. For example, in SQL, if a given calculation returns
TB> NULL, and the result is subsequently used in another
TB> calculation, the result that returns is also NULL. That is
TB> quite useful, and admits algorithms that can react appropriately
TB> to NULLs when necessary. That is arguably better than forcing
TB> the code to fail the moment a NULL is used in a secondary
TB> calculation. In C++, OTOH, one can catch the problem earlier
TB> using, e.g., exceptions, again allowing the program to complete
TB> even when problems arise for certain values or combinations
TB> thereof.
TB>
TB> As a software engineer, I understand the issues involved in
TB> creating libraries. If I want to incorporate the functionality
TB> of a given standard suite of functions (e.g. ANSI C standard
TB> library functions, or posix functions), my first step would be
TB> to ensure I can duplicate how they behave. But I would not stop
TB> there. There are, for example, serious design flaws in many
TB> ANSI C functions that, ignored, introduce serious security
TB> defects in applications that use them. I would therefore
TB> refactor them to eliminate the security defects. If they can
TB> not be eliminated, I would replace the function in question by a
TB> similar function that does not have that security defect.
TB>
TB> Posix is a useful, but old, standard, and I am merely suggesting
TB> that once you have duplicated it, look beyond it to ways it can
TB> be improved upon. There is more to the design of a function
TB> than whether or not it gives the right result with good input.
TB> There is how it behaves when there is a problem with the inputs
TB> and whether or not you force the calling code to die when a
TB> problem arises or you give the calling code a way to react to
TB> such problems. When I add functions to my own C++ or Java
TB> libraries, I normally include more bad input data in the unit
TB> tests than good data (though the latter is sufficient to ensure
TB> correct results are invariably obtained), precisely so I can
TB> document how it behaves when there is a problem and give coders
TB> who use it a variety of options to use to deal with them.
TB>
TB> > strptime(datestr, format = "%Y-%U-%w")
TB> >
TB> > Instead of claiming that there is a flaw in the function you
TB> > could have suggested an 'is.na' method for 'timeDate'.
TB>
TB> At the time, I did not know about is.na. I have spent the past
TB> hour trying is.na, but to no avail. I guess that is no surprise
TB> to you, but that it would fail is not reflected in the R
TB> documentation of is.na. That mentions S3, but not S4. As I
TB> just recently started using R, I have not yet looked at what S3
TB> and S4 are, so that is a few more hours of study before I get
TB> this problem solved.
TB>
TB> > I will add an 'is.na' method in the dev version of 'timeDate'.
TB>
TB> Thanks. I'll benefit from that once it makes it into the
TB> production release. In the mean time, I need to find a way to
TB> make something similar now, in my script.

setMethod("is.na", "timeDate", function(x) is.na(as.POSIXct(x)))

TB> Thanks

--
PhD student
Swiss Federal Institute of Technology
Zurich

www.ethz.ch
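With an is.na() method like the one above in place, a loop can test each parsed date before taking a difference. The same guard can be sketched in base R without timeDate, using as.POSIXct() on the strptime() result. `weeks_since` is a hypothetical helper name, and GMT is assumed throughout.

```r
# Hypothetical helper, base R only: weeks elapsed from a parsed date
# string to `now`, or NA when the string cannot be parsed.
weeks_since <- function(now, datestr, fmt) {
  then <- as.POSIXct(strptime(datestr, format = fmt, tz = "GMT"))
  if (is.na(then)) {
    warning("could not parse '", datestr, "'; returning NA")
    return(NA_real_)
  }
  as.numeric(difftime(now, then, units = "weeks"))
}

now <- as.POSIXct("2009-01-27", tz = "GMT")
ok  <- weeks_since(now, "2009-01-13", "%Y-%m-%d")
bad <- suppressWarnings(weeks_since(now, "not-a-date", "%Y-%m-%d"))
print(ok)
print(is.na(bad))
```

Returning NA with a warning, rather than stopping, leaves the choice of a default value to the calling code.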
[R] Mystery Error in midnightStandard
I wasn't even aware I was using midnightStandard. You won't find it in
my script.

Here is the relevant loop:

date1 = timeDate(charvec = Sys.Date(), format = "%Y-%m-%d")
date1
dow = 3;
for (i in 1:length(V4) ) {
  x = read.csv(as.character(V4[[i]]), header = FALSE, na.strings="");
  y = x[,1];
  year = V2[[i]];
  week = V3[[i]];
  dtstr = sprintf("%i-%i-%i",year,week,dow);
  date2 = timeDate(dtstr, format = "%Y-%U-%w");
  resultsdataframe$dt[[i]] <- difftimeDate(date1,date2,units = "weeks");
  fp = fitdistr(y,"exponential");
  print(c(V1[[i]],V2[[i]],V3[[i]],fp$estimate,fp$sd));
  print(c(year,week,date2,resultsdataframe$dt[[i]]));
  resultsdataframe$estimate[[i]] <- fp$estimate;
  resultsdataframe$sd[[i]] <- fp$sd;
}

It fails with a little more than 100 records left in V4.

The full error message is:

Error in midnightStandard(charvec, format) :
  'charvec' has non-NA entries of different number of characters

Until it fails, date2 and resultsdataframe$dt[[i]] get correct values.

str() produces no surprises:

str(resultsdataframe);
'data.frame':   303 obs. of  6 variables:
 $ mid     : int  171 206 206 206 206 206 206 206 206 218 ...
 $ year    : int  2008 2008 2008 2008 2008 2008 2008 2008 2008 2008 ...
 $ week    : int  16 17 18 19 21 26 31 35 51 40 ...
 $ dt      : num  39.9 38.9 37.9 36.9 34.9 ...
 $ estimate: num  Inf 0.25 Inf 0.0408 0.2 ...
 $ sd      : num  Inf 0.1768 Inf 0.0289 0.1414 ...

I would assume the error is related to my new code that manipulates
dates, as it doesn't occur in the earlier version that did not
manipulate dates (the relevant work being done, albeit very slowly,
within the DB).

FTR: The year and week values are generated by MySQL using the YEAR
and WEEK functions applied to timestamps. I do not know if it is
relevant, but the week value, at the point of failure, is 0 (a value
that does not occur earlier in the dataset, but several times
subsequently), and I do not see how a value of 0 for the week
(legitimate in POSIX date formats) could produce the error message I
get.

Any thoughts on what is really wrong, and how to fix it?

Thanks

Ted