Re: [Rd] binary string conversion to a vector (PR#14120)
Just responding to some of the issues in this long post:

(1) Don't rely on the printed form of objects to decide whether or not they are identical. The function str() is very useful in this regard, and sometimes also unclass(). To see whether two objects are identical, use the function identical():

> qvector <- c("0", "0", "0", "1", "1", "0", "1")
> qvector[1]
[1] "0"
> noquote(qvector[1])
[1] 0
> str(noquote(qvector[1]))
Class 'noquote' chr "0"
> as.integer(qvector[1])
[1] 0
> str(as.integer(qvector[1]))
int 0
> identical(noquote(qvector[1]), as.integer(qvector[1]))
[1] FALSE
>

Does this alleviate the concern as to the possibility of a bug in noquote/as.integer? Or were there deeper issues?

(2) To see how some other users of R have packaged up miscellaneous functions that might be of use to other people, look for packages on CRAN with "misc" in their names -- I see almost 10 of them. The problem with just posting snippets of code is that they get lost in all the other posts here, and many long-term R users have dozens if not hundreds of their own functions that are streamlined for their own frequent tasks and style of programming.

(3) Sounds like a great idea to use R to bring statistical rigor into the analysis of the performance of combinatorial optimization algorithms!

(4) install.packages("stringr") works fine for me. Maybe it was a temporary glitch? Have you checked whether you have a valid repository selected? E.g., I have in my .Rprofile:

options(repos=c(CRAN="http://cran.cnr.Berkeley.edu", CRANextra="http://www.stats.ox.ac.uk/pub/RWin"))

Enjoy learning R!

-- Tony Plate

Franc Brglez wrote:
> Hello!
>
> Please accept my sincere apologies for annoying the R development team with
> my post this week. If I were required to register as "a developer" before
> submission, this would not have happened. To rehabilitate myself, please find
> at the bottom of this mail two R-functions, 'string2vector' and
> 'vector2string', with "comments and tests".
> Both functions may go a long way
> towards assisting a number of R-users to make their R-programming more
> productive. I am a novice R-programmer: I started dabbling in R less than two
> months ago, heavily influenced by examples of code I see, including within
> the R.org documents (monkey does what monkey sees). Before posting two
> functions, I would really appreciate constructive edits where they may be
> needed as well as their posting by someone-in-the-know so they will be
> conveniently accessible for R users.
>
> I am very impressed with the potential of R and the community supporting it. I
> just wish I got to R sooner: I am looking to R to better support my work in
> "designed experiments to assess the statistically significant performance of
> combinatorial optimization algorithms on instance isomorphs of NP-hard
> problems" -- for better context of this mouthful, see the few postings under
> http://www.cbl.ncsu.edu:16080/xBed/publications/
> I am working on a tutorial paper where I expect R to play a significant role
> in better explaining and illustrating, code-wise and graphically, the
> concepts discussed in the publications above. I would welcome a co-author
> with experience in R-programming as well as statistics and interests in the
> experimental methods addressed in these publications.
>
> As I elaborate in notes that follow, I was looking at a variety of
> "R-documents" before my "bug" submission. I would appreciate very much if
> some of you could take the time to scan through these notes and respond
> briefly with useful pointers. Here are the headlines:
>
> (1) why I still think there may be a bug with 'noquote' vs 'as.integer'
>
> (2) search on "split string" and "join string"; the missing package
> "stringr"
>
> (3) a take on "Tcl" commands 'split', 'join', 'string', 'append',
> 'foreach'
>
> (4) a take on "R" functions 'string2vector' and 'vector2string'
>
> (5) code and comments for "R" functions 'string2vector' and 'vector2string'
>
> (1) why I still think there may be a bug with 'noquote' vs 'as.integer'
>
>> # MacOSX 10.6.2, R 2.9.1 GUI 1.28 Tiger build 32-bit (5444)
>> qvector
> [1] "0" "0" "0" "1" "1" "0" "1"
>> qvector[1]
> [1] "0"
>> tmp = noquote(qvector[1])
>> tmp
> [1] 0
>> tmp = as.integer(qvector[1])
>> tmp
> [1] 0
>
> When embedded in the function as per my "bug" report, 'noquote' and
> 'as.integer' are no longer equivalent whereas in the example above they
> appear to be equivalent!! I submitted the "function" with print/cat
> statements for sake of illustration.
>
> (2) search on "split string" and "join string"; the missing package "stringr"
>
> http://search.r-project.org/ reveals >orderof 850 messages for search o
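
To make reply point (1) at the top of this thread concrete, here is a minimal sketch of a binary-string round trip together with the identical() check. The helper bodies below are assumptions written for illustration only; they reuse the names 'string2vector' and 'vector2string' from the quoted post but are not the poster's code, which is not shown in this message.

```r
# Hypothetical helpers (NOT the functions from the original post):
# split a string of binary digits into an integer vector, and join it back.
string2vector <- function(s) as.integer(strsplit(s, "", fixed = TRUE)[[1]])
vector2string <- function(v) paste(v, collapse = "")

v <- string2vector("0001101")   # integer vector of digits
s <- vector2string(v)           # back to a single string

# noquote() only changes how a character value prints; the underlying
# value is still the string "0", so it is not identical to the integer 0.
a <- noquote("0")
b <- as.integer("0")
same_object <- identical(a, b)               # compares type, class and value
same_after_conversion <- identical(as.integer(unclass(a)), b)
```

Printing `a` and `b` gives the same `[1] 0` on screen, which is why comparing printed output is misleading; str() or identical() shows the difference.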
[Rd] daylight saving / time zone issues with as.POSIXlt/as.POSIXct (PR#10392)
Running under Windows XP 64 bit, as.POSIXlt()/as.POSIXct() seem to think that US time zones (EST5EDT, MST7MDT) switched from daylight savings back to standard time on Oct 28, 2007, whereas the switch is actually on Sun Nov 04, 2007.

Examples:

> Sys.timezone()
[1] "Mountain Daylight Time"
> as.POSIXct("2007-10-30 12:38:47")
[1] "2007-10-30 12:38:47 Mountain Daylight Time"
> # *** Should report 2007-10-30 14:38:47 EDT:
> as.POSIXlt(as.POSIXct("2007-10-30 12:38:47"), "EST5EDT")
[1] "2007-10-30 13:38:47 EST"
> Sys.time()
[1] "2007-11-01 09:22:28 Mountain Daylight Time"
> # Bad behavior is manifested in different ways with TZ="MST7MDT"
> Sys.setenv(TZ="MST7MDT")
> # *** Should report "12:38:47 MDT"
> as.POSIXct("2007-10-30 12:38:47")
[1] "2007-10-30 12:38:47 MST"
> as.POSIXlt(as.POSIXct("2007-10-30 12:38:47"), "EST5EDT")
[1] "2007-10-30 14:38:47 EST"
> # *** Should report "2007-11-01 09:23:09 MDT"
> Sys.time()
[1] "2007-11-01 08:23:09 MST"
>
> sessionInfo()
R version 2.6.0 Patched (2007-10-11 r43143)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base
>

Furthermore, with the timezone "Mountain Daylight Time" (which is the default I get when I start R), the switch appears to be on Nov 5 in 2006, whereas it actually was on Oct 29 in 2006.

> # New R session
> Sys.timezone()
[1] "Mountain Daylight Time"
> # *** wrong switch in 2006 ***
> as.POSIXct("2006-10-30 12:38:47")+(-4:7)*(24*3600)
[1] "2006-10-26 12:38:47 Mountain Daylight Time"
[2] "2006-10-27 12:38:47 Mountain Daylight Time"
[3] "2006-10-28 12:38:47 Mountain Daylight Time"
[4] "2006-10-29 12:38:47 Mountain Daylight Time"
[5] "2006-10-30 12:38:47 Mountain Daylight Time"
[6] "2006-10-31 12:38:47 Mountain Daylight Time"
[7] "2006-11-01 12:38:47 Mountain Daylight Time"
[8] "2006-11-02 12:38:47 Mountain Daylight Time"
[9] "2006-11-03 12:38:47 Mountain Daylight Time"
[10] "2006-11-04 12:38:47 Mountain Daylight Time"
[11] "2006-11-05 11:38:47 Mountain Standard Time"
[12] "2006-11-06 11:38:47 Mountain Standard Time"
> as.POSIXct("2007-10-30 12:38:47")+(-4:7)*(24*3600)
[1] "2007-10-26 12:38:47 Mountain Daylight Time"
[2] "2007-10-27 12:38:47 Mountain Daylight Time"
[3] "2007-10-28 12:38:47 Mountain Daylight Time"
[4] "2007-10-29 12:38:47 Mountain Daylight Time"
[5] "2007-10-30 12:38:47 Mountain Daylight Time"
[6] "2007-10-31 12:38:47 Mountain Daylight Time"
[7] "2007-11-01 12:38:47 Mountain Daylight Time"
[8] "2007-11-02 12:38:47 Mountain Daylight Time"
[9] "2007-11-03 12:38:47 Mountain Daylight Time"
[10] "2007-11-04 11:38:47 Mountain Standard Time"
[11] "2007-11-05 11:38:47 Mountain Standard Time"
[12] "2007-11-06 11:38:47 Mountain Standard Time"
> Sys.setenv(TZ="MST7MDT")
> Sys.timezone()
[1] "MST"
> as.POSIXct("2006-10-30 12:38:47")+(-4:7)*(24*3600)
[1] "2006-10-26 13:38:47 MDT" "2006-10-27 13:38:47 MDT"
[3] "2006-10-28 13:38:47 MDT" "2006-10-29 12:38:47 MST"
[5] "2006-10-30 12:38:47 MST" "2006-10-31 12:38:47 MST"
[7] "2006-11-01 12:38:47 MST" "2006-11-02 12:38:47 MST"
[9] "2006-11-03 12:38:47 MST" "2006-11-04 12:38:47 MST"
[11] "2006-11-05 12:38:47 MST" "2006-11-06 12:38:47 MST"
> # *** wrong switch in 2007 ***
> as.POSIXct("2007-10-30 12:38:47")+(-4:7)*(24*3600)
[1] "2007-10-26 13:38:47 MDT" "2007-10-27 13:38:47 MDT"
[3] "2007-10-28 12:38:47 MST" "2007-10-29 12:38:47 MST"
[5] "2007-10-30 12:38:47 MST" "2007-10-31 12:38:47 MST"
[7] "2007-11-01 12:38:47 MST" "2007-11-02 12:38:47 MST"
[9] "2007-11-03 12:38:47 MST" "2007-11-04 12:38:47 MST"
[11] "2007-11-05 12:38:47 MST" "2007-11-06 12:38:47 MST"
>

I see this behavior on all the Windows systems I have tried: Windows XP 64 bit, Windows XP 32 bit Pro, Windows XP home, Windows 2000, with a variety of R versions. The systems have all relevant Windows updates applied (unless some were inadvertently missed) and the systems otherwise appear to behave correctly with respect to times and timezones. I do not see this problem on Ubuntu Linux systems.

-- Tony Plate

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
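
For anyone wanting to check the transition date on their own system, a small sketch follows. It assumes the platform provides Olson-style zone names ("America/Denver" and "America/New_York" are my choice here, not names used in the report above) and an up-to-date time zone database; results on a system with stale rules may differ, which is exactly the symptom reported.

```r
# Noon local time on each day around the expected 2007 transition.
days <- seq(as.Date("2007-10-26"), as.Date("2007-11-06"), by = "day")
noon <- as.POSIXct(paste(days, "12:00:00"), tz = "America/Denver")

# isdst is 1 while daylight saving time is in effect, 0 for standard time.
isdst <- as.POSIXlt(noon)$isdst
first_standard_day <- days[which(isdst == 0)[1]]

# The same instant shown in the Eastern zone (the report expects 14:38:47 EDT).
x <- as.POSIXct("2007-10-30 12:38:47", tz = "America/Denver")
eastern <- format(x, "%Y-%m-%d %H:%M:%S %Z", tz = "America/New_York")
```

With correct 2007 US rules, `first_standard_day` should be 2007-11-04; a value of 2007-10-28 would indicate that the pre-2007 rules are being applied.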
[Rd] minor omissions re "special" windows file names (PR#9622)
"lpt1" through "lpt9" and "com1" through "com9" are in fact "special" file names under various versions of Microsoft Windows. However, "R Extensions" says only that "lpt1" through "lpt4" and "com1" through "com3" cannot be used as filenames, and "R CMD check" checks for just these filenames. Consequently, "R CMD check" can let through filenames that are disallowed under Windows. For references, see http://en.wikipedia.org/wiki/Filename and http://msdn.microsoft.com/library/default.asp?url=/library/en-us/fileio/fs/naming_a_file.asp (FWIW, note however that Wikipedia *may* be mistaken in claiming that "com0" and "lpt0" are special file names -- these are not listed on the Microsoft page, and I verified that they can be used as normal file names on my Windows XP machine at least.) Accordingly, the following changes might be desired: In src/scripts/check.in: OLD LINE: if(grep(/^(con|prn|aux|clock\$|nul|lpt[1-3]|com[1-4])$/, NEW LINE: if(grep(/^(con|prn|aux|clock\$|nul|lpt[1-9]|com[1-9])$/, ALSO: fix comment at beginning of check.in (search for "com1") In doc/manuals/R-exts.texi, I would suggest changing - names. In addition, files with names @samp{con}, @samp{prn}, @samp{aux}, @samp{clock$}, @samp{nul}, @samp{com1} to @samp{com4}, and @samp{lpt1} to @samp{lpt3} after conversion to lower case and stripping possible ``extensions'', are disallowed. Also, file names in the same - to - names. In addition, files with names @samp{con}, @samp{prn}, @samp{aux}, @samp{clock$}, @samp{nul}, @samp{com1} to @samp{com9}, and @samp{lpt1} to @samp{lpt9} after conversion to lower case and stripping possible ``extensions'', are disallowed (e.g., @samp{lpt5.foo.bar} is disallowed.) Also, file names in the same - (with the example added to clarify what "stripping possible extensions" means). Obviously, not a high-priority error. -- Tony Plate __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] bug in format.default: trim=TRUE does not always work as advertised (PR#9114)
DESCRIPTION OF PROBLEM:

Output from format.default sometimes has whitespace around it when using big.mark="," and trim=TRUE. E.g.:

> # works ok as long as big.mark is not actually used:
> format(c(-1,1,10,999), big.mark=",", trim=TRUE)
[1] "-1" "1" "10" "999"
> # but if big.mark is used, output is justified and not trimmed:
> format(c(-1,1,10,999,1e6), big.mark=",", trim=TRUE)
[1] " -1" "1" " 10" " 999" "1,000,000"
>

The documentation for the argument 'trim' to format.default() states:

trim: logical; if 'FALSE', logical, numeric and complex values are right-justified to a common width: if 'TRUE' the leading blanks for justification are suppressed.

Thus, the above behavior of including blanks for justification when trim=TRUE (in some situations) seems to contradict the documentation.

PROPOSED FIX:

The last call to prettyNum() in format.default() (src/library/base/R/format.R) has the argument

preserve.width = "common"

If this is changed to

preserve.width = if (trim) "individual" else "common"

then output is formatted correctly in the case above.

A patch for this one line is attached to this message (patch done against the released R-2.3.1 source tarball (2006/06/01); the format.R file in this release does not differ from the one in the current snapshot of the devel version of R).

After making these changes, I ran "make check-all". I did not see any tests that seemed to break with these changes.

-- Tony Plate

Attachment: format.default.fixes.diff

--- R-2.3.1-orig/src/library/base/R/format.R 2006-04-09 16:19:19.0 -0600
+++ R-2.3.1/src/library/base/R/format.R 2006-07-26 15:52:42.117456700 -0600
@@ -37,7 +37,7 @@
 small.interval = small.interval,
 decimal.mark = decimal.mark,
 zero.print = zero.print,
-preserve.width = "common")
+ preserve.width = if (trim) "individual" else "common")
 )
 }
 }
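
Independently of whether a given R version includes a fix along the lines of the patch above, a version-agnostic workaround is to strip the padding after formatting. This is a sketch only: it assumes trimws() is available (it was added to base R well after the version discussed here), and scientific = FALSE is added by me so the large value is certain to be written out in full.

```r
x <- c(-1, 1, 10, 999, 1e6)

# Format with thousands separators, then remove any justification blanks.
trimmed <- trimws(format(x, big.mark = ",", scientific = FALSE))
```

The result has no leading or trailing blanks regardless of how format() chose to justify the elements.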
Re: [Rd] (PR#7943) documentation enhancement: Note in ?seek for Windows
Fair enough. But for the benefit of the unfortunate souls having to work in Windows, it would be nice if the documentation were as explicit as possible about what is and is not known about particular issues (while still being concise). How about the following:

NEW:

#ifdef windows
The value returned by \code{seek()} is known to be unreliable on
Windows systems for text mode files. The Windows documentation
states that the return values from Windows OS seek functions for
text mode files are unreliable (because the Windows file-I/O
functions can insert extra characters at end-of-lines when working
with text mode files.) Binary mode files should not be affected by
this particular issue, but there are known problems on Windows
systems with the reliability of the return value from \code{seek()}
for binary mode files opened in append mode. Clipboard connections
can seek too.
#endif

Tony Plate

Prof Brian Ripley wrote:
> I think the proposed change is appropriate only if the return value is
> *known* to be reliable for binary files.
>
> I for one do not trust the writers of an OS who have made such a
> serious error in one mode (and many other errors elsewhere) not to have
> made one in closely related code. Since it is not Open Source, we
> cannot find out.
>
> On Wed, 15 Jun 2005 [EMAIL PROTECTED] wrote:
>
>> [I started a new bug report for this issue because it was not the
>> primary issue in the original discussion, which was PR#7899]
>>
>> [EMAIL PROTECTED] wrote:
>> > Tony Plate wrote:
>> > [snip]
>> >>[EMAIL PROTECTED] wrote:
>> >>[snip]
>> >>>Note that ?seek currently tells us "The value returned by
>> >>>seek(where=NA) appears to be unreliable on Windows systems, at least
>> >>>for text files."
>> >>>It would be nice if this comment could be removed, of course
>> >>
>> >>
>> >>Maybe the explanation could be given that this happens with text files
>> >>because Windows inserts extra characters at end-of-lines when reading
>> >>"text" mode files (but with binary files, things should be fine.) This
>> >>particular issue is documented in Microsoft Windows documentation (e.g.,
>> >>at http://msdn2.microsoft.com/library/75yw9bf3(en-us,vs.80).aspx, found
>> >>by searching on Google using the terms "fseek windows documentation").
>> >>Are there any known issues using seek with binary files under Windows?
>> >>If there are not, then the caveat could be made specific to text files
>> >>and all vagueness removed.
>> >
>> >
>> > Hmm, all I find (including your link) is Windows CE related ...
>> >
>> > Uwe Ligges
>>
>> For the record, the documentation I pointed to is for Windows 2000 etc,
>> and is not just related to Windows CE (Uwe retracted that claim in a
>> private email).
>>
>> So, the suggestion to refine the note in ?seek stands. Perhaps
>> src/library/base/man/seek.Rd could be changed as follows:
>>
>> OLD:
>>
>> #ifdef windows
>> The value returned by \code{seek(where=NA)} appears to be unreliable
>> on Windows systems, at least for text files. Clipboard connections
>> can seek too.
>> #endif
>>
>> NEW:
>>
>> #ifdef windows
>> The value returned by \code{seek()} is unreliable
>> on Windows systems for text files. This is because the Windows
>> file-I/O functions can insert extra characters at end-of-lines
>> when working with text mode files. Binary mode files should not
>> be affected by this issue. Clipboard connections can seek too.
>> #endif
>>
>> Of course, if someone knows that the return value of seek() is
>> unreliable on Windows for binary files, this documentation change is
>> inappropriate (and then the documentation should probably be changed
>> from "appears to be unreliable, at least for text files" to "is
>> unreliable, for both binary and text files").
>>
>> -- Tony Plate
[Rd] documentation enhancement: Note in ?seek for Windows (PR#7943)
[I started a new bug report for this issue because it was not the primary issue in the original discussion, which was PR#7899]

[EMAIL PROTECTED] wrote:
> Tony Plate wrote:
> [snip]
>>[EMAIL PROTECTED] wrote:
>>[snip]
>>>Note that ?seek currently tells us "The value returned by
>>>seek(where=NA) appears to be unreliable on Windows systems, at least
>>>for text files."
>>>It would be nice if this comment could be removed, of course
>>
>>
>>Maybe the explanation could be given that this happens with text files
>>because Windows inserts extra characters at end-of-lines when reading
>>"text" mode files (but with binary files, things should be fine.) This
>>particular issue is documented in Microsoft Windows documentation (e.g.,
>>at http://msdn2.microsoft.com/library/75yw9bf3(en-us,vs.80).aspx, found
>>by searching on Google using the terms "fseek windows documentation").
>>Are there any known issues using seek with binary files under Windows?
>>If there are not, then the caveat could be made specific to text files
>>and all vagueness removed.
>
>
> Hmm, all I find (including your link) is Windows CE related ...
>
> Uwe Ligges

For the record, the documentation I pointed to is for Windows 2000 etc, and is not just related to Windows CE (Uwe retracted that claim in a private email).

So, the suggestion to refine the note in ?seek stands. Perhaps src/library/base/man/seek.Rd could be changed as follows:

OLD:

#ifdef windows
The value returned by \code{seek(where=NA)} appears to be unreliable
on Windows systems, at least for text files. Clipboard connections
can seek too.
#endif

NEW:

#ifdef windows
The value returned by \code{seek()} is unreliable
on Windows systems for text files. This is because the Windows
file-I/O functions can insert extra characters at end-of-lines
when working with text mode files. Binary mode files should not
be affected by this issue. Clipboard connections can seek too.
#endif

Of course, if someone knows that the return value of seek() is unreliable on Windows for binary files, this documentation change is inappropriate (and then the documentation should probably be changed from "appears to be unreliable, at least for text files" to "is unreliable, for both binary and text files").

-- Tony Plate
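
For readers unfamiliar with what the return value of seek() is supposed to be, here is a small sketch using a binary-mode file connection. It only illustrates the documented behaviour (seek() with no 'where' reports the current position; seek() with 'where' repositions and returns the previous position); it makes no claim about reliability on Windows, which is the open question in this thread. The temporary file and its contents are made up for the example.

```r
# Write ten known bytes to a temporary file.
tmp <- tempfile()
writeBin(as.raw(1:10), tmp)

con <- file(tmp, open = "rb")           # binary mode, read-only
first4 <- readBin(con, "raw", n = 4)    # consume four bytes
pos_after_read <- seek(con)             # report current read position
prev <- seek(con, where = 0)            # rewind; returns the previous position
first_again <- readBin(con, "raw", n = 1)
close(con)
unlink(tmp)
```

Comparing `pos_after_read` with the number of bytes actually read is a simple way to test the reported position on a given platform.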