Re: [Rd] binary string conversion to a vector (PR#14120)

2009-12-12 Thread tplate
Just responding to some of the issues in this long post:

(1) Don't rely on the printed form of two objects to decide whether or not 
they are identical.  The function str() is very useful in this regard, 
and sometimes also unclass().  To see whether two objects are identical, 
use the function identical():

 > qvector <- c("0", "0", "0", "1", "1", "0", "1")
 > qvector[1]
[1] "0"
 > noquote(qvector[1])
[1] 0
 > str(noquote(qvector[1]))
Class 'noquote'  chr "0"
 > as.integer(qvector[1])
[1] 0
 > str(as.integer(qvector[1]))
 int 0
 > identical(noquote(qvector[1]), as.integer(qvector[1]))
[1] FALSE
 >

Does this alleviate the concern as to the possibility of a bug in 
noquote/as.integer? Or were there deeper issues?
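
For the conversion named in the subject line, a minimal sketch follows. The function names 'string2vector' and 'vector2string' come from the quoted post, but the bodies here are illustrative assumptions, not the code that was posted. The result is checked with str() and identical() rather than by looking at printed output:

```r
# Hypothetical implementations; NOT the code from the original post.
string2vector <- function(s) {
  # Split the string into single characters, then convert to integers.
  as.integer(strsplit(s, "")[[1]])
}

vector2string <- function(v) {
  # Paste the elements back together with no separator.
  paste(v, collapse = "")
}

v <- string2vector("0001101")
str(v)                                       # shows the type, not just the digits
identical(v, c(0L, 0L, 0L, 1L, 1L, 0L, 1L))  # exact comparison, type included
identical(vector2string(v), "0001101")       # round trip back to the string
```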

(2) To see how some other users of R have packaged up miscellaneous 
functions that might be of use to other people, look for packages on 
CRAN with "misc" in their names -- I see almost 10 of them.  The problem 
with just posting snippets of code is that they get lost in all the 
other posts here, and many long term R users have dozens if not hundreds 
of their own functions that are streamlined for their own frequent tasks 
and style of programming.

(3) sounds like a great idea to use R to bring statistical rigor into 
the analysis of the performance of combinatorial optimization algorithms!

(4) install.packages("stringr") works fine for me.  Maybe it was a 
temporary glitch?  Have you checked whether you have a valid repository 
selected?  E.g., I have in my .Rprofile:
options(repos=c(CRAN="http://cran.cnr.Berkeley.edu",
CRANextra="http://www.stats.ox.ac.uk/pub/RWin"))
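
One way to check the repository setting from within a session is sketched below. It assumes that an unset CRAN entry shows up as the placeholder "@CRAN@", and the fallback URL is only an example mirror, not a recommendation from this thread:

```r
# Inspect the repositories that install.packages() would use.
repos <- getOption("repos")
print(repos)

# Extract the CRAN entry, if any (NA when the option is missing).
cran <- if (is.null(repos)) NA_character_ else unname(repos["CRAN"])

# "@CRAN@" is the placeholder R uses before a mirror has been chosen.
if (is.na(cran) || cran == "@CRAN@") {
  # Example mirror only; substitute whichever mirror you prefer.
  options(repos = c(CRAN = "https://cloud.r-project.org"))
}
```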

Enjoy learning R!

-- Tony Plate

Franc Brglez wrote:
> Hello!
>  
> Please accept my sincere apologies for annoying the R development team with 
> my post this week. If I were required to register as "a developer" before 
> submission, this would not have happened. To rehabilitate myself, please find 
> at the bottom of this mail two R-functions, 'string2vector' and 
> 'vector2string', with "comments and tests". Both functions may go a long way 
> towards assisting a number of R-users to make their R-programming more 
> productive. I am a novice R-programmer: I started dabbling in R less than two 
> months ago, heavily influenced by examples of code I see, including within 
> the R.org documents (monkey does what monkey sees). Before posting two 
> functions, I would really appreciate constructive edits where they may be 
> needed as well as their posting by someone-in-the-know so they will be 
> conveniently accessible for R users.
>
> I am very impressed with potential of R and the community supporting it. I 
> just wish I got to R sooner: I am looking to R to better support my work in 
> "designed experiments to assess the statistically significant performance of 
> combinatorial optimization algorithms on instance isomorphs of NP-hard 
> problems" -- for better context of this mouthful, see the few postings under
>   http://www.cbl.ncsu.edu:16080/xBed/publications/
> I am working on a tutorial paper where I expect R to play a significant role 
> in better explaining and illustrating, code-wise and graphically, the 
> concepts discussed in the publications above. I would welcome a co-author 
> with experience in R-programming as well as statistics and interests in the 
> experimental methods addressed in these publications.
>
> As I elaborate in notes that follow, I was looking at a variety of 
> "R-documents" before my "bug" submission. I would appreciate very much if 
> some of you could take the time to scan through these notes and respond 
> briefly with useful pointers. Here are the headlines:
>
> (1) why I still think there may be a bug with 'noquote' vs 'as.integer'
>
> (2) search on "split string" and "join string"; the missing package 
> "stringr"
>
> (3) a take on "Tcl" commands 'split', 'join', 'string', 'append', 
> 'foreach'
>
> (4) a take on "R" functions 'string2vector' and 'vector2string'
>
> (5) code and comments for "R" functions 'string2vector' and 'vector2string'
>
> (1) why I still think there may be a bug with 'noquote' vs 'as.integer'
> 
>   
>> # MacOSX 10.6.2, R 2.9.1 GUI 1.28 Tiger build 32-bit (5444)
>> qvector
>> 
> [1] "0" "0" "0" "1" "1" "0" "1"
>   
>> qvector[1]
>> 
> [1] "0"
>   
>> tmp = noquote(qvector[1])
>> tmp
>> 
> [1] 0
>   
>> tmp = as.integer(qvector[1])
>> tmp
>> 
> [1] 0
>   
> When embedded in the function as per my "bug" report, 'noquote' and 
> 'as.integer' are no longer equivalent whereas in the example above they 
> appear to be equivalent!! I submitted the "function" with print/cat 
> statements for sake of illustration.
>
> (2) search on "split string" and "join string"; the missing package "stringr"
> 
> http://search.r-project.org/ reveals
>orderof 850 messages for search o

[Rd] daylight saving / time zone issues with as.POSIXlt/as.POSIXct (PR#10392)

2007-11-01 Thread tplate
Running under Windows XP 64 bit, as.POSIXlt()/as.POSIXct() seem
to think that US time zones (EST5EDT, MST7MDT) switched from daylight
savings back to standard time on Oct 28, 2007, whereas the switch
is actually on Sun Nov 04, 2007.

Examples:

 > Sys.timezone()
[1] "Mountain Daylight Time"
 > as.POSIXct("2007-10-30 12:38:47")
[1] "2007-10-30 12:38:47 Mountain Daylight Time"
 > # *** Should report 2007-10-30 14:38:47 EDT:
 > as.POSIXlt(as.POSIXct("2007-10-30 12:38:47"), "EST5EDT")
[1] "2007-10-30 13:38:47 EST"
 > Sys.time()
[1] "2007-11-01 09:22:28 Mountain Daylight Time"

 > # Bad behavior is manifested in different ways with TZ="MST7MDT"
 > Sys.setenv(TZ="MST7MDT")
 > # *** Should report "12:38:47 MDT"
 > as.POSIXct("2007-10-30 12:38:47")
[1] "2007-10-30 12:38:47 MST"
 > as.POSIXlt(as.POSIXct("2007-10-30 12:38:47"), "EST5EDT")
[1] "2007-10-30 14:38:47 EST"
 > # *** Should report "2007-11-01 09:23:09 MDT"
 > Sys.time()
[1] "2007-11-01 08:23:09 MST"
 >
 > sessionInfo()
R version 2.6.0 Patched (2007-10-11 r43143)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United 
States.1252;LC_MONETARY=English_United 
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base
 >


Furthermore, with the timezone "Mountain Daylight Time"
(which is the default I get when I start R), the switch
appears to be on Nov 5 in 2006, whereas it actually was
on Oct 29 in 2006.

 > # New R session
 > Sys.timezone()
[1] "Mountain Daylight Time"
 > # *** wrong switch in 2006 ***
 > as.POSIXct("2006-10-30 12:38:47")+(-4:7)*(24*3600)
  [1] "2006-10-26 12:38:47 Mountain Daylight Time"
  [2] "2006-10-27 12:38:47 Mountain Daylight Time"
  [3] "2006-10-28 12:38:47 Mountain Daylight Time"
  [4] "2006-10-29 12:38:47 Mountain Daylight Time"
  [5] "2006-10-30 12:38:47 Mountain Daylight Time"
  [6] "2006-10-31 12:38:47 Mountain Daylight Time"
  [7] "2006-11-01 12:38:47 Mountain Daylight Time"
  [8] "2006-11-02 12:38:47 Mountain Daylight Time"
  [9] "2006-11-03 12:38:47 Mountain Daylight Time"
[10] "2006-11-04 12:38:47 Mountain Daylight Time"
[11] "2006-11-05 11:38:47 Mountain Standard Time"
[12] "2006-11-06 11:38:47 Mountain Standard Time"
 > as.POSIXct("2007-10-30 12:38:47")+(-4:7)*(24*3600)
  [1] "2007-10-26 12:38:47 Mountain Daylight Time"
  [2] "2007-10-27 12:38:47 Mountain Daylight Time"
  [3] "2007-10-28 12:38:47 Mountain Daylight Time"
  [4] "2007-10-29 12:38:47 Mountain Daylight Time"
  [5] "2007-10-30 12:38:47 Mountain Daylight Time"
  [6] "2007-10-31 12:38:47 Mountain Daylight Time"
  [7] "2007-11-01 12:38:47 Mountain Daylight Time"
  [8] "2007-11-02 12:38:47 Mountain Daylight Time"
  [9] "2007-11-03 12:38:47 Mountain Daylight Time"
[10] "2007-11-04 11:38:47 Mountain Standard Time"
[11] "2007-11-05 11:38:47 Mountain Standard Time"
[12] "2007-11-06 11:38:47 Mountain Standard Time"
 > Sys.setenv(TZ="MST7MDT")
 > Sys.timezone()
[1] "MST"
 > as.POSIXct("2006-10-30 12:38:47")+(-4:7)*(24*3600)
  [1] "2006-10-26 13:38:47 MDT" "2006-10-27 13:38:47 MDT"
  [3] "2006-10-28 13:38:47 MDT" "2006-10-29 12:38:47 MST"
  [5] "2006-10-30 12:38:47 MST" "2006-10-31 12:38:47 MST"
  [7] "2006-11-01 12:38:47 MST" "2006-11-02 12:38:47 MST"
  [9] "2006-11-03 12:38:47 MST" "2006-11-04 12:38:47 MST"
[11] "2006-11-05 12:38:47 MST" "2006-11-06 12:38:47 MST"
 > # *** wrong switch in 2007 ***
 > as.POSIXct("2007-10-30 12:38:47")+(-4:7)*(24*3600)
  [1] "2007-10-26 13:38:47 MDT" "2007-10-27 13:38:47 MDT"
  [3] "2007-10-28 12:38:47 MST" "2007-10-29 12:38:47 MST"
  [5] "2007-10-30 12:38:47 MST" "2007-10-31 12:38:47 MST"
  [7] "2007-11-01 12:38:47 MST" "2007-11-02 12:38:47 MST"
  [9] "2007-11-03 12:38:47 MST" "2007-11-04 12:38:47 MST"
[11] "2007-11-05 12:38:47 MST" "2007-11-06 12:38:47 MST"
 >

I see this behavior on all the Windows systems I have tried:
Windows XP 64 bit, Windows XP 32 bit Pro, Windows XP home,
Windows 2000, with a variety of R versions.  The systems
have all relevant Windows updates applied (unless some were
inadvertently missed) and the systems otherwise appear to
behave correctly with respect to times and timezones.

I do not see this problem on Ubuntu Linux systems.
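
For comparison, here is a small sketch of how the expected behavior can be checked with explicit time zone names. It uses the Olson names "America/Denver" and "America/New_York", which are an assumption on my part (the report above uses "MST7MDT" and "EST5EDT"), and it relies on the system having a correct time zone database:

```r
# A time a few days before the 2007 switch back to standard time (Nov 4).
x <- as.POSIXct("2007-10-30 12:38:47", tz = "America/Denver")

# Abbreviation in effect at that instant; daylight time is expected.
zone_denver <- format(x, "%Z")

# The same instant shown in US Eastern time.
eastern <- format(x, tz = "America/New_York", usetz = TRUE)

# Noon on each of Nov 1..6, 2007, to locate the day the abbreviation changes.
days  <- as.POSIXct(sprintf("2007-11-%02d 12:00:00", 1:6), tz = "America/Denver")
zones <- format(days, "%Z")
first_standard_day <- format(days[which(zones == "MST")[1]], "%Y-%m-%d")

print(zone_denver)
print(eastern)
print(first_standard_day)
```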

-- Tony Plate

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] minor omissions re "special" windows file names (PR#9622)

2007-04-18 Thread tplate
"lpt1" through "lpt9" and "com1" through "com9"  are in fact "special" 
file names under various versions of Microsoft Windows.  However, "R 
Extensions" says only that "lpt1" through "lpt3" and "com1" through 
"com4" cannot be used as filenames, and "R CMD check" checks for just 
these filenames.  Consequently, "R CMD check" can let through filenames 
that are disallowed under Windows.

For references, see http://en.wikipedia.org/wiki/Filename and 
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/fileio/fs/naming_a_file.asp
(FWIW, note however that Wikipedia *may* be mistaken in claiming that 
"com0" and "lpt0" are special file names -- these are not listed on the 
Microsoft page, and I verified that they can be used as normal file 
names on my Windows XP machine at least.)

Accordingly, the following changes might be desired:

In src/scripts/check.in:
OLD LINE:
  if(grep(/^(con|prn|aux|clock\$|nul|lpt[1-3]|com[1-4])$/,
NEW LINE:
 if(grep(/^(con|prn|aux|clock\$|nul|lpt[1-9]|com[1-9])$/,
ALSO: fix comment at beginning of check.in (search for "com1")

In doc/manuals/R-exts.texi, I would suggest changing
-
names.  In addition, files with names @samp{con}, @samp{prn},
@samp{aux}, @samp{clock$}, @samp{nul}, @samp{com1} to @samp{com4}, and
@samp{lpt1} to @samp{lpt3} after conversion to lower case and stripping
possible ``extensions'', are disallowed.  Also, file names in the same
-
to
-
names.  In addition, files with names @samp{con}, @samp{prn},
@samp{aux}, @samp{clock$}, @samp{nul}, @samp{com1} to @samp{com9}, and
@samp{lpt1} to @samp{lpt9} after conversion to lower case and stripping
possible ``extensions'', are disallowed
(e.g., @samp{lpt5.foo.bar} is disallowed.)  Also, file names in the same
-

(with the example added to clarify what "stripping possible extensions" 
means).
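
To illustrate the proposed rule, here is a rough R sketch of the same test. The real check lives in the Perl script check.in; the function name below is hypothetical, and "stripping possible extensions" is taken to mean dropping everything from the first dot onward:

```r
# Hypothetical helper; mirrors the proposed pattern, not the actual check code.
is_reserved_windows_name <- function(path) {
  base <- tolower(basename(path))   # compare after conversion to lower case
  stem <- sub("\\..*$", "", base)   # drop everything from the first dot on
  grepl("^(con|prn|aux|clock\\$|nul|lpt[1-9]|com[1-9])$", stem)
}

candidates <- c("lpt5.foo.bar", "COM9", "lpt0", "console.R")
res <- is_reserved_windows_name(candidates)
print(setNames(res, candidates))
```

Under these assumptions "lpt5.foo.bar" and "COM9" are flagged, while "lpt0" and "console.R" are not.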

Obviously, not a high-priority error.

-- Tony Plate



[Rd] bug in format.default: trim=TRUE does not always work as advertised (PR#9114)

2006-07-31 Thread tplate
DESCRIPTION OF PROBLEM:

Output from format.default sometimes has whitespace around it when using
big.mark="," and trim=TRUE.  E.g.:

  > # works ok as long as big.mark is not actually used:
  > format(c(-1,1,10,999), big.mark=",", trim=TRUE)
[1] "-1"  "1"   "10"  "999"
  > # but if big.mark is used, output is justified and not trimmed:
  > format(c(-1,1,10,999,1e6), big.mark=",", trim=TRUE)
[1] "       -1" "        1" "       10" "      999" "1,000,000"
  >

The documentation for the argument 'trim' to format.default() states:
 trim: logical; if 'FALSE', logical, numeric and complex values are
   right-justified to a common width: if 'TRUE' the leading
   blanks for justification are suppressed.

Thus, the above behavior of including blanks for justification when 
trim=TRUE (in some situations) seems to contradict the documentation.

PROPOSED FIX:

The last call to prettyNum() in format.default() 
(src/library/base/R/format.R) has the argument

preserve.width = "common"

If this is changed to

preserve.width = if (trim) "individual" else "common"

then output is formatted correctly in the case above.

A patch for this one line is attached to this message (patch done 
against the released R-2.3.1 source tarball (2006/06/01), the format.R 
file in this release is not different to the one in the current snapshot 
of the devel version of R).  After making these changes, I ran "make 
check-all".  I did not see any tests that seemed to  break with these 
changes.
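
Until a fix is in place, the padding can be avoided at the call site. The following is a sketch of two possible workarounds, not part of the patch. The first strips the justification blanks after the fact with trimws() (which only exists in later versions of R). The second uses formatC(), whose preserve.width argument defaults to "individual":

```r
x <- c(-1, 1, 10, 999, 1e6)

# Workaround 1: let format() justify, then strip the surrounding blanks.
out1 <- trimws(format(x, big.mark = ","))

# Workaround 2: formatC() inserts the big marks per element without
# padding to a common width.
out2 <- formatC(x, format = "d", big.mark = ",")

print(out1)
print(out2)
```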

-- Tony Plate


Attached patch (format.default.fixes.diff):

--- R-2.3.1-orig/src/library/base/R/format.R 2006-04-09 16:19:19.0 -0600
+++ R-2.3.1/src/library/base/R/format.R 2006-07-26 15:52:42.117456700 -0600
@@ -37,7 +37,7 @@
 small.interval = small.interval,
 decimal.mark = decimal.mark,
 zero.print = zero.print,
-preserve.width = "common")
+ preserve.width = if (trim) "individual" else "common")
   )
 }
 }



Re: [Rd] (PR#7943) documentation enhancement: Note in ?seek for Windows

2005-06-15 Thread tplate
Fair enough.  But for the benefit of the unfortunate souls having to 
work in Windows, it would be nice if the documentation were as explicit 
as possible about what is and is not known about particular issues 
(while still being concise).  How about the following:

NEW:

#ifdef windows
   The value returned by \code{seek()} is known to be unreliable
   on Windows systems for text mode files.  The Windows documentation
   states that the return values from Windows OS seek functions for
   text mode files are unreliable (because the Windows file-I/O
   functions can insert extra characters at end-of-lines when working
   with text mode files.) Binary mode files should not be affected by
   this particular issue, but there are known problems on Windows
   systems with the reliability of the return value from \code{seek()}
   for binary mode files opened in append mode.  Clipboard connections
   can seek too.
#endif
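
For context, a minimal sketch of the binary-mode usage that the note says should be less affected. The file contents and variable names are made up for illustration, and the sketch makes no claim about how text-mode connections behave on Windows:

```r
# Write ten bytes to a temporary file, then read it back in binary mode.
tmp <- tempfile()
writeBin(as.raw(1:10), tmp)

con <- file(tmp, open = "rb")
first4 <- readBin(con, what = "raw", n = 4)  # consume four bytes

pos <- seek(con)             # query the current read position
prev <- seek(con, where = 0) # rewind; the value returned is the old position
again <- readBin(con, what = "raw", n = 4)

close(con)
unlink(tmp)

print(pos)
```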

Tony Plate


Prof Brian Ripley wrote:
> I think the proposed change is appropriate only if the return value is 
> *known* to be reliable for binary files.
> 
> I for one do not trust the writers of an OS who have made such a 
> serious error in one mode (and many other errors elsewhere) not to have 
> made one in closely related code.  Since it is not Open Source, we 
> cannot find out.
> 
> On Wed, 15 Jun 2005 [EMAIL PROTECTED] wrote:
> 
>> [I started a new bug report for this issue because it was not the
>> primary issue in the original discussion, which was PR#7899]
>>
>> [EMAIL PROTECTED] wrote:
>> > Tony Plate wrote:
>> > [snip]
>> >>[EMAIL PROTECTED] wrote:
>> >>[snip]
>> >>>Note that ?seek currently tells us "The value returned by
>> >>>seek(where=NA) appears to be unreliable on Windows systems, at least
>> >>>for text files."
>> >>>It would be nice if this comment could be removed, of course 
>> >>
>> >>
>> >>Maybe the explanation could be given that this happens with text files
>> >>because Windows inserts extra characters at end-of-lines when reading
>> >>"text" mode files (but with binary files, things should be fine.) This
>> >>particular issue is documented in Microsoft Windows documentation (e.g.,
>> >>at http://msdn2.microsoft.com/library/75yw9bf3(en-us,vs.80).aspx, found
>> >>by searching on Google using the terms "fseek windows documentation").
>> >>Are there any known issues using seek with binary files under Windows?
>> >>If there are not, then the caveat could be made specific to text files
>> >>and all vagueness removed.
>> >
>> >
>> > Hmm, all I find (including your link) is Windows CE related ...
>> >
>> > Uwe Ligges
>>
>> For the record, the documentation I pointed to is for Windows 2000 etc,
>> and is not just related to Windows CE (Uwe retracted that claim in a
>> private email).
>>
>> So, the suggestion to refine the note in ?seek stands.  Perhaps
>> src/library/base/man/seek.Rd could be changed as follows:
>>
>> OLD:
>>
>> #ifdef windows
>>   The value returned by \code{seek(where=NA)} appears to be unreliable
>>   on Windows systems, at least for text files.  Clipboard connections
>>   can seek too.
>> #endif
>>
>> NEW:
>>
>> #ifdef windows
>>   The value returned by \code{seek()} is unreliable
>>   on Windows systems for text files.  This is because the Windows
>>   file-I/O functions can insert extra characters at end-of-lines
>>   when working with text mode files.  Binary mode files should not
>>   be affected by this issue.  Clipboard connections can seek too.
>> #endif
>>
>> Of course, if someone knows that the return value of seek() is
>> unreliable on Windows for binary files, this documentation change is
>> inappropriate (and then the documentation should probably be changed
>> from "appears to be unreliable, at least for text files" to "is
>> unreliable, for both binary and text files").
>>
>> -- Tony Plate
>>
>>
>>
>



[Rd] documentation enhancement: Note in ?seek for Windows (PR#7943)

2005-06-15 Thread tplate
[I started a new bug report for this issue because it was not the 
primary issue in the original discussion, which was PR#7899]

[EMAIL PROTECTED] wrote:
 > Tony Plate wrote:
 > [snip]
 >>[EMAIL PROTECTED] wrote:
 >>[snip]
 >>>Note that ?seek currently tells us "The value returned by
 >>>seek(where=NA) appears to be unreliable on Windows systems, at least
 >>>for text files."
 >>>It would be nice if this comment could be removed, of course 
 >>
 >>
 >>Maybe the explanation could be given that this happens with text files
 >>because Windows inserts extra characters at end-of-lines when reading
 >>"text" mode files (but with binary files, things should be fine.) This
 >>particular issue is documented in Microsoft Windows documentation (e.g.,
 >>at http://msdn2.microsoft.com/library/75yw9bf3(en-us,vs.80).aspx, found
 >>by searching on Google using the terms "fseek windows documentation").
 >>Are there any known issues using seek with binary files under Windows?
 >>If there are not, then the caveat could be made specific to text files
 >>and all vagueness removed.
 >
 >
 > Hmm, all I find (including your link) is Windows CE related ...
 >
 > Uwe Ligges

For the record, the documentation I pointed to is for Windows 2000 etc, 
and is not just related to Windows CE (Uwe retracted that claim in a 
private email).

So, the suggestion to refine the note in ?seek stands.  Perhaps 
src/library/base/man/seek.Rd could be changed as follows:

OLD:

#ifdef windows
   The value returned by \code{seek(where=NA)} appears to be unreliable
   on Windows systems, at least for text files.  Clipboard connections
   can seek too.
#endif

NEW:

#ifdef windows
   The value returned by \code{seek()} is unreliable
   on Windows systems for text files.  This is because the Windows
   file-I/O functions can insert extra characters at end-of-lines
   when working with text mode files.  Binary mode files should not
   be affected by this issue.  Clipboard connections can seek too.
#endif

Of course, if someone knows that the return value of seek() is 
unreliable on Windows for binary files, this documentation change is 
inappropriate (and then the documentation should probably be changed 
from "appears to be unreliable, at least for text files" to "is 
unreliable, for both binary and text files").

-- Tony Plate
