Re: [Rd] Understanding the sequence of events when calling the R dpois function

2018-05-31 Thread Greg Minshall
Jason,

as Chuck Berry (to whom, *thanks* for 'do {...} while(0)'!) suggested,
using grep, or even grep executed from find, such as

find . -type f -exec grep -H "dpois" \{\} \; | less

(executed from the root of an R source tree), is your friend.

cheers, Greg

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Date class shows Inf as NA; this confuses the use of is.na()

2018-06-12 Thread Greg Minshall
Martin, et al.,

> I think we should allow 'year' to be "double" instead, and so it
> could also be +Inf or -Inf and we'd nicely cover 
> the conversions from and to 'numeric' -- which is really used
> internally for dates and date-times in  POSIXct.

storing years as a double makes me worry slightly about

> year <- 1e50
> (year+1)-year
[1] 0

which is not how one thinks of years (or integers) as behaving.
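
a minimal sketch of where this starts to bite, assuming IEEE 754 doubles (which is what R's "double" is on current platforms): integers are exact up to 2^53, and beyond that adding 1 can be lost.  the variable names are only for illustration.

```r
# doubles represent every integer exactly up to 2^53;
# above that, adjacent doubles are 2 or more apart
big   <- 2^53
small <- 2^53 - 1

(small + 1) - small   # 1: still exact
(big + 1) - big       # 0: big + 1 rounds back to big

# the year from the example above is far past that limit
year <- 1e50
identical(year + 1, year)   # TRUE
```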

cheers, Greg

ps -- sorry for the ">" overloading!


Re: [Rd] Date class shows Inf as NA; this confuses the use of is.na()

2018-06-13 Thread Greg Minshall
Gabe,

> Also, I would expect the year 1e50 and the "year" Inf to be functionally
> equivalent in meaning (and largely meaningless) in context.

indeed.

thanks, Greg


[Rd] creating a package from a development source tree

2014-07-01 Thread Greg Minshall
hi.  this is sort of a software methodology question.

i'm working on developing a package (with C source code).  developing
it, i use autotools, git, make, and such like.  as a result, there are
random files in my source code directory that "R CMD check" would rather
not see.  some of these files (such as .git and friends) i could arrange
to move to "just above" my development subtree; others of these (such as
configure.ac and the random crud autoreconf(1) and friends leave lying
around) can't be moved (afaik).

i'm curious how people move from their personal development tree to a
tree from which the package can be passed to "R CMD *".  do "you" have a
makefile in your tree that creates this?  during development, is this
the path you use for building and testing?

cheers, Greg Minshall


Re: [Rd] creating a package from a development source tree

2014-07-01 Thread Greg Minshall
thanks!  that, as did an off-list reply, gave me the clue i needed.
cheers.


[Rd] DESCRIPTION.in file causes R CMD check to fail?

2014-07-04 Thread Greg Minshall
hi.  i'm building a package using autotools.  to propagate the package
version number from configure.ac to DESCRIPTION, i'm using a
DESCRIPTION.in file.  both of these files are "shar"'d below.

i need to distribute the DESCRIPTION.in file, as ./configure will need
it.  but, "R CMD check" wants to look at DESCRIPTION, so i've let that
also come into the package tarball.

however, when i run "R CMD check" with DESCRIPTION.in in the tree, it
fails:

bash greg-minshalls-mbp: {3359} R CMD check image2k_0.1.tar.gz 
* using log directory ‘/Users/minshall/src/mine/image2k/buildit/image2k.Rcheck’
* using R version 3.0.3 (2014-03-06)
* using platform: x86_64-apple-darwin13.1.0 (64-bit)
* using session charset: UTF-8
Error in if (desc["Priority"] == "base") { : 
  missing value where TRUE/FALSE needed
Execution halted


(whereas if i remove DESCRIPTION.in and recreate the tar file,
everything is good.)

are there any thoughts on what might be the problem?

cheers, Greg Minshall

# This is a shell archive.  Save it in a file, remove anything before
# this line, and then unpack it by entering "sh file".  Note, it may
# create directories; files and directories will be owned by you and
# have default permissions.
#
# This archive contains:
#
#   DESCRIPTION
#   DESCRIPTION.in
#
echo x - DESCRIPTION
sed 's/^X//' >DESCRIPTION << 'END-of-DESCRIPTION'
XPackage: image2k
XType: Package
XTitle: interface between pixmap and Imlib2 and/or MagickWand
XVersion: 0.1
XDate: 2014-06-24
XAuthor: Greg Minshall
XMaintainer: Greg Minshall 
XDescription: image2k creates pixmaps from any image file that is
X  supported by the Imlib2 and/or MagickWand libraries (assuming one or
X  both of these libraries are available on your machine).  similarly,
X  given a pixmap, image2k can write that to an image file using either
X  Imlib2 and/or MagickWand.  (MagickWand is an optional library from
X  ImageMagick.)
XLicense: MIT + file LICENSE
XDepends: pixmap
END-of-DESCRIPTION
echo x - DESCRIPTION.in
sed 's/^X//' >DESCRIPTION.in << 'END-of-DESCRIPTION.in'
XPackage: image2k
XType: Package
XTitle: interface between pixmap and Imlib2 and/or MagickWand
XVersion: @VERSION@
XDate: 2014-06-24
XAuthor: Greg Minshall
XMaintainer: Greg Minshall 
XDescription: image2k creates pixmaps from any image file that is
X  supported by the Imlib2 and/or MagickWand libraries (assuming one or
X  both of these libraries are available on your machine).  similarly,
X  given a pixmap, image2k can write that to an image file using either
X  Imlib2 and/or MagickWand.  (MagickWand is an optional library from
X  ImageMagick.)
XLicense: MIT + file LICENSE
XDepends: pixmap
END-of-DESCRIPTION.in
exit


Re: [Rd] DESCRIPTION.in file causes R CMD check to fail?

2014-07-04 Thread Greg Minshall
hi, Duncan,

thanks for the reply, and the pointer to the XML package.

> I don't understand why configure needs access to DESCRIPTION.in. What
> is it reading there?

actually, ./configure is setting the version number in DESCRIPTION,
using DESCRIPTION.in as a template.  in configure.ac, i have:

AC_INIT([image2k],[0.1])

which says that i'm building the "image2k" package, with version 0.1.

my DESCRIPTION.in file has a line

Version: @VERSION@


back in configure.ac, i tell autoconf to do substitutions in
DESCRIPTION.in to create DESCRIPTION (among other files):

AC_CONFIG_FILES([
DESCRIPTION
Makefile
src/Makefile
])


so, ./configure will copy DESCRIPTION.in to DESCRIPTION, but will
substitute its idea of the version.  (i'm always a big fan of second
normal form.)
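
a rough sketch of the substitution step, written in R only to illustrate the idea; autoconf itself does this in the generated shell script, and the template lines and variable names here are made up for the example.

```r
# hypothetical stand-in for what ./configure does with DESCRIPTION.in;
# autoconf performs this substitution in shell, not in R
template <- c("Package: image2k",
              "Version: @VERSION@",
              "Date: 2014-06-24")
version <- "0.1"   # the value given to AC_INIT in configure.ac

# replace the placeholder literally (fixed = TRUE: no regex interpretation)
description <- gsub("@VERSION@", version, template, fixed = TRUE)
writeLines(description)
```

the point is that the version number lives in exactly one place (configure.ac) and everything else is derived from it.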

i looked at XML's package.  thanks, i'm new to the autotools world, so
it's good to be able to look at other examples, especially when used
with R (at which i'm also not so proficient, ignorance squared).

it *looks* like my problem probably comes from some code in

./src/library/tools/R/check.R

from ./src/library/tools/R/check.R:
## Package sources from the R distribution are special.  They
## have a 'DESCRIPTION.in' file (instead of 'DESCRIPTION'),
## with Version and License fields containing '@VERSION@' for
## substitution by configure.  Earlier bundles had packages
## containing DESCRIPTIION.in, hence the extra check for
## Makefile.in.

is_base_pkg <- is_rec_pkg <- FALSE
if (file.exists(f <- file.path(pkgdir, "DESCRIPTION.in")) &&
    file.exists(file.path(pkgdir, "Makefile.in"))) {
    desc <- try(read.dcf(f))
    if (inherits(desc, "try-error") || !length(desc)) {
        resultLog(Log, "EXISTS but not correct format")
        do_exit(1L)
    }
    desc <- desc[1L, ]
    if (desc["Priority"] == "base") {
        messageLog(Log, "looks like ", sQuote(pkgname0),
                   " is a base package")
        messageLog(Log, "skipping installation test")
        is_base_pkg <- TRUE
        pkgname <- desc["Package"] # should be same as pkgname0
    }
}

(i'm looking at R-3.1.0.)  XML doesn't run into this because although it
has a DESCRIPTION.in, it does *not* have Makefile.in, so the suspect
code isn't run.

it seems like maybe something like the below patch might fix my
problem.  (but, it also seems like adding a "Priority: other" should,
and indeed does, fix my problem.)

sorry if this was overly wordy.

cheers, Greg


--- orig-check.R 2014-03-29 01:15:03.0 +0200
+++ new-check.R 2014-07-05 08:22:01.0 +0300
@@ -4270,7 +4270,7 @@
 do_exit(1L)
 }
 desc <- desc[1L, ]
-if (desc["Priority"] == "base") {
+if ((!is.na(desc["Priority"]) && desc["Priority"] == "base")) {
 messageLog(Log, "looks like ", sQuote(pkgname0),
" is a base package")
 messageLog(Log, "skipping installation test")
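
a small sketch of why the unguarded test fails and the guarded one does not.  the file contents are illustrative (just enough fields for read.dcf), and this is not the code from check.R, only the same comparison in isolation.

```r
# a DESCRIPTION.in-like file with no Priority field (contents are illustrative)
f <- tempfile()
writeLines(c("Package: image2k", "Version: @VERSION@"), f)

desc <- read.dcf(f)[1L, ]
desc["Priority"]             # NA: indexing a missing name yields NA
desc["Priority"] == "base"   # NA, which if () cannot use as a condition

# the unguarded test stops with an error, as R CMD check did
res <- try(if (desc["Priority"] == "base") "base" else "other",
           silent = TRUE)
inherits(res, "try-error")   # TRUE

# the guarded test from the patch short-circuits to FALSE instead
is_base <- !is.na(desc["Priority"]) && desc["Priority"] == "base"
is_base                      # FALSE

unlink(f)
```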



Re: [Rd] DESCRIPTION.in file causes R CMD check to fail?

2014-07-05 Thread Greg Minshall
Duncan,

> That looks like a good fix in any case.  I'll put it in.  (It's too
> late to make it into 3.1.1, but I'll try to remember to backport it to
> R-patched after the release.)

great -- thanks!

cheers, Greg Minshall


[Rd] request for "minor" fix to src/library/tools/QC.r

2014-07-06 Thread Greg Minshall
hi.  i was hoping to automatically set the Version: number in my package
description file (image2k-package.Rd) in the man directory by using a
.in file in the same directory.  "R CMD build" excludes the .in file
because the routine .check_package_subdirs() in QC.R (which build calls)
doesn't like that file.

.check_package_subdirs() also checks files in R/, and at *that* point in
the code says "now configure might generate files in this directory" and
explicitly allows any files that end in .in.  i wonder if that code
could be duplicated for the case of files in man/?  a diff to that
effect (just copying three lines from the immediately prior case for R/)
is provided below.

(as a check as to the viability of this, i manually added my
man/image2k-package.in file to the tarball built by build, and both
check and install seemed to survive without any problems.  a sample size
of 1 is *always* my favorite. :)

cheers, Greg Minshall

--- src/library/tools/R/QC.r 2014-03-25 01:15:06.0 +0200
+++ src/library/tools/R/mod-QC.r 2014-07-06 14:05:46.0 +0300
@@ -4372,6 +4372,9 @@
   full.names = FALSE,
   OS_subdirs = OS_subdirs)
 wrong <- setdiff(all_files, man_files)
+## now configure might generate files in this directory
+generated <- grep("\\.in$", wrong)
+if(length(generated)) wrong <- wrong[-generated]
 if(length(wrong)) {
 wrong_things$man <- wrong
 if(doDelete) unlink(file.path(dir, "man", wrong))
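
a sketch of what the three added lines do, run on made-up file names (only the two filtering lines mirror the patch; everything else is for illustration):

```r
# illustrative file names; only the filtering lines mirror the patch
wrong <- c("image2k-package.in", "notes.txt")

generated <- grep("\\.in$", wrong)
if (length(generated)) wrong <- wrong[-generated]
wrong          # "notes.txt"

# why the length() guard matters: with no match, grep() returns integer(0),
# and x[-integer(0)] selects nothing rather than everything
clean <- c("a.Rd", "b.Rd")
none  <- grep("\\.in$", clean)
clean[-none]   # character(0)
```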


[Rd] arguments to .Call(), .External should be read-only?

2014-07-08 Thread Greg Minshall
hi.  if i'm reading correctly, "Writing R Extensions" appears to be
inconsistent on the question of whether the arguments passed to a
routine called via .Call() or .External() should be considered read-only.
section 5.2, "Interface functions .C and .Fortran", says

However, when character vectors are used other than in a read-only way,
the .Call interface is much to be preferred.


which sort of implies (if one reads optimistically) that using .Call()
(by extension, again optimistically, .External()) one could treat
*character* vectors (and, again, optimistically, numeric, etc., vectors)
in a non-read-only way.

on the other (pessimistic) hand, section 5.9, "Handling R objects in C",
says

Neither .Call nor .External copy their arguments: you should treat
arguments you receive through these interfaces as read-only.


for an application, i'd like to consider these writable.  assuming
sufficient warnings in the documentation, etc., is that permissible?

cheers, Greg Minshall


[Rd] how to list external dependencies (i.e., non-R packages)?

2014-07-13 Thread Greg Minshall
hi.  i'm working on a package which only works if one (or both) of two
libraries (Imlib2 and MagickWand) exist on the machine on which the
package is compiled and executed.  as currently written, the program
purposely generates an error at *compile* time if neither library is
available (thinking the earlier the user is notified, the less
frustrating).

is there a way of specifying this dependency in, say, the DESCRIPTION
file?  (seems unlikely, as this is what the whole autotools/configure.ac
framework is supposed to deal with.)

or, should i take out the compile time error (maybe replace my #error
with a #warning), and just generate an error at run time?  (this will
require me to figure out what that means for build tests/, but obviously
some sort of hack is doable.)

cheers, Greg


Re: [Rd] how to list external dependencies (i.e., non-R packages)?

2014-07-18 Thread Greg Minshall
thanks for all the replies on this.

i've modified my package to include a SystemRequirements field with
(what seem to be) the appropriate entries.  (see (**) below for how one
might standardize.)

SystemRequirements: libImlib2, libMagickWand


Michael Lawrence mentioned:
> One idea is to wrap system libraries in R packages.

that's not something i would tackle.  both libraries are very large,
maybe 100-200 functions each, with lots of parameters, and my use, at
least, needs only 3 or 4 functions from each library.

a question: what happens when i submit this package to CRAN?  in
general, builds will fail, unless the build machine happens to have one
or both of the underlying libraries on it.  (and, if i made it *compile*
successfully -- fairly easy to do -- then the tests will fail or, if the
tests *don't* fail, what were they testing?? :)

cheers, Greg Minshall

---
(**) ignore below unless you are really interested in something like
standardizing external dependencies:

if i were to do something along the lines of something suggested in 2011

[1] https://stat.ethz.ch/pipermail/r-devel/2011-February/059787.html

i'd probably start naming libraries in an R way, friendly somehow with
the ultimate source of the library.  then (obeying whatever law of
computer science it is that says all problems can be solved with an
extra level of indirection), allow a distro (if that's the term)
specific mapping file to be maintained by some R*distro person, that
would map upstream library/version-derived R way to distro
rpm/deb/whatever names/numbers.  but, that's what i'd do in some
hypothetical universe.

then, if a package developer needed an already-defined external
dependency, they would just list that in (some field in) DESCRIPTION,
with an appropriate version number.

otoh, if the package developer's dependency is *not* already defined,
the package developer would need to do the R-incantation (e-mail to
[Rd], whatever) to get it defined to R, and *then* list it in
DESCRIPTION.  however, the R*distro people will then need to figure out
the mapping from that dependency to their naming/numbering scheme before
the external dependency in *that package's* DESCRIPTION file will
help much.  (otoh, CRAN will see the dependency, and may figure out an
initial mapping, or the package developer may even submit one or more
mappings from the R-name/version to some select set of distros.)

sorry, that's fairly terse.  i'd be happy to elaborate, but i don't know
there's an audience.


[Rd] pipe(): input to, and output from, a single process

2020-03-16 Thread Greg Minshall
hi.  i'd like to instantiate sed(1), send it some input, and retrieve
its output, all via pipes (rather than an intermediate file).

my sense from pipe() and from looking at the sources (sys-unix.c) is
that this is not possible.  is that true?  are there any thoughts of
providing such a facility?

cheers, Greg


Re: [Rd] pipe(): input to, and output from, a single process

2020-03-17 Thread Greg Minshall
Simon,

> FWIW if you're on unix, you can use named pipes (fifos) for that:

i've always wondered what named pipes actually were.  thanks!

cheers, Greg


Re: [Rd] pipe(): input to, and output from, a single process

2020-03-17 Thread Greg Minshall
Gabor, thanks.  yes, managing the two-way communication is always a bit
error-prone, as it depends on the input/output characteristics of the
two ends -- they either match, or deadlock.  it's too bad if polling is
always *required* -- i'd think sometimes a programmer would be happy
blocking, though other times one wants better control over when to
block.  cheers, Greg


Re: [Rd] pipe(): input to, and output from, a single process

2020-03-17 Thread Greg Minshall
Dirk,

> Octave had this already in the 1990s, see documentation for 'popen2' here:

thanks.  unix had that since the 1970s... :)

cheers, Greg


Re: [Rd] 1954 from NA

2021-05-23 Thread Greg Minshall
+1

Avi Gross via R-devel  wrote:

> Arguably, R was not developed to satisfy some needs in the way intended.
> 
> When I have had to work with datasets from some of the social sciences I have 
> had to adapt to subtleties in how they did things with software like SPSS in 
> which an NA was done using an out of bounds marker like 999 or "." or even a 
> blank cell. The problem is that R has a concept where data such as integers 
> or floating point numbers is not stored as text normally but in their own 
> formats and a vector by definition can only contain ONE data type. So the 
> various forms of NA as well as NaN and Inf had to be grafted on to be 
> considered VALID to share the same storage area as if they sort of were an 
> integer or floating point number or text or whatever.
> 
> It does strike me as possible to simply have a column that is something like 
> a factor that can contain as many NA excuses as you wish such as "NOT 
> ANSWERED" to "CANNOT READ THE SQUIGLE" to "NOT SURE" to "WILL BE FILLED IN 
> LATER" to "I DON'T SPEAK ENGLISH AND CANNOT ANSWER STUPID QUESTIONS". This 
> additional column would presumably only have content when the other column 
> has an NA. Your queries and other changes would work on something like a 
> data.frame where both such columns coexisted.
> 
> Note reading in data with multiple NA reasons may take extra work. If your 
> errors codes are text, it will all become text. If the errors are 999 and 998 
> and 997, it may all be treated as numeric and you may not want to convert all 
> such codes to an NA immediately. Rather, you would use the first 
> vector/column to make the second vector and THEN replace everything that 
> should be an NA with an actual NA and reparse the entire vector to become 
> properly numeric unless you like working with text and will convert to 
> numbers as needed on the fly.
> 
> Now this form of annotation may not be pleasing but I suggest that an 
> implementation that does allow annotation may use up space too. Of course, if 
> your NA values are rare and space is only used then, you might save space. 
> But if you could make a factor column and have it use the smallest int it can 
> get as a basis, it may be a way to save on space.
> 
> People who have done work with R, especially those using the tidyverse, are 
> quite used to using one column to explain another. So if you are asked to say 
> tabulate what percent of missing values are due to reasons A/B/C then the 
> added columns works fine for that calculation too.
> 
> 
> -Original Message-
> From: R-devel  On Behalf Of Adrian Dușa
> Sent: Sunday, May 23, 2021 2:04 PM
> To: Tomas Kalibera 
> Cc: r-devel 
> Subject: Re: [Rd] 1954 from NA
> 
> Dear Tomas,
> 
> I understand that perfectly, but that is fine.
> The payload is not going to be used in any computations anyways, it is 
> strictly an information carrier that differentiates between different types 
> of (tagged) NA values.
> 
> Having only one NA value in R is extremely limiting for the social sciences, 
> where multiple missing values may exist, because respondents:
> - did not know what to respond, or
> - did not want to respond, or perhaps
> - the question did not apply in a given situation etc.
> 
> All of these need to be captured, stored, and most importantly treated as if 
> they would be regular missing values. Whether the payload might be lost in 
> computations makes no difference: they were supposed to be "missing values" 
> anyways.
> 
> The original question is how the payload is currently stored: as an unsigned 
> int of 32 bits, or as an unsigned short of 16 bits. If the R internals would 
> not be affected (and I see no reason why they would be), it would allow an 
> entire universe for the social sciences that is not currently available and 
> which all other major statistical packages do offer.
> 
> Thank you very much, your attention is greatly appreciated, Adrian
> 
> On Sun, May 23, 2021 at 7:59 PM Tomas Kalibera 
> wrote:
> 
> > TLDR: tagging R NAs is not possible.
> >
> > External software should not depend on how R currently implements NA, 
> > this may change at any time. Tagging of NA is not supported in R (if 
> > it were, it would have been documented). It would not be possible to 
> > implement such tagging reliably with the current implementation of NA in R.
> >
> > NaN payload propagation is not standardized. Compilers are free to and 
> > do optimize code not preserving/achieving any specific propagation.
> > CPUs/FPUs differ in how they propagate in binary operations, some zero 
> > the payload on any operation. Virtualized environments, binary 
> > translations, etc, may not preserve it in any way, either. ?NA has 
> > disclaimers about this, an NA may become NaN (payload lost) even in 
> > unary operations and also in binary operations not involving other NaN/NAs.
> >
> > Writing any new software that would depend on that anything specific 
> > happens to the NaN payloads would 

Re: [Rd] 1954 from NA

2021-05-24 Thread Greg Minshall
Adrian,

> If it was only one column then your solution is neat. But with 5-600
> variables, each of which can contain multiple missing values, to
> double this number of variables just to describe NA values seems to me
> excessive.  Not to mention we should be able to quickly convert /
> import / export from one software package to another. This would imply
> maintaining some sort of metadata reference of which explanatory
> additional factor describes which original variable.

one thing *i* should keep in mind is the old saying: "The difference
between theory and practice is that in theory there is no difference,
but in practice, there is."

but, in theory:

if you have 500 columns of possibly-NA'd variables, you could have one
column of 500 "bits", where each bit has one of N values, N being the
number of explanations the corresponding column has for why the NA
exists.

i guess the CS'y thing that comes to my mind here is that one thing is
the *semantics* of what you are trying to convey, and the other is how
those semantics are *encoded* in whatever representation you are using.

cheers, Greg


Re: [Rd] [External] Re: 1954 from NA

2021-05-24 Thread Greg Minshall
luke,

> PLEASE DO NOT DO THIS!

very happy to withdraw my offered alternative!

cheers, Greg


Re: [Rd] S3 dispatch does not work for generics defined inside an environment

2021-06-30 Thread Greg Minshall
Taras,

> P.S. If you are wondering what I am trying to achieve here — we have a
> very large codebase and I am trying to use environments as a type of
> “poor man’s namespaces” to organize code in a modular fashion. But of
> course it’s all pointless if I can’t get the generics to work
> reliably.

i'm not knowledgeable about S3.  but, a different way to try to
modularize large code bases is to split them into separate packages.
just in case you hadn't already thought about, and rejected, that idea.

cheers, Greg


Re: [Rd] S3 dispatch does not work for generics defined inside an environment

2021-07-01 Thread Greg Minshall
Taras,
> That was my original plan as well, but managing and deploying dozens
> of little packages that are all under active development is a
> nightmare even with devtools. Just too much overhead, not to mention
> that coming up with names that would not have namespace conflicts was
> getting silly.

i can imagine.

> In the end, I wrote a package that implements lightweight python-like
> modules for R and that has really improved my workflow. I hope to
> publish this package later this year after I have cleaned it up a bit.

cool -- good luck with it.

Greg


Re: [Rd] order of operations

2021-08-31 Thread Greg Minshall
Gabor Grothendieck  wrote:

> ... and maybe not having a guarantee would simplify implementation?

+1 for: "The results of such statements are not defined.", or something
to that effect.  (Erasmus had something to say here. :)
