Re: [Rd] setequal: better readability, reduced memory footprint, and minor speedup

2015-01-08 Thread Michael Lawrence
Currently unique() calls duplicated() internally and then extracts the
unique elements. One could write a countUnique() that simply counts, rather
than allocating the logical return value of duplicated(). So much of the
cost is in the hash operation that it probably won't help much, though that
might depend on the sizes of things. The more unique elements there are,
the better it would perform.
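
As a rough illustration only (nothing like this exists in base R): an R-level stand-in for the countUnique idea can be written with duplicated(). The name count_unique is made up for this sketch, and this version still allocates the logical vector that a C-level implementation would avoid, so it only shows the intended result.

```r
# Hypothetical helper: count distinct elements without materializing unique(x).
# NOTE: this still allocates the logical vector from duplicated(); a real
# countUnique would count inside the C hashing loop instead.
count_unique <- function(x) sum(!duplicated(x))

x <- c(3L, 1L, 3L, 2L, 1L)
count_unique(x) == length(unique(x))
```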


On Thu, Jan 8, 2015 at 2:06 PM, Peter Haverty haverty.pe...@gene.com
wrote:

 How about unique them both and compare the lengths?  It's less work,
 especially allocation.



 Pete

 
 Peter M. Haverty, Ph.D.
 Genentech, Inc.
 phave...@gene.com

 On Thu, Jan 8, 2015 at 1:30 PM, peter dalgaard pda...@gmail.com wrote:

  If you look at the definition of %in%, you'll find that it is implemented
  using match, so if we did as you suggest, I give it about three days before
  someone suggests to inline the function call... Readability of source code
  is not usually our prime concern.
 
  The && idea does have some merit, though.
 
  Apropos, why is there no setcontains()?
 
  -pd
 
   On 06 Jan 2015, at 22:02 , Hervé Pagès hpa...@fredhutch.org wrote:
  
   Hi,
  
   Current implementation:
  
   setequal <- function (x, y)
   {
     x <- as.vector(x)
     y <- as.vector(y)
     all(c(match(x, y, 0L) > 0L, match(y, x, 0L) > 0L))
   }
  
   First what about replacing 'match(x, y, 0L) > 0L' and 'match(y, x, 0L) > 0L'
   with 'x %in% y' and 'y %in% x', respectively. They're strictly
   equivalent but the latter form is a lot more readable than the former
   (isn't this the raison d'être of %in%?):
  
   setequal <- function (x, y)
   {
     x <- as.vector(x)
     y <- as.vector(y)
     all(c(x %in% y, y %in% x))
   }
  
   Furthermore, replacing 'all(c(x %in% y, y %in% x))' with
   'all(x %in% y) && all(y %in% x)' improves readability even more and,
   more importantly, reduces memory footprint significantly on big vectors
   (e.g. by 15% on integer vectors with 15M elements):
  
   setequal <- function (x, y)
   {
     x <- as.vector(x)
     y <- as.vector(y)
     all(x %in% y) && all(y %in% x)
   }
  
   It also seems to speed up things a little bit (not in a significant
   way though).
  
   Cheers,
   H.
  
   --
   Hervé Pagès
  
   Program in Computational Biology
   Division of Public Health Sciences
   Fred Hutchinson Cancer Research Center
   1100 Fairview Ave. N, M1-B514
   P.O. Box 19024
   Seattle, WA 98109-1024
  
   E-mail: hpa...@fredhutch.org
   Phone:  (206) 667-5791
   Fax:(206) 667-1319
  
   __
   R-devel@r-project.org mailing list
   https://stat.ethz.ch/mailman/listinfo/r-devel
 
  --
  Peter Dalgaard, Professor,
  Center for Statistics, Copenhagen Business School
  Solbjerg Plads 3, 2000 Frederiksberg, Denmark
  Phone: (+45)38153501
  Email: pd@cbs.dk  Priv: pda...@gmail.com
 


Re: [Rd] setequal: better readability, reduced memory footprint, and minor speedup

2015-01-08 Thread Peter Haverty
I was thinking something like:

setequal <- function(x,y) {
    xu = unique(x)
    yu = unique(y)
    # different numbers of distinct values: the sets cannot be equal
    if (length(xu) != length(yu)) { return(FALSE) }
    return( all( match( xu, yu, 0L ) > 0L ) )
}

This lets you fail early for cheap (skipping the allocation from the
'> 0L' comparisons).  Whether or not this goes fast depends a lot on the
uniqueness of x and y and whether you want to optimize for the TRUE or
FALSE case. You'd do much better to make some real hashes in C and compare
the keys, but it's probably not worth the complexity.
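
A small check of the idea, as a sketch: once both inputs are de-duplicated and have the same length, a one-directional match is enough, because a set contained in another set of the same size must equal it. The name set_equal_sketch is made up here to avoid masking base::setequal, and unlike the base version it does not call as.vector().

```r
set_equal_sketch <- function(x, y) {
  xu <- unique(x)
  yu <- unique(y)
  # Different numbers of distinct values: cannot be equal, skip match().
  if (length(xu) != length(yu)) return(FALSE)
  # Same size, so "xu is contained in yu" implies the two sets are equal.
  all(match(xu, yu, 0L) > 0L)
}

set_equal_sketch(c(1, 2, 2, 3), c(3, 1, 2))   # same elements
set_equal_sketch(c(1, 2, 3), c(1, 2, 4))      # same size, different elements
set_equal_sketch(c(1, 2), c(1, 2, 3))         # fails early on length
```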




Pete


Peter M. Haverty, Ph.D.
Genentech, Inc.
phave...@gene.com



Re: [Rd] setequal: better readability, reduced memory footprint, and minor speedup

2015-01-08 Thread Peter Haverty
Try this out. It looks like a 2X speedup for some cases and a wash in
others.  unique does two allocations, but skipping the '> 0L' allocation
could make up for it.

library(microbenchmark)
library(RUnit)

x = sample.int(1e4, 1e5, TRUE)
y = sample.int(1e4, 1e5, TRUE)

set_equal <- function(x, y) {
    xu = .Internal(unique(x, FALSE, FALSE, NA))
    yu = .Internal(unique(y, FALSE, FALSE, NA))
    if (length(xu) != length(yu)) {
        return(FALSE)
    }
    return( all(match(xu, yu, 0L) > 0L) )
}

set_equal2 <- function(x, y) {
    xu = .Internal(unique(x, FALSE, FALSE, NA))
    yu = .Internal(unique(y, FALSE, FALSE, NA))
    if (length(xu) != length(yu)) {
        return(FALSE)
    }
    return( !anyNA(match(xu, yu)) )
}

microbenchmark(
a = setequal(x, y),
b = set_equal(x, y),
c = set_equal2(x, y)
)
checkIdentical(setequal(x, y), set_equal(x, y))
checkIdentical(setequal(x, y), set_equal2(x, y))

x = y
microbenchmark(
a = setequal(x, y),
b = set_equal(x, y),
c = set_equal2(x, y)
)
checkIdentical(setequal(x, y), set_equal(x, y))
checkIdentical(setequal(x, y), set_equal2(x, y))
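
If microbenchmark and RUnit are not installed, a rough base-R version of the same comparison might look like the following. It is only a sketch: it uses unique() rather than the .Internal call, the name set_equal_base is made up, and system.time() is far cruder than microbenchmark, so treat any timings as indicative at best.

```r
set.seed(1)
x <- sample.int(1e4, 1e5, TRUE)
y <- sample.int(1e4, 1e5, TRUE)

set_equal_base <- function(x, y) {
  xu <- unique(x)
  yu <- unique(y)
  if (length(xu) != length(yu)) return(FALSE)
  !anyNA(match(xu, yu))
}

# Crude timing: repeat each call and look at the elapsed time.
t_base <- system.time(for (i in 1:20) setequal(x, y))[["elapsed"]]
t_new  <- system.time(for (i in 1:20) set_equal_base(x, y))[["elapsed"]]

# Both versions must agree, whatever the timings are.
stopifnot(identical(setequal(x, y), set_equal_base(x, y)))
stopifnot(identical(setequal(x, x), set_equal_base(x, x)))
```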


Sorry, I'm probably over-posting today.

Regards,



Re: [Rd] setequal: better readability, reduced memory footprint, and minor speedup

2015-01-08 Thread Hervé Pagès

On 01/08/2015 01:30 PM, peter dalgaard wrote:

If you look at the definition of %in%, you'll find that it is implemented using 
match, so if we did as you suggest, I give it about three days before someone 
suggests to inline the function call...


But you wouldn't bet money on that, right? Because you know you would
lose.


Readability of source code is not usually our prime concern.


Don't sacrifice readability if you do not have a good reason for it.
What's your reason here? Are you seriously suggesting that inlining
makes a significant difference? As Michael pointed out, the expensive
operation here is the hashing. But sadly some people like inlining and
want to use it everywhere: it's easy and they feel good about it, even
if it hurts readability and maintainability (if you use x %in% y
instead of the inlined version, the day someone changes the
implementation of x %in% y for something faster, or fixes a bug
in it, your code will automatically benefit; right now it won't).

More simply put: good readability generally leads to better code.
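
To make the equivalence concrete, here is a small sketch (the example vectors are made up) showing that the inlined match() form and %in% give identical results, and that the && form returns the same answer as the all(c(...)) form without concatenating the two logical vectors.

```r
x <- c(3L, 1L, 2L, 2L)
y <- c(1L, 2L, 3L, 4L)

inlined  <- match(x, y, nomatch = 0L) > 0L
readable <- x %in% y
stopifnot(identical(inlined, readable))

# all(c(a, b)) builds one long logical vector first; && evaluates the
# second all() only if the first one is TRUE.
concatenated  <- all(c(x %in% y, y %in% x))
short_circuit <- all(x %in% y) && all(y %in% x)
stopifnot(identical(concatenated, short_circuit))
```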



The && idea does have some merit, though.

Apropos, why is there no setcontains()?


Wait... shouldn't everybody use all(match(x, y, nomatch = 0L) > 0L) ?

H.



-pd




--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319



Re: [Rd] New version of Rtools for Windows

2015-01-08 Thread Avraham Adler
Regarding the redefinition error, I've asked on StackOverflow for
advice [1], but I have noticed the following; perhaps someone here can
understand what changed between the stdio.h of 4.6.3 and the stdio.h
of 4.8.4. In GCC 4.8.4, the section of stdio.h which is referenced in
the errors is the following:

#if !defined (__USE_MINGW_ANSI_STDIO) || __USE_MINGW_ANSI_STDIO == 0
/* this is here to deal with software defining
 * vsnprintf as _vsnprintf, eg. libxml2.  */
#pragma push_macro("snprintf")
#pragma push_macro("vsnprintf")
# undef snprintf
# undef vsnprintf
  int __cdecl __ms_vsnprintf(char * __restrict__ d,size_t n,const char * __restrict__ format,va_list arg)
    __MINGW_ATTRIB_DEPRECATED_MSVC2005 __MINGW_ATTRIB_DEPRECATED_SEC_WARN;

  __mingw_ovr
  __MINGW_ATTRIB_NONNULL(3)
  int vsnprintf (char * __restrict__ __stream, size_t __n, const char * __restrict__ __format, va_list __local_argv)
  {
    return __ms_vsnprintf (__stream, __n, __format, __local_argv);
  }

  int __cdecl __ms_snprintf(char * __restrict__ s, size_t n, const char * __restrict__  format, ...);

#ifndef __NO_ISOCEXT
__mingw_ovr
__MINGW_ATTRIB_NONNULL(3)
int snprintf (char * __restrict__ __stream, size_t __n, const char * __restrict__ __format, ...)
{
  register int __retval;
  __builtin_va_list __local_argv; __builtin_va_start( __local_argv, __format );
  __retval = __ms_vsnprintf (__stream, __n, __format, __local_argv);
  __builtin_va_end( __local_argv );
  return __retval;
}
#endif /* !__NO_ISOCEXT */

#pragma pop_macro ("vsnprintf")
#pragma pop_macro ("snprintf")
#endif

The corresponding section in 4.6.3 as found in the Rtools for Windows
installation is:

#if !defined (__USE_MINGW_ANSI_STDIO) || __USE_MINGW_ANSI_STDIO == 0
/* this is here to deal with software defining
 * vsnprintf as _vsnprintf, eg. libxml2.  */
#pragma push_macro("snprintf")
#pragma push_macro("vsnprintf")
# undef snprintf
# undef vsnprintf
  int __cdecl vsnprintf(char * __restrict__ d,size_t n,const char * __restrict__ format,va_list arg)
    __MINGW_ATTRIB_DEPRECATED_MSVC2005 __MINGW_ATTRIB_DEPRECATED_SEC_WARN;

#ifndef __NO_ISOCEXT
  int __cdecl snprintf(char * __restrict__ s, size_t n, const char * __restrict__  format, ...);
#ifndef __CRT__NO_INLINE
  __CRT_INLINE int __cdecl vsnprintf(char * __restrict__ d,size_t n,const char * __restrict__ format,va_list arg)
  {
    return _vsnprintf (d, n, format, arg);
  }
#endif /* !__CRT__NO_INLINE */
#endif /* !__NO_ISOCEXT */
#pragma pop_macro ("vsnprintf")
#pragma pop_macro ("snprintf")
#endif

The latter does not have a direct redefinition of the two functions. I
still don't know why the #undef calls do not work [1].

Thank you,

Avi

[1] https://stackoverflow.com/questions/27853225/is-there-a-way-to-include-stdio-h-but-ignore-some-of-the-functions-therein

On Thu, Jan 8, 2015 at 2:27 PM, Hin-Tak Leung
ht...@users.sourceforge.net wrote:
 Oh, I forgot to mention that besides setting AR, RANLIB and the stack probing
 fix, you also need a very up-to-date binutils. 2.25 was out in December. Even
 with that, if your linker's default is not what you are compiling for (i.e. a
 multiarch toolchain), you need to set GNUTARGET also, i.e. -m32/-m64 is not
 enough. Some fix to autodetect non-default targets went in after Christmas,
 before the new year, but I am not brave enough to try that on a daily basis
 yet (I only tested it and reported it, then reverted the change; how gcc
 invokes the linker is rather complicated and it is not easy to have two
 binutils installed...). Setting GNUTARGET seems safer :-).
 Whether you need that depends on whether you are compiling for your
 toolchain's default target architecture.

 AR, RANLIB, GNUTARGET are all environment variables - you set them the usual
 way. The stack probing fix is for passing make check, when you finish make.

 --
 On Thu, Jan 8, 2015 6:14 PM GMT Avraham Adler wrote:

On Thu, Jan 8, 2015 at 10:48 AM, Hin-Tak Leung
ht...@users.sourceforge.net wrote:

 The r.dll crash is easy - you need to be using gcc-ar for ar, and 
 gcc-ranlib for ranlib. I also posted a patch to fix the check failure for 
 stack probing, as lto optimizes away the stack probing code, as it should.

 yes, lto build's speed gain is very impressive.



I apologize for my ignorance, but how would I do that? I tried by
changing the following in src/gnuwin32/MkRules.local:

# prefix for 64-bit: path or x86_64-w64-mingw32-
BINPREF64 = x86_64-w64-mingw32-gcc-

I added the gcc- as the suffix there, but I guess that is insufficient
as I still get the following error using 4.9.2:

windres -F pe-x86-64  -I../include -i dllversion.rc -o dllversion.o
gcc -std=gnu99 -m64 -shared -s -mwindows -o R.dll R.def console.o
dynload.o editor.o embeddedR.o extra.o opt.o pager.o preferences.o
psignal.o rhome.o rt_complete.o rui.o run.o shext.o sys-win32.o
system.o dos_wglob.o malloc.o ../main/libmain.a ../appl/libappl.a
../nmath/libnmath.a getline/gl.a ../extra/xdr/libxdr.a

Re: [Rd] RFC: getifexists() {was [Bug 16065] exists ...}

2015-01-08 Thread Peter Haverty
For what it's worth, I think we would need a new function if the default
behavior changes.  Since we already have get and mget, maybe cget for
conditional get?  if get, safe get, ...

I like the idea of keeping the original not found behavior if the
if.not.found arg is missing. However, it will be important to keep the
number of arguments down.  (I noticed that Martin's example lacks a frame
argument.)  I've heard rumors that there are plans to reduce the function
call overhead, so perhaps this matters less now.

I like Luke's idea of making exists/get/etc. .Primitives. I think that will
be necessary in order to go fast.  For my two cents, I also think
get/assign should just be synonyms for the [[ .Primitive.  That could
actually simplify things a bit. One might add inherits=FALSE and
if.not.found arguments to the environment [[ code, for example.
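
For context, a small sketch of how the two access styles differ on an environment today: [[ returns NULL for a missing name, while get() signals an error. The variable names are made up for illustration.

```r
e <- new.env()
assign("a", 1, envir = e)   # same effect as e[["a"]] <- 1
e[["b"]] <- 2

stopifnot(identical(e[["a"]], 1))
stopifnot(identical(get("b", envir = e, inherits = FALSE), 2))

# Missing name: [[ quietly returns NULL ...
stopifnot(is.null(e[["nope"]]))

# ... whereas get() raises an error.
res <- tryCatch(get("nope", envir = e, inherits = FALSE),
                error = function(err) "not found")
stopifnot(identical(res, "not found"))
```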

Regards,
Pete


Pete


Peter M. Haverty, Ph.D.
Genentech, Inc.
phave...@gene.com

On Thu, Jan 8, 2015 at 11:57 AM, luke-tier...@uiowa.edu wrote:

 On Thu, 8 Jan 2015, Michael Lawrence wrote:

  If we do add an argument to get(), then it should be named consistently
 with the ifnotfound argument of mget(). As mentioned, the possibility of a
 NULL value is problematic. One solution is a sentinel value that indicates
 an unbound value (like R_UnboundValue).


 A null default is fine -- it's a default; if it isn't right for a
 particular case you can provide something else.


 But another idea (and one pretty similar to John's) is to follow the SYMSXP
 design at the C level, where there is a structure that points to the name
 and a value. We already have SYMSXPs at the R level of course (name
 objects) but they do not provide access to the value, which is typically
 R_UnboundValue. But this does not even need to be implemented with SYMSXP.
 The design would allow something like:

 binding <- getBinding(x, env)
 if (hasValue(binding)) {
   x <- value(binding) # throws an error if none
   message(name(binding), " has value ", x)
 }

 That, I think, is a bit verbose but readable and could be made fast. And I
 think binding objects would be useful in other ways, as they are
 essentially a named object. For example, when iterating over an
 environment.


 This would need a lot more thought. Directly exposing the internals is
 definitely not something we want to do as we may well want to change
 that design. But there are lots of other corner issues that would have
 to be thought through before going forward, such as what happens if an
 rm occurs between obtaining a binding object and doing something with
 it. Serialization would also need thinking through. This doesn't seem
 like a worthwhile place to spend our efforts to me.

 Adding getIfExists, or .get, or get0, or whatever seems fine. Adding
 an argument to get() with missing giving current behavior may be OK
 too. Rewriting exists and get as .Primitives may be sufficient though.

 Best,

 luke


  Michael




 On Thu, Jan 8, 2015 at 6:03 AM, John Nolan jpno...@american.edu wrote:

  Adding an optional argument to get (and mget) like

 val <- get(name, where, ..., value.if.not.found=NULL )   (*)

 would be useful for many.  HOWEVER, it is possible that there could be
 some confusion here: (*) can give a NULL because either x exists and
 has value NULL, or because x doesn't exist.   If that matters, the user
 would need to be careful about specifying a value.if.not.found that cannot
 be confused with a valid value of x.

 To avoid this difficulty, perhaps we want both: have Martin's getifexists( )
 return a list with two values:
   - a boolean variable 'found'  # = value returned by exists( )
   - a variable 'value'

 Then implement get( ) as:

 get <- function(x,...,value.if.not.found ) {

   if( missing(value.if.not.found) ) {
     a <- getifexists(x,... )
     if (!a$found) stop(x, " not found")
   } else {
     a <- getifexists(x,...,value.if.not.found )
   }
   return(a$value)
 }

 Note that value.if.not.found has no default value in above.
 It behaves exactly like current get does if value.if.not.found
 is not specified, and if it is specified, it would be faster
 in the common situation mentioned below:
  if(exists(x,...)) { get(x,...) }
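
A runnable sketch of this proposal, under stated assumptions: get_if_exists and get_or are names made up here (neither is the proposed API), and the lookup is built on exists() and get() rather than a faster primitive, so it shows the semantics only.

```r
# Hypothetical: return both a 'found' flag and the value.
get_if_exists <- function(x, envir = parent.frame(), value.if.not.found = NULL) {
  found <- exists(x, envir = envir, inherits = FALSE)
  list(found = found,
       value = if (found) get(x, envir = envir, inherits = FALSE)
               else value.if.not.found)
}

# Hypothetical wrapper with get()-like behaviour when no fallback is given.
get_or <- function(x, envir = parent.frame(), value.if.not.found) {
  if (missing(value.if.not.found)) {
    a <- get_if_exists(x, envir)
    if (!a$found) stop("object '", x, "' not found")
  } else {
    a <- get_if_exists(x, envir, value.if.not.found)
  }
  a$value
}

e <- new.env()
assign("n", NULL, envir = e)
get_if_exists("n", e)$found       # exists, and its value is NULL
get_if_exists("zzz", e)$found     # absent: the NULL value is only the fallback
get_or("zzz", e, value.if.not.found = 0)
```

The 'found' flag is what separates "bound to NULL" from "not bound at all", which a bare NULL return value cannot do.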

 John

 P.S. if you like dromedaries call it valueIfNotFound ...

  ..
  John P. Nolan
  Math/Stat Department
  227 Gray Hall,   American University
  4400 Massachusetts Avenue, NW
  Washington, DC 20016-8050

  jpno...@american.edu   voice: 202.885.3140
  web: academic2.american.edu/~jpnolan
  ..


 -R-devel r-devel-boun...@r-project.org wrote: -
 To: Martin Maechler maech...@stat.math.ethz.ch, R-devel@r-project.org
 From: Duncan Murdoch
 Sent by: R-devel
 Date: 01/08/2015 06:39AM
 Subject: Re: [Rd] RFC: getifexists() {was [Bug 16065] exists ...}

 On 08/01/2015 4:16 AM, Martin Maechler wrote:
  In November, we had a bug 

Re: [Rd] RFC: getifexists() {was [Bug 16065] exists ...}

2015-01-08 Thread Peter Haverty
Michael's idea has an interesting bonus that he and I discussed earlier.
It would be very convenient to have a container of key/value pairs.  I
imagine many people often write this:

x <- mapply( names(x), x, FUN=function(k,v) { # work with key and value
} )

especially ex-Perl people accustomed to

while ( ($key, $value) = each(%some_hash) ) { }

Perhaps there is room for additional discussion of using lists of SYMSXPs
in this manner. (If SYMSXPs are not that safe, perhaps a looping construct
for named vectors that gave the illusion of iterating over a list of
two-tuples.)
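
As a sketch of what is available today, key/value iteration over a named list can be done with Map() or an index loop; the example data are made up.

```r
x <- list(alpha = 1, beta = 2)

# Map() walks names and values in parallel, like a list of (key, value) pairs.
pairs <- Map(function(k, v) paste0(k, "=", v), names(x), x)

# Equivalent explicit loop over positions.
out <- character(length(x))
for (i in seq_along(x)) {
  out[i] <- paste0(names(x)[i], "=", x[[i]])
}

stopifnot(identical(unlist(pairs, use.names = FALSE), out))
```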



Pete


Peter M. Haverty, Ph.D.
Genentech, Inc.
phave...@gene.com


Re: [Rd] RFC: getifexists() {was [Bug 16065] exists ...}

2015-01-08 Thread Michael Lawrence
On Thu, Jan 8, 2015 at 11:57 AM, luke-tier...@uiowa.edu wrote:

 On Thu, 8 Jan 2015, Michael Lawrence wrote:

  If we do add an argument to get(), then it should be named consistently
 with the ifnotfound argument of mget(). As mentioned, the possibility of a
 NULL value is problematic. One solution is a sentinel value that indicates
 an unbound value (like R_UnboundValue).


 A null default is fine -- it's a default; if it isn't right for a
 particular case you can provide something else.


 But another idea (and one pretty similar to John's) is to follow the SYMSXP
 design at the C level, where there is a structure that points to the name
 and a value. We already have SYMSXPs at the R level of course (name
 objects) but they do not provide access to the value, which is typically
 R_UnboundValue. But this does not even need to be implemented with SYMSXP.
 The design would allow something like:

 binding <- getBinding(x, env)
 if (hasValue(binding)) {
   x <- value(binding) # throws an error if none
   message(name(binding), " has value ", x)
 }

 That, I think, is a bit verbose but readable and could be made fast. And I
 think binding objects would be useful in other ways, as they are
 essentially a named object. For example, when iterating over an
 environment.


 This would need a lot more thought. Directly exposing the internals is
 definitely not something we want to do as we may well want to change
 that design. But there are lots of other corner issues that would have
 to be thought through before going forward, such as what happens if an
 rm occurs between obtaining a binding object and doing something with
 it. Serialization would also need thinking through. This doesn't seem
 like a worthwhile place to spend our efforts to me.



Just wanted to be clear that I was not suggesting to expose any internals.
We could implement the behavior using SYMSXP, or not. Nor would the binding
need to be mutable. The binding would be considered independent of the
environment from which it was retrieved. As Pete has mentioned, it could be
a useful abstraction to have in general.
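
To make that concrete, a binding could be a plain immutable snapshot of a name and, possibly, a value. The sketch below is only an illustration in R: getBinding, hasValue, value and name follow the names used earlier in the thread, but this implementation (a classed list built on exists() and get()) is an assumption, not a proposed design.

```r
getBinding <- function(x, env) {
  found <- exists(x, envir = env, inherits = FALSE)
  structure(
    list(name  = x,
         found = found,
         value = if (found) get(x, envir = env, inherits = FALSE)),
    class = "binding")
}
hasValue <- function(b) b$found
name     <- function(b) b$name
value    <- function(b) {
  if (!b$found) stop("binding '", b$name, "' has no value")
  b$value
}

env <- new.env()
assign("x", 42, envir = env)
b <- getBinding("x", env)
rm("x", envir = env)         # the snapshot is independent of the environment
if (hasValue(b)) message(name(b), " has value ", value(b))
```

Because the binding is a copy taken at lookup time, a later rm() on the environment does not affect it, which sidesteps the question of what happens when the variable is removed in between.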


 Adding getIfExists, or .get, or get0, or whatever seems fine. Adding
 an argument to get() with missing giving current behavior may be OK
 too. Rewriting exists and get as .Primitives may be sufficient though.

 Best,

 luke


  Michael




 On 08/01/2015 4:16 AM, Martin Maechler wrote:
  In November, we had a bug repository conversation
  with Peter Haverty and myself:

    https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16065

  where the bug report title started with

   ---  exists is a bottleneck for dispatch and package loading, ...

  Peter proposed an extra simplified and hence faster version of exists(),
  and I commented

   --- Comment #2 from Martin Maechler maech...@stat.math.ethz.ch ---
   I'm very grateful that you've started exploring the bottlenecks of loading
   packages with many S4 classes (and methods)...
   and I hope we can make real progress there rather sooner than later.
 
  

Re: [Rd] setequal: better readability, reduced memory footprint, and minor speedup

2015-01-08 Thread William Dunlap
 why is there no setcontains()?

Several packages define is.subset(), which I am assuming is what you are
proposing, but with its arguments reversed.  E.g., package:algstat has
   is.subset <- function(x, y) all(x %in% y)
   containsQ <- function(y, x) all(x %in% y)
and package:rje has essentially the same is.subset.
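
With such a predicate, set equality is just containment in both directions. A minimal sketch, where set_contains is a made-up name (not a base R function) that takes the containing set first:

```r
# Hypothetical: TRUE if every element of 'subset' occurs in 'set'.
set_contains <- function(set, subset) all(subset %in% set)

set_contains(1:5, c(2, 4))        # subset is contained
set_contains(1:5, c(2, 6))        # 6 is missing
set_contains(1:5, integer(0))     # the empty set is contained in any set

# setequal() expressed as mutual containment.
x <- c(1, 2, 2, 3); y <- c(3, 2, 1)
stopifnot(identical(set_contains(x, y) && set_contains(y, x), setequal(x, y)))
```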

package:arulesSequences and package:arules have an S4 generic called
is.subset, which is entirely different (it is not a predicate, but returns
a matrix).


Bill Dunlap
TIBCO Software
wdunlap tibco.com



Re: [Rd] setequal: better readability, reduced memory footprint, and minor speedup

2015-01-08 Thread Peter Haverty
How about unique them both and compare the lengths?  It's less work,
especially allocation.
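
One possible reading of this suggestion, sketched under assumptions: the
function name setequal_unique is hypothetical, and since equal counts of
distinct values are necessary but not sufficient for set equality, the
sketch keeps a single membership test after the length comparison. It is
not claimed to be what the poster had in mind.

```r
# Hypothetical variant of setequal(): deduplicate first, compare the number
# of distinct values, and only then run one membership test.
setequal_unique <- function(x, y) {
    ux <- unique(as.vector(x))
    uy <- unique(as.vector(y))
    # If both sets have the same number of distinct values and every value
    # of 'ux' occurs in 'uy', the two sets must coincide.
    length(ux) == length(uy) && all(ux %in% uy)
}

setequal_unique(c(1, 2, 2, 3), c(3, 1, 2))
setequal_unique(c(1, 1, 2), 1:3)
```

The && short-circuits, so the membership test is skipped whenever the
distinct counts already differ.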



Pete


Peter M. Haverty, Ph.D.
Genentech, Inc.
phave...@gene.com

On Thu, Jan 8, 2015 at 1:30 PM, peter dalgaard pda...@gmail.com wrote:

 If you look at the definition of %in%, you'll find that it is implemented
 using match, so if we did as you suggest, I give it about three days before
 someone suggests to inline the function call... Readability of source code
 is not usually our prime concern.

 The  idea does have some merit, though.

 Apropos, why is there no setcontains()?

 -pd

  On 06 Jan 2015, at 22:02 , Hervé Pagès hpa...@fredhutch.org wrote:
 
  Hi,
 
  Current implementation:
 
  setequal <- function (x, y)
  {
      x <- as.vector(x)
      y <- as.vector(y)
      all(c(match(x, y, 0L) > 0L, match(y, x, 0L) > 0L))
  }
 
  First, what about replacing 'match(x, y, 0L) > 0L' and
  'match(y, x, 0L) > 0L'
  with 'x %in% y' and 'y %in% x', respectively. They're strictly
  equivalent but the latter form is a lot more readable than the former
  (isn't this the raison d'être of %in%?):
 
  setequal <- function (x, y)
  {
      x <- as.vector(x)
      y <- as.vector(y)
      all(c(x %in% y, y %in% x))
  }
 
  Furthermore, replacing 'all(c(x %in% y, y %in% x))' with
  'all(x %in% y) && all(y %in% x)' improves readability even more and,
  more importantly, reduces memory footprint significantly on big vectors
  (e.g. by 15% on integer vectors with 15M elements):
 
  setequal <- function (x, y)
  {
      x <- as.vector(x)
      y <- as.vector(y)
      all(x %in% y) && all(y %in% x)
  }
  }
 
  It also seems to speed up things a little bit (not in a significant
  way though).
 
  Cheers,
  H.
 
  --
  Hervé Pagès
 
  Program in Computational Biology
  Division of Public Health Sciences
  Fred Hutchinson Cancer Research Center
  1100 Fairview Ave. N, M1-B514
  P.O. Box 19024
  Seattle, WA 98109-1024
 
  E-mail: hpa...@fredhutch.org
  Phone:  (206) 667-5791
  Fax:(206) 667-1319
 
  __
  R-devel@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-devel

 --
 Peter Dalgaard, Professor,
 Center for Statistics, Copenhagen Business School
 Solbjerg Plads 3, 2000 Frederiksberg, Denmark
 Phone: (+45)38153501
 Email: pd@cbs.dk  Priv: pda...@gmail.com

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel


[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] RFC: getifexists() {was [Bug 16065] exists ...}

2015-01-08 Thread Duncan Murdoch
On 08/01/2015 4:16 AM, Martin Maechler wrote:
 In November, we had a bug repository conversation
 with Peter Haverty and myself:
 
   https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16065
 
 where the bug report title started with
 
  ---  exists is a bottleneck for dispatch and package loading, ...
 
 Peter proposed an extra simplified and hence faster version of exists(),
 and I commented
 
  --- Comment #2 from Martin Maechler maech...@stat.math.ethz.ch ---
  I'm very grateful that you've started exploring the bottlenecks of 
 loading
  packages with many S4 classes (and methods)...
  and I hope we can make real progress there rather sooner than later.
 
  OTOH, your `summaryRprof()` in your vignette indicates that exists() 
 may use
  upto 10% of the time spent in library(reportingTools),  and your speedup
  proposals of exist()  may go up to ca 30%  which is good and well worth
  considering,  but still we can only expect 2-3% speedup for package 
 loading
  which unfortunately is not much.
 
  Still I agree it is worth looking at exists() as you did  ... and 
  consider providing a fast simplified version of it in addition to 
 current
  exists() [I think].
 
  BTW, as we talk about enhancements here, maybe consider a further 
 possibility:
  My subjective guess is that probably more than half of exists() uses 
 are of the
  form
 
  if(exists(name, where, ...)) {
     get(name, where, ...)
     ..
  } else { 
     NULL / error() / .. or similar
  }
 
  i.e. many exists() calls when returning TRUE are immediately followed 
 by the
  corresponding get() call which repeats quite a bit of the lookup that 
 exists()
  has done.
 
  Instead, I'd imagine a function, say  getifexists(name, ...) that does 
 both at
  once in the exists is TRUE case but in a way we can easily keep the 
 if(.) ..
  else clause above.  One already existing approach would use
 
  if(!inherits(tryCatch(xx <- get(name, where, ...), error=function(e)e),
               "error")) {
 
     ... (( work with xx )) ...
 
  } else  { 
     NULL / error() / .. or similar
  }
 
  but of course our C implementation would be more efficient and use more 
 concise
  syntax {which should not look like error handling}.   Follow ups to 
 this idea
  should really go to R-devel (the mailing list).
 
 and now I do follow up here myself :
 
 I found that  'getifexists()' is actually very simple to implement,
 I have already tested it a bit, but not yet committed to R-devel
 (the R trunk aka master branch) because I'd like to get
 public comments {RFC := Request For Comments}.
 

I don't like the name -- I'd prefer getIfExists.  As Baath (2012, R
Journal) pointed out, R names are very inconsistent in naming
conventions, but lowerCamelCase is the most common choice.  Second most
common is period.separated, so an argument could be made for
get.if.exists, but there's still the possibility of confusion with S3
methods, and users of other languages where . is an operator find it a
little strange.

If you don't like lowerCamelCase (and a lot of people don't), then I
think underscore_separated is the next best choice, so would use
get_if_exists.

Another possibility is to make no new name at all, and just add an
optional parameter to get() (which if present acts as your value.if.not
parameter, if not present keeps the current object not found error).
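
A rough sketch of this last alternative, written as a wrapper rather than
a change to get() itself. The wrapper name get_or is hypothetical, the
parameter name value.if.not is borrowed from the proposal quoted below,
and the lookup is still done twice here, which a C-level implementation
would avoid.

```r
# Hypothetical wrapper: behaves like get() unless 'value.if.not' is supplied.
get_or <- function(x, envir = parent.frame(), mode = "any",
                   inherits = TRUE, value.if.not) {
    force(envir)  # resolve the caller's frame before any nested calls
    if (missing(value.if.not)) {
        # No fallback supplied: keep get()'s usual "object not found" error.
        return(get(x, envir = envir, mode = mode, inherits = inherits))
    }
    if (exists(x, envir = envir, mode = mode, inherits = inherits)) {
        get(x, envir = envir, mode = mode, inherits = inherits)
    } else {
        value.if.not
    }
}

get_or("pi")                                   # found via the search path
get_or("no_such_object_zzz", value.if.not = 0) # falls back instead of failing
```

Using missing() rather than a NULL default lets a caller pass NULL as an
explicit fallback while leaving existing get()-style calls unchanged.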

Duncan Murdoch


 My version of the help file {for both exists() and getifexists()}
 rendered in text is
 
 -- help(getifexists) ---
 Is an Object Defined?
 
 Description:
 
  Look for an R object of the given name and possibly return it
 
 Usage:
 
  exists(x, where = -1, envir = , frame, mode = "any",
         inherits = TRUE)
  
  getifexists(x, where = -1, envir = as.environment(where),
              mode = "any", inherits = TRUE, value.if.not = NULL)
  
 Arguments:
 
x: a variable name (given as a character string).
 
where: where to look for the object (see the details section); if
   omitted, the function will search as if the name of the
   object appeared unquoted in an expression.
 
envir: an alternative way to specify an environment to look in, but
   it is usually simpler to just use the ‘where’ argument.
 
frame: a frame in the calling list.  Equivalent to giving ‘where’ as
   ‘sys.frame(frame)’.
 
 mode: the mode or type of object sought: see the ‘Details’ section.
 
 inherits: should the enclosing frames of the environment be searched?
 
 value.if.not: the return value of ‘getifexists(x, *)’ when ‘x’ does not
   exist.
 
 Details:
 
  The ‘where’ argument can specify the environment in which to look
  for the object in any of several ways: as an integer (the position
  in the ‘search’ list); as the character string name 

Re: [Rd] gsub with perl=TRUE results in 'this version of PCRE is not compiled with Unicode property support' in R-devel

2015-01-08 Thread Prof Brian Ripley
Why are you reporting that your PCRE library does not have something 
which the R-admin manual says it should preferably have?  To wit, 
footnote 37 says


'and not PCRE2, which started at version 10.0. PCRE must be built with 
UTF-8 support (not the default) and support for Unicode properties is 
assumed by some R packages. Neither are tested by configure. JIT support 
is desirable.'


That certainly does not fail on my Linux, Windows and OS X builds of 
R-devel.  (Issues about pre-built binaries, if that is what you used, 
should be reported to their maintainers, not here.)


And the help does say in ?regex

 In UTF-8 mode, some Unicode properties may be supported via
 ‘\p{xx}’ and ‘\P{xx}’ which match characters with and without
 property ‘xx’ respectively.

Note the 'may'.
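
For readers who want to check their own build, a small probe along these
lines may help. It is only a sketch: the function name
supports_unicode_properties is made up for illustration, and it relies on
the assumption that a PCRE without property support rejects '\p{L}' when
the pattern is compiled.

```r
# Probe whether perl = TRUE regular expressions accept Unicode properties.
# A pattern that fails to compile is reported by R as a warning followed by
# an error, so both conditions are mapped to FALSE.
supports_unicode_properties <- function() {
    tryCatch(
        isTRUE(grepl("^\\p{L}+$", "abc", perl = TRUE)),
        warning = function(w) FALSE,
        error   = function(e) FALSE
    )
}

supports_unicode_properties()
```

If this returns FALSE, patterns that request Unicode property support,
such as the (*UCP) one quoted below, would be expected to fail in the same
way.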




On 07/01/2015 23:25, Dan Tenenbaum wrote:

The following code:

res <- gsub("(*UCP)\\b(i)\\b",
            "", nhgrimelanomaclass, perl = TRUE)

results in:

Error in gsub(sprintf("(*UCP)\\b(%s)\\b", i), "", nhgrimelanomaclass,  :
   invalid regular expression '(*UCP)\b(i)\b'
In addition: Warning message:
In gsub(sprintf("(*UCP)\\b(%s)\\b", i), "", nhgrimelanomaclass,  :
   PCRE pattern compilation error
'this version of PCRE is not compiled with Unicode property support'
at '(*UCP)\b(i)\b'

on

R Under development (unstable) (2015-01-01 r67290)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.9.5 (Mavericks)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

And also on the same version of R-devel on Snow Leopard, Windows, and Linux. 
But it does not produce an error on

R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

Dan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel




--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] On base::rank

2015-01-08 Thread Arunkumar Srinivasan
Have a look at the following, taken from base::rank:

...
if (!is.na(na.last) && any(nas)) {
    yy <- integer(length(x)) # ~
    storage.mode(yy) <- storage.mode(y) #
    yy <- NA
    NAkeep <- (na.last == "keep")
    if (NAkeep || na.last) {
        yy[!nas] <- y
        if (!NAkeep)
            yy[nas] <- (length(y) + 1L):length(yy)
    }
...

Alternatively, look at lines 36 and 37 here:
https://github.com/wch/r-source/blob/fbf5cdf29d923395b537a9893f46af1aa75e38f3/src/library/base/R/rank.R#L36

There seems to be no need for those lines, IIUC. Is there? 'yy' is
replaced with NA in the very next line.
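
The point can be checked in isolation with a few lines. This is a
stand-alone illustration with made-up inputs, not code from rank() itself:

```r
x <- c(10, NA, 30, NA, 20)           # illustrative input with two NAs
y <- c(1L, 3L, 2L)                   # ranks of the three non-missing values

yy <- integer(length(x))             # allocates a vector of length 5 ...
storage.mode(yy) <- storage.mode(y)  # ... and sets its storage mode ...
yy <- NA                             # ... then rebinds 'yy' to a fresh NA

length(yy)   # length of what 'yy' now refers to
typeof(yy)   # and its type
```

After the third assignment, 'yy' no longer refers to the vector created by
the first two statements, so their work is discarded.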

Best,
Arun.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Bioc-devel] Announcing Docker containers for Bioconductor

2015-01-08 Thread Elena Grassi
Thanks, this is really useful and I was looking forward to it after
having used rocker!

I have a strange (ok, at least to me :) ) issue concerning volumes.
On a machine (debian testing/unstable up to date) everything works
smoothly when I do something like:
data@decoder:~$ docker run -v /home/data/Dropbox/work/matrix/mr_bioc/matrix_rider/:/opt/matrix_rider -p 8787:8787 9f40f8036ad4
(in /home/data/Dropbox/work/matrix/mr_bioc/matrix_rider/ I have an
RStudio project)

The same command on another machine (more or less the same Debian as
before) leaves me unable to open the project with RStudio, because
the directory is not writable.
The only relevant difference seems to be my user UID which is 1000 on
the first machine but different on the other one. I tried to use -e as
suggested here
https://github.com/rocker-org/rocker/wiki/Sharing-files-with-host-machine
with no luck.

Does anyone more Docker-savvy than me have a suggestion? (Maybe the
volumes approach is not the best one here.)

Thanks,
E.
p.s. sorry Dan for the first mail sent only to you

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Rd] RFC: getifexists() {was [Bug 16065] exists ...}

2015-01-08 Thread Martin Maechler
In November, we had a bug repository conversation
with Peter Haverty and myself:

  https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16065

where the bug report title started with

 ---  exists is a bottleneck for dispatch and package loading, ...

Peter proposed an extra simplified and hence faster version of exists(),
and I commented

 --- Comment #2 from Martin Maechler maech...@stat.math.ethz.ch ---
 I'm very grateful that you've started exploring the bottlenecks of loading
 packages with many S4 classes (and methods)...
 and I hope we can make real progress there rather sooner than later.

 OTOH, your `summaryRprof()` in your vignette indicates that exists() may
 use up to 10% of the time spent in library(reportingTools), and your
 speedup proposals of exists() may go up to ca 30%, which is good and well
 worth considering, but still we can only expect 2-3% speedup for package
 loading, which unfortunately is not much.

 Still I agree it is worth looking at exists() as you did ... and
 consider providing a fast simplified version of it in addition to current
 exists() [I think].

 BTW, as we talk about enhancements here, maybe consider a further
 possibility: My subjective guess is that probably more than half of
 exists() uses are of the form

 if(exists(name, where, ...)) {
    get(name, where, ...)
    ..
 } else {
    NULL / error() / .. or similar
 }

 i.e. many exists() calls when returning TRUE are immediately followed by
 the corresponding get() call which repeats quite a bit of the lookup that
 exists() has done.

 Instead, I'd imagine a function, say getifexists(name, ...), that does
 both at once in the "exists is TRUE" case but in a way we can easily keep
 the if(.) .. else clause above.  One already existing approach would use

 if(!inherits(tryCatch(xx <- get(name, where, ...), error=function(e)e),
              "error")) {

   ... (( work with xx )) ...

 } else  {
    NULL / error() / .. or similar
 }

 but of course our C implementation would be more efficient and use more
 concise syntax {which should not look like error handling}.   Follow ups
 to this idea should really go to R-devel (the mailing list).

and now I do follow up here myself :

I found that  'getifexists()' is actually very simple to implement,
I have already tested it a bit, but not yet committed to R-devel
(the R trunk aka master branch) because I'd like to get
public comments {RFC := Request For Comments}.
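
To make the intended semantics concrete, here is a pure-R stand-in,
offered only as a sketch. The name getifexists_sketch is hypothetical, the
'where' argument of the proposal is left out for brevity, and the lookup
is still performed twice here, which is exactly what the proposed C
implementation is meant to avoid.

```r
# Stand-in for the proposed getifexists(): return the object if it exists,
# otherwise return 'value.if.not' instead of signalling an error.
getifexists_sketch <- function(x, envir = parent.frame(), mode = "any",
                               inherits = TRUE, value.if.not = NULL) {
    force(envir)  # resolve the caller's frame before any nested calls
    if (exists(x, envir = envir, mode = mode, inherits = inherits))
        get(x, envir = envir, mode = mode, inherits = inherits)
    else
        value.if.not
}

e <- new.env()
assign("answer", 42, envir = e)

# The exists()/get() pair collapses into one call plus a NULL check.
xx <- getifexists_sketch("answer", envir = e, inherits = FALSE)
if (!is.null(xx)) {
    # ... work with xx ...
}
```

One caveat of a NULL default: an object whose value is NULL cannot be told
apart from a missing one unless the caller passes a different value.if.not.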

My version of the help file {for both exists() and getifexists()}
rendered in text is

-- help(getifexists) ---
Is an Object Defined?

Description:

 Look for an R object of the given name and possibly return it

Usage:

 exists(x, where = -1, envir = , frame, mode = "any",
        inherits = TRUE)
 
 getifexists(x, where = -1, envir = as.environment(where),
             mode = "any", inherits = TRUE, value.if.not = NULL)
 
Arguments:

   x: a variable name (given as a character string).

   where: where to look for the object (see the details section); if
  omitted, the function will search as if the name of the
  object appeared unquoted in an expression.

   envir: an alternative way to specify an environment to look in, but
  it is usually simpler to just use the ‘where’ argument.

   frame: a frame in the calling list.  Equivalent to giving ‘where’ as
  ‘sys.frame(frame)’.

mode: the mode or type of object sought: see the ‘Details’ section.

inherits: should the enclosing frames of the environment be searched?

value.if.not: the return value of ‘getifexists(x, *)’ when ‘x’ does not
  exist.

Details:

 The ‘where’ argument can specify the environment in which to look
 for the object in any of several ways: as an integer (the position
 in the ‘search’ list); as the character string name of an element
 in the search list; or as an ‘environment’ (including using
 ‘sys.frame’ to access the currently active function calls).  The
 ‘envir’ argument is an alternative way to specify an environment,
 but is primarily there for back compatibility.

 This function looks to see if the name ‘x’ has a value bound to it
 in the specified environment.  If ‘inherits’ is ‘TRUE’ and a value
 is not found for ‘x’ in the specified environment, the enclosing
 frames of the environment are searched until the name ‘x’ is
 encountered.  See ‘environment’ and the ‘R Language Definition’
 manual for details about the structure of environments and their
 enclosures.

 *Warning:* ‘inherits = TRUE’ is the default behaviour for R but
 not for S.

 If ‘mode’ is specified then only objects of that type are sought.
 The ‘mode’ may specify one of the collections ‘numeric’ and
 ‘function’ (see ‘mode’): any 

Re: [Rd] New version of Rtools for Windows

2015-01-08 Thread Avraham Adler
Very timely, as this is how I got into the problem I posted about
earlier; maybe some of the problems I ran into will mean more to
you and the experts on this thread, Dr. Murdoch. For reference, I run
Windows 7 64-bit, and I am trying to build a 64-bit version of R-3.1.2.

As we discussed offline, Dr. Murdoch, I've been trying to build R
using more recent tools than GCC4.6.3 prerelease. Ruben Von Boxen
(rubenvb) told me he is no longer developing his own builds of GCC,
but is focusing on MSYS2 and the mingw64 personal builds. So, similar
to what Jeroen said, I first installed MSYS2, whose initial
installation on windows is not so simple[1]. After the initial
install, the following packages need to be manually installed: make,
tar, zip, unzip, zlib, and rsync. I also installed base-devel, which
is way more than necessary, but there may be packages in there which
are necessary.

I originally installed the most up-to-date version of GCC (4.9.2)[2],
and I picked the -seh version: since I install (almost) all
packages from source (the one exception being nloptr for now), the
exception handling should be consistent, and it is supposed to be up to
~15% faster[3].

The initial build crashed with the following error:

gcc -std=gnu99 -m64 -I../../include -I. -DHAVE_CONFIG_H  -O3 -Wall
-pedantic -mtune=core2   -c xmalloc.c -o xmalloc.o
ar crs libtre.a regcomp.o regerror.o regexec.o tre-ast.o tre-compile.o
tre-match -approx.o tre-match-backtrack.o tre-match-parallel.o
tre-mem.o tre-parse.o tre-stack.o xmalloc.o
gcc -std=gnu99 -m64   -O3 -Wall -pedantic -mtune=core2   -c compat.c -o compat.o
compat.c:65:5: error: redefinition of 'snprintf'
 int snprintf(char *buffer, size_t max, const char *format, ...)
 ^
In file included from compat.c:3:0:
F:/MinGW64/x86_64-w64-mingw32/include/stdio.h:553:5: note: previous
definition of 'snprintf' was here
 int snprintf (char * __restrict__ __stream, size_t __n, const char *
__restrict__ __format, ...)
 ^
compat.c:75:5: error: redefinition of 'vsnprintf'
 int vsnprintf(char *buffer, size_t bufferSize, const char *format,
va_list args)
 ^
In file included from compat.c:3:0:
F:/MinGW64/x86_64-w64-mingw32/include/stdio.h:543:7: note: previous
definition of 'vsnprintf' was here
   int vsnprintf (char * __restrict__ __stream, size_t __n, const char
* __restrict__ __format, va_list __local_argv)
   ^
../../gnuwin32/MkRules:218: recipe for target 'compat.o' failed
make[4]: *** [compat.o] Error 1
Makefile:120: recipe for target 'rlibs' failed
make[3]: *** [rlibs] Error 1
Makefile:179: recipe for target '../../bin/x64/R.dll' failed
make[2]: *** [../../bin/x64/R.dll] Error 2
Makefile:104: recipe for target 'rbuild' failed
make[1]: *** [rbuild] Error 2
Makefile:14: recipe for target 'all' failed
make: *** [all] Error 2

After doing some checking (for example see [4]), I asked Duncan about
the problem, and he suggested moving the #ifndef _W64 in compat.c up
above the offending lines (65-75). That did not work, so, I figured
(it seems mistakenly from the other thread) that if those functions
are included from stdio already, I can just delete them from compat.c.
The specific lines are:

int snprintf(char *buffer, size_t max, const char *format, ...)
{
int res;
va_list(ap);
va_start(ap, format);
res = trio_vsnprintf(buffer, max, format, ap);
va_end(ap);
return res;
}

int vsnprintf(char *buffer, size_t bufferSize, const char *format, va_list args)
{
return trio_vsnprintf(buffer, bufferSize, format, args);
}

Continuing the build using 4.9.2 crashed again at the following point:

gcc -std=gnu99 -m64 -I../include -I. -I../extra -DHAVE_CONFIG_H
-DR_DLL_BUILD  -O3 -Wall -pedantic -mtune=core2   -c malloc.c -o
malloc.o
windres -F pe-x86-64  -I../include -i dllversion.rc -o dllversion.o
gcc -std=gnu99 -m64 -shared -s -mwindows -o R.dll R.def console.o
dynload.o editor.o embeddedR.o extra.o opt.o pager.o preferences.o
psignal.o rhome.o rt_complete.o rui.o run.o shext.o sys-win32.o
system.o dos_wglob.o malloc.o ../main/libmain.a ../appl/libappl.a
../nmath/libnmath.a getline/gl.a ../extra/xdr/libxdr.a
../extra/pcre/libpcre.a ../extra/bzip2/libbz2.a
../extra/intl/libintl.a ../extra/trio/libtrio.a ../extra/tzone/libtz.a
../extra/tre/libtre.a ../extra/xz/liblzma.a dllversion.o -fopenmp -L.
-lgfortran -lRblas -L../../bin/x64 -lRzlib -lRgraphapp -lRiconv
-lcomctl32 -lversion
collect2.exe: error: ld returned 5 exit status
Makefile:150: recipe for target 'R.dll' failed
make[3]: *** [R.dll] Error 1
Makefile:179: recipe for target '../../bin/x64/R.dll' failed
make[2]: *** [../../bin/x64/R.dll] Error 2
Makefile:104: recipe for target 'rbuild' failed
make[1]: *** [rbuild] Error 2
Makefile:14: recipe for target 'all' failed
make: *** [all] Error 2

As all those files existed in their correct places, the only reason I
could think of that this would fail here is that GCC version 4.9 did
make some changes to enhance link-time optimization [5], and probably
something isn't 

Re: [Rd] New version of Rtools for Windows

2015-01-08 Thread Henric Winell

On 2015-01-08 14:18, Avraham Adler wrote:



Re: [Rd] On base::rank

2015-01-08 Thread Martin Maechler
 Arunkumar Srinivasan arunkumar.sri...@gmail.com
 on Thu, 8 Jan 2015 13:46:57 +0100 writes:

 Have a look at the following, taken from base::rank:

 ...
 if (!is.na(na.last) && any(nas)) {
     yy <- integer(length(x)) # ~
     storage.mode(yy) <- storage.mode(y) #
     yy <- NA
     NAkeep <- (na.last == "keep")
     if (NAkeep || na.last) {
         yy[!nas] <- y
         if (!NAkeep)
             yy[nas] <- (length(y) + 1L):length(yy)
     }
 ...

 Alternatively, look at lines 36 and 37 here:
 https://github.com/wch/r-source/blob/fbf5cdf29d923395b537a9893f46af1aa75e38f3/src/library/base/R/rank.R#L36

 There seems to be no need for those lines, IIUC. Is there?
 'yy' is replaced with NA in the very next line.

Indeed.   Interesting that nobody has noticed till now,
even though that part has been world readable since at least 2008-08-25.

Note that the R source code is at 
 http://svn.r-project.org/R/
and the file in question at
 http://svn.r-project.org/R/trunk/src/library/base/R/rank.R

where you can already see the new code
(given that 'x' was no longer needed, there's no need for 'xx').

Martin Maechler, 
ETH Zurich

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] New version of Rtools for Windows

2015-01-08 Thread Avraham Adler
On Thu, Jan 8, 2015 at 10:48 AM, Hin-Tak Leung
ht...@users.sourceforge.net wrote:

 The r.dll crash is easy - you need to be using gcc-ar for ar, and gcc-ranlib 
 for ranlib. I also posted a patch to fix the check failure for stack probing, 
 as lto optimizes away the stack probing code, as it should.

 yes, lto build's speed gain is very impressive.




I apologize for my ignorance, but how would I do that? I tried by
changing the following in src/gnuwin32/MkRules.local:

# prefix for 64-bit: path or x86_64-w64-mingw32-
BINPREF64 = x86_64-w64-mingw32-gcc-

I added the gcc- as the suffix there, but I guess that is insufficient
as I still get the following error using 4.9.2:

windres -F pe-x86-64  -I../include -i dllversion.rc -o dllversion.o
gcc -std=gnu99 -m64 -shared -s -mwindows -o R.dll R.def console.o
dynload.o editor.o embeddedR.o extra.o opt.o pager.o preferences.o
psignal.o rhome.o rt_complete.o rui.o run.o shext.o sys-win32.o
system.o dos_wglob.o malloc.o ../main/libmain.a ../appl/libappl.a
../nmath/libnmath.a getline/gl.a ../extra/xdr/libxdr.a
../extra/pcre/libpcre.a ../extra/bzip2/libbz2.a
../extra/intl/libintl.a ../extra/trio/libtrio.a ../extra/tzone/libtz.a
../extra/tre/libtre.a ../extra/xz/liblzma.a dllversion.o -fopenmp -L.
-lgfortran -lRblas -L../../bin/x64 -lRzlib -lRgraphapp -lRiconv
-lcomctl32 -lversion
collect2.exe: error: ld returned 5 exit status
Makefile:150: recipe for target 'R.dll' failed
make[3]: *** [R.dll] Error 1
Makefile:179: recipe for target '../../bin/x64/R.dll' failed
make[2]: *** [../../bin/x64/R.dll] Error 2
Makefile:104: recipe for target 'rbuild' failed
make[1]: *** [rbuild] Error 2
Makefile:14: recipe for target 'all' failed
make: *** [all] Error 2

I still had to delete those lines in compat.c, so this build, were it
to have completed, would still be subject to the non-conformance of
scientific notation printing that was discussed earlier.

Hin-tak, any suggestions for this error (and the compat.c for that
matter) that you, or any reader of this list, may have would be
greatly appreciated.

Thank you!

Avi



 --
 On Thu, Jan 8, 2015 2:01 PM GMT Henric Winell wrote:

On 2015-01-08 14:18, Avraham Adler wrote:

 Very timely, as this is how I got into the problem I posted about
 earlier; maybe some of the problems I ran into will mean more to the
 you and the experts on this thread, Dr. Murdoch.For reference, I run
 Windows 7 64bit, and I am trying to build a 64 bit version of R-3.1.2.

 As we discussed offline, Dr. Murdoch, I've been trying to build R
 using more recent tools than GCC4.6.3 prerelease. Ruben Von Boxen
 (rubenvb) told me he is no longer developing his own builds of GCC,
 but is focusing on MSYS2 and the mingw64 personal builds. So, similar
 to what Jeroen said, I first installed MSYS2, whose initial
 installation on windows is not so simple[1]. After the initial
 install, the following packages need to be manually installed: make,
 tar, zip, unzip, zlib, and rsync. I also installed base-devel, which
 is way more than necessary, but there may be packages in there which
 are necessary.

 I originally installed the most up-to-date version of GCC (4.9.2)[2],
 and I did pick the -seh version: since I install (almost) all
 packages from source (the one exception being nloptr for now), the
 exception handling should be consistent, and it is supposed to be up to
 ~15% faster[3].

 The initial build crashed with the following error:

 gcc -std=gnu99 -m64 -I../../include -I. -DHAVE_CONFIG_H  -O3 -Wall
 -pedantic -mtune=core2   -c xmalloc.c -o xmalloc.o
 ar crs libtre.a regcomp.o regerror.o regexec.o tre-ast.o tre-compile.o
 tre-match-approx.o tre-match-backtrack.o tre-match-parallel.o
 tre-mem.o tre-parse.o tre-stack.o xmalloc.o
 gcc -std=gnu99 -m64   -O3 -Wall -pedantic -mtune=core2   -c compat.c -o compat.o
 compat.c:65:5: error: redefinition of 'snprintf'
   int snprintf(char *buffer, size_t max, const char *format, ...)
   ^
 In file included from compat.c:3:0:
 F:/MinGW64/x86_64-w64-mingw32/include/stdio.h:553:5: note: previous
 definition of 'snprintf' was here
   int snprintf (char * __restrict__ __stream, size_t __n, const char *
 __restrict__ __format, ...)
   ^
 compat.c:75:5: error: redefinition of 'vsnprintf'
   int vsnprintf(char *buffer, size_t bufferSize, const char *format,
 va_list args)
   ^
 In file included from compat.c:3:0:
 F:/MinGW64/x86_64-w64-mingw32/include/stdio.h:543:7: note: previous
 definition of 'vsnprintf' was here
 int vsnprintf (char * __restrict__ __stream, size_t __n, const char
 * __restrict__ __format, va_list __local_argv)
 ^
 ../../gnuwin32/MkRules:218: recipe for target 'compat.o' failed
 make[4]: *** [compat.o] Error 1
 Makefile:120: recipe for target 'rlibs' failed
 make[3]: *** [rlibs] Error 1
 Makefile:179: recipe for target '../../bin/x64/R.dll' failed
 make[2]: *** [../../bin/x64/R.dll] Error 2
 Makefile:104: recipe for target 'rbuild' failed
 make[1]: *** [rbuild] 

Re: [Bioc-devel] Announcing Docker containers for Bioconductor

2015-01-08 Thread Dan Tenenbaum


- Original Message -
 From: Elena Grassi grass...@gmail.com
 To: Dan Tenenbaum dtene...@fredhutch.org
 Cc: bioc-devel@r-project.org
 Sent: Thursday, January 8, 2015 8:45:37 AM
 Subject: Re: [Bioc-devel] Announcing Docker containers for Bioconductor
 
 On Thu, Jan 8, 2015 at 4:43 PM, Dan Tenenbaum
 dtene...@fredhutch.org wrote:
  I have a strange (ok, at least to me :) ) issue concerning
  volumes.
  On a machine (debian testing/unstable up to date) everything works
  smoothly when I do something like:
  data@decoder:~$ docker run -v
  /home/data/Dropbox/work/matrix/mr_bioc/matrix_rider/:/opt/matrix_rider
  -p 8787:8787 9f40f8036ad4
  (in /home/data/Dropbox/work/matrix/mr_bioc/matrix_rider/ I have a
  Rstudio project)
 
  What image is 9f40f8036ad4? Why are you not using the
  user/repository name for the image? Is it a rocker image or a
  Bioconductor image?
 
 Sorry! It's bioconductor/devel_sequencing image that I had pulled
 before:
 data@decoder:~/Dropbox/work/matrix/matrix_rider$ docker images
 REPOSITORY                      TAG      IMAGE ID       CREATED        VIRTUAL SIZE
 bioconductor/devel_sequencing   latest   9f40f8036ad4   29 hours ago   5.715 GB
 
  Note that Rstudio Server runs as a user called rstudio, not
  necessarily a privileged user.
  So the directory you are trying to mount should probably be
  writable (and readable) by all.
  so try
 
  chmod -R a+rw /home/data/Dropbox/work/matrix/mr_bioc/matrix_rider/
 
 On the second machine this works, but on this machine I do not love
 having this kind of privileges, since it's a shared PC in my office
 :)
 What puzzles me is that the privileges (i.e. not writable and
 readable by everyone) were the same on both machines, and on the first one
 it has worked from the first moment without a+rw... I assumed that the
 rstudio user has UID 1000 in the container/image and that -e UID= etc.
 could fix this (change it to whatever UID my user has on the host machine),
 but this does not seem to be the case.

It is not the case. These containers are set up differently than Rocker's. 
Though I think I will change them to allow this. 

In the meantime you can accomplish what you want, a bit circuitously. Run a 
command like this:

docker run -p 8787:8787 -e USER=$USER -e USERID=$UID --rm -it  -v 
/home/data/Dropbox/work/matrix/mr_bioc/matrix_rider/:/opt/matrix_rider  
bioconductor/devel_sequencing bash

This opens a shell on the container (as root) where you can do the following:

/tmp/userconf.sh # note different location than on rocker
# ignore warning that /home/data already exists
passwd data # choose a password for the user; it does not have to match your real password
/usr/bin/supervisord

Then open a browser to http://localhost:8787 and log in with username 'data' 
and the password you just set. You should be able to read/write the files in 
/opt/matrix_rider, assuming you can read/write those files when you are logged 
in as 'data' and _not_ using docker.

I will look at doing this in a way more similar to rocker; I'll let you know 
when/if this is done.

Thanks,
Dan




In the meantime you can use R from the command line as described in 
https://github.com/rocker-org/rocker/wiki/Sharing-files-with-host-machine#interactive-containers
 with the change
that userconf.sh is in /tmp so you need to invoke it like this:

/tmp/userconf.sh

And ignore the warning it gives you that your home directory already exists. 







 
 Thank you very much,
 E.
 --
 $ pom


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Rd] RFC: getifexists() {was [Bug 16065] exists ...}

2015-01-08 Thread Duncan Murdoch

On 08/01/2015 9:03 AM, John Nolan wrote:

Adding an optional argument to get (and mget) like

val <- get(name, where, ..., value.if.not.found=NULL )   (*)


That would be a bad idea, as it would change behaviour of existing uses 
of get().  What I suggested would not
give a default.  If the arg was missing, we'd get the old behaviour, if 
the arg was present, we'd use it.


I'm not sure this is preferable to the separate function 
implementation.  This makes the documentation and implementation of 
get() more complicated, and it would probably be slower for everyone.


Duncan Murdoch



would be useful for many.  HOWEVER, it is possible that there could be
some confusion here: (*) can give a NULL because either x exists and
has value NULL, or because x doesn't exist.   If that matters, the user
would need to be careful about specifying a value.if.not.found that cannot
be confused with a valid value of x.

To avoid this difficulty, perhaps we want both: have Martin's getifexists( )
return a list with two values:
   - a boolean variable 'found'  # = value returned by exists( )
   - a variable 'value'

Then implement get( ) as:

get <- function(x, ..., value.if.not.found) {

   if (missing(value.if.not.found)) {
      a <- getifexists(x, ...)
      if (!a$found) stop("x not found")
   } else {
      a <- getifexists(x, ..., value.if.not.found)
   }
   return(a$value)
}

Note that value.if.not.found has no default value in above.
It behaves exactly like current get does if value.if.not.found
is not specified, and if it is specified, it would be faster
in the common situation mentioned below:
  if(exists(x,...)) { get(x,...) }
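
The two-function design described above can be sketched at the R level. This is only an illustration under stated assumptions: getifexists_sketch and get_sketch are hypothetical names, and the stand-in is built from exists() and get(), so it shows the interface but not the speed-up a single C-level lookup would bring.

```r
# Hypothetical stand-in for the proposed getifexists(): returns both pieces
# of information at once. It still performs two lookups, so it illustrates
# the interface only, not the performance gain.
getifexists_sketch <- function(x, envir = parent.frame(), inherits = TRUE) {
  found <- exists(x, envir = envir, inherits = inherits)
  list(found = found,
       value = if (found) get(x, envir = envir, inherits = inherits) else NULL)
}

# get()-like wrapper: errors when no fallback is supplied, otherwise
# returns the fallback for a name that is not bound.
get_sketch <- function(x, envir = parent.frame(), value.if.not.found) {
  a <- getifexists_sketch(x, envir = envir)
  if (a$found) return(a$value)
  if (missing(value.if.not.found))
    stop(sprintf("object '%s' not found", x))
  value.if.not.found
}

e <- new.env()
assign("y", NULL, envir = e)              # y exists and its value is NULL

r_y <- getifexists_sketch("y", envir = e, inherits = FALSE)
r_z <- getifexists_sketch("no_such_name_zz", envir = e, inherits = FALSE)
```

Here r_y$found and r_z$found differ even though both r_y$value and r_z$value are NULL, which is the ambiguity the 'found' component is meant to resolve.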

John

P.S. if you like dromedaries call it valueIfNotFound ...

  ..
  John P. Nolan
  Math/Stat Department
  227 Gray Hall,   American University
  4400 Massachusetts Avenue, NW
  Washington, DC 20016-8050

  jpno...@american.edu   voice: 202.885.3140
  web: academic2.american.edu/~jpnolan
  ..


-R-devel r-devel-boun...@r-project.org wrote: -
To: Martin Maechler maech...@stat.math.ethz.ch, R-devel@r-project.org
From: Duncan Murdoch
Sent by: R-devel
Date: 01/08/2015 06:39AM
Subject: Re: [Rd] RFC: getifexists() {was [Bug 16065] exists ...}

On 08/01/2015 4:16 AM, Martin Maechler wrote:
 In November, we had a bug repository conversation
 with Peter Haverty and myself:

   https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16065

 where the bug report title started with

  ---  exists is a bottleneck for dispatch and package loading, ...

 Peter proposed an extra simplified and hence faster version of exists(),
 and I commented

  --- Comment #2 from Martin Maechler maech...@stat.math.ethz.ch ---
  I'm very grateful that you've started exploring the bottlenecks of 
loading
  packages with many S4 classes (and methods)...
  and I hope we can make real progress there rather sooner than later.

  OTOH, your `summaryRprof()` in your vignette indicates that exists() 
may use
  upto 10% of the time spent in library(reportingTools),  and your speedup
  proposals of exist()  may go up to ca 30%  which is good and well worth
  considering,  but still we can only expect 2-3% speedup for package 
loading
  which unfortunately is not much.

  Still I agree it is worth looking at exists() as you did  ... and
  consider providing a fast simplified version of it in addition to 
current
  exists() [I think].

  BTW, as we talk about enhancements here, maybe consider a further 
possibility:
  My subjective guess is that probably more than half of exists() uses 
are of the
  form

  if(exists(name, where, ...)) {
     get(name, where, ...)
 ..
  } else {
  NULL / error() / .. or similar
  }

  i.e. many exists() calls when returning TRUE are immediately followed 
by the
  corresponding get() call which repeats quite a bit of the lookup that 
exists()
  has done.

  Instead, I'd imagine a function, say  getifexists(name, ...) that does 
both at
  once in the exists is TRUE case but in a way we can easily keep the 
if(.) ..
  else clause above.  One already existing approach would use

  if(!inherits(tryCatch(xx <- get(name, where, ...), error=function(e)e), "error")) {

... (( work with xx )) ...

  } else  {
 NULL / error() / .. or similar
  }

  but of course our C implementation would be more efficient and use more 
concise
  syntax {which should not look like error handling}.   Follow ups to 
this idea
  should really go to R-devel (the mailing list).
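
For reference, the tryCatch() idiom quoted above can be written out as a small runnable sketch. The helper name lookup_or_condition is an assumption made for illustration; it performs a single get() and hands back the condition object when the lookup fails.

```r
# Look the name up once; on failure return the error condition object
# instead of signalling it. (Helper name is hypothetical.)
lookup_or_condition <- function(name, envir) {
  tryCatch(get(name, envir = envir, inherits = FALSE),
           error = function(e) e)
}

e <- new.env()
assign("a", 1:3, envir = e)

xx <- lookup_or_condition("a", e)
if (!inherits(xx, "error")) {
  total <- sum(xx)        # work with xx
} else {
  total <- NULL           # or signal an error, or similar
}
```

One caveat of this pattern: if the stored object is itself a condition inheriting from "error", it would be misclassified as "not found", which is one more reason it reads like error handling rather than a lookup.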

 and now I do follow up here myself :

 I found that  'getifexists()' is actually very simple to implement,
 I have already tested it a bit, but not yet committed to R-devel
 (the R trunk aka master branch) because I'd like to get
 public comments {RFC := Request For 

[Rd] unloadNamespace

2015-01-08 Thread Paul Gilbert
In the documentation the closest thing I see to an explanation of this is 
that ?detach says "Unloading some namespaces has undesirable side effects".


Can anyone explain why unloading tseries will load zoo? I don't think 
this behavior is specific to tseries, it's just an example. I realize 
one would not usually unload something that is not loaded, but I would 
expect it to do nothing or give an error. I only discovered this when 
trying to clean up to debug another problem.


R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"
and
R Under development (unstable) (2015-01-02 r67308) -- "Unsuffered Consequences"

...
Type 'q()' to quit R.

> loadedNamespaces()
[1] "base"      "datasets"  "graphics"  "grDevices" "methods"   "stats"
[7] "utils"
> unloadNamespace("tseries") # loads zoo ?
> loadedNamespaces()
 [1] "base"      "datasets"  "graphics"  "grDevices" "grid"      "lattice"
 [7] "methods"   "quadprog"  "stats"     "utils"     "zoo"


Somewhat related, is there an easy way to get back to a clean state 
for loaded and attached things, as if R had just been started? I'm 
trying to do this in a vignette so it is not easy to stop and restart R.
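
Whatever the cause of the loading side effect, a defensive sketch is to call unloadNamespace() only for namespaces that are actually loaded. The helper name unload_if_loaded is hypothetical, and this does not by itself restore a fully clean session.

```r
# Only unload a namespace that is currently loaded, so the call cannot
# load the package (and its imports) as a side effect.
unload_if_loaded <- function(pkg) {
  if (pkg %in% loadedNamespaces()) {
    unloadNamespace(pkg)
    TRUE
  } else {
    FALSE
  }
}

before <- loadedNamespaces()
unload_if_loaded("tseries")      # FALSE when tseries was not loaded
after <- loadedNamespaces()
```

Comparing before and after with setdiff(after, before) shows whether anything new was pulled in by the call.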


Paul

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] gsub with perl=TRUE results in 'this version of PCRE is not compiled with Unicode property support' in R-devel

2015-01-08 Thread Kasper Daniel Hansen
Dan, for OS X, there is a new pcre library posted at
http://r.research.att.com/libs/ with a date stamp of Dec 28.  This fixes
this problem.  You can test for this by running
  make check
post compilation.  It'll bang out with a failure if this is not in order.
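
A quicker spot check from within an R session might look like the following sketch. The helper name pcre_has_unicode_properties is an assumption; it simply tries to compile and match a pattern that uses a Unicode property and treats any warning or error as "not supported".

```r
# TRUE if a perl = TRUE pattern using a Unicode property (\p{L}) compiles
# and matches; FALSE if PCRE reports a problem. (Helper name is hypothetical.)
pcre_has_unicode_properties <- function() {
  tryCatch(
    isTRUE(grepl("^\\p{L}+$", "abc", perl = TRUE)),
    warning = function(w) FALSE,
    error   = function(e) FALSE
  )
}

has_ucp <- pcre_has_unicode_properties()
```

This is not a substitute for make check, which exercises far more than this one pattern.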

(And I know that all of this is described in R-admin).

It would be helpful (time saving) if a message is posted to r-sig-mac
whenever a new (version of a) library is added to
http://r.research.att.com/libs/
I know it is adding more work to the helpful people who are doing all the
heavy lifting.

Kasper

On Thu, Jan 8, 2015 at 7:06 AM, Prof Brian Ripley rip...@stats.ox.ac.uk
wrote:

 Why are you reporting that your PCRE library does not have something which
 the R-admin manual says it should preferably have?  To wit, footnote 37 says

 'and not PCRE2, which started at version 10.0. PCRE must be built with
 UTF-8 support (not the default) and support for Unicode properties is
 assumed by some R packages. Neither are tested by configure. JIT support is
 desirable.'

 That certainly does not fail on my Linux, Windows and OS X builds of
 R-devel.  (Issues about pre-built binaries, if that is what you used,
 should be reported to their maintainers, not here.)

 And the help does say in ?regex

  In UTF-8 mode, some Unicode properties may be supported via
  ‘\p{xx}’ and ‘\P{xx}’ which match characters with and without
  property ‘xx’ respectively.

 Note the 'may'.





 On 07/01/2015 23:25, Dan Tenenbaum wrote:

 The following code:

 res <- gsub("(*UCP)\\b(i)\\b",
  "", nhgrimelanomaclass, perl = TRUE)

 results in:

 Error in gsub(sprintf("(*UCP)\\b(%s)\\b", i), "",
 nhgrimelanomaclass,  :
   invalid regular expression '(*UCP)\b(i)\b'
 In addition: Warning message:
 In gsub(sprintf("(*UCP)\\b(%s)\\b", i), "", nhgrimelanomaclass,  :
   PCRE pattern compilation error
  'this version of PCRE is not compiled with Unicode property
  support'
  at '(*UCP)\b(i)\b'

 on

 R Under development (unstable) (2015-01-01 r67290)
 Platform: x86_64-apple-darwin13.4.0 (64-bit)
 Running under: OS X 10.9.5 (Mavericks)

 locale:
 [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base

 And also on the same version of R-devel on Snow Leopard, Windows, and
 Linux. But it does not produce an error on

 R version 3.1.2 (2014-10-31)
 Platform: x86_64-apple-darwin13.4.0 (64-bit)

 locale:
 [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base

 Dan




 --
 Brian D. Ripley,  rip...@stats.ox.ac.uk
 Emeritus Professor of Applied Statistics, University of Oxford
 1 South Parks Road, Oxford OX1 3TG, UK






Re: [Rd] On base::rank

2015-01-08 Thread Arunkumar Srinivasan
 Indeed.   Interesting that nobody has noticed till now,
 even though that part has been world readable since at least 2008-08-25.

That was what made me a bit unsure :-).

 Note that the R source code is at
  http://svn.r-project.org/R/
 and the file in question at
  http://svn.r-project.org/R/trunk/src/library/base/R/rank.R

Okay, thanks.

 where you can already see the new code
 (given that 'x' was no longer needed, there's no need for 'xx').

Great! thanks again.

 Martin Maechler,
 ETH Zurich

Best,
Arun.



Re: [Rd] RFC: getifexists() {was [Bug 16065] exists ...}

2015-01-08 Thread Michael Lawrence
If we do add an argument to get(), then it should be named consistently
with the ifnotfound argument of mget(). As mentioned, the possibility of a
NULL value is problematic. One solution is a sentinel value that indicates
an unbound value (like R_UnboundValue).
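
At the R level, the sentinel idea can be approximated with a private marker object, as in this sketch. R_UnboundValue itself is a C-level object; the names not_found and get_or_sentinel are hypothetical, and the sketch relies on mget()'s existing ifnotfound argument.

```r
# A fresh environment is identical() only to itself, so it cannot collide
# by accident with any value stored under the name, including NULL.
not_found <- new.env()

get_or_sentinel <- function(x, envir) {
  mget(x, envir = envir, ifnotfound = list(not_found))[[1L]]
}

e <- new.env()
assign("y", NULL, envir = e)

is_missing_y <- identical(get_or_sentinel("y", e), not_found)  # y is bound to NULL
is_missing_z <- identical(get_or_sentinel("z", e), not_found)  # z is not bound
```

The two flags differ, so a NULL value and a missing binding are distinguishable without a separate exists() call.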

But another idea (and one pretty similar to John's) is to follow the SYMSXP
design at the C level, where there is a structure that points to the name
and a value. We already have SYMSXPs at the R level of course (name
objects) but they do not provide access to the value, which is typically
R_UnboundValue. But this does not even need to be implemented with SYMSXP.
The design would allow something like:

binding <- getBinding("x", env)
if (hasValue(binding)) {
  x <- value(binding) # throws an error if none
  message(name(binding), " has value ", x)
}

That, I think, is a bit verbose but readable and could be made fast. And I
think binding objects would be useful in other ways, as they are
essentially a named object. For example, when iterating over an
environment.
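
A rough R-level model of such binding objects, purely as a sketch: the accessor names getBinding, hasValue, value and name follow the message above, but their implementation here (a classed list built from exists() and get()) is an assumption and says nothing about how a SYMSXP-like C design would look.

```r
# Hypothetical model of a binding: a name plus an optional value.
getBinding <- function(x, envir) {
  found <- exists(x, envir = envir, inherits = FALSE)
  structure(
    list(name = x, found = found,
         value = if (found) get(x, envir = envir, inherits = FALSE) else NULL),
    class = "binding_sketch")
}
hasValue <- function(b) b$found
name <- function(b) b$name
value <- function(b) {
  if (!b$found) stop(sprintf("binding '%s' has no value", b$name))
  b$value
}

env <- new.env()
assign("x", NULL, envir = env)

b_x    <- getBinding("x", env)       # bound, value is NULL
b_none <- getBinding("nope", env)    # not bound

# Iterating over an environment as a list of named bindings.
bindings <- lapply(ls(env), getBinding, envir = env)
```

Because hasValue() is separate from value(), a binding whose value is NULL is still reported as present.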

Michael




On Thu, Jan 8, 2015 at 6:03 AM, John Nolan jpno...@american.edu wrote:

 Adding an optional argument to get (and mget) like

 val <- get(name, where, ..., value.if.not.found=NULL )   (*)

 would be useful for many.  HOWEVER, it is possible that there could be
 some confusion here: (*) can give a NULL because either x exists and
 has value NULL, or because x doesn't exist.   If that matters, the user
 would need to be careful about specifying a value.if.not.found that cannot
 be confused with a valid value of x.

 To avoid this difficulty, perhaps we want both: have Martin's getifexists(
 )
 return a list with two values:
   - a boolean variable 'found'  # = value returned by exists( )
   - a variable 'value'

 Then implement get( ) as:

 get <- function(x, ..., value.if.not.found) {

    if (missing(value.if.not.found)) {
       a <- getifexists(x, ...)
       if (!a$found) stop("x not found")
    } else {
       a <- getifexists(x, ..., value.if.not.found)
    }
    return(a$value)
 }

 Note that value.if.not.found has no default value in above.
 It behaves exactly like current get does if value.if.not.found
 is not specified, and if it is specified, it would be faster
 in the common situation mentioned below:
  if(exists(x,...)) { get(x,...) }

 John

 P.S. if you like dromedaries call it valueIfNotFound ...

  ..
  John P. Nolan
  Math/Stat Department
  227 Gray Hall,   American University
  4400 Massachusetts Avenue, NW
  Washington, DC 20016-8050

  jpno...@american.edu   voice: 202.885.3140
  web: academic2.american.edu/~jpnolan
  ..


 -R-devel r-devel-boun...@r-project.org wrote: -
 To: Martin Maechler maech...@stat.math.ethz.ch, R-devel@r-project.org
 From: Duncan Murdoch
 Sent by: R-devel
 Date: 01/08/2015 06:39AM
 Subject: Re: [Rd] RFC: getifexists() {was [Bug 16065] exists ...}

 On 08/01/2015 4:16 AM, Martin Maechler wrote:
  In November, we had a bug repository conversation
  with Peter Haverty and myself:
 
https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16065
 
  where the bug report title started with
 
   ---  exists is a bottleneck for dispatch and package loading, ...
 
  Peter proposed an extra simplified and hence faster version of exists(),
  and I commented
 
   --- Comment #2 from Martin Maechler maech...@stat.math.ethz.ch
 ---
   I'm very grateful that you've started exploring the bottlenecks of
 loading
   packages with many S4 classes (and methods)...
   and I hope we can make real progress there rather sooner than
 later.
 
   OTOH, your `summaryRprof()` in your vignette indicates that
 exists() may use
   upto 10% of the time spent in library(reportingTools),  and your
 speedup
   proposals of exist()  may go up to ca 30%  which is good and well
 worth
   considering,  but still we can only expect 2-3% speedup for
 package loading
   which unfortunately is not much.
 
   Still I agree it is worth looking at exists() as you did  ... and
   consider providing a fast simplified version of it in addition to
 current
   exists() [I think].
 
   BTW, as we talk about enhancements here, maybe consider a further
 possibility:
   My subjective guess is that probably more than half of exists()
 uses are of the
   form
 
   if(exists(name, where, ...)) {
      get(name, where, ...)
  ..
   } else {
   NULL / error() / .. or similar
   }
 
   i.e. many exists() calls when returning TRUE are immediately
 followed by the
   corresponding get() call which repeats quite a bit of the lookup
 that exists()
   has done.
 
   Instead, I'd imagine a function, say  getifexists(name, ...) that
 does both at
   once in the exists is TRUE case but in a way we can easily keep
 the if(.) ..
   else clause above.  One already existing 

Re: [Bioc-devel] Announcing Docker containers for Bioconductor

2015-01-08 Thread Dan Tenenbaum


- Original Message -
 From: Elena Grassi grass...@gmail.com
 To: bioc-devel@r-project.org
 Sent: Thursday, January 8, 2015 5:11:50 AM
 Subject: Re: [Bioc-devel] Announcing Docker containers for Bioconductor
 
 Thanks, this is really useful and I was looking forward to it after
 having used rocker!
 
 I have a strange (ok, at least to me :) ) issue concerning volumes.
 On a machine (debian testing/unstable up to date) everything works
 smoothly when I do something like:
 data@decoder:~$ docker run -v
 /home/data/Dropbox/work/matrix/mr_bioc/matrix_rider/:/opt/matrix_rider
 -p 8787:8787 9f40f8036ad4
 (in /home/data/Dropbox/work/matrix/mr_bioc/matrix_rider/ I have a
 Rstudio project)

What image is 9f40f8036ad4? Why are you not using the user/repository name for 
the image? Is it a rocker image or a Bioconductor image?



 
 The same command on another machine (more or less the same debian as
 before) leads me to being unable to open the project with Rstudio as
 long as the directory is not writeable.
 The only relevant difference seems to be my user UID which is 1000 on
 the first machine but different on the other one. I tried to use -e
 as
 suggested here
 https://github.com/rocker-org/rocker/wiki/Sharing-files-with-host-machine
 with no luck.
 
 Does anyone more docker-savvy than me have a suggestion? (maybe the
 volumes approach is not the best one here).


Note that Rstudio Server runs as a user called rstudio, not necessarily a 
privileged user.
So the directory you are trying to mount should probably be writable (and 
readable) by all.
so try 

chmod -R a+rw /home/data/Dropbox/work/matrix/mr_bioc/matrix_rider/

Dan


 
 Thanks,
 E.
 p.s. sorry Dan for the first mail sent only to you
 


Re: [Rd] RFC: getifexists() {was [Bug 16065] exists ...}

2015-01-08 Thread Martin Maechler

 Adding an optional argument to get (and mget) like
 val <- get(name, where, ..., value.if.not.found=NULL )   (*)

 would be useful for many.  HOWEVER, it is possible that there could be 
 some confusion here: (*) can give a NULL because either x exists and 
 has value NULL, or because x doesn't exist.   If that matters, the user 
 would need to be careful about specifying a value.if.not.found that cannot 
 be confused with a valid value of x.  

Exactly -- well, of course: That problem { NULL can be the legit value of what
you want to get() } was the only reason to have a 'value.if.not' argument at all.

Note that this is not about a universal replacement of 
the  if(exists(..)) { .. get(..) } idiom, but rather a
replacement of these in the cases where speed matters very much,
which is e.g. in the low level support code for S4 method dispatch.

'value.if.not.found':
Note that CRAN checks require all arguments to be written in
full length.  Even though we have auto completion in ESS,
Rstudio or other good R IDEs,  I very much like to keep
function calls somewhat compact.

And yes, as you mention the dromedars aka 2-hump camels:  
getIfExist is already horrible to my taste (and _ is not S-like; 
yes that's all very much a matter of taste and yes I'm from the
20th century).

 To avoid this difficulty, perhaps we want both: have Martin's getifexists( ) 
 return a list with two values: 
   - a boolean variable 'found'  # = value returned by exists( )
   - a variable 'value'

 Then implement get( ) as:

 get <- function(x, ..., value.if.not.found) {

    if (missing(value.if.not.found)) {
       a <- getifexists(x, ...)
       if (!a$found) stop("x not found")
    } else {
       a <- getifexists(x, ..., value.if.not.found)
    }
    return(a$value)
 }

Interesting...
Note that the above get() implementation would just be conceptual, as 
all of this is also quite a bit about speed, and we do the
different cases in C anyway [via 'op' code].

 Note that value.if.not.found has no default value in above.
 It behaves exactly like current get does if value.if.not.found 
 is not specified, and if it is specified, it would be faster 
 in the common situation mentioned below:   
  if(exists(x,...)) { get(x,...) }

Good... Let's talk about your getifexists() as I argue we'd keep
get() exactly as it is now anyway, if we use a new 3rd function (I keep
calling 'getifexists()' for now):

I think in that case, getifexists() would not even *need* an argument 
'value.if.not' (or 'value.if.not.found'); it rather would return a 
  list(found = *, value = *)
in any case.
Alternatively, it could return
  structure(found, value = *)

In the first case, our main use case would be

  if((r <- getifexists(x, *))$found) {
     ## work with  r$value
  }

in the 2nd case {structure} :

  if((r <- getifexists(x, *))) {
     ## work with  attr(r, "value")
  }

I think that (both cases) would still be a bit slower (for the above
most important use case) but probably not much
and it would look slightly more readable than my

   if (!is.null(r <- getifexists(x, *))) {
      ## work with  r
   }

After all of this, I think I'd still somewhat prefer my original proposal,
but not strongly -- I had originally also thought of returning the
two parts explicitly, but then tended to prefer the version that
behaved exactly like get() in the case the object is found.
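
To make the structure(found, value = *) variant concrete, here is a small sketch. The name getifexists_attr is hypothetical, and it is again assembled from exists() and get(), so it illustrates only the calling convention, not the C-level speed.

```r
# Returns a length-one logical 'found' that carries the object, when there
# is one, in a "value" attribute. (Hypothetical R-level stand-in.)
getifexists_attr <- function(x, envir) {
  found <- exists(x, envir = envir, inherits = FALSE)
  r <- found
  if (found) attr(r, "value") <- get(x, envir = envir, inherits = FALSE)
  r
}

e <- new.env()
assign("a", 5, envir = e)
assign("y", NULL, envir = e)

if ((r <- getifexists_attr("a", e))) {
  a_value <- attr(r, "value")     # work with attr(r, "value")
}
r_y <- getifexists_attr("y", e)   # found, but the value is NULL
r_z <- getifexists_attr("z", e)   # not found
```

Note that an attribute cannot hold NULL, so for r_y the "value" attribute is simply absent and attr(r_y, "value") reads back as NULL; the logical itself still records that the name was found.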

... Nice interesting ideas! ... 
let the proposals and consideration flow ...

Martin


 John

 P.S. if you like dromedaries call it valueIfNotFound ...

:-) ;-)  
I don't .. as I said above, I already strongly dislike more than one hump. 
[ Each capital is one key stroke (Shift) more ,
  and each _ is two key strokes more on most key boards...,
  and I do like identifiers that I can also quickly pronounce on
  the phone or in teaching .. ]

  ..
  John P. Nolan
  Math/Stat Department
  227 Gray Hall,   American University
  4400 Massachusetts Avenue, NW
  Washington, DC 20016-8050
  ..


 -R-devel r-devel-boun...@r-project.org wrote: - 
 To: Martin Maechler maech...@stat.math.ethz.ch, R-devel@r-project.org
 From: Duncan Murdoch 
 Sent by: R-devel 
 Date: 01/08/2015 06:39AM
 Subject: Re: [Rd] RFC: getifexists() {was [Bug 16065] exists ...}

 On 08/01/2015 4:16 AM, Martin Maechler wrote:
  In November, we had a bug repository conversation
  with Peter Haverty and myself:
  
https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16065
  
  where the bug report title started with
  
   ---  exists is a bottleneck for dispatch and package loading, ...
  
  Peter proposed an extra simplified and hence faster version of exists(),
  and I commented
  
   --- Comment #2 from Martin Maechler maech...@stat.math.ethz.ch ---
   I'm very grateful that you've started exploring the bottlenecks of 
  loading
   packages with many S4 classes 

Re: [Rd] New version of Rtools for Windows

2015-01-08 Thread Hin-Tak Leung

The r.dll crash is easy - you need to be using gcc-ar for ar, and gcc-ranlib 
for ranlib. I also posted a patch to fix the check failure for stack probing, 
as lto optimizes away the stack probing code, as it should.

yes, lto build's speed gain is very impressive.



--
On Thu, Jan 8, 2015 2:01 PM GMT Henric Winell wrote:

On 2015-01-08 14:18, Avraham Adler wrote:

 Very timely, as this is how I got into the problem I posted about
 earlier; maybe some of the problems I ran into will mean more to
 you and the experts on this thread, Dr. Murdoch. For reference, I run
 Windows 7 64bit, and I am trying to build a 64 bit version of R-3.1.2.

 As we discussed offline, Dr. Murdoch, I've been trying to build R
 using more recent tools than GCC4.6.3 prerelease. Ruben Von Boxen
 (rubenvb) told me he is no longer developing his own builds of GCC,
 but is focusing on MSYS2 and the mingw64 personal builds. So, similar
 to what Jeroen said, I first installed MSYS2, whose initial
 installation on windows is not so simple[1]. After the initial
 install, the following packages need to be manually installed: make,
 tar, zip, unzip, zlib, and rsync. I also installed base-devel, which
 is way more than necessary, but there may be packages in there which
 are necessary.

 I originally installed the most up-to-date version of GCC (4.9.2)[2],
 and I did pick the -seh version: since I install (almost) all
 packages from source (the one exception being nloptr for now), the
 exception handling should be consistent, and it is supposed to be up to
 ~15% faster[3].

 The initial build crashed with the following error:

 gcc -std=gnu99 -m64 -I../../include -I. -DHAVE_CONFIG_H  -O3 -Wall
 -pedantic -mtune=core2   -c xmalloc.c -o xmalloc.o
 ar crs libtre.a regcomp.o regerror.o regexec.o tre-ast.o tre-compile.o
 tre-match-approx.o tre-match-backtrack.o tre-match-parallel.o
 tre-mem.o tre-parse.o tre-stack.o xmalloc.o
 gcc -std=gnu99 -m64   -O3 -Wall -pedantic -mtune=core2   -c compat.c -o compat.o
 compat.c:65:5: error: redefinition of 'snprintf'
   int snprintf(char *buffer, size_t max, const char *format, ...)
   ^
 In file included from compat.c:3:0:
 F:/MinGW64/x86_64-w64-mingw32/include/stdio.h:553:5: note: previous
 definition of 'snprintf' was here
   int snprintf (char * __restrict__ __stream, size_t __n, const char *
 __restrict__ __format, ...)
   ^
 compat.c:75:5: error: redefinition of 'vsnprintf'
   int vsnprintf(char *buffer, size_t bufferSize, const char *format,
 va_list args)
   ^
 In file included from compat.c:3:0:
 F:/MinGW64/x86_64-w64-mingw32/include/stdio.h:543:7: note: previous
 definition of 'vsnprintf' was here
 int vsnprintf (char * __restrict__ __stream, size_t __n, const char
 * __restrict__ __format, va_list __local_argv)
 ^
 ../../gnuwin32/MkRules:218: recipe for target 'compat.o' failed
 make[4]: *** [compat.o] Error 1
 Makefile:120: recipe for target 'rlibs' failed
 make[3]: *** [rlibs] Error 1
 Makefile:179: recipe for target '../../bin/x64/R.dll' failed
 make[2]: *** [../../bin/x64/R.dll] Error 2
 Makefile:104: recipe for target 'rbuild' failed
 make[1]: *** [rbuild] Error 2
 Makefile:14: recipe for target 'all' failed
 make: *** [all] Error 2

 After doing some checking (for example see [4]), I asked Duncan about
 the problem, and he suggested moving the #ifndef _W64 in compat.c up
 above the offending lines (65-75). That did not work, so, I figured
 (it seems mistakenly from the other thread) that if those functions
 are included from stdio already, I can just delete them from compat.c.
 The specific lines are:

 int snprintf(char *buffer, size_t max, const char *format, ...)
 {
  int res;
  va_list(ap);
  va_start(ap, format);
  res = trio_vsnprintf(buffer, max, format, ap);
  va_end(ap);
  return res;
 }

 int vsnprintf(char *buffer, size_t bufferSize, const char *format, va_list 
 args)
 {
  return trio_vsnprintf(buffer, bufferSize, format, args);
 }

 Continuing the build using 4.9.2 crashed again at the following point:

 gcc -std=gnu99 -m64 -I../include -I. -I../extra -DHAVE_CONFIG_H
 -DR_DLL_BUILD  -O3 -Wall -pedantic -mtune=core2   -c malloc.c -o
 malloc.o
 windres -F pe-x86-64  -I../include -i dllversion.rc -o dllversion.o
 gcc -std=gnu99 -m64 -shared -s -mwindows -o R.dll R.def console.o
 dynload.o editor.o embeddedR.o extra.o opt.o pager.o preferences.o
 psignal.o rhome.o rt_complete.o rui.o run.o shext.o sys-win32.o
 system.o dos_wglob.o malloc.o ../main/libmain.a ../appl/libappl.a
 ../nmath/libnmath.a getline/gl.a ../extra/xdr/libxdr.a
 ../extra/pcre/libpcre.a ../extra/bzip2/libbz2.a
 ../extra/intl/libintl.a ../extra/trio/libtrio.a ../extra/tzone/libtz.a
 ../extra/tre/libtre.a ../extra/xz/liblzma.a dllversion.o -fopenmp -L.
 -lgfortran -lRblas -L../../bin/x64 -lRzlib -lRgraphapp -lRiconv
 -lcomctl32 -lversion
 collect2.exe: error: ld returned 5 exit status
 Makefile:150: recipe for target 'R.dll' failed

Re: [Rd] unloadNamespace

2015-01-08 Thread Gabriel Becker
Paul,

My switchr package (https://github.com/gmbecker/switchr) has the
flushSession function which does what you want and seems to work (on my
test machine at least).

I haven't tested it under a recent R-devel, or with that specific package,
however I will soon, as the overarching model of switchr relies on this
working.

If you do try it before me with that package, please let me know whether it
works or not.

~G

On Thu, Jan 8, 2015 at 7:45 AM, Paul Gilbert pgilbert...@gmail.com wrote:

 In the documentation the closest thing I see to an explanation of this is
 that ?detach says "Unloading some namespaces has undesirable side effects"

 Can anyone explain why unloading tseries will load zoo? I don't think this
 behavior is specific to tseries, it's just an example. I realize one would
 not usually unload something that is not loaded, but I would expect it to
 do nothing or give an error. I only discovered this when trying to clean up
 to debug another problem.

 R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"
 and
 R Under development (unstable) (2015-01-02 r67308) -- "Unsuffered Consequences"
 ...
 Type 'q()' to quit R.

 > loadedNamespaces()
 [1] "base"      "datasets"  "graphics"  "grDevices" "methods"   "stats"
 [7] "utils"
 > unloadNamespace("tseries") # loads zoo ?
 > loadedNamespaces()
  [1] "base"      "datasets"  "graphics"  "grDevices" "grid"      "lattice"
  [7] "methods"   "quadprog"  "stats"     "utils"     "zoo"
 

 Somewhat related, is there an easy way to get back to a clean state for
 loaded and attached things, as if R had just been started? I'm trying to do
 this in a vignette so it is not easy to stop and restart R.

 Paul
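
On the narrower point that unloading a namespace which was never loaded has side effects: a minimal sketch of a guard, assuming it is acceptable to simply skip the call in that case. The helper name unload_if_loaded is hypothetical (it is not part of R or of switchr); only loadedNamespaces() and unloadNamespace() are existing base functions.

```r
# Hypothetical helper: only call unloadNamespace() when the namespace
# is actually loaded, so the call itself cannot trigger a load.
unload_if_loaded <- function(pkg) {
  if (pkg %in% loadedNamespaces()) {
    unloadNamespace(pkg)
    TRUE    # something was unloaded
  } else {
    FALSE   # nothing to do
  }
}

before <- loadedNamespaces()
unload_if_loaded("tseries")   # returns FALSE when tseries is not loaded
after <- loadedNamespaces()
```

This only avoids the unexpected load; it does not by itself return the session to a freshly started state.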

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel




-- 
Gabriel Becker, PhD
Alumnus
Statistics Department
University of California, Davis



Re: [Bioc-devel] Announcing Docker containers for Bioconductor

2015-01-08 Thread Radhouane Aniba
Docker and some Bioconductor packages were used a long time ago at
CodersCrowd (http://coderscrowd.com/). Good to see this move finally hit the
community. Awesome move, keep it up!

It would be great to support Bioconductor users through CodersCrowd as
well; reproducibility also means reproducing errors, not just working
code.

Cheers

Rad


On Thu, Jan 8, 2015 at 7:43 AM, Dan Tenenbaum dtene...@fredhutch.org
wrote:



 - Original Message -
  From: Elena Grassi grass...@gmail.com
  To: bioc-devel@r-project.org
  Sent: Thursday, January 8, 2015 5:11:50 AM
  Subject: Re: [Bioc-devel] Announcing Docker containers for Bioconductor
 
  Thanks, this is really useful and I was looking forward to it after
  having used rocker!
 
  I have a strange (ok, at least to me :) ) issue concerning volumes.
  On a machine (debian testing/unstable up to date) everything works
  smoothly when I do something like:
  data@decoder:~$ docker run -v
  /home/data/Dropbox/work/
  matrix/mr_bioc/matrix_rider/:/opt/matrix_rider
  -p 8787:8787 9f40f8036ad4
  (in /home/data/Dropbox/work/matrix/mr_bioc/matrix_rider/ I have a
  Rstudio project)

 What image is 9f40f8036ad4? Why are you not using the user/repository name
 for the image? Is it a rocker image or a Bioconductor image?



 
  The same command on another machine (more or less the same debian as
  before) leads me to being unable to open the project with Rstudio as
  long as the directory is not writeable.
  The only relevant difference seems to be my user UID which is 1000 on
  the first machine but different on the other one. I tried to use -e
  as
  suggested here
 
 https://github.com/rocker-org/rocker/wiki/Sharing-files-with-host-machine
  with no luck.
 
  Does anyone more docker-savvy than me have a suggestion? (maybe the
  volumes approach is not the best one here).


 Note that Rstudio Server runs as a user called rstudio, not necessarily a
 privileged user.
 So the directory you are trying to mount should probably be writable (and
 readable) by all.
 so try

 chmod -R a+rw /home/data/Dropbox/work/matrix/mr_bioc/matrix_rider/

 Dan


 
  Thanks,
  E.
  p.s. sorry Dan for the first mail sent only to you
 
  ___
  Bioc-devel@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/bioc-devel
 





-- 
*Radhouane Aniba*
*Bioinformatics Scientist*
*BC Cancer Agency, Vancouver, Canada*



Re: [Rd] New version of Rtools for Windows

2015-01-08 Thread Hin-Tak Leung
Oh, I forgot to mention that besides setting AR, RANLIB and the stack probing
fix, you also need a very up-to-date binutils. 2.25 came out in December. Even
with that, if your linker's default is not what you are compiling for (i.e. a
multiarch toolchain), you also need to set GNUTARGET, i.e. -m32/-m64 is not
enough. A fix to autodetect non-default targets went in after Christmas,
before the new year, but I am not brave enough to try that on a daily basis yet
(I only tested it and reported it, then reverted the change - how gcc invokes
the linker is rather complicated and it is not easy to have two binutils
installed...) - setting GNUTARGET seems safer :-).
Whether you need that depends on whether you are compiling for your toolchain's
default target architecture.

AR, RANLIB, GNUTARGET are all environment variables - you set them the usual
way. The stack probing fix is for passing make check, once make has finished.

--
On Thu, Jan 8, 2015 6:14 PM GMT Avraham Adler wrote:

On Thu, Jan 8, 2015 at 10:48 AM, Hin-Tak Leung
ht...@users.sourceforge.net wrote:

 The r.dll crash is easy - you need to be using gcc-ar for ar, and gcc-ranlib 
 for ranlib. I also posted a patch to fix the check failure for stack 
 probing, as lto optimizes away the stack probing code, as it should.

 yes, lto build's speed gain is very impressive.



I apologize for my ignorance, but how would I do that? I tried by
changing the following in src/gnuwin32/MkRules.local:

# prefix for 64-bit: path or x86_64-w64-mingw32-
BINPREF64 = x86_64-w64-mingw32-gcc-

I added the gcc- as the suffix there, but I guess that is insufficient
as I still get the following error using 4.9.2:

windres -F pe-x86-64  -I../include -i dllversion.rc -o dllversion.o
gcc -std=gnu99 -m64 -shared -s -mwindows -o R.dll R.def console.o
dynload.o editor.o embeddedR.o extra.o opt.o pager.o preferences.o
psignal.o rhome.o rt_complete.o rui.o run.o shext.o sys-win32.o
system.o dos_wglob.o malloc.o ../main/libmain.a ../appl/libappl.a
../nmath/libnmath.a getline/gl.a ../extra/xdr/libxdr.a
../extra/pcre/libpcre.a ../extra/bzip2/libbz2.a
../extra/intl/libintl.a ../extra/trio/libtrio.a ../extra/tzone/libtz.a
../extra/tre/libtre.a ../extra/xz/liblzma.a dllversion.o -fopenmp -L.
-lgfortran -lRblas -L../../bin/x64 -lRzlib -lRgraphapp -lRiconv
-lcomctl32 -lversion
collect2.exe: error: ld returned 5 exit status
Makefile:150: recipe for target 'R.dll' failed
make[3]: *** [R.dll] Error 1
Makefile:179: recipe for target '../../bin/x64/R.dll' failed
make[2]: *** [../../bin/x64/R.dll] Error 2
Makefile:104: recipe for target 'rbuild' failed
make[1]: *** [rbuild] Error 2
Makefile:14: recipe for target 'all' failed
make: *** [all] Error 2

I still had to delete those lines in compat.c, so this build, were it
to have completed, is still subject to the non-conformance of
scientific notation printing that was discussed earlier.

Hin-tak, any suggestions for this error (and the compat.c for that
matter) that you, or any reader of this list, may have would be
greatly appreciated.

Thank you!

Avi


 --
 On Thu, Jan 8, 2015 2:01 PM GMT Henric Winell wrote:

On 2015-01-08 14:18, Avraham Adler wrote:

 Very timely, as this is how I got into the problem I posted about
 earlier; maybe some of the problems I ran into will mean more to
 you and the experts on this thread, Dr. Murdoch. For reference, I run
 Windows 7 64-bit, and I am trying to build a 64-bit version of R-3.1.2.

 As we discussed offline, Dr. Murdoch, I've been trying to build R
 using more recent tools than GCC4.6.3 prerelease. Ruben Von Boxen
 (rubenvb) told me he is no longer developing his own builds of GCC,
 but is focusing on MSYS2 and the mingw64 personal builds. So, similar
 to what Jeroen said, I first installed MSYS2, whose initial
 installation on windows is not so simple[1]. After the initial
 install, the following packages need to be manually installed: make,
 tar, zip, unzip, zlib, and rsync. I also installed base-devel, which
 is way more than necessary, but there may be packages in there which
 are necessary.

 I originally installed the most up-to-date version of GCC (4.9.2)[2],
 and I picked the -seh version: since I install (almost) all
 packages from source (the one exception being nloptr for now), the
 exception handling should be consistent, and it is supposed to be up to
 ~15% faster[3].

 The initial build crashed with the following error:

 gcc -std=gnu99 -m64 -I../../include -I. -DHAVE_CONFIG_H  -O3 -Wall
 -pedantic -mtune=core2   -c xmalloc.c -o xmalloc.o
 ar crs libtre.a regcomp.o regerror.o regexec.o tre-ast.o tre-compile.o
 tre-match-approx.o tre-match-backtrack.o tre-match-parallel.o
 tre-mem.o tre-parse.o tre-stack.o xmalloc.o
 gcc -std=gnu99 -m64   -O3 -Wall -pedantic -mtune=core2   -c compat.c -o 
 compat.o
 compat.c:65:5: error: redefinition of 'snprintf'
   int snprintf(char *buffer, size_t max, const char *format, ...)
       ^
 In 

[Rd] Testing R packages on Solaris Studio

2015-01-08 Thread Jeroen Ooms
I have set up a Solaris server to test packages before submitting to
CRAN, in order to catch problems that might not reveal themselves on
Fedora, Debian, OSX or Windows. The machine runs a Solaris 11.2 vm
with Solaris Studio 12.3.

I was able to compile current r-devel using the suggested environment
variables from R Installation and Administration and:

  ./configure --prefix=/opt/R-devel --with-blas='-library=sunperf' --with-lapack

All works great (fast too), except that some CRAN packages with C++
code won't build. The compiler itself works, and most packages (including
e.g. MCMCpack) build OK. However, packages like Rcpp and RJSONIO fail
with the errors shown here:
https://gist.github.com/jeroenooms/f1b6a172320a32f59c82.

I tried installing with GNU make, but that does not seem to be the problem:

  configure.vars = "MAKE=/opt/csw/bin/gmake"

I am aware that I can work around it by compiling with gcc instead of
solaris studio, but I would specifically like to replicate the setup
from CRAN.

Which additional args/vars/dependencies do I need to make Rcpp and
RJSONIO build as they do on the CRAN Solaris server?

> sessionInfo()
R Under development (unstable) (2015-01-07 r67351)
Platform: i386-pc-solaris2.11 (32-bit)
Running under: Solaris 11

locale:
[1] C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] tcltk_3.2.0 tools_3.2.0



Re: [Rd] RFC: getifexists() {was [Bug 16065] exists ...}

2015-01-08 Thread Jeroen Ooms
On Thu, Jan 8, 2015 at 6:36 AM, Duncan Murdoch murdoch.dun...@gmail.com wrote:
 val <- get(name, where, ..., value.if.not.found=NULL )   (*)

 That would be a bad idea, as it would change behaviour of existing uses of
 get().

Another approach would be for the not-found behavior to be a
callback, e.g. an expression or function:

  get(name, where, ..., not.found=stop("object ", name, " not found"))

This would cover the case of not.found=NULL, but also allows for
writing code with syntax similar to tryCatch:

  obj <- get("foo", not.found = someDefaultValue())

Not sure what this would do to performance though.
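
A minimal sketch of how such a callback-style argument could behave, relying on the fact that R evaluates function arguments lazily, so the not.found expression is only run when it is actually needed. The wrapper name get_or and its argument list are assumptions made for illustration; this is not a proposed patch to get(), and it still performs two lookups (exists, then get), so it says nothing about the performance question.

```r
# Hypothetical wrapper: 'not.found' is a promise, so the default stop()
# only fires when the object is missing and no replacement was supplied.
get_or <- function(name, envir = parent.frame(), inherits = TRUE,
                   not.found = stop("object '", name, "' not found")) {
  if (exists(name, envir = envir, inherits = inherits)) {
    get(name, envir = envir, inherits = inherits)
  } else {
    not.found  # the promise is forced only on this branch
  }
}

e <- new.env(parent = emptyenv())
assign("foo", 1:3, envir = e)

found   <- get_or("foo", e)                    # existing object
missing <- get_or("bar", e, not.found = NULL)  # caller-supplied fallback
```

Calling get_or("bar", e) without a not.found argument signals an error, which mirrors the current behaviour of get().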



Re: [Rd] setequal: better readability, reduced memory footprint, and minor speedup

2015-01-08 Thread peter dalgaard
If you look at the definition of %in%, you'll find that it is implemented using 
match, so if we did as you suggest, I give it about three days before someone 
suggests to inline the function call... Readability of source code is not 
usually our prime concern.

The  idea does have some merit, though. 

Apropos, why is there no setcontains()?

-pd

 On 06 Jan 2015, at 22:02 , Hervé Pagès hpa...@fredhutch.org wrote:
 
 Hi,
 
 Current implementation:
 
 setequal <- function (x, y)
 {
  x <- as.vector(x)
  y <- as.vector(y)
  all(c(match(x, y, 0L) > 0L, match(y, x, 0L) > 0L))
 }
 
 First what about replacing 'match(x, y, 0L) > 0L' and 'match(y, x, 0L) > 0L'
 with 'x %in% y' and 'y %in% x', respectively. They're strictly
 equivalent but the latter form is a lot more readable than the former
 (isn't this the raison d'être of %in%?):
 
 setequal <- function (x, y)
 {
  x <- as.vector(x)
  y <- as.vector(y)
  all(c(x %in% y, y %in% x))
 }
 
 Furthermore, replacing 'all(c(x %in% y, y %in% x))' with
 'all(x %in% y) && all(y %in% x)' improves readability even more and,
 more importantly, reduces memory footprint significantly on big vectors
 (e.g. by 15% on integer vectors with 15M elements):
 
 setequal <- function (x, y)
 {
  x <- as.vector(x)
  y <- as.vector(y)
  all(x %in% y) && all(y %in% x)
 }
 
 It also seems to speed up things a little bit (not in a significant
 way though).
 
 Cheers,
 H.
 
 -- 
 Hervé Pagès
 
 Program in Computational Biology
 Division of Public Health Sciences
 Fred Hutchinson Cancer Research Center
 1100 Fairview Ave. N, M1-B514
 P.O. Box 19024
 Seattle, WA 98109-1024
 
 E-mail: hpa...@fredhutch.org
 Phone:  (206) 667-5791
 Fax:(206) 667-1319
 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com
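
The three setequal() variants discussed in this message can be put side by side as a small sketch. The function names setequal_current, setequal_in and setequal_split are labels chosen here for illustration, and setcontains() is a hypothetical one-liner answering the "why is there no setcontains()" aside; none of these names exist in base R.

```r
# Variant 1: the implementation quoted as current in the thread.
setequal_current <- function(x, y) {
  x <- as.vector(x)
  y <- as.vector(y)
  all(c(match(x, y, 0L) > 0L, match(y, x, 0L) > 0L))
}

# Variant 2: same logic written with %in%.
setequal_in <- function(x, y) {
  x <- as.vector(x)
  y <- as.vector(y)
  all(c(x %in% y, y %in% x))
}

# Variant 3: no concatenated logical vector; && also skips the second
# membership test when the first one already fails.
setequal_split <- function(x, y) {
  x <- as.vector(x)
  y <- as.vector(y)
  all(x %in% y) && all(y %in% x)
}

# Hypothetical helper: does x contain every element of y?
setcontains <- function(x, y) all(as.vector(y) %in% as.vector(x))

a <- c(1L, 2L, 3L, 3L)
b <- c(3L, 2L, 1L)
d <- c(1L, 2L)

same_ab <- c(setequal_current(a, b), setequal_in(a, b), setequal_split(a, b))
same_ad <- c(setequal_current(a, d), setequal_in(a, d), setequal_split(a, d))
```

All three variants agree on these inputs; the difference lies in the intermediate allocations, not in the result.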



Re: [Rd] RFC: getifexists() {was [Bug 16065] exists ...}

2015-01-08 Thread luke-tierney

On Thu, 8 Jan 2015, Michael Lawrence wrote:


If we do add an argument to get(), then it should be named consistently
with the ifnotfound argument of mget(). As mentioned, the possibility of a
NULL value is problematic. One solution is a sentinel value that indicates
an unbound value (like R_UnboundValue).


A null default is fine -- it's a default; if it isn't right for a
particular case you can provide something else.



But another idea (and one pretty similar to John's) is to follow the SYMSXP
design at the C level, where there is a structure that points to the name
and a value. We already have SYMSXPs at the R level of course (name
objects) but they do not provide access to the value, which is typically
R_UnboundValue. But this does not even need to be implemented with SYMSXP.
The design would allow something like:

binding <- getBinding("x", env)
if (hasValue(binding)) {
 x <- value(binding) # throws an error if none
 message(name(binding), " has value ", x)
}

That, I think, is a bit verbose but readable and could be made fast. And I
think binding objects would be useful in other ways, as they are
essentially a named object. For example, when iterating over an
environment.
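
To make the binding-object idea concrete, here is a rough R-level sketch. It is only an assumption about how such an interface might look: getBinding, hasValue, value and name are the hypothetical names from the snippet above, implemented here with ordinary exists()/get() lookups rather than with any access to SYMSXP internals.

```r
# Hypothetical binding object: just remembers a name and an environment.
getBinding <- function(name, env) {
  structure(list(name = name, env = env), class = "binding_sketch")
}

# The value is looked up lazily, each time it is asked for.
hasValue <- function(b) exists(b$name, envir = b$env, inherits = FALSE)
value    <- function(b) get(b$name, envir = b$env, inherits = FALSE)  # errors if unbound
name     <- function(b) b$name

env <- new.env()
assign("x", 42, envir = env)

binding <- getBinding("x", env)
if (hasValue(binding)) {
  message(name(binding), " has value ", value(binding))
}

# Removing the variable afterwards leaves the binding object without a value.
rm("x", envir = env)
still_bound <- hasValue(binding)
```

Because the lookup happens on every call, this sketch does not give the speed benefit under discussion; it only shows the shape of the interface and what happens when the variable is removed after the binding object was obtained.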


This would need a lot more thought. Directly exposing the internals is
definitely not something we want to do as we may well want to change
that design. But there are lots of other corner issues that would have
to be thought through before going forward, such as what happens if an
rm occurs between obtaining a binding object and doing something with
it. Serialization would also need thinking through. This doesn't seem
like a worthwhile place to spend our efforts to me.

Adding getIfExists, or .get, or get0, or whatever seems fine. Adding
an argument to get() with missing giving current behavior may be OK
too. Rewriting exists and get as .Primitives may be sufficient though.

Best,

luke



Michael




On Thu, Jan 8, 2015 at 6:03 AM, John Nolan jpno...@american.edu wrote:


Adding an optional argument to get (and mget) like

val <- get(name, where, ..., value.if.not.found=NULL )   (*)

would be useful for many.  HOWEVER, it is possible that there could be
some confusion here: (*) can give a NULL because either x exists and
has value NULL, or because x doesn't exist.   If that matters, the user
would need to be careful about specifying a value.if.not.found that cannot
be confused with a valid value of x.

To avoid this difficulty, perhaps we want both: have Martin's
getifexists( ) return a list with two values:
  - a boolean variable 'found'  # = value returned by exists( )
  - a variable 'value'

Then implement get( ) as:

get <- function(x,...,value.if.not.found ) {

  if( missing(value.if.not.found) ) {
    a <- getifexists(x,... )
    if (!a$found) stop("object '", x, "' not found")
  } else {
    a <- getifexists(x,...,value.if.not.found )
  }
  return(a$value)
}

Note that value.if.not.found has no default value in above.
It behaves exactly like current get does if value.if.not.found
is not specified, and if it is specified, it would be faster
in the common situation mentioned below:
 if(exists(x,...)) { get(x,...) }
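
A runnable sketch of the two-function design described above, under stated assumptions: getifexists() does not exist in R, so it is emulated here in plain R as getifexists_sketch, and the get() replacement is named get_sketch to avoid masking base::get. The emulation still performs two lookups, so it illustrates only the semantics (in particular how a stored NULL is told apart from a missing object), not the intended speedup.

```r
# Hypothetical stand-in for the proposed getifexists(): returns both a
# 'found' flag and a 'value', so a stored NULL is distinguishable from
# a missing object.
getifexists_sketch <- function(x, envir = parent.frame(),
                               value.if.not.found = NULL) {
  found <- exists(x, envir = envir)
  value <- if (found) get(x, envir = envir) else value.if.not.found
  list(found = found, value = value)
}

# get()-like wrapper: errors when the object is missing and no fallback
# was given, otherwise returns the fallback.
get_sketch <- function(x, envir = parent.frame(), value.if.not.found) {
  if (missing(value.if.not.found)) {
    a <- getifexists_sketch(x, envir)
    if (!a$found) stop("object '", x, "' not found")
  } else {
    a <- getifexists_sketch(x, envir, value.if.not.found)
  }
  a$value
}

e <- new.env(parent = emptyenv())
assign("n", NULL, envir = e)               # exists, with value NULL

stored_null <- getifexists_sketch("n", e)  # found is TRUE, value is NULL
absent      <- getifexists_sketch("m", e)  # found is FALSE, value is NULL
fallback    <- get_sketch("m", e, value.if.not.found = 0)
```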

John

P.S. if you like dromedaries call it valueIfNotFound ...

 ..
 John P. Nolan
 Math/Stat Department
 227 Gray Hall,   American University
 4400 Massachusetts Avenue, NW
 Washington, DC 20016-8050

 jpno...@american.edu   voice: 202.885.3140
 web: academic2.american.edu/~jpnolan
 ..


-R-devel r-devel-boun...@r-project.org wrote: -
To: Martin Maechler maech...@stat.math.ethz.ch, R-devel@r-project.org
From: Duncan Murdoch
Sent by: R-devel
Date: 01/08/2015 06:39AM
Subject: Re: [Rd] RFC: getifexists() {was [Bug 16065] exists ...}

On 08/01/2015 4:16 AM, Martin Maechler wrote:
 In November, we had a bug repository conversation
 with Peter Haverty and myself:

   https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16065

 where the bug report title started with

  ---  exists is a bottleneck for dispatch and package loading, ...

 Peter proposed an extra simplified and hence faster version of exists(),
 and I commented

  --- Comment #2 from Martin Maechler maech...@stat.math.ethz.ch
---
  I'm very grateful that you've started exploring the bottlenecks of loading
  packages with many S4 classes (and methods)...
  and I hope we can make real progress there rather sooner than later.

  OTOH, your `summaryRprof()` in your vignette indicates that exists() may use
  up to 10% of the time spent in library(reportingTools),  and your speedup
  proposals of exists()  may go up to ca 30%  which is good and well worth
  considering,  but still we can only expect 2-3% speedup for package loading
  which unfortunately is not much.

  Still I agree it is worth looking at exists() as you did  ... and
  consider providing a fast simplified version of it in addition to
current