Re: [Rd] setequal: better readability, reduced memory footprint, and minor speedup
Currently unique() calls duplicated() internally and then extracts the non-duplicated elements. One could write a countUnique() that simply counts, rather than allocating the logical return value of duplicated(). So much of the cost is in the hash operation, though, that it probably won't help much; that might depend on the sizes of things. The more unique elements, the better it would perform.

On Thu, Jan 8, 2015 at 2:06 PM, Peter Haverty haverty.pe...@gene.com wrote:

> How about unique them both and compare the lengths? It's less work,
> especially allocation.
>
> Pete
>
> Peter M. Haverty, Ph.D.
> Genentech, Inc.
> phave...@gene.com
>
> On Thu, Jan 8, 2015 at 1:30 PM, peter dalgaard pda...@gmail.com wrote:
>
>> If you look at the definition of %in%, you'll find that it is
>> implemented using match, so if we did as you suggest, I give it about
>> three days before someone suggests to inline the function call...
>> Readability of source code is not usually our prime concern.
>>
>> The idea does have some merit, though.
>>
>> Apropos, why is there no setcontains()?
>>
>> -pd
>>
>> On 06 Jan 2015, at 22:02 , Hervé Pagès hpa...@fredhutch.org wrote:
>>
>>> Hi,
>>>
>>> Current implementation:
>>>
>>>     setequal <- function (x, y)
>>>     {
>>>         x <- as.vector(x)
>>>         y <- as.vector(y)
>>>         all(c(match(x, y, 0L) > 0L, match(y, x, 0L) > 0L))
>>>     }
>>>
>>> First what about replacing 'match(x, y, 0L) > 0L' and
>>> 'match(y, x, 0L) > 0L' with 'x %in% y' and 'y %in% x', respectively.
>>> They're strictly equivalent but the latter form is a lot more
>>> readable than the former (isn't this the raison d'être of %in%?):
>>>
>>>     setequal <- function (x, y)
>>>     {
>>>         x <- as.vector(x)
>>>         y <- as.vector(y)
>>>         all(c(x %in% y, y %in% x))
>>>     }
>>>
>>> Furthermore, replacing 'all(c(x %in% y, y %in% x))' with
>>> 'all(x %in% y) && all(y %in% x)' improves readability even more and,
>>> more importantly, reduces memory footprint significantly on big
>>> vectors (e.g. by 15% on integer vectors with 15M elements):
>>>
>>>     setequal <- function (x, y)
>>>     {
>>>         x <- as.vector(x)
>>>         y <- as.vector(y)
>>>         all(x %in% y) && all(y %in% x)
>>>     }
>>>
>>> It also seems to speed up things a little bit (not in a significant
>>> way though).
>>>
>>> Cheers,
>>> H.
>>>
>>> --
>>> Hervé Pagès
>>> Program in Computational Biology
>>> Division of Public Health Sciences
>>> Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N, M1-B514
>>> P.O. Box 19024
>>> Seattle, WA 98109-1024
>>> E-mail: hpa...@fredhutch.org
>>> Phone: (206) 667-5791
>>> Fax: (206) 667-1319
>>
>> --
>> Peter Dalgaard, Professor,
>> Center for Statistics, Copenhagen Business School
>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>> Phone: (+45)38153501
>> Email: pd@cbs.dk Priv: pda...@gmail.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
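The three spellings of setequal() quoted above can be compared side by side. The following is only a sketch: the names setequal_match, setequal_in and setequal_split are hypothetical labels chosen here, not functions in base R, and the small inputs are made up for illustration.

```r
# Hypothetical labels for the three variants discussed in the thread.
setequal_match <- function(x, y) {
    x <- as.vector(x)
    y <- as.vector(y)
    # one concatenated logical vector of length(x) + length(y)
    all(c(match(x, y, 0L) > 0L, match(y, x, 0L) > 0L))
}

setequal_in <- function(x, y) {
    x <- as.vector(x)
    y <- as.vector(y)
    # same computation, spelled with %in%
    all(c(x %in% y, y %in% x))
}

setequal_split <- function(x, y) {
    x <- as.vector(x)
    y <- as.vector(y)
    # no concatenation; && skips the second test when the first fails
    all(x %in% y) && all(y %in% x)
}

a <- c(1L, 2L, 2L, 3L)
b <- c(3L, 1L, 2L)
d <- c(1L, 2L, 4L)

res_equal <- c(setequal_match(a, b), setequal_in(a, b), setequal_split(a, b))
res_diff  <- c(setequal_match(a, d), setequal_in(a, d), setequal_split(a, d))
```

All three agree on these inputs; the difference under discussion is only in the intermediate allocations, not in the result.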
Re: [Rd] setequal: better readability, reduced memory footprint, and minor speedup
I was thinking something like:

    setequal <- function(x, y) {
        xu = unique(x)
        yu = unique(y)
        if (length(xu) != length(yu)) {
            return(FALSE)
        }
        return(all(match(xu, yu, 0L) > 0L))
    }

This lets you fail early for cheap (skipping the allocation from the '> 0L' comparisons). Whether or not this goes fast depends a lot on the uniqueness of x and y and on whether you want to optimize for the TRUE or the FALSE case. You'd do much better to make some real hashes in C and compare the keys, but it's probably not worth the complexity.

Pete

Peter M. Haverty, Ph.D.
Genentech, Inc.
phave...@gene.com

On Thu, Jan 8, 2015 at 2:06 PM, Peter Haverty phave...@gene.com wrote:

> How about unique them both and compare the lengths? It's less work,
> especially allocation.
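The reason one match() suffices after the length check is a counting argument: if the two de-duplicated vectors have the same length and every element of the first is found in the second, the second cannot hold anything extra. A small sketch of that reasoning follows; set_equal_unique is a hypothetical name and the inputs are invented for illustration.

```r
# Hypothetical helper illustrating the unique-then-compare-lengths idea.
set_equal_unique <- function(x, y) {
    xu <- unique(as.vector(x))
    yu <- unique(as.vector(y))
    # different numbers of distinct values: cannot be equal as sets
    if (length(xu) != length(yu)) {
        return(FALSE)
    }
    # equal counts, so checking one direction is enough
    all(match(xu, yu, 0L) > 0L)
}

same_sets      <- set_equal_unique(c(1, 2, 2, 3), c(3, 2, 1))
different_size <- set_equal_unique(1:3, 1:4)
same_size_diff <- set_equal_unique(c(1, 2, 3), c(1, 2, 4))
```

The early return is what makes the FALSE case cheap; the TRUE case still pays for two unique() calls and one match().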
Re: [Rd] setequal: better readability, reduced memory footprint, and minor speedup
Try this out. It looks like a 2X speedup for some cases and a wash in others. unique does two allocations, but skipping the '> 0L' allocation could make up for it.

    library(microbenchmark)
    library(RUnit)

    x = sample.int(1e4, 1e5, TRUE)
    y = sample.int(1e4, 1e5, TRUE)

    set_equal <- function(x, y) {
        xu = .Internal(unique(x, FALSE, FALSE, NA))
        yu = .Internal(unique(y, FALSE, FALSE, NA))
        if (length(xu) != length(yu)) {
            return(FALSE)
        }
        return( all(match(xu, yu, 0L) > 0L) )
    }

    set_equal2 <- function(x, y) {
        xu = .Internal(unique(x, FALSE, FALSE, NA))
        yu = .Internal(unique(y, FALSE, FALSE, NA))
        if (length(xu) != length(yu)) {
            return(FALSE)
        }
        return( !anyNA(match(xu, yu)) )
    }

    microbenchmark(
        a = setequal(x, y),
        b = set_equal(x, y),
        c = set_equal2(x, y)
    )
    checkIdentical(setequal(x, y), set_equal(x, y))
    checkIdentical(setequal(x, y), set_equal2(x, y))

    x = y
    microbenchmark(
        a = setequal(x, y),
        b = set_equal(x, y),
        c = set_equal2(x, y)
    )
    checkIdentical(setequal(x, y), set_equal(x, y))
    checkIdentical(setequal(x, y), set_equal2(x, y))

Sorry, I'm probably over-posting today.

Regards,
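For readers without microbenchmark or RUnit installed, the same check can be approximated with base R only. This is a sketch under assumptions: it swaps the .Internal() call for the public unique(), uses system.time() instead of microbenchmark(), and set_equal_base is a hypothetical name. Timings will differ by machine and are not claimed to reproduce the 2X figure.

```r
set.seed(1)  # fixed seed so the inputs are reproducible
x <- sample.int(1e4, 1e5, TRUE)
y <- sample.int(1e4, 1e5, TRUE)

# Hypothetical base-R variant: public unique() instead of .Internal()
set_equal_base <- function(x, y) {
    xu <- unique(x)
    yu <- unique(y)
    if (length(xu) != length(yu)) {
        return(FALSE)
    }
    !anyNA(match(xu, yu))
}

# Agreement with base::setequal on random and on identical inputs
agree_random <- identical(setequal(x, y), set_equal_base(x, y))
agree_same   <- identical(setequal(x, x), set_equal_base(x, x))

# Rough timings over a few repetitions; treat the numbers as indicative only
t_base <- system.time(for (i in 1:20) setequal(x, y))
t_new  <- system.time(for (i in 1:20) set_equal_base(x, y))
```

Comparing t_base and t_new gives a coarse impression; microbenchmark() as in the message above remains the better tool for small differences.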
Re: [Rd] setequal: better readability, reduced memory footprint, and minor speedup
On 01/08/2015 01:30 PM, peter dalgaard wrote:

> If you look at the definition of %in%, you'll find that it is
> implemented using match, so if we did as you suggest, I give it about
> three days before someone suggests to inline the function call...

But you wouldn't bet money on that right? Because you know you would lose.

> Readability of source code is not usually our prime concern.

Don't sacrifice readability if you do not have a good reason for it. What's your reason here? Are you seriously suggesting that inlining makes a significant difference? As Michael pointed out, the expensive operation here is the hashing.

But sadly some people like inlining and want to use it everywhere: it's easy and they feel good about it, even if it hurts readability and maintainability (if you use x %in% y instead of the inlined version, the day someone changes the implementation of x %in% y for something faster, or fixes a bug in it, your code will automatically benefit; right now it won't).

More simply put: good readability generally leads to better code.

> The idea does have some merit, though.
>
> Apropos, why is there no setcontains()?

Wait... shouldn't everybody use all(match(x, y, nomatch = 0L) > 0L) ?

H.

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpa...@fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319
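The equivalence at the heart of this exchange is easy to check directly. The sketch below compares the inlined form with %in% on a small made-up input that includes NA; the variable names are chosen here for illustration.

```r
x <- c("a", "b", NA, "z")
y <- c("b", "c", NA)

# The inlined spelling and the readable spelling
inlined  <- match(x, y, nomatch = 0L) > 0L
readable <- x %in% y

# Both yield the same logical vector, NA handling included
same <- identical(inlined, readable)
```

Since the two are interchangeable in result, the argument in the message is purely about readability and about who benefits if the implementation of %in% changes later.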
Re: [Rd] New version of Rtools for Windows
Regarding the redefinition error, I've asked on StackOverflow for advice [1], but I have noticed the following; perhaps someone here can understand what changed between the stdio.h of 4.6.3 and the stdio.h of 4.8.4.

In GCC 4.8.4, the section of stdio.h which is referenced in the errors is the following:

    #if !defined (__USE_MINGW_ANSI_STDIO) || __USE_MINGW_ANSI_STDIO == 0
    /* this is here to deal with software defining
     * vsnprintf as _vsnprintf, eg. libxml2. */
    #pragma push_macro("snprintf")
    #pragma push_macro("vsnprintf")
    # undef snprintf
    # undef vsnprintf
      int __cdecl __ms_vsnprintf(char * __restrict__ d,size_t n,const char * __restrict__ format,va_list arg)
        __MINGW_ATTRIB_DEPRECATED_MSVC2005 __MINGW_ATTRIB_DEPRECATED_SEC_WARN;

      __mingw_ovr
      __MINGW_ATTRIB_NONNULL(3)
      int vsnprintf (char * __restrict__ __stream, size_t __n, const char * __restrict__ __format, va_list __local_argv)
      {
        return __ms_vsnprintf (__stream, __n, __format, __local_argv);
      }

      int __cdecl __ms_snprintf(char * __restrict__ s, size_t n, const char * __restrict__ format, ...);

    #ifndef __NO_ISOCEXT
    __mingw_ovr
    __MINGW_ATTRIB_NONNULL(3)
    int snprintf (char * __restrict__ __stream, size_t __n, const char * __restrict__ __format, ...)
    {
      register int __retval;
      __builtin_va_list __local_argv; __builtin_va_start( __local_argv, __format );
      __retval = __ms_vsnprintf (__stream, __n, __format, __local_argv);
      __builtin_va_end( __local_argv );
      return __retval;
    }
    #endif /* !__NO_ISOCEXT */

    #pragma pop_macro ("vsnprintf")
    #pragma pop_macro ("snprintf")
    #endif

The corresponding section in 4.6.3 as found in the Rtools for Windows installation is:

    #if !defined (__USE_MINGW_ANSI_STDIO) || __USE_MINGW_ANSI_STDIO == 0
    /* this is here to deal with software defining
     * vsnprintf as _vsnprintf, eg. libxml2. */
    #pragma push_macro("snprintf")
    #pragma push_macro("vsnprintf")
    # undef snprintf
    # undef vsnprintf
      int __cdecl vsnprintf(char * __restrict__ d,size_t n,const char * __restrict__ format,va_list arg)
        __MINGW_ATTRIB_DEPRECATED_MSVC2005 __MINGW_ATTRIB_DEPRECATED_SEC_WARN;

    #ifndef __NO_ISOCEXT
      int __cdecl snprintf(char * __restrict__ s, size_t n, const char * __restrict__ format, ...);
    #ifndef __CRT__NO_INLINE
      __CRT_INLINE int __cdecl vsnprintf(char * __restrict__ d,size_t n,const char * __restrict__ format,va_list arg)
      {
        return _vsnprintf (d, n, format, arg);
      }
    #endif /* !__CRT__NO_INLINE */
    #endif /* !__NO_ISOCEXT */

    #pragma pop_macro ("vsnprintf")
    #pragma pop_macro ("snprintf")
    #endif

The latter does not have a direct redefinition of the two functions. I still don't know why the #undef calls do not work [1].

Thank you,

Avi

[1] https://stackoverflow.com/questions/27853225/is-there-a-way-to-include-stdio-h-but-ignore-some-of-the-functions-therein

On Thu, Jan 8, 2015 at 2:27 PM, Hin-Tak Leung ht...@users.sourceforge.net wrote:

> Oh, I forgot to mention that besides setting AR, RANLIB and the stack
> probing fix, you also need a very up to date binutils. 2.25 was out in
> December. Even with that, if your linker's default is not what you are
> compiling for (i.e. a multiarch toolchain), you need to set GNUTARGET
> also, i.e. -m32/-m64 is not enough. Some fix to autodetect non-default
> targets went in after Christmas before the new year, but I am not
> brave enough to try that on a daily basis yet (only tested it and
> reported it, then reverted the change - how gcc invokes the linker is
> rather complicated and it is not easy to have two binutils
> installed...) - setting GNUTARGET seems safer :-).
>
> Whether you need that depends on whether you are compiling for your
> toolchain's default target architecture.
>
> AR, RANLIB, GNUTARGET are all environment variables - you set them the
> usual way. The stack probing fix is for passing make check, when you
> finish make.
>
> --
> On Thu, Jan 8, 2015 6:14 PM GMT Avraham Adler wrote:
>
>> On Thu, Jan 8, 2015 at 10:48 AM, Hin-Tak Leung
>> ht...@users.sourceforge.net wrote:
>>
>>> The r.dll crash is easy - you need to be using gcc-ar for ar, and
>>> gcc-ranlib for ranlib. I also posted a patch to fix the check
>>> failure for stack probing, as lto optimizes away the stack probing
>>> code, as it should.
>>>
>>> yes, lto build's speed gain is very impressive.
>>
>> I apologize for my ignorance, but how would I do that? I tried by
>> changing the following in src/gnuwin32/MkRules.local:
>>
>>     # prefix for 64-bit: path or x86_64-w64-mingw32-
>>     BINPREF64 = x86_64-w64-mingw32-gcc-
>>
>> I added the gcc- as the suffix there, but I guess that is
>> insufficient as I still get the following error using 4.9.2:
>>
>>     windres -F pe-x86-64 -I../include -i dllversion.rc -o dllversion.o
>>     gcc -std=gnu99 -m64 -shared -s -mwindows -o R.dll R.def console.o dynload.o editor.o embeddedR.o extra.o opt.o pager.o preferences.o psignal.o rhome.o rt_complete.o rui.o run.o shext.o sys-win32.o system.o dos_wglob.o malloc.o ../main/libmain.a ../appl/libappl.a ../nmath/libnmath.a getline/gl.a ../extra/xdr/libxdr.a
Re: [Rd] RFC: getifexists() {was [Bug 16065] exists ...}
For what it's worth, I think we would need a new function if the default behavior changes. Since we already have get and mget, maybe "cget" for "conditional get"? "if get", "safe get", ...

I like the idea of keeping the original "not found" behavior if the if.not.found arg is missing. However, it will be important to keep the number of arguments down. (I noticed that Martin's example lacks a frame argument.) I've heard rumors that there are plans to reduce the function call overhead, so perhaps this matters less now.

I like Luke's idea of making exists/get/etc. .Primitives. I think that will be necessary in order to go fast. For my two cents, I also think get/assign should just be synonyms for the [[ .Primitive. That could actually simplify things a bit. One might add inherits=FALSE and if.not.found arguments to the environment [[ code, for example.

Regards,
Pete

Peter M. Haverty, Ph.D.
Genentech, Inc.
phave...@gene.com

On Thu, Jan 8, 2015 at 11:57 AM, luke-tier...@uiowa.edu wrote:

> On Thu, 8 Jan 2015, Michael Lawrence wrote:
>
>> If we do add an argument to get(), then it should be named
>> consistently with the ifnotfound argument of mget(). As mentioned,
>> the possibility of a NULL value is problematic. One solution is a
>> sentinel value that indicates an unbound value (like R_UnboundValue).
>
> A null default is fine -- it's a default; if it isn't right for a
> particular case you can provide something else.
>
>> But another idea (and one pretty similar to John's) is to follow the
>> SYMSXP design at the C level, where there is a structure that points
>> to the name and a value. We already have SYMSXPs at the R level of
>> course (name objects) but they do not provide access to the value,
>> which is typically R_UnboundValue. But this does not even need to be
>> implemented with SYMSXP. The design would allow something like:
>>
>>     binding <- getBinding("x", env)
>>     if (hasValue(binding)) {
>>         x <- value(binding) # throws an error if none
>>         message(name(binding), "has value", x)
>>     }
>>
>> That, I think, is a bit verbose but readable and could be made fast.
>> And I think binding objects would be useful in other ways, as they
>> are essentially a named object. For example, when iterating over an
>> environment.
>
> This would need a lot more thought. Directly exposing the internals is
> definitely not something we want to do as we may well want to change
> that design. But there are lots of other corner issues that would have
> to be thought through before going forward, such as what happens if an
> rm occurs between obtaining a binding object and doing something with
> it. Serialization would also need thinking through. This doesn't seem
> like a worthwhile place to spend our efforts to me.
>
> Adding getIfExists, or .get, or get0, or whatever seems fine. Adding
> an argument to get() with missing giving current behavior may be OK
> too. Rewriting exists and get as .Primitives may be sufficient though.
>
> Best,
>
> luke
>
>> Michael
>>
>> On Thu, Jan 8, 2015 at 6:03 AM, John Nolan jpno...@american.edu wrote:
>>
>>> Adding an optional argument to get (and mget) like
>>>
>>>     val <- get(name, where, ..., value.if.not.found=NULL )   (*)
>>>
>>> would be useful for many. HOWEVER, it is possible that there could
>>> be some confusion here: (*) can give a NULL because either x exists
>>> and has value NULL, or because x doesn't exist. If that matters, the
>>> user would need to be careful about specifying a value.if.not.found
>>> that cannot be confused with a valid value of x.
>>>
>>> To avoid this difficulty, perhaps we want both: have Martin's
>>> getifexists( ) return a list with two values:
>>>   - a boolean variable 'found'   # = value returned by exists( )
>>>   - a variable 'value'
>>>
>>> Then implement get( ) as:
>>>
>>>     get <- function(x,...,value.if.not.found ) {
>>>       if( missing(value.if.not.found) ) {
>>>         a <- getifexists(x,... )
>>>         if (!a$found) error("x not found")
>>>       } else {
>>>         a <- getifexists(x,...,value.if.not.found )
>>>       }
>>>       return(a$value)
>>>     }
>>>
>>> Note that value.if.not.found has no default value in above. It
>>> behaves exactly like current get does if value.if.not.found is not
>>> specified, and if it is specified, it would be faster in the common
>>> situation mentioned below:
>>>
>>>     if(exists(x,...)) { get(x,...) }
>>>
>>> John
>>>
>>> P.S. if you like dromedaries call it valueIfNotFound ...
>>>
>>> ..
>>> John P. Nolan
>>> Math/Stat Department
>>> 227 Gray Hall, American University
>>> 4400 Massachusetts Avenue, NW
>>> Washington, DC 20016-8050
>>> jpno...@american.edu
>>> voice: 202.885.3140
>>> web: academic2.american.edu/~jpnolan
>>> ..
>>>
>>> -----R-devel r-devel-boun...@r-project.org wrote: -----
>>> To: Martin Maechler maech...@stat.math.ethz.ch, R-devel@r-project.org
>>> From: Duncan Murdoch
>>> Sent by: R-devel
>>> Date: 01/08/2015 06:39AM
>>> Subject: Re: [Rd] RFC: getifexists() {was [Bug 16065] exists ...}
>>>
>>> On 08/01/2015 4:16 AM, Martin Maechler wrote:
>>>> In November, we had a bug
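John's two-value proposal can be tried out in plain R. What follows is only a sketch under stated assumptions: get_if_exists and get2 are hypothetical names (the thread calls them getifexists and get), the lookup is built from the existing exists() and get() rather than a single fast lookup, and stop() stands in for the error() call in the pseudocode.

```r
# Hypothetical stand-in for the proposed getifexists(): returns both
# whether the name was found and its value.
get_if_exists <- function(x, envir = parent.frame(), inherits = TRUE) {
    if (exists(x, envir = envir, inherits = inherits)) {
        list(found = TRUE,
             value = get(x, envir = envir, inherits = inherits))
    } else {
        list(found = FALSE, value = NULL)
    }
}

# Hypothetical get() replacement: errors when the default is missing,
# otherwise returns the caller-supplied default.
get2 <- function(x, envir = parent.frame(), value.if.not.found) {
    force(envir)
    a <- get_if_exists(x, envir = envir)
    if (a$found) {
        return(a$value)
    }
    if (missing(value.if.not.found)) {
        stop("object '", x, "' not found")
    }
    value.if.not.found
}

e <- new.env(parent = emptyenv())
assign("nothing", NULL, envir = e)   # exists, but its value is NULL

r_null   <- get_if_exists("nothing", e)  # found, value NULL
r_absent <- get_if_exists("absent", e)   # not found
```

The 'found' flag is what separates "bound to NULL" from "not bound", which is the ambiguity the message warns about when NULL is also the not-found default.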
Re: [Rd] RFC: getifexists() {was [Bug 16065] exists ...}
Michael's idea has an interesting bonus that he and I discussed earlier. It would be very convenient to have a container of key/value pairs. I imagine many people often write this:

    x <- mapply( names(x), x, FUN=function(k,v) {
        # work with key and value
    } )

especially ex-Perl people accustomed to

    while ( ($key, $value) = each(%some_hash) ) { }

Perhaps there is room for additional discussion of using lists of SYMSXPs in this manner. (If SYMSXPs are not that safe, perhaps a looping construct for named vectors that gave the illusion of iterating over a list of two-tuples.)

Pete

Peter M. Haverty, Ph.D.
Genentech, Inc.
phave...@gene.com
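The key/value idiom mentioned above can be made concrete with base R. This is a sketch only: the vector x and the pair layout (a list with elements key and value) are made up for illustration, and Map() is used as the list-returning counterpart of mapply().

```r
x <- c(a = 1, b = 2, c = 3)

# The mapply() idiom from the message: the callback sees key and value
labels <- mapply(names(x), x, FUN = function(k, v) paste0(k, "=", v))

# An illusion of two-tuples: a list of (key, value) pairs to loop over
pairs <- Map(function(k, v) list(key = k, value = v), names(x), x)

total <- 0
keys <- character(0)
for (p in pairs) {
    keys <- c(keys, p$key)       # work with the key
    total <- total + p$value     # work with the value
}
```

Building the list of pairs costs an allocation per element, which is part of why a built-in looping construct or binding objects are floated in the thread.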
Re: [Rd] RFC: getifexists() {was [Bug 16065] exists ...}
On Thu, Jan 8, 2015 at 11:57 AM, luke-tier...@uiowa.edu wrote:

> On Thu, 8 Jan 2015, Michael Lawrence wrote:
>
>> If we do add an argument to get(), then it should be named
>> consistently with the ifnotfound argument of mget(). As mentioned,
>> the possibility of a NULL value is problematic. One solution is a
>> sentinel value that indicates an unbound value (like R_UnboundValue).
>
> A null default is fine -- it's a default; if it isn't right for a
> particular case you can provide something else.
>
>> But another idea (and one pretty similar to John's) is to follow the
>> SYMSXP design at the C level, where there is a structure that points
>> to the name and a value. We already have SYMSXPs at the R level of
>> course (name objects) but they do not provide access to the value,
>> which is typically R_UnboundValue. But this does not even need to be
>> implemented with SYMSXP. The design would allow something like:
>>
>>     binding <- getBinding("x", env)
>>     if (hasValue(binding)) {
>>         x <- value(binding) # throws an error if none
>>         message(name(binding), "has value", x)
>>     }
>>
>> That, I think, is a bit verbose but readable and could be made fast.
>> And I think binding objects would be useful in other ways, as they
>> are essentially a named object. For example, when iterating over an
>> environment.
>
> This would need a lot more thought. Directly exposing the internals is
> definitely not something we want to do as we may well want to change
> that design. But there are lots of other corner issues that would have
> to be thought through before going forward, such as what happens if an
> rm occurs between obtaining a binding object and doing something with
> it. Serialization would also need thinking through. This doesn't seem
> like a worthwhile place to spend our efforts to me.

Just wanted to be clear that I was not suggesting to expose any internals. We could implement the behavior using SYMSXP, or not. Nor would the binding need to be mutable. The binding would be considered independent of the environment from which it was retrieved. As Pete has mentioned, it could be a useful abstraction to have in general.

> Adding getIfExists, or .get, or get0, or whatever seems fine. Adding
> an argument to get() with missing giving current behavior may be OK
> too. Rewriting exists and get as .Primitives may be sufficient though.
>
> Best,
>
> luke
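To make the binding idea concrete, here is a pure-R mock-up. It is only a sketch under assumptions: getBinding, hasValue, value and name are the names used in the message but do not exist in base R, and this version is an immutable snapshot built from exists() and get(), not anything based on SYMSXP or a single lookup.

```r
# Hypothetical binding object: an immutable snapshot of (name, value),
# independent of the environment it was taken from.
getBinding <- function(name, env) {
    found <- exists(name, envir = env, inherits = FALSE)
    val <- if (found) get(name, envir = env, inherits = FALSE) else NULL
    structure(list(name = name, found = found, value = val),
              class = "binding")
}

hasValue <- function(b) b$found

value <- function(b) {
    # throws an error if the name was unbound when the snapshot was taken
    if (!b$found) stop("binding '", b$name, "' has no value")
    b$value
}

name <- function(b) b$name

env <- new.env(parent = emptyenv())
assign("x", NULL, envir = env)   # bound, with value NULL

bx <- getBinding("x", env)
by <- getBinding("y", env)       # never bound

# Removing x afterwards does not affect the snapshot already taken
rm("x", envir = env)
```

Because the snapshot is detached from the environment, the rm() question raised above has a simple answer in this mock-up: the binding keeps the value it saw. Whether that is the right semantics is exactly what the thread leaves open.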
Re: [Rd] setequal: better readability, reduced memory footprint, and minor speedup
> why is there no setcontains()?

Several packages define is.subset(), which I am assuming is what you are proposing, but with its arguments reversed. E.g., package:algstat has

    is.subset <- function(x, y) all(x %in% y)
    containsQ <- function(y, x) all(x %in% y)

and package:rje has essentially the same is.subset. package:arulesSequences and package:arules have an S4 generic called is.subset, which is entirely different (it is not a predicate, but returns a matrix).

Bill Dunlap
TIBCO Software
wdunlap tibco.com
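The argument-order point is easiest to see by running the two predicates. In this sketch the names is_subset, set_contains and set_equal2 are hypothetical (chosen to avoid masking the package functions named above), and the inputs are invented.

```r
# Hypothetical predicates mirroring the two argument orders
is_subset    <- function(x, y) all(x %in% y)   # is x contained in y?
set_contains <- function(y, x) all(x %in% y)   # does y contain x?

# Set equality expressed as containment in both directions
set_equal2 <- function(x, y) is_subset(x, y) && is_subset(y, x)

small <- 1:2
big   <- 1:3

sub_ok   <- is_subset(small, big)          # small inside big
sub_rev  <- is_subset(big, small)          # big is not inside small
cont_ok  <- set_contains(big, small)       # same fact, reversed arguments
empty_ok <- is_subset(integer(0), big)     # the empty set is a subset
```

Note that all() on an empty logical vector is TRUE, so the empty set comes out as a subset of everything without special handling.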
Re: [Rd] setequal: better readability, reduced memory footprint, and minor speedup
How about unique them both and compare the lengths? It's less work, especially allocation.

Pete

Peter M. Haverty, Ph.D.
Genentech, Inc.
phave...@gene.com

On Thu, Jan 8, 2015 at 1:30 PM, peter dalgaard pda...@gmail.com wrote:

> If you look at the definition of %in%, you'll find that it is implemented using match, so if we did as you suggest, I give it about three days before someone suggests to inline the function call... Readability of source code is not usually our prime concern.
>
> The idea does have some merit, though.
>
> Apropos, why is there no setcontains()?
>
> -pd
>
> On 06 Jan 2015, at 22:02 , Hervé Pagès hpa...@fredhutch.org wrote:
>
>> Hi,
>>
>> Current implementation:
>>
>>   setequal <- function (x, y)
>>   {
>>     x <- as.vector(x)
>>     y <- as.vector(y)
>>     all(c(match(x, y, 0L) > 0L, match(y, x, 0L) > 0L))
>>   }
>>
>> First what about replacing 'match(x, y, 0L) > 0L' and 'match(y, x, 0L) > 0L' with 'x %in% y' and 'y %in% x', respectively. They're strictly equivalent but the latter form is a lot more readable than the former (isn't this the raison d'être of %in%?):
>>
>>   setequal <- function (x, y)
>>   {
>>     x <- as.vector(x)
>>     y <- as.vector(y)
>>     all(c(x %in% y, y %in% x))
>>   }
>>
>> Furthermore, replacing 'all(c(x %in% y, y %in% x))' with 'all(x %in% y) && all(y %in% x)' improves readability even more and, more importantly, reduces memory footprint significantly on big vectors (e.g. by 15% on integer vectors with 15M elements):
>>
>>   setequal <- function (x, y)
>>   {
>>     x <- as.vector(x)
>>     y <- as.vector(y)
>>     all(x %in% y) && all(y %in% x)
>>   }
>>
>> It also seems to speed up things a little bit (not in a significant way though).
>>
>> Cheers,
>> H.
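The variants discussed in this thread can be compared side by side. The sketch below is only an illustration: the function names are made up for this example, and `setequal_unique` is one possible reading of the "unique them both and compare the lengths" suggestion (it adds a one-sided membership check, since equal lengths alone would not prove that the sets are equal).

```r
# Implementation quoted in the thread: builds one combined logical vector.
setequal_match <- function(x, y) {
  x <- as.vector(x)
  y <- as.vector(y)
  all(c(match(x, y, 0L) > 0L, match(y, x, 0L) > 0L))
}

# Proposed rewrite: two all() calls joined by &&, so the two logical
# vectors are never concatenated and the second test can be skipped.
setequal_in <- function(x, y) {
  x <- as.vector(x)
  y <- as.vector(y)
  all(x %in% y) && all(y %in% x)
}

# Hypothetical variant of the unique()-based idea (an assumption, not code
# from the thread): equal number of distinct values plus one-sided
# containment implies set equality.
setequal_unique <- function(x, y) {
  ux <- unique(as.vector(x))
  uy <- unique(as.vector(y))
  length(ux) == length(uy) && all(ux %in% uy)
}

a <- c(3L, 1L, 2L, 2L)
b <- c(1L, 2L, 3L)
d <- c(1L, 2L, 4L)

# All three variants should agree on both comparisons.
res_equal  <- c(setequal_match(a, b), setequal_in(a, b), setequal_unique(a, b))
res_differ <- c(setequal_match(a, d), setequal_in(a, d), setequal_unique(a, d))
```

Whether any variant is faster or lighter on memory depends on vector sizes and the number of distinct values, as noted in the thread; this sketch only checks that they return the same answers.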
Re: [Rd] RFC: getifexists() {was [Bug 16065] exists ...}
On 08/01/2015 4:16 AM, Martin Maechler wrote:

> In November, we had a bug repository conversation with Peter Haverty and myself: https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16065 where the bug report title started with --- exists is a bottleneck for dispatch and package loading, ...
>
> [...]
>
> My subjective guess is that probably more than half of exists() uses are of the form
>
>   if(exists(name, where, ...)) { get(name, where, ...) .. } else { NULL / error() / .. or similar }
>
> i.e. many exists() calls when returning TRUE are immediately followed by the corresponding get() call which repeats quite a bit of the lookup that exists() has done. Instead, I'd imagine a function, say getifexists(name, ...) that does both at once in the exists is TRUE case but in a way we can easily keep the if(.) .. else clause above.
>
> [...]
>
> I found that 'getifexists()' is actually very simple to implement, I have already tested it a bit, but not yet committed to R-devel (the R trunk aka master branch) because I'd like to get public comments {RFC := Request For Comments}.

I don't like the name -- I'd prefer getIfExists. As Baath (2012, R Journal) pointed out, R names are very inconsistent in naming conventions, but lowerCamelCase is the most common choice. Second most common is period.separated, so an argument could be made for get.if.exists, but there's still the possibility of confusion with S3 methods, and users of other languages where . is an operator find it a little strange.

If you don't like lowerCamelCase (and a lot of people don't), then I think underscore_separated is the next best choice, so would use get_if_exists.

Another possibility is to make no new name at all, and just add an optional parameter to get() (which if present acts as your value.if.not parameter, if not present keeps the current "object not found" error).

Duncan Murdoch

> My version of the help file {for both exists() and getifexists()} rendered in text is
>
> help(getifexists)
>
> Usage:
>
>   exists(x, where = -1, envir = , frame, mode = "any", inherits = TRUE)
>   getifexists(x, where = -1, envir = as.environment(where), mode = "any", inherits = TRUE, value.if.not = NULL)
>
> [...]
Re: [Rd] gsub with perl=TRUE results in 'this version of PCRE is not compiled with Unicode property support' in R-devel
Why are you reporting that your PCRE library does not have something which the R-admin manual says it should preferably have? To wit, footnote 37 says

  'and not PCRE2, which started at version 10.0. PCRE must be built with UTF-8 support (not the default) and support for Unicode properties is assumed by some R packages. Neither are tested by configure. JIT support is desirable.'

That certainly does not fail on my Linux, Windows and OS X builds of R-devel. (Issues about pre-built binaries, if that is what you used, should be reported to their maintainers, not here.)

And the help does say in ?regex

  In UTF-8 mode, some Unicode properties may be supported via ‘\p{xx}’ and ‘\P{xx}’ which match characters with and without property ‘xx’ respectively.

Note the 'may'.

On 07/01/2015 23:25, Dan Tenenbaum wrote:

> The following code:
>
>   res <- gsub("(*UCP)\\b(i)\\b", "", nhgrimelanomaclass, perl = TRUE)
>
> results in:
>
>   Error in gsub(sprintf("(*UCP)\\b(%s)\\b", i), "", nhgrimelanomaclass, :
>     invalid regular expression '(*UCP)\b(i)\b'
>   In addition: Warning message:
>   In gsub(sprintf("(*UCP)\\b(%s)\\b", i), "", nhgrimelanomaclass, :
>     PCRE pattern compilation error
>     'this version of PCRE is not compiled with Unicode property support'
>     at '(*UCP)\b(i)\b'
>
> on
>
>   R Under development (unstable) (2015-01-01 r67290)
>   Platform: x86_64-apple-darwin13.4.0 (64-bit)
>   Running under: OS X 10.9.5 (Mavericks)
>
>   locale:
>   [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
>   attached base packages:
>   [1] stats graphics grDevices utils datasets methods base
>
> And also on the same version of R-devel on Snow Leopard, Windows, and Linux.
>
> But it does not produce an error on
>
>   R version 3.1.2 (2014-10-31)
>   Platform: x86_64-apple-darwin13.4.0 (64-bit)
>
>   locale:
>   [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
>   attached base packages:
>   [1] stats graphics grDevices utils datasets methods base
>
> Dan

-- Brian D. Ripley, rip...@stats.ox.ac.uk Emeritus Professor of Applied Statistics, University of Oxford 1 South Parks Road, Oxford OX1 3TG, UK
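Since Unicode property support is optional in PCRE, code that relies on `(*UCP)` could probe for it first. The following is a minimal sketch under that assumption; `pcre_has_ucp` and `strip_word` are hypothetical helper names made up for this example, not functions from R or from any package mentioned in the thread.

```r
# Hypothetical probe: TRUE if the PCRE library behind this R build accepts
# the (*UCP) verb, FALSE if compiling the pattern raises an error.
pcre_has_ucp <- function() {
  tryCatch(
    suppressWarnings(grepl("(*UCP)\\bi\\b", "i", perl = TRUE)),
    error = function(e) FALSE
  )
}

# Hypothetical wrapper: remove a whole word, using (*UCP) only when the
# probe says it is available and falling back to plain \b otherwise.
strip_word <- function(x, word) {
  pattern <- if (pcre_has_ucp()) {
    sprintf("(*UCP)\\b(%s)\\b", word)
  } else {
    sprintf("\\b(%s)\\b", word)
  }
  gsub(pattern, "", x, perl = TRUE)
}

stripped <- strip_word("class i tumour", "i")
```

The fallback pattern treats word boundaries by ASCII rules only, so results on non-ASCII text may differ between the two branches.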
[Rd] On base::rank
Have a look at the following, taken from base::rank:

  ...
  if (!is.na(na.last) && any(nas)) {
      yy <- integer(length(x))               # <~
      storage.mode(yy) <- storage.mode(y)    # <~
      yy <- NA
      NAkeep <- (na.last == "keep")
      if (NAkeep || na.last) {
          yy[!nas] <- y
          if (!NAkeep) yy[nas] <- (length(y) + 1L):length(yy)
      }
  ...

Alternatively, look at lines 36 and 37 here: https://github.com/wch/r-source/blob/fbf5cdf29d923395b537a9893f46af1aa75e38f3/src/library/base/R/rank.R#L36

There seems to be no need for those lines, IIUC. Isn't it? 'yy' is replaced with NA in the very next line.

Best,
Arun.
Re: [Bioc-devel] Announcing Docker containers for Bioconductor
Thanks, this is really useful and I was looking forward to it after having used rocker!

I have a strange (ok, at least to me :) ) issue concerning volumes. On one machine (debian testing/unstable, up to date) everything works smoothly when I do something like:

  data@decoder:~$ docker run -v /home/data/Dropbox/work/matrix/mr_bioc/matrix_rider/:/opt/matrix_rider -p 8787:8787 9f40f8036ad4

(in /home/data/Dropbox/work/matrix/mr_bioc/matrix_rider/ I have an Rstudio project)

The same command on another machine (more or less the same debian as before) leaves me unable to open the project with Rstudio as long as the directory is not writable. The only relevant difference seems to be my user UID, which is 1000 on the first machine but different on the other one. I tried to use -e as suggested here https://github.com/rocker-org/rocker/wiki/Sharing-files-with-host-machine with no luck.

Does anyone more docker-savvy than me have a suggestion? (maybe the volumes approach is not the best one here).

Thanks,
E.

p.s. sorry Dan for the first mail sent only to you

___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
[Rd] RFC: getifexists() {was [Bug 16065] exists ...}
In November, we had a bug repository conversation with Peter Haverty and myself: https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16065 where the bug report title started with

--- exists is a bottleneck for dispatch and package loading, ...

Peter proposed an extra simplified and hence faster version of exists(), and I commented

--- Comment #2 from Martin Maechler maech...@stat.math.ethz.ch ---

I'm very grateful that you've started exploring the bottlenecks of loading packages with many S4 classes (and methods)... and I hope we can make real progress there rather sooner than later.

OTOH, your `summaryRprof()` in your vignette indicates that exists() may use up to 10% of the time spent in library(reportingTools), and your speedup proposals of exists() may go up to ca 30% which is good and well worth considering, but still we can only expect 2-3% speedup for package loading which unfortunately is not much.

Still I agree it is worth looking at exists() as you did ... and consider providing a fast simplified version of it in addition to current exists() [I think].

BTW, as we talk about enhancements here, maybe consider a further possibility: My subjective guess is that probably more than half of exists() uses are of the form

  if(exists(name, where, ...)) {
     get(name, where, ...)
     ..
  } else {
      NULL / error() / .. or similar
  }

i.e. many exists() calls when returning TRUE are immediately followed by the corresponding get() call which repeats quite a bit of the lookup that exists() has done.

Instead, I'd imagine a function, say getifexists(name, ...) that does both at once in the exists is TRUE case but in a way we can easily keep the if(.) .. else clause above.

One already existing approach would use

  if(!inherits(tryCatch(xx <- get(name, where, ...), error=function(e)e), "error")) {
    ... (( work with xx )) ...
  } else {
     NULL / error() / .. or similar
  }

but of course our C implementation would be more efficient and use more concise syntax {which should not look like error handling}.

Follow ups to this idea should really go to R-devel (the mailing list).

and now I do follow up here myself :

I found that 'getifexists()' is actually very simple to implement, I have already tested it a bit, but not yet committed to R-devel (the R trunk aka master branch) because I'd like to get public comments {RFC := Request For Comments}.

My version of the help file {for both exists() and getifexists()} rendered in text is

-- help(getifexists) ---

Is an Object Defined?

Description:

     Look for an R object of the given name and possibly return it

Usage:

     exists(x, where = -1, envir = , frame, mode = "any", inherits = TRUE)
     getifexists(x, where = -1, envir = as.environment(where), mode = "any", inherits = TRUE, value.if.not = NULL)

Arguments:

       x: a variable name (given as a character string).

   where: where to look for the object (see the details section); if omitted, the function will search as if the name of the object appeared unquoted in an expression.

   envir: an alternative way to specify an environment to look in, but it is usually simpler to just use the ‘where’ argument.

   frame: a frame in the calling list. Equivalent to giving ‘where’ as ‘sys.frame(frame)’.

    mode: the mode or type of object sought: see the ‘Details’ section.

inherits: should the enclosing frames of the environment be searched?

value.if.not: the return value of ‘getifexists(x, *)’ when ‘x’ does not exist.

Details:

     The ‘where’ argument can specify the environment in which to look for the object in any of several ways: as an integer (the position in the ‘search’ list); as the character string name of an element in the search list; or as an ‘environment’ (including using ‘sys.frame’ to access the currently active function calls). The ‘envir’ argument is an alternative way to specify an environment, but is primarily there for back compatibility.

     This function looks to see if the name ‘x’ has a value bound to it in the specified environment. If ‘inherits’ is ‘TRUE’ and a value is not found for ‘x’ in the specified environment, the enclosing frames of the environment are searched until the name ‘x’ is encountered. See ‘environment’ and the ‘R Language Definition’ manual for details about the structure of environments and their enclosures.

     *Warning:* ‘inherits = TRUE’ is the default behaviour for R but not for S.

     If ‘mode’ is specified then only objects of that type are sought. The ‘mode’ may specify one of the collections ‘numeric’ and ‘function’ (see ‘mode’): any
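The exists()-then-get() pattern that this RFC wants to collapse can be written as a plain R wrapper. This is a minimal sketch only: `get_if_exists` is a hypothetical name, it takes an environment rather than the `where` argument of the proposed help page, and it still performs the lookup twice, which is exactly the cost the proposed C implementation would avoid.

```r
# Hypothetical pure-R stand-in for the proposed getifexists(): return the
# object bound to name `x` if there is one, otherwise `value.if.not`.
# Note: this does two lookups (exists() then get()); it illustrates the
# interface, not the efficiency gain discussed in the RFC.
get_if_exists <- function(x, envir = parent.frame(), mode = "any",
                          inherits = TRUE, value.if.not = NULL) {
  if (exists(x, envir = envir, mode = mode, inherits = inherits)) {
    get(x, envir = envir, mode = mode, inherits = inherits)
  } else {
    value.if.not
  }
}

e <- new.env()
assign("answer", 42L, envir = e)

found   <- get_if_exists("answer", envir = e, inherits = FALSE)
missing <- get_if_exists("no_such_object", envir = e, inherits = FALSE)
default <- get_if_exists("no_such_object", envir = e, inherits = FALSE,
                         value.if.not = NA)
```

With this shape the caller keeps a simple `if (is.null(...)) ... else ...` clause, at the price that a stored `NULL` cannot be told apart from a missing object unless a different `value.if.not` is supplied.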
Re: [Rd] New version of Rtools for Windows
Very timely, as this is how I got into the problem I posted about earlier; maybe some of the problems I ran into will mean more to you and the experts on this thread, Dr. Murdoch.

For reference, I run Windows 7 64bit, and I am trying to build a 64 bit version of R-3.1.2.

As we discussed offline, Dr. Murdoch, I've been trying to build R using more recent tools than GCC4.6.3 prerelease. Ruben Von Boxen (rubenvb) told me he is no longer developing his own builds of GCC, but is focusing on MSYS2 and the mingw64 personal builds. So, similar to what Jeroen said, I first installed MSYS2, whose initial installation on windows is not so simple[1]. After the initial install, the following packages need to be manually installed: make, tar, zip, unzip, zlib, and rsync. I also installed base-devel, which is way more than necessary, but there may be packages in there which are necessary.

I originally installed the most up-to-date version of GCC (4.9.2)[2], and I did pick the -seh version, as since I install (almost) all packages from source (the one exception being nloptr for now), the exception handling should be consistent and it is supposed to be up to ~15% faster[3].

The initial build crashed with the following error:

    gcc -std=gnu99 -m64 -I../../include -I. -DHAVE_CONFIG_H -O3 -Wall -pedantic -mtune=core2 -c xmalloc.c -o xmalloc.o
    ar crs libtre.a regcomp.o regerror.o regexec.o tre-ast.o tre-compile.o tre-match-approx.o tre-match-backtrack.o tre-match-parallel.o tre-mem.o tre-parse.o tre-stack.o xmalloc.o
    gcc -std=gnu99 -m64 -O3 -Wall -pedantic -mtune=core2 -c compat.c -o compat.o
    compat.c:65:5: error: redefinition of 'snprintf'
     int snprintf(char *buffer, size_t max, const char *format, ...)
         ^
    In file included from compat.c:3:0:
    F:/MinGW64/x86_64-w64-mingw32/include/stdio.h:553:5: note: previous definition of 'snprintf' was here
     int snprintf (char * __restrict__ __stream, size_t __n, const char * __restrict__ __format, ...)
         ^
    compat.c:75:5: error: redefinition of 'vsnprintf'
     int vsnprintf(char *buffer, size_t bufferSize, const char *format, va_list args)
         ^
    In file included from compat.c:3:0:
    F:/MinGW64/x86_64-w64-mingw32/include/stdio.h:543:7: note: previous definition of 'vsnprintf' was here
     int vsnprintf (char * __restrict__ __stream, size_t __n, const char * __restrict__ __format, va_list __local_argv)
         ^
    ../../gnuwin32/MkRules:218: recipe for target 'compat.o' failed
    make[4]: *** [compat.o] Error 1
    Makefile:120: recipe for target 'rlibs' failed
    make[3]: *** [rlibs] Error 1
    Makefile:179: recipe for target '../../bin/x64/R.dll' failed
    make[2]: *** [../../bin/x64/R.dll] Error 2
    Makefile:104: recipe for target 'rbuild' failed
    make[1]: *** [rbuild] Error 2
    Makefile:14: recipe for target 'all' failed
    make: *** [all] Error 2

After doing some checking (for example see [4]), I asked Duncan about the problem, and he suggested moving the #ifndef _W64 in compat.c up above the offending lines (65-75). That did not work, so I figured (it seems mistakenly, from the other thread) that if those functions are included from stdio already, I can just delete them from compat.c. The specific lines are:

    int snprintf(char *buffer, size_t max, const char *format, ...)
    {
        int res;
        va_list(ap);
        va_start(ap, format);
        res = trio_vsnprintf(buffer, max, format, ap);
        va_end(ap);
        return res;
    }

    int vsnprintf(char *buffer, size_t bufferSize, const char *format, va_list args)
    {
        return trio_vsnprintf(buffer, bufferSize, format, args);
    }

Continuing the build using 4.9.2 crashed again at the following point:

    gcc -std=gnu99 -m64 -I../include -I. -I../extra -DHAVE_CONFIG_H -DR_DLL_BUILD -O3 -Wall -pedantic -mtune=core2 -c malloc.c -o malloc.o
    windres -F pe-x86-64 -I../include -i dllversion.rc -o dllversion.o
    gcc -std=gnu99 -m64 -shared -s -mwindows -o R.dll R.def console.o dynload.o editor.o embeddedR.o extra.o opt.o pager.o preferences.o psignal.o rhome.o rt_complete.o rui.o run.o shext.o sys-win32.o system.o dos_wglob.o malloc.o ../main/libmain.a ../appl/libappl.a ../nmath/libnmath.a getline/gl.a ../extra/xdr/libxdr.a ../extra/pcre/libpcre.a ../extra/bzip2/libbz2.a ../extra/intl/libintl.a ../extra/trio/libtrio.a ../extra/tzone/libtz.a ../extra/tre/libtre.a ../extra/xz/liblzma.a dllversion.o -fopenmp -L. -lgfortran -lRblas -L../../bin/x64 -lRzlib -lRgraphapp -lRiconv -lcomctl32 -lversion
    collect2.exe: error: ld returned 5 exit status
    Makefile:150: recipe for target 'R.dll' failed
    make[3]: *** [R.dll] Error 1
    Makefile:179: recipe for target '../../bin/x64/R.dll' failed
    make[2]: *** [../../bin/x64/R.dll] Error 2
    Makefile:104: recipe for target 'rbuild' failed
    make[1]: *** [rbuild] Error 2
    Makefile:14: recipe for target 'all' failed
    make: *** [all] Error 2

As all those files existed in their correct places, the only reason I could think of that this would fail here is that GCC version 4.9 did make some changes to enhance link-time optimization [5], and probably something isn't
Re: [Rd] New version of Rtools for Windows
On 2015-01-08 14:18, Avraham Adler wrote:

> [...]
Re: [Rd] On base::rank
Arunkumar Srinivasan arunkumar.sri...@gmail.com on Thu, 8 Jan 2015 13:46:57 +0100 writes:

> Have a look at the following, taken from base::rank:
>
>   ...
>   if (!is.na(na.last) && any(nas)) {
>       yy <- integer(length(x))               # <~
>       storage.mode(yy) <- storage.mode(y)    # <~
>       yy <- NA
>       NAkeep <- (na.last == "keep")
>       if (NAkeep || na.last) {
>           yy[!nas] <- y
>           if (!NAkeep) yy[nas] <- (length(y) + 1L):length(yy)
>       }
>   ...
>
> Alternatively, look at lines 36 and 37 here: https://github.com/wch/r-source/blob/fbf5cdf29d923395b537a9893f46af1aa75e38f3/src/library/base/R/rank.R#L36
>
> There seems to be no need for those lines, IIUC. Isn't it? 'yy' is replaced with NA in the very next line.

Indeed. Interesting that nobody has noticed till now, even though that part has been world readable since at least 2008-08-25.

Note that the R source code is at http://svn.r-project.org/R/ and the file in question at http://svn.r-project.org/R/trunk/src/library/base/R/rank.R where you can already see the new code (given that 'x' was no longer needed, there's no need for 'xx').

Martin Maechler, ETH Zurich
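The point made in this exchange can be checked directly: because `yy <- NA` overwrites `yy`, the two preceding assignments cannot affect the result. The snippet below is a small stand-alone check using made-up values for `y` and `nas`; it is not the code of rank() itself.

```r
# Made-up inputs: ranks of the two non-missing values, and the NA mask.
y   <- c(2L, 1L)
nas <- c(FALSE, TRUE, FALSE)

# Old sequence, including the two lines flagged as redundant.
yy_old <- integer(length(nas))
storage.mode(yy_old) <- storage.mode(y)
yy_old <- NA               # discards everything done to yy_old so far
yy_old[!nas] <- y

# Same sequence without the two redundant lines.
yy_new <- NA
yy_new[!nas] <- y

# The indexed assignment extends the vector and coerces it to the type
# of y, so both versions end up identical.
same <- identical(yy_old, yy_new)
```

The type of the result comes from the subassignment `yy[!nas] <- y`, which is why setting the storage mode beforehand has no lasting effect.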
Re: [Rd] New version of Rtools for Windows
On Thu, Jan 8, 2015 at 10:48 AM, Hin-Tak Leung ht...@users.sourceforge.net wrote:

> The r.dll crash is easy - you need to be using gcc-ar for ar, and gcc-ranlib for ranlib. I also posted a patch to fix the check failure for stack probing, as lto optimizes away the stack probing code, as it should.
>
> yes, lto build's speed gain is very impressive.

I apologize for my ignorance, but how would I do that? I tried by changing the following in src/gnuwin32/MkRules.local:

    # prefix for 64-bit: path or x86_64-w64-mingw32-
    BINPREF64 = x86_64-w64-mingw32-gcc-

I added the gcc- as the suffix there, but I guess that is insufficient as I still get the following error using 4.9.2:

    windres -F pe-x86-64 -I../include -i dllversion.rc -o dllversion.o
    gcc -std=gnu99 -m64 -shared -s -mwindows -o R.dll R.def console.o dynload.o editor.o embeddedR.o extra.o opt.o pager.o preferences.o psignal.o rhome.o rt_complete.o rui.o run.o shext.o sys-win32.o system.o dos_wglob.o malloc.o ../main/libmain.a ../appl/libappl.a ../nmath/libnmath.a getline/gl.a ../extra/xdr/libxdr.a ../extra/pcre/libpcre.a ../extra/bzip2/libbz2.a ../extra/intl/libintl.a ../extra/trio/libtrio.a ../extra/tzone/libtz.a ../extra/tre/libtre.a ../extra/xz/liblzma.a dllversion.o -fopenmp -L. -lgfortran -lRblas -L../../bin/x64 -lRzlib -lRgraphapp -lRiconv -lcomctl32 -lversion
    collect2.exe: error: ld returned 5 exit status
    Makefile:150: recipe for target 'R.dll' failed
    make[3]: *** [R.dll] Error 1
    Makefile:179: recipe for target '../../bin/x64/R.dll' failed
    make[2]: *** [../../bin/x64/R.dll] Error 2
    Makefile:104: recipe for target 'rbuild' failed
    make[1]: *** [rbuild] Error 2
    Makefile:14: recipe for target 'all' failed
    make: *** [all] Error 2

I still had to delete those lines in compat.c, so this build, were it to have completed, is still subject to the non-conformance of scientific notation printing that was discussed earlier.

Hin-tak, any suggestions for this error (and the compat.c for that matter) that you, or any reader of this list, may have would be greatly appreciated.

Thank you!

Avi

> On Thu, Jan 8, 2015 2:01 PM GMT Henric Winell wrote:
>
>> On 2015-01-08 14:18, Avraham Adler wrote:
>>
>>> [...]
Re: [Bioc-devel] Announcing Docker containers for Bioconductor
----- Original Message -----
From: Elena Grassi grass...@gmail.com
To: Dan Tenenbaum dtene...@fredhutch.org
Cc: bioc-devel@r-project.org
Sent: Thursday, January 8, 2015 8:45:37 AM
Subject: Re: [Bioc-devel] Announcing Docker containers for Bioconductor

> On Thu, Jan 8, 2015 at 4:43 PM, Dan Tenenbaum dtene...@fredhutch.org wrote:
>
>>> I have a strange (ok, at least to me :) ) issue concerning volumes. On a machine (debian testing/unstable up to date) everything works smoothly when I do something like:
>>>
>>>   data@decoder:~$ docker run -v /home/data/Dropbox/work/matrix/mr_bioc/matrix_rider/:/opt/matrix_rider -p 8787:8787 9f40f8036ad4
>>>
>>> (in /home/data/Dropbox/work/matrix/mr_bioc/matrix_rider/ I have a Rstudio project)
>>
>> What image is 9f40f8036ad4? Why are you not using the user/repository name for the image? Is it a rocker image or a Bioconductor image?
>
> Sorry! It's the bioconductor/devel_sequencing image that I had pulled before:
>
>   data@decoder:~/Dropbox/work/matrix/matrix_rider$ docker images
>   REPOSITORY                      TAG      IMAGE ID       CREATED        VIRTUAL SIZE
>   bioconductor/devel_sequencing   latest   9f40f8036ad4   29 hours ago   5.715 GB
>
>> Note that Rstudio Server runs as a user called rstudio, not necessarily a privileged user. So the directory you are trying to mount should probably be writable (and readable) by all. so try
>>
>>   chmod -R a+rw /home/data/Dropbox/work/matrix/mr_bioc/matrix_rider/
>
> On the second machine this works, but I do not love granting this kind of privileges there since it's a shared PC in my office :)
>
> What puzzles me is that the privileges were the same on both machines (i.e. not writable and readable by everyone), and on the first one it worked from the first moment without a+rw... I assumed that the rstudio user has UID 1000 in the container/image and that -e UID= etc. could fix this (change it to whatever UID my user has on the host machine), but this does not seem to be the case.

It is not the case. These containers are set up differently than Rocker's. Though I think I will change them to allow this.

In the meantime you can accomplish what you want, a bit circuitously. Run a command like this:

  docker run -p 8787:8787 -e USER=$USER -e USERID=$UID --rm -it -v /home/data/Dropbox/work/matrix/mr_bioc/matrix_rider/:/opt/matrix_rider bioconductor/devel_sequencing bash

This opens a shell on the container (as root) where you can do the following:

  /tmp/userconf.sh      # note different location than on rocker
                        # ignore warning that /home/data already exists
  passwd data           # choose a passwd for the user, does not have to match your real password
  /usr/bin/supervisord

Then open a browser to http://localhost:8787 and log in with username 'data' and the password you just set. You should be able to read/write the files in /opt/matrix_rider, assuming you can read/write those files when you are logged in as 'data' and _not_ using docker.

I will look at doing this in a way more similar to rocker; I'll let you know when/if this is done.

Thanks,
Dan

>> In the meantime you can use R from the command line as described in https://github.com/rocker-org/rocker/wiki/Sharing-files-with-host-machine#interactive-containers with the change that userconf.sh is in /tmp so you need to invoke it like this:
>>
>>   /tmp/userconf.sh
>>
>> And ignore the warning it gives you that your home directory already exists.
>
> Thank you very much,
> E.
>
> -- $ pom
Re: [Rd] RFC: getifexists() {was [Bug 16065] exists ...}
On 08/01/2015 9:03 AM, John Nolan wrote:

Adding an optional argument to get (and mget) like

    val <- get(name, where, ..., value.if.not.found=NULL )   (*)

That would be a bad idea, as it would change behaviour of existing uses of get(). What I suggested would not give a default. If the arg was missing, we'd get the old behaviour; if the arg was present, we'd use it.

I'm not sure this is preferable to the separate function implementation. This makes the documentation and implementation of get() more complicated, and it would probably be slower for everyone.

Duncan Murdoch

would be useful for many. HOWEVER, it is possible that there could be some confusion here: (*) can give a NULL because either x exists and has value NULL, or because x doesn't exist. If that matters, the user would need to be careful about specifying a value.if.not.found that cannot be confused with a valid value of x.

To avoid this difficulty, perhaps we want both: have Martin's getifexists( ) return a list with two values:
  - a boolean variable 'found'  # = value returned by exists( )
  - a variable 'value'

Then implement get( ) as:

    get <- function(x,...,value.if.not.found ) {
      if( missing(value.if.not.found) ) {
        a <- getifexists(x,... )
        if (!a$found) stop(x, " not found")
      } else {
        a <- getifexists(x,...,value.if.not.found )
      }
      return(a$value)
    }

Note that value.if.not.found has no default value in above. It behaves exactly like current get does if value.if.not.found is not specified, and if it is specified, it would be faster in the common situation mentioned below:

    if(exists(x,...)) { get(x,...) }

John

P.S. if you like dromedaries call it valueIfNotFound ...

..
John P. Nolan
Math/Stat Department
227 Gray Hall, American University
4400 Massachusetts Avenue, NW
Washington, DC 20016-8050
jpno...@american.edu
voice: 202.885.3140
web: academic2.american.edu/~jpnolan
..
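The two-value return proposed above can be sketched in plain R. This is only an illustration under stated assumptions: getifexists_sketch is a hypothetical name, not the function under discussion, and it is built from exists() and get(), so it shows the interface rather than the single-lookup speedup that a C implementation would aim for.

```r
# Hypothetical sketch: report whether 'name' is bound and, if so, its value.
# Built on exists() + get(), so it still performs two lookups.
getifexists_sketch <- function(name, envir = parent.frame(), inherits = TRUE) {
  found <- exists(name, envir = envir, inherits = inherits)
  list(found = found,
       value = if (found) get(name, envir = envir, inherits = inherits) else NULL)
}

e <- new.env()
assign("x", NULL, envir = e)   # bound, but the value is NULL

r1 <- getifexists_sketch("x", e, inherits = FALSE)
r2 <- getifexists_sketch("no_such_name", e, inherits = FALSE)

# Both values are NULL; only 'found' tells the two cases apart.
c(r1$found, r2$found)
```

The 'found' flag is what removes the ambiguity raised above: a NULL value alone cannot say whether the name was bound.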
-----R-devel r-devel-boun...@r-project.org wrote: -----
To: Martin Maechler maech...@stat.math.ethz.ch, R-devel@r-project.org
From: Duncan Murdoch
Sent by: R-devel
Date: 01/08/2015 06:39AM
Subject: Re: [Rd] RFC: getifexists() {was [Bug 16065] exists ...}

On 08/01/2015 4:16 AM, Martin Maechler wrote:

In November, we had a bug repository conversation with Peter Haverty and myself: https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16065 where the bug report title started with

  --- exists is a bottleneck for dispatch and package loading, ...

Peter proposed an extra simplified and hence faster version of exists(), and I commented

  --- Comment #2 from Martin Maechler maech...@stat.math.ethz.ch ---
  I'm very grateful that you've started exploring the bottlenecks of loading packages with many S4 classes (and methods)... and I hope we can make real progress there rather sooner than later.

  OTOH, your `summaryRprof()` in your vignette indicates that exists() may use up to 10% of the time spent in library(reportingTools), and your speedup proposals of exists() may go up to ca 30%, which is good and well worth considering, but still we can only expect 2-3% speedup for package loading, which unfortunately is not much.

  Still I agree it is worth looking at exists() as you did ... and consider providing a fast simplified version of it in addition to current exists() [I think].

  BTW, as we talk about enhancements here, maybe consider a further possibility: My subjective guess is that probably more than half of exists() uses are of the form

    if(exists(name, where, ...)) {
      get(name, where, ...)
      ..
    } else {
      NULL / error() / .. or similar
    }

  i.e. many exists() calls when returning TRUE are immediately followed by the corresponding get() call which repeats quite a bit of the lookup that exists() has done.

  Instead, I'd imagine a function, say getifexists(name, ...) that does both at once in the "exists is TRUE" case but in a way we can easily keep the if(.) .. else clause above.
  One already existing approach would use

    if(!inherits(tryCatch(xx <- get(name, where, ...), error=function(e)e), "error")) {
      ... (( work with xx )) ...
    } else {
      NULL / error() / .. or similar
    }

  but of course our C implementation would be more efficient and use more concise syntax {which should not look like error handling}.

  Follow ups to this idea should really go to R-devel (the mailing list).

and now I do follow up here myself: I found that 'getifexists()' is actually very simple to implement. I have already tested it a bit, but not yet committed to R-devel (the R trunk aka master branch) because I'd like to get public comments {RFC := Request For
[Rd] unloadNamespace
In the documentation the closest thing I see to an explanation of this is that ?detach says "Unloading some namespaces has undesirable side effects".

Can anyone explain why unloading tseries will load zoo? I don't think this behavior is specific to tseries, it's just an example. I realize one would not usually unload something that is not loaded, but I would expect it to do nothing or give an error. I only discovered this when trying to clean up to debug another problem.

R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"
and
R Under development (unstable) (2015-01-02 r67308) -- "Unsuffered Consequences"
...
Type 'q()' to quit R.

> loadedNamespaces()
[1] "base"      "datasets"  "graphics"  "grDevices" "methods"   "stats"
[7] "utils"
> unloadNamespace("tseries") # loads zoo ?
> loadedNamespaces()
 [1] "base"      "datasets"  "graphics"  "grDevices" "grid"      "lattice"
 [7] "methods"   "quadprog"  "stats"     "utils"     "zoo"

Somewhat related, is there an easy way to get back to a "clean" state for loaded and attached things, as if R had just been started? I'm trying to do this in a vignette so it is not easy to stop and restart R.

Paul

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
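One way to avoid the side effect described above is to guard the call, so that unloadNamespace() only runs for a namespace that is actually loaded. The sketch below is an assumption-laden illustration: safe_unload is a hypothetical helper name, and the explanation that the namespace (with its imports) gets loaded first in order to be unloaded is a guess consistent with the output shown, not something confirmed in this thread.

```r
# Hypothetical helper: unload a namespace only if it is currently loaded.
# Returns TRUE if an unload was attempted, FALSE if there was nothing to do.
safe_unload <- function(pkg) {
  if (pkg %in% loadedNamespaces()) {
    unloadNamespace(pkg)
    return(TRUE)
  }
  FALSE
}

before <- sort(loadedNamespaces())
did_unload <- safe_unload("tseries")   # a no-op when tseries was never loaded
after <- sort(loadedNamespaces())
```

With the guard in place, calling it for a namespace that was never loaded leaves loadedNamespaces() unchanged, which is the "do nothing" behaviour asked for above.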
Re: [Rd] gsub with perl=TRUE results in 'this version of PCRE is not compiled with Unicode property support' in R-devel
Dan, for OS X, there is a new pcre library posted at http://r.research.att.com/libs/ with a date stamp of Dec 28. This fixes this problem. You can test for this by running "make check" post compilation. It'll bang out with a failure if this is not in order. (And I know that all of this is described in R-admin.)

It would be helpful (time saving) if a message were posted to r-sig-mac whenever a new (version of a) library is added to http://r.research.att.com/libs/ I know it is adding more work to the helpful people who are doing all the heavy lifting.

Kasper

On Thu, Jan 8, 2015 at 7:06 AM, Prof Brian Ripley rip...@stats.ox.ac.uk wrote:

Why are you reporting that your PCRE library does not have something which the R-admin manual says it should preferably have? To wit, footnote 37 says

  'and not PCRE2, which started at version 10.0. PCRE must be built with UTF-8 support (not the default) and support for Unicode properties is assumed by some R packages. Neither are tested by configure. JIT support is desirable.'

That certainly does not fail on my Linux, Windows and OS X builds of R-devel. (Issues about pre-built binaries, if that is what you used, should be reported to their maintainers, not here.)

And the help does say in ?regex

  In UTF-8 mode, some Unicode properties may be supported via ‘\p{xx}’ and ‘\P{xx}’ which match characters with and without property ‘xx’ respectively.

Note the 'may'.
On 07/01/2015 23:25, Dan Tenenbaum wrote:

The following code:

    res <- gsub("(*UCP)\\b(i)\\b", "", nhgrimelanomaclass, perl = TRUE)

results in:

    Error in gsub(sprintf("(*UCP)\\b(%s)\\b", i), "", nhgrimelanomaclass, :
      invalid regular expression '(*UCP)\b(i)\b'
    In addition: Warning message:
    In gsub(sprintf("(*UCP)\\b(%s)\\b", i), "", nhgrimelanomaclass, :
      PCRE pattern compilation error
        'this version of PCRE is not compiled with Unicode property support'
        at '(*UCP)\b(i)\b'

on

    R Under development (unstable) (2015-01-01 r67290)
    Platform: x86_64-apple-darwin13.4.0 (64-bit)
    Running under: OS X 10.9.5 (Mavericks)

    locale:
    [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

And also on the same version of R-devel on Snow Leopard, Windows, and Linux.

But it does not produce an error on

    R version 3.1.2 (2014-10-31)
    Platform: x86_64-apple-darwin13.4.0 (64-bit)

    locale:
    [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

Dan

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK
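Since ?regex only says Unicode properties "may" be supported, code that relies on (*UCP) can probe for it at run time rather than assume it. The following is a minimal sketch under stated assumptions: pcre_supports_ucp and strip_word are hypothetical helper names, and falling back to a plain \b pattern is one possible policy, not a recommendation made in this thread.

```r
# Hypothetical probe: TRUE if a pattern using the (*UCP) verb compiles here.
pcre_supports_ucp <- function() {
  tryCatch({
    suppressWarnings(gsub("(*UCP)\\b(i)\\b", "", "a i b", perl = TRUE))
    TRUE
  }, error = function(e) FALSE)
}

# Hypothetical wrapper: remove a whole word, using (*UCP) only when available.
strip_word <- function(x, word) {
  prefix <- if (pcre_supports_ucp()) "(*UCP)" else ""
  gsub(paste0(prefix, "\\b(", word, ")\\b"), "", x, perl = TRUE)
}

strip_word("a i b", "i")
```

For ASCII input the result is the same with or without (*UCP); the verb only changes how \b, \w and friends treat non-ASCII characters.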
Re: [Rd] On base::rank
Indeed. Interesting that nobody has noticed till now, even though that part has been world readable since at least 2008-08-25.

That was what made me a bit unsure :-).

Note that the R source code is at http://svn.r-project.org/R/ and the file in question at http://svn.r-project.org/R/trunk/src/library/base/R/rank.R

Okay, thanks.

where you can already see the new code (given that 'x' was no longer needed, there's no need for 'xx').

Great! Thanks again.

Martin Maechler, ETH Zurich

Best,
Arun.
Re: [Rd] RFC: getifexists() {was [Bug 16065] exists ...}
If we do add an argument to get(), then it should be named consistently with the ifnotfound argument of mget(). As mentioned, the possibility of a NULL value is problematic. One solution is a sentinel value that indicates an unbound value (like R_UnboundValue).

But another idea (and one pretty similar to John's) is to follow the SYMSXP design at the C level, where there is a structure that points to the name and a value. We already have SYMSXPs at the R level of course (name objects) but they do not provide access to the value, which is typically R_UnboundValue. But this does not even need to be implemented with SYMSXP. The design would allow something like:

    binding <- getBinding(x, env)
    if (hasValue(binding)) {
      x <- value(binding) # throws an error if none
      message(name(binding), " has value ", x)
    }

That, I think, is a bit verbose but readable, and could be made fast. And I think binding objects would be useful in other ways, as they are essentially a named object. For example, when iterating over an environment.

Michael

On Thu, Jan 8, 2015 at 6:03 AM, John Nolan jpno...@american.edu wrote:

Adding an optional argument to get (and mget) like val <- get(name, where, ..., value.if.not.found=NULL ) (*) would be useful for many. HOWEVER, it is possible that there could be some confusion here: (*) can give a NULL because either x exists and has value NULL, or because x doesn't exist. If that matters, the user would need to be careful about specifying a value.if.not.found that cannot be confused with a valid value of x. To avoid this difficulty, perhaps we want both: have Martin's getifexists( ) return a list with two values: - a boolean variable 'found' # = value returned by exists( ) - a variable 'value' Then implement get( ) as: get <- function(x,...,value.if.not.found ) { if( missing(value.if.not.found) ) { a <- getifexists(x,...
) if (!a$found) error(x not found) } else { a - getifexists(x,...,value.if.not.found ) } return(a$value) } Note that value.if.not.found has no default value in above. It behaves exactly like current get does if value.if.not.found is not specified, and if it is specified, it would be faster in the common situation mentioned below: if(exists(x,...)) { get(x,...) } John P.S. if you like dromedaries call it valueIfNotFound ... .. John P. Nolan Math/Stat Department 227 Gray Hall, American University 4400 Massachusetts Avenue, NW Washington, DC 20016-8050 jpno...@american.edu voice: 202.885.3140 web: academic2.american.edu/~jpnolan .. -R-devel r-devel-boun...@r-project.org wrote: - To: Martin Maechler maech...@stat.math.ethz.ch, R-devel@r-project.org From: Duncan Murdoch Sent by: R-devel Date: 01/08/2015 06:39AM Subject: Re: [Rd] RFC: getifexists() {was [Bug 16065] exists ...} On 08/01/2015 4:16 AM, Martin Maechler wrote: In November, we had a bug repository conversation with Peter Hagerty and myself: https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16065 where the bug report title started with --- exists is a bottleneck for dispatch and package loading, ... Peter proposed an extra simplified and henc faster version of exists(), and I commented --- Comment #2 from Martin Maechler maech...@stat.math.ethz.ch --- I'm very grateful that you've started exploring the bottlenecks of loading packages with many S4 classes (and methods)... and I hope we can make real progress there rather sooner than later. OTOH, your `summaryRprof()` in your vignette indicates that exists() may use upto 10% of the time spent in library(reportingTools), and your speedup proposals of exist() may go up to ca 30% which is good and well worth considering, but still we can only expect 2-3% speedup for package loading which unfortunately is not much. Still I agree it is worth looking at exists() as you did ... 
and consider providing a fast simplified version of it in addition to current exists() [I think]. BTW, as we talk about enhancements here, maybe consider a further possibility: My subjective guess is that probably more than half of exists() uses are of the form if(exists(name, where, ...)) { get(name, whare, ) .. } else { NULL / error() / .. or similar } i.e. many exists() calls when returning TRUE are immediately followed by the corresponding get() call which repeats quite a bit of the lookup that exists() has done. Instead, I'd imagine a function, say getifexists(name, ...) that does both at once in the exists is TRUE case but in a way we can easily keep the if(.) .. else clause above. One already existing
Re: [Bioc-devel] Announcing Docker containers for Bioconductor
- Original Message - From: Elena Grassi grass...@gmail.com To: bioc-devel@r-project.org Sent: Thursday, January 8, 2015 5:11:50 AM Subject: Re: [Bioc-devel] Announcing Docker containers for Bioconductor Thanks, this is really useful and I was looking forward to it after having used rocker! I have a strange (ok, at least to me :) ) issue concerning volumes. On a machine (debian testing/unstable up to date) everything works smoothly when I do something like: data@decoder:~$ docker run -v /home/data/Dropbox/work/ matrix/mr_bioc/matrix_rider/:/opt/matrix_rider -p 8787:8787 9f40f8036ad4 (in /home/data/Dropbox/work/matrix/mr_bioc/matrix_rider/ I have a Rstudio project) What image is 9f40f8036ad4? Why are you not using the user/repository name for the image? Is it a rocker image or a Bioconductor image? The same command on another machine (more or less the same debian as before) leads me to being unable to open the project with Rstudio as long as the directory is not writeable. The only relevant difference seems to be my user UID which is 1000 on the first machine but different on the other one. I tried to use -e as suggested here https://github.com/rocker-org/rocker/wiki/Sharing-files-with-host-machine with no luck. Does anyone more docker-savy than me have a suggestion? (maybe the volumes approach is not the best one here). Note that Rstudio Server runs as a user called rstudio, not necessarily a privileged user. So the directory you are trying to mount should probably be writable (and readable) by all. so try chmod -R a+rw /home/data/Dropbox/work/matrix/mr_bioc/matrix_rider/ Dan Thanks, E. p.s. sorry Dan for the first mail sent only to you ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Rd] RFC: getifexists() {was [Bug 16065] exists ...}
Adding an optional argument to get (and mget) like

    val <- get(name, where, ..., value.if.not.found=NULL )   (*)

would be useful for many. HOWEVER, it is possible that there could be some confusion here: (*) can give a NULL because either x exists and has value NULL, or because x doesn't exist. If that matters, the user would need to be careful about specifying a value.if.not.found that cannot be confused with a valid value of x.

Exactly -- well, of course: That problem { NULL can be the legit value of what you want to get() } was the only reason to have a 'value.if.not' argument at all.

Note that this is not about a universal replacement of the if(exists(..)) { .. get(..) } idiom, but rather a replacement of these in the cases where speed matters very much, which is e.g. in the low level support code for S4 method dispatch.

'value.if.not.found': Note that CRAN checks require all arguments to be written in full length. Even though we have auto completion in ESS, Rstudio or other good R IDEs, I very much like to keep function calls somewhat compact. And yes, as you mention the dromedaries aka 2-hump camels: getIfExist is already horrible to my taste (and _ is not S-like; yes that's all very much a matter of taste and yes I'm from the 20th century).

To avoid this difficulty, perhaps we want both: have Martin's getifexists( ) return a list with two values:
  - a boolean variable 'found'  # = value returned by exists( )
  - a variable 'value'

Then implement get( ) as:

    get <- function(x,...,value.if.not.found ) {
      if( missing(value.if.not.found) ) {
        a <- getifexists(x,... )
        if (!a$found) stop(x, " not found")
      } else {
        a <- getifexists(x,...,value.if.not.found )
      }
      return(a$value)
    }

Interesting... Note that the above get() implementation would just be conceptual, as all of this is also quite a bit about speed, and we do the different cases in C anyway [via 'op' code].

Note that value.if.not.found has no default value in above.
It behaves exactly like current get does if value.if.not.found is not specified, and if it is specified, it would be faster in the common situation mentioned below:

    if(exists(x,...)) { get(x,...) }

Good... Let's talk about your getifexists(), as I argue we'd keep get() exactly as it is now anyway, if we use a new 3rd function (I keep calling it 'getifexists()' for now):

I think in that case, getifexists() would not even *need* an argument 'value.if.not' (or 'value.if.not.found'); it rather would return a

    list(found = *, value = *)

in any case. Alternatively, it could return

    structure(found, value = *)

In the first case, our main use case would be

    if((r <- getifexists(x, *))$found) {
      ## work with r$value
    }

in the 2nd case {structure}:

    if((r <- getifexists(x, *))) {
      ## work with attr(r,"value")
    }

I think that (both cases) would still be a bit slower (for the above most important use case) but probably not much, and it would look slightly more readable than my

    if (!is.null(r <- getifexists(x, *))) {
      ## work with r
    }

After all of this, I think I'd still somewhat prefer my original proposal, but not strongly -- I had originally also thought of returning the two parts explicitly, but then tended to prefer the version that behaved exactly like get() in the case the object is found.

... Nice interesting ideas! ... let the proposals and consideration flow ...

Martin

John

P.S. if you like dromedaries call it valueIfNotFound ...

:-) ;-) I don't .. as I said above, I already strongly dislike more than one hump.
[ Each capital is one key stroke (Shift) more, and each _ is two key strokes more on most keyboards..., and I do like identifiers that I can also quickly pronounce on the phone or in teaching .. ]

..
John P. Nolan
Math/Stat Department
227 Gray Hall, American University
4400 Massachusetts Avenue, NW
Washington, DC 20016-8050
..
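The structure() variant described above can be tried out with a small stand-in. Everything here is a sketch under assumptions: getifexists_attr is a hypothetical name, it is written with exists() and get() rather than a single C-level lookup, and the last lines use get0(), which base R provides from version 3.2.0 onward, with an illustrative sentinel object so that a legitimately NULL value cannot be mistaken for "not found".

```r
# Hypothetical stand-in for the structure(found, value = *) return shape.
getifexists_attr <- function(name, envir = parent.frame(), inherits = TRUE) {
  found <- exists(name, envir = envir, inherits = inherits)
  if (found) {
    attr(found, "value") <- get(name, envir = envir, inherits = inherits)
  }
  found
}

e <- new.env()
assign("x", 42, envir = e)

# The logical result drives the if(); the value rides along as an attribute.
if ((r <- getifexists_attr("x", e, inherits = FALSE))) {
  val <- attr(r, "value")
}

# Sentinel approach with get0(): a fresh environment is identical only to itself.
missing_marker <- new.env()
res <- get0("y", envir = e, inherits = FALSE, ifnotfound = missing_marker)
y_found <- !identical(res, missing_marker)
```

The attribute form keeps the if() condition compact, while the sentinel form needs no wrapper object at all; which reads better is the matter of taste debated above.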
-R-devel r-devel-boun...@r-project.org wrote: - To: Martin Maechler maech...@stat.math.ethz.ch, R-devel@r-project.org From: Duncan Murdoch Sent by: R-devel Date: 01/08/2015 06:39AM Subject: Re: [Rd] RFC: getifexists() {was [Bug 16065] exists ...} On 08/01/2015 4:16 AM, Martin Maechler wrote: In November, we had a bug repository conversation with Peter Hagerty and myself: https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16065 where the bug report title started with --- exists is a bottleneck for dispatch and package loading, ... Peter proposed an extra simplified and henc faster version of exists(), and I commented --- Comment #2 from Martin Maechler maech...@stat.math.ethz.ch --- I'm very grateful that you've started exploring the bottlenecks of loading packages with many S4 classes
Re: [Rd] New version of Rtools for Windows
The r.dll crash is easy - you need to be using gcc-ar for ar, and gcc-ranlib for ranlib. I also posted a patch to fix the check failure for stack probing, as lto optimizes away the stack probing code, as it should. yes, lto build's speed gain is very impressive. -- On Thu, Jan 8, 2015 2:01 PM GMT Henric Winell wrote: On 2015-01-08 14:18, Avraham Adler wrote: Very timely, as this is how I got into the problem I posted about earlier; maybe some of the problems I ran into will mean more to the you and the experts on this thread, Dr. Murdoch.For reference, I run Windows 7 64bit, and I am trying to build a 64 bit version of R-3.1.2. As we discussed offline, Dr. Murdoch, I've been trying to build R using more recent tools than GCC4.6.3 prerelease. Ruben Von Boxen (rubenvb) told me he is no longer developing his own builds of GCC, but is focusing on MSYS2 and the mingw64 personal builds. So, similar to what Jeroen said, I first installed MSYS2, whose initial installation on windows is not so simple[1]. After the initial install, the following packages need to be manually installed: make, tar, zip, unzip, zlib, and rsync. I also installed base-devel, which is way more than necessary, but there may be packages in there which are necessary. I originally installed the most up-to-date version of GCC (4.9.2)[2], and I did pick the -seh version, as since I install (almost) all packages from source (the one exception being nloptr for now), the exception handling should be consistent and it is supposed to up to ~15% faster[3]. The initial build crashed with the following error: gcc -std=gnu99 -m64 -I../../include -I. 
-DHAVE_CONFIG_H -O3 -Wall -pedantic -mtune=core2 -c xmalloc.c -o xmalloc.o ar crs libtre.a regcomp.o regerror.o regexec.o tre-ast.o tre-compile.o tre-match -approx.o tre-match-backtrack.o tre-match-parallel.o tre-mem.o tre-parse.o tre-stack.o xmalloc.o gcc -std=gnu99 -m64 -O3 -Wall -pedantic -mtune=core2 -c compat.c -o compat.o compat.c:65:5: error: redefinition of 'snprintf' int snprintf(char *buffer, size_t max, const char *format, ...) ^ In file included from compat.c:3:0: F:/MinGW64/x86_64-w64-mingw32/include/stdio.h:553:5: note: previous definition of 'snprintf' was here int snprintf (char * __restrict__ __stream, size_t __n, const char * __restrict__ __format, ...) ^ compat.c:75:5: error: redefinition of 'vsnprintf' int vsnprintf(char *buffer, size_t bufferSize, const char *format, va_list args) ^ In file included from compat.c:3:0: F:/MinGW64/x86_64-w64-mingw32/include/stdio.h:543:7: note: previous definition of 'vsnprintf' was here int vsnprintf (char * __restrict__ __stream, size_t __n, const char * __restrict__ __format, va_list __local_argv) ^ ../../gnuwin32/MkRules:218: recipe for target 'compat.o' failed make[4]: *** [compat.o] Error 1 Makefile:120: recipe for target 'rlibs' failed make[3]: *** [rlibs] Error 1 Makefile:179: recipe for target '../../bin/x64/R.dll' failed make[2]: *** [../../bin/x64/R.dll] Error 2 Makefile:104: recipe for target 'rbuild' failed make[1]: *** [rbuild] Error 2 Makefile:14: recipe for target 'all' failed make: *** [all] Error 2 After doing some checking (for example see [4]), I asked Duncan about the problem, and he suggested moving the #ifndef _W64 in compat.c up above the offending lines (65-75). That did not work, so, I figured (it seems mistakenly from the other thread) that if those functions are included from stdio already, I can just delete them from compat.c. The specific lines are: int snprintf(char *buffer, size_t max, const char *format, ...) 
{ int res; va_list(ap); va_start(ap, format); res = trio_vsnprintf(buffer, max, format, ap); va_end(ap); return res; } int vsnprintf(char *buffer, size_t bufferSize, const char *format, va_list args) { return trio_vsnprintf(buffer, bufferSize, format, args); } Continuing the build using 4.9.2 crashed again at the following point: gcc -std=gnu99 -m64 -I../include -I. -I../extra -DHAVE_CONFIG_H -DR_DLL_BUILD -O3 -Wall -pedantic -mtune=core2 -c malloc.c -o malloc.o windres -F pe-x86-64 -I../include -i dllversion.rc -o dllversion.o gcc -std=gnu99 -m64 -shared -s -mwindows -o R.dll R.def console.o dynload.o editor.o embeddedR.o extra.o opt.o pager.o preferences.o psignal.o rhome.o rt_complete.o rui.o run.o shext.o sys-win32.o system.o dos_wglob.o malloc.o ../main/libmain.a ../appl/libappl.a ../nmath/libnmath.a getline/gl.a ../extra/xdr/libxdr.a ../extra/pcre/libpcre.a ../extra/bzip2/libbz2.a ../extra/intl/libintl.a ../extra/trio/libtrio.a ../extra/tzone/libtz.a ../extra/tre/libtre.a ../extra/xz/liblzma.a dllversion.o -fopenmp -L. -lgfortran -lRblas -L../../bin/x64 -lRzlib -lRgraphapp -lRiconv -lcomctl32 -lversion collect2.exe: error: ld returned 5 exit status Makefile:150: recipe for target 'R.dll' failed
Re: [Rd] unloadNamespace
Paul,

My switchr package (https://github.com/gmbecker/switchr) has the flushSession function, which does what you want and seems to work (on my test machine at least). I haven't tested it under a recent R-devel, or with that specific package; however, I will soon, as the overarching model of switchr relies on this working. If you do try it before me with that package, please let me know whether it works or not.

~G

On Thu, Jan 8, 2015 at 7:45 AM, Paul Gilbert pgilbert...@gmail.com wrote:

In the documentation the closest thing I see to an explanation of this is that ?detach says "Unloading some namespaces has undesirable side effects".

Can anyone explain why unloading tseries will load zoo? I don't think this behavior is specific to tseries, it's just an example. I realize one would not usually unload something that is not loaded, but I would expect it to do nothing or give an error. I only discovered this when trying to clean up to debug another problem.

R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"
and
R Under development (unstable) (2015-01-02 r67308) -- "Unsuffered Consequences"
...
Type 'q()' to quit R.

> loadedNamespaces()
[1] "base"      "datasets"  "graphics"  "grDevices" "methods"   "stats"
[7] "utils"
> unloadNamespace("tseries") # loads zoo ?
> loadedNamespaces()
 [1] "base"      "datasets"  "graphics"  "grDevices" "grid"      "lattice"
 [7] "methods"   "quadprog"  "stats"     "utils"     "zoo"

Somewhat related, is there an easy way to get back to a "clean" state for loaded and attached things, as if R had just been started? I'm trying to do this in a vignette so it is not easy to stop and restart R.

Paul

--
Gabriel Becker, PhD
Alumnus
Statistics Department
University of California, Davis
Re: [Bioc-devel] Announcing Docker containers for Bioconductor
Docker and some bioconductor packages were used long time ago at CodersCrowd http://coderscrowd.com/ , good to see this move finally hit the community, awesome move, keep it up It will be great to support bioconductor users by using CodersCrowd as well, reproducibility is also when you reproduce errors not just working code Cheers Rad On Thu, Jan 8, 2015 at 7:43 AM, Dan Tenenbaum dtene...@fredhutch.org wrote: - Original Message - From: Elena Grassi grass...@gmail.com To: bioc-devel@r-project.org Sent: Thursday, January 8, 2015 5:11:50 AM Subject: Re: [Bioc-devel] Announcing Docker containers for Bioconductor Thanks, this is really useful and I was looking forward to it after having used rocker! I have a strange (ok, at least to me :) ) issue concerning volumes. On a machine (debian testing/unstable up to date) everything works smoothly when I do something like: data@decoder:~$ docker run -v /home/data/Dropbox/work/ matrix/mr_bioc/matrix_rider/:/opt/matrix_rider -p 8787:8787 9f40f8036ad4 (in /home/data/Dropbox/work/matrix/mr_bioc/matrix_rider/ I have a Rstudio project) What image is 9f40f8036ad4? Why are you not using the user/repository name for the image? Is it a rocker image or a Bioconductor image? The same command on another machine (more or less the same debian as before) leads me to being unable to open the project with Rstudio as long as the directory is not writeable. The only relevant difference seems to be my user UID which is 1000 on the first machine but different on the other one. I tried to use -e as suggested here https://github.com/rocker-org/rocker/wiki/Sharing-files-with-host-machine with no luck. Does anyone more docker-savy than me have a suggestion? (maybe the volumes approach is not the best one here). Note that Rstudio Server runs as a user called rstudio, not necessarily a privileged user. So the directory you are trying to mount should probably be writable (and readable) by all. 
so try chmod -R a+rw /home/data/Dropbox/work/matrix/mr_bioc/matrix_rider/ Dan Thanks, E. p.s. sorry Dan for the first mail sent only to you ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel -- *Radhouane Aniba* *Bioinformatics Scientist* *BC Cancer Agency, Vancouver, Canada* [[alternative HTML version deleted]] ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Rd] New version of Rtools for Windows
Oh, I forgot to mention that besides setting AR, RANLIB and the stack probing fix, you also need a very up to date binutils. 2.25 was out in december. Even with that , if you linker's default is not what you are compiling for (i.e. a multiarch toolchain), you need to set GNUTARGET also, i.e. -m32/-m64 is not enough. Some fix to autodetect non-default targets went in after christmas before the new year, but I am not brave enough to try that on a daily basis yet (only tested it and reported it, then reverting the change - how gcc invokes the linker is rather complicated and it is not easy to have two binutils installed...)- setting GNUTARGET seems safer :-). Whether you need that depends on whether you are compiling for your toolchain's default target architecture. AR, RANLIB, GNUTARGET are all environment variables - you set them the usual way. The stack probing fix is for passing make check, when you finish make. -- On Thu, Jan 8, 2015 6:14 PM GMT Avraham Adler wrote: On Thu, Jan 8, 2015 at 10:48 AM, Hin-Tak Leung ht...@users.sourceforge.net wrote: The r.dll crash is easy - you need to be using gcc-ar for ar, and gcc-ranlib for ranlib. I also posted a patch to fix the check failure for stack probing, as lto optimizes away the stack probing code, as it should. yes, lto build's speed gain is very impressive. I apologize for my ignorance, but how would I do that? 
I tried by changing the following in src/gnuwin32/MkRules.local: # prefix for 64-bit: path or x86_64-w64-mingw32- BINPREF64 = x86_64-w64-mingw32-gcc- I added the gcc- as the suffix there, but I guess that is insufficient as I still get the following error using 4.9.2: windres -F pe-x86-64 -I../include -i dllversion.rc -o dllversion.o gcc -std=gnu99 -m64 -shared -s -mwindows -o R.dll R.def console.o dynload.o editor.o embeddedR.o extra.o opt.o pager.o preferences.o psignal.o rhome.o rt_complete.o rui.o run.o shext.o sys-win32.o system.o dos_wglob.o malloc.o ../main/libmain.a ../appl/libappl.a ../nmath/libnmath.a getline/gl.a ../extra/xdr/libxdr.a ../extra/pcre/libpcre.a ../extra/bzip2/libbz2.a ../extra/intl/libintl.a ../extra/trio/libtrio.a ../extra/tzone/libtz.a ../extra/tre/libtre.a ../extra/xz/liblzma.a dllversion.o -fopenmp -L. -lgfortran -lRblas -L../../bin/x64 -lRzlib -lRgraphapp -lRiconv -lcomctl32 -lversion collect2.exe: error: ld returned 5 exit status Makefile:150: recipe for target 'R.dll' failed make[3]: *** [R.dll] Error 1 Makefile:179: recipe for target '../../bin/x64/R.dll' failed make[2]: *** [../../bin/x64/R.dll] Error 2 Makefile:104: recipe for target 'rbuild' failed make[1]: *** [rbuild] Error 2 Makefile:14: recipe for target 'all' failed make: *** [all] Error 2 I still had to delete those lines in compat.c, so this build, were it to have completed, is still subject to the non-conformance of scientfic notation printing that was discussed earlier. Hin-tak, any suggestions for this error (and the compat.c for that matter) that you, or any reader of this list, may have would be greatly appreciated. Thank you! Avi -- On Thu, Jan 8, 2015 2:01 PM GMT Henric Winell wrote: On 2015-01-08 14:18, Avraham Adler wrote: Very timely, as this is how I got into the problem I posted about earlier; maybe some of the problems I ran into will mean more to the you and the experts on this thread, Dr. 
Murdoch. For reference, I run Windows 7 64bit, and I am trying to build a 64-bit version of R-3.1.2.

As we discussed offline, Dr. Murdoch, I've been trying to build R using more recent tools than GCC 4.6.3 prerelease. Ruben Von Boxen (rubenvb) told me he is no longer developing his own builds of GCC, but is focusing on MSYS2 and the mingw64 personal builds. So, similar to what Jeroen said, I first installed MSYS2, whose initial installation on Windows is not so simple[1]. After the initial install, the following packages need to be manually installed: make, tar, zip, unzip, zlib, and rsync. I also installed base-devel, which is way more than necessary, but there may be packages in there which are necessary.

I originally installed the most up-to-date version of GCC (4.9.2)[2], and I did pick the -seh version: since I install (almost) all packages from source (the one exception being nloptr for now), the exception handling should be consistent, and it is supposed to be up to ~15% faster[3].

The initial build crashed with the following error:

    gcc -std=gnu99 -m64 -I../../include -I. -DHAVE_CONFIG_H -O3 -Wall -pedantic -mtune=core2 -c xmalloc.c -o xmalloc.o
    ar crs libtre.a regcomp.o regerror.o regexec.o tre-ast.o tre-compile.o tre-match-approx.o tre-match-backtrack.o tre-match-parallel.o tre-mem.o tre-parse.o tre-stack.o xmalloc.o
    gcc -std=gnu99 -m64 -O3 -Wall -pedantic -mtune=core2 -c compat.c -o compat.o
    compat.c:65:5: error: redefinition of 'snprintf'
    int snprintf(char *buffer, size_t max, const char *format, ...)
    ^
    In
[Rd] Testing R packages on Solaris Studio
I have set up a Solaris server to test packages before submitting to CRAN, in order to catch problems that might not reveal themselves on Fedora, Debian, OSX or Windows. The machine runs a Solaris 11.2 vm with Solaris Studio 12.3. I was able to compile current r-devel using the suggested environment variables from R Installation and Administration and:

    ./configure --prefix=/opt/R-devel --with-blas='-library=sunperf' --with-lapack

All works great (fast too), except that some CRAN packages with C++ code won't build. The compiler itself works; most packages (including e.g. MCMCpack) build OK. However, packages like Rcpp and RJSONIO fail with the errors shown here: https://gist.github.com/jeroenooms/f1b6a172320a32f59c82. I tried installing with GNU make, but that does not seem to be the problem:

    configure.vars = MAKE=/opt/csw/bin/gmake

I am aware that I can work around it by compiling with gcc instead of Solaris Studio, but I would specifically like to replicate the setup from CRAN. Which additional args/vars/dependencies do I need to make Rcpp and RJSONIO build as they do on the CRAN Solaris server?

    sessionInfo()
    R Under development (unstable) (2015-01-07 r67351)
    Platform: i386-pc-solaris2.11 (32-bit)
    Running under: Solaris 11

    locale:
    [1] C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    loaded via a namespace (and not attached):
    [1] tcltk_3.2.0 tools_3.2.0

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] RFC: getifexists() {was [Bug 16065] exists ...}
On Thu, Jan 8, 2015 at 6:36 AM, Duncan Murdoch murdoch.dun...@gmail.com wrote:

    val <- get(name, where, ..., value.if.not.found=NULL ) (*)

That would be a bad idea, as it would change behaviour of existing uses of get().

Another approach would be for the "not found" behavior to consist of a callback, e.g. an expression or function:

    get(name, where, ..., not.found=stop("object ", name, " not found"))

This would cover the case of not.found=NULL, but also allows for writing code with syntax similar to tryCatch:

    obj <- get("foo", not.found = someDefaultValue())

Not sure what this would do to performance though.
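A minimal sketch of the callback idea above, relying on R's lazy evaluation of function arguments: the not.found expression is only evaluated when the lookup fails. The helper name get_or and its argument names are hypothetical (nothing like it exists in base R), and the sketch wraps exists() and get() rather than changing get() itself, so it says nothing about the performance question.

```r
# Hypothetical helper (not part of base R): look up `name` in `envir` and
# evaluate the `not.found` expression only if no such object exists.
get_or <- function(name, envir = parent.frame(), inherits = TRUE,
                   not.found = stop("object '", name, "' not found")) {
  if (exists(name, envir = envir, inherits = inherits)) {
    get(name, envir = envir, inherits = inherits)
  } else {
    not.found  # the argument's promise is forced here, and only here
  }
}

e <- new.env()
assign("foo", 42, envir = e)

# Existing object: the default stop() expression is never evaluated.
found_value <- get_or("foo", envir = e, inherits = FALSE)

# Missing object with an explicit fallback value.
fallback_value <- get_or("bar", envir = e, inherits = FALSE, not.found = NULL)

# Missing object without a fallback: the default stop() is triggered.
failed <- try(get_or("bar", envir = e, inherits = FALSE), silent = TRUE)
```

Because the fallback is an ordinary argument, a caller can pass a value, a call that computes a default, or a call to stop() with a custom message, without the function needing a separate callback interface.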
Re: [Rd] setequal: better readability, reduced memory footprint, and minor speedup
If you look at the definition of %in%, you'll find that it is implemented using match, so if we did as you suggest, I give it about three days before someone suggests to inline the function call... Readability of source code is not usually our prime concern.

The idea does have some merit, though.

Apropos, why is there no setcontains()?

-pd

On 06 Jan 2015, at 22:02 , Hervé Pagès hpa...@fredhutch.org wrote:

Hi,

Current implementation:

    setequal <- function (x, y)
    {
        x <- as.vector(x)
        y <- as.vector(y)
        all(c(match(x, y, 0L) > 0L, match(y, x, 0L) > 0L))
    }

First what about replacing 'match(x, y, 0L) > 0L' and 'match(y, x, 0L) > 0L' with 'x %in% y' and 'y %in% x', respectively. They're strictly equivalent but the latter form is a lot more readable than the former (isn't this the raison d'être of %in%?):

    setequal <- function (x, y)
    {
        x <- as.vector(x)
        y <- as.vector(y)
        all(c(x %in% y, y %in% x))
    }

Furthermore, replacing 'all(c(x %in% y, y %in% x))' with 'all(x %in% y) && all(y %in% x)' improves readability even more and, more importantly, reduces memory footprint significantly on big vectors (e.g. by 15% on integer vectors with 15M elements):

    setequal <- function (x, y)
    {
        x <- as.vector(x)
        y <- as.vector(y)
        all(x %in% y) && all(y %in% x)
    }

It also seems to speed up things a little bit (not in a significant way though).

Cheers,
H.

-- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax: (206) 667-1319

-- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com
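As a rough illustration of the variants discussed above, the sketch below puts the quoted implementation and the proposed '&&' form side by side and checks that they agree on a few inputs. The names setequal_c and setequal_and are made up for this comparison, and setcontains() is only a guess at what such a function might look like (it is asked about in the thread but is not a base R function); the argument order chosen for it is an assumption.

```r
# The implementation quoted in the thread: builds one combined logical vector.
setequal_c <- function(x, y) {
  x <- as.vector(x)
  y <- as.vector(y)
  all(c(match(x, y, 0L) > 0L, match(y, x, 0L) > 0L))
}

# The proposed form: no concatenation, and '&&' skips the second
# membership test when the first one already fails.
setequal_and <- function(x, y) {
  x <- as.vector(x)
  y <- as.vector(y)
  all(x %in% y) && all(y %in% x)
}

# Hypothetical setcontains(): TRUE if every element of y occurs in x.
setcontains <- function(x, y) {
  all(as.vector(y) %in% as.vector(x))
}

a <- c(3L, 1L, 2L, 2L)
b <- 1:3

same_sets_c   <- setequal_c(a, b)      # duplicates and order are ignored
same_sets_and <- setequal_and(a, b)
diff_sets_c   <- setequal_c(a, 1:4)
diff_sets_and <- setequal_and(a, 1:4)
contains_ok   <- setcontains(b, 1:2)
contains_not  <- setcontains(1:2, b)
```

With a setcontains() of this shape, set equality is just containment in both directions, which is what the '&&' form spells out.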
Re: [Rd] RFC: getifexists() {was [Bug 16065] exists ...}
On Thu, 8 Jan 2015, Michael Lawrence wrote:

If we do add an argument to get(), then it should be named consistently with the ifnotfound argument of mget(). As mentioned, the possibility of a NULL value is problematic. One solution is a sentinel value that indicates an unbound value (like R_UnboundValue).

A null default is fine -- it's a default; if it isn't right for a particular case you can provide something else.

But another idea (and one pretty similar to John's) is to follow the SYMSXP design at the C level, where there is a structure that points to the name and a value. We already have SYMSXPs at the R level of course (name objects) but they do not provide access to the value, which is typically R_UnboundValue. But this does not even need to be implemented with SYMSXP. The design would allow something like:

    binding <- getBinding(x, env)
    if (hasValue(binding)) {
        x <- value(binding) # throws an error if none
        message(name(binding), " has value ", x)
    }

That, I think, is a bit verbose but readable and could be made fast. And I think binding objects would be useful in other ways, as they are essentially a named object. For example, when iterating over an environment.

This would need a lot more thought. Directly exposing the internals is definitely not something we want to do, as we may well want to change that design. But there are lots of other corner issues that would have to be thought through before going forward, such as what happens if an rm occurs between obtaining a binding object and doing something with it. Serialization would also need thinking through. This doesn't seem like a worthwhile place to spend our efforts to me.

Adding getIfExists, or .get, or get0, or whatever seems fine. Adding an argument to get() with missing giving current behavior may be OK too. Rewriting exists and get as .Primitives may be sufficient though.
Best,

luke

Michael

On Thu, Jan 8, 2015 at 6:03 AM, John Nolan jpno...@american.edu wrote:

Adding an optional argument to get (and mget) like

    val <- get(name, where, ..., value.if.not.found=NULL ) (*)

would be useful for many. HOWEVER, it is possible that there could be some confusion here: (*) can give a NULL because either x exists and has value NULL, or because x doesn't exist. If that matters, the user would need to be careful about specifying a value.if.not.found that cannot be confused with a valid value of x.

To avoid this difficulty, perhaps we want both: have Martin's getifexists( ) return a list with two values:

- a boolean variable 'found' # = value returned by exists( )
- a variable 'value'

Then implement get( ) as:

    get <- function(x,...,value.if.not.found ) {
        if( missing(value.if.not.found) ) {
            a <- getifexists(x,... )
            if (!a$found) error("x not found")
        } else {
            a <- getifexists(x,...,value.if.not.found )
        }
        return(a$value)
    }

Note that value.if.not.found has no default value in the above. It behaves exactly like the current get does if value.if.not.found is not specified, and if it is specified, it would be faster in the common situation mentioned below:

    if(exists(x,...)) { get(x,...) }

John

P.S. if you like dromedaries call it valueIfNotFound ...

John P. Nolan Math/Stat Department 227 Gray Hall, American University 4400 Massachusetts Avenue, NW Washington, DC 20016-8050 jpno...@american.edu voice: 202.885.3140 web: academic2.american.edu/~jpnolan
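The two-value proposal above can be sketched in plain R as follows. This is only an illustration built on exists() and get(), so it has none of the speed benefit being discussed; getifexists() here is a stand-in for the proposed function (it is not in base R), the wrapper is named get2 (a made-up name) to avoid masking base::get, and stop() is used where the message above writes error().

```r
# Stand-in for the proposed getifexists(): returns both whether the object
# was found and its value (or the fallback), so NULL is never ambiguous.
getifexists <- function(x, envir = parent.frame(), inherits = TRUE,
                        value.if.not.found = NULL) {
  found <- exists(x, envir = envir, inherits = inherits)
  list(found = found,
       value = if (found) get(x, envir = envir, inherits = inherits)
               else value.if.not.found)
}

# Wrapper with get()-like behaviour: errors when the object is missing
# unless the caller supplies value.if.not.found (which has no default).
get2 <- function(x, envir = parent.frame(), ..., value.if.not.found) {
  force(envir)  # resolve the caller's frame before passing it on
  if (missing(value.if.not.found)) {
    a <- getifexists(x, envir = envir, ...)
    if (!a$found) stop("object '", x, "' not found")
  } else {
    a <- getifexists(x, envir = envir, ...,
                     value.if.not.found = value.if.not.found)
  }
  a$value
}

e <- new.env()
assign("x", NULL, envir = e)  # x exists and its value is NULL

r_exists  <- getifexists("x", envir = e, inherits = FALSE)
r_missing <- getifexists("y", envir = e, inherits = FALSE)
# Both r_exists$value and r_missing$value are NULL; only $found differs.

with_default <- get2("y", envir = e, inherits = FALSE, value.if.not.found = 0)
no_default   <- try(get2("y", envir = e, inherits = FALSE), silent = TRUE)
```

The r_exists and r_missing results show the point of the 'found' component: the value alone cannot distinguish an object bound to NULL from a missing one.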
-R-devel r-devel-boun...@r-project.org wrote: -

To: Martin Maechler maech...@stat.math.ethz.ch, R-devel@r-project.org
From: Duncan Murdoch
Sent by: R-devel
Date: 01/08/2015 06:39AM
Subject: Re: [Rd] RFC: getifexists() {was [Bug 16065] exists ...}

On 08/01/2015 4:16 AM, Martin Maechler wrote:

In November, we had a bug repository conversation with Peter Haverty and myself:

https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16065

where the bug report title started with

--- exists is a bottleneck for dispatch and package loading, ...

Peter proposed an extra simplified and hence faster version of exists(), and I commented

--- Comment #2 from Martin Maechler maech...@stat.math.ethz.ch ---

I'm very grateful that you've started exploring the bottlenecks of loading packages with many S4 classes (and methods)... and I hope we can make real progress there rather sooner than later.

OTOH, your `summaryRprof()` in your vignette indicates that exists() may use up to 10% of the time spent in library(reportingTools), and your speedup proposals of exists() may go up to ca 30%, which is good and well worth considering, but still we can only expect 2-3% speedup for package loading, which unfortunately is not much.

Still I agree it is worth looking at exists() as you did ... and consider providing a fast simplified version of it in addition to current