Re: [Rd] Why is the diag function so slow (for extraction)?

2015-05-14 Thread peter dalgaard

> On 13 May 2015, at 19:31, Radford Neal wrote:
> 
>> From: Martin Maechler 
> 
>> diag() should not work only for pure matrices, but for all
>> "matrix-like" objects for which ``the usual methods'' work, such
>> as
>>   as.vector(.), c(.)
>> 
>> That's why there has been the c(.) in there.
>> 
>> You can always make code faster if you write the code so it only
>> has to work in one special case .. and work there very fast.
> 
> help(diag) gives no hint whatever that diag(x) will work for
> objects that are "matrix-like", but aren't actual matrices.
> 
> help(c) explicitly says that methods for it are NOT required to
> convert matrices to vectors.
> 
> So you're advocating slowing down all ordinary uses of diag to
> accommodate a usage that nobody thought was important enough to
> actually document.


That's not really the point, Radford. The point is that we want to be very 
careful when changing a function to work less generally than it used to do. 
I.e., if we just change diag() by removing the c(), there's a risk of finding 
out the hard way, at the public release, why it was there to begin with. We can 
check against CRAN before release, but not all existing user scripts. 

As I pointed out earlier, avoiding a matrix copy before extracting the diagonal 
may be an impressive speedup in isolation: O(N) vs O(N^2), but you're not going 
to do much with large matrices without other operations being O(N^2) or O(N^3). 
The user-impact of a change could be quite small.
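
To make the O(N) versus O(N^2) point concrete, here is a minimal sketch in R. It assumes an ordinary numeric matrix; the names n, M and idx are illustrative and are not taken from the diag() source. It extracts the diagonal once by direct single-index extraction and once after flattening the whole matrix with c().

```r
# Two ways to pull the diagonal out of an ordinary numeric matrix.
n <- 500L
M <- matrix(runif(n * n), nrow = n)

# Column-major positions of M[1,1], M[2,2], ...: 1, n + 2, 2n + 3, ...
idx <- seq.int(1L, by = nrow(M) + 1L, length.out = min(dim(M)))

d_direct <- M[idx]      # touches only the diagonal elements
d_copy   <- c(M)[idx]   # builds a flattened copy of the whole matrix first

# Both agree with diag() for a plain matrix.
stopifnot(all(d_direct == diag(M)), all(d_copy == diag(M)))
```

Timing the two extractions with system.time() for a large n should show the difference, although it is likely to be small next to anything else done with a matrix of that size.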

That being said, it is not like we're too good at making diag() work with 
matrix-like objects as it is:

> df <- as.data.frame(matrix(1:9,3))
> diag(df)
Error in diag(df) : (list) object cannot be coerced to type 'double'

(It's not like I actually want diag() to work on a data frame, it was just the 
first matrix-like non-matrix object that came to mind.)

The only case where the c() inside diag() has an effect is where 
M[i,j] != M[(j-1)*m+i] 
AND c(M) will string out M in column-major order, so that 
M[i,j] == c(M)[(j-1)*m+i] 
(with m the number of rows of M).

The former is not true for ordinary matrices (i.e., single-index extraction 
just works), and the latter does not hold for data frames. The $1 question 
is whether there actually are cases for which both are true, and if so, which 
are they?
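
The two conditions can be checked by hand with a small sketch. It relies on the column-major layout, in which element [i, j] of a matrix with m rows sits at linear position (j-1)*m+i; the objects M, df, i, j and k are purely illustrative.

```r
M <- matrix(1:12, nrow = 3)   # ordinary matrix with m = 3 rows
m <- nrow(M)
i <- 2L
j <- 3L
k <- (j - 1L) * m + i         # column-major position of element [i, j]

M[i, j] == M[k]               # single-index extraction already agrees
M[i, j] == c(M)[k]            # so c() changes nothing for a plain matrix

df <- as.data.frame(M)
is.list(c(df))                # c() yields a list of columns, not a flat vector
is.data.frame(df[2])          # a single index selects a whole column
```

So for a plain matrix the first condition fails, and for a data frame the second one fails, which is the situation described above.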


-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Reading exit code of pipe()

2015-05-14 Thread Gábor Csárdi
On Thu, May 14, 2015 at 1:27 AM, Kevin Ushey  wrote:
[...]

> So maybe `cat` just doesn't set a status code, and so there's nothing
> for R to forward back (ergo -- NULL)?
>

cat definitely sets the status. IMHO every command sets the exit status, by
definition, at least on Unix/Linux.

/tmp$ touch x
/tmp$ cat x
/tmp$ echo $?
0
/tmp$ cat y
cat: y: No such file or directory
/tmp$ echo $?
1
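
The same check can be made from R rather than from the shell. This is a Unix-only sketch: system() returns the exit status when the output is not captured. The path /no/such/file/here is a made-up name assumed not to exist.

```r
# Exit status of a cat that succeeds and of one that fails.
ok  <- system("cat /dev/null")
bad <- system("cat /no/such/file/here", ignore.stderr = TRUE)

ok        # zero on success
bad != 0  # non-zero on failure
```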

Gabor

[...]



Re: [Rd] Reading exit code of pipe()

2015-05-14 Thread William Dunlap
The return value of close(pipeConnectionObject) seems to depend on how
the pipe connection was opened. If it was opened via the pipe() or
open() functions, close() returns NULL:
   > con <- pipe("ls")
   > open(con, "r")
   > readLines(con, n=1)
   [1] "1032.R"
   > print(close(con))
   NULL
   > con <- pipe("ls", "r")
   > scan(con, n=1, what="")
   Read 1 item
   [1] "1032.R"
   > print(close(con))
   NULL
If it was opened implicitly by something like readLines() or scan(),
close() returns a status integer:
   > con <- pipe("ls")
   > scan(con, n=1, what="")
   Read 1 item
   [1] "1032.R"
   > print(close(con))
   [1] 36096
   > sprintf("0x%x", .Last.value)
   [1] "0x8d00"
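
A possible reading of that number, offered as an assumption rather than something confirmed in this thread: if close() is passing back a raw wait status of the kind pclose() returns, the exit code sits in the high byte. The helper decode_wait_status below is hypothetical and only does the arithmetic.

```r
# Split a 16-bit wait status into its two bytes (assumes the pclose()-style layout).
decode_wait_status <- function(status) {
  status <- as.integer(status)
  list(
    exit_code = (status %/% 256L) %% 256L,  # high byte: exit code of the shell
    low_byte  = status %% 256L              # low byte: signal information, if any
  )
}

decode_wait_status(36096L)$exit_code  # 141, i.e. 0x8d
decode_wait_status(512L)$exit_code    # 2
```

Under that assumption, 141 is 128 + 13, the value shells conventionally report when a command is terminated by signal 13 (SIGPIPE on Linux), which would fit an ls that was still writing when the reader stopped after one item. The 512 reported for a non-existent directory would be a plain exit code of 2.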






Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, May 13, 2015 at 10:27 PM, Kevin Ushey  wrote:

> Hi Jeroen,
>
> I think `pipe` might just be returning the status code of the
> underlying command executed; for example, I get a status code of '0'
> when I test a pipe on `ls`:
>
> conn <- pipe("ls")
> stream <- readLines(conn)
> print(close(conn))
>
> Similarly, I get an error code if I try to `ls` a non-existent
> directory (512 in my case), e.g.
>
> conn <- pipe("ls /no/path/here/sir")
> stream <- readLines(conn)
> print(close(conn))
>
> So maybe `cat` just doesn't set a status code, and so there's nothing
> for R to forward back (ergo -- NULL)?
>
> Kevin
>
> On Wed, May 13, 2015 at 5:24 PM, Jeroen Ooms 
> wrote:
> > Is there a way to get the status code of a pipe() command? The
> > documentation suggests that it might be returned by close, however
> > this does not seem to be the case.
> >
> >   con <- pipe("cat /etc/passwd", "r")
> >   stream <- readLines(con, n = 10)
> >   err <- close(con)
> >   print(err)


Re: [Rd] Reading exit code of pipe()

2015-05-14 Thread Tim Keitt
Not sure if it helps for your use case, but I have an experimental package
for controlling bidirectional pipe streams from R. Just thought I'd mention
it. It's at

https://github.com/thk686/pipestreamr

THK




-- 
Timothy H. Keitt
http://www.keittlab.org/



Re: [Rd] Reading exit code of pipe()

2015-05-14 Thread Jeroen Ooms
On Thu, May 14, 2015 at 7:30 AM, William Dunlap  wrote:
> The difference in the return value of close(pipeConnectionObject) seems to 
> depend on whether the pipe connection was opened via the pipe() or open() 
> functions (close() returns NULL) or via something like readLines() or scan() 
> (close() returns status integer).

Hmm, interesting. It doesn't help me though; the connection has to be
explicitly opened to support streaming, otherwise it keeps running the
command over and over again:

 con <- pipe("ls -ltr /")
 readLines(con, n = 3)
 readLines(con, n = 3)
 readLines(con, n = 3)
 isOpen(con)

Under the hood, R distinguishes "closing" and "destroying" the
connection. The R function close actually means destroy. It seems like
the pipe exit code is only properly returned if the connection was
already closed but not destroyed by the time close() was called.
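
The re-running behaviour is easy to reproduce with a command whose output is predictable. This is a Unix-only sketch; the echo command line is an arbitrary stand-in for a real streaming command.

```r
cmd <- "echo one; echo two; echo three"

# Unopened connection: each readLines() opens it, reads, and closes it again,
# so the command is started afresh every time.
con <- pipe(cmd)
first  <- readLines(con, n = 1)
second <- readLines(con, n = 1)
close(con)

# Explicitly opened connection: successive reads continue in the same stream.
con <- pipe(cmd, "r")
a <- readLines(con, n = 1)
b <- readLines(con, n = 1)
close(con)

c(first, second)  # the first line twice
c(a, b)           # the first two lines in order
```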



[Rd] Creating a vignette which depends on a non-distributable file

2015-05-14 Thread January Weiner
Dear all,

I am writing a vignette that requires a file which I am not allowed to
distribute, but which the user can easily download manually. Moreover, it
is not possible to download this file automatically from R: downloading
requires a (free) registration that seems to work only through a browser.
(I'm talking here about the MSigDB from the Broad Institute,
http://www.broadinstitute.org/gsea/msigdb/index.jsp).

In the vignette, I tell the user to download the file and then show how it
can be parsed and used in R. Thus, I can compile the vignette only if this
file is present in the vignettes/ directory of the package. However, it
would then get included in the package -- which I am not allowed to do.

What should I do?

(1) finding an alternative to MSigDB is not a solution -- there simply is
no alternative.
(2) I could enter the code (and the results) in a verbatim environment
instead of using Sweave. This has obvious drawbacks (for one thing, it
would look inconsistent).
(3) I could build the vignette outside of the package and put it into the
inst/doc directory. This also has obvious drawbacks.
(4) Leaving this example out defeats the purpose of my package.

I am tending towards solution (2). What do you think?

Kind regards,

j.



--
 January Weiner --



Re: [Rd] Creating a vignette which depends on a non-distributable file

2015-05-14 Thread Henrik Bengtsson
On May 14, 2015 15:04, "January Weiner"  wrote:
> [...]
>
> I am tending towards solution (2). What do you think?

Not clear how big of a static piece you're talking about, but maybe you
could set it up such that you use (2) as a fallback, i.e. have the vignette
include a static/pre-generated piece (which is clearly marked as such) only
if the external dependency is not available.
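
A minimal sketch of such a fallback switch, assuming the downloaded file is expected at a fixed location. The path in msigdb_path and the flag have_msigdb are hypothetical names, not part of any package.

```r
# Hypothetical location where the user is asked to place the downloaded file.
msigdb_path <- file.path("vignettes", "msigdb.xml")
have_msigdb <- file.exists(msigdb_path)

if (have_msigdb) {
  message("MSigDB file found: running the live analysis")
} else {
  message("MSigDB file not found: showing the pre-generated output")
}
```

In a knitr vignette, chunk options accept R expressions, so a flag like this could be passed as eval = have_msigdb on the live chunks and eval = !have_msigdb on the static ones.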

Just a thought

Henrik



Re: [Rd] Creating a vignette which depends on a non-distributable file

2015-05-14 Thread Martin Morgan

On 05/14/2015 04:33 PM, Henrik Bengtsson wrote:
> On May 14, 2015 15:04, "January Weiner" wrote:
>> [...]
>> (2) I could enter the code (and the results) in a verbatim environment
>> instead of using Sweave. This has obvious drawbacks (for one thing, it
>> would look inconsistent).


Use the chunk option eval=FALSE instead of placing the code in a verbatim 
environment. See ?RweaveLatex if you're compiling a PDF vignette from Rnw, or 
the knitr documentation for Rmd vignettes processed to HTML (much nicer for 
users of your vignette, in my opinion).


A common pattern is to process chunks 1, 2, 3, 4, and then there is a 'leap of 
faith' in chunk 5 (with eval=FALSE) and a second chunk (maybe with echo=FALSE, 
eval=TRUE) that reads the _result_ that would have been produced by chunk 5 from 
a serialized instance into the R session for processing in chunks 6, 7, 8...
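
The pattern might look roughly like this in R. It is a sketch under stated assumptions: parse_restricted_file() is a hypothetical stand-in for the step that needs the non-distributable file, the cache location is only for illustration, and the stored result is assumed to be something that may itself be redistributed.

```r
# Hypothetical stand-in for the step that needs the non-distributable file.
parse_restricted_file <- function() {
  data.frame(gene_set = c("SET_A", "SET_B"), n_genes = c(12L, 30L))
}

# Done once by the maintainer, on a machine where the file is available.
# In a package the result would live somewhere like inst/extdata.
cache <- file.path(tempdir(), "chunk5_result.rds")
saveRDS(parse_restricted_file(), cache)

# Chunk 5 (eval=FALSE) only displays: result <- parse_restricted_file()
# A hidden chunk (echo=FALSE, eval=TRUE) loads the stored result instead:
result <- readRDS(cache)

# Chunks 6, 7, 8... carry on with 'result' as if chunk 5 had been run.
nrow(result)
```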


Also very often while it might make sense to analyse an entire data set as part 
of a typical work flow, for illustrative purposes a much smaller subset or 
simulated data might be relevant; again a strategy would be to illustrate the 
problematic steps with simulated data, and then resume the narrative with the 
analyzed full data.


A secondary consideration may be that if your package _requires_ MSigDB to 
function, then it can't be automatically tested by repository build machines -- 
you'll want to have unit tests or other approaches to ensure that 'bit rot' does 
not set in without you being aware of it.


If this is a Bioconductor package, then it's appropriate to ask on the 
Bioconductor devel mailing list.


  http://bioconductor.org/developers/

http://bioconductor.org/packages/BiocStyle/ might be your friend for producing 
stylish vignettes.


Martin






--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



[Rd] Installation using iconv from glibc

2015-05-14 Thread Smith, Virgil
The R Installation and Administration manual section A.1 states that glibc 
should provide a suitable iconv function, but I can't get R's configure script 
to accept/validate iconv on a Linux platform I need to support using glibc 2.20.

Is glibc actually compatible (or is GNU libiconv essentially the only 
path)?

If glibc should work, what should I check to troubleshoot my environment?

The configure error I get is
checking iconv.h usability... yes
checking iconv.h presence... yes
checking for iconv.h... yes
checking for iconv... yes
checking whether iconv accepts "UTF-8", "latin1", "ASCII" and "UCS-*"... no
configure: error: a suitable iconv is essential

My full list of installed glibc / libc packages is
glibc-binary-localedata-en-gb - 2.20-r0
glibc-binary-localedata-en-us - 2.20-r0
glibc-gconv - 2.20-r0
glibc-gconv-utf-16 - 2.20-r0
glibc-locale-en-gb - 2.20-r0
libc6 - 2.20-r0
libc6-dev - 2.20-r0
libc6-extra-nss - 2.20-r0
libc6-thread-db - 2.20-r0

This is for a custom Linux build, not a major distro, so unfortunately I cannot 
use pre-packaged configurations.



