Re: [Rd] [New Patch] Fix disk corruption when writing

2017-07-05 Thread January W.
OK, this does indeed seem to be the case. It is interesting that it works
on MacOS, though. Given that errors raised when the buffer is flushed
cannot be caught, I think the behavior is inevitably unpredictable.
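
For anyone who wants to see the buffering effect outside R, here is a minimal sketch in plain C stdio. It is not R's connection code: `write_small` and `struct write_report` are names I made up for illustration, and it assumes a Linux system where /dev/full exists. It records what fprintf returns for a short string and what the later fflush and fclose return.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of R: outcome of one small buffered write. */
struct write_report {
    int opened;        /* 1 if fopen succeeded, 0 otherwise */
    int printf_result; /* value returned by fprintf */
    int flush_result;  /* value returned by fflush: 0 or EOF */
    int close_result;  /* value returned by fclose: 0 or EOF */
};

/* Write a short string to `path` through a fully buffered stdio stream. */
struct write_report write_small(const char *path, const char *text)
{
    struct write_report r = {0, 0, 0, 0};
    assert(path != NULL && text != NULL);

    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return r;
    r.opened = 1;

    /* Ask for full buffering explicitly so the behavior does not depend
       on what kind of file `path` turns out to be. */
    setvbuf(fp, NULL, _IOFBF, BUFSIZ);

    /* A few bytes only land in the stdio buffer, so nothing reaches the
       device yet and fprintf reports the number of characters stored. */
    r.printf_result = fprintf(fp, "%s", text);

    /* The buffer is handed to the device here; this is the first point
       at which a full device can be reported. */
    r.flush_result = fflush(fp);
    r.close_result = fclose(fp);
    return r;
}
```

Calling `write_small("/dev/full", "hello\n")` on Linux should give a positive `printf_result` and an EOF `flush_result`, which matches the positive values I saw from vfprintf. In plain C the failure is visible in the return value of fflush; whether R's connection code can surface it at close time is a separate question.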

best,

j.



On 5 July 2017 at 13:09, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:

> On 05/07/2017 5:26 AM, January W. wrote:
>
>> I tried the newest patch, but it does not seem to work for me (on
>> Linux). Despite the check in Rconn_printf, the write.csv happily writes
>> to /dev/full and does not report an error. When I added a printf("%d\n",
>> res); to the Rconn_printf() definition, I see only positive values
>> returned by the vfprintf call.
>>
>>
> That's likely because you aren't writing enough to actually trigger a
> write to disk during the write.  Writes are buffered, and the error doesn't
> happen until the buffer is written.  The regression test I put in had this
> problem; I'm working on MacOS and Windows, so I never got to actually try
> it before committing.
>
> Unfortunately, it doesn't look possible to catch the final flush of the
> buffer when the connection is closed, so small writes won't trigger any
> error.
>
> It's also possible that whatever system you're on doesn't signal an error
> when the write fails.
>
> Duncan Murdoch
>
>> Cheers,
>>
>> j.
>>
>>
>> On 4 July 2017 at 21:37, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:
>>
>>> On 04/07/2017 11:50 AM, Jean-Sébastien Bevilacqua wrote:
>>>
>>>> Hello,
>>>> You can find here a patch to fix disk corruption.
>>>> When your disk is full, the write function exits without error
>>>> but the file is truncated.
>>>>
>>>> https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17243
>>>
>>> Thanks.  I didn't see that when it came through (or did and forgot).
>>> I'll probably move the error check to a lower level (in the
>>> Rconn_printf function), if tests show that works.
>>>
>>> Duncan Murdoch
>>>
>>> __
>>> R-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>>
>>
>>
>> --
>>  January Weiner --
>>
>
>


-- 
 January Weiner --


Re: [Rd] [New Patch] Fix disk corruption when writing

2017-07-05 Thread January W.
I tried the newest patch, but it does not seem to work for me (on Linux).
Despite the check in Rconn_printf, the write.csv happily writes to
/dev/full and does not report an error. When I added a printf("%d\n", res);
to the Rconn_printf() definition, I see only positive values returned by
the vfprintf call.
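
To illustrate why a check on the vfprintf return value can stay silent, here is a small C sketch. It is not the actual Rconn_printf patch: `checked_printf` and `try_short_write` are hypothetical names, and /dev/full is assumed to exist (Linux). The wrapper forwards to vfprintf and reports failure only when vfprintf returns a negative value; the second function lets you compare the default buffered stream with an unbuffered one.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical wrapper, not the actual Rconn_printf: forwards to vfprintf
   and returns 0 on success or -1 if vfprintf reported an error. */
int checked_printf(FILE *fp, const char *fmt, ...)
{
    assert(fp != NULL && fmt != NULL);

    va_list ap;
    va_start(ap, fmt);
    int res = vfprintf(fp, fmt, ap);
    va_end(ap);

    return res < 0 ? -1 : 0;
}

/* Open `path`, optionally make the stream unbuffered, print one short line
   through checked_printf, and return its status (-2 if fopen failed). */
int try_short_write(const char *path, int unbuffered)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -2;
    if (unbuffered)
        setvbuf(fp, NULL, _IONBF, 0);   /* each call goes straight to the device */

    int status = checked_printf(fp, "%d,%s\n", 1, "a");
    fclose(fp);                          /* close result ignored in this sketch */
    return status;
}
```

On my understanding of stdio, `try_short_write("/dev/full", 0)` reports success because the line never leaves the buffer during the call, while `try_short_write("/dev/full", 1)` should report the error. Making a stream unbuffered is only a way to demonstrate the effect here, not a suggestion for how R should write files.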

Cheers,

j.


On 4 July 2017 at 21:37, Duncan Murdoch wrote:

> On 04/07/2017 11:50 AM, Jean-Sébastien Bevilacqua wrote:
>
>> Hello,
>> You can find here a patch to fix disk corruption.
>> When your disk is full, the write function exits without error but the file
>> is truncated.
>>
>> https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17243
>>
>
> Thanks.  I didn't see that when it came through (or did and forgot). I'll
> probably move the error check to a lower level (in the Rconn_printf
> function), if tests show that works.
>
> Duncan Murdoch
>
>



-- 
 January Weiner --


Re: [Rd] write.csv

2017-07-05 Thread January W.
Dear Jean-Luc,

neither write.csv nor save nor save.image nor any other default write
function in R checks whether enough space remains. While this might indeed
be a problem that one should take care of -- sooner or later -- I would
strongly recommend using data.table::fwrite as the workhorse for saving
CSV files anyway.

First, it takes better care of error conditions (as demonstrated by you).
Second, it is much faster:

> system.time(fwrite(list(a=1:1e8), file="test.csv"))
   user  system elapsed
  4.672   0.572   0.857
> system.time(write.csv(list(a=1:1e8), file="test.csv"))
   user  system elapsed
165.056   2.684 176.832

That said, I think that the larger issue here is that the logic behind the
family of functions for saving data in base R is different from the logic
of fwrite(). While fwrite() writes its contents to a file, save(),
write.csv() and family are based on R file connections and can write not
only to a file, but to any sort of connection. For example, you can
write directly to a compressed file:

df <- data.frame(a=1:1000)
gz <- gzfile("file.gz")
write.csv(df, file=gz)

It can be a socket, a pipe, a URL, and so on.

The problem might be that there is no easy, general way of testing for
specific errors. Internally (see the code in connections.c in the R
sources), the Rconnection object has a member called "write", which is a
pointer to the function that writes the data to the connection, with a
different function for each type of connection. I do not fully understand
all of this code, but since the functions used for writing report errors (I
think) in different ways, it could be that a reasonable solution is not
straightforward.
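
To make the dispatch idea concrete, here is a minimal C sketch. It is not the definition from connections.c: `struct conn`, `file_write`, `null_write` and `conn_write_checked` are hypothetical names, and the real Rconnection has many more fields. It shows one struct carrying a pointer to a type-specific write function, plus a generic caller that treats a short count as failure. That caller is only meaningful if every backend reports errors the same way, which is exactly the part I am unsure about.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical connection type, for illustration only. */
struct conn {
    void *handle;   /* backend-specific state, e.g. a FILE* */
    size_t (*write)(const void *buf, size_t size, size_t nitems,
                    struct conn *con);
};

/* Backend 1: stdio file; returns the number of items fwrite accepted. */
size_t file_write(const void *buf, size_t size, size_t nitems,
                  struct conn *con)
{
    return fwrite(buf, size, nitems, (FILE *)con->handle);
}

/* Backend 2: a sink that discards everything and always claims success,
   standing in for a backend that never reports errors. */
size_t null_write(const void *buf, size_t size, size_t nitems,
                  struct conn *con)
{
    (void)buf;
    (void)size;
    (void)con;
    return nitems;
}

/* Generic caller: 0 if all items were accepted, -1 on a short write. */
int conn_write_checked(struct conn *con, const void *buf,
                       size_t size, size_t nitems)
{
    assert(con != NULL && con->write != NULL);
    return con->write(buf, size, nitems, con) == nitems ? 0 : -1;
}
```

Even in this toy version the generic check is only as good as the backends: `null_write` can never fail, and `file_write` on a buffered stream may accept data that is lost later at flush time.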

In the end and at the moment, as usual, you are faced with a compromise
between safety and freedom. If you need freedom or flexibility, you need to
use the core R functions, which allow you to compress data on the fly or use
all sorts of connections. If you would rather stay safe, tell your users to
use fwrite for crucial data.

Best,

j.


On 4 July 2017 at 17:01, Nathan Sosnovske via R-devel wrote:

> The best way to test on Windows would probably be creating a small virtual
> hard disk (via CreateVirtualDisk), mounting it, and writing to the mounted
> location. I believe the drive could even be mounted to an arbitrary
> location on the filesystem (instead of a drive letter) so that drive letter
> conflicts don't come into play.
>
> -----Original Message-----
> From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of Duncan
> Murdoch
> Sent: Tuesday, July 4, 2017 7:53 AM
> To: Jim Hester 
> Cc: r-devel@r-project.org; Lipatz Jean-Luc 
> Subject: Re: [Rd] write.csv
>
> On 04/07/2017 10:01 AM, Jim Hester wrote:
> > On linux at least you can use `/dev/full` [1] to test writing to a full
> > device.
> >
> > > echo 'foo' > /dev/full
> > bash: echo: write error: No space left on device
>
> Unfortunately, I get a permission denied error if I try to write there
> from MacOS.  I don't know if Windows has an equivalent.
>
> I've taken a look at the code.  Essentially it comes down to a call to the
> C function vfprintf, which is supposed to return the number of bytes
> written, or a negative value for an error. This return value is often not
> checked; in particular, write.table and friends don't check it.
>
> I'll add code to signal an error if there's a negative value.
>
> I don't think it's feasible to check the number of bytes (formatted text
> with possible translation to a different encoding could have any number of
> bytes) if it's positive.  So hopefully all of our file systems will
> correctly signal an error, and not just report how many bytes were
> successfully written.
>
> >
> > Although that won't be a perfect test for this case where part of the
> > file is written successfully.
> >
> > An alternative suggestion for testing this is to create and mount a
> > loop device [2] with a small file.
> >
> > [1]: https://en.wikipedia.org/wiki//dev/full
> > [2]: https://stackoverflow.com/a/16044420/2055486
>
> Loop devices sound ideal, but seem to be Linux-only (at least with that
> recipe).
>
> Duncan
>
>
> >
> > On Tue, Jul 4, 2017 at 3:38 PM, Duncan Murdoch wrote:
> >> On 04/07/2017 8:40 AM, Lipatz Jean-Luc wrote:
> >>>
> >>> I would really like the bug fixed. At least this one, because I know
> 

Re: [Rd] Crash after (wrongly) applying product operator on object from LIMMA package

2017-04-24 Thread January W.
Hi Hilmar,

weird. The memory problem seems to be due to recursion (my R, version 3.3.3,
says: Error: evaluation nested too deeply: infinite recursion /
options(expressions=)?; just run traceback() to see how it happens), but
why does it segfault with xlsx? NB: xlsx is the culprit; neither rJava nor
xlsxjars causes the problem.

On the other hand, quick googling for r+xlsx+segfault returns tons of
reports of how xlsx crashes in dozens of situations. See for example
http://r.789695.n4.nabble.com/segfault-in-gplots-heatmap-2-td4641808.html.
Also, the problem might be platform-specific. It would be interesting to
see whether anyone with a Mac can reproduce it.

kind regards,

j.





On 19 April 2017 at 10:01, Hilmar Berger wrote:

> Hi,
>
> following up on my own question, I found a smaller example that does not
> require LIMMA:
>
> setClass("FOOCLASS",
>  representation("list")
> )
> ma = new("FOOCLASS", list(M=matrix(rnorm(300), 30,10)))
>
> > ma * ma$M
> Error: C stack usage  7970512 is too close to the limit
>
> > library(xlsx)
> Loading required package: rJava
> Loading required package: xlsxjars
> > ma * ma$M
> ---> Crash
>
> xlsx seems to act like a catalyst here, with the product operator running
> in a deeply nested recursion, exhausting the stack. Valgrind shows thousands
> of invalid stack accesses when loading xlsx, which might contribute to the
> problem. Package xlsx has not been updated since 2014, so it might fail
> with more current versions of R or Java (I'm using Oracle Java 8).
>
> Still, even if xlsx was the package to be blamed for the crash, I fail to
> understand what exactly the product operator is trying to do in the
> multiplication of the matrix with the object.
>
> Best regards,
> Hilmar
>
>
> On 18/04/17 18:57, Hilmar Berger wrote:
>
>> Hi,
>>
>> this is a problem that occurs in the presence of two libraries (limma,
>> xlsx) and leads to a crash of R. The problematic code is the wrong
>> application of sweep or the product ("*") function on an LIMMA MAList
>> object. To my knowledge, limma does not define a "*" method for MAList
>> objects.
>>
>> If only LIMMA is loaded but not package xlsx, the code does not crash but
>> rather produces an error ("Error: C stack usage  7970512 is too close to
>> the limit"). Loading only package rJava instead of xlsx also does not
>> produce the crash, only the error message. Note that xlsx functions
>> are not explicitly used.
>>
>> It could be reproduced on two different Linux machines running R-3.2.5,
>> R-3.3.0 and R-3.3.2.
>>
>> Code to reproduce the problem:
>> -
>> library(limma)
>> library(xlsx)
>>
>> # a MAList
>> ma = new("MAList", list(A=matrix(rnorm(300), 30,10), M=matrix(rnorm(300),
>> 30,10)))
>>
>> # This should actually be sweep(ma$M, ...) for functional code, but I
>> # omitted the $M...
>> #sweep(ma, 2, c(1:10), "*")
>> # sweep will crash when doing the final operation of applying the
>> # function over the input matrix, which in this case is function "*"
>>
>> f = match.fun("*")
>> # This is not exactly the same as in sweep but it also tries to multiply
>> # the MAList object with a matrix of same size and leads to the crash
>> f(ma, ma$M)
>> # ma * ma$M has the same effect
>> -
>>
>> My output:
>>
>> R version 3.3.0 (2016-05-03) -- "Supposedly Educational"
>> Copyright (C) 2016 The R Foundation for Statistical Computing
>> Platform: x86_64-pc-linux-gnu (64-bit)
>>
>> R is free software and comes with ABSOLUTELY NO WARRANTY.
>> You are welcome to redistribute it under certain conditions.
>> Type 'license()' or 'licence()' for distribution details.
>>
>>   Natural language support but running in an English locale
>>
>> R is a collaborative project with many contributors.
>> Type 'contributors()' for more information and
>> 'citation()' on how to cite R or R packages in publications.
>>
>> Type 'demo()' for some demos, 'help()' for on-line help, or
>> 'help.start()' for an HTML browser interface to help.
>> Type 'q()' to quit R.
>>
>> > library(limma)
>> > library(xlsx)
>> Loading required package: rJava
>> Loading required package: xlsxjars
>> >
>> > sessionInfo()
>> R version 3.3.0 (2016-05-03)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>> Running under: Ubuntu 14.04.5 LTS
>>
>> locale:
>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8
>>  [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=en_US.UTF-8        LC_ADDRESS=en_US.UTF-8
>> [10] LC_TELEPHONE=en_US.UTF-8   LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8
>>
>> attached base packages:
>> [1] stats graphics  grDevices utils datasets  methods base
>>
>> other attached packages:
>> [1] xlsx_0.5.7     xlsxjars_0.6.1 rJava_0.9-8    limma_3.30.7
>>
>> loaded via a namespace (and not attached):
>> [1] tools_3.3.0
>> >
>> > ma = new("MAList",