[Rd] Issue with seek() on gzipped connections in R-devel

2011-09-23 Thread Jon Clayden
Dear all,

In R-devel (2011-09-23 r57050), I'm running into a serious problem
with seek()ing on connections opened with gzfile(). A warning is
generated and the file position does not seek to the requested
location. It doesn't seem to occur all the time - I tried to create a
small example file to illustrate it, but the problem didn't occur.
However, it can be seen with a file I use for testing my packages,
which is available through the URL
https://github.com/jonclayden/tractor/blob/master/tests/data/nifti/maskedb0_lia.nii.gz?raw=true:

> con <- gzfile("~/Downloads/maskedb0_lia.nii.gz","rb")
> seek(con, 352)
[1] 0
Warning message:
In seek.connection(con, 352) :
  seek on a gzfile connection returned an internal error
> seek(con, NA)
[1] 190

The same commands with the same file work as expected in R 2.13.1, and
have worked over many previous versions of R.

> con <- gzfile("~/Downloads/maskedb0_lia.nii.gz","rb")
> seek(con, 352)
[1] 0
> seek(con, NA)
[1] 352

My sessionInfo() output is:

R Under development (unstable) (2011-09-23 r57050)
Platform: x86_64-apple-darwin11.1.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_US.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] tractor.nt_2.0.1      tractor.session_2.0.3 tractor.utils_2.0.0
[4] tractor.base_2.0.3    reportr_0.2.0

This seems to occur whether or not R is compiled with
--with-system-zlib. I see some zlib-related changes mentioned in the
NEWS, but I don't see any indication that this is expected. Could
anyone shed any light on it, please?

Thanks and all the best,
Jon

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Issue with seek() on gzipped connections in R-devel

2011-09-23 Thread Prof Brian Ripley
Basically seek with zlib is flaky: we've stumbled on several errors. 
If it worked for you in the past, count yourself lucky.  I'd suggest 
you avoid relying on it in your packages.



--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



Re: [Rd] Issue with seek() on gzipped connections in R-devel

2011-09-23 Thread Jeffrey Ryan
seek() in general is a bad idea IMO if you are writing cross-platform code.

?seek

Warning:

 Use of ‘seek’ on Windows is discouraged.  We have found so many
 errors in the Windows implementation of file positioning that
 users are advised to use it only at their own risk, and asked not
 to waste the R developers' time with bug reports on Windows'
 deficiencies.

Aside from making me laugh, the above highlights the core reason not to use it, IMO.

For files that are not zipped, you can try the mmap package.  ?mmap and
?types are good starting points.  It allows access to binary data on disk
with very simple R-like semantics, and is very fast.  Not as fast as a
sequential read... but fast.  At present this is 'little endian' only,
but that describes most of the world today.

Best,
Jeff


-- 
Jeffrey Ryan
jeffrey.r...@lemnica.com

www.lemnica.com
www.esotericR.com



Re: [Rd] Issue with seek() on gzipped connections in R-devel

2011-09-23 Thread Jon Clayden
Thanks for the replies. I take the point, although it does seem like a
substantial regression (on non-Windows platforms).

I like to keep the external dependencies of my packages minimal, but I
will look into the mmap package - thanks, Jeff, for the tip.

Aside from that, though, what is the alternative to using seek? If I
want to read something at (original, uncompressed) byte offset 352, as
here, do I have to read and discard everything that comes before it
first? That seems inelegant at best...
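
For reference, the read-and-discard approach asked about here can be sketched roughly as follows. This is only an illustration under stated assumptions: `skipBytes` and its `chunkSize` argument are hypothetical names, not part of base R, and the demonstration uses a small temporary file rather than the NIfTI file from the original report.

```r
# Hypothetical helper (not part of base R): position a read-mode binary
# connection at an uncompressed byte offset by reading and discarding data,
# without calling seek().
skipBytes <- function(con, offset, chunkSize = 65536L) {
    remaining <- offset
    while (remaining > 0) {
        # Read at most chunkSize bytes at a time to bound memory use
        chunk <- readBin(con, what = "raw", n = min(remaining, chunkSize))
        if (length(chunk) == 0L)
            stop("reached end of stream before the requested offset")
        remaining <- remaining - length(chunk)
    }
    invisible(offset)
}

# Demonstration on a temporary gzipped file holding 1000 known bytes
path <- tempfile(fileext = ".gz")
out <- gzfile(path, "wb")
writeBin(as.raw((0:999) %% 256), out)
close(out)

con <- gzfile(path, "rb")
skipBytes(con, 352)
firstBytes <- readBin(con, what = "raw", n = 4L)  # bytes at offsets 352..355
close(con)
```

Only sequential reads are used, so the cost grows with the offset, but nothing depends on the behaviour of seek() for compressed connections.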

Regards,
Jon




Re: [Rd] Issue with seek() on gzipped connections in R-devel

2011-09-23 Thread Prof Brian Ripley

On Fri, 23 Sep 2011, Jon Clayden wrote:


> Thanks for the replies. I take the point, although it does seem like a
> substantial regression (on non-Windows platforms).
>
> I like to keep the external dependencies of my packages minimal, but I
> will look into the mmap package - thanks, Jeff, for the tip.
>
> Aside from that, though, what is the alternative to using seek? If I
> want to read something at (original, uncompressed) byte offset 352, as
> here, do I have to read and discard everything that comes before it
> first? That seems inelegant at best...


Or uncompress the file.
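
One way this suggestion might look in practice is sketched below, assuming it is acceptable to write a temporary uncompressed copy. `gunzipToTemp` is a hypothetical helper name, not a base R function, and the demonstration file is generated on the fly rather than being the file from the original report.

```r
# Hypothetical helper (not part of base R): copy the uncompressed contents of
# a gzipped file to a temporary plain file, using sequential reads only.
gunzipToTemp <- function(path, chunkSize = 65536L) {
    tmp <- tempfile()
    input <- gzfile(path, "rb")
    on.exit(close(input), add = TRUE)
    output <- file(tmp, "wb")
    on.exit(close(output), add = TRUE)
    repeat {
        chunk <- readBin(input, what = "raw", n = chunkSize)
        if (length(chunk) == 0L) break  # end of compressed stream
        writeBin(chunk, output)
    }
    tmp
}

# Demonstration: build a small gzipped file, uncompress it, then seek on the
# plain file connection instead of the gzfile connection.
gzPath <- tempfile(fileext = ".gz")
out <- gzfile(gzPath, "wb")
writeBin(as.raw((0:999) %% 256), out)
close(out)

plainPath <- gunzipToTemp(gzPath)
con <- file(plainPath, "rb")
invisible(seek(con, 352))
position <- seek(con, NA)                           # current read position
bytesAtOffset <- readBin(con, what = "raw", n = 4L)
close(con)
unlink(plainPath)
```

The trade-off is extra disk space and one full pass over the file, in exchange for seeking on an ordinary file connection.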






--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595