[Bioc-devel] GenomicRanges: Concatenation of GRanges with matrices in mcols

2014-05-20 Thread Julian Gehring

Hi,

If I want to bind two GRanges objects with a matrix in the meta columns,
the concatenation of the two fails in bioc-stable (GenomicRanges 1.16.3)
and bioc-devel (GenomicRanges 1.17.13) with:


'''
Error in validObject(.Object) :
  invalid class “GRanges” object: number of rows in DataTable 
'mcols(x)' must match length of 'x'

'''

If multiple columns are used, the class of the first column seems to
determine the behavior:


#+BEGIN_SRC R
  library(GenomicRanges)

  ## sample data, two identical GRanges
  gr1 = gr2 = GRanges(1, IRanges(1:2, width = 1))
  m = matrix(1:4, 2)

  ## the vector alone works
  mcols(gr1) = mcols(gr2) = DataFrame(x = 1)
  c(gr1, gr2) ## works

  ## vector first, matrix second works
  mcols(gr1) = mcols(gr2) = DataFrame(x = 1, m = I(m))
  c(gr1, gr2) ## works

  ## the matrix alone fails
  mcols(gr1) = mcols(gr2) = DataFrame(m = I(m))
  c(gr1, gr2) ## fails

  ## matrix first, vector second fails
  mcols(gr1) = mcols(gr2) = DataFrame(m = I(m), x = 1)
  c(gr1, gr2) ## fails
#+END_SRC

Best wishes
Julian

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] Contradictory Clinical Data in curatedOvarianData

2014-05-20 Thread Dario Strbenac
library(curatedOvarianData)
data(TCGA_eset)
clinical <- pData(TCGA_eset)[, 1:20]
alive <- clinical[clinical[, "vital_status"] == "living", ]
head(alive[, c("days_to_death", "vital_status", "days_to_tumor_recurrence")])

             days_to_death vital_status days_to_tumor_recurrence
TCGA.20.0990           789       living                      870
TCGA.23.1118          2616       living                     2616
TCGA.23.1026           816       living                      797
TCGA.20.0991           797       living                      797
TCGA.23.1119          3953       living                     3378
TCGA.23.1028          1503       living                      133

I thought days_to_death would be NA for living samples. The first sample has a
recurrence time larger than the days to death. I realise that days_to_death can
also contain the time to last follow-up, but then why isn't it the same as
days_to_tumor_recurrence for the first sample?

--
Dario Strbenac
PhD Student
University of Sydney
Camperdown NSW 2050
Australia



Re: [Bioc-devel] Contradictory Clinical Data in curatedOvarianData

2014-05-20 Thread Topsoil Wang
If the patient is living, the time-to-death usually refers to the last
follow-up time; it is simply treated as censored in a Cox model.
However, the first patient example does look suspicious to me:

             days_to_death vital_status days_to_tumor_recurrence
TCGA.20.0990           789       living                      870

If this patient was followed up on day 870 to determine whether recurrence
had occurred, then the last follow-up date should be at least 870. In
other words, I always expect time-to-death or last follow-up to be greater
than or equal to time-to-recurrence.

It would be more informative to print out recurrence_status as well.
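
The consistency check described above can be sketched on the six rows quoted in this thread. This is only an illustration: the data frame is rebuilt by hand rather than taken from TCGA_eset, and the column name `suspicious` is made up for the example.

```r
# Six living patients, values copied from the head() output in this thread.
alive <- data.frame(
    days_to_death = c(789, 2616, 816, 797, 3953, 1503),
    vital_status = "living",
    days_to_tumor_recurrence = c(870, 2616, 797, 797, 3378, 133),
    row.names = c("TCGA.20.0990", "TCGA.23.1118", "TCGA.23.1026",
                  "TCGA.20.0991", "TCGA.23.1119", "TCGA.23.1028")
)

# For living patients days_to_death holds the last follow-up time, so a
# recurrence time beyond it is inconsistent.
alive$suspicious <- alive$days_to_tumor_recurrence > alive$days_to_death
flagged <- rownames(alive)[alive$suspicious]
print(flagged)
```

Among these six rows only TCGA.20.0990 is flagged. On the real data the same comparison could presumably be run on pData(TCGA_eset) directly, with recurrence_status added to the selected columns.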

Chen






Re: [Bioc-devel] Slow performance on scanBam

2014-05-20 Thread Martin Morgan

On 05/13/2014 08:17 AM, James Bullard wrote:

Hi Martin, thanks for the quick response. The data is certainly shareable. Here
is a link to a bam + bai + sam file that I have been using for benchmarking:
https://www.dropbox.com/s/eat31mnmmco1zoh/example-bam.tar.bz2

There is a method in SAM to elide the reference names from a header, but I think
they are just shunted to another file so I gave up on that track. Since I only
end up aligning to a small fraction of the transcripts, I might be able to
post-process the file, but it would be best to process as-is.


Hi Jim -- I updated the seqinfo,BamFile-method to do more work in C, and for 
scanBamHeader to optionally parse only the targets|text part of the header. I 
also reverted a change to seqinfo,BamFile-method, introduced in Rsamtools 
version 1.15.28, to try to place seqlevels into 'natural' order; now they are 
returned in the order they appear in the file.


Together these should make for much faster code, for your sim.bam about 3.5 (vs 
185) seconds for seqinfo, and ~7s for scanBam.


This is in Rsamtools version 1.17.16, which is in svn now but won't make it to 
biocLite until tomorrow, all being well...


Martin



thanks again, jim



On Tue, May 13, 2014 at 5:16 AM, Martin Morgan mtmor...@fhcrc.org wrote:

Hi James -- I don't think there's anything in existence to make this easier,
but I'll expose something in the next 24 hours; is your data shareable?
There might be deeper things to be done for processing this
small-but-numerous style data.

Martin


On 05/12/2014 05:32 PM, James Bullard wrote:

I've been dealing with some small bam files with millions of reference
sequences leading to monster headers. As one might guess, this leads to
pretty bad performance when calling scanBam.

Right now, it takes a bit (27MB bam file, 16k alignments, 2.5 million
reference sequences in the reference fasta file):

scanBam("sim.fasta-L27-ma8-__mp6-rfg5_2-rdg3_1.bam")

   user  system elapsed
186.264   0.528 186.934

I've traced it down to scanBamHeader and seqinfo-BamFile. The problematic
code is in scanBamHeader, which processes the entire header when all seqinfo
needs is the `targets` portion of the list. Additionally, the
order(rankSeqlevels(.)) doesn't scale either, so I've replaced this as
well. I've changed the body of seqinfo-BamFile from:

  h <- scanBamHeader(x)[["targets"]]
  o <- order(rankSeqlevels(names(h)))
  Seqinfo(names(h)[o], unname(h)[o])

to this:

  if (!isOpen(x)) {
      open(x)
      on.exit(close(x))
  }
  h <- .Call(.read_bamfile_header, .extptr(x))$targets
  Seqinfo(names(h), unname(h))

We then get:

scanBam("sim.fasta-L27-ma8-__mp6-rfg5_2-rdg3_1.bam")

   user  system elapsed
 14.780   0.360  15.158

Which is still pretty slow for how small these files are, but I can
probably live with that.

Two questions:
-- do we need the order(rankSeqlevels(names(h))) bit? That does change the
return value, but I can certainly live with the ordering in the file.
-- what else am I missing?

I can send a patch if need be, but would like to hear from the cognoscenti
first if there is a built-in way to avoid this overhead.

thanks, jim

--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



Re: [Bioc-devel] GenomicRanges: Concatenation of GRanges with matrices in mcols

2014-05-20 Thread Hervé Pagès

Hi Julian,

At the root of the problem is what rbind() does on DataFrames containing
matrices:

  m <- matrix(1:4, nrow=2)
  df <- DataFrame(m=I(m))
  df2 <- rbind(df, df)

Then:

  > df2
  DataFrame with 8 rows and 1 column
           m
    <matrix>
  1      1 3
  2      2 4
  3      1 3
  4      2 4

  > nrow(df2)
  [1] 8

Too many rows!

  > str(df2)
  Formal class 'DataFrame' [package "IRanges"] with 6 slots
    ..@ rownames       : NULL
    ..@ nrows          : int 12
    ..@ listData       :List of 1
    .. ..$ m: int [1:6, 1:2] 1 2 3 1 2 3 4 5 6 4 ...
    ..@ elementType    : chr "ANY"
    ..@ elementMetadata: NULL
    ..@ metadata       : list()

  > validObject(df2)
  [1] TRUE
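
One way to see where a count of 8 could come from, using base R only. This is a guess at the mechanism, not a diagnosis of the IRanges code: the bound matrix has 4 rows but 8 elements, so a row count taken from the element count would be twice too large.

```r
# Base R only: the same 2 x 2 matrix bound to itself by rows.
m <- matrix(1:4, nrow = 2)
mm <- rbind(m, m)

nrow(mm)    # number of rows of the bound matrix
length(mm)  # number of elements, which is what a plain vector would report
```

Here nrow(mm) is 4 and length(mm) is 8, matching the 4 rows displayed and the 8 rows reported above.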

I'll leave this to Michael.

Thanks,
H.




--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



Re: [Bioc-devel] GenomicRanges: Concatenation of GRanges with matrices in mcols

2014-05-20 Thread Hervé Pagès




Sorry, I mixed up outputs from different sessions. Correct str() output:

  > str(df2)
  Formal class 'DataFrame' [package "IRanges"] with 6 slots
    ..@ rownames       : NULL
    ..@ nrows          : int 8
    ..@ listData       :List of 1
    .. ..$ m: int [1:4, 1:2] 1 2 1 2 3 4 3 4
    ..@ elementType    : chr "ANY"
    ..@ elementMetadata: NULL
    ..@ metadata       : list()

H.





Re: [Bioc-devel] Slow performance on scanBam

2014-05-20 Thread James Bullard
Hi Martin,

Thanks for the fix, I'll check it out.

jim




Re: [Bioc-devel] GenomicRanges: Concatenation of GRanges with matrices in mcols

2014-05-20 Thread Michael Lawrence
Fixed in release and devel (IRanges).

Thanks,
Michael




[Rd] [patch] Add support for editor function in edit.default

2014-05-20 Thread Scott Kostyshak
Regarding the following extract of ?options:
 ‘editor’: a non-empty string, or a function that is called with a
  file path as argument.

edit.default currently calls the function with three arguments: name,
file, and title. For example, running the following

vimCmd <- 'vim -c "set ft=r"'
vimEdit <- function(file_) system(paste(vimCmd, file_))
options(editor = vimEdit)
myls <- edit(ls)

gives Error in editor(name, file, title) : unused arguments (file, title).
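
A function editor that accepts all three arguments avoids the error. A minimal sketch, assuming edit.default forwards name, file and title as the error message suggests; `recordingEditor` and `seen` are hypothetical names, and the function only records its arguments instead of opening an editor, so it runs non-interactively:

```r
library(utils)

# Hypothetical stand-in for a real editor: it records what it was called
# with and returns the object unchanged instead of opening anything.
seen <- NULL
recordingEditor <- function(name, file = "", title = NULL, ...) {
    seen <<- list(file = file, title = title)
    name
}

# Pass the editor explicitly rather than via options(editor = ...).
res <- edit(ls, editor = recordingEditor)
```

After the call, `seen` holds the file and title values that edit.default supplied, and `res` is the function that was passed in.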

The attached patch changes edit.default to call the editor function
with just the file path. There is at least one inconsistent behavior
that this patch causes in its current form. It does not obey the
following (from ?edit):
 Calling ‘edit()’, with no arguments, will result in the temporary
file being reopened for further editing.

I see two ways to address this: (1) add a getEdFile() function to
utils/edit.R that calls a function getEd() defined in edit.c that
returns DefaultFileName; or (2) this patch could be rewritten in C in
a new function in edit.c.
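
The file-based behaviour that the patch proposes can be tried outside of R's sources. A rough sketch that mirrors the patched branch in a standalone function; `editViaFile` and `bumpEditor` are hypothetical names, and the "editor" rewrites the file programmatically so that the example runs non-interactively:

```r
# Standalone copy of the patched branch: deparse to a file, let the
# editor function modify the file, then parse and evaluate the result.
editViaFile <- function(name = NULL, file = "", editor) {
    if (file == "") file <- tempfile()
    objDep <- if (is.null(name)) "" else deparse(name)
    writeLines(objDep, con = file)
    editor(file)
    eval(parse(file))
}

# Hypothetical non-interactive "editor": changes 'x + 1' into 'x + 2'.
bumpEditor <- function(path) {
    txt <- readLines(path)
    writeLines(sub("x + 1", "x + 2", txt, fixed = TRUE), path)
}

f <- function(x) x + 1
g <- editViaFile(f, editor = bumpEditor)
g(1)  # 3
```

As in the patch, the temporary file name is not remembered anywhere, which is why a later edit() with no arguments cannot reopen it.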

Is there any interest in this patch?
If not, would there be interest in an update of the docs, either
?options (stating the possibility that if 'editor' is a function, it
might be called with 'name', 'file', and 'title' arguments) or ?edit?

Scott


> sessionInfo()
R Under development (unstable) (2014-05-20 r65677)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base


--
Scott Kostyshak
Economics PhD Candidate
Princeton University
Index: src/library/utils/R/edit.R
===================================================================
--- src/library/utils/R/edit.R	(revision 65677)
+++ src/library/utils/R/edit.R	(working copy)
@@ -53,7 +53,13 @@
               editor = getOption("editor"), ...)
 {
     if (is.null(title)) title <- deparse(substitute(name))
-    if (is.function(editor)) invisible(editor(name, file, title))
+    if (is.function(editor)) {
+        if (file == "") file <- tempfile()
+        objDep <- if (is.null(name)) "" else deparse(name)
+        writeLines(objDep, con = file)
+        editor(file)
+        eval(parse(file))
+    }
     else .External2(C_edit, name, file, title, editor)
 }
 
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Question about fifo behavior on Linux between versions 3.0.3 and 3.1.0

2014-05-20 Thread James Smith
Version 3.1.0 of R has imposed a very small data limit on writing to fifos on
Linux. Consider the following R code (it assumes that "ff" is a fifo in the R
process's current directory):

con <- fifo("ff", "a+b")
writeBin(raw(12501), con)

In R 3.0.3, this returns without error and the data is available on the fifo. 
In R 3.1.0, however, this returns the following error:

Error in writeBin(raw(12501), con) : too large a block specified

In investigating R's source, the difference seems to be in 
src/main/connections.c, in the function fifo_write() (around line 932). In R 
3.0.3, fifo_write() has these lines:

if ((double) size * (double) nitems > SSIZE_MAX)
    error(_("too large a block specified"));

R 3.1.0 has these lines changed to this:

if ((size * sizeof(wchar_t) * nitems) > 50000) {
    error(_("too large a block specified"));
}

The change effectively places a limit of 12500 bytes on writes (since 
sizeof(wchar_t) == 4). Does anyone know why this change was made? I understand 
that fifos on Windows were implemented for R 3.1.0, but the code for fifos on 
Windows is in a separate part of connections.c that doesn't get compiled on 
Linux (i.e., the code given is Unix only). I also couldn't find any references 
to fifo behavior changes under Linux in any of R's documentation.
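
The 12500-byte figure follows from the arithmetic of the check. A small sketch, assuming a threshold of 50000 (implied by the 12500-byte limit together with sizeof(wchar_t) == 4, not read from the source here); the helper `rejected` is hypothetical and mimics only the comparison, not the C code:

```r
# Mimic the size check from fifo_write() in R 3.1.0 (arithmetic only).
wcharSize <- 4       # sizeof(wchar_t) as stated in this message
threshold <- 50000   # assumed; implied by the 12500-byte limit

rejected <- function(size, nitems) size * wcharSize * nitems > threshold

rejected(1, 12500)     # FALSE: 4 * 12500 equals the threshold
rejected(1, 12501)     # TRUE:  4 * 12501 exceeds it
threshold / wcharSize  # 12500
```

This matches the observation that writeBin(raw(12501), con) is the smallest write that fails.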

My platform is Fedora 20 (64-bit) and I have built and installed R from source.

Thank you for your time and consideration.

James O Smith
Harmonia Holdings Group, LLC



Re: [Rd] Question about fifo behavior on Linux between versions 3.0.3 and 3.1.0

2014-05-20 Thread Prof Brian Ripley
It _was_ part of the fifo for Windows patch.  As it does not seem to be
needed for Windows, it has been reverted.







--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
