[Bioc-devel] bug when coercing from list to SimpleList

2014-09-05 Thread Hervé Pagès

Hi Michael,

I found the following bug when coercing a list to a SimpleList
with IRanges devel (not with IRanges release):

  library(IRanges)
  x <- list(a=matrix(rep("a", 6), nrow=3),
            b=array(rep("b", 24), dim=c(3,4,2)))

Then:

  > sapply(as(x, "SimpleList"), class)
         a        b
  "matrix" "matrix"

  > lapply(as(x, "SimpleList"), dim)
  $a
  [1] 3 2

  $b
  [1] 24  1

The array was turned into a matrix!

Note that the SimpleList() constructor behaves as expected:

  > sapply(SimpleList(x), class)
         a        b
  "matrix"  "array"

  > lapply(SimpleList(x), dim)
  $a
  [1] 3 2

  $b
  [1] 3 4 2
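
For anyone trying to reproduce the symptom without IRanges: the `24 1` shape is what base R's as.matrix() returns for a 3-d array. The sketch below only illustrates that shape and a simple dimension check. It assumes nothing about the actual coercion code in S4Vectors, and `dims_preserved` is a hypothetical helper, not part of any package.

```r
# A 3 x 4 x 2 character array, as in the report above.
b <- array(rep("b", 24), dim = c(3, 4, 2))

# as.matrix() on a non-matrix array flattens it into a one-column matrix.
flat <- as.matrix(b)
dim(flat)

# Hypothetical helper: TRUE if every element kept its dim attribute.
dims_preserved <- function(before, after) {
    identical(lapply(before, dim), lapply(after, dim))
}

x <- list(a = matrix(rep("a", 6), nrow = 3), b = b)
dims_preserved(x, x)                      # unchanged list
dims_preserved(x, lapply(x, as.matrix))   # flattened array is detected
```

A check like this could be run on `as.list(as(x, "SimpleList"))` to guard against a regression.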

Do you think you can have a look?

Thanks,
H.

> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] IRanges_1.99.25 S4Vectors_0.1.5 BiocGenerics_0.11.4

loaded via a namespace (and not attached):
[1] stats4_3.1.0

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] writeVcf performance

2014-09-05 Thread Kasper Daniel Hansen
This approach, writing in chunks, is the same Herve and I used for writing
FASTA in the Biostrings package, although I see that Herve has now replaced
the R implementation with a C implementation.  I similarly found an
absolutely huge speed up when writing genomes, by chunking.

Best,
Kasper


On Tue, Sep 2, 2014 at 4:33 PM, Martin Morgan <mtmor...@fhcrc.org> wrote:

 On 08/27/2014 11:56 AM, Gabe Becker wrote:

 The profiling I attached in my previous email is for 24 geno fields, as I
 said, but our typical use case involves only ~4-6 fields, and is faster
 but still on the order of dozens of minutes.


 I think Val is arriving at a (much) more efficient implementation, but...

 I wanted to share my guess that the poor _scaling_ is because the garbage
 collector runs multiple times as the different strings are pasted together,
 and has to traverse, in linear time, increasing numbers of allocated SEXPs.
 So times scale approximately quadratically with the number of rows in the
 VCF.

 An efficiency is to reduce the number of SEXPs in play by writing out in
 chunks -- as each chunk is written, the SEXPs become available for
 collection and are re-used. Here's my toy example

 time.R
 ======
 ## Split seq_len(nx) into ncl roughly equal, contiguous index chunks.
 splitIndices <- function (nx, ncl)
 {
     i <- seq_len(nx)
     if (ncl == 0L)
         list()
     else if (ncl == 1L || nx == 1L)
         list(i)
     else {
         fuzz <- min((nx - 1L)/1000, 0.4 * nx/ncl)
         breaks <- seq(1 - fuzz, nx + fuzz, length = ncl + 1L)
         structure(split(i, cut(i, breaks, labels=FALSE)), names = NULL)
     }
 }

 x = as.character(seq_len(1e7)); y = sample(x)
 ## Paste in 20 chunks if the SPLIT environment variable is set,
 ## otherwise paste everything in one call.
 if (!is.na(Sys.getenv("SPLIT", NA))) {
     idx <- splitIndices(length(x), 20)
     system.time(for (i in idx) paste(x[i], y[i], sep=":"))
 } else {
     system.time(paste(x, y, sep=":"))
 }


 running under R-devel with $ SPLIT=TRUE R --no-save --quiet -f time.R the
 relevant time is

    user  system elapsed
  15.320   0.064  15.381

 versus with $ R --no-save --quiet -f time.R it is

    user  system elapsed
  95.360   0.164  95.511

 I think this is likely an overall strategy when dealing with character
 data -- processing in independent chunks of moderate (1M?) size (enabling
 as a consequence parallel evaluation in modest memory) that are sufficient
 to benefit from vectorization, but that do not entail allocation of large
 numbers of in-use SEXPs.
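
 The chunked strategy described above can be sketched for the write-to-file case as follows. This is a minimal base-R illustration, not the writeVcf implementation. The function name `write_pasted_in_chunks` and its arguments are made up for the example, and the 1e6 default simply echoes the "moderate (1M?)" chunk size suggested above.

```r
# Hypothetical helper: paste x and y element-wise and write the result
# to `path` one chunk at a time, so that only about `chunk_size` pasted
# strings need to be alive at any moment.
write_pasted_in_chunks <- function(x, y, path, chunk_size = 1e6L, sep = ":") {
    stopifnot(length(x) == length(y), chunk_size >= 1L)
    n <- length(x)
    con <- file(path, open = "w")
    on.exit(close(con))
    # Start index of each chunk; empty input means no chunks.
    starts <- if (n > 0L) seq(1L, n, by = chunk_size) else integer(0)
    for (start in starts) {
        idx <- start:min(start + chunk_size - 1L, n)
        # The pasted chunk becomes garbage as soon as it is written.
        writeLines(paste(x[idx], y[idx], sep = sep), con)
    }
    invisible(length(starts))  # number of chunks written
}

x <- as.character(1:10)
y <- as.character(10:1)
out <- tempfile()
n_chunks <- write_pasted_in_chunks(x, y, out, chunk_size = 3L)
```

 The file contents should match what a single `paste(x, y, sep=":")` followed by one `writeLines()` call would produce; only the peak number of live strings differs.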

 Martin


 Sorry for the confusion.
 ~G


 On Wed, Aug 27, 2014 at 11:45 AM, Gabe Becker <becke...@gene.com> wrote:

 Martin and Val.

 I re-ran writeVcf on our (G)VCF data (34790518 ranges, 24 geno fields)
 with profiling enabled. The results of summaryRprof for that run are
 attached, though for a variety of reasons they are pretty misleading.

 It took over an hour to write (3700+ seconds), so it's definitely a
 bottleneck when the data get very large, even if it isn't for smaller
 data.

 Michael and I both think the culprit is all the pasting and cbinding that
 is going on, and more to the point, that memory for an internal
 representation to be written out is allocated at all.  Streaming across
 the object, looping by rows and writing directly to file (e.g. from C)
 should be blisteringly fast in comparison.

 ~G


 On Tue, Aug 26, 2014 at 11:57 AM, Michael Lawrence <micha...@gene.com> wrote:

 Gabe is still testing/profiling, but we'll send something randomized
 along eventually.


 On Tue, Aug 26, 2014 at 11:15 AM, Martin Morgan <mtmor...@fhcrc.org> wrote:

 I didn't see in the original thread a reproducible (simulated, I guess)
 example, to be explicit about what the problem is??

 Martin


 On 08/26/2014 10:47 AM, Michael Lawrence wrote:

 My understanding is that the heap optimization provided marginal gains,
 and that we need to think harder about how to optimize all of the string
 manipulation in writeVcf. We either need to reduce it or reduce its
 overhead (i.e., the CHARSXP allocation). Gabe is doing more tests.


 On Tue, Aug 26, 2014 at 9:43 AM, Valerie Obenchain <voben...@fhcrc.org> wrote:

 Hi Gabe,

 Martin responded, and so did Michael,

 https://stat.ethz.ch/pipermail/bioc-devel/2014-August/006082.html

 It sounded like Michael was ok with working with/around heap
 initialization.

 Michael, is that right or should we still consider this on the table?


 Val


 On 08/26/2014 09:34 AM, Gabe Becker 

Re: [Bioc-devel] bug when coercing from list to SimpleList

2014-09-05 Thread Michael Lawrence
Thanks, just a typo. Fixed in S4Vectors 0.2.1.



Re: [Bioc-devel] Please bump version number when committing changes

2014-09-05 Thread Stephanie M. Gogarten
I am guilty of doing this today, but I have (I think) a good reason. 
I'm making a bunch of changes that are all related to each other, but 
are being implemented and tested in stages.  I'd like to use svn to 
commit when I've made a set of changes that works, so I can roll back if 
I break something in the next step, but I'd like the users to see them 
all at once as a single version update.  Perhaps others are doing 
something similar?


Stephanie

On 9/4/14, 12:04 PM, Dan Tenenbaum wrote:

Hello,

Looking through our svn logs, I see that there are many commits that are not 
accompanied by version bumps.
All svn commits (or, if you are using the git-svn bridge, every group of commits included 
in a push) should include a version bump (that is, incrementing the z segment 
of the x.y.z version number). This practice is documented at 
http://www.bioconductor.org/developers/how-to/version-numbering/ .
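
As a small illustration of what "incrementing the z segment" means, here is a sketch in base R. The helper name `bump_z` is hypothetical and not part of any Bioconductor tooling; in practice the change is an edit to the Version field of the package's DESCRIPTION file.

```r
# Hypothetical helper: increment the z segment of an "x.y.z" version string.
bump_z <- function(version) {
    package_version(version)  # validates; errors on a malformed version
    parts <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]])
    stopifnot(length(parts) == 3L)
    parts[3L] <- parts[3L] + 1L
    # sprintf("%d") avoids scientific notation for large components.
    paste(sprintf("%d", parts), collapse = ".")
}

old <- "1.2.9"
new <- bump_z(old)

# Versions compare segment by segment as integers, not as text.
package_version(new) > package_version(old)
```

Because the comparison is numeric per segment, going from z = 9 to z = 10 still counts as a newer version.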

Failure to bump the version has two consequences:

1) Your changes will not propagate to our package repository or web site, so 
users installing your package via biocLite() will not receive the latest 
changes unless you bump the version.

2) Users *can* always get the current files of your package using Subversion, 
but if you've made changes without bumping the version number, it can be 
difficult to troubleshoot problems. If two people are looking at what appears 
to be the same version of a package, but it's behaving differently, it can be 
really frustrating to realize that the packages actually differ (but not by 
version number).

So if you're not already, please get in the habit of bumping the version number 
with each set of changes you commit.

Let us know on bioc-devel if you have any questions about this.

Thanks,
Dan



Re: [Bioc-devel] Please bump version number when committing changes

2014-09-05 Thread Dan Tenenbaum



I understand the motivation but this still results in an ambiguous state if
two different people check out your package from svn at different times
today (before and after your changes).

Version numbers are cheap, so if version 1.2.3 exists for a day before
version 1.2.4 (which contains all the changes you want to push to your
users) then that's ok, IMO.

Including a version bump doesn't impact whether or not you can roll back a
commit with svn.

Dan





Re: [Bioc-devel] Please bump version number when committing changes

2014-09-05 Thread Gabe Becker
Dan,

If that is a hard BioC policy I'll endeavor to follow it (I do already
in the vast majority of cases), but I must say it makes the Bioc repository
much less useful from a development standpoint.

There are lots of reasons to commit code that doesn't work and shouldn't yet
be deployed, from portability between machines to simple preservation of
work in progress. What is the suggested behavior in the "under heavy
development and not safe, but I don't want to lose days of work" case?

~G






-- 
Computational Biologist
Genentech Research



Re: [Bioc-devel] Please bump version number when committing changes

2014-09-05 Thread Ryan C. Thompson

Hi all,

Just to throw in a suggestion here, I know that many people use a tool 
like git-svn in this kind of situation. They want the ability to make 
multiple small commits in order to save their progress, but they don't 
want those commits visible until they are ready to push all at once. 
This allows one to make breaking changes in one commit that are fixed 
by subsequent commits, because the intermediate states will never be 
exposed.


For information on git-svn, see here: 
http://git-scm.com/book/en/Git-and-Other-Systems-Git-and-Subversion


Note that I don't personally have any experience with svn or with 
git-svn, but this seems like exactly the use case for it.


-Ryan

On Fri 05 Sep 2014 04:50:49 PM PDT, Peter Haverty wrote:

Hi all,

I respectfully disagree.  One should certainly check in each discrete unit
of work.  These will often not result in something that is ready to be used
by someone else.  Bumping the version number constitutes a new release and
carries the implicit promise that the package works again.  This is why
continuous integration systems do a build when the version number changes.

One should expect working software when installing a pre-built package (the
tests passed, right?).  Checking out from SVN is for developers of that
package and nothing should be assumed about the current state of the code.

To keep everyone happy, one could add a commit hook to our SVN setup that
would add the SVN revision number to the version string.  This would be for
dev only and hopefully not sufficient to trigger a build.
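
One way the revision-in-version idea could look, sketched in base R. This assumes the revision is appended as a fourth version component, which is only one possible format; the helper name `with_revision` and the revision numbers are invented for the example, and nothing here reflects how the Bioconductor build system actually decides what to build.

```r
# Hypothetical helper: append an svn revision number to a package version
# as an extra trailing component, e.g. "1.2.3" + r93000 -> "1.2.3.93000".
with_revision <- function(version, revision) {
    base <- package_version(version)  # errors on a malformed version
    stopifnot(length(revision) == 1L, revision >= 0)
    # "%d" keeps large revision numbers out of scientific notation.
    sprintf("%s.%d", as.character(base), as.integer(revision))
}

dev <- with_revision("1.2.3", 93000)

# The stamped version sorts after the plain one but before the next bump.
package_version(dev) > package_version("1.2.3")
package_version(dev) < package_version("1.2.4")
```

With this format two checkouts of "1.2.3" taken at different revisions would no longer report the same version, while a real bump to 1.2.4 still supersedes both.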

That's my two cents.  Happy weekend all.

Regards,



Pete


Peter M. Haverty, Ph.D.
Genentech, Inc.
phave...@gene.com




Re: [Bioc-devel] Please bump version number when committing changes

2014-09-05 Thread Peter Haverty
Hi All,
Git-svn is a nice workaround for the developer. As a user you don't want to be 
installing from version control in any case. Version control is a means for 
tracking changes, not for distributing software.   Let the CI system protect 
you from needless drama.

Typed with thumbs.

 

Re: [Bioc-devel] Please bump version number when committing changes

2014-09-05 Thread Vincent Carey
On Fri, Sep 5, 2014 at 7:50 PM, Peter Haverty <haverty.pe...@gene.com> wrote:

 Hi all,

 I respectfully disagree.  One should certainly check in each discrete unit
 of work.  These will often not result in something that is ready to be used
 by someone else.  Bumping the version number constitutes a new release and
 carries the implicit promise that the package works again.  This is why


Here I would respectfully disagree.  Code in the devel branch carries no
guarantees. I think we have been pretty loose with respect to package
version number bumping in devel branch; the svn tracking can be used to
deal with isolation of code for rollbacks.

In this informal regime the package version number is a simple marker of
package state. I think it has served us pretty well in past years but the
developer community was smaller and had fairly homogeneous habits.

Clearly there is room for more regimentation in this area but at the moment
I agree with Dan that version numbers are cheap and should be bumped when
new code is committed. And the recognition by all that a devel image may
not work and may change fairly dramatically while in devel should be
general; whether we need to alter that is open to question but I would
think not.


 continuous integration systems do a build when the version number changes.

 One should expect working software when installing a pre-build package (the
 tests passed, right?).  Checking out from SVN is for developers of that
 package and nothing should be assumed about the current state of the code.

 To keep everyone happy, one could add a commit hook to our SVN setup that
 would add the SVN revision number to the version string.  This would be for
 dev only and hopefully not sufficient to trigger a build.

 That's my two cents.  Happy weekend all.


Re: [Bioc-devel] Please bump version number when committing changes

2014-09-05 Thread Michael Lawrence
As Pete and Ryan have pointed out, it seems that the version control system
should somehow ease the burden of the developer here.

Let's look at this from the github perspective, since it is likely to be
the primary hosting mechanism for the foreseeable future. Just thinking out
loud, if R could somehow dynamically ascertain the version of a package at
build time, it could query the git checkout for a version. A simple
algorithm that I have found effective in non-R projects is to consider git
tags, which on github equate to releases. If the repository state is *at*
the tag, then use the tag as the version. If the state is ahead of the most
recent tag, then use the tag + latest commit hash. I wonder if R could
support this by allowing a path to an R script in the version field?
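
The tag-based rule described above could be sketched roughly as follows. This is only an illustration in Python, not something R or the Bioconductor build system supports: the function names are hypothetical, the `+` separator is an arbitrary choice, and it assumes the checkout has at least one tag so that `git describe --tags --long` prints `<tag>-<commits ahead>-g<hash>`.

```python
import re
import subprocess


def version_from_describe(describe_output):
    """Turn `git describe --tags --long` output into a version string.

    The output has the form <tag>-<commits ahead>-g<abbreviated hash>.
    """
    match = re.fullmatch(r"(.+)-(\d+)-g([0-9a-f]+)", describe_output.strip())
    if match is None:
        raise ValueError(f"unexpected describe output: {describe_output!r}")
    tag, commits_ahead, commit_hash = match.groups()
    if int(commits_ahead) == 0:
        # The checkout is exactly *at* the tag: use the tag as the version.
        return tag
    # The checkout is ahead of the most recent tag: tag + latest commit hash.
    return f"{tag}+{commit_hash}"


def version_from_checkout(repo_dir="."):
    """Query a git checkout (hypothetical helper; needs git and a tag)."""
    result = subprocess.run(
        ["git", "describe", "--tags", "--long"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return version_from_describe(result.stdout)
```

Keeping the parsing separate from the git call means the rule itself can be checked without a repository at hand.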



On Fri, Sep 5, 2014 at 6:27 PM, Vincent Carey st...@channing.harvard.edu
wrote:

 On Fri, Sep 5, 2014 at 7:50 PM, Peter Haverty haverty.pe...@gene.com
 wrote:

  Hi all,
 
  I respectfully disagree.  One should certainly check in each discrete unit
  of work.  These will often not result in something that is ready to be used
  by someone else.  Bumping the version number constitutes a new release and
  carries the implicit promise that the package works again.  This is why


 Here I would respectfully disagree.  Code in the devel branch carries no
 guarantees.  I think we have been pretty loose with respect to package
 version number bumping in devel branch; the svn tracking can be used to
 deal with isolation of code for rollbacks.

 In this informal regime the package version number is a simple marker of
 package state.  I think it has served us pretty well in past years but the
 developer community was smaller and had fairly homogeneous habits.

 Clearly there is room for more regimentation in this area but at the moment
 I agree with Dan that version numbers are cheap and should be bumped when
 new code is committed.  And the recognition by all that a devel image may
 not work and may change fairly dramatically while in devel should be
 general; whether we need to alter that is open to question but I would
 think not.


  continuous integration systems do a build when the version number changes.

  One should expect working software when installing a pre-built package
  (the tests passed, right?).  Checking out from SVN is for developers of
  that package and nothing should be assumed about the current state of the
  code.

  To keep everyone happy, one could add a commit hook to our SVN setup that
  would add the SVN revision number to the version string.  This would be
  for dev only and hopefully not sufficient to trigger a build.

  That's my two cents.  Happy weekend all.
 
  Regards,
 
 
 
  Pete
 
  
  Peter M. Haverty, Ph.D.
  Genentech, Inc.
  phave...@gene.com
 
 
  On Fri, Sep 5, 2014 at 4:30 PM, Dan Tenenbaum dtene...@fhcrc.org
 wrote:
 
  
  
   - Original Message -
From: Stephanie M. Gogarten sdmor...@u.washington.edu
To: Dan Tenenbaum dtene...@fhcrc.org, bioc-devel 
   bioc-devel@r-project.org
Sent: Friday, September 5, 2014 4:27:13 PM
Subject: Re: [Bioc-devel] Please bump version number when committing
   changes
   
I am guilty of doing this today, but I have (I think) a good reason.
I'm making a bunch of changes that are all related to each other, but
are being implemented and tested in stages.  I'd like to use svn to
commit when I've made a set of changes that works, so I can roll back
if I break something in the next step, but I'd like the users to see
them all at once as a single version update.  Perhaps others are doing
something similar?
   
  
   I understand the motivation but this still results in an ambiguous state
   if two different people check out your package from svn at different
   times today (before and after your changes).

   Version numbers are cheap, so if version 1.2.3 exists for a day before
   version 1.2.4 (which contains all the changes you want to push to your
   users) then that's ok, IMO.

   Including a version bump doesn't impact whether or not you can rollback
   a commit with svn.
  
   Dan
  
  
  
Stephanie
   
On 9/4/14, 12:04 PM, Dan Tenenbaum wrote:
 Hello,

 Looking through our svn logs, I see that there are many commits
 that are not accompanied by version bumps.
 All svn commits (or, if you are using the git-svn bridge, every
 group of commits included in a push) should include a version bump
 (that is, incrementing the z segment of the x.y.z version
 number). This practice is documented at
 http://www.bioconductor.org/developers/how-to/version-numbering/ .

 Failure to bump the version has two consequences:

 1) Your changes will not propagate to our package repository or web
 site, so users installing your package via biocLite() will not
 receive the latest changes unless you bump the version.

 

Re: [Bioc-devel] Please bump version number when committing changes

2014-09-05 Thread Kasper Daniel Hansen
Before we go and invent all kinds of stuff, is this a real problem that we
need to spend resources thinking about?

Dan's original post was about two people who check out devel from svn and
see the same version number but have different versions of the code.  I
acknowledge that this is theoretically possible.  In the rare situation where
this might matter, it would be better to compare svn revision numbers.  And
does this really happen with any frequency?  The number of people who install
packages from devel using svn must be very limited for a given package
(perhaps I am different, but I only do it occasionally, and almost always
for my own packages or if I depend on a package where I have identified an
issue, the other author has fixed it and I need to test now and not
tomorrow).

With the current build policy, as I understand it, two people each
installing not from svn, but from the published tarball through biocLite,
are guaranteed to have the same code if they have the same version.

The remaining issue is if one user installs from svn and one user from a
tarball.  But I think everyone who uses svn just needs to understand that
this can happen.  The affected users must be rather limited.

One version of the problem, which I can see being confusing, is if an
author pushes a bug fix to svn, but does not bump DESCRIPTION.  Then I
could see some unfortunate discussion between the developer and a user, but
that really comes down to a lack of understanding of the build system on
the developer's part.  While I am sure it happens, the solution in my
opinion is better education for the developers about the build system.

Best,
Kasper




Re: [Bioc-devel] Please bump version number when committing changes

2014-09-05 Thread Dan Tenenbaum
I'd add another scenario which is that every night the build system builds 
whatever was checked in.  This can cause extensive and confusing breakage in 
the build system. The build report does indicate svn revision number and 
timestamp of last commit, but one tends to look at the version number. As 
Kasper points out, though, packages that haven't been bumped do not propagate, 
and neither do packages that fail to build or check as a result. So the 
confusion is limited to the build system. 

My initial email was motivated by noticing that several people had made obvious 
fixes without bumping versions, clearly not understanding that they needed to 
do so. In our discussion of the finer points let's not lose sight of this. 
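
For anyone unsure what "bumping" means mechanically, here is a minimal sketch in Python. It is not part of any Bioconductor tooling, the function name is hypothetical, and it assumes the package's DESCRIPTION file carries a plain `Version: x.y.z` line. It increments only the z segment.

```python
import re


def bump_z(description_text):
    """Increment z in the 'Version: x.y.z' line of DESCRIPTION contents."""
    pattern = re.compile(
        r"^(Version:[ \t]*\d+\.\d+\.)(\d+)[ \t]*$", re.MULTILINE
    )

    def _increment(match):
        # Keep 'Version: x.y.' untouched and add one to z.
        return f"{match.group(1)}{int(match.group(2)) + 1}"

    new_text, count = pattern.subn(_increment, description_text, count=1)
    if count == 0:
        raise ValueError("no 'Version: x.y.z' line found")
    return new_text
```

The function works on the file contents as a string, so the caller decides how to read and write DESCRIPTION before committing.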

Dan
