[Bioc-devel] bug when coercing from list to SimpleList
Hi Michael,

I found the following bug when coercing a list to a SimpleList with IRanges devel (not with IRanges release):

  library(IRanges)
  x <- list(a=matrix(rep("a", 6), nrow=3), b=array(rep("b", 24), dim=c(3,4,2)))

Then:

  > sapply(as(x, "SimpleList"), class)
         a        b
  "matrix" "matrix"
  > lapply(as(x, "SimpleList"), dim)
  $a
  [1] 3 2

  $b
  [1] 24  1

The array was turned into a matrix! Note that the SimpleList() constructor behaves as expected:

  > sapply(SimpleList(x), class)
         a        b
  "matrix"  "array"
  > lapply(SimpleList(x), dim)
  $a
  [1] 3 2

  $b
  [1] 3 4 2

Do you think you can have a look?

Thanks,
H.

  > sessionInfo()
  R version 3.1.0 (2014-04-10)
  Platform: x86_64-unknown-linux-gnu (64-bit)

  locale:
   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
   [9] LC_ADDRESS=C               LC_TELEPHONE=C
  [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

  attached base packages:
  [1] parallel  stats     graphics  grDevices utils     datasets  methods
  [8] base

  other attached packages:
  [1] IRanges_1.99.25     S4Vectors_0.1.5     BiocGenerics_0.11.4

  loaded via a namespace (and not attached):
  [1] stats4_3.1.0

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpa...@fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] writeVcf performance
This approach, writing in chunks, is the same one Herve and I used for writing FASTA in the Biostrings package, although I see that Herve has now replaced the R implementation with a C implementation. I similarly found an absolutely huge speed-up when writing genomes by chunking.

Best,
Kasper

On Tue, Sep 2, 2014 at 4:33 PM, Martin Morgan mtmor...@fhcrc.org wrote:

> On 08/27/2014 11:56 AM, Gabe Becker wrote:
>> The profiling I attached in my previous email is for 24 geno fields, as I said, but our typical use case involves only ~4-6 fields, and is faster but still on the order of dozens of minutes.
>
> I think Val is arriving at a (much) more efficient implementation, but...
>
> I wanted to share my guess that the poor _scaling_ is because the garbage collector runs multiple times as the different strings are pasted together, and has to traverse, in linear time, increasing numbers of allocated SEXPs. So times scale approximately quadratically with the number of rows in the VCF.
>
> An efficiency is to reduce the number of SEXPs in play by writing out in chunks -- as each chunk is written, the SEXPs become available for collection and are re-used. Here's my toy example
>
> time.R
> ======
>   splitIndices <- function (nx, ncl)
>   {
>       i <- seq_len(nx)
>       if (ncl == 0L)
>           list()
>       else if (ncl == 1L || nx == 1L)
>           list(i)
>       else {
>           fuzz <- min((nx - 1L)/1000, 0.4 * nx/ncl)
>           breaks <- seq(1 - fuzz, nx + fuzz, length = ncl + 1L)
>           structure(split(i, cut(i, breaks, labels=FALSE)), names = NULL)
>       }
>   }
>
>   x = as.character(seq_len(1e7)); y = sample(x)
>   if (!is.na(Sys.getenv("SPLIT", NA))) {
>       idx <- splitIndices(length(x), 20)
>       system.time(for (i in idx) paste(x[i], y[i], sep=":"))
>   } else {
>       system.time(paste(x, y, sep=":"))
>   }
>
> running under R-devel with
>
>   $ SPLIT=TRUE R --no-save --quiet -f time.R
>
> the relevant time is
>
>      user  system elapsed
>    15.320   0.064  15.381
>
> versus with
>
>   $ R --no-save --quiet -f time.R
>
> it is
>
>      user  system elapsed
>    95.360   0.164  95.511
>
> I think this is likely an overall strategy when dealing with character data -- processing in independent chunks of moderate (1M?) size (enabling as a consequence parallel evaluation in modest memory) that are sufficient to benefit from vectorization, but that do not entail allocation of large numbers of in-use SEXPs.
>
> Martin
>
>> Sorry for the confusion.
>> ~G
>>
>> On Wed, Aug 27, 2014 at 11:45 AM, Gabe Becker becke...@gene.com wrote:
>>> Martin and Val,
>>>
>>> I re-ran writeVcf on our (G)VCF data (34790518 ranges, 24 geno fields) with profiling enabled. The results of summaryRprof for that run are attached, though for a variety of reasons they are pretty misleading.
>>>
>>> It took over an hour to write (3700+ seconds), so it's definitely a bottleneck when the data get very large, even if it isn't for smaller data.
>>>
>>> Michael and I both think the culprit is all the pasting and cbinding that is going on, and more to the point, that memory for an internal representation to be written out is allocated at all. Streaming across the object, looping by rows and writing directly to file (e.g. from C) should be blisteringly fast in comparison.
>>>
>>> ~G
>>>
>>> On Tue, Aug 26, 2014 at 11:57 AM, Michael Lawrence micha...@gene.com wrote:
>>>> Gabe is still testing/profiling, but we'll send something randomized along eventually.
>>>>
>>>> On Tue, Aug 26, 2014 at 11:15 AM, Martin Morgan mtmor...@fhcrc.org wrote:
>>>>> I didn't see in the original thread a reproducible (simulated, I guess) example, to be explicit about what the problem is??
>>>>>
>>>>> Martin
>>>>>
>>>>> On 08/26/2014 10:47 AM, Michael Lawrence wrote:
>>>>>> My understanding is that the heap optimization provided marginal gains, and that we need to think harder about how to optimize all of the string manipulation in writeVcf. We either need to reduce it or reduce its overhead (i.e., the CHARSXP allocation). Gabe is doing more tests.
>>>>>>
>>>>>> On Tue, Aug 26, 2014 at 9:43 AM, Valerie Obenchain voben...@fhcrc.org wrote:
>>>>>>> Hi Gabe,
>>>>>>>
>>>>>>> Martin responded, and so did Michael,
>>>>>>>
>>>>>>> https://stat.ethz.ch/pipermail/bioc-devel/2014-August/006082.html
>>>>>>>
>>>>>>> It sounded like Michael was ok with working with/around heap initialization.
>>>>>>>
>>>>>>> Michael, is that right or should we still consider this on the table?
>>>>>>>
>>>>>>> Val
>>>>>>>
>>>>>>> On 08/26/2014 09:34 AM, Gabe Becker
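Martin's chunking idea and Gabe's suggestion of streaming straight to a file can be combined in a few lines of base R. What follows is only a minimal sketch under stated assumptions, not the writeVcf implementation: write_pasted_in_chunks() and its chunk_size argument are hypothetical names, and the 1e6 default simply echoes Martin's "moderate (1M?)" guess.

```r
# Hypothetical helper (not part of VariantAnnotation): paste two character
# vectors and write the result one chunk at a time, so that only one chunk
# of pasted strings needs to be alive at any moment.
write_pasted_in_chunks <- function(x, y, path, chunk_size = 1e6L, sep = ":") {
    stopifnot(length(x) == length(y), chunk_size >= 1L)
    n <- length(x)
    starts <- if (n > 0L) seq.int(1L, n, by = chunk_size) else integer(0)
    con <- file(path, open = "wt")
    on.exit(close(con))
    for (start in starts) {
        idx <- start:min(start + chunk_size - 1L, n)
        # The pasted strings can be collected once they have been written.
        writeLines(paste(x[idx], y[idx], sep = sep), con)
    }
    invisible(length(starts))  # number of chunks written
}

# Small usage example: 10 elements in chunks of 3.
x <- as.character(seq_len(10))
y <- rev(x)
out <- tempfile()
n_chunks <- write_pasted_in_chunks(x, y, out, chunk_size = 3L)
```

The file contents should match what an unchunked paste(x, y, sep=":") followed by writeLines() would produce; whether the timing benefit Martin measured carries over to a given data set would need to be checked separately.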
Re: [Bioc-devel] bug when coercing from list to SimpleList
Thanks, just a typo. Fixed in S4Vectors 0.2.1.
Re: [Bioc-devel] Please bump version number when committing changes
I am guilty of doing this today, but I have (I think) a good reason. I'm making a bunch of changes that are all related to each other, but are being implemented and tested in stages. I'd like to use svn to commit when I've made a set of changes that works, so I can roll back if I break something in the next step, but I'd like the users to see them all at once as a single version update. Perhaps others are doing something similar?

Stephanie

On 9/4/14, 12:04 PM, Dan Tenenbaum wrote:
> Hello,
>
> Looking through our svn logs, I see that there are many commits that are not accompanied by version bumps.
>
> All svn commits (or, if you are using the git-svn bridge, every group of commits included in a push) should include a version bump (that is, incrementing the z segment of the x.y.z version number). This practice is documented at
> http://www.bioconductor.org/developers/how-to/version-numbering/ .
>
> Failure to bump the version has two consequences:
>
> 1) Your changes will not propagate to our package repository or web site, so users installing your package via biocLite() will not receive the latest changes unless you bump the version.
>
> 2) Users *can* always get the current files of your package using Subversion, but if you've made changes without bumping the version number, it can be difficult to troubleshoot problems. If two people are looking at what appears to be the same version of a package, but it's behaving differently, it can be really frustrating to realize that the packages actually differ (but not by version number).
>
> So if you're not already, please get in the habit of bumping the version number with each set of changes you commit.
>
> Let us know on bioc-devel if you have any questions about this.
>
> Thanks,
> Dan
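For readers who want to script the "z" bump Dan describes, here is a minimal base-R sketch. It is an illustration only and not an official Bioconductor tool: bump_z() and bump_description() are hypothetical names, and the example assumes a plain three-part x.y.z version.

```r
# Hypothetical helper: increment the "z" segment of an x.y.z version string.
bump_z <- function(version) {
    parts <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]])
    stopifnot(length(parts) == 3L, !anyNA(parts))
    parts[3L] <- parts[3L] + 1L
    paste(parts, collapse = ".")
}

# Hypothetical helper: bump the Version field of a DESCRIPTION file in place.
# Note that read.dcf()/write.dcf() may re-wrap long fields.
bump_description <- function(path) {
    desc <- read.dcf(path)
    desc[1L, "Version"] <- bump_z(desc[1L, "Version"])
    write.dcf(desc, file = path)
    invisible(unname(desc[1L, "Version"]))
}

# Usage on a throwaway DESCRIPTION file.
desc_file <- tempfile()
writeLines(c("Package: mypkg", "Version: 1.2.3"), desc_file)
new_version <- bump_description(desc_file)

# Versions compare as versions, not as strings.
is_newer <- package_version(new_version) > package_version("1.2.3")
```

The segment is bumped numerically rather than as text, so 0.1.9 becomes 0.1.10 and not 0.1.0 or 0.2.0.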
Re: [Bioc-devel] Please bump version number when committing changes
----- Original Message -----
From: Stephanie M. Gogarten sdmor...@u.washington.edu
To: Dan Tenenbaum dtene...@fhcrc.org, bioc-devel bioc-devel@r-project.org
Sent: Friday, September 5, 2014 4:27:13 PM
Subject: Re: [Bioc-devel] Please bump version number when committing changes

> I am guilty of doing this today, but I have (I think) a good reason. I'm making a bunch of changes that are all related to each other, but are being implemented and tested in stages. I'd like to use svn to commit when I've made a set of changes that works, so I can roll back if I break something in the next step, but I'd like the users to see them all at once as a single version update. Perhaps others are doing something similar?

I understand the motivation, but this still results in an ambiguous state if two different people check out your package from svn at different times today (before and after your changes).

Version numbers are cheap, so if version 1.2.3 exists for a day before version 1.2.4 (which contains all the changes you want to push to your users) then that's ok, IMO.

Including a version bump doesn't impact whether or not you can roll back a commit with svn.

Dan
Re: [Bioc-devel] Please bump version number when committing changes
Dan,

If that is a hard BioC policy I'll endeavor to follow it (I do already in the vast majority of cases), but I must say it makes the Bioc repository much less useful from a development standpoint. There are lots of reasons to commit code that doesn't work and shouldn't yet be deployed, from portability between machines to simple preservation of work in progress.

What is the suggested behavior in the "under heavy development and not safe, but I don't want to lose days of work" case?

~G

--
Computational Biologist
Genentech Research
Re: [Bioc-devel] Please bump version number when committing changes
Hi all,

Just to throw in a suggestion here, I know that many people use a tool like git-svn in this kind of situation. They want the ability to make multiple small commits in order to save their progress, but they don't want those commits visible until they are ready to push all at once. This allows one to make breaking changes in one commit that are fixed by subsequent commits, because the intermediate states will never be exposed. For information on git-svn, see here:

http://git-scm.com/book/en/Git-and-Other-Systems-Git-and-Subversion

Note that I don't personally have any experience with svn or with git-svn, but this seems like exactly the use case for it.

-Ryan

On Fri 05 Sep 2014 04:50:49 PM PDT, Peter Haverty wrote:
> Hi all,
>
> I respectfully disagree. One should certainly check in each discrete unit of work. These will often not result in something that is ready to be used by someone else. Bumping the version number constitutes a new release and carries the implicit promise that the package works again. This is why continuous integration systems do a build when the version number changes. One should expect working software when installing a pre-built package (the tests passed, right?). Checking out from SVN is for developers of that package and nothing should be assumed about the current state of the code.
>
> To keep everyone happy, one could add a commit hook to our SVN setup that would add the SVN revision number to the version string. This would be for dev only and hopefully not sufficient to trigger a build.
>
> That's my two cents. Happy weekend all.
>
> Regards,
> Pete
>
> Peter M. Haverty, Ph.D.
> Genentech, Inc.
> phave...@gene.com
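Peter's commit-hook idea, quoted above, amounts to deriving a dev-only version string from the package version plus the Subversion revision. The sketch below shows just that string manipulation in base R; dev_version() is a hypothetical name, the revision number in the example is made up, and appending the revision as a fourth dot-separated component is one assumed convention (the thread does not say how the hook would format it or how it would obtain the revision).

```r
# Hypothetical helper: build a dev-only version string by appending the
# Subversion revision number as a fourth component of the version.
dev_version <- function(version, svn_revision) {
    stopifnot(length(svn_revision) == 1L, svn_revision >= 0)
    candidate <- paste(version, as.integer(svn_revision), sep = ".")
    # package_version() raises an error if the result is not a valid version.
    as.character(package_version(candidate))
}

# Usage with an invented revision number.
v <- dev_version("1.2.3", 94012)
```

Keeping the revision as a numeric component means the result is still something package_version() accepts.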
Re: [Bioc-devel] Please bump version number when committing changes
Hi All,

Git-svn is a nice workaround for the developer. As a user you don't want to be installing from version control in any case. Version control is a means for tracking changes, not for distributing software. Let the CI system protect you from needless drama.

Typed with thumbs.
Re: [Bioc-devel] Please bump version number when committing changes
On Fri, Sep 5, 2014 at 7:50 PM, Peter Haverty haverty.pe...@gene.com wrote:
> Hi all,
>
> I respectfully disagree. One should certainly check in each discrete unit of work. These will often not result in something that is ready to be used by someone else. Bumping the version number constitutes a new release and carries the implicit promise that the package works again. This is why

Here I would respectfully disagree. Code in the devel branch carries no guarantees. I think we have been pretty loose with respect to package version number bumping in devel branch; the svn tracking can be used to deal with isolation of code for rollbacks. In this informal regime the package version number is a simple marker of package state. I think it has served us pretty well in past years but the developer community was smaller and had fairly homogeneous habits.

Clearly there is room for more regimentation in this area but at the moment I agree with Dan that version numbers are cheap and should be bumped when new code is committed. And the recognition by all that a devel image may not work and may change fairly dramatically while in devel should be general; whether we need to alter that is open to question but I would think not.

> continuous integration systems do a build when the version number changes. One should expect working software when installing a pre-built package (the tests passed, right?). Checking out from SVN is for developers of that package and nothing should be assumed about the current state of the code.
>
> To keep everyone happy, one could add a commit hook to our SVN setup that would add the SVN revision number to the version string. This would be for dev only and hopefully not sufficient to trigger a build.
>
> That's my two cents. Happy weekend all.
>
> Regards,
> Pete
>
> Peter M. Haverty, Ph.D.
> Genentech, Inc.
> phave...@gene.com
Re: [Bioc-devel] Please bump version number when committing changes
As Pete and Ryan have pointed out, it seems that the version control system should somehow ease the burden of the developer here. Let's look at this from the GitHub perspective, since it is likely to be the primary hosting mechanism for the foreseeable future.

Just thinking out loud, if R could somehow dynamically ascertain the version of a package at build time, it could query the git checkout for a version. A simple algorithm that I have found effective in non-R projects is to consider git tags, which on GitHub equate to releases. If the repository state is *at* the tag, then use the tag as the version. If the state is ahead of the most recent tag, then use the tag + latest commit hash. I wonder if R could support this by allowing a path to an R script in the version field?

On Fri, Sep 5, 2014 at 6:27 PM, Vincent Carey st...@channing.harvard.edu wrote:

On Fri, Sep 5, 2014 at 7:50 PM, Peter Haverty haverty.pe...@gene.com wrote:

Hi all, I respectfully disagree. One should certainly check in each discrete unit of work. These will often not result in something that is ready to be used by someone else. Bumping the version number constitutes a new release and carries the implicit promise that the package works again. This is why

Here I would respectfully disagree. Code in the devel branch carries no guarantees. I think we have been pretty loose with respect to package version number bumping in devel branch; the svn tracking can be used to deal with isolation of code for rollbacks. In this informal regime the package version number is a simple marker of package state. I think it has served us pretty well in past years but the developer community was smaller and had fairly homogeneous habits. Clearly there is room for more regimentation in this area but at the moment I agree with Dan that version numbers are cheap and should be bumped when new code is committed. And the recognition by all that a devel image may not work and may change fairly dramatically while in devel should be general; whether we need to alter that is open to question but I would think not.

continuous integration systems do a build when the version number changes. One should expect working software when installing a pre-built package (the tests passed, right?). Checking out from SVN is for developers of that package and nothing should be assumed about the current state of the code.

To keep everyone happy, one could add a commit hook to our SVN setup that would add the SVN revision number to the version string. This would be for dev only and hopefully not sufficient to trigger a build.

That's my two cents. Happy weekend all.

Regards, Pete

Peter M. Haverty, Ph.D. Genentech, Inc. phave...@gene.com

On Fri, Sep 5, 2014 at 4:30 PM, Dan Tenenbaum dtene...@fhcrc.org wrote:

----- Original Message -----
From: Stephanie M. Gogarten sdmor...@u.washington.edu
To: Dan Tenenbaum dtene...@fhcrc.org, bioc-devel bioc-devel@r-project.org
Sent: Friday, September 5, 2014 4:27:13 PM
Subject: Re: [Bioc-devel] Please bump version number when committing changes

I am guilty of doing this today, but I have (I think) a good reason. I'm making a bunch of changes that are all related to each other, but are being implemented and tested in stages. I'd like to use svn to commit when I've made a set of changes that works, so I can roll back if I break something in the next step, but I'd like the users to see them all at once as a single version update. Perhaps others are doing something similar?

I understand the motivation but this still results in an ambiguous state if two different people check out your package from svn at different times today (before and after your changes). Version numbers are cheap, so if version 1.2.3 exists for a day before version 1.2.4 (which contains all the changes you want to push to your users) then that's ok, IMO. Including a version bump doesn't impact whether or not you can roll back a commit with svn.

Dan

Stephanie

On 9/4/14, 12:04 PM, Dan Tenenbaum wrote:

Hello, Looking through our svn logs, I see that there are many commits that are not accompanied by version bumps. All svn commits (or, if you are using the git-svn bridge, every group of commits included in a push) should include a version bump (that is, incrementing the z segment of the x.y.z version number). This practice is documented at http://www.bioconductor.org/developers/how-to/version-numbering/ .

Failure to bump the version has two consequences:

1) Your changes will not propagate to our package repository or web site, so users installing your package via biocLite() will not receive the latest changes unless you bump the version.
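The tag-based scheme Michael Lawrence describes at the top of this message (use the tag when the checkout is at a tag, otherwise the tag plus the latest commit hash) could be sketched roughly as follows. This is only an illustration in Python, not anything R or Bioconductor supports: the function names are hypothetical, the `+` separator is an assumption (the message does not say how tag and hash would be joined), and the input is assumed to be the output of `git describe --tags --long`, which prints `<tag>-<commits since tag>-g<abbreviated hash>`.

```python
import re
import subprocess

# Long form printed by `git describe --tags --long`:
#   <tag>-<number of commits since the tag>-g<abbreviated commit hash>
_DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<ahead>\d+)-g(?P<sha>[0-9a-f]+)$")


def version_from_describe(describe: str) -> str:
    """Map `git describe --tags --long` output to a version string.

    Hypothetical helper: at the tag -> the tag itself; ahead of the
    tag -> tag, a '+' separator (an assumption), and the commit hash.
    """
    match = _DESCRIBE_RE.match(describe.strip())
    if match is None:
        raise ValueError(f"unexpected describe output: {describe!r}")
    tag = match["tag"]
    ahead = int(match["ahead"])
    sha = match["sha"]
    if ahead == 0:
        # The checkout is exactly at the tag: use the tag as the version.
        return tag
    # The checkout is ahead of the most recent tag: tag + latest commit hash.
    return f"{tag}+{sha}"


def current_version(repo_dir: str = ".") -> str:
    """Ask git for the describe string of the checkout in repo_dir."""
    result = subprocess.run(
        ["git", "describe", "--tags", "--long"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return version_from_describe(result.stdout)
```

Note that a string such as `1.2.3+abc1234` would not be a valid value for the Version field of an R DESCRIPTION file as things stand, since that field is limited to integers separated by `.` or `-`; this is presumably why the proposal asks whether R itself could be extended.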
Re: [Bioc-devel] Please bump version number when committing changes
Before we go and invent all kinds of stuff, is this a real problem that we need to spend resources thinking about?

Dan's original post was about the possibility that two people who check out devel from svn may see the same version number but have different versions of the code. I acknowledge that this is theoretically possible. In the rare situation where this might matter, it would be better to compare svn revision numbers. And does this really happen with any frequency? I mean, the number of people who install packages from devel using svn must be very limited for a given package (perhaps I am different, but I only do it occasionally, and almost always for my own packages or if I depend on a package where I have identified an issue, the other author has fixed it and I need to test now and not tomorrow).

With the current build policy, as I understand it, two people each installing not from svn, but from the published tarball through biocLite, are guaranteed to have the same code if they have the same version.

The remaining issue is if one user installs from svn and one user from a tarball. But I think everyone who uses svn just needs to understand that this can happen. The affected users must be rather limited.

One version of the problem, which I can see being confusing, is if an author pushes a bug fix to svn, but does not bump DESCRIPTION. Then I could see some unfortunate discussion between the developer and a user, but that really comes down to a lack of understanding of the build system on the developer's part. While I am sure it happens, the solution in my opinion is better education for the developers about the build system.

Best, Kasper

On Fri, Sep 5, 2014 at 9:48 PM, Michael Lawrence lawrence.mich...@gene.com wrote:

As Pete and Ryan have pointed out, it seems that the version control system should somehow ease the burden of the developer here. Let's look at this from the GitHub perspective, since it is likely to be the primary hosting mechanism for the foreseeable future.
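For developers unsure what the bump in Dan's original post amounts to, it is just an increment of the z segment of the x.y.z Version field in DESCRIPTION. A minimal sketch in Python, assuming a plain three-part numeric version and a single `Version:` line; the helper names and the `demo` package in the usage note are hypothetical, and nothing here is part of the Bioconductor tooling:

```python
import re


def bump_z(version: str) -> str:
    """Increment the z segment of an x.y.z version string."""
    parts = version.strip().split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected an x.y.z version, got {version!r}")
    x, y, z = (int(p) for p in parts)
    return f"{x}.{y}.{z + 1}"


def bump_description(text: str) -> str:
    """Return DESCRIPTION text with the z segment of Version bumped."""

    def _replace(match: re.Match) -> str:
        # Keep the "Version:" label and its spacing, replace only the number.
        return match.group(1) + bump_z(match.group(2))

    new_text, count = re.subn(
        r"(?m)^(Version:[ \t]*)(\S+)[ \t]*$", _replace, text
    )
    if count != 1:
        raise ValueError("expected exactly one Version field")
    return new_text
```

For example, passing the text `"Package: demo\nVersion: 1.2.3\nTitle: Demo\n"` to `bump_description` changes only the Version line, to `Version: 1.2.4`. The edited DESCRIPTION would then be committed together with the code change.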
Re: [Bioc-devel] Please bump version number when committing changes
I'd add another scenario, which is that every night the build system builds whatever was checked in. This can cause extensive and confusing breakage in the build system. The build report does indicate the svn revision number and timestamp of the last commit, but one tends to look at the version number.

As Kasper points out, though, packages that haven't been bumped do not propagate, and neither do packages that fail to build or check as a result. So the confusion is limited to the build system.

My initial email was motivated by noticing that several people had made obvious fixes without bumping versions, clearly not understanding that they needed to do so. In our discussion of the finer points let's not lose sight of this.

Dan

On September 5, 2014 7:06:03 PM PDT, Kasper Daniel Hansen kasperdanielhan...@gmail.com wrote:

Before we go and invent all kinds of stuff, is this a real problem that we need to spend resources thinking about?

Dan's original post was about the possibility that two people who check out devel from svn may see the same version number but have different versions of the code. I acknowledge that this is theoretically possible. In the rare situation where this might matter, it would be better to compare svn revision numbers. And does this really happen with any frequency? I mean, the number of people who install packages from devel using svn must be very limited for a given package (perhaps I am different, but I only do it occasionally, and almost always for my own packages or if I depend on a package where I have identified an issue, the other author has fixed it and I need to test now and not tomorrow).

With the current build policy, as I understand it, two people each installing not from svn, but from the published tarball through biocLite, are guaranteed to have the same code if they have the same version.

The remaining issue is if one user installs from svn and one user from a tarball. But I think everyone who uses svn just needs to understand that this can happen.