Re: [Bioc-devel] differences between petty and perceval (OS X 10.6.8 build machines for release/devel)

2014-06-16 Thread Michael Stadler
Dear Dan, Martin and Nate,

Thank you for looking into it. I guess that is pointing to a problem
within bowtie.

It looks like the EXC_BAD_ACCESS you see on petty in ebwt.h is not
reproducible on the other Mac or Linux machines we tried. Is it possible
to run valgrind on petty? That might confirm or rule out whether the
memory (de-)allocation issues reported on Linux are related.

I would like to submit a bug report to the bowtie developers, but am
reluctant to do that without being able to reproduce the problem or test
potential fixes. I would have the option of going through Rbowtie build
cycles, but would have to rely on the assumption that petty will keep
hitting this hiccup even with modified bowtie code. The minor
differences between bowtie 1.0.1 and bowtie 1.0.1-bug-312 argue against
that.

I am tempted to stay with the current situation:
  - OS X before 10.9 needs to use Rbowtie = 1.4.4
(based on bowtie 1.0.1)
  - OS X 10.9 onwards and everything else uses Rbowtie = 1.4.5
(based on bowtie 1.0.1 /patched bugs-312).

Thanks again for your efforts,
Michael


On 14.06.2014 01:31, Dan Tenenbaum wrote:
 Hi Michael,
 
 
 
 - Original Message -
 From: Michael Stadler michael.stad...@fmi.ch
 To: Dan Tenenbaum dtene...@fhcrc.org, bioc-devel@r-project.org
 Sent: Friday, June 13, 2014 12:32:52 AM
 Subject: differences between petty and perceval (OS X 10.6.8 build machines 
 for release/devel)

 Hi Dan,

 I'm cc'ing the list; maybe somebody else has experienced differences
 between petty and perceval.

 Rbowtie release (1.4.5) is not building under OS X 10.6.8 (petty).

 Rbowtie release (1.4.5) and development (1.5.5) are virtually identical
 (only DESCRIPTION and NEWS differ).

 The development version builds without problems on perceval, but the
 release version fails on petty:
 http://bioconductor.org/checkResults/devel/bioc-LATEST/Rbowtie/perceval-buildsrc.html
 http://bioconductor.org/checkResults/release/bioc-LATEST/Rbowtie/petty-buildsrc.html

 The only difference I can make out from the node info pages is that
 perceval has an additional section on the C++11 compiler that is lacking
 from petty's NodeInfo page.

 Unfortunately, I cannot reproduce the issue, both Rbowtie 1.4.5 and
 1.5.5 build successfully under OS X 10.6.8 and 10.7.5 using
 llvm-gcc-4.2.

 Do you have any idea what else could be different between petty and
 perceval?
 
 Martin and Nate and I took a look at this. I managed to come up with a bowtie 
 command line that would reliably reproduce the segfault on petty.
 
 Then we ran that under gdb (and turned off compiler optimizations) and came 
 up with this, which may or may not help you:
 
 petty:vignettes biocbuild$ gdb --args 
 '/Library/Frameworks/R.framework/Versions/3.1/Resources/library/Rbowtie/bowtie'
  -y -S -k 10 -m 10 -v 2 -r -p 4 --best --strata 'doit/refsIndex/index' 
 'doit/SpliceMapTemp_876c378e20ac/25mers.map' 
 'doit/SpliceMapTemp_876c378e20ac/25mers.map_unsorted' 
 GNU gdb 6.3.50-20050815 (Apple version gdb-1708) (Mon Aug 15 16:03:10 UTC 2011)
 Copyright 2004 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you are
 welcome to change it and/or distribute copies of it under certain conditions.
 Type "show copying" to see the conditions.
 There is absolutely no warranty for GDB.  Type "show warranty" for details.
 This GDB was configured as "x86_64-apple-darwin"...Reading symbols for shared 
 libraries ... done
 
 (gdb) run
 Starting program: 
 /Library/Frameworks/R.framework/Versions/3.1/Resources/library/Rbowtie/bowtie 
 -y -S -k 10 -m 10 -v 2 -r -p 4 --best --strata doit/refsIndex/index 
 doit/SpliceMapTemp_876c378e20ac/25mers.map 
 doit/SpliceMapTemp_876c378e20ac/25mers.map_unsorted
 Reading symbols for shared libraries ++. done
 
 Program received signal EXC_BAD_ACCESS, Could not access memory.
 Reason: KERN_INVALID_ADDRESS at address: 0x23d0d92d
 [Switching to process 36144 thread 0x20f]
 0x000478b1 in Ebwt<seqan::String<seqan::SimpleType<unsigned char, 
 seqan::_Dna>, seqan::Alloc<void> > >::rowL (this=0xbfffda10, l=@0xa300e14) at 
 ebwt.h:1816
 1816  return unpack_2b_from_8b(l.side(this->_ebwt)[l._by], l._bp);
 (gdb) l
 1811  inline int Ebwt<TStr>::rowL(const SideLocus& l) const {
 1812  // Extract and return appropriate bit-pair
 1813  #ifdef SIXTY4_FORMAT
 1814  return (((uint64_t*)l.side(this->_ebwt))[l._by >> 3] >> ((((l._by & 7) << 2) + l._bp) << 1)) & 3;
 1815  #else
 1816  return unpack_2b_from_8b(l.side(this->_ebwt)[l._by], l._bp);
 1817  #endif
 1818  }
 1819
 1820  /**
 (gdb) p this ->_ebwt
 $1 = (uint8_t *) 0x4804a00 "\b2"
 (gdb) p *this ->_ebwt
 $2 = 8 '\b'
 (gdb) p l._by
 $3 = 45
 (gdb) p l.side 
 $4 = SideLocus::side(unsigned char const*) const
 (gdb) p l.side(this->_ebwt)
 $5 = (uint8_t *) 0x23d0d900 <Address 0x23d0d900 out of bounds>
 (gdb) p l.side(this->_ebwt)[l._by]
 Cannot access memory at address 0x23d0d92d
 (gdb) p this ->_ebwt
 $6 = (uint8_t *) 0x4804a00 "\b2"
 (gdb) 
 
 Running 

[Bioc-devel] question about affy::plotLocation

2014-06-16 Thread Kristóf Jakab

Dear BiocDevelR!

I work a lot with the excellent *affy package* by Rafael A. Irizarry, 
and I find it very useful.


I have had a somewhat strange experience with its *plotLocation function*.
It seems *I have to mirror the Y coordinates* to plot properly.
Perhaps this is because CEL file reading starts from the top, while 
plotting starts from the bottom.


I'm not sure if I'm right; can you check that I haven't made a mistake?
If I am right, I suggest a (simple) solution for this.

I attach two plots made from a GEO GSM CEL file (see script).
In the first, I plotted all gene names (probe sets) on the CEL file image; 
in the second, I plotted them after mirroring the Y coordinates.
As you can see in the raw plot, there are points on the chip name 
(printed by BioB spots).


I also attach my plotting script and a potential correction for 
affy::plotLocation. (I've tried it, and it seems to work.)


Yours sincerely,
Kristóf Jakab

I've linked 2 files to this email:
geo_testing_spot_locations_mirrored.png (6.0 MB)
https://www.box.com/shared/ow3q5sn3fpmyz3u8w533
geo_testing_spot_locations_raw.png (6.1 MB)
https://www.box.com/shared/3sj9i3lpkixkq85qar0r


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] question about affy::plotLocation - scripts

2014-06-16 Thread Kristóf Jakab
It seems I can't send attachments, so I am copying the code here.


test_plotLocation_affy.R

#!/usr/bin/env Rscript
#kristof.ja...@hegelab.org

# MAKE AFFYBATCH
#--
# download CEL file
library(GEOquery)
getGEOSuppFiles("GSM229005")

#--
# read CEL file
library(affy)
geoS <- ReadAffy(filenames=paste("GSM229005","GSM229005.CEL.gz", sep="/"))

# PLOTTING TO PNG
#--
# raw
png(filename="geo_testing_spot_locations_raw.png",height=744*10,width=744*10,res=1200)

## image (log scale intensities)
image(geoS,transfo=log)
## perfect matches
l <- indexProbes(geoS, which="pm", geneNames(geoS))
lapply(l,function(li){
   xy <- indices2xy(li, abatch=geoS)
   plotLocation(xy,col="tomato",pch=18,cex=0.075)
})
## mismatches
l <- indexProbes(geoS, which="mm", geneNames(geoS))
lapply(l,function(li){
   xy <- indices2xy(li, abatch=geoS)
   plotLocation(xy,col="aquamarine",pch=18,cex=0.075)
})
dev.off()

#--
# mirrored
png(filename="geo_testing_spot_locations_mirrored.png",height=744*10,width=744*10,res=1200)

## image (log scale intensities)
image(geoS,transfo=log)
## perfect matches
l <- indexProbes(geoS, which="pm", geneNames(geoS))
lapply(l,function(li){
   xy <- indices2xy(li, abatch=geoS)
   xy <- cbind(x=xy[,1],y=(743-xy[,2])) # mirroring
   plotLocation(xy,col="tomato",pch=18,cex=0.075)
})
## mismatches
l <- indexProbes(geoS, which="mm", geneNames(geoS))
lapply(l,function(li){
   xy <- indices2xy(li, abatch=geoS)
   xy <- cbind(x=xy[,1],y=(743-xy[,2])) # mirroring
   plotLocation(xy,col="aquamarine",pch=18,cex=0.075)
})
dev.off()


correction_for_plotLocation.R

plotLocation <- function(x, col="green", pch=22, ...) {
   if (is.list(x)) {
      x <- cbind(unlist(lapply(x, function(x) x[,1])),
                 unlist(lapply(x, function(x) x[,2])))
   }
   # mirroring a 744×744 matrix, numbered from 0 to 743
   points(x[,1], 743-x[,2], pch=pch, col=col, ...)
}




Re: [Bioc-devel] question about affy::plotLocation - scripts

2014-06-16 Thread James W. MacDonald

Hi Kristóf,

On 6/16/2014 10:20 AM, Kristóf Jakab wrote:

It seems I can't send attachments, so I am copying the code here.


correction_for_plotLocation.R

plotLocation <- function(x, col="green", pch=22, ...) {
   if (is.list(x)) {
      x <- cbind(unlist(lapply(x, function(x) x[,1])),
                 unlist(lapply(x, function(x) x[,2])))
   }
   # mirroring a 744×744 matrix, numbered from 0 to 743
   points(x[,1], 743-x[,2], pch=pch, col=col, ...)
}


Thanks for pointing this out. It's apparent almost nobody ever uses this 
code, as it has been in the affy package since pretty much the beginning 
(2002), and you are the first to notice this.


Unfortunately, hard-coding the number of rows isn't the answer, since 
Affy arrays have different dimensions. Probably the best fix is to add 
an additional required argument 'affybatch' that we can use to extract 
the chip dimensions from.
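
The arithmetic behind that suggestion can be sketched on its own. The snippet below is only an illustration, written in C to keep the index handling explicit; `mirror_row` and its parameters are hypothetical names, not part of affy, and it assumes zero-based row indices as in the 744×744 example (rows 0 to 743). The actual fix would live in the R function and take the row count from the AffyBatch.

```c
#include <assert.h>

/* Hypothetical helper (not part of affy): flip a zero-based row index
 * on a chip with nrows rows, so that row 0 maps to row nrows - 1 and
 * row nrows - 1 maps back to row 0. */
int mirror_row(int y, int nrows)
{
    /* the index must lie on the chip */
    assert(nrows > 0 && y >= 0 && y < nrows);
    return (nrows - 1) - y;
}
```

With nrows = 744 this reduces to the hard-coded 743 - y; for a chip with a different number of rows the same expression applies unchanged.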


Best,

Jim








--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099



[Bioc-devel] filterVcf: why require a filter?

2014-06-16 Thread Michael Lawrence
Hi,

I was trying to use filterVcf just to filter a VCF by a range, via 'which'
in ScanVcfParam, without any filters, but it failed with:

Error in filterVcf(tbx, genome = genome, destination = destination, ...,
(from #2) :
  no 'prefilters' or 'filters' specified

Why not allow identity, i.e., where the filter is inherent in the
restricted query?

Michael



Re: [Bioc-devel] evaluation of C post-increments changed in GCC 4.8.2

2014-06-16 Thread Robert Castelo

hi Nathaniel cc Dan,

thanks a lot for clearing up completely the entire story. I'm afraid 
that one or two cycles ago of our conversation i did a simple reply 
instead of a reply-all and the bioc-devel list wasn't included anymore 
in the recipients of these emails.


since what you say below sounds like a relevant piece of information for 
anyone working with C code i'm cc'ing the bioc-devel list again.


cheers,
robert.

On 6/16/14 11:15 PM, Nathaniel Hayden wrote:
Hi, Robert. You are correct. zin2 and petty failed to emit warnings 
for the problematic code. After some digging we discovered that for 
gcc, any optimization level above 0 prevents emission of the 
-Wsequence-point warning in this case. But the optimizations must stay 
for production code.


As a follow-up to the recommendations before about flags to use during 
package development, we have added content to the Package Guidelines 
page on our website: 
http://www.bioconductor.org/developers/package-guidelines/#c-code


The failure of some build machines to emit the warning under 
production conditions underscores the importance of the original 
recommendation to enable as many warnings as possible during development.


Thanks for bringing it up!
Nate

On Mon 16 Jun 2014 07:42:36 AM PDT, Robert Castelo wrote:


hi Nathaniel,

On 06/14/2014 01:01 AM, Nathaniel Hayden wrote:


Hi, Robert. You're welcome.

It sounds like something isn't happening, but you think it
should. Could you be more precise about what you expect to happen (the
conditions that *should* lead to the warning, but do not)? There are
lots of variables floating around:
- devel or release? (I see similar commits to devel and release so
unclear which I should look at; current devel version looks like it
fails before it has a chance to give the warning.)



yes, this was an unrelated error, which actually Dan warned me about
and for which i sent a fix yesterday. the situation i was describing
was occurring in both, devel and release, but both are fixed by now.



- it sounds like you're talking about a Mavericks machine in the Bioc
build system; can you confirm which one?

Both the devel and release Mavericks build machines use clang, and
both linux machines (zin1/zin2) use gcc with -Wall.



so, for instance, the release version from VariantFiltering 1.0.1 was
giving these warnings i was talking about:

Found the following significant warnings:
methods-WeightMatrix.c:256:19: warning: unsequenced modification and
access to 'q' [-Wunsequenced]
methods-WeightMatrix.c:638:17: warning: unsequenced modification and
access to 'q' [-Wunsequenced]

*only* in the R CMD check from 'morelia' and not from 'petty' or
'zin2', while all three machines in principle have the -Wall option
activated.

currently, because i submitted the fix, version 1.0.2 does not give
these warnings anymore. however, i have just committed a new version
to the release branch, 1.0.3, that has this problem back in line 256:

while ((*q++=tolower(*q)));

and should recreate the odd situation i saw, that only 'morelia' warns
about this line, but not 'petty' or 'zin2'.
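
For reference, the expression above both modifies q (through q++) and reads it (through *q) with no sequencing between the two, which is what clang's -Wunsequenced and gcc's -Wsequence-point report. A minimal sketch of one well-defined alternative follows, assuming the intent of that line is to lowercase a NUL-terminated string in place; the function name `lowercase_in_place` is hypothetical and not taken from VariantFiltering.

```c
#include <ctype.h>

/* Hypothetical replacement for the unsequenced loop: lowercase a
 * NUL-terminated string in place. Each iteration reads *q, stores the
 * lowercased character, and only afterwards advances q, so q is never
 * modified and read within the same unsequenced expression. */
void lowercase_in_place(char *q)
{
    for (; *q != '\0'; ++q) {
        /* tolower() expects a value representable as unsigned char */
        *q = (char) tolower((unsigned char) *q);
    }
}
```

Compiling with -Wall at optimization level 0 during development, as discussed in this thread, is one way to check that no sequence-point warning remains.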


thanks!
robert.





Thanks,
Nate

On 06/13/2014 12:54 AM, Robert Castelo wrote:


hi Nathaniel,

thanks for the very clear examples. after all, probably it is just my
package which may have this problem. one further question below..

On 06/12/2014 07:12 PM, Nathaniel Hayden wrote:


Hi, Robert. C++ is my area so I can't speak as knowledgeably about C,


[...]


I confirm that using your test file gcc 4.6.3 indeed warns about
unsequenced shenanigans with -Wall 'warning: operation on ‘p’ may be
undefined [-Wsequence-point]'. I would add it's also a good idea during
the development cycle to use -Wextra and -pedantic flags. (You can read
about them here:
http://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html)



the strange thing is that the only machine in the build pipeline
of BioC that warned about this in my package was the one running Mac
OSX Mavericks with gcc 4.8.2 and not also the Linux zin2 which is
running gcc 4.6.3

you can see it if the 1.0.1 version of VariantFiltering is still in
the check report.

anyway, i'll use those options during development and that should
help me avoid this kind of problem in the future.

thanks!
robert.








