Hi Florian,

There are at least two approaches. You are on the right track with making a union of all gene locations. The compound location that results from the union will contain every coding nucleotide. You can then iterate through each nucleotide in the genome and check whether the union contains it; if it doesn't, that nucleotide is non-coding. This is surprisingly fast because each check is a simple comparison. The pseudocode would be something like this:
    RichLocation coding; // initialize this by taking the union of the locations of all CDS or gene features
    RichSequence genome; // read from a file or database
    for (int i = 1; i <= genome.length(); i++) {
        // you might need to be a bit more sophisticated for a circular genome
        if (!coding.contains(i)) {
            // position i is a non-coding nucleotide
        }
    }

The other approach is to use the blockIterator() method of the compound location that results from the union of the coding sequences. It returns each contiguous block of coding sequence in turn. If you know the length of the sequence, you can then quickly work out the intervening pieces. For example, if the block iterator tells you that [10..50], [90..100] and [350..380] are coding and you know the genome is 400 nucleotides long, then [1..9], [51..89], [101..349] and [381..400] are non-coding. Again, this is more complicated for circular sequences, and more complex still if you consider the opposite strand of a gene (the gene shadow) to be non-coding.

Unfortunately there is no convenience method to do this, but if you code something up it would be great to put it in the cookbook so others can reuse it.

- Mark

You could actually make point locations of all the non-coding nucleotides and then merge the whole lot at the end into a compound location of non-coding sequence.

On Wed, Apr 23, 2008 at 9:49 PM, Florian Schatz <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I am new to BioJava and have worked a lot with it in the last few weeks. I hope
> this is the right place for questions; if not, please tell me.
>
> I want to get the nucleotide sequence outside the genes of a GenBank file,
> so everything that is not marked by a 'gene' feature. Unfortunately, there
> is no subtract or exclude function for the Location class. Any hints?
>
> Btw: union() of Location worked fine for extracting the nucleotides of the genes
> only.
>
> Best,
> Florian
> _______________________________________________
> Biojava-l mailing list - Biojava-l@lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
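Since there is no convenience method for the block-iterator approach, here is a minimal sketch of the gap derivation, written as plain Java so it runs without BioJava. It assumes the coding blocks are 1-based inclusive [min..max] pairs sorted by start position. In BioJava you would presumably collect them by walking blockIterator() on the union location and reading each block's getMin() and getMax(). It also assumes a linear genome and ignores strand, so gene shadows are not handled. The class and method names (NonCodingGaps, nonCodingGaps, format) are hypothetical and not part of BioJava.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NonCodingGaps {

    /**
     * Returns the non-coding intervals between sorted, 1-based inclusive
     * coding blocks. Assumes a linear genome of the given length; blocks
     * must be sorted by start, but may overlap or nest.
     */
    public static List<int[]> nonCodingGaps(List<int[]> codingBlocks, int genomeLength) {
        List<int[]> gaps = new ArrayList<>();
        int next = 1; // first position not yet covered by a coding block
        for (int[] block : codingBlocks) {
            int min = block[0];
            int max = block[1];
            if (min > next) {
                // everything between the previous block and this one is non-coding
                gaps.add(new int[] {next, min - 1});
            }
            // Math.max keeps the cursor from moving backwards on nested blocks
            next = Math.max(next, max + 1);
        }
        if (next <= genomeLength) {
            // trailing non-coding region after the last coding block
            gaps.add(new int[] {next, genomeLength});
        }
        return gaps;
    }

    /** Formats ranges as "[a..b], [c..d]" for printing. */
    public static String format(List<int[]> ranges) {
        StringBuilder sb = new StringBuilder();
        for (int[] r : ranges) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append('[').append(r[0]).append("..").append(r[1]).append(']');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<int[]> coding = Arrays.asList(
                new int[] {10, 50}, new int[] {90, 100}, new int[] {350, 380});
        System.out.println(format(nonCodingGaps(coding, 400)));
        // prints [1..9], [51..89], [101..349], [381..400]
    }
}
```

Run on Mark's example (coding blocks [10..50], [90..100] and [350..380] in a 400 nt genome), it reproduces the four non-coding ranges he lists. Each resulting pair could then be turned back into a range location and used to extract the subsequence.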