Hi all,

I've just committed some new code to the biojava3 branch of the biojava-live subversion repository. It's the foundations of a brand new alphabet+symbol set of classes, and an example of how to use them to represent DNA. You'll notice that the new code is very lightweight and allows for a lot more flexibility than the old code - for instance, the concept of Alphabet has changed radically. It also makes much more extensive use of the Collections API.
I haven't got any test cases or usage examples yet, but give me a shout if you don't understand the code and I'll explain how it works. (Hint: SymbolFormat is there to convert Strings into SymbolList objects, and vice versa.)

So, now we want some volunteers! We're starting from scratch here, so there's a lot of work to do. The whole of BioJava needs 'translating' into BJ3, whether by copying and pasting existing classes and modifying them to suit the new style, or by writing completely new ones to provide equivalent functionality. I'll post an example of how to do file parsing soon, probably starting with FASTA.

In the meantime, a good place to start would be for people to design object models to represent their favourite data types (e.g. Genbank, or microarray data). Utility classes to manipulate those objects would be great too. The object models need to be normalised as much as possible - e.g. if your data has a lot of comments, and the order of those comments is important, then give your object model a collection of comment objects. The object model for each data type should be completely independent and use basic data types wherever possible (e.g. store sequences as strings, don't attempt to parse them into anything fancy like SymbolLists). The closer the object model is to the original data format, the better. There are going to be clever tricks when it comes to converting data between different object models (e.g. Genbank to INSDSeq), which I will explain later when I put the file parsing examples up.

You'll notice that the biojava3 branch uses Maven instead of Ant. This is because we want to make it as modular as possible, so if you want to write microarray stuff, create a new microarray sub-project (as per the dna example that's already there). This way, if someone only wants the microarray bit of BJ3, they only need to install the appropriate JAR file and can ignore the rest.
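To make the object-model advice above a bit more concrete, here is a rough sketch of what such a class might look like. It is not code from the biojava3 branch: the class name GenbankRecord, the nested Comment class, the field names and the accessor methods are all made up for illustration. It only tries to show the three points made above: an ordered collection of comment objects, the sequence kept as a plain String, and no dependency on anything outside the JDK.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical object model for a Genbank-style record.
 * Illustration only: none of these names exist in the biojava3 branch.
 */
final class GenbankRecord {

    /** One comment from the record, kept as its own object. */
    static final class Comment {
        private final String text;

        Comment(String text) {
            this.text = text;
        }

        String getText() {
            return text;
        }
    }

    // Basic data types only: the sequence stays a String and is
    // not parsed into a SymbolList or anything similar.
    private final String accession;
    private final String sequence;

    // A List preserves insertion order, so the comments keep the
    // order in which they appeared in the original data.
    private final List<Comment> comments = new ArrayList<Comment>();

    GenbankRecord(String accession, String sequence) {
        this.accession = accession;
        this.sequence = sequence;
    }

    String getAccession() {
        return accession;
    }

    String getSequence() {
        return sequence;
    }

    void addComment(Comment comment) {
        comments.add(comment);
    }

    /** Read-only view of the comments, in their original order. */
    List<Comment> getComments() {
        return Collections.unmodifiableList(comments);
    }
}
```

A parser would create one of these per record and call addComment() once for each comment it meets, in file order. Anything cleverer (e.g. turning the sequence String into a SymbolList) would be left to separate code rather than built into the model.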
(The 'core' module is for stuff that is so generic it could be used anywhere, or is used in every single other module.)

If coding isn't your cup of tea, then we would very much welcome testers (particularly those who enjoy writing test cases!), documenters (particularly code commenters), translators (for internationalisation of the code), and of course all those who wish to contribute ideas and suggestions no matter how off-the-wall they might be. In particular if you'd like to take charge of an area of the development process, e.g. Documentation Chief, or Protein Champion, then that would be much appreciated.

I'm very much looking forward to working with everyone on this. Good luck, and happy coding!

cheers,
Richard

PS. Please don't forget to attach the appropriate licence to your code. You can copy-and-paste it from the existing classes I just committed this evening.

PPS. For those who are worried about backwards compatibility - this was discussed on the lists a while back and it was made clear that BJ3 is a clean break. However, the existing code will continue to be maintained and bugfixed for a couple of years so you don't have to upgrade if you don't want to - it just won't have any new features developed for it. This is largely because it'll probably take just that long to write all the new BJ3 code. When we do decide to desupport the existing BJ code, plenty of notice will be given (i.e. years as opposed to months).

--
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: [EMAIL PROTECTED]
http://www.eaglegenomics.com/

_______________________________________________
Biojava-l mailing list - Biojava-l@lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-l