On Wed, Mar 25, 2009 at 5:07 PM, Charles Plessy <ple...@debian.org> wrote:
> In the application to be posted in the SoC web interface
> (http://socghop.appspot.com), please present yourself as you did in this
> email, and explain briefly in your own words why you think that the project
> you selected is interesting. Then develop your ideas on how to achieve it, and
> conclude by explaining why and how you think that you will manage to end up
> your project with something concrete. Ideally, be precise and concise.
Thanks for the help! I just submitted my application. Below you'll find the content. Does it look OK?

Title: Large dataset manager
Student: Roy Flemming Hvaara
Abstract: Large public datasets, like databases for bioinformatics, are typically too big and too volatile to fit the traditional source/binary packaging scheme of Debian. Some programs distributed in Debian, like blast and emboss, can index specialised databases, but Debian lacks a tool to install or update the datasets they need and keep their indexes in sync.

Content:
Name: Roy Flemming Hvaara
Email: roy.hva...@gmail.com

Background: I'm 21 years old and from Norway, but I study medicine at Pécs University in Hungary. I've been developing projects in bash, Perl and PHP for some years, and Linux has been my main operating system for about six years.

Project title: Large database manager

Synopsis: I want to create a tool to install and update large databases. Initially, the application will download the databases and their updates directly from the content provider. Later I will add an option to download from a mirror, fetching only the updates. That way a user would not have to download the whole database again when the provider does not publish its updates as separate files, so less data has to be downloaded and time and resources are saved.

I also want to include a tool to package the databases in the Debian software package format (.deb), and possibly for other Linux distributions as well; RPM support is something I definitely want to add. This has multiple benefits:
1) There will be one file that the user can move between multiple computers.
2) The user only has to download the database once.
3) The files are managed by the distribution's package manager, keeping the system more streamlined.
4) Updating the databases is less of a hassle.
5) It becomes easier to keep track of multiple databases and/or multiple versions of them.

To me the most logical approach is to write this tool in Perl.
See getData [1].

I have been in contact with a web host that has agreed to host the packages. I hope to distribute the Debian packages (and others) through my own repository, compatible with APT and similar tools.

Benefits to Debian: At present there is no good way to distribute very large datasets in Debian. This project will help towards solving this issue.

Deliverables: Management of large datasets.

Project schedule: I think the schedule depends on how many features I end up implementing. Package management is an ongoing process that always requires some work, and I am committed to continuing the work after the summer to provide support.

Exams and other commitments: My exam period starts on the 18th of May. I will work very hard to finish my exams as soon as possible so that I can start work on this project.

If you are not a Debian Developer: I have always wanted to be a package maintainer. The dgen package in debian-games appealed to me when it was orphaned, but as a medical student I have very limited time. I will definitely continue development on this project after the summer. I think it's very interesting, and bioinformatics is something that I would love to work more with.

[1] http://wiki.debian.org/getData

--
Best Regards,
Roy Hvaara