Re: [DAS] DAS projects/Future of DAS

Jonathan Warren Thu, 24 Nov 2011 09:20:32 -0800

Hi Manuel

Thanks again for the input ;)

I agree with Andy that a generic DAS upload/server is going to be inherently complicated and will be limited by the upload across the internet. To a large degree the DAS system came about to stop large amounts of data needing to be uploaded or downloaded over the net. However, I can see that having DAS "nodes" such as the EBI and Sanger etc. with tailor-made upload user interfaces and servers for specific data types is a reasonable solution/addition to the DAS system. To this end, I'm already more than halfway to a solution for your needs (which doesn't need a database) which we could use, subject to Sanger approval. This solution could also be made more generic for other data sources as well, with a specific interface developed for each.

As to the future of DAS - I believe that over the last 3 years we have improved the DAS protocol and many of the associated implementations so that we have now dispensed with many if not all of the previous criticisms people have had of the DAS system (1.6E spec):

Validation has improved the level of conformity to the DAS spec, and new servers (MyDAS and ProServer) now behave in the same way for both requests and responses.
You can now search DAS sources (MyDAS).
There is a "next feature" capability (in the spec and ProServer).
You can have alternative content that will require less bandwidth, e.g. JSON (the Registry serves this already, and hopefully MyDAS servers and ProServer will soon).
We have writeback servers (MyDAS): already implemented at the EBI for proteins, and an example server that will accept POSTs and PUTs for genomic sources will be available soon at the Sanger.

Really we NEED the community to come together and put data up using these new servers, and for major clients such as Ensembl to start supporting 1.6 spec servers and the newer "Extended" features. The Dalliance browser by Thomas Down proves how fast a DAS client can be. I believe there is a lot more potential for the DAS system and it's still a good solution for today's data distribution needs.


Cheers

Jonathan.



On 19 Nov 2011, at 15:58, Manuel Corpas wrote:

Having said all this, I am a little confused about what you are trying to achieve. In your first mail you said you wanted to create sources via an API, in the second you say you want to do it via a click. Obviously the requirements for both are very different.
Both API and 1-click DAS source creation would be extremely helpful in
my view. The fact that this functionality is not available is
seriously affecting many of our users' ability to create DAS sources
with their 23andMe genotypes.

The fact that these facilities do not exist has stopped new potential
users from utilizing DAS. If DAS is truly going to survive as a
standard, automatic creation of data sources needs to be easier.

Manuel


Manuel Corpas, PhD
Tel:      +44.122349.2372
Web:    http://manuelcorpas.com/about/
Twitter: @manuelcorpas
On 18 November 2011 23:33, Andy Jenkinson <[email protected]> wrote:
Hi Manuel,
It would be nice to be able to create a DAS source from any type of data you happen to have with a click or two, but I don't think it is realistic. Even in this email you have just told me what all the columns mean, what the assembly is, what kind of file it is. Any application would need to know the same things (and more).
That is not to say that it is difficult to build something to let you do this if it is specifically designed for the exact type of data you are using, just that it does not already exist and so you have to actually create it. Either MyDAS or ProServer would seem to offer you a starting point to do that, but only a starting point. EasyDAS is the closest thing to what you want but obviously it has to cater for any type of data so has to ask you a lot more questions. Its web-based architecture obviously limits the size of data files you can process quickly too, but that is the trade-off you make by not needing an Internet-visible web server of your own to run a DAS server from. I daresay if you wanted to create something that an individual can use to make a DAS source from their personal BED/VCF file then it would have to be web based, will always be restricted by the speed of the Internet, but the interface could be much simpler than EasyDAS and a database might not be needed (EasyDAS loads file contents into a database to standardise them, which slows things down).
Having said all this, I am a little confused about what you are trying to achieve. In your first mail you said you wanted to create sources via an API, in the second you say you want to do it via a click. Obviously the requirements for both are very different.
Cheers,
Andy

On 18 Nov 2011, at 15:24, Manuel Corpas wrote:
Hi Andy,
thanks for the info. Having a bed DAS adaptor is part of the problem,
the other is not having to worry about having to deal with the DAS
server directly. easyDAS manages to do this but unfortunately it is
not obvious for people who do not know DAS how to operate it. Also, if
the file is very big and the connection slow it can take up to an hour
to create a DAS source.
Wouldn't it be nice to create a DAS source just with one click or two?
Please see below a snippet of a few SNPs in my chromosome 16 just as
you would get them from 23andMe (NCBI36 assembly; columns mean
SNP_id/chr/position/genotype).

Cheers,
Manuel

rs7763        16      544555  TT
rs763158      16      546105  GG
rs7190878     16      549131  AG
rs4984890     16      552699  CT
rs710925      16      573355  AG
rs2017567     16      577213  CT
rs4144003     16      585969  CT
rs7190358     16      590789  AG
rs7203694     16      592942  AG
rs11248940    16      595687  TT
rs7204088     16      601143  TT
rs4984677     16      611683  AG
rs9929621     16      619413  CT
rs11642546    16      641657  CC
rs3752496     16      650256  TT
rs2301426     16      651906  GG
rs1044662     16      655061  CC
rs9934288     16      656288  AC
rs3752493     16      657524  TT
rs1139897     16      660987  GG
rs1045763     16      664085  CC
rs3830140     16      665336  AA
rs8056588     16      666190  CC
rs6597        16      671726  TT
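
For illustration only, here is a minimal Python sketch of how rows like the ones above could be turned into BED-style lines before being handed to a DAS adaptor. It is not part of any DAS server. The names `Genotype`, `parse_genotypes` and `to_bed` are hypothetical, and the sketch assumes that positions are 1-based and that a "chr" prefix is wanted on chromosome names.

```python
from typing import Iterable, Iterator, NamedTuple


class Genotype(NamedTuple):
    snp_id: str
    chrom: str
    position: int  # assumed to be 1-based, as in the rows above
    genotype: str


def parse_genotypes(lines: Iterable[str]) -> Iterator[Genotype]:
    """Parse whitespace-separated SNP_id/chr/position/genotype rows."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        snp_id, chrom, position, genotype = line.split()
        yield Genotype(snp_id, chrom, int(position), genotype)


def to_bed(record: Genotype) -> str:
    """Format one record as a 4-column BED line (0-based, half-open)."""
    start = record.position - 1  # BED start coordinates are 0-based
    name = f"{record.snp_id}:{record.genotype}"
    # The "chr" prefix is an assumption; drop it if the target expects "16".
    return "\t".join([f"chr{record.chrom}", str(start), str(record.position), name])


sample = """\
rs7763        16      544555  TT
rs763158      16      546105  GG
"""
bed_lines = [to_bed(record) for record in parse_genotypes(sample.splitlines())]
```

Each SNP becomes a one-base interval, with the genotype folded into the BED name column so that it survives the conversion.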


On 18 November 2011 15:14, Andy Jenkinson <[email protected]> wrote:
Hi Manuel,
Since 2008 ProServer has had a BED format SourceAdaptor (called bed12, as it is intended to work with the 12-field BED format). It also supports Hydras, which are modules that are designed to automatically create DAS sources from a single config without restarting the server. This is how EasyDAS works with ProServer: there is one SourceAdaptor, and a Hydra to scan a relational database for new data.
I don't know what 23andme's data looks like, but the addition of a Hydra to scan directories for new files and automatically make them available as DAS sources would seem to be a trivial piece of work. I daresay a VCF adaptor would also be fairly easy, especially if there is a Perl API of some sort (BioPerl?).
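
As a rough illustration of the directory-scanning idea, the following Python sketch lists data files that have not yet been published as sources. It is not ProServer's Hydra API (ProServer is written in Perl). The function name `discover_sources`, the choice of file suffixes, and the rule that a source is named after its file stem are all assumptions made for the example.

```python
import tempfile
from pathlib import Path

# Assumed set of file types that would be exposed as sources.
SUPPORTED_SUFFIXES = {".bed", ".vcf"}


def discover_sources(data_dir, known=frozenset()):
    """Return {source_name: path} for data files not already in `known`."""
    found = {}
    for path in sorted(Path(data_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES:
            name = path.stem  # assumption: source name is the file name without suffix
            if name not in known:
                found[name] = path
    return found


# Small demonstration using a temporary directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for filename in ("manuel_chr16.bed", "exome.vcf", "notes.txt"):
        (Path(tmp) / filename).write_text("")
    first_scan = sorted(discover_sources(tmp))
    second_scan = sorted(discover_sources(tmp, known={"exome"}))
```

A real implementation would run a scan like this periodically and register each newly found file with the server, which is the part the Hydra mechanism is described as handling.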
Cheers,
Andy

On 17 Nov 2011, at 17:11, Manuel Corpas wrote:
Dear Jonathan,
I hope you do not mind me copying the DAS list in this email, as we
would be very keen to gather interest in the community regarding DAS
applications to whole genomes.
We are interested in exploring DAS in the context of genomic variants
(SNPs, indels, CNVs) from personal genomes plus their integration with
relevant sources (genes, variation data, phenotypes).

Currently we have done a lot of work with 23andMe (whole-genome)
genotypes but now we are expecting to extend our efforts further to
exome data. A critical tool we are currently missing is one that
allows automatic creation of DAS sources via an API directly from bed
format (used by 23andMe) or vcf (1000genomes).

Anyone interested in discussing these topics please let me know.

Kind regards,
Manuel

On 17 November 2011 12:11, Jonathan Warren <[email protected]> wrote:
Hi
As the 2012 DAS workshop is coming up at the end of February we would like to hear from people using DAS.
We would be really grateful to receive just a short email from anyone using DAS or developing DAS with a brief summary about their project and how DAS fits in, especially if you have not spoken at the DAS workshops at any time.
Please also say if you would be interested in giving a short presentation at the workshop in February even if you are not sure if you could make it. In previous years the presentations have been 15 minutes with 5 minutes for questions - however this year we intend to be more flexible and so if you would prefer to give a "lightning talk" of just 5 minutes to update people or give them a brief overview that will be fine. Links to the previous years' talks can be found here: http://www.biodas.org/wiki/DASWorkshop2011#Day_2
I must emphasise - please give us a summary even if you are not interested in giving a talk as we would like to know what is going on out there and we promise not to hound you to give a talk :)

Thanks in advance

The Sanger/EBI DAS people.


Jonathan Warren
Senior Developer and DAS coordinator
blog: http://biodasman.wordpress.com/
[email protected]
Ext: 2314
Telephone: 01223 492314

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
_______________________________________________
DAS mailing list
[email protected]
http://lists.open-bio.org/mailman/listinfo/das