Re: which database for gene alignment data ?

2015-06-10 Thread Frank Austin Nothaft
Hi Roni, These are exposed as public APIs. If you want, you can run them inside the adam-shell (which is just a wrapper for the Spark shell, but with the ADAM libraries on the classpath). Also, I need to save all my intermediate data. Seems like ADAM stores data in Parquet on HDFS. I

Re: which database for gene alignment data ?

2015-06-09 Thread roni
Hi Frank, Thanks for the reply. I downloaded ADAM and built it, but the command-line options do not seem to list this function. Are these exposed as a public API that I can call from code? Also, I need to save all my intermediate data. It seems ADAM stores data in Parquet on HDFS. I want

Re: which database for gene alignment data ?

2015-06-08 Thread roni
Sorry for the delay. The files (called .bed files) have a format like this:

Chromosome  start   end     feature  score  strand
chr1        713776  714375  peak.1   599    +
chr1        752401  753000  peak.2   599    +

The mandatory fields are 1. chrom - The name of the chromosome (e.g. chr3, chrY,
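A record in this layout can be parsed with a few lines of code. The sketch below is plain Python and is not part of ADAM or Spark. The `BedRecord` type and the `parse_bed_line` helper are hypothetical names introduced for illustration. It assumes whitespace-separated fields in the six-column order of the sample, with only the first three fields required, and it treats any line whose second field is not an integer as a header.

```python
from typing import NamedTuple, Optional


class BedRecord(NamedTuple):
    """One BED feature; only chrom, start and end are required."""
    chrom: str
    start: int
    end: int
    name: Optional[str] = None
    score: Optional[float] = None
    strand: Optional[str] = None


def parse_bed_line(line: str) -> Optional[BedRecord]:
    """Parse one line; return None for blank, comment, or header lines."""
    fields = line.split()
    if not fields or fields[0].startswith("#"):
        return None
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 fields, got {len(fields)}: {line!r}")
    # A non-numeric start column is assumed to mean a header row.
    if not fields[1].isdigit():
        return None
    return BedRecord(
        chrom=fields[0],
        start=int(fields[1]),
        end=int(fields[2]),
        name=fields[3] if len(fields) > 3 else None,
        score=float(fields[4]) if len(fields) > 4 else None,
        strand=fields[5] if len(fields) > 5 else None,
    )


sample = [
    "Chromosome start end feature score strand",
    "chr1 713776 714375 peak.1 599 +",
    "chr1 752401 753000 peak.2 599 +",
]
# Keep only the lines that parsed as features (the header is dropped).
records = [rec for rec in map(parse_bed_line, sample) if rec is not None]
```

Keeping the optional columns as `None` when absent lets the same helper accept three-column files as well as the six-column sample.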

Re: which database for gene alignment data ?

2015-06-08 Thread Frank Austin Nothaft
Hi Roni, We have a full suite of genomic feature parsers that can read BED, narrowPeak, GATK interval lists, and GTF/GFF into Spark RDDs in ADAM. Additionally, we have support for efficient overlap joins (query 3 in your email below). You can load the genomic features with
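To make the idea of an overlap join concrete, here is a minimal sketch in plain Python. It is not ADAM's implementation or API and only illustrates what the join computes: every pair of features, one from each input, that lie on the same chromosome and share at least one base. The names `overlaps` and `overlap_join` are hypothetical, intervals are assumed to be half-open `[start, end)`, and the gene coordinates in the usage example are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (chrom, start, end, label), with half-open [start, end) coordinates.
Interval = Tuple[str, int, int, str]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if a and b are on the same chromosome and share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def overlap_join(left: Iterable[Interval],
                 right: Iterable[Interval]) -> List[Tuple[Interval, Interval]]:
    """Return all (left, right) pairs that overlap.

    The right side is bucketed by chromosome so each left interval is
    only compared against candidates on its own chromosome.
    """
    by_chrom: Dict[str, List[Interval]] = defaultdict(list)
    for iv in right:
        by_chrom[iv[0]].append(iv)

    pairs = []
    for a in left:
        for b in by_chrom.get(a[0], []):
            if overlaps(a, b):
                pairs.append((a, b))
    return pairs


peaks = [
    ("chr1", 713776, 714375, "peak.1"),
    ("chr1", 752401, 753000, "peak.2"),
]
# Hypothetical gene intervals, made up for this example.
genes = [
    ("chr1", 714000, 715000, "geneA"),
    ("chr2", 752500, 752900, "geneB"),
]
joined = overlap_join(peaks, genes)
```

This per-chromosome scan is quadratic in the worst case, so it is only suitable for small inputs; a distributed engine would partition the genome and sort intervals to avoid comparing every pair.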

which database for gene alignment data ?

2015-06-06 Thread roni
I want to use Spark to read compressed .bed files containing gene sequencing alignment data. I want to store the BED data in a database and then use external gene expression data to find overlaps, etc. Which database is best for this? Thanks -Roni

Re: which database for gene alignment data ?

2015-06-06 Thread Ted Yu
Can you describe your use case in a bit more detail, since not everyone on this mailing list is familiar with gene sequencing alignment data? Thanks On Fri, Jun 5, 2015 at 11:42 PM, roni roni.epi...@gmail.com wrote: I want to use spark for reading compressed .bed file for reading gene