nope. It's efficient storage for genomics data :-D

aℕdy ℙetrella
about.me/noootsab
<http://about.me/noootsab>

On Mon, Sep 15, 2014 at 1:52 PM, Marius Soutier <mps....@gmail.com> wrote:

> So you are living the dream of using HDFS as a database? ;)
>
> On 15.09.2014, at 13:50, andy petrella <andy.petre...@gmail.com> wrote:
>
> I'm using Parquet in ADAM, and I can say that it works pretty well!
> Enjoy ;-)
>
> aℕdy ℙetrella
> about.me/noootsab
> <http://about.me/noootsab>
>
> On Mon, Sep 15, 2014 at 1:41 PM, Marius Soutier <mps....@gmail.com> wrote:
>
>> Thank you guys, I'll try Parquet, and if that's not quick enough I'll go
>> the usual route with either a read-only or a normal database.
>>
>> On 13.09.2014, at 12:45, andy petrella <andy.petre...@gmail.com> wrote:
>>
>> However, the cache is not guaranteed to remain: if other jobs are
>> launched in the cluster and require more memory than what's left in the
>> overall caching memory, previous RDDs will be discarded.
>>
>> Using an off-heap cache like Tachyon as a dump repo can help.
>>
>> In general, I'd say that using a persistent sink (like Cassandra, for
>> instance) is best.
>>
>> my .2¢
>>
>> aℕdy ℙetrella
>> about.me/noootsab
>> <http://about.me/noootsab>
>>
>> On Sat, Sep 13, 2014 at 9:20 AM, Mayur Rustagi <mayur.rust...@gmail.com> wrote:
>>
>>> You can cache data in memory & query it using the Spark Job Server.
>>> Most folks dump data down to a queue/db for retrieval.
>>> You can batch up data & store it into Parquet partitions as well, &
>>> query it using another SparkSQL shell; the JDBC driver in SparkSQL is
>>> part of 1.1, I believe.
>>> --
>>> Regards,
>>> Mayur Rustagi
>>> Ph: +1 (760) 203 3257
>>> http://www.sigmoidanalytics.com
>>> @mayur_rustagi
>>>
>>> On Fri, Sep 12, 2014 at 2:54 PM, Marius Soutier <mps....@gmail.com> wrote:
>>>
>>>> Hi there,
>>>>
>>>> I'm pretty new to Spark, and so far I've written my jobs the same way
>>>> I wrote Scalding jobs - one-off: read data from HDFS, count words,
>>>> write the counts back to HDFS.
>>>>
>>>> Now I want to display these counts in a dashboard. Since Spark allows
>>>> you to cache RDDs in memory and you have to explicitly terminate your
>>>> app (and there's even a new JDBC server in 1.1), I'm assuming it's
>>>> possible to keep an app running indefinitely and query an in-memory
>>>> RDD from the outside (via SparkSQL, for example).
>>>>
>>>> Is this how others are using Spark? Or are you just dumping job
>>>> results into message queues or databases?
>>>>
>>>> Thanks
>>>> - Marius
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>> For additional commands, e-mail: user-h...@spark.apache.org
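
For reference, a minimal sketch of the Parquet route Marius mentions,
against the Spark 1.1-era SQL API (SchemaRDD, saveAsParquetFile,
parquetFile). The HDFS paths and the WordCount case class are illustrative
placeholders, not anything from ADAM:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Placeholder schema for the word-count results.
case class WordCount(word: String, total: Long)

object ParquetSink {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("parquet-sink"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD // implicit RDD of case class -> SchemaRDD

    // The usual one-off job: read from HDFS, count words.
    val counts = sc.textFile("hdfs:///data/input") // placeholder path
      .flatMap(_.split("\\s+"))
      .map(w => (w, 1L))
      .reduceByKey(_ + _)
      .map { case (w, c) => WordCount(w, c) }

    // Write the results as a Parquet file on HDFS...
    counts.saveAsParquetFile("hdfs:///data/wordcounts.parquet")

    // ...and read them back as a SchemaRDD that can be queried with SQL.
    val loaded = sqlContext.parquetFile("hdfs:///data/wordcounts.parquet")
    loaded.registerTempTable("wordcounts")
    sqlContext
      .sql("SELECT word, total FROM wordcounts ORDER BY total DESC LIMIT 10")
      .collect()
      .foreach(println)

    sc.stop()
  }
}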
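Andy's caveat about cache eviction maps onto the storage levels. A sketch,
assuming Spark 1.1: MEMORY_AND_DISK spills evicted partitions to local disk
instead of silently dropping them, and the experimental OFF_HEAP level is
the Tachyon-backed option he refers to:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object CachingDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("caching-demo"))

    val counts = sc.textFile("hdfs:///data/input") // placeholder path
      .flatMap(_.split("\\s+"))
      .map(w => (w, 1L))
      .reduceByKey(_ + _)

    // MEMORY_ONLY blocks can be dropped when the cluster-wide cache fills
    // up, forcing a recompute; MEMORY_AND_DISK spills evicted partitions to
    // local disk instead.
    counts.persist(StorageLevel.MEMORY_AND_DISK)

    // Experimental in Spark 1.1: OFF_HEAP keeps blocks in Tachyon, outside
    // the executor heaps (point spark.tachyonStore.url at your Tachyon
    // master). An RDD can only hold one storage level, hence commented out.
    // counts.persist(StorageLevel.OFF_HEAP)

    println(counts.count()) // first action materializes and caches the RDD

    sc.stop()
  }
}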
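The "persistent sink like Cassandra" suggestion would look roughly like
this with the DataStax spark-cassandra-connector; the connector version,
host, keyspace, and table below are assumptions for illustration, not
tested settings:

// build.sbt (version is an assumption; pick the connector release matching
// your Spark version):
//   libraryDependencies +=
//     "com.datastax.spark" %% "spark-cassandra-connector" % "1.1.0"
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

object CassandraSink {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("cassandra-sink")
      .set("spark.cassandra.connection.host", "cassandra-host") // placeholder
    val sc = new SparkContext(conf)

    val counts = sc.textFile("hdfs:///data/input") // placeholder path
      .flatMap(_.split("\\s+"))
      .map(w => (w, 1L))
      .reduceByKey(_ + _)

    // Assumes an existing table:
    //   CREATE TABLE dashboard.wordcounts (word text PRIMARY KEY, total bigint);
    counts.saveToCassandra("dashboard", "wordcounts",
      SomeColumns("word", "total"))

    sc.stop()
  }
}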
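Finally, Marius's original idea, a long-lived app that keeps data queryable
in memory, works as long as the driver never stops the SparkContext. A
sketch that pins a table in Spark SQL's in-memory columnar cache; the path
and the refresh loop are illustrative, and the Thrift JDBC server that
ships with 1.1 is the out-of-the-box alternative:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object DashboardBackend {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("dashboard-backend"))
    val sqlContext = new SQLContext(sc)

    // Load the precomputed counts and pin them in Spark SQL's in-memory
    // columnar cache.
    val counts = sqlContext.parquetFile("hdfs:///data/wordcounts.parquet")
    counts.registerTempTable("wordcounts")
    sqlContext.cacheTable("wordcounts")

    // The driver never calls sc.stop(), so the app and its cache stay up.
    // A dashboard could reach it through Spark Job Server, an embedded HTTP
    // endpoint, or the Thrift JDBC server in 1.1
    // (sbin/start-thriftserver.sh).
    while (true) {
      sqlContext
        .sql("SELECT word, total FROM wordcounts ORDER BY total DESC LIMIT 10")
        .collect()
        .foreach(println)
      Thread.sleep(60000) // stand-in for real query handling
    }
  }
}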