nope. It's efficient storage for genomics data :-D

aℕdy ℙetrella
about.me/noootsab
<http://about.me/noootsab>

On Mon, Sep 15, 2014 at 1:52 PM, Marius Soutier <mps....@gmail.com> wrote:

> So you are living the dream of using HDFS as a database? ;)
>
> On 15.09.2014, at 13:50, andy petrella <andy.petre...@gmail.com> wrote:
>
> I'm using Parquet in ADAM, and I can say that it works pretty well!
> Enjoy ;-)
>
> aℕdy ℙetrella
> about.me/noootsab
> <http://about.me/noootsab>
>
> On Mon, Sep 15, 2014 at 1:41 PM, Marius Soutier <mps....@gmail.com> wrote:
>
>> Thank you guys, I'll try Parquet, and if that's not quick enough I'll go
>> the usual route with either a read-only or a normal database.
>>
>> On 13.09.2014, at 12:45, andy petrella <andy.petre...@gmail.com> wrote:
>>
>> However, the cache is not guaranteed to remain: if other jobs are
>> launched in the cluster and require more memory than what's left in the
>> overall caching memory, previous RDDs will be discarded.
>>
>> Using an off-heap cache like Tachyon as a dump repo can help.
>>
>> In general, I'd say that using a persistent sink (like Cassandra, for
>> instance) is best.
>>
>> my .2¢
>>
>> aℕdy ℙetrella
>> about.me/noootsab
>> <http://about.me/noootsab>
>>
>> On Sat, Sep 13, 2014 at 9:20 AM, Mayur Rustagi <mayur.rust...@gmail.com> wrote:
>>
>>> You can cache data in memory & query it using the Spark Job Server.
>>> Most folks dump data down to a queue/db for retrieval.
>>> You can batch up data & store it into Parquet partitions as well, &
>>> query it using another SparkSQL shell; the JDBC driver in SparkSQL is
>>> part of 1.1, I believe.
>>> --
>>> Regards,
>>> Mayur Rustagi
>>> Ph: +1 (760) 203 3257
>>> http://www.sigmoidanalytics.com
>>> @mayur_rustagi
>>>
>>> On Fri, Sep 12, 2014 at 2:54 PM, Marius Soutier <mps....@gmail.com> wrote:
>>>
>>>> Hi there,
>>>>
>>>> I'm pretty new to Spark, and so far I've written my jobs the same way
>>>> I wrote Scalding jobs - one-off: read data from HDFS, count words,
>>>> write the counts back to HDFS.
>>>>
>>>> Now I want to display these counts in a dashboard. Since Spark allows
>>>> you to cache RDDs in memory and you have to explicitly terminate your
>>>> app (and there's even a new JDBC server in 1.1), I'm assuming it's
>>>> possible to keep an app running indefinitely and query an in-memory
>>>> RDD from the outside (via SparkSQL, for example).
>>>>
>>>> Is this how others are using Spark? Or are you just dumping job
>>>> results into message queues or databases?
>>>>
>>>> Thanks
>>>> - Marius
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>> For additional commands, e-mail: user-h...@spark.apache.org
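
For reference, a minimal sketch of the Parquet route Marius mentions,
against the Spark 1.1-era SQL API (SchemaRDD, saveAsParquetFile,
parquetFile). The HDFS paths and the WordCount case class are illustrative
placeholders, not anything from ADAM:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Placeholder schema for the word-count results.
case class WordCount(word: String, total: Long)

object ParquetSink {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("parquet-sink"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD // implicit RDD of case class -> SchemaRDD

    // The usual one-off job: read from HDFS, count words.
    val counts = sc.textFile("hdfs:///data/input") // placeholder path
      .flatMap(_.split("\\s+"))
      .map(w => (w, 1L))
      .reduceByKey(_ + _)
      .map { case (w, c) => WordCount(w, c) }

    // Write the results as a Parquet file on HDFS...
    counts.saveAsParquetFile("hdfs:///data/wordcounts.parquet")

    // ...and read them back as a SchemaRDD that can be queried with SQL.
    val loaded = sqlContext.parquetFile("hdfs:///data/wordcounts.parquet")
    loaded.registerTempTable("wordcounts")
    sqlContext
      .sql("SELECT word, total FROM wordcounts ORDER BY total DESC LIMIT 10")
      .collect()
      .foreach(println)

    sc.stop()
  }
}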
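Andy's caveat about cache eviction maps onto the storage levels. A sketch,
assuming Spark 1.1: MEMORY_AND_DISK spills evicted partitions to local disk
instead of silently dropping them, and the experimental OFF_HEAP level is
the Tachyon-backed option he refers to:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object CachingDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("caching-demo"))

    val counts = sc.textFile("hdfs:///data/input") // placeholder path
      .flatMap(_.split("\\s+"))
      .map(w => (w, 1L))
      .reduceByKey(_ + _)

    // MEMORY_ONLY blocks can be dropped when the cluster-wide cache fills
    // up, forcing a recompute; MEMORY_AND_DISK spills evicted partitions to
    // local disk instead.
    counts.persist(StorageLevel.MEMORY_AND_DISK)

    // Experimental in Spark 1.1: OFF_HEAP keeps blocks in Tachyon, outside
    // the executor heaps (point spark.tachyonStore.url at your Tachyon
    // master). An RDD can only hold one storage level, hence commented out.
    // counts.persist(StorageLevel.OFF_HEAP)

    println(counts.count()) // first action materializes and caches the RDD

    sc.stop()
  }
}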
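The "persistent sink like Cassandra" suggestion would look roughly like
this with the DataStax spark-cassandra-connector; the connector version,
host, keyspace, and table below are assumptions for illustration, not
tested settings:

// build.sbt (version is an assumption; pick the connector release matching
// your Spark version):
//   libraryDependencies +=
//     "com.datastax.spark" %% "spark-cassandra-connector" % "1.1.0"
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

object CassandraSink {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("cassandra-sink")
      .set("spark.cassandra.connection.host", "cassandra-host") // placeholder
    val sc = new SparkContext(conf)

    val counts = sc.textFile("hdfs:///data/input") // placeholder path
      .flatMap(_.split("\\s+"))
      .map(w => (w, 1L))
      .reduceByKey(_ + _)

    // Assumes an existing table:
    //   CREATE TABLE dashboard.wordcounts (word text PRIMARY KEY, total bigint);
    counts.saveToCassandra("dashboard", "wordcounts",
      SomeColumns("word", "total"))

    sc.stop()
  }
}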
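Finally, Marius's original idea, a long-lived app that keeps data queryable
in memory, works as long as the driver never stops the SparkContext. A
sketch that pins a table in Spark SQL's in-memory columnar cache; the path
and the refresh loop are illustrative, and the Thrift JDBC server that
ships with 1.1 is the out-of-the-box alternative:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object DashboardBackend {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("dashboard-backend"))
    val sqlContext = new SQLContext(sc)

    // Load the precomputed counts and pin them in Spark SQL's in-memory
    // columnar cache.
    val counts = sqlContext.parquetFile("hdfs:///data/wordcounts.parquet")
    counts.registerTempTable("wordcounts")
    sqlContext.cacheTable("wordcounts")

    // The driver never calls sc.stop(), so the app and its cache stay up.
    // A dashboard could reach it through Spark Job Server, an embedded HTTP
    // endpoint, or the Thrift JDBC server in 1.1
    // (sbin/start-thriftserver.sh).
    while (true) {
      sqlContext
        .sql("SELECT word, total FROM wordcounts ORDER BY total DESC LIMIT 10")
        .collect()
        .foreach(println)
      Thread.sleep(60000) // stand-in for real query handling
    }
  }
}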