Your requirements conflict with each other:
1. You want to dump all your messages somewhere
2. You want to be able to update/delete individual messages
3. You don't want to introduce another NoSQL database (like HBase), since you
already have all messages stored in MongoDB

My suggestion is:
1. Don't introduce another layer of complexity
2. After Spark Streaming, don't store the raw data again
3. If your MongoDB is used for OLTP, simply replicate it into your OLAP
environment
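
For what it's worth, the usual workaround for the small-files problem is to coalesce many small messages into a few large files before they land in HDFS (e.g. by repartitioning the stream before writing). A minimal stand-alone sketch of the idea — plain Python, with the local filesystem standing in for HDFS; the `batch_size` value and `part-*` file naming are just illustrative:

```python
import os
import tempfile

def coalesce_messages(messages, out_dir, batch_size=100_000):
    """Append many small messages into a few large files, so the
    filesystem sees a handful of big files instead of millions of
    tiny ones. Writes one output file per `batch_size` messages."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i in range(0, len(messages), batch_size):
        path = os.path.join(out_dir, f"part-{i // batch_size:05d}.txt")
        with open(path, "w") as f:
            for msg in messages[i:i + batch_size]:
                # One message per line; flatten embedded newlines.
                f.write(msg.replace("\n", " ") + "\n")
        paths.append(path)
    return paths

# Example: 250,000 small messages become 3 large files.
msgs = [f"event-{n}" for n in range(250_000)]
out = coalesce_messages(msgs, tempfile.mkdtemp(), batch_size=100_000)
print(len(out))  # 3
```

In a real pipeline you would do the same thing inside Spark Streaming (write each micro-batch as a few large partitions) rather than post-processing, so the raw data is never stored as millions of tiny files in the first place.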

Cheers,
Tao

On Thu, Sep 3, 2015 at 10:17 AM, <nib...@free.fr> wrote:

> My main question in case of HAR usage is , is it possible to use Pig on it
> and what about performances ?
>
> ----- Original Message -----
> From: "Jörn Franke" <jornfra...@gmail.com>
> To: nib...@free.fr, user@spark.apache.org
> Sent: Thursday, September 3, 2015 15:54:42
> Subject: Re: Small File to HDFS
>
>
>
>
> Store them as hadoop archive (har)
>
>
> On Wed, Sep 2, 2015 at 18:07, <nib...@free.fr> wrote:
>
>
> Hello,
> I'm currently using Spark Streaming to collect small messages (events),
> each under 50 KB, at high volume (several million per day), and I have to
> store those messages in HDFS.
> I understand that storing many small files is problematic in HDFS; how can
> I manage this?
>
> Tks
> Nicolas
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


-- 
------------------------------------------------
Thanks!
Tao
