Hi Nibiau,

HBase seems to be a good solution to your problems. As you may know, storing 
your messages as key-value pairs in HBase saves you the overhead of manually 
resizing blocks of data using zip files. 
An added advantage, besides the fact that HBase uses HDFS for storage, is 
the ability to update your records, for example with the "put" operation. 
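
To make the cancel/replace point a bit more concrete, here is a minimal sketch 
in Python. It is only an illustration of key-value semantics: a plain dict 
stands in for an HBase table, and the helper names put, get and delete are 
hypothetical (they mirror the HBase operations but are not the HBase client 
API). The row keys and message bodies are made up as well.

```python
import zlib

# In-memory stand-in for an HBase table: row key -> compressed message bytes.
# ASSUMPTION: this is not the HBase client API, only a sketch of its semantics.
table = {}


def put(row_key, message):
    """Insert the message under row_key, overwriting any previous value."""
    table[row_key] = zlib.compress(message.encode("utf-8"))


def get(row_key):
    """Return the message stored under row_key, or None if it is absent."""
    blob = table.get(row_key)
    return None if blob is None else zlib.decompress(blob).decode("utf-8")


def delete(row_key):
    """Remove the message stored under row_key, if any (the cancel case)."""
    table.pop(row_key, None)


# Hypothetical messages keyed by a message ID.
put("msg-001", "order created")
put("msg-001", "order replaced")  # replace: same key, new value
put("msg-002", "order to cancel")
delete("msg-002")                 # cancel: the record is simply removed
```

With one big appended file, the same replace or cancel would mean rewriting 
the file, since each message has no key of its own to address it by. 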

Cheers,
Ardo

> On 03 Sep 2015, at 13:35, nib...@free.fr wrote:
> 
> OK, but I have some questions:
> - Sometimes I have to remove some messages from HDFS (cancel/replace cases). 
> Is that possible?
> - In the case of a big zip file, is it possible to easily run Pig on it 
> directly?
> 
> Tks
> Nicolas
> 
> ----- Original Message -----
> From: "Tao Lu" <taolu2...@gmail.com>
> To: nib...@free.fr
> Cc: "Ted Yu" <yuzhih...@gmail.com>, "user" <user@spark.apache.org>
> Sent: Wednesday, 2 September 2015 19:09:23
> Subject: Re: Small File to HDFS
> 
> 
> You may consider storing them in one big HDFS file and appending new 
> messages to it as they arrive. 
> 
> 
> For instance, 
> one message -> zip it -> append it to the HDFS as one line 
> 
> 
> On Wed, Sep 2, 2015 at 12:43 PM, < nib...@free.fr > wrote: 
> 
> 
> Hi, 
> I already store them in MongoDB in parallel for operational access and don't 
> want to add another database to the loop. 
> Is it the only solution? 
> 
> Tks 
> Nicolas 
> 
> ----- Original Message ----- 
> From: "Ted Yu" < yuzhih...@gmail.com > 
> To: nib...@free.fr 
> Cc: "user" < user@spark.apache.org > 
> Sent: Wednesday, 2 September 2015 18:34:17 
> Subject: Re: Small File to HDFS 
> 
> 
> 
> 
> Instead of storing those messages in HDFS, have you considered storing them 
> in a key-value store (e.g. HBase)? 
> 
> 
> Cheers 
> 
> 
> On Wed, Sep 2, 2015 at 9:07 AM, < nib...@free.fr > wrote: 
> 
> 
> Hello, 
> I am currently using Spark Streaming to collect small messages (events). 
> Each is under 50 KB, the volume is high (several million per day), and I have to 
> store those messages in HDFS. 
> I understand that storing small files can be problematic in HDFS. How can I 
> manage this? 
> 
> Tks 
> Nicolas 
> 
> --------------------------------------------------------------------- 
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org 
> For additional commands, e-mail: user-h...@spark.apache.org 
> 
> -- 
> Thanks! 
> Tao