OK, but I have a few questions:
- Sometimes I have to remove messages from HDFS (cancel/replace cases). Is that
possible?
- With one big zipped file, is it easy to process it directly with Pig?

Tks
Nicolas

----- Original Message -----
From: "Tao Lu" <taolu2...@gmail.com>
To: nib...@free.fr
Cc: "Ted Yu" <yuzhih...@gmail.com>, "user" <user@spark.apache.org>
Sent: Wednesday, September 2, 2015 19:09:23
Subject: Re: Small File to HDFS


You may consider storing them in one big HDFS file and appending new
messages to it as they arrive.


For instance:
one message -> zip it -> append it to the HDFS file as one line
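
A minimal sketch of that "one message -> zip it -> one line" encoding, in Python. It rests on assumptions that are not in the thread: gzip is used as the compressor, the compressed bytes are Base64-encoded so that they cannot contain a newline (raw compressed bytes can), and a local file stands in for the HDFS file so the example stays self-contained. The function names are hypothetical.

```python
import base64
import gzip
import os
import tempfile
from typing import List


def encode_message(message: str) -> str:
    """Compress one message and return it as a single newline-free text line."""
    compressed = gzip.compress(message.encode("utf-8"))
    # b64encode emits only A-Z, a-z, 0-9, '+', '/' and '=', so no '\n' can appear.
    return base64.b64encode(compressed).decode("ascii")


def decode_line(line: str) -> str:
    """Reverse encode_message for one line read back from the file."""
    return gzip.decompress(base64.b64decode(line.strip())).decode("utf-8")


def append_message(path: str, message: str) -> None:
    """Append one encoded message as one line (a local file stands in for HDFS)."""
    with open(path, "a", encoding="ascii") as f:
        f.write(encode_message(message) + "\n")


def read_messages(path: str) -> List[str]:
    """Read the file back and decode every non-empty line."""
    with open(path, "r", encoding="ascii") as f:
        return [decode_line(line) for line in f if line.strip()]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "messages.txt")
    append_message(path, '{"id": 1, "body": "first event"}')
    append_message(path, '{"id": 2, "body": "second\nevent"}')
    for msg in read_messages(path):
        print(repr(msg))
```

Because every record is exactly one line, the file can be split on line boundaries by line-oriented readers, and each record can be decoded independently of the others.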


On Wed, Sep 2, 2015 at 12:43 PM, < nib...@free.fr > wrote: 


Hi, 
I already store them in MongoDB in parallel for operational access, and I don't
want to add another database to the loop.
Is it the only solution?

Tks 
Nicolas 

----- Original Message -----
From: "Ted Yu" < yuzhih...@gmail.com >
To: nib...@free.fr
Cc: "user" < user@spark.apache.org >
Sent: Wednesday, September 2, 2015 18:34:17
Subject: Re: Small File to HDFS




Instead of storing those messages in HDFS, have you considered storing them in
a key-value store (e.g. HBase)?


Cheers 


On Wed, Sep 2, 2015 at 9:07 AM, < nib...@free.fr > wrote: 


Hello, 
I am currently using Spark Streaming to collect small messages (events). Each
one is under 50 KB, the volume is high (several million per day), and I have to
store those messages in HDFS.
I understand that storing small files can be problematic in HDFS. How can I
manage this?
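
To put a rough number on the small-files concern, here is a back-of-the-envelope estimate in Python. It relies on the commonly quoted rule of thumb that the HDFS NameNode keeps about 150 bytes of heap per namespace object (file, directory, or block); that figure is an approximation, and the 3,000,000 files per day is an assumed stand-in for "several million per day", not a number from this thread.

```python
# Rule of thumb (approximate, not measured): ~150 bytes of NameNode heap
# per namespace object (file, directory, or block).
BYTES_PER_OBJECT = 150


def namenode_bytes(num_files: int, blocks_per_file: int = 1) -> int:
    """Estimate NameNode heap: one file object plus its block objects per file."""
    return num_files * (1 + blocks_per_file) * BYTES_PER_OBJECT


if __name__ == "__main__":
    files_per_day = 3_000_000  # assumption standing in for "several million per day"
    per_day = namenode_bytes(files_per_day)
    per_year = per_day * 365
    print(f"{per_day / 1e6:.0f} MB/day, {per_year / 1e9:.1f} GB/year")
```

Under these assumptions, writing one file per message adds on the order of hundreds of megabytes of NameNode metadata per day, which is why the volume matters more than the size of each individual message.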

Tks 
Nicolas 

--------------------------------------------------------------------- 
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org 
For additional commands, e-mail: user-h...@spark.apache.org 



--
Thanks!
Tao
