OK, but I have a few questions:
- Sometimes I have to remove some messages from HDFS (cancel/replace cases). Is that possible?
- In the case of one big zipped file, is it possible to easily run Pig on it directly?
Thanks,
Nicolas

----- Original Message -----
From: "Tao Lu" <taolu2...@gmail.com>
To: nib...@free.fr
Cc: "Ted Yu" <yuzhih...@gmail.com>, "user" <user@spark.apache.org>
Sent: Wednesday, 2 September 2015 19:09:23
Subject: Re: Small File to HDFS

You may consider storing them in one big HDFS file and keep appending new messages to it. For instance: one message -> zip it -> append it to the HDFS file as one line.

On Wed, Sep 2, 2015 at 12:43 PM, <nib...@free.fr> wrote:

Hi,
I already store them in MongoDB in parallel for operational access, and I don't want to add another database to the loop. Is it the only solution?

Thanks,
Nicolas

----- Original Message -----
From: "Ted Yu" <yuzhih...@gmail.com>
To: nib...@free.fr
Cc: "user" <user@spark.apache.org>
Sent: Wednesday, 2 September 2015 18:34:17
Subject: Re: Small File to HDFS

Instead of storing those messages in HDFS, have you considered storing them in a key-value store (e.g. HBase)?

Cheers

On Wed, Sep 2, 2015 at 9:07 AM, <nib...@free.fr> wrote:

Hello,
I am currently using Spark Streaming to collect small messages (events). Each one is under 50 KB, the volume is high (several million per day), and I have to store those messages in HDFS. I understand that storing small files can be problematic in HDFS. How can I manage it?

Thanks,
Nicolas

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

--
Thanks!
Tao
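
Tao's suggestion in the thread (one message -> zip it -> append it as one line) could be sketched roughly as below. This is only an illustration under several assumptions that are not in the thread: messages are JSON-serializable dicts, "zip" is taken to mean zlib compression followed by Base64 so that each record stays on a single line, and a local file stands in for the HDFS file. The function names (encode_message, decode_line, append_message, read_messages) are hypothetical.

```python
import base64
import json
import zlib


def encode_message(message: dict) -> str:
    """Compress one message and return it as a single newline-free text line."""
    raw = json.dumps(message, separators=(",", ":")).encode("utf-8")
    # b64encode output contains no newline characters, so one message
    # always maps to exactly one line in the target file.
    return base64.b64encode(zlib.compress(raw)).decode("ascii")


def decode_line(line: str) -> dict:
    """Reverse encode_message for one line read back from the file."""
    raw = zlib.decompress(base64.b64decode(line.strip()))
    return json.loads(raw.decode("utf-8"))


def append_message(path: str, message: dict) -> None:
    """Append one encoded message as a line.

    Assumption: a local file stands in for the HDFS file here; on a real
    cluster the write would go through an HDFS client instead.
    """
    with open(path, "a", encoding="ascii") as f:
        f.write(encode_message(message) + "\n")


def read_messages(path: str) -> list:
    """Read every line of the file back into the original messages."""
    with open(path, "r", encoding="ascii") as f:
        return [decode_line(line) for line in f if line.strip()]
```

Because each record is compressed on its own, the file can still be split and read line by line, at the cost of a lower compression ratio than compressing the whole file at once.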