Agree with Ardo. The API provided by HBase is versatile. There is checkAndPut as well.
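
For the cancel/replace case, here is a rough sketch of what checkAndPut gives you: the write only goes through if the stored value is still the one you expect. This is an in-memory Python model, not the HBase client. TinyTable and check_and_put are hypothetical names used only for illustration. As far as I recall, the real call in the HBase 1.x Java client is Table.checkAndPut(row, family, qualifier, value, put).

```python
import threading


class TinyTable:
    """In-memory stand-in for a one-column table (hypothetical, NOT the HBase client)."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def get(self, row):
        """Return the stored value for row, or None if the row is absent."""
        with self._lock:
            return self._rows.get(row)

    def check_and_put(self, row, expected, new_value):
        """Store new_value only if the current value equals expected.

        expected=None means "only if the row does not exist yet".
        The check and the write happen under one lock, so they are atomic.
        Returns True if the write happened, False otherwise.
        """
        with self._lock:
            if self._rows.get(row) != expected:
                return False
            self._rows[row] = new_value
            return True


table = TinyTable()
# First write: succeeds because the row does not exist yet.
created = table.check_and_put("msg-001", None, b"order v1")
# Replace: succeeds because the stored value is still "order v1".
replaced = table.check_and_put("msg-001", b"order v1", b"order v2")
# Stale replace: rejected because the row has already moved on to "order v2".
stale = table.check_and_put("msg-001", b"order v1", b"order v3")
```

The point of the pattern is that a late or duplicated replace cannot silently overwrite a newer version of the message.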

Cheers

> On Sep 3, 2015, at 5:00 AM, Ndjido Ardo Bar <ndj...@gmail.com> wrote:
>
> Hi Nibiau,
>
> HBase seems to be a good solution to your problems. As you may know, storing
> your messages as key-value pairs in HBase saves you the overhead of
> manually resizing blocks of data using zip files.
> The added advantage, along with the fact that HBase uses HDFS for storage, is
> the capability of updating your records, for example with the "put" function.
>
> Cheers,
> Ardo
>
>> On 03 Sep 2015, at 13:35, nib...@free.fr wrote:
>>
>> OK, but I have some questions:
>> - Sometimes I have to remove some messages from HDFS (cancel/replace cases).
>>   Is that possible?
>> - In the case of a big zip file, is it possible to easily run Pig on it
>>   directly?
>>
>> Tks
>> Nicolas
>>
>> ----- Original message -----
>> From: "Tao Lu" <taolu2...@gmail.com>
>> To: nib...@free.fr
>> Cc: "Ted Yu" <yuzhih...@gmail.com>, "user" <user@spark.apache.org>
>> Sent: Wednesday, 2 September 2015 19:09:23
>> Subject: Re: Small File to HDFS
>>
>> You may consider storing it in one big HDFS file, and keep appending new
>> messages to it.
>>
>> For instance,
>> one message -> zip it -> append it to the HDFS file as one line
>>
>> On Wed, Sep 2, 2015 at 12:43 PM, <nib...@free.fr> wrote:
>>
>> Hi,
>> I already store them in MongoDB in parallel for operational access and don't
>> want to add another database to the loop.
>> Is it the only solution?
>>
>> Tks
>> Nicolas
>>
>> ----- Original message -----
>> From: "Ted Yu" <yuzhih...@gmail.com>
>> To: nib...@free.fr
>> Cc: "user" <user@spark.apache.org>
>> Sent: Wednesday, 2 September 2015 18:34:17
>> Subject: Re: Small File to HDFS
>>
>> Instead of storing those messages in HDFS, have you considered storing them
>> in a key-value store (e.g. HBase)?
>>
>> Cheers
>>
>> On Wed, Sep 2, 2015 at 9:07 AM, <nib...@free.fr> wrote:
>>
>> Hello,
>> I am currently using Spark Streaming to collect small messages (events).
>> Each is under 50 KB, the volume is high (several million per day), and I
>> have to store those messages in HDFS.
>> I understand that storing small files can be problematic in HDFS. How can I
>> manage it?
>>
>> Tks
>> Nicolas
>>
>> --
>> Thanks!
>> Tao

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org