Re: Small File to HDFS

Ted Yu Thu, 03 Sep 2015 05:11:53 -0700

Agree with Ardo.
The API provided by HBase is versatile; there is checkAndPut as well.
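
As a rough illustration of what a conditional write such as checkAndPut adds over a plain put (useful for the cancel/replace cases), here is a minimal in-memory sketch in Python. It is not the HBase client API: TinyTable, check_and_put and the row keys are hypothetical names, and it models one value per row rather than HBase's column families and qualifiers.

```python
import threading


class TinyTable:
    """In-memory stand-in for a key-value table (hypothetical, not the HBase API)."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def put(self, row_key, value):
        # Unconditional write: insert the row or overwrite it.
        with self._lock:
            self._rows[row_key] = value

    def check_and_put(self, row_key, expected, value):
        # Conditional write: apply the put only if the stored value equals
        # `expected` (expected=None means "the row must be absent").
        # The check and the write happen under one lock, so they are atomic.
        with self._lock:
            if self._rows.get(row_key) != expected:
                return False
            self._rows[row_key] = value
            return True

    def get(self, row_key):
        with self._lock:
            return self._rows.get(row_key)


table = TinyTable()
table.put("msg-1", "NEW")
# Replace the message only if it is still in the state we last saw.
replaced = table.check_and_put("msg-1", "NEW", "REPLACED")
# A second writer holding the stale state "NEW" is rejected.
stale = table.check_and_put("msg-1", "NEW", "CANCELLED")
```

The point of the conditional form is that a writer acting on an outdated view of the record cannot silently overwrite a newer update.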


Cheers


> On Sep 3, 2015, at 5:00 AM, Ndjido Ardo Bar <ndj...@gmail.com> wrote:
> 
> Hi Nibiau,
> 
> HBase seems to be a good solution to your problem. As you may know, storing
> your messages as key-value pairs in HBase saves you the overhead of
> manually resizing blocks of data using zip files.
> An added advantage, besides the fact that HBase uses HDFS for storage, is
> the ability to update your records, for example with the "put" function.
> 
> Cheers,
> Ardo
> 
>> On 03 Sep 2015, at 13:35, nib...@free.fr wrote:
>> 
>> OK, but I have some questions:
>> - Sometimes I have to remove some messages from HDFS (cancel/replace cases);
>> is that possible?
>> - In the case of a big zip file, is it possible to easily process it with Pig
>> directly?
>> 
>> Tks
>> Nicolas
>> 
>> ----- Original message -----
>> From: "Tao Lu" <taolu2...@gmail.com>
>> To: nib...@free.fr
>> Cc: "Ted Yu" <yuzhih...@gmail.com>, "user" <user@spark.apache.org>
>> Sent: Wednesday, 2 September 2015 19:09:23
>> Subject: Re: Small File to HDFS
>> 
>> 
>> You may consider storing them in one big HDFS file and appending new
>> messages to it.
>> 
>> 
>> For instance, 
>> one message -> zip it -> append it to the HDFS as one line 
>> 
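
The "one message -> zip it -> append it as one line" idea can be sketched as follows in Python. This is only a sketch under assumptions: "zip" is read here as zlib compression, the base64 step is an addition so that compressed bytes can never contain a line break, and a local file stands in for the HDFS file (a real job would append through an HDFS client instead). The function names and the file name are hypothetical.

```python
import base64
import os
import tempfile
import zlib


def append_message(path, message):
    # Compress the message, then base64-encode it so the record
    # is guaranteed to fit on a single line.
    record = base64.b64encode(zlib.compress(message.encode("utf-8")))
    with open(path, "ab") as f:
        f.write(record + b"\n")


def read_messages(path):
    # Reverse the steps line by line: base64-decode, then decompress.
    with open(path, "rb") as f:
        return [
            zlib.decompress(base64.b64decode(line.strip())).decode("utf-8")
            for line in f
            if line.strip()
        ]


# Local stand-in for the single large HDFS file.
path = os.path.join(tempfile.mkdtemp(), "messages.log")
append_message(path, "first event\nwith a line break")
append_message(path, "second event")
messages = read_messages(path)
```

Because every record is exactly one line, line-oriented tools can split the file safely, but removing or replacing a single message still means rewriting the file.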
>> 
>> On Wed, Sep 2, 2015 at 12:43 PM, < nib...@free.fr > wrote: 
>> 
>> 
>> Hi, 
>> I already store them in MongoDB in parallel for operational access and don't
>> want to add another database in the loop.
>> Is that the only solution?
>> 
>> Tks 
>> Nicolas 
>> 
>> ----- Original message -----
>> From: "Ted Yu" < yuzhih...@gmail.com >
>> To: nib...@free.fr
>> Cc: "user" < user@spark.apache.org >
>> Sent: Wednesday, 2 September 2015 18:34:17
>> Subject: Re: Small File to HDFS
>> 
>> 
>> 
>> 
>> Instead of storing those messages in HDFS, have you considered storing them
>> in a key-value store (e.g. HBase)?
>> 
>> 
>> Cheers 
>> 
>> 
>> On Wed, Sep 2, 2015 at 9:07 AM, < nib...@free.fr > wrote: 
>> 
>> 
>> Hello, 
>> I am currently using Spark Streaming to collect small messages (events);
>> each is <50 KB, the volume is high (several million per day), and I have to
>> store those messages in HDFS.
>> I understand that storing small files can be problematic in HDFS. How can I
>> manage it?
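
To give a feel for why one file per message is a problem, here is a back-of-the-envelope estimate in Python. All three constants are assumptions, not figures from this thread: roughly 150 bytes of NameNode heap per namespace object is a commonly quoted rule of thumb, each small file is counted as one file entry plus one block entry, and 3,000,000 is a hypothetical stand-in for "several million per day".

```python
BYTES_PER_OBJECT = 150      # rule-of-thumb assumption, not a measured value
OBJECTS_PER_SMALL_FILE = 2  # assumed: one file entry + one block entry
FILES_PER_DAY = 3_000_000   # hypothetical value for "several million per day"


def namenode_bytes(num_files):
    # Estimated NameNode heap needed to track `num_files` small files.
    return num_files * OBJECTS_PER_SMALL_FILE * BYTES_PER_OBJECT


per_day = namenode_bytes(FILES_PER_DAY)
per_year = namenode_bytes(FILES_PER_DAY * 365)
print(f"{per_day / 1e6:.0f} MB/day, {per_year / 1e9:.1f} GB/year")
```

The metadata cost grows with the number of files rather than with the amount of data, which is why the replies suggest batching messages into fewer, larger files or using a key-value store.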
>> 
>> Tks 
>> Nicolas 
>> 
>> --------------------------------------------------------------------- 
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org 
>> For additional commands, e-mail: user-h...@spark.apache.org 
>> 
>> -- 
>> Thanks!
>> Tao
>> 
