Is it possible to remove/replace a file inside the HAR ?
Basically the name of my small files will be the keys of my records, and
sometimes I will need to replace the content of a file by a new content
(remove/replace).

Tks a lot
Nicolas

----- Original Mail -----
From: "Jörn Franke" <jornfra...@gmail.com>
To: nib...@free.fr
Cc: user@spark.apache.org
Sent: Thursday, September 3, 2015 19:29:42
Subject: Re: Small File to HDFS

Har is transparent and has hardly any performance overhead. You may decide
not to compress.
Maybe you can tell us more about your use case, I have somehow the feeling
that we are missing something here.

On Thu, Sep 3, 2015 at 15:54, Jörn Franke wrote:

> Store them as hadoop archive (har)
Were you able to find a solution to your problem?

My main question in case of HAR usage is: is it possible to use Pig on it,
and what about performance?

----- Original Mail -----
From: "Jörn Franke" <jornfra...@gmail.com>
To: nib...@free.fr, user@spark.apache.org
Sent: Thursday, September 3, 2015 15:54:42
Subject: Re: Small File to HDFS
Store them as hadoop archive (har)

On Wed, Sep 2, 2015 at 18:07, nib...@free.fr wrote:

> Hello,
> I am currently using Spark Streaming to collect small messages (events);
> the size is <50 KB, the volume is high (several million per day), and I
> have to store those messages in HDFS.
> I understood that storing small files can be problematic in HDFS; how can
> I manage it ?
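Packing a day of small files into a single archive, as suggested above, is done with Hadoop's `hadoop archive` tool rather than an API call. The sketch below only assembles the command line; the archive, parent, and destination paths are hypothetical examples, not anything from this thread.

```python
# Sketch: building the `hadoop archive` command that packs a directory of
# small files into one HAR. All paths and names here are illustrative.

def har_command(archive_name, parent_dir, sources, dest_dir):
    """Return the argv list for Hadoop's HAR tool:
    hadoop archive -archiveName NAME -p <parent> <src>... <dest>"""
    return (["hadoop", "archive", "-archiveName", archive_name,
             "-p", parent_dir] + list(sources) + [dest_dir])

cmd = har_command("events-2015-09-02.har", "/events/raw",
                  ["2015-09-02"], "/events/archived")
print(" ".join(cmd))
# -> hadoop archive -archiveName events-2015-09-02.har -p /events/raw 2015-09-02 /events/archived
```

Once created, the files are readable through the `har://` filesystem URI scheme, which is what makes the archive "transparent" to MapReduce-based tools; note that a HAR is immutable, which matters for the remove/replace question later in this thread.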
----- Original Mail -----
From: "Tao Lu" <taolu2...@gmail.com>
To: nib...@free.fr
Cc: "Ted Yu" <yuzhih...@gmail.com>, "user" <user@spark.apache.org>
Sent: Wednesday, September 2, 2015 19:09:23
Subject: Re: Small File to HDFS

You may consider storing it in one big HDFS file, and to keep appending new
messages to it.
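The append idea above can be sketched as a writer that keeps adding small messages to the current file and rolls to a new one once a size threshold is reached, so every file stays large. This is a minimal local-filesystem stand-in, assuming that on a real cluster the same pattern would go through an HDFS client's append call; class and file names are illustrative.

```python
# Sketch: append small messages into one large file, rolling to a new file
# at a size threshold. Local files stand in for HDFS here.
import os
import tempfile

class AppendingWriter:
    def __init__(self, directory, max_bytes=128 * 1024 * 1024):
        self.directory = directory
        self.max_bytes = max_bytes  # roughly one HDFS block by default
        self.index = 0              # current file number
        self.written = 0            # bytes written to the current file

    def _path(self):
        return os.path.join(self.directory, "events-%05d.log" % self.index)

    def append(self, message: bytes):
        # Roll to a fresh file once the current one is big enough,
        # instead of creating one file per message.
        if self.written >= self.max_bytes:
            self.index += 1
            self.written = 0
        with open(self._path(), "ab") as f:
            f.write(message + b"\n")
        self.written += len(message) + 1

d = tempfile.mkdtemp()
w = AppendingWriter(d, max_bytes=64)   # tiny threshold for the demo
for i in range(10):
    w.append(b"event-%d" % i)
print(sorted(os.listdir(d)))
# -> ['events-00000.log', 'events-00001.log']
```

The trade-off raised next in the thread still applies: a plain concatenated file loses the file-name-as-key lookup, so replacing one message means rewriting or compacting the file.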
Is it possible to easily process Pig on it directly ?

Tks
Nicolas

----- Original Mail -----
From: "Tao Lu" <taolu2...@gmail.com>
To: nib...@free.fr
Cc: "Ted Yu" <yuzhih...@gmail.com>, "user" <user@spark.apache.org>
Sent: Wednesday, September 2, 2015 19:09:23
Subject: Re: Small File to HDFS
[...] for operational access, and I don't want to add another database in
the loop.
Is it the only solution ?

Tks
Nicolas

----- Original Mail -----
From: "Ted Yu" <yuzhih...@gmail.com>
To: nib...@free.fr
Cc: "user" <user@spark.apache.org>
Sent: Wednesday, September 2, 2015 18:34:17
Subject: Re: Small File to HDFS
Hello,
I am currently using Spark Streaming to collect small messages (events); the
size is <50 KB, the volume is high (several million per day), and I have to
store those messages in HDFS.
I understood that storing small files can be problematic in HDFS; how can I
manage it ?
Tks
Nicolas
----- Original Mail -----
From: "Ted Yu" <yuzhih...@gmail.com>
To: nib...@free.fr
Cc: "user" <user@spark.apache.org>
Sent: Wednesday, September 2, 2015 18:34:17
Subject: Re: Small File to HDFS

Instead of storing those messages in HDFS, have you considered storing them
in a key-value store (e.g. HBase) ?
Cheers
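The key-value suggestion fits the workload described elsewhere in this thread: if the file name becomes the row key, then "replacing a file's content" is just an overwriting put, with no small-file or archive-rewrite problem. A minimal in-memory sketch of that access pattern (a dict stands in for the real HBase client API):

```python
# Sketch: why a key-value store suits keyed, replaceable small records.
# A dict stands in for a real store such as HBase; names are illustrative.

class KeyValueStore:
    def __init__(self):
        self.rows = {}

    def put(self, key, value):
        self.rows[key] = value      # insert or replace in one operation

    def get(self, key):
        return self.rows.get(key)

    def delete(self, key):
        self.rows.pop(key, None)

store = KeyValueStore()
store.put("msg-0001", b"original content")
store.put("msg-0001", b"new content")   # the replace case asked about above
print(store.get("msg-0001"))
# -> b'new content'
```

This is exactly the operation that a HAR cannot offer, since a HAR is written once and not updated in place.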
Hi People,
I'm using Java Kafka Spark Streaming and saving the result files into HDFS.
As per my understanding, Spark Streaming writes every processed message or
event to an HDFS file. The reason for creating one file per message or event
could be to ensure fault tolerance. Is there any way Spark can handle [...]
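One common remedy for the situation described above is to write one output file per micro-batch rather than per event (in Spark itself, the analogous move is reducing the number of partitions, e.g. with coalesce, before saving each batch, since Spark writes one part-file per partition per batch). A plain-Python stand-in for that batching, with illustrative names and paths:

```python
# Sketch: one output file per micro-batch instead of one per event.
# Plain Python stands in for the streaming job; names are illustrative.
import os
import tempfile

def write_batch(directory, batch_id, events):
    """Concatenate a whole micro-batch into one newline-delimited file."""
    path = os.path.join(directory, "batch-%06d.txt" % batch_id)
    with open(path, "w") as f:
        f.write("\n".join(events) + "\n")
    return path

d = tempfile.mkdtemp()
# Three micro-batches of small events -> three files, not seven.
batches = [["e1", "e2", "e3", "e4"], ["e5", "e6"], ["e7"]]
for i, batch in enumerate(batches):
    write_batch(d, i, batch)
print(sorted(os.listdir(d)))
# -> ['batch-000000.txt', 'batch-000001.txt', 'batch-000002.txt']
```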