Re: Small File to HDFS

2015-09-04 Thread Tao Lu
e the content of a file with new content >>> (remove/replace) >>> >>> >>> Thanks a lot >>> Nicolas >>> >>> ----- Original Message ----- >>> From: "Jörn Franke" <jornfra...@gmail.com> >>> To: nib...@free.fr >>

Re: Small File to HDFS

2015-09-04 Thread Ted Yu
ra...@gmail.com> >> To: nib...@free.fr >> Cc: user@spark.apache.org >> Sent: Thursday, 3 September 2015 19:29:42 >> Subject: Re: Small File to HDFS >> >> >> >> HAR is transparent and has hardly any performance overhead. You may decide >> not to compres
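HAR reads are transparent because the archive packs many small files into a few large part files plus an index, so a reader can jump straight to one member. The sketch below illustrates only that packing idea with in-memory bytes. It is not the real HAR on-disk format (HAR keeps its own index files next to its part files), and `pack` and `read_member` are hypothetical names.

```python
import io
from typing import Dict, Tuple

Index = Dict[str, Tuple[int, int]]  # member name -> (offset, length)

def pack(files: Dict[str, bytes]) -> Tuple[bytes, Index]:
    """Concatenate small files into one blob and record where each one lives."""
    blob = io.BytesIO()
    index: Index = {}
    for name, data in files.items():
        index[name] = (blob.tell(), len(data))  # offset before writing = start of member
        blob.write(data)
    return blob.getvalue(), index

def read_member(blob: bytes, index: Index, name: str) -> bytes:
    """Read one member by slicing the blob with its recorded offset and length."""
    offset, length = index[name]
    return blob[offset:offset + length]
```

Looking up a member costs one index lookup plus one contiguous read, which is why packing adds little read overhead.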

Re: Small File to HDFS

2015-09-04 Thread Ted Yu
move/replace) a file inside the HAR? >>>>> Basically, the names of my small files will be the keys of my records, >>>>> and sometimes I will need to replace the content of a file with new >>>>> content >>>>> (remove/replace) >>>>
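Hadoop archives are immutable, so a member cannot be replaced or removed in place. One common workaround (an assumption here, not something proposed in the thread) is to write replacements and deletions as newer, versioned records in later batches, let the newest version win on read, and periodically compact everything into a fresh archive. A minimal in-memory sketch; `latest_records`, `live_view`, and `compact` are hypothetical helpers.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# (key, version, payload); a payload of None marks a deletion (tombstone)
Record = Tuple[str, int, Optional[bytes]]

def latest_records(batches: Iterable[List[Record]]) -> Dict[str, Tuple[int, Optional[bytes]]]:
    """Keep only the highest-version record seen for each key."""
    best: Dict[str, Tuple[int, Optional[bytes]]] = {}
    for batch in batches:
        for key, version, payload in batch:
            if key not in best or version > best[key][0]:
                best[key] = (version, payload)
    return best

def live_view(batches: List[List[Record]]) -> Dict[str, bytes]:
    """Current content per key, hiding keys whose newest record is a tombstone."""
    return {k: p for k, (_, p) in latest_records(batches).items() if p is not None}

def compact(batches: List[List[Record]]) -> List[Record]:
    """Rewrite all batches as one batch of live, latest records
    (the equivalent of rebuilding the archive instead of editing it)."""
    return sorted((k, v, p) for k, (v, p) in latest_records(batches).items() if p is not None)
```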

Re: Small File to HDFS

2015-09-04 Thread Jörn Franke
Maybe you can tell us more about your use case; I somehow have the feeling that we are missing something here. On Thu, 3 Sep 2015 at 15:54, Jörn Franke wrote: > > Store them as a Hadoop archive (HAR) > > On Wed, 2 Sep 2015 at 18:07, wrote: > >> Hello, >>

Spark Streaming - Small file in HDFS

2015-09-04 Thread Pravesh Jain
Were you able to find a solution to your problem?

Re: Small File to HDFS

2015-09-03 Thread nibiau
My main question about using HAR is: is it possible to use Pig on it, and what about performance? ----- Original Message ----- From: "Jörn Franke" <jornfra...@gmail.com> To: nib...@free.fr, user@spark.apache.org Sent: Thursday, 3 September 2015 15:54:42 Subject: Re: Small File t

Re: Small File to HDFS

2015-09-03 Thread Jörn Franke
Store them as a Hadoop archive (HAR). On Wed, 2 Sep 2015 at 18:07, wrote: > Hello, > I'm currently using Spark Streaming to collect small messages (events), > each under 50 KB; the volume is high (several million per day) and I have to > store those messages in HDFS. > I
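A HAR is created with the `hadoop archive -archiveName NAME -p PARENT SRC... DEST` tool (which runs as a MapReduce job), and its members are then read through `har://` paths. The sketch below only builds that command line and a member path on the default filesystem, so it can be checked without a cluster; the helper names and example paths are hypothetical.

```python
import posixpath
from typing import List

def hadoop_archive_cmd(archive_name: str, parent: str, sources: List[str], dest: str) -> List[str]:
    """argv for: hadoop archive -archiveName NAME -p PARENT SRC... DEST
    (sources are given relative to PARENT)."""
    if not archive_name.endswith(".har"):
        raise ValueError("the archive name is expected to end with .har")
    return ["hadoop", "archive", "-archiveName", archive_name, "-p", parent, *sources, dest]

def har_path(dest: str, archive_name: str, member: str) -> str:
    """har:///... path of a member, resolved against the default filesystem."""
    return "har://" + posixpath.join(dest, archive_name, member)
```

On a machine with a Hadoop client, the list could be passed to `subprocess.run(cmd, check=True)`, and the resulting `har://` path handed to tools that accept Hadoop paths.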

Re: Small File to HDFS

2015-09-03 Thread Tao Lu
HAR usage is: is it possible to use Pig on it > and what about performance? > > ----- Original Message ----- > From: "Jörn Franke" <jornfra...@gmail.com> > To: nib...@free.fr, user@spark.apache.org > Sent: Thursday, 3 September 2015 15:54:42 > Subject: Re: Small File to HD

Re: Small File to HDFS

2015-09-03 Thread Martin Menzel
@free.fr, user@spark.apache.org > Sent: Thursday, 3 September 2015 15:54:42 > Subject: Re: Small File to HDFS > > > > > Store them as a Hadoop archive (HAR) > > > On Wed, 2 Sep 2015 at 18:07, <nib...@free.fr> wrote: > > > Hello, > I'm currently using S

Re: Small File to HDFS

2015-09-03 Thread nibiau
with new content (remove/replace). Thanks a lot, Nicolas ----- Original Message ----- From: "Jörn Franke" <jornfra...@gmail.com> To: nib...@free.fr Cc: user@spark.apache.org Sent: Thursday, 3 September 2015 19:29:42 Subject: Re: Small File to HDFS HAR is transparent and has hardly any performance o

Re: Small File to HDFS

2015-09-03 Thread Jörn Franke
o use Pig on it > and what about performance? > > ----- Original Message ----- > From: "Jörn Franke" <jornfra...@gmail.com> > To: nib...@free.fr, user@spark.apache.org > Sent: Thursday, 3 September 2015 15:54:42 > Subject: Re: Small File to HDFS > > > > > Store them as h

Re: Small File to HDFS

2015-09-03 Thread Jörn Franke
From: "Jörn Franke" <jornfra...@gmail.com> > To: nib...@free.fr > Cc: user@spark.apache.org > Sent: Thursday, 3 September 2015 19:29:42 > Subject: Re: Small File to HDFS > > > > HAR is transparent and has hardly any performance overhead. You may decide not > to compre

Re: Small File to HDFS

2015-09-03 Thread nibiau
l.com> To: nib...@free.fr Cc: "Ted Yu" <yuzhih...@gmail.com>, "user" <user@spark.apache.org> Sent: Wednesday, 2 September 2015 19:09:23 Subject: Re: Small File to HDFS You may consider storing them in one big HDFS file and continuously appending new messages to it.
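Appending to one big file needs some record framing so individual messages can be read back. A minimal sketch, assuming a simple length-prefixed layout of our own (key length, payload length, key, payload) and a local file standing in for the HDFS file; in Hadoop, a SequenceFile or a similar container format would normally play this role. `append_record` and `read_records` are hypothetical names.

```python
import struct
from typing import Iterator, Tuple

# Big-endian header: key length (uint16), payload length (uint32)
_HEADER = struct.Struct(">HI")

def append_record(path: str, key: str, payload: bytes) -> None:
    """Append one framed record to the end of the file."""
    k = key.encode("utf-8")
    with open(path, "ab") as f:
        f.write(_HEADER.pack(len(k), len(payload)))
        f.write(k)
        f.write(payload)

def read_records(path: str) -> Iterator[Tuple[str, bytes]]:
    """Scan the file from the start, yielding (key, payload) pairs in write order."""
    with open(path, "rb") as f:
        while True:
            header = f.read(_HEADER.size)
            if not header:
                return  # clean end of file
            if len(header) < _HEADER.size:
                raise ValueError("truncated record header")
            klen, plen = _HEADER.unpack(header)
            key = f.read(klen)
            payload = f.read(plen)
            if len(key) != klen or len(payload) != plen:
                raise ValueError("truncated record body")
            yield key.decode("utf-8"), payload
```

Reading a single message this way means scanning from the start, which is the trade-off against the operational key lookups mentioned later in the thread.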

Re: Small File to HDFS

2015-09-03 Thread Ndjido Ardo Bar
lution? > > Thanks, > Nicolas > > ----- Original Message ----- > From: "Ted Yu" <yuzhih...@gmail.com> > To: nib...@free.fr > Cc: "user" <user@spark.apache.org> > Sent: Wednesday, 2 September 2015 18:34:17 > Subject: Re: Small File

Re: Small File to HDFS

2015-09-03 Thread Ted Yu
possible to easily run Pig on it >> directly? >> >> Thanks, >> Nicolas >> >> ----- Original Message ----- >> From: "Tao Lu" <taolu2...@gmail.com> >> To: nib...@free.fr >> Cc: "Ted Yu" <yuzhih...@gmail.com>, "

Re: Small File to HDFS

2015-09-02 Thread Tao Lu
el for operational access and > don't want to add another database to the loop. > Is it the only solution? > > Thanks, > Nicolas > > ----- Original Message ----- > From: "Ted Yu" <yuzhih...@gmail.com> > To: nib...@free.fr > Cc: "user" <user@spark.apa

Small File to HDFS

2015-09-02 Thread nibiau
Hello, I'm currently using Spark Streaming to collect small messages (events), each under 50 KB. The volume is high (several million per day) and I have to store those messages in HDFS. I understand that storing small files can be problematic in HDFS; how can I manage this? Thanks, Nicolas
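The small-file problem is mostly about NameNode memory: every file, directory, and block is an object held in the NameNode's heap. A back-of-the-envelope sketch, assuming the commonly quoted rule of thumb of about 150 bytes per object, the Hadoop 2.x default block size of 128 MB, 3 million messages per day (standing in for "several million"), and every message at the 50 KB upper bound:

```python
import math
from typing import Tuple

BYTES_PER_OBJECT = 150          # rule-of-thumb NameNode heap per file or block (assumption)
BLOCK_SIZE = 128 * 1024 * 1024  # Hadoop 2.x default dfs.blocksize

def namenode_bytes(num_files: int, file_size: int) -> int:
    """Approximate heap: one object per file plus one per block of that file."""
    blocks_per_file = max(1, math.ceil(file_size / BLOCK_SIZE))
    return num_files * (1 + blocks_per_file) * BYTES_PER_OBJECT

def compare(messages: int, msg_size: int) -> Tuple[int, int]:
    """(heap for one file per message, heap when packed into block-sized files)."""
    one_file_per_msg = namenode_bytes(messages, msg_size)
    packed_files = math.ceil(messages * msg_size / BLOCK_SIZE)
    return one_file_per_msg, namenode_bytes(packed_files, BLOCK_SIZE)
```

Under those assumptions, `compare(3_000_000, 50 * 1024)` gives 900,000,000 bytes (about 900 MB of heap added per day) for one file per message, versus 343,500 bytes for the same data packed into 1,145 block-sized files.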

Re: Small File to HDFS

2015-09-02 Thread nibiau
pache.org> Sent: Wednesday, 2 September 2015 18:34:17 Subject: Re: Small File to HDFS Instead of storing those messages in HDFS, have you considered storing them in a key-value store (e.g. HBase)? Cheers On Wed, Sep 2, 2015 at 9:07 AM, <nib...@free.fr> wrote: Hello, I'm curr

Re: Small File to HDFS

2015-09-02 Thread Ted Yu
Instead of storing those messages in HDFS, have you considered storing them in a key-value store (e.g. HBase)? Cheers On Wed, Sep 2, 2015 at 9:07 AM, wrote: > Hello, > I'm currently using Spark Streaming to collect small messages (events), > each under 50 KB; the volume is high

Spark Streaming - Small file in HDFS

2014-08-26 Thread Ravi Sharma
Hi people, I'm using Kafka with Spark Streaming in Java and saving the result files into HDFS. As I understand it, Spark Streaming writes every processed message or event to an HDFS file. The reason for creating one file per message or event could be to ensure fault tolerance. Is there any way Spark handle
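For reference, Spark Streaming's `saveAsTextFiles` writes one output directory per batch interval with one part file per partition, so file count follows batches and partitions rather than individual messages, and lowering the partition count before saving reduces it. The general idea of buffering messages and writing a file only per flush can be sketched without Spark; `BatchingWriter` is a hypothetical class, and it assumes messages contain no newline characters.

```python
import os
from typing import List

class BatchingWriter:
    """Buffer small messages and write one file per flush instead of one per message."""

    def __init__(self, out_dir: str, max_bytes: int) -> None:
        self.out_dir = out_dir
        self.max_bytes = max_bytes
        self._buf: List[bytes] = []
        self._size = 0
        self._seq = 0
        self.written: List[str] = []

    def add(self, message: bytes) -> None:
        """Buffer a message; flush once the buffered size reaches max_bytes."""
        self._buf.append(message)
        self._size += len(message) + 1  # +1 for the newline separator
        if self._size >= self.max_bytes:
            self.flush()

    def flush(self) -> None:
        """Write buffered messages as one newline-delimited part file."""
        if not self._buf:
            return
        path = os.path.join(self.out_dir, f"part-{self._seq:05d}")
        with open(path, "wb") as f:
            f.write(b"\n".join(self._buf) + b"\n")
        self.written.append(path)
        self._seq += 1
        self._buf, self._size = [], 0
```

A final `flush()` when the stream stops writes whatever is still buffered.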