A HAR archive seems like a good idea, but one last question to be sure I make
the best choice:
- Is it possible to overwrite (remove/replace) a file inside the HAR?
Basically, the names of my small files will be the keys of my records, and
sometimes I will need to replace the content of a file with new content
(remove/replace).


Thanks a lot
Nicolas

----- Original Message -----
From: "Jörn Franke" <jornfra...@gmail.com>
To: nib...@free.fr
Cc: user@spark.apache.org
Sent: Thursday, 3 September 2015 19:29:42
Subject: Re: Small File to HDFS



HAR is transparent and has hardly any performance overhead. You may decide not
to compress, or to use a fast compression algorithm such as Snappy (recommended).



On Thu, 3 Sept 2015 at 16:17, < nib...@free.fr > wrote:


My main question about using HAR is: is it possible to use Pig on it, and
what about performance?

----- Original Message -----
From: "Jörn Franke" < jornfra...@gmail.com >
To: nib...@free.fr , user@spark.apache.org
Sent: Thursday, 3 September 2015 15:54:42
Subject: Re: Small File to HDFS




Store them as a Hadoop archive (HAR).
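
For readers following the thread, here is a minimal sketch of what that suggestion involves in practice. It only builds the `hadoop archive` command line and the `har://` URI under which an archived file can then be read; it does not run Hadoop. The helper names (`har_create_command`, `har_uri`) and all paths and file names are hypothetical and chosen for illustration; only the `-archiveName` and `-p` options and the `har://` scheme come from the Hadoop archive tool itself.

```python
import shlex


def har_create_command(archive_name, parent_dir, sources, dest_dir):
    """Return the argv list for `hadoop archive` (hypothetical helper).

    sources are paths relative to parent_dir; dest_dir is the HDFS
    directory in which the .har archive is created.
    """
    if not archive_name.endswith(".har"):
        raise ValueError("archive name must end with .har")
    return ["hadoop", "archive", "-archiveName", archive_name,
            "-p", parent_dir, *sources, dest_dir]


def har_uri(dest_dir, archive_name, relative_path):
    """Return the har:// URI of one file inside the archive (hypothetical helper)."""
    return ("har://" + dest_dir.rstrip("/") + "/" + archive_name
            + "/" + relative_path.lstrip("/"))


# Illustrative paths only; adapt to the real HDFS layout.
cmd = har_create_command("events-2015-09-03.har", "/user/nicolas/events",
                         ["2015-09-03"], "/user/nicolas/archives")
print(shlex.join(cmd))
print(har_uri("/user/nicolas/archives", "events-2015-09-03.har",
              "2015-09-03/msg-0001.json"))
```

The resulting URI can be passed to tools that accept Hadoop file system paths, which is what makes the archive transparent to readers.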


On Wed, 2 Sept 2015 at 18:07, < nib...@free.fr > wrote:


Hello,
I am currently using Spark Streaming to collect small messages (events). Each
message is under 50 KB, the volume is high (several million per day), and I
have to store those messages in HDFS.
I understand that storing small files can be problematic in HDFS. How can I
manage it?

Thanks
Nicolas 

--------------------------------------------------------------------- 
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org 
For additional commands, e-mail: user-h...@spark.apache.org 

