Hi Everybody,

I'm working on a project where I have to read a large set of compressed files
(gz). I'm using Python and Hadoop Streaming to achieve this. However, I
have a problem: some of the compressed files are corrupt and are killing my
map/reduce jobs.
My environment is the following:
Hadoop-0.18.3 (CDH1) 

Do you have any recommendations on how to handle this case?
How can I catch that exception in Python so that my jobs don't fail?
How can I identify these files in Python and move them to a corrupt-file
folder? (I've put a rough sketch of what I'm thinking of below.)
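
For that last part, here is roughly what I have in mind as a pre-check run
before submitting the job. The directory paths are just placeholders, not my
real layout, and it assumes the files are still sitting on a local filesystem
rather than already in HDFS:

    #!/usr/bin/env python
    # Sketch of a pre-check: try to decompress every .gz file and move the
    # ones that fail into a "corrupt" folder before the streaming job runs.
    import gzip
    import os
    import shutil
    import zlib

    INPUT_DIR = "/data/incoming"    # placeholder path
    CORRUPT_DIR = "/data/corrupt"   # placeholder path

    def is_valid_gzip(path):
        """Return True if the whole file decompresses cleanly."""
        f = gzip.open(path, "rb")
        try:
            while f.read(1024 * 1024):  # read through the entire archive
                pass
            return True
        except (IOError, EOFError, zlib.error):  # truncated or corrupt data
            return False
        finally:
            f.close()

    if not os.path.isdir(CORRUPT_DIR):
        os.makedirs(CORRUPT_DIR)

    for name in os.listdir(INPUT_DIR):
        if name.endswith(".gz"):
            path = os.path.join(INPUT_DIR, name)
            if not is_valid_gzip(path):
                shutil.move(path, os.path.join(CORRUPT_DIR, name))

I'm not sure whether something like this is the right approach, or whether
there is a better way to skip or quarantine bad files from within the
streaming job itself.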

I'd really appreciate any recommendations.

Xavier