Dear all,

I need to run a series of transformations that map an RDD into another RDD.
The computation changes over time, and so does the resulting RDD. Each
result is then saved to disk for further analysis (for example, how the
result varies over time).

The question is: if I save the RDDs to the same file, is the output appended
to the existing file or not? And if I write to a different file each time I
save a result, I may end up with many small files, and I read everywhere
that Hadoop doesn't cope well with many small files. Is Spark OK with that?
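For context, here is a minimal sketch of the pattern I have in mind. It assumes PySpark; the base path and the `result_dir` helper are my own hypothetical names, and the `coalesce` call is just one way to cut down the number of output part files per run:

```python
import time


def result_dir(base, ts=None):
    """Build a distinct output directory per run.

    Assumption: rdd.saveAsTextFile() refuses to write to an existing
    path (it never appends), so each run needs its own directory.
    """
    ts = ts if ts is not None else int(time.time())
    return f"{base}/run-{ts}"


# Hypothetical PySpark usage (not run here):
#   rdd.coalesce(8).saveAsTextFile(result_dir("hdfs:///results"))
# coalesce(8) merges partitions so each run writes at most 8 part
# files instead of one file per partition.

print(result_dir("hdfs:///results", ts=1700000000))
```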

Cheers,

Jaonary
