Hi Ioan,

I would encourage you to look at a system like HBase for your mail backend. HDFS doesn't cope well with large numbers of small files, and it doesn't support random updates to existing files, so existing formats like Maildir (which keeps one small file per message) wouldn't be a good fit.
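To put a rough number on the small-files point, here is a back-of-the-envelope sketch. It assumes the commonly quoted rule of thumb of about 150 bytes of NameNode heap per namespace object (a file entry or a block entry); that figure, the function name, and the 100-million-message workload are illustrative assumptions, not measurements from a real cluster.

```python
# Rough NameNode heap estimate for storing each email as its own HDFS file.
# ASSUMPTION: ~150 bytes of NameNode heap per namespace object (file or block).
# This is a commonly quoted rule of thumb, not a measured value.
BYTES_PER_OBJECT = 150


def namenode_heap_bytes(num_files: int, blocks_per_file: int = 1) -> int:
    """Estimate the heap used by file entries plus their block entries."""
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT


if __name__ == "__main__":
    # Hypothetical workload: 100 million messages, one small file each.
    heap = namenode_heap_bytes(100_000_000)
    print(f"{heap / 1e9:.1f} GB of NameNode heap")
```

Under those assumptions the metadata alone lands in the tens of gigabytes, all of which has to sit in a single NameNode's memory, regardless of how small the messages themselves are.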
-Todd

On Wed, May 18, 2011 at 4:02 PM, Ioan Eugen Stan <stan.ieu...@gmail.com> wrote:
> Hello everybody,
>
> I'm a GSoC student for this year and I will be working on James [1].
> My project is to implement email storage over HDFS. I am quite new to
> Hadoop and associates and I am looking for some hints on how to get
> started on the right track.
>
> I have installed a single node Hadoop instance on my machine and
> played around with it (ran some examples) but I am interested in
> what you (more experienced people) think is the best way to approach
> my problem.
>
> I am a little puzzled by the fact that I read Hadoop is best
> used for large files, and emails aren't that large from what I know.
> Another thing that crossed my mind is that since HDFS is a file
> system, wouldn't it be possible to set it as a back-end for the
> (existing) maildir and mailbox storage formats? (I think this question
> is more suited to the James mailing list, but if you have some ideas
> please speak your mind).
>
> Also, any development resources to get me started are welcome.
>
>
> [1] http://james.apache.org/mailbox/
> [2] https://issues.apache.org/jira/browse/MAILBOX-44
>
> Regards,
> --
> Ioan Eugen Stan
>

--
Todd Lipcon
Software Engineer, Cloudera