Hello everybody, I'm a GSoC student this year and I will be working on James [1]. My project is to implement email storage over HDFS. I am quite new to Hadoop and its related projects, and I am looking for some hints on how to get started on the right track.
I have installed a single-node Hadoop instance on my machine and played around with it (ran some examples), but I am interested in what you (more experienced people) think is the best way to approach my problem. I am a little puzzled by the fact that, from what I have read, Hadoop is best used for large files, and emails aren't that large as far as I know.

Another thing that crossed my mind: since HDFS is a file system, wouldn't it be possible to set it as a back-end for the (existing) maildir and mailbox storage formats? (I think this question is better suited to the James mailing list, but if you have some ideas, please speak your mind.)

Also, any development resources to get me started are welcome.

[1] http://james.apache.org/mailbox/
[2] https://issues.apache.org/jira/browse/MAILBOX-44

Regards,
--
Ioan Eugen Stan