Hello everybody,

I'm a GSoC student this year and I will be working on James [1].
My project is to implement email storage over HDFS [2]. I am quite new
to Hadoop and related projects, and I am looking for some hints on how
to get started on the right track.

I have installed a single-node Hadoop instance on my machine and
played around with it (ran some examples), but I am interested in
what you (more experienced people) think is the best way to approach
my problem.

I am a little puzzled, because I have read that Hadoop is best
used for large files, and emails aren't that large from what I know.
Another thing that crossed my mind is that, since HDFS is a file
system, wouldn't it be possible to set it as a back-end for the
(existing) maildir and mailbox storage formats? (I think this question
is better suited to the James mailing list, but if you have some ideas,
please speak your mind.)

Also, any development resources to get me started are welcome.


[1] http://james.apache.org/mailbox/
[2] https://issues.apache.org/jira/browse/MAILBOX-44

Regards,
-- 
Ioan Eugen Stan
