Hi Dileepa,

I've just posted the comments below on your GSoC proposal. I know that you can't make further changes to the proposal, so I'm also posting them here on the dev list so that we can keep the conversation going.

So...

* Good to see you intend to set up a project on GitHub for this; please do this asap. That way you can start to capture docs/working notes. I also suggest that you set up GitHub Pages for your site [1].

* What I'd like to see right now is some sort of UML diagram; you could sketch one using yuml.me [2] and add it to your GitHub site. I can't quite work out how the persistent domain entities relate to each other. In particular, are EmailSenderProfile and Reputation in 1-1 correspondence?

* In your timeline I noticed you said "Commit all code to github" only on Aug 11. It's much better practice (and will help mentors guide you) if you commit changes as you go. That way it's also safely backed up, and you can go back in time if you mess up.

* You might want to version control the academic paper too, if your university lets you.

Some further points relating to the design:

* You have Email as a persistent entity. I'm a bit worried about what that might mean for storage and also synchronization. Is it necessary to have the Email persisted in Isis? If not, should the Email entity be a view model, or a fake persistent entity utilizing a new StoreManager impl in JDO? See the recent thread [3] on this topic.

* Conversely, does Mahout require some sort of persistent dataset of emails in order to do the reputation scoring? Or does it just hold aggregated information? If the former, I worry that we now have each email stored in potentially 3 places: gmail, Isis and Mahout. Keeping these in sync would be a nightmare.

* It occurs to me that you're going to need some entities to keep track of the high water mark of the most recently analyzed email, so that when you poll for new emails you know which to ask for. This high water mark is per user of RB. So I think you'll either need an entity to represent your RB User, or you could use the UserSettings service [4][5].

* In the proposal the term "reputation index" is associated with the email sender. Is that the same as "Reputation"?

* The initial download of emails for analysis probably needs to be done in multiple batches (of say 100 at a time), in case there's a glitch/network issue.

* I was interested to note that you see the Isis webapp as being an email client itself. I suggest you keep it as read-only, though... otherwise you'll end up reinventing all of gmail (not advisable, I think).

* One of the first tasks you've set yourself (til 21 Apr) is to "try out Apache wicket samples [10] to learn how to develop the presentation layer of the application". In fact, with Isis you don't need to do any presentation layer coding; start building out your prototype and you'll see what I mean.

* I'm still unsure about OAuth integration. The EmailService is going to require credentials to access gmail, and that's "within" the Isis domain model. But Shiro/buji-pac4j sits in front of Isis. If Shiro has done the OAuth sign-in, then I guess it'll be necessary to surface those credentials somehow to the EmailService (perhaps using Shiro's org.apache.shiro.SecurityUtils#getSubject() method). Perhaps the best thing is to get buji-pac4j done, then see what information is surfaced that way.

HTH
Dan

[1] http://pages.github.com/
[2] http://yuml.me/
[3] http://isis.markmail.org/thread/lsg3uywlfjviztzi
[4] http://isis.apache.org/reference/services/settings-services.html
[5] http://isis.apache.org/components/objectstores/jdo/services/settings-services-jdo.html
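
PS: to make the high water mark and batching points a bit more concrete, here is a rough sketch of the kind of loop I have in mind. It is only an illustration, not a design: MailSource, HighWaterMarkStore and InMemoryStore are names I've made up (they are not Isis, gmail or Mahout APIs), and I'm assuming each email carries a numeric id that only ever increases, which you'd need to check against whatever the mail API really gives you. In the real thing the store would be backed by your RB User entity or by UserSettings rather than a map.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: download emails in batches, remembering a per-user high water mark. */
public class HighWaterMarkPoller {

    /** Made-up stand-in for whatever fetches message ids from the mail provider. */
    public interface MailSource {
        /** Returns at most 'limit' ids strictly greater than 'afterId', in ascending order. */
        List<Long> fetchIdsAfter(long afterId, int limit);
    }

    /** Made-up per-user store of the id of the most recently analyzed email. */
    public interface HighWaterMarkStore {
        long get(String user);
        void put(String user, long id);
    }

    /** Trivial in-memory store, for illustration only; 0 means "nothing analyzed yet". */
    public static class InMemoryStore implements HighWaterMarkStore {
        private final Map<String, Long> marks = new HashMap<>();

        @Override
        public long get(String user) {
            return marks.getOrDefault(user, 0L);
        }

        @Override
        public void put(String user, long id) {
            marks.put(user, id);
        }
    }

    /**
     * Polls for new emails in batches and returns how many were processed.
     * The mark is saved after every completed batch, so if a fetch fails
     * part-way through, the next poll resumes from the last completed batch
     * instead of starting over.
     */
    public static int poll(String user, MailSource source,
                           HighWaterMarkStore store, int batchSize) {
        int processed = 0;
        while (true) {
            long mark = store.get(user);
            List<Long> batch = source.fetchIdsAfter(mark, batchSize);
            if (batch.isEmpty()) {
                return processed;
            }
            // ... analyze each email in the batch here ...
            processed += batch.size();
            // Advance the mark to the highest id seen in this batch.
            store.put(user, batch.get(batch.size() - 1));
            if (batch.size() < batchSize) {
                return processed; // a short batch means we have caught up
            }
        }
    }
}
```

The same poll() call would then serve both the initial download and the later incremental polls, since the only state it relies on is the stored mark.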