Hi Dileepa,

I've just posted the comments below on your GSOC proposal.  I know that you
can't make further changes to the proposal, so I'm posting them here on the
dev list, so we can keep the conversation going.

So..

* good to see you intend to set up a project on github for this; please do
this asap.  That way you can start to capture docs/working notes.  I also
suggest that you set up github pages for your site [1].

* What I'd like to see right now is some sort of UML diagram; you could
sketch one using yuml.me [2] and add it to your github site.  I can't quite
work out how the persistent domain entities relate to each other.  In
particular, are EmailSenderProfile and Reputation in 1-1 correspondence?

* In your timeline I noticed you said "Commit all code to github", only on
Aug 11.  It's much better practice (and will help mentors guide you) if you
commit changes as you go.  That way it's also safely backed up, and you can
go back in time if you mess up.

* You might also want to version control the academic paper, if your
university lets you.


Some further points relating to the design:

* You have Email as a persistent entity.  I'm a bit worried what that might
mean about storage and also synchronization.  Is it necessary to have the
Email persisted in Isis?  If not persisted, then should the Email entity be
a view model, or a fake persistent entity utilizing a new StoreManager
impl in JDO?  See the recent thread [3] on this topic.

* Conversely, does Mahout require some sort of persistent dataset of emails
in order to do the reputation scoring?  Or does it just hold aggregated
information?  If the former, I worry that we now have each email stored in
potentially 3 places: gmail, Isis and Mahout.  Keeping these in sync would
be a nightmare.

* It occurs to me that you're going to need some entities to keep track of
the high water mark of the most recently analyzed email, so that when you
poll for new emails you know which to ask for.  This high water mark is per
user of RB.  So I think you'll either need an entity to represent your RB
User, or you could use the UserSettings service [4][5].

* In the proposal the term "reputation index" is associated with
the email sender.  Is that the same as "Reputation"?

* The initial download of emails for analysis probably needs to be done
in multiple batches (of, say, 100 at a time), in case there's a
glitch/network issue.

* I was interested to note that you see the Isis webapp as being an email
client itself.  I suggest you keep it as read-only, though... otherwise
you'll end up reinventing all of gmail (not advisable, I think).

* One of the first tasks you've set yourself (til 21 Apr) is to "try out
Apache wicket samples [10] to learn how to develop the presentation layer
of the application".  In fact, with Isis you don't need to do any
presentation layer coding; start building out your prototype and you'll see
what I mean.

* I'm still unsure about oAuth integration.  The EmailService is going to
require credentials to access gmail, and that's "within" the Isis domain
model.  But Shiro/buji-pac4j sits in front of Isis.  If Shiro has done the
oAuth sign-in, then I guess it'll be necessary to surface those credentials
somehow to the EmailService (perhaps using Shiro's
org.apache.shiro.SecurityUtils#getSubject() method).  Perhaps the best thing
is to get buji-pac4j done, then see what information is surfaced that way.
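
To make the high water mark and batching points above a bit more concrete,
here is a rough sketch in plain Java.  Everything in it is hypothetical:
BatchedEmailPoller, EmailSource and fetchIdsAfter are names I've made up for
illustration, not anything from the proposal, Isis or the gmail API.  It also
assumes that emails can be identified by an ascending numeric id, which may
not hold for gmail; treat it as a starting point rather than a design.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch: polls for new emails in batches, tracking a high water mark. */
class BatchedEmailPoller {

    /** Stand-in (assumption) for whatever component talks to the mail provider. */
    interface EmailSource {
        /** Returns at most max ids strictly greater than afterId, in ascending order. */
        List<Long> fetchIdsAfter(long afterId, int max);
    }

    private final EmailSource source;
    private final int batchSize;
    // In a real app this would be persisted per RB user (an entity or a user setting).
    private long highWaterMark;

    BatchedEmailPoller(EmailSource source, int batchSize, long initialHighWaterMark) {
        this.source = source;
        this.batchSize = batchSize;
        this.highWaterMark = initialHighWaterMark;
    }

    long getHighWaterMark() {
        return highWaterMark;
    }

    /**
     * Fetches and analyzes one batch.  The mark only advances once the whole
     * batch has been analyzed, so a failure part-way through means the same
     * batch is requested again on the next poll.
     */
    int pollOnce(Consumer<Long> analyzer) {
        List<Long> ids = source.fetchIdsAfter(highWaterMark, batchSize);
        for (Long id : ids) {
            analyzer.accept(id);
        }
        if (!ids.isEmpty()) {
            highWaterMark = ids.get(ids.size() - 1);
        }
        return ids.size();
    }

    /** Repeats pollOnce until no new emails are returned; returns the total analyzed. */
    int pollAll(Consumer<Long> analyzer) {
        int total = 0;
        int n;
        while ((n = pollOnce(analyzer)) > 0) {
            total += n;
        }
        return total;
    }
}
```

Because a failed batch is retried from its start, the analysis step would
need to tolerate seeing the same email twice.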


HTH
Dan

[1] http://pages.github.com/
[2] http://yuml.me/
[3] http://isis.markmail.org/thread/lsg3uywlfjviztzi
[4] http://isis.apache.org/reference/services/settings-services.html
[5] http://isis.apache.org/components/objectstores/jdo/services/settings-services-jdo.html
