Re: GSOC proposals and mentors [was Call to action – Mahout needs your help]

Shannon Quinn Thu, 04 Apr 2013 11:49:32 -0700

According to the GSoC calendar, accepted organizations aren't posted until April 8 (Monday), at which point (assuming Apache is accepted... I can't imagine it wouldn't be) slots will be doled out internally. This will probably take at least a day or two, so probably by the middle of next week we'll know how many slots Mahout has.

Speaking of which: how do the various subprojects negotiate for slots? Is there a central spreadsheet, or an IRC meeting to attend? Or did I miss the email detailing this?


On 4/4/13 2:43 PM, Dan Filimon wrote:

Any news on this front? Did we get approved/assigned a slot/anything?


On Fri, Mar 29, 2013 at 7:44 PM, Dan Filimon <[email protected]> wrote:

Ok, updated!


On Fri, Mar 29, 2013 at 7:36 PM, Andy Twigg <[email protected]> wrote:

Dan,

I think what you've written is fine (I wanted to edit to remove the
'?' around random forests but couldn't).

ok?



On 29 March 2013 11:14, Dan Filimon <[email protected]> wrote:

I added Andy's first suggestion and Ted's suggestion as ideas.

Andy, could you flesh out your second suggestion into a project and make an
issue please?


On Fri, Mar 29, 2013 at 3:53 AM, Ted Dunning <[email protected]> wrote:

It should be possible to view a Lucene index as a matrix.  This would
require that we standardize on a way to convert documents to rows.  There
are many choices, the discussion of which should be deferred to the actual
work on the project, but there are a few obvious constraints:

a) it should be possible to get the same result as dumping the term vectors
for each document, one per line, and converting that result using standard
Mahout methods.

b) numeric fields ought to work somehow.

c) if there are multiple text fields, that ought to work sensibly as well.
Two options include dumping multiple matrices or converting the fields
into a single row of a single matrix.

d) it should be possible to refer back from a row of the matrix to find the
correct document.  This might be because we remember the Lucene doc number
or because a field is named as holding a unique id.

e) named vectors and matrices should be used if plausible.
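To make constraints (c), (d) and (e) concrete, here is a minimal, hypothetical sketch of the "single row of a single matrix" option. It does not use Lucene or Mahout at all: plain Java maps stand in for a document's term vectors and numeric fields, and a map from column index to value stands in for a sparse Mahout vector. The names `DocToRow`, `NamedRow`, `toRow` and the `"field:term"` column-naming scheme are illustrative assumptions, not an existing API. Each term is prefixed with its field name so several text fields can share one row without colliding, numeric fields get one column each, and the row carries the document's unique id (or Lucene doc number) as its name so it can be traced back.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch: turn one document's fields into a single sparse, named row.
 * Plain Java collections stand in for Lucene term vectors and Mahout vectors.
 */
public class DocToRow {

    /** A sparse row plus a name (unique id or doc number) pointing back to the document. */
    public static final class NamedRow {
        public final String name;
        public final Map<Integer, Double> values; // column index -> value

        NamedRow(String name, Map<Integer, Double> values) {
            this.name = name;
            this.values = values;
        }
    }

    /** Dictionary shared by all documents: "field:term" or numeric field name -> column index. */
    private final Map<String, Integer> dictionary = new LinkedHashMap<>();

    /** Returns the column for a key, assigning the next free column the first time it is seen. */
    private int columnFor(String key) {
        Integer col = dictionary.get(key);
        if (col == null) {
            col = dictionary.size();
            dictionary.put(key, col);
        }
        return col;
    }

    /** Builds one row from per-field term counts and numeric field values. */
    public NamedRow toRow(String id,
                          Map<String, Map<String, Integer>> textFields,
                          Map<String, Double> numericFields) {
        Map<Integer, Double> row = new TreeMap<>();
        // Text fields: raw term counts, keyed by "field:term" so fields do not collide.
        for (Map.Entry<String, Map<String, Integer>> field : textFields.entrySet()) {
            for (Map.Entry<String, Integer> term : field.getValue().entrySet()) {
                row.put(columnFor(field.getKey() + ":" + term.getKey()), term.getValue().doubleValue());
            }
        }
        // Numeric fields: one column per field name holding the raw value.
        for (Map.Entry<String, Double> num : numericFields.entrySet()) {
            row.put(columnFor(num.getKey()), num.getValue());
        }
        return new NamedRow(id, row);
    }

    public int numColumns() {
        return dictionary.size();
    }

    public static void main(String[] args) {
        DocToRow converter = new DocToRow();
        Map<String, Map<String, Integer>> text = new LinkedHashMap<>();
        Map<String, Integer> title = new LinkedHashMap<>();
        title.put("mahout", 1);
        Map<String, Integer> body = new LinkedHashMap<>();
        body.put("mahout", 2);
        body.put("lucene", 1);
        text.put("title", title);
        text.put("body", body);
        Map<String, Double> numeric = new LinkedHashMap<>();
        numeric.put("year", 2013.0);
        NamedRow r = converter.toRow("doc-42", text, numeric);
        System.out.println(r.name + " " + r.values);
    }
}
```

Running `main` prints `doc-42 {0=1.0, 1=2.0, 2=1.0, 3=2013.0}`: `title:mahout` and `body:mahout` land in different columns, and the row keeps the document id as its name. Because the values are raw term counts with no weighting, a single-field version of this row holds the same numbers as a plain term-vector dump, up to how columns are numbered; matching a real Mahout dictionary exactly, as (a) asks, would need that dictionary rather than this toy one.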

On Thu, Mar 28, 2013 at 4:58 PM, Dan Filimon <[email protected]> wrote:
...
Ted, could you explain a bit more what you mean by "simplify the connection
to Lucene for clustering and classification"? It's too vague for an idea
proposal.



--
Dr Andy Twigg
Junior Research Fellow, St Johns College, Oxford
Room 351, Department of Computer Science
http://www.cs.ox.ac.uk/people/andy.twigg/
[email protected] | +447799647538
