According to the GSoC calendar, accepted organizations aren't posted until Monday, April 8, at which point (assuming Apache is accepted; I can't imagine it wouldn't be) slots will be doled out internally. That will probably take at least a day or two, so we should know how many slots Mahout has by the middle of next week.

Speaking of which: how do the various subprojects negotiate for slots? Is there a central spreadsheet, or an IRC meeting to attend? Or did I miss the email detailing this?

On 4/4/13 2:43 PM, Dan Filimon wrote:
Any news on this front? Did we get approved/assigned a slot/anything?


On Fri, Mar 29, 2013 at 7:44 PM, Dan Filimon <dangeorge.fili...@gmail.com> wrote:

Ok, updated!


On Fri, Mar 29, 2013 at 7:36 PM, Andy Twigg <andy.tw...@gmail.com> wrote:

Dan,

I think what you've written is fine (I wanted to edit to remove the
'?' around random forests but couldn't).

ok?



On 29 March 2013 11:14, Dan Filimon <dangeorge.fili...@gmail.com> wrote:
I added Andy's first suggestion and Ted's suggestion as ideas.

Andy, could you flesh out your second suggestion into a project and
make an issue please?


On Fri, Mar 29, 2013 at 3:53 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
It should be possible to view a Lucene index as a matrix.  This would
require that we standardize on a way to convert documents to rows.
There are many choices, the discussion of which should be deferred to
the actual work on the project, but there are a few obvious constraints:

a) it should be possible to get the same result as dumping the term
vectors for each document, each to a line, and converting that result
using standard Mahout methods.

b) numeric fields ought to work somehow.

c) if there are multiple text fields, that ought to work sensibly as
well.  Two options are to dump multiple matrices or to convert the
fields into a single row of a single matrix.

d) it should be possible to refer back from a row of the matrix to find
the correct document.  This might be because we remember the Lucene doc
number or because a field is named as holding a unique id.

e) named vectors and matrices should be used if plausible.
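
To make constraints (b) through (e) a bit more concrete, here is a
minimal sketch in plain Python. It does not touch Lucene or Mahout:
documents are ordinary dicts, whitespace token counts stand in for
stored term vectors, and docs_to_matrix, the field names, and the
"field:term" column naming are all hypothetical choices made up for
illustration. It takes the single-matrix option from (c) and the
unique-id-field option from (d).

```python
from collections import Counter


def docs_to_matrix(docs, text_fields, numeric_fields, id_field):
    """Convert documents (plain dicts) to labeled sparse rows.

    Returns (columns, rows): columns maps a column name to its index,
    rows is a list of (doc_id, {column_index: value}) pairs.
    """
    columns = {}
    rows = []

    def column_index(name):
        # Assign column indices in first-seen order.
        return columns.setdefault(name, len(columns))

    for doc in docs:
        row = {}
        for field in text_fields:
            # Assumption: whitespace token counts stand in for a term vector.
            counts = Counter(doc.get(field, "").lower().split())
            for term, count in counts.items():
                # Prefix the field name so several text fields share one row.
                row[column_index(f"{field}:{term}")] = float(count)
        for field in numeric_fields:
            if field in doc:
                # A numeric field becomes one column holding its value.
                row[column_index(field)] = float(doc[field])
        # Label the row with the unique id so it can be traced back.
        rows.append((doc[id_field], row))
    return columns, rows


docs = [
    {"id": "doc-1", "title": "fast clustering",
     "body": "clustering with mahout", "year": 2013},
    {"id": "doc-2", "title": "lucene index",
     "body": "index as matrix matrix", "year": 2012},
]
columns, rows = docs_to_matrix(docs, ["title", "body"], ["year"], "id")
```

Each row carries its document id as a label, which is roughly the role
a named vector would play, and the columns dict plays the role of the
term dictionary. Whether real term vectors, weighting, or one matrix
per field is the better choice is left open, as the list above says.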

On Thu, Mar 28, 2013 at 4:58 PM, Dan Filimon <dangeorge.fili...@gmail.com> wrote:
...
Ted, could you explain a bit more what you mean by "simplify the
connection to Lucene for clustering and classification"? It's too vague
for an idea proposal.



--
Dr Andy Twigg
Junior Research Fellow, St Johns College, Oxford
Room 351, Department of Computer Science
http://www.cs.ox.ac.uk/people/andy.twigg/
andy.tw...@cs.ox.ac.uk | +447799647538


