Re: GSOC proposals and mentors [was Call to action – Mahout needs your help]

Dan Filimon Tue, 09 Apr 2013 07:13:16 -0700

I can confirm Apache got in! :)
The slot assignment is not yet clear, however.


And, because mailing people to death is what I do: any volunteers for mentoring?


On Thu, Apr 4, 2013 at 9:49 PM, Shannon Quinn <[email protected]> wrote:

> According to the GSoC calendar, accepted organizations aren't posted until
> April 8 (Monday), at which point (assuming Apache is accepted; I can't
> imagine it wouldn't be) slots will be doled out internally. This will
> probably take at least a day or two, so we should know how many slots
> Mahout has by the middle of next week.
>
> Speaking of which: how do the various subprojects negotiate for slots? Is
> there a central spreadsheet, or an IRC meeting to attend? Or did I miss the
> email detailing this?
>
>
> On 4/4/13 2:43 PM, Dan Filimon wrote:
>
>> Any news on this front? Did we get approved/assigned a slot/anything?
>>
>>
>> On Fri, Mar 29, 2013 at 7:44 PM, Dan Filimon <[email protected]> wrote:
>>
>>> Ok, updated!
>>>
>>>
>>> On Fri, Mar 29, 2013 at 7:36 PM, Andy Twigg <[email protected]> wrote:
>>>
>>>> Dan,
>>>>
>>>> I think what you've written is fine (I wanted to edit to remove the
>>>> '?' around random forests but couldn't).
>>>>
>>>> ok?
>>>>
>>>>
>>>>
>>>> On 29 March 2013 11:14, Dan Filimon <[email protected]> wrote:
>>>>
>>>>> I added Andy's first suggestion and Ted's suggestion as ideas.
>>>>>
>>>>> Andy, could you flesh out your second suggestion into a project and
>>>>> make an issue, please?
>>>>>
>>>>>
>>>>> On Fri, Mar 29, 2013 at 3:53 AM, Ted Dunning <[email protected]> wrote:
>>>>>
>>>>>> It should be possible to view a Lucene index as a matrix. This would
>>>>>> require that we standardize on a way to convert documents to rows.
>>>>>> There are many choices, the discussion of which should be deferred to
>>>>>> the actual work on the project, but there are a few obvious constraints:
>>>>>>
>>>>>> a) it should be possible to get the same result as dumping the term
>>>>>> vectors for each document, each to a line, and converting that result
>>>>>> using standard Mahout methods.
>>>>>>
>>>>>> b) numeric fields ought to work somehow.
>>>>>>
>>>>>> c) if there are multiple text fields, that ought to work sensibly as
>>>>>> well. Two options include dumping multiple matrices or converting the
>>>>>> fields into a single row of a single matrix.
>>>>>>
>>>>>> d) it should be possible to refer back from a row of the matrix to find
>>>>>> the correct document. This might be because we remember the Lucene doc
>>>>>> number or because a field is named as holding a unique id.
>>>>>>
>>>>>> e) named vectors and matrices should be used if plausible.
>>>>>>
>>>>>> On Thu, Mar 28, 2013 at 4:58 PM, Dan Filimon <[email protected]> wrote:
>>>>>>
>>>>>>> ...
>>>>>>> Ted, could you explain a bit more what you mean by "simplify the
>>>>>>> connection to Lucene for clustering and classification"? It's too
>>>>>>> vague for an idea proposal.
>>>>>>>
>>>>>>>
>>>>
>>>> --
>>>> Dr Andy Twigg
>>>> Junior Research Fellow, St Johns College, Oxford
>>>> Room 351, Department of Computer Science
>>>> http://www.cs.ox.ac.uk/people/andy.twigg/
>>>> [email protected] | +447799647538
>>>>
>>>>
>>>
>
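Ted's constraints on viewing a Lucene index as a matrix can be made concrete with a small sketch. It deliberately uses no Lucene or Mahout classes: each document is assumed to arrive already reduced to per-field term-count maps (standing in for Lucene term vectors), and the names `IndexAsMatrix`, `addDocument` and `columnOf` are hypothetical. It takes the "single row of a single matrix" option for multiple text fields (c) by keying columns as `field:term`, gives each numeric field its own column as one guess at (b), and keeps the document id on every row so a row can be traced back to its document (d, e).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: documents are plain term-count maps standing in for
// Lucene term vectors; no Lucene or Mahout classes are used.
public class IndexAsMatrix {

    // One sparse row that remembers which document it came from (constraints d, e).
    public record NamedRow(String docId, Map<Integer, Double> values) {}

    // Shared dictionary: each distinct "field:term" key (or numeric field name) gets one column.
    private final Map<String, Integer> dictionary = new LinkedHashMap<>();
    private final List<NamedRow> rows = new ArrayList<>();

    // Folds all text fields into a single row by prefixing each term with its
    // field name (the "single row of a single matrix" option for constraint c).
    public void addDocument(String docId,
                            Map<String, Map<String, Integer>> termCountsByField,
                            Map<String, Double> numericFields) {
        Map<Integer, Double> row = new TreeMap<>();
        termCountsByField.forEach((field, termCounts) ->
            termCounts.forEach((term, count) ->
                row.merge(column(field + ":" + term), count.doubleValue(), Double::sum)));
        // Assumption for constraint b: each numeric field becomes its own column.
        numericFields.forEach((field, value) -> row.put(column(field), value));
        rows.add(new NamedRow(docId, Collections.unmodifiableMap(row)));
    }

    // Returns the column for a key, allocating the next free index on first use.
    private int column(String key) {
        return dictionary.computeIfAbsent(key, k -> dictionary.size());
    }

    // Column index for a key, or null if the key has never been seen.
    public Integer columnOf(String key) {
        return dictionary.get(key);
    }

    public int numColumns() {
        return dictionary.size();
    }

    public List<NamedRow> rows() {
        return Collections.unmodifiableList(rows);
    }
}
```

For example, a document with `title` and `body` term counts plus a numeric `year` field becomes one sparse row with a `title:*` column per title term, a `body:*` column per body term and a single `year` column; a later document that also contains `lucene` in its body reuses the same `body:lucene` column, so all rows share one column space.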
