Re: Helping out on spark efforts

Dmitriy Lyubimov Wed, 30 Apr 2014 11:43:00 -0700

I would also suggest picking some guinea pigs to validate things.

E.g., if I may make a suggestion, let's see how we'd vectorize a
categorical variable into predictor variables in our would-be language
here.
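
To make the suggestion concrete, here is a minimal single-machine sketch of
what such a vectorization could look like. It is plain Python, not the
would-be Mahout language. The function names build_levels and one_hot are
hypothetical, and treating the first sorted level as the all-zeros baseline
is an assumption, not something proposed in this thread.

```python
from typing import Dict, List, Sequence


def build_levels(values: Sequence[str]) -> Dict[str, int]:
    """Map each distinct category to a column index, in sorted order."""
    return {level: i for i, level in enumerate(sorted(set(values)))}


def one_hot(values: Sequence[str], levels: Dict[str, int],
            drop_first: bool = True) -> List[List[float]]:
    """Encode categories as 0/1 predictor columns.

    With drop_first=True the first level is the baseline (all zeros),
    which avoids collinearity with an intercept term (an assumption here).
    """
    offset = 1 if drop_first else 0
    width = len(levels) - offset
    rows = []
    for v in values:
        row = [0.0] * width
        col = levels[v] - offset
        if col >= 0:  # the baseline level maps to col == -1 and stays all zeros
            row[col] = 1.0
        rows.append(row)
    return rows


colors = ["red", "green", "blue", "green"]
levels = build_levels(colors)       # first pass: collect the distinct levels
encoded = one_hot(colors, levels)   # second pass: emit predictor columns
```

In a distributed setting the level dictionary would presumably have to be
computed in a first pass over the data before any row can be encoded, which
is part of what makes this a useful test case.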



On Wed, Apr 30, 2014 at 11:40 AM, Dmitriy Lyubimov <[email protected]> wrote:

>
>
>
> On Wed, Apr 30, 2014 at 10:53 AM, Dmitriy Lyubimov <[email protected]> wrote:
>
>> +1.
>>
>> And the greatest benefit of the data frames work is standardization of
>> feature extraction in Mahout, not necessarily any particular algorithm.
>> This has historically been the thorniest issue, and nobody does it well
>> today as it stands.
>>
>
> Correction: nobody does it well in open source and in a distributed way,
> that is.
>
>
>> If we tackle feature prep techniques in an engine-agnostic way, this would
>> be a truly unique differentiating factor for Mahout.
>>
>>
>>
>> On Wed, Apr 30, 2014 at 7:52 AM, Sebastian Schelter <[email protected]> wrote:
>>
>>> I think you should concentrate on MAHOUT-1490; it is a highly
>>> important task that will be the foundation for a lot of stuff to be built
>>> on top. Let's focus on getting this thing right and then move on to other
>>> things.
>>>
>>> --sebastian
>>>
>>>
>>> On 04/30/2014 04:44 PM, Saikat Kanjilal wrote:
>>>
>>>> Sebastian/Dmitriy, in looking through the current list of issues I didn't
>>>> see any other Mahout algorithms being discussed for porting to Spark.
>>>> I was wondering if there's any interest in, or need for, porting or writing
>>>> things like LR/KMeans/SVM to use Spark; I'd like to help out in this area
>>>> while working on 1490. Also, are we planning to port the distributed
>>>> versions of Taste to use Spark as well at some point?
>>>> Thanks in advance.
>>>>
>>>>
>>>
>>
>
