> On Apr 2, 2014, at 1:39 PM, Dmitriy Lyubimov <[email protected]> wrote:
> 
> I think this duality, names and keys, is not very healthy really, and just
> creates additional hassle. The Spark DRM takes care of keys automatically
> throughout, but propagating names from named vectors is solely an algorithm
> concern as it stands.

Not sure what you mean. In my experience, Names and Properties are primarily
used to store external keys, which are quite healthy.
Users never have data keyed with Mahout keys; they must constantly translate
back and forth. This is exactly what the R data frame handles, no? I'm not so
concerned with being able to address an element by its external key, like
drmB["pat"]["iPad"] on a HashMap. But it would sure be nice to have the
external ids follow the data through any calculation where that makes sense.
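To make that concrete, here is a minimal sketch in plain Scala of the behaviour
I mean. The LabeledMatrix type and its methods are hypothetical, not the actual
Mahout DRM API: external row and column ids travel with the matrix, transpose
simply swaps them, and elements can still be addressed by external key.

    // Hypothetical sketch only; not the Mahout DRM API. External row/column
    // ids travel with the matrix through operations where that makes sense.
    case class LabeledMatrix(
        rowIds: Vector[String],         // external row keys, e.g. user ids
        colIds: Vector[String],         // external column keys, e.g. item ids
        values: Vector[Vector[Double]]  // dense values, rows x cols
    ) {
      // Transpose the values and swap the label sets; no id re-mapping step
      // is needed before or after the operation.
      def t: LabeledMatrix = LabeledMatrix(colIds, rowIds, values.transpose)

      // Address an element by external keys, HashMap-style.
      def apply(row: String, col: String): Double =
        values(rowIds.indexOf(row))(colIds.indexOf(col))
    }

    object LabeledMatrixDemo extends App {
      val drmB = LabeledMatrix(
        rowIds = Vector("pat", "dmitriy"),
        colIds = Vector("iPad", "nexus"),
        values = Vector(Vector(1.0, 0.0), Vector(0.0, 1.0)))

      println(drmB("pat", "iPad"))    // 1.0, looked up by external ids
      println(drmB.t("iPad", "pat"))  // 1.0 again; the ids followed the transpose
    }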

This would mean clustering, recommendations, transpose, and RSJ
(RowSimilarityJob) would require no id-translation steps. That would make
dealing with Mahout much easier.
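For comparison, this is the sort of id bookkeeping a caller has to write today
(the names here are only illustrative, nothing from Mahout): build an
external-to-int dictionary going in, keep it around, and translate back coming
out.

    // Sketch of the id translation users currently maintain by hand.
    object IdTranslation extends App {
      val externalUserIds = Seq("pat", "dmitriy", "ted")

      // External id -> int key, built before the data is handed to a job.
      val toKey: Map[String, Int] = externalUserIds.zipWithIndex.toMap

      // Int key -> external id, needed again to label the job's output.
      val fromKey: Map[Int, String] = toKey.map(_.swap)

      // e.g. a result keyed by ints has to be mapped back by hand
      val resultKeys = Seq(2, 0)
      println(resultKeys.map(fromKey))  // List(ted, pat)
    }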

> On Apr 2, 2014 1:08 PM, "Pat Ferrel" <[email protected]> wrote:
> 
>> Are the Spark efforts supporting all Mahout Vector types? Named and Property
>> Vectors? It occurred to me that the R data frame is a related but more
>> general solution. If all rows and columns of a DRM and their corresponding
>> Vectors (row or column vectors) were to support arbitrary properties
>> attached to them, in such a way that they are preserved during transpose,
>> Vector extraction, and any other operations that make sense, there would be
>> a huge benefit for users.
>> 
>> One of the constant problems with input to Mahout is translation of IDs:
>> external to Mahout going in, Mahout to external coming out. Most of this
>> would be unnecessary if Mahout supported data frames, and some would be
>> avoided by supporting named or property vectors universally.
>> 
>> 
> 
