[ 
https://issues.apache.org/jira/browse/MAHOUT-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320967#comment-14320967
 ] 

Dmitriy Lyubimov commented on MAHOUT-1641:
------------------------------------------

Not sure if I understand the problem correctly. 

There is indeed a general need to convert arbitrary row keys to *sequential, 
ordinal* row numbers (specifically, to enable certain types of 
transposition).

E.g. if you have a string-labeled set of rows 

A -> x_1
B -> x_2
...
Z -> x_26

then we may want to replace keys with 

0 -> x_1
... 
25 -> x_26

and thus enable more interesting things.
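
To illustrate the rekeying described above, here is a minimal sketch on plain 
Scala collections. It is not the patch mentioned in this comment and not Mahout 
API: the object `RekeySketch`, the function `rekey`, and its signature are 
hypothetical names chosen for the example, and it assumes row keys are unique.

```scala
// Hypothetical helper, not part of Mahout: replaces arbitrary row keys with
// sequential Int keys 0..n-1 and also returns the old-key -> new-key mapping.
object RekeySketch {
  def rekey[K, V](rows: Seq[(K, V)]): (Seq[(Int, V)], Map[K, Int]) = {
    // Give each distinct key its ordinal position in first-seen order.
    val mapping: Map[K, Int] = rows.map(_._1).distinct.zipWithIndex.toMap
    // Swap every original key for its Int ordinal, keeping the row payload.
    val rekeyed = rows.map { case (k, v) => (mapping(k), v) }
    (rekeyed, mapping)
  }
}
```

Calling `RekeySketch.rekey(Seq("A" -> "x_1", "B" -> "x_2"))` would pair `x_1` 
with key 0 and `x_2` with key 1. On a Spark RDD the same idea could probably be 
expressed with `zipWithIndex`, which yields Long indices rather than Int.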

Incidentally, I already have a patch for this, written for the same reason. It 
is coming soon as part of a larger update. (Unfortunately, I am still wrangling 
approval of these patches through the corporate food chain here.) So if you 
are willing to wait a little, it is coming in. 

It also optionally computes the mapping between the old keys and the new Int keys.

But please feel free to add your patch (I may ask you to modify its 
signature and name to match mine, as I already have dependencies on that).



> Add conversion from a RDD[(String, String)] to a Drm[Int]
> ---------------------------------------------------------
>
>                 Key: MAHOUT-1641
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1641
>             Project: Mahout
>          Issue Type: Question
>          Components: spark
>    Affects Versions: 1.0
>            Reporter: Erlend Hamnaberg
>
> Hi.
> We are using the cooccurrence part of Mahout as a library. We get our data 
> from other sources, for instance Cassandra. We don't want to write that 
> data to disk and read it back, since we already have the data on each slave.
> I have created some conversion functions based on one of the 
> IndexedDatasetSpark readers; I can't remember which one at the moment.
> Is there interest in the community for this kind of feature? I can probably 
> clean it up and add this as a GitHub pull request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
