Also, note that the row keys in Mahout are not actually stored in the
matrices that we manipulate.  If the keys can be handled separately,
outside of the flow for the data in a DRM, then you should be pretty much
good to go.
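
To make the idea concrete, here is a rough sketch of handling the keys
outside the matrix data by translating String row keys to a dictionary
index and back.  This is plain Python, not Mahout code: every name in it
(build_partition_dictionary, merge_dictionaries, encode, decode) is made up
for illustration, and a thread pool over in-memory lists stands in for
whatever parallel read the backend actually does.

```python
from concurrent.futures import ThreadPoolExecutor


def build_partition_dictionary(rows):
    """Collect the distinct string keys of one partition, in first-seen order."""
    seen = {}
    for key, _vector in rows:
        seen.setdefault(key, None)
    return list(seen)


def merge_dictionaries(partition_keys):
    """Assign one global integer index per distinct key across all partitions."""
    dictionary = {}
    for keys in partition_keys:
        for key in keys:
            if key not in dictionary:
                dictionary[key] = len(dictionary)
    return dictionary


def encode(partitions):
    """Return (dictionary, partitions with string keys replaced by int indices)."""
    # Hypothetical stand-in for scanning partitions in parallel while reading.
    with ThreadPoolExecutor() as pool:
        partition_keys = list(pool.map(build_partition_dictionary, partitions))
    dictionary = merge_dictionaries(partition_keys)
    int_keyed = [[(dictionary[k], v) for k, v in rows] for rows in partitions]
    return dictionary, int_keyed


def decode(dictionary, int_keyed_rows):
    """Restore the original string keys after the int-keyed computation."""
    reverse = {index: key for key, index in dictionary.items()}
    return [(reverse[i], v) for i, v in int_keyed_rows]
```

The dictionary travels next to the int-keyed rows rather than inside them,
so postprocessing that needs the original keys can call decode() on the
results once the matrix work is done.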

On Wed, Jun 18, 2014 at 5:34 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

>
> On Wed, Jun 18, 2014 at 12:03 PM, Dmitriy Lyubimov <dlie...@gmail.com>
> wrote:
>
>> > How important are the String row keys for the algorithms themselves? Would
>> > it grossly mess up a workflow if Strings are silently discarded by the
>> > backend?
>> >
>>
>> like i said, seq2sparse produces them, and postprocessing for stuff like
>> LSA pipelines would not work.
>
>
> Something as coarse as translating to a dictionary index would probably
> work.  Creating the dictionary in parallel while reading the data should be
> quite doable.
>
>
