[ https://issues.apache.org/jira/browse/MADLIB-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16168251#comment-16168251 ]

Brian Dolan commented on MADLIB-1159:
-------------------------------------

Sure!

Map documents to ids, using 1-based indexing (but that's how counting works :) )
doc_name                            | id
------------------------------------|---
do androids dream of electric sheep | 1
da vinci code book review           | 2

Map features
ftr        | id
-----------|---
rachel     | 1
andy       | 2
hands      | 3
vapid      | 4
uninspired | 5
nauseating | 6
inane      | 7

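Then the matrix would be 2 x 7, one row per document and one column per feature.  This one looks boring (1 marks a nonzero entry; the real values would be the freq counts), but you get the picture.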

1 1 1 0 0 0 0
0 0 0 1 1 1 1
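
For what it's worth, here is a rough sketch of the same mapping in plain Python. It is not MADlib code: assign_ids, doc_ids, ftr_ids, sparse and indicator are names made up for illustration, and the ids are handed out in order of first appearance, which is an assumption that happens to match the two tables above.

```python
# Hypothetical sketch in plain Python; none of these names are MADlib's API.
# Rows copied from the doc_term example in the issue description.
doc_term = [
    ("do androids dream of electric sheep", "rachel", 75),
    ("do androids dream of electric sheep", "andy", 56),
    ("do androids dream of electric sheep", "hands", 128),
    ("da vinci code book review", "vapid", 1326),
    ("da vinci code book review", "uninspired", 265),
    ("da vinci code book review", "nauseating", 879293),
    ("da vinci code book review", "inane", 471),
]


def assign_ids(names):
    """Map each distinct name to a 1-based id, in order of first appearance."""
    ids = {}
    for name in names:
        if name not in ids:
            ids[name] = len(ids) + 1
    return ids


doc_ids = assign_ids(doc for doc, _, _ in doc_term)
ftr_ids = assign_ids(term for _, term, _ in doc_term)

# Sparse form: one (row_id, col_id, value) triple per nonzero entry.
sparse = [(doc_ids[doc], ftr_ids[term], freq) for doc, term, freq in doc_term]

# Dense 0/1 view of the same matrix, only to compare with the one above.
n_rows, n_cols = len(doc_ids), len(ftr_ids)
indicator = [[0] * n_cols for _ in range(n_rows)]
for row_id, col_id, _ in sparse:
    indicator[row_id - 1][col_id - 1] = 1

for row in indicator:
    print(" ".join(str(v) for v in row))
```

The two printed rows are the same as the 2 x 7 matrix above, and sparse holds the seven (row_id, col_id, value) triples, starting with (1, 1, 75).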

Does that help?


> Provide examples for common sparse matrix cases
> -----------------------------------------------
>
>                 Key: MADLIB-1159
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1159
>             Project: Apache MADlib
>          Issue Type: Documentation
>            Reporter: Brian Dolan
>
> A fairly common table structure is of the form  `key1, key2, value` like 
> triples in a graph.  These are often not normalized.
> It would be useful to provide an example of transforming this class of tables 
> into a sparse matrix.  Perhaps an example dataset could be a term-document 
> matrix.
> TABLE doc_term;
> document, term, freq
> "do androids dream of electric sheep", "rachel", 75
> "do androids dream of electric sheep", "andy", 56
> "do androids dream of electric sheep", "hands", 128
> "da vinci code book review", "vapid", 1326
> "da vinci code book review", "uninspired", 265
> "da vinci code book review", "nauseating", 879293
> "da vinci code book review", "inane", 471
> The goal is to transform this into a sparse matrix table of documents by features.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
