[
https://issues.apache.org/jira/browse/MADLIB-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16168721#comment-16168721
]
Frank McQuillan commented on MADLIB-1159:
-----------------------------------------
It's a good idea to have some tutorials, perhaps on the wiki:
https://cwiki.apache.org/confluence/display/MADLIB/Apache+MADlib
We often hear from users that document/word management is a challenge.
I guess the madlib.term_frequency() function shown in
http://madlib.apache.org/docs/latest/group__grp__lda.html
could be used upstream of the example.
As for creating doc numbering, that could be added with regular SQL in step 3
once the encoded arrays are collapsed.
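
To make the numbering idea concrete, here is a rough sketch in plain Python
(not MADlib code, and not the SQL from the LDA example). It assumes each
(document, term) pair appears only once, and it assigns 1-based integer ids in
order of first appearance. The names to_sparse, row_id and col_id are
hypothetical and chosen only for illustration. Whether this layout matches
what a given MADlib function expects should be checked against the docs.

```python
# Sample rows copied from the doc_term table in the issue description.
doc_term = [
    ("do androids dream of electric sheep", "rachel", 75),
    ("do androids dream of electric sheep", "andy", 56),
    ("do androids dream of electric sheep", "hands", 128),
    ("da vinci code book review", "vapid", 1326),
    ("da vinci code book review", "uninspired", 265),
    ("da vinci code book review", "nauseating", 879293),
    ("da vinci code book review", "inane", 471),
]


def to_sparse(triples):
    """Map (document, term, freq) triples to (row_id, col_id, value) triples.

    Ids are 1-based and assigned in order of first appearance.
    Assumes each (document, term) pair occurs at most once.
    """
    doc_ids = {}   # document text -> row id
    term_ids = {}  # term text -> column id
    sparse = []
    for doc, term, freq in triples:
        # setdefault returns the existing id, or stores and returns a new one.
        row_id = doc_ids.setdefault(doc, len(doc_ids) + 1)
        col_id = term_ids.setdefault(term, len(term_ids) + 1)
        sparse.append((row_id, col_id, freq))
    return doc_ids, term_ids, sparse


doc_ids, term_ids, sparse = to_sparse(doc_term)
for row in sparse:
    print(row)
```

In SQL the same numbering could presumably be done with a window function
such as dense_rank() over the document and term columns, but I have not
tested that against the example.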
> Provide examples for common sparse matrix cases
> -----------------------------------------------
>
> Key: MADLIB-1159
> URL: https://issues.apache.org/jira/browse/MADLIB-1159
> Project: Apache MADlib
> Issue Type: Documentation
> Reporter: Brian Dolan
>
> A fairly common table structure is of the form `key1, key2, value`, like
> triples in a graph. These are often not normalized.
> It would be useful to provide an example of transforming this class of tables
> into a sparse matrix. Perhaps an example dataset could be a term-document
> matrix.
> TABLE doc_term;
> document, term, freq
> "do androids dream of electric sheep", "rachel", 75
> "do androids dream of electric sheep", "andy", 56
> "do androids dream of electric sheep", "hands", 128
> "da vinci code book review", "vapid", 1326
> "da vinci code book review", "uninspired", 265
> "da vinci code book review", "nauseating", 879293
> "da vinci code book review", "inane", 471
> This would be transformed into a sparse matrix table of documents by features.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)