[jira] [Commented] (MAHOUT-1470) Topic dump

2015-03-26 Thread Andrew Musselman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383113#comment-14383113
 ] 

Andrew Musselman commented on MAHOUT-1470:
--

I agree

> Topic dump
> --
>
> Key: MAHOUT-1470
> URL: https://issues.apache.org/jira/browse/MAHOUT-1470
> Project: Mahout
>  Issue Type: New Feature
>  Components: Clustering
>Affects Versions: 0.9
>Reporter: Andrew Musselman
>Assignee: Andrew Musselman
>Priority: Minor
>  Labels: legacy
> Fix For: 1.0
>
>
> Per 
> http://mail-archives.apache.org/mod_mbox/mahout-user/201403.mbox/%3CCAMc_qaL2DCgbVbam2miNsLpa4qvaA9sMy1-arccF9Nz6ApcsvQ%40mail.gmail.com%3E
> > The script needs to be corrected to not call vectordump for LDA as
> > vectordump utility (or even clusterdump) are presently not capable of
> > displaying topics and relevant documents. I recall this issue was
> > previously reported by Peyman Faratin post 0.9 release.
> >
> > Mahout's missing a clusterdump utility that reads in LDA
> > topics, Document - DocumentId mapping and displays a report of the topics
> > and the documents that belong to a topic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAHOUT-1470) Topic dump

2015-03-26 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383104#comment-14383104
 ] 

Suneel Marthi commented on MAHOUT-1470:
---

This has been a frequent ask for some time now; it definitely needs to be 
closed out in the next release.

> Topic dump
> --
>
> Key: MAHOUT-1470
> URL: https://issues.apache.org/jira/browse/MAHOUT-1470
> Project: Mahout
>  Issue Type: New Feature
>  Components: Clustering
>Affects Versions: 0.9
>Reporter: Andrew Musselman
>Assignee: Andrew Musselman
>Priority: Minor
>  Labels: legacy
> Fix For: 1.0
>
>
> Per 
> http://mail-archives.apache.org/mod_mbox/mahout-user/201403.mbox/%3CCAMc_qaL2DCgbVbam2miNsLpa4qvaA9sMy1-arccF9Nz6ApcsvQ%40mail.gmail.com%3E
> > The script needs to be corrected to not call vectordump for LDA as
> > vectordump utility (or even clusterdump) are presently not capable of
> > displaying topics and relevant documents. I recall this issue was
> > previously reported by Peyman Faratin post 0.9 release.
> >
> > Mahout's missing a clusterdump utility that reads in LDA
> > topics, Document - DocumentId mapping and displays a report of the topics
> > and the documents that belong to a topic.





[jira] [Commented] (MAHOUT-1470) Topic dump

2014-10-16 Thread Aditya Kashyap Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173515#comment-14173515
 ] 

Aditya Kashyap Singh commented on MAHOUT-1470:
--

I was trying to do topic analysis on a set of documents using the latest 
version of Mahout.
The output for the topic-to-term mapping is correct, with each topic having a 
list of terms with corresponding probabilities.
But in the document-to-topic mapping, each document displays only a set of 
topics starting with a particular letter, e.g. 'a'.


> Topic dump
> --
>
> Key: MAHOUT-1470
> URL: https://issues.apache.org/jira/browse/MAHOUT-1470
> Project: Mahout
>  Issue Type: New Feature
>  Components: Clustering
>Affects Versions: 0.9
>Reporter: Andrew Musselman
>Assignee: Andrew Musselman
>Priority: Minor
> Fix For: 1.0
>
>
> Per 
> http://mail-archives.apache.org/mod_mbox/mahout-user/201403.mbox/%3CCAMc_qaL2DCgbVbam2miNsLpa4qvaA9sMy1-arccF9Nz6ApcsvQ%40mail.gmail.com%3E
> > The script needs to be corrected to not call vectordump for LDA as
> > vectordump utility (or even clusterdump) are presently not capable of
> > displaying topics and relevant documents. I recall this issue was
> > previously reported by Peyman Faratin post 0.9 release.
> >
> > Mahout's missing a clusterdump utility that reads in LDA
> > topics, Document - DocumentId mapping and displays a report of the topics
> > and the documents that belong to a topic.





[jira] [Commented] (MAHOUT-1470) Topic dump

2014-06-27 Thread prakash kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046124#comment-14046124
 ] 

prakash kumar commented on MAHOUT-1470:
---

Hi Andrew,

I could not figure out a way to map the output back to a human-readable form, 
so I cannot measure the performance of LDA on the test set. I want to run a 
Google search on the human-readable output to collect more information on that 
topic. Output in the current format (such as 0.6788) does not allow me to do 
that. Is there anything I am missing here?

Thanks for planning to set aside time for this.



> Topic dump
> --
>
> Key: MAHOUT-1470
> URL: https://issues.apache.org/jira/browse/MAHOUT-1470
> Project: Mahout
>  Issue Type: New Feature
>  Components: Clustering
>Affects Versions: 0.9
>Reporter: Andrew Musselman
>Assignee: Andrew Musselman
>Priority: Minor
> Fix For: 1.0
>
>
> Per 
> http://mail-archives.apache.org/mod_mbox/mahout-user/201403.mbox/%3CCAMc_qaL2DCgbVbam2miNsLpa4qvaA9sMy1-arccF9Nz6ApcsvQ%40mail.gmail.com%3E
> > The script needs to be corrected to not call vectordump for LDA as
> > vectordump utility (or even clusterdump) are presently not capable of
> > displaying topics and relevant documents. I recall this issue was
> > previously reported by Peyman Faratin post 0.9 release.
> >
> > Mahout's missing a clusterdump utility that reads in LDA
> > topics, Document - DocumentId mapping and displays a report of the topics
> > and the documents that belong to a topic.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1470) Topic dump

2014-06-26 Thread Andrew Musselman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045502#comment-14045502
 ] 

Andrew Musselman commented on MAHOUT-1470:
--

Prakash, I've been so occupied at work that I have not touched this.  I am 
going on vacation Monday for a couple weeks during which I plan to crack open 
some bugs.

In the meantime may I ask why this feature is blocking you?  I thought this was 
a convenience function to see output in a human-readable format.

> Topic dump
> --
>
> Key: MAHOUT-1470
> URL: https://issues.apache.org/jira/browse/MAHOUT-1470
> Project: Mahout
>  Issue Type: New Feature
>  Components: Clustering
>Affects Versions: 0.9
>Reporter: Andrew Musselman
>Assignee: Andrew Musselman
>Priority: Minor
> Fix For: 1.0
>
>
> Per 
> http://mail-archives.apache.org/mod_mbox/mahout-user/201403.mbox/%3CCAMc_qaL2DCgbVbam2miNsLpa4qvaA9sMy1-arccF9Nz6ApcsvQ%40mail.gmail.com%3E
> > The script needs to be corrected to not call vectordump for LDA as
> > vectordump utility (or even clusterdump) are presently not capable of
> > displaying topics and relevant documents. I recall this issue was
> > previously reported by Peyman Faratin post 0.9 release.
> >
> > Mahout's missing a clusterdump utility that reads in LDA
> > topics, Document - DocumentId mapping and displays a report of the topics
> > and the documents that belong to a topic.





[jira] [Commented] (MAHOUT-1470) Topic dump

2014-06-26 Thread prakash kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045471#comment-14045471
 ] 

prakash kumar commented on MAHOUT-1470:
---

Andrew, Suneel: any luck finding time to take up this issue? I am eagerly 
waiting for it to be resolved so I can make progress on my topic recommender. 
Thanks.





> Topic dump
> --
>
> Key: MAHOUT-1470
> URL: https://issues.apache.org/jira/browse/MAHOUT-1470
> Project: Mahout
>  Issue Type: New Feature
>  Components: Clustering
>Affects Versions: 0.9
>Reporter: Andrew Musselman
>Assignee: Andrew Musselman
>Priority: Minor
> Fix For: 1.0
>
>
> Per 
> http://mail-archives.apache.org/mod_mbox/mahout-user/201403.mbox/%3CCAMc_qaL2DCgbVbam2miNsLpa4qvaA9sMy1-arccF9Nz6ApcsvQ%40mail.gmail.com%3E
> > The script needs to be corrected to not call vectordump for LDA as
> > vectordump utility (or even clusterdump) are presently not capable of
> > displaying topics and relevant documents. I recall this issue was
> > previously reported by Peyman Faratin post 0.9 release.
> >
> > Mahout's missing a clusterdump utility that reads in LDA
> > topics, Document - DocumentId mapping and displays a report of the topics
> > and the documents that belong to a topic.





[jira] [Commented] (MAHOUT-1470) Topic dump

2014-05-26 Thread Andrew Musselman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009133#comment-14009133
 ] 

Andrew Musselman commented on MAHOUT-1470:
--

Possibly, yes, thanks; I haven't touched it so the context is the same, which I 
think is to allow for dumping topics and the documents that fall into those 
topics, like your first comment suggests.

> "Ideally Mahout's missing a clusterdump like utility for that reads in LDA
> topics, Document - DocumentId mapping and displays a report of the
> topics and the documents that belong to a topic."
>

> Topic dump
> --
>
> Key: MAHOUT-1470
> URL: https://issues.apache.org/jira/browse/MAHOUT-1470
> Project: Mahout
>  Issue Type: New Feature
>  Components: Clustering
>Affects Versions: 1.0
>Reporter: Andrew Musselman
>Assignee: Andrew Musselman
>Priority: Minor
> Fix For: 1.0
>
>
> Per 
> http://mail-archives.apache.org/mod_mbox/mahout-user/201403.mbox/%3CCAMc_qaL2DCgbVbam2miNsLpa4qvaA9sMy1-arccF9Nz6ApcsvQ%40mail.gmail.com%3E
> > The script needs to be corrected to not call vectordump for LDA as
> > vectordump utility (or even clusterdump) are presently not capable of
> > displaying topics and relevant documents. I recall this issue was
> > previously reported by Peyman Faratin post 0.9 release.
> >
> > Mahout's missing a clusterdump utility that reads in LDA
> > topics, Document - DocumentId mapping and displays a report of the topics
> > and the documents that belong to a topic.





[jira] [Commented] (MAHOUT-1470) Topic dump

2014-05-26 Thread Andrew Musselman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009126#comment-14009126
 ] 

Andrew Musselman commented on MAHOUT-1470:
--

I have some stuff, just need to finish it up; will let you know though, thanks.

> Topic dump
> --
>
> Key: MAHOUT-1470
> URL: https://issues.apache.org/jira/browse/MAHOUT-1470
> Project: Mahout
>  Issue Type: New Feature
>  Components: Clustering
>Affects Versions: 1.0
>Reporter: Andrew Musselman
>Assignee: Andrew Musselman
>Priority: Minor
> Fix For: 1.0
>
>
> Per 
> http://mail-archives.apache.org/mod_mbox/mahout-user/201403.mbox/%3CCAMc_qaL2DCgbVbam2miNsLpa4qvaA9sMy1-arccF9Nz6ApcsvQ%40mail.gmail.com%3E
> > The script needs to be corrected to not call vectordump for LDA as
> > vectordump utility (or even clusterdump) are presently not capable of
> > displaying topics and relevant documents. I recall this issue was
> > previously reported by Peyman Faratin post 0.9 release.
> >
> > Mahout's missing a clusterdump utility that reads in LDA
> > topics, Document - DocumentId mapping and displays a report of the topics
> > and the documents that belong to a topic.





[jira] [Commented] (MAHOUT-1470) Topic dump

2014-05-26 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008810#comment-14008810
 ] 

Suneel Marthi commented on MAHOUT-1470:
---

Andrew, I have some time this week; want me to take a crack at this? If so, 
fill me in on the issue, as I've lost context on it.

> Topic dump
> --
>
> Key: MAHOUT-1470
> URL: https://issues.apache.org/jira/browse/MAHOUT-1470
> Project: Mahout
>  Issue Type: New Feature
>  Components: Clustering
>Affects Versions: 1.0
>Reporter: Andrew Musselman
>Assignee: Andrew Musselman
>Priority: Minor
> Fix For: 1.0
>
>
> Per 
> http://mail-archives.apache.org/mod_mbox/mahout-user/201403.mbox/%3CCAMc_qaL2DCgbVbam2miNsLpa4qvaA9sMy1-arccF9Nz6ApcsvQ%40mail.gmail.com%3E
> > The script needs to be corrected to not call vectordump for LDA as
> > vectordump utility (or even clusterdump) are presently not capable of
> > displaying topics and relevant documents. I recall this issue was
> > previously reported by Peyman Faratin post 0.9 release.
> >
> > Mahout's missing a clusterdump utility that reads in LDA
> > topics, Document - DocumentId mapping and displays a report of the topics
> > and the documents that belong to a topic.





[jira] [Commented] (MAHOUT-1470) Topic dump

2014-05-19 Thread Andrew Musselman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001876#comment-14001876
 ] 

Andrew Musselman commented on MAHOUT-1470:
--

No progress; been overbooked at work.  If I don't get to it this week we could 
ask someone else to take it.

> Topic dump
> --
>
> Key: MAHOUT-1470
> URL: https://issues.apache.org/jira/browse/MAHOUT-1470
> Project: Mahout
>  Issue Type: New Feature
>  Components: Clustering
>Affects Versions: 1.0
>Reporter: Andrew Musselman
>Assignee: Andrew Musselman
>Priority: Minor
> Fix For: 1.0
>
>
> Per 
> http://mail-archives.apache.org/mod_mbox/mahout-user/201403.mbox/%3CCAMc_qaL2DCgbVbam2miNsLpa4qvaA9sMy1-arccF9Nz6ApcsvQ%40mail.gmail.com%3E
> > The script needs to be corrected to not call vectordump for LDA as
> > vectordump utility (or even clusterdump) are presently not capable of
> > displaying topics and relevant documents. I recall this issue was
> > previously reported by Peyman Faratin post 0.9 release.
> >
> > Mahout's missing a clusterdump utility that reads in LDA
> > topics, Document - DocumentId mapping and displays a report of the topics
> > and the documents that belong to a topic.





[jira] [Commented] (MAHOUT-1470) Topic dump

2014-05-18 Thread Sebastian Schelter (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001035#comment-14001035
 ] 

Sebastian Schelter commented on MAHOUT-1470:


[~andrew.musselman] what's the status here?

> Topic dump
> --
>
> Key: MAHOUT-1470
> URL: https://issues.apache.org/jira/browse/MAHOUT-1470
> Project: Mahout
>  Issue Type: New Feature
>  Components: Clustering
>Affects Versions: 1.0
>Reporter: Andrew Musselman
>Assignee: Andrew Musselman
>Priority: Minor
> Fix For: 1.0
>
>
> Per 
> http://mail-archives.apache.org/mod_mbox/mahout-user/201403.mbox/%3CCAMc_qaL2DCgbVbam2miNsLpa4qvaA9sMy1-arccF9Nz6ApcsvQ%40mail.gmail.com%3E
> > The script needs to be corrected to not call vectordump for LDA as
> > vectordump utility (or even clusterdump) are presently not capable of
> > displaying topics and relevant documents. I recall this issue was
> > previously reported by Peyman Faratin post 0.9 release.
> >
> > Mahout's missing a clusterdump utility that reads in LDA
> > topics, Document - DocumentId mapping and displays a report of the topics
> > and the documents that belong to a topic.





[jira] [Commented] (MAHOUT-1470) Topic dump

2014-04-28 Thread Andrew Musselman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983145#comment-13983145
 ] 

Andrew Musselman commented on MAHOUT-1470:
--

I'll take it

> Topic dump
> --
>
> Key: MAHOUT-1470
> URL: https://issues.apache.org/jira/browse/MAHOUT-1470
> Project: Mahout
>  Issue Type: New Feature
>  Components: Clustering
>Affects Versions: 1.0
>Reporter: Andrew Musselman
>Assignee: Andrew Musselman
>Priority: Minor
> Fix For: 1.0
>
>
> Per 
> http://mail-archives.apache.org/mod_mbox/mahout-user/201403.mbox/%3CCAMc_qaL2DCgbVbam2miNsLpa4qvaA9sMy1-arccF9Nz6ApcsvQ%40mail.gmail.com%3E
> > The script needs to be corrected to not call vectordump for LDA as
> > vectordump utility (or even clusterdump) are presently not capable of
> > displaying topics and relevant documents. I recall this issue was
> > previously reported by Peyman Faratin post 0.9 release.
> >
> > Mahout's missing a clusterdump utility that reads in LDA
> > topics, Document - DocumentId mapping and displays a report of the topics
> > and the documents that belong to a topic.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1470) Topic dump

2014-04-27 Thread Sebastian Schelter (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982244#comment-13982244
 ] 

Sebastian Schelter commented on MAHOUT-1470:


What's the status here?

> Topic dump
> --
>
> Key: MAHOUT-1470
> URL: https://issues.apache.org/jira/browse/MAHOUT-1470
> Project: Mahout
>  Issue Type: New Feature
>  Components: Classification
>Affects Versions: 1.0
>Reporter: Andrew Musselman
>Priority: Minor
> Fix For: 1.0
>
>
> Per 
> http://mail-archives.apache.org/mod_mbox/mahout-user/201403.mbox/%3CCAMc_qaL2DCgbVbam2miNsLpa4qvaA9sMy1-arccF9Nz6ApcsvQ%40mail.gmail.com%3E
> > The script needs to be corrected to not call vectordump for LDA as
> > vectordump utility (or even clusterdump) are presently not capable of
> > displaying topics and relevant documents. I recall this issue was
> > previously reported by Peyman Faratin post 0.9 release.
> >
> > Mahout's missing a clusterdump utility that reads in LDA
> > topics, Document - DocumentId mapping and displays a report of the topics
> > and the documents that belong to a topic.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1470) Topic dump

2014-03-21 Thread Andrew Musselman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943163#comment-13943163
 ] 

Andrew Musselman commented on MAHOUT-1470:
--

Yes, that's it.

> Topic dump
> --
>
> Key: MAHOUT-1470
> URL: https://issues.apache.org/jira/browse/MAHOUT-1470
> Project: Mahout
>  Issue Type: New Feature
>  Components: Classification
>Affects Versions: 1.0
>Reporter: Andrew Musselman
>Priority: Minor
> Fix For: 0.9
>
>
> Per 
> http://mail-archives.apache.org/mod_mbox/mahout-user/201403.mbox/%3CCAMc_qaL2DCgbVbam2miNsLpa4qvaA9sMy1-arccF9Nz6ApcsvQ%40mail.gmail.com%3E
> > The script needs to be corrected to not call vectordump for LDA as
> > vectordump utility (or even clusterdump) are presently not capable of
> > displaying topics and relevant documents. I recall this issue was
> > previously reported by Peyman Faratin post 0.9 release.
> >
> > Mahout's missing a clusterdump utility that reads in LDA
> > topics, Document - DocumentId mapping and displays a report of the topics
> > and the documents that belong to a topic.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1470) Topic dump

2014-03-20 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942458#comment-13942458
 ] 

Suneel Marthi commented on MAHOUT-1470:
---

Mahout already has an LDAPrintTopics which prints the top K terms per topic. 
So this would basically transform the output of LdaTopics, replacing topicId => 
topic and documentID => document?
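
To make the idea concrete, here is a rough sketch of that kind of
transformation, written in Python for brevity. It is not Mahout code and does
not read Mahout's sequence files: the function name topic_report and the four
in-memory dictionaries it takes (topic-term probabilities, document-topic
probabilities, a term dictionary, and a document-name mapping) are assumptions
made up for illustration. Assigning each document to its single most probable
topic is also just one possible choice.

```python
from collections import defaultdict


def topic_report(topic_terms, doc_topics, dictionary, doc_names, top_k=3):
    """Return a text report: top terms per topic, then the documents in it.

    All inputs are hypothetical in-memory structures, not Mahout formats:
      topic_terms: {topic_id: {term_id: probability}}
      doc_topics:  {doc_id: {topic_id: probability}}
      dictionary:  {term_id: term string}
      doc_names:   {doc_id: document name}
    """
    # Assign each document to its most probable topic (an illustrative choice).
    docs_by_topic = defaultdict(list)
    for doc_id, dist in doc_topics.items():
        best = max(dist, key=dist.get)
        docs_by_topic[best].append((dist[best], doc_names[doc_id]))

    lines = []
    for topic_id in sorted(topic_terms):
        # Replace term ids with readable terms, highest probability first.
        ranked = sorted(topic_terms[topic_id].items(),
                        key=lambda kv: kv[1], reverse=True)[:top_k]
        label = ", ".join(f"{dictionary[t]} ({p:.2f})" for t, p in ranked)
        lines.append(f"Topic {topic_id}: {label}")
        # Replace document ids with document names, strongest match first.
        for prob, name in sorted(docs_by_topic[topic_id], reverse=True):
            lines.append(f"  {name} ({prob:.2f})")
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy data, invented for the example.
    print(topic_report(
        topic_terms={0: {0: 0.7, 1: 0.3}, 1: {0: 0.1, 1: 0.9}},
        doc_topics={10: {0: 0.8, 1: 0.2}, 11: {0: 0.25, 1: 0.75}},
        dictionary={0: "hadoop", 1: "cluster"},
        doc_names={10: "a.txt", 11: "b.txt"},
        top_k=1,
    ))
```

A real utility would first have to load these mappings from the LDA output and
the dictionary files; that part is left out here because it depends on the
Mahout version and file layout.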

> Topic dump
> --
>
> Key: MAHOUT-1470
> URL: https://issues.apache.org/jira/browse/MAHOUT-1470
> Project: Mahout
>  Issue Type: New Feature
>  Components: Classification
>Affects Versions: 1.0
>Reporter: Andrew Musselman
>Priority: Minor
> Fix For: 0.9
>
>
> Per 
> http://mail-archives.apache.org/mod_mbox/mahout-user/201403.mbox/%3CCAMc_qaL2DCgbVbam2miNsLpa4qvaA9sMy1-arccF9Nz6ApcsvQ%40mail.gmail.com%3E
> > The script needs to be corrected to not call vectordump for LDA as
> > vectordump utility (or even clusterdump) are presently not capable of
> > displaying topics and relevant documents. I recall this issue was
> > previously reported by Peyman Faratin post 0.9 release.
> >
> > Mahout's missing a clusterdump utility that reads in LDA
> > topics, Document - DocumentId mapping and displays a report of the topics
> > and the documents that belong to a topic.


