[ https://issues.apache.org/jira/browse/MAHOUT-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114642#comment-14114642 ]
ASF GitHub Bot commented on MAHOUT-1608:
----------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/mahout/pull/45

> Add Option WikipediaToSequenceFile to remove Category Labels from Documents
> ---------------------------------------------------------------------------
>
>                 Key: MAHOUT-1608
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1608
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 0.9
>            Reporter: Andrew Palumbo
>            Assignee: Andrew Palumbo
>            Priority: Minor
>             Fix For: 1.0
>
>
> Currently WikipediaMapper job extracts Category labels from the text of the
> Wikipedia documents and leaves the label as [[Category:label]] in the
> document. Add in an option to WikipediaToSequenceFile.java to remove
> [[Category:label]] from the text after extracting the label.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
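
For illustration only, the behaviour requested in the issue description (extract the label, then remove the [[Category:label]] markup from the text) might be sketched as below. This is not the code from the pull request: the class name CategoryLabelStripper, both method names, and the regular expression are assumptions made for this example, and the pattern also assumes an optional "|sort key" suffix inside the link.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Mahout: shows extracting category labels
// and then removing the [[Category:label]] markup from the document text.
public class CategoryLabelStripper {

    // Assumed pattern: [[Category:label]] or [[Category:label|sort key]].
    // Group 1 captures the label itself.
    private static final Pattern CATEGORY =
        Pattern.compile("\\[\\[Category:([^\\]|]+)(?:\\|[^\\]]*)?\\]\\]");

    // Collect every category label found in the text, in order of appearance.
    public static List<String> extractLabels(String text) {
        List<String> labels = new ArrayList<>();
        Matcher m = CATEGORY.matcher(text);
        while (m.find()) {
            labels.add(m.group(1).trim());
        }
        return labels;
    }

    // Return the text with all category markup removed.
    public static String removeLabels(String text) {
        return CATEGORY.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String doc = "Apache Mahout is a library. "
            + "[[Category:Machine learning]] [[Category:Apache]]";
        System.out.println(extractLabels(doc));       // labels are kept for classification
        System.out.println(removeLabels(doc).trim()); // text no longer contains the markup
    }
}
```

Extraction runs before removal so the labels stay available as classification targets while the document text no longer contains them.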