[ https://issues.apache.org/jira/browse/MAHOUT-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997495#comment-13997495 ]
Richard Scharrer commented on MAHOUT-1549:
------------------------------------------

Hi Andy,
drahcos is actually my account. I'm sorry, but I had to ask this question on two or three forums because I was in a hurry.
To answer your question: yes, this solved my problem. Thank you for your response.

Regards,
Richard

> Extracting tfidf-vectors by key
> -------------------------------
>
>                 Key: MAHOUT-1549
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1549
>             Project: Mahout
>          Issue Type: Question
>          Components: Classification
>    Affects Versions: 0.7, 0.8, 0.9
>            Reporter: Richard Scharrer
>              Labels: documentation, features, newbie
>
> Hi,
> I have about 200000 tfidf-vectors and I need to extract 500 of them, for which
> I have the keys. Is there some kind of magical option that lets me take the
> output of mahout seqdumper and transform it back into a sequencefile that I
> can use for trainnb/testnb? The sequencefiles of tfidf use the Text class for
> the keys and the VectorWritable class for the values. I tried
> https://github.com/kevinweil/elephant-bird/blob/master/pig/src/main/java/com/twitter/elephantbird/pig/store/SequenceFileStorage.java
> with different settings, but the output always gives me the Text class for
> both key and value, which can't be used in trainnb and testnb.
> I posted this question on:
> http://stackoverflow.com/questions/23502362/extracting-tfidf-vectors-by-key-without-destroying-the-fileformat
> I am asking this question here because I've seen similar questions on
> stackoverflow that were asked last year and still haven't received an answer.
> I really need this information, so in case you know anything, please tell me.
> Regards,
> Richard

--
This message was sent by Atlassian JIRA
(v6.2#6252)