Richard Scharrer created MAHOUT-1549:
----------------------------------------

Summary: Extracting tfidf-vectors by key
Key: MAHOUT-1549
URL: https://issues.apache.org/jira/browse/MAHOUT-1549
Project: Mahout
Issue Type: Question
Components: Classification
Affects Versions: 0.9, 0.8, 0.7
Reporter: Richard Scharrer

Hi,

I have about 200,000 tfidf-vectors and need to extract 500 of them, for which I have the keys. Is there some kind of magical option that lets me take the output of mahout seqdumper and transform it back into a sequence file that I can use for trainnb/testnb?

The tfidf sequence files use the Text class for the keys and the VectorWritable class for the values. I tried https://github.com/kevinweil/elephant-bird/blob/master/pig/src/main/java/com/twitter/elephantbird/pig/store/SequenceFileStorage.java with different settings, but the output always uses the Text class for both key and value, which can't be used in trainnb and testnb.

I also posted this question at: http://stackoverflow.com/questions/23502362/extracting-tfidf-vectors-by-key-without-destroying-the-fileformat

I am asking here because I've seen similar questions on Stack Overflow that were asked last year and still haven't received an answer. I really need this information, so if you know anything, please tell me.

Regards,
Richard
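
For illustration only, the extraction described above could be done without going through seqdumper by copying the matching records directly with Hadoop's SequenceFile API, which keeps the Text/VectorWritable types intact. This is a minimal sketch, not a built-in Mahout option: the class name FilterVectorsByKey, the keys file (one key per line), and the argument order are all assumptions made for the example, and the input is assumed to be a single part file from the tfidf-vectors directory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.mahout.math.VectorWritable;

// Hypothetical helper: copies only the records whose key is listed in a keys file.
public class FilterVectorsByKey {

    // args: <input sequence file> <output sequence file> <local keys file, one key per line>
    public static void main(String[] args) throws IOException {
        Path input = new Path(args[0]);
        Path output = new Path(args[1]);

        // Load the wanted keys into a set for constant-time lookups.
        Set<String> wanted = new HashSet<String>();
        for (String line : Files.readAllLines(Paths.get(args[2]), StandardCharsets.UTF_8)) {
            String key = line.trim();
            if (!key.isEmpty()) {
                wanted.add(key);
            }
        }

        Configuration conf = new Configuration();
        FileSystem fs = input.getFileSystem(conf);

        SequenceFile.Reader reader = new SequenceFile.Reader(fs, input, conf);
        // The writer is created with the same key/value classes that the tfidf files use.
        SequenceFile.Writer writer =
            SequenceFile.createWriter(fs, conf, output, Text.class, VectorWritable.class);
        try {
            Text key = new Text();
            VectorWritable value = new VectorWritable();
            // Stream through every record and keep only the requested keys.
            while (reader.next(key, value)) {
                if (wanted.contains(key.toString())) {
                    writer.append(key, value);
                }
            }
        } finally {
            writer.close();
            reader.close();
        }
    }
}
```

With Hadoop and Mahout on the classpath, this would be run once per part file, for example with the arguments `tfidf-vectors/part-r-00000 subset/part-r-00000 keys.txt` (paths are placeholders). A single sequential pass is used because 200,000 vectors is small enough that a MapReduce job is probably unnecessary.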