Re: A hadoop novice meets mahout

Benson Margulies Fri, 29 May 2009 09:05:37 -0700

The Shashikant code ends up with a SparseVector. There must be some easy
way to pull in a SparseVector instead of a DenseVector. The
SparseVector reader wants a DataInput, and the InputMapper has a Text, but
perhaps a quick StringReader is all I need.
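
One caveat on the idea above: java.io.StringReader is a Reader and does not implement DataInput, whereas java.io.DataInputStream does. If the payload is available as raw bytes, a DataInput can be obtained by wrapping them in a ByteArrayInputStream. The sketch below only illustrates that wrapping. The class name SparseRoundTrip and the byte layout (an entry count followed by index/value pairs) are assumptions made up for this example, not Mahout's actual SparseVector format. If the Text value holds a textual encoding rather than binary data, it would presumably need to be parsed instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Mahout or Hadoop.
public class SparseRoundTrip {

    // Assumed layout: int entry count, then (int index, double value) pairs.
    public static void write(Map<Integer, Double> entries, DataOutput out) throws IOException {
        out.writeInt(entries.size());
        for (Map.Entry<Integer, Double> e : entries.entrySet()) {
            out.writeInt(e.getKey());
            out.writeDouble(e.getValue());
        }
    }

    // Reads the same assumed layout back from any DataInput.
    public static Map<Integer, Double> read(DataInput in) throws IOException {
        int count = in.readInt();
        Map<Integer, Double> entries = new TreeMap<>();
        for (int i = 0; i < count; i++) {
            int index = in.readInt();
            double value = in.readDouble();
            entries.put(index, value);
        }
        return entries;
    }

    // Serializes to a byte array through a DataOutputStream.
    public static byte[] toBytes(Map<Integer, Double> entries) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            write(entries, out);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The key step: bytes -> ByteArrayInputStream -> DataInputStream (a DataInput).
    public static Map<Integer, Double> fromBytes(byte[] bytes) {
        try {
            return read(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<Integer, Double> vector = new TreeMap<>();
        vector.put(3, 0.5);
        vector.put(17, 2.25);
        // Round-trip the sparse entries through the DataInput-based reader.
        System.out.println(fromBytes(toBytes(vector)));
    }
}
```

The same fromBytes pattern would apply to any reader that accepts a DataInput, provided the bytes were produced by the matching writer.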


The code in the example

On Fri, May 29, 2009 at 12:00 PM, Grant Ingersoll <[email protected]> wrote:

> I think Shashikant was using a modified form of Mahout that encoded the
> labels in the output.
>
> I think we're still a little bit away from having a utility that truly
> makes this straightforward to go from text to clusterable vectors.
>
> No doubt what is happening is the recognition of a need for some type of
> pipeline process that can work with multiple data sources and output various
> consumable formats and help select features.  Unfortunately, we aren't there
> just yet.
>
> -Grant
>
>
> On May 29, 2009, at 11:27 AM, Benson Margulies wrote:
>
>> I'll fish for one more hint. I'm using the MAHOUT-126 code to turn text
>> into data via TF-IDF. What comes out of there is not in the same format as
>> your example data. Does this mean that I need a different InputDriver? Is
>> one lying about for the format written by that DocumentVector class?
>>
>> On Fri, May 29, 2009 at 10:29 AM, Jeff Eastman
>> <[email protected]> wrote:
>>
>>> Benson Margulies wrote:
>>>
>>>> OK, I've got some inputs, I want to run k-means, how do I feed the
>>>> beast?
>>>>
>>> Make sure you can run the Synthetic Control example to get everything
>>> wired together correctly: JDK, Hadoop, Mahout. See
>>> http://cwiki.apache.org/MAHOUT/syntheticcontroldata.html. Then write an
>>> input job to convert your data similar to
>>>
>>> /Mahout/examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/canopy/InputDriver.java
>>> and make a new job like
>>>
>>> /Mahout/examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/kmeans/Job.java.
>>> You will have a small adventure and then be operational.
>>>
>>> Have fun,
>>> Jeff
>>>
>>>
>
