On Tue, Jan 19, 2010 at 2:30 PM, Loek Cleophas <[email protected]> wrote:

> Hi again
>
> My apologies: the results in my previous e-mail were a result of
> inadvertently running *TrainClassifier* with the -i parameter using a
> relative local path vs. one on the DFS. Naturally, since my problem was with
> *TestClassifier*, I should've run that with the adapted -i parameter value.
> (Must have been the lack of morning coffee.)
>
> I have now rerun TrainClassifier to reconstruct the model, and run
> TestClassifier with:
>
> bin/hadoop jar
> ~/Downloads/mahout-0.2/examples/target/mahout-examples-0.2.job
> org.apache.mahout.classifier.bayes.TestClassifier -m 8newsmodel-0.2 -d
> ~/Code/My_Eclipse_Workspace/apache-mahout/examples/work/20news-18828-collapse
> -ng 3 -type bayes -source hdfs -method sequential
>
> That solved the problem. Thanks a lot for that useful remark about the
> input for the TestClassifier needing to come from the local file system.
> I'll now go and sit in the 'feeling silly' corner :)
>
There is nothing silly about it. I missed documenting that particular info.
So I guess I am in the silly corner :P

When you run it against some dataset other than 20 newsgroups, please tell
us how it goes, so that we can use that feedback to improve it.
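
Until this is documented properly, the rule from this thread (sequential mode
reads the test files off the local disk, mapreduce mode reads them off the
hdfs) can be sketched as a small pre-flight check. This is only a sketch: the
function names test_data_location and check_local_test_dir are made up for
illustration and are not part of Mahout or Hadoop.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Mahout) that encode the rule above.

# Print where TestClassifier is expected to look for the -d test data,
# given the value passed to -method.
test_data_location() {
  case "$1" in
    sequential) echo "local" ;;   # test files are read off the local disk
    mapreduce)  echo "hdfs" ;;    # test files are read off the hdfs
    *)          echo "unknown"; return 1 ;;
  esac
}

# Pre-flight check for sequential mode: the -d directory has to exist
# on the local file system.
check_local_test_dir() {
  if [ -d "$1" ]; then
    echo "ok: $1 exists locally"
  else
    echo "missing: $1 is not a local directory"
    return 1
  fi
}
```

Running check_local_test_dir on the directory you intend to pass as -d before a
sequential run should catch the "0 items classified" case early.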

Regards
Robin



>
> Best wishes,
> Loek
>
>
>
> On Jan 19, 2010, at 08:45, Robin Anil wrote:
>
>> On Tue, Jan 19, 2010 at 1:02 PM, Loek Cleophas <[email protected]> wrote:
>>
>>> Are you sure about it reading from local dir?
>>>
>>
>> Yes, absolutely
>>
>>> Note that I pass -source hdfs to the TestClassifier, and that when I try
>>> to run it instead with a full local path, i.e. as:
>>>
>> That source flag in TestClassifier is only for the model (it can be in
>> hdfs or hbase).
>>
>> In sequential mode, the test files are read off the local disk, whereas in
>> mapreduce mode the test files are read off the hdfs.
>>
>>
>>> bin/hadoop jar
>>> ~/Downloads/mahout-0.2/examples/target/mahout-examples-0.2.job
>>> org.apache.mahout.classifier.bayes.TrainClassifier -i
>>> ~/Code/My_Eclipse_Workspace/apache-mahout/examples/work/20news-18828-collapse
>>> -o 8newsmodel-0.2 -ng 3 -type bayes -source hdfs
>>>
>> Like I said, the Trainer is completely map/reduce; it reads off the hdfs.
>>
>>
>>
>>> I get the following exception, which seems to imply it is not reading the
>>> input from a local dir...:
>>>
>>> Exception in thread "main"
>>> org.apache.hadoop.mapred.InvalidInputException:
>>> Input path does not exist:
>>>
>>> hdfs://localhost:9000/Users/loekcleophas/Code/My_Eclipse_Workspace/apache-mahout/examples/work/20news-18828-collapse
>>>
>>>
>>>
>>> On Jan 19, 2010, at 08:12, Robin Anil wrote:
>>>
>>>> Is it reading the directory correctly? Note, 8newsinput is read from
>>>> local dir.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Jan 19, 2010 at 12:39 PM, Loek Cleophas
>>>> <[email protected]> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> I've recently started working with Mahout. At first, I tried the trunk,
>>>>> which I got to compile (both from within Eclipse with a Maven plugin, and
>>>>> from the command line), but which apparently is in a state of flux
>>>>> regarding building and running the examples (?).
>>>>>
>>>>> I tried running the Twentynewsgroups classification example, after
>>>>> copying the relevant Maven file to the examples directory, as suggested
>>>>> on the mailing list some time ago. I could get the example's data set
>>>>> from wikipedia, could get it processed into input data located on the
>>>>> single-node/local hdfs, and could get a model trained and output to that
>>>>> hdfs. However, the example class TestClassifier to test with the trained
>>>>> model didn't work for me, neither in mapreduce nor in sequential mode. In
>>>>> the mapreduce case, even with quite high JVM maximum heap sizes (I tried
>>>>> 2048), I get heap space out-of-memory errors / object configuration
>>>>> errors. In the sequential case, I seemingly get 0 items classified; see
>>>>> the output below. (Note that I reduced the data set to just 8 instead of
>>>>> 20 newsgroups, thinking the data size might have something to do with the
>>>>> problem.)
>>>>>
>>>>> I also tried release 0.2, which I got to compile and for which I got the
>>>>> example running more easily, but still with the same errors when testing
>>>>> with the trained model. Any ideas what might be going wrong, or what I
>>>>> might be doing wrong?
>>>>>
>>>>> Kind regards,
>>>>> Loek Cleophas
>>>>>
>>>>>
>>>>> Output of TestClassifier:
>>>>>
>>>>> bin/hadoop jar
>>>>> ~/Downloads/mahout-0.2/examples/target/mahout-examples-0.2.job
>>>>> org.apache.mahout.classifier.bayes.TestClassifier -m 8newsmodel-0.2 -d
>>>>> 8newsInput -ng 3 -type bayes -source hdfs -method sequential
>>>>>
>>>>> <... reading all the feature weights ...>
>>>>>
>>>>> 10/01/13 10:22:08 INFO io.SequenceFileModelReader: Read 1950000 feature weights
>>>>> 10/01/13 10:22:11 INFO io.SequenceFileModelReader: hdfs://localhost:9000/user/loekcleophas/8newsmodel-0.2/trainer-weights/Sigma_k/part-00000
>>>>> 10/01/13 10:22:11 INFO io.SequenceFileModelReader: hdfs://localhost:9000/user/loekcleophas/8newsmodel-0.2/trainer-weights/Sigma_kSigma_j/part-00000
>>>>> 10/01/13 10:22:11 INFO io.SequenceFileModelReader: 420716.6056712613
>>>>> 10/01/13 10:22:11 INFO io.SequenceFileModelReader: hdfs://localhost:9000/user/loekcleophas/8newsmodel-0.2/trainer-thetaNormalizer/part-00000
>>>>> 10/01/13 10:22:11 INFO io.SequenceFileModelReader: hdfs://localhost:9000/user/loekcleophas/8newsmodel-0.2/trainer-tfIdf/trainer-tfIdf/part-00000
>>>>> comp.windows.x -4443829.798557077 7727496.583973498 -0.5750671967650419
>>>>> comp.graphics -3252365.124498224 7727496.583973498 -0.4208821174044246
>>>>> soc.religion.christian -5106741.34456479 7727496.583973498 -0.6608532645819548
>>>>> alt.atheism -3447983.6168798 7727496.583973498 -0.44619671835646907
>>>>> misc.forsale -2276588.3662840202 7727496.583973498 -0.2946087832643716
>>>>> comp.sys.mac.hardware -2445489.855812473 7727496.583973498 -0.31646598988918556
>>>>> comp.os.ms-windows.misc -7727496.583973498 7727496.583973498 -1.0
>>>>> comp.sys.ibm.pc.hardware -2687646.590023761 7727496.583973498 -0.3478030123750332
>>>>> 10/01/13 10:23:17 INFO bayes.TestClassifier:
>>>>> nCalls = 0;
>>>>> sumTime = 0.0s;
>>>>> minTime = 0.0ms;
>>>>> maxTime = 0.0ms;
>>>>> meanTime = 0.0ms;
>>>>> stdDevTime = 0.0ms;
>>>>> 10/01/13 10:23:18 INFO bayes.TestClassifier:
>>>>> =======================================================
>>>>> Summary
>>>>> -------------------------------------------------------
>>>>> Correctly Classified Instances          :          0             ?%
>>>>> Incorrectly Classified Instances        :          0             ?%
>>>>> Total Classified Instances              :          0
>>>>>
>>>>> =======================================================
>>>>> Confusion Matrix
>>>>> -------------------------------------------------------
>>>>> a       b       c       d       e       f       g       h       <--Classified as
>>>>> 0       0       0       0       0       0       0       0        |  0        a     = comp.windows.x
>>>>> 0       0       0       0       0       0       0       0        |  0        b     = comp.graphics
>>>>> 0       0       0       0       0       0       0       0        |  0        c     = soc.religion.christian
>>>>> 0       0       0       0       0       0       0       0        |  0        d     = alt.atheism
>>>>> 0       0       0       0       0       0       0       0        |  0        e     = misc.forsale
>>>>> 0       0       0       0       0       0       0       0        |  0        f     = comp.sys.mac.hardware
>>>>> 0       0       0       0       0       0       0       0        |  0        g     = comp.os.ms-windows.misc
>>>>> 0       0       0       0       0       0       0       0        |  0        h     = comp.sys.ibm.pc.hardware
>>>>> Default Category: unknown: 8
>>>>>
>>>>>
>>>>>
>>>>>
>>>
>


-- 
------
Robin Anil
Blog: http://techdigger.wordpress.com
-------
Try out Swipeball for iPhone
Video: http://www.youtube.com/watch?v=3hvEbWHciwU
iTunes: http://itunes.com/apps/swipeball
