[ https://issues.apache.org/jira/browse/MAHOUT-476?focusedWorklogId=989377&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-989377 ]

ASF GitHub Bot logged work on MAHOUT-476:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 29/Oct/25 06:34
            Start Date: 29/Oct/25 06:34
    Worklog Time Spent: 10m 
      Work Description: guan404ming opened a new pull request, #590:
URL: https://github.com/apache/mahout/pull/590

   ### Purpose of PR  
   Describe what this PR does.
   
   ### Linked Issues  
   Add links to related issues.
   - Closes #476 
   
   ### Changes Made  
   - [ ] Bug fix  
   - [x] New feature  
   - [ ] Documentation update  
   
   ### Important ToDos
   Please mark each with an "x"  
   
   A GitHub issue exists (if not, please create one) [https://github.com/apache/mahout/issues]  
   - [x] Title of PR is "Issue #XXXX: Brief Description of Changes" where XXXX is the GitHub issue number.  
   - [x] Created unit tests where appropriate  
   - [x] Added correct licenses on newly added files  
   - [x] Assigned GitHub issue to self  
   - [ ] Added documentation in ScalaDocs/JavaDocs and to the website  
   - [x] Successfully built and ran all unit tests, verified that all tests pass locally  
   
   If not all of these items are complete yet but you still feel it is appropriate to open a PR, please open it as a **Draft PR** instead.  
   Once all requirements are met, you can mark it as ready for review.
   
   
   ### Breaking Changes  
   Does this PR introduce a breaking change?
   - [ ] Yes  
   - [x] No  
   
   ### Testing & Verification  
   Describe how you tested the changes.
   - [x] Unit tests added  
   - [x] Manually tested  
   
   ### Checklist  
   - [x] The title follows the format "MAHOUT-XXXX Brief Description"  
   - [x] GitHub issue is created
   - [x] Code follows ASF guidelines  
   




Issue Time Tracking
-------------------

            Worklog Id:     (was: 989377)
    Remaining Estimate: 0h
            Time Spent: 10m

> bug when running org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorDriver on hadoop
> -------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-476
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-476
>             Project: Mahout
>          Issue Type: Bug
>    Affects Versions: 0.3
>         Environment: hadoop 0.20.2, mahout-0.3, ubuntu
>            Reporter: leon lee
>            Priority: Major
>             Fix For: 0.3
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When I followed the wiki instructions at
> https://cwiki.apache.org/MAHOUT/wikipedia-bayes-example.html
> (by the way, the Bayes example documentation on the wiki needs to be updated for 0.3)
> to run step 5 (create the country-based split of the Wikipedia dataset),
> I used the following command:
> $HADOOP_HOME/bin/hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-0.3.job \
>   org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorDriver \
>   -i $MAHOUT_HOME/examples/work/wikipedia/chunks \
>   -o $MAHOUT_HOME/examples/work/wikipediainput \
>   -c $MAHOUT_HOME/examples/src/test/resources/country.txt
> The job failed on Hadoop. The Hadoop log shows:
> Error: org.apache.lucene.wikipedia.analysis.WikipediaTokenizer.addAttribute(Ljava/lang/Class;)Lorg/apache/lucene/util/Attribute
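The `Error:` line in the report looks like the message of a `java.lang.NoSuchMethodError`: a class and method name followed by a JVM method descriptor. The following is a minimal, hypothetical Java sketch for reading such a line. The class `NoSuchMethodDiagnostics` and its methods are illustrative and are not part of Mahout, Lucene, or Hadoop. The sketch decodes the descriptor into a readable signature. It also uses reflection to report which jar a class was loaded from and whether a given public method exists at runtime. It assumes the failure comes from the task loading a different Lucene version than the one `WikipediaTokenizer` was compiled against. That is a common cause of this kind of error, but the report does not confirm it. The decoder tolerates the missing trailing `;` in the quoted log line.

```java
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper; not part of Mahout, Lucene, or Hadoop.
final class NoSuchMethodDiagnostics {

    private NoSuchMethodDiagnostics() {}

    // Maps a primitive (or void) JVM descriptor character to its Java name.
    private static String primitive(char c) {
        switch (c) {
            case 'B': return "byte";
            case 'C': return "char";
            case 'D': return "double";
            case 'F': return "float";
            case 'I': return "int";
            case 'J': return "long";
            case 'S': return "short";
            case 'Z': return "boolean";
            case 'V': return "void";
            default: throw new IllegalArgumentException("Unknown descriptor char: " + c);
        }
    }

    // Appends the readable form of the type descriptor starting at pos; returns the next position.
    private static int decodeType(String s, int pos, StringBuilder out) {
        int dims = 0;
        while (s.charAt(pos) == '[') { // each '[' is one array dimension
            dims++;
            pos++;
        }
        char c = s.charAt(pos);
        String name;
        int next;
        if (c == 'L') { // object type: Lpkg/Cls;
            int end = s.indexOf(';', pos);
            if (end < 0) {
                end = s.length(); // tolerate a missing trailing ';' as in the quoted log line
            }
            name = s.substring(pos + 1, end).replace('/', '.');
            next = Math.min(end + 1, s.length());
        } else {
            name = primitive(c);
            next = pos + 1;
        }
        out.append(name);
        for (int i = 0; i < dims; i++) {
            out.append("[]");
        }
        return next;
    }

    // Turns "pkg.Cls.method(desc)ret" into "ret pkg.Cls.method(param, ...)".
    static String describe(String message) {
        int paren = message.indexOf('(');
        int close = message.indexOf(')', paren);
        if (paren < 0 || close < 0) {
            throw new IllegalArgumentException("Not a method descriptor: " + message);
        }
        List<String> params = new ArrayList<>();
        int pos = paren + 1;
        while (pos < close) {
            StringBuilder p = new StringBuilder();
            pos = decodeType(message, pos, p);
            params.add(p.toString());
        }
        StringBuilder ret = new StringBuilder();
        decodeType(message, close + 1, ret);
        return ret + " " + message.substring(0, paren) + "(" + String.join(", ", params) + ")";
    }

    // Loads a class without initializing it; unchecked failure if it is not on the classpath.
    private static Class<?> load(String className) {
        try {
            return Class.forName(className, false, NoSuchMethodDiagnostics.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Class not on classpath: " + className, e);
        }
    }

    // Jar or directory a class was loaded from at runtime; bootstrap classes have no CodeSource.
    static String loadedFrom(String className) {
        CodeSource src = load(className).getProtectionDomain().getCodeSource();
        return src == null ? "<bootstrap or unknown>" : String.valueOf(src.getLocation());
    }

    // True if the runtime class has (or inherits) a public method with exactly these parameter types.
    static boolean hasPublicMethod(String className, String method, Class<?>... params) {
        try {
            load(className).getMethod(method, params);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String logged = "org.apache.lucene.wikipedia.analysis.WikipediaTokenizer.addAttribute("
                + "Ljava/lang/Class;)Lorg/apache/lucene/util/Attribute";
        System.out.println(describe(logged));
        // Inside the failing task one might also call, e.g.:
        // loadedFrom("org.apache.lucene.analysis.TokenStream")
        // hasPublicMethod("org.apache.lucene.analysis.TokenStream", "addAttribute", Class.class)
    }
}
```

Running `main` prints the decoded signature: a method `addAttribute(java.lang.Class)` returning `org.apache.lucene.util.Attribute`. The commented-out calls are what one might run inside the failing task to see which Lucene jar actually supplies the class. They are a debugging aid under the assumption stated above, not the fix applied for MAHOUT-476.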



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
