[ https://issues.apache.org/jira/browse/MAHOUT-476?focusedWorklogId=989377&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-989377 ]
ASF GitHub Bot logged work on MAHOUT-476:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 29/Oct/25 06:34
Start Date: 29/Oct/25 06:34
Worklog Time Spent: 10m
Work Description: guan404ming opened a new pull request, #590:
URL: https://github.com/apache/mahout/pull/590
### Purpose of PR
Describe what this PR does.
### Linked Issues
Add links to related issues.
- Closes #476
### Changes Made
- [ ] Bug fix
- [x] New feature
- [ ] Documentation update
### Important ToDos
Please mark each with an "x"
A GitHub issue exists (if not, please create one)
[https://github.com/apache/mahout/issues]
- [x] Title of PR is "Issue #XXXX: Brief Description of Changes" where XXXX is the GitHub issue number.
- [x] Created unit tests where appropriate
- [x] Added correct licenses on newly added files
- [x] Assigned GitHub issue to self
- [ ] Added documentation in ScalaDocs/JavaDocs and to the website
- [x] Successfully built and ran all unit tests, verified that all tests pass locally
If all of these items are not yet complete, but you still feel it is
appropriate to open a PR, please open it as a **Draft PR** instead.
Once all requirements are met, you can mark it as ready for review.
### Breaking Changes
Does this PR introduce a breaking change?
- [ ] Yes
- [x] No
### Testing & Verification
Describe how you tested the changes.
- [x] Unit tests added
- [x] Manually tested
### Checklist
- [x] The title follows the format "MAHOUT-XXXX Brief Description"
- [x] GitHub issue is created
- [x] Code follows ASF guidelines
Issue Time Tracking
-------------------
Worklog Id: (was: 989377)
Remaining Estimate: 0h
Time Spent: 10m
> bug when running
> org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorDriver on hadoop
> -------------------------------------------------------------------------------------------
>
> Key: MAHOUT-476
> URL: https://issues.apache.org/jira/browse/MAHOUT-476
> Project: Mahout
> Issue Type: Bug
> Affects Versions: 0.3
> Environment: hadoop 0.20.2
> mahout-0.3
> ubuntu
> Reporter: leon lee
> Priority: Major
> Fix For: 0.3
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When I follow the wiki instructions at
> https://cwiki.apache.org/MAHOUT/wikipedia-bayes-example.html
> (by the way, the Bayes examples document in the wiki needs updating for 0.3)
> to run step 5,
> "Create the countries based Split of wikipedia dataset",
> I use the following command:
> $HADOOP_HOME/bin/hadoop jar
> $MAHOUT_HOME/examples/target/mahout-examples-0.3.job
> org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorDriver -i
> $MAHOUT_HOME/examples/work/wikipedia/chunks -o
> $MAHOUT_HOME/examples/work/wikipediainput -c
> $MAHOUT_HOME/examples/src/test/resources/country.txt
> and it fails on Hadoop.
> The Hadoop log shows:
> Error:
> org.apache.lucene.wikipedia.analysis.WikipediaTokenizer.addAttribute(Ljava/lang/Class;)Lorg/apache/lucene/util/Attribute