[ https://issues.apache.org/jira/browse/SAMOA-52?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044924#comment-15044924 ]
ASF GitHub Bot commented on SAMOA-52:
-------------------------------------
GitHub user abifet opened a pull request:
https://github.com/apache/incubator-samoa/pull/42
SAMOA-52: Fix nominal attribute problem in VHT
1. We don't need to split if the class distribution in a node is pure.
2. We need to reset the best and second-best attributes after each split
attempt.
Every time we decide whether to split, we should not reuse the best and
second-best splits from previous attempts. Instead, on each attempt we
recollect the information from the attributes, pick the best and second-best
splits, and decide using the Hoeffding bound. After that, we reset the best
and second-best splits and start again.
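The two points above can be illustrated with a minimal, self-contained Java
sketch. It is not the code from this pull request: the class name
SplitDecisionSketch, the method names, the tieThreshold parameter and the
"at least two candidates" guard are all assumptions made for illustration.
The only parts taken as given are the standard Hoeffding bound,
epsilon = sqrt(R^2 * ln(1/delta) / (2n)), and the two rules described above.
The best and second-best merits are local variables, so each attempt starts
from scratch and nothing from a previous attempt can leak in.

```java
import java.util.Arrays;

/** Hypothetical sketch; names are illustrative, not SAMOA's actual classes. */
final class SplitDecisionSketch {

    /** A node is pure when at most one class has a non-zero observed weight. */
    static boolean isPure(double[] classCounts) {
        int nonZero = 0;
        for (double c : classCounts) {
            if (c > 0) {
                nonZero++;
            }
        }
        return nonZero < 2;
    }

    /** Standard Hoeffding bound: sqrt(R^2 * ln(1/delta) / (2n)). */
    static double hoeffdingBound(double range, double delta, double n) {
        return Math.sqrt(range * range * Math.log(1.0 / delta) / (2.0 * n));
    }

    /**
     * Decides one split attempt using only the merits collected for this attempt.
     * merits holds one split-criterion value per candidate attribute (assumed input).
     */
    static boolean shouldSplit(double[] classCounts, double[] merits,
                               double range, double delta, double tieThreshold) {
        // Point 1: never split a pure node.
        if (isPure(classCounts)) {
            return false;
        }
        // Assumption of this sketch: at least two candidates are needed to compare.
        if (merits.length < 2) {
            return false;
        }
        // Point 2: best and second best start fresh on every attempt.
        double best = Double.NEGATIVE_INFINITY;
        double secondBest = Double.NEGATIVE_INFINITY;
        for (double m : merits) {
            if (m > best) {
                secondBest = best;
                best = m;
            } else if (m > secondBest) {
                secondBest = m;
            }
        }
        double n = Arrays.stream(classCounts).sum();
        double epsilon = hoeffdingBound(range, delta, n);
        // Split when the best attribute clearly wins, or when the bound is so
        // tight that the two candidates are effectively tied.
        return best - secondBest > epsilon || epsilon < tieThreshold;
    }
}
```

For example, with class counts {600, 400}, range 1.0 and delta 1e-7, the
bound is roughly 0.09, so merits {0.30, 0.10, 0.05} would lead to a split,
while merits {0.30, 0.28} with a tie threshold of 0.05 would not.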
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/abifet/incubator-samoa SAMOA-52
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-samoa/pull/42.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #42
----
commit 2b8da26d2aa0c9c5b2db63c2ec114ae82be77a5e
Author: Albert Bifet
Date: 2015-12-07T13:20:00Z
SAMOA-52: Fix nominal attributes problem in VHT
----
> VHT performance problem with nominal attributes
> -----------------------------------------------
>
> Key: SAMOA-52
> URL: https://issues.apache.org/jira/browse/SAMOA-52
> Project: SAMOA
> Issue Type: Bug
> Components: SAMOA-API
> Reporter: Albert Bifet
> Assignee: Albert Bifet
>
> When running artificial experiments using only nominal attributes with
> {code}
> generators.RandomTreeGenerator -c 2 -o 50 -u 0
> {code}
> (2 classes, 50 nominal attributes, 0 numerical attributes), accuracy goes down
> to 53%. However, in the case
> {code}
> generators.RandomTreeGenerator -c 2 -o 0 -u 50
> {code}
> with 50 numerical attributes and 0 nominal attributes, VHT performs well,
> with 95.42% accuracy.
> To reproduce:
> {code}
> bin/samoa local target/SAMOA-Local-0.4.0-incubating-SNAPSHOT.jar
> "PrequentialEvaluation -d /tmp/dump.csv -i 1000000 -f 100000 -l
> (classifiers.trees.VerticalHoeffdingTree -p 4) -s
> (generators.RandomTreeGenerator -c 2 -o 50 -u 0)"
> {code}