+1

Tested under different setups (a mixture of nominal and numerical attributes, only nominal, or only numerical). The fix significantly improves accuracy over the older version, and the results are now close to MOA's.

On Mon, Dec 7, 2015 at 2:30 PM, ASF GitHub Bot (JIRA) <[email protected]> wrote:

>
>     [ https://issues.apache.org/jira/browse/SAMOA-52?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044924#comment-15044924 ]
>
> ASF GitHub Bot commented on SAMOA-52:
> -------------------------------------
>
> GitHub user abifet opened a pull request:
>
>     https://github.com/apache/incubator-samoa/pull/42
>
>     SAMOA-52: Fix nominal attribute problem in VHT
>
>     1. We don't need to split if the class distribution in a node is pure
>     2. We need to reset the best and second best attribute after an
>        attempt of split.
>
>     Every time we want to decide if we do a split, we should not reuse the
>     best and second best split from previous attempts to split. So each time
>     we want to decide if we split or not, we recollect the information from
>     the attributes, and then with the best and second best split, decide
>     using the Hoeffding bound. After that, we need to reset the best and
>     second best split, and start again.
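
For anyone following the two points above, here is a minimal Java sketch of how I read the split decision. It is not the SAMOA VHT code: the class name `SplitDecisionSketch`, its method names, the `tieThreshold` parameter, and the use of log2(number of classes) as the merit range are all assumptions made for illustration. The key detail is that the best and second best candidates are local to each attempt, so nothing carries over from a previous attempt.

```java
import java.util.Arrays;

/** Hypothetical sketch of one split attempt; not the actual SAMOA VHT code. */
final class SplitDecisionSketch {

    /** Hoeffding bound: sqrt(R^2 * ln(1/delta) / (2n)). */
    static double hoeffdingBound(double range, double delta, double n) {
        return Math.sqrt(range * range * Math.log(1.0 / delta) / (2.0 * n));
    }

    /** A node is pure when at most one class has a non-zero count. */
    static boolean isPure(double[] classCounts) {
        int nonZero = 0;
        for (double c : classCounts) {
            if (c > 0) nonZero++;
        }
        return nonZero < 2;
    }

    /**
     * One split attempt. merits holds one freshly collected merit per attribute.
     * Returns the index of the attribute to split on, or -1 for no split.
     * Simplification (assumption): fewer than two attributes means no split.
     */
    static int attemptSplit(double[] classCounts, double[] merits,
                            double delta, double tieThreshold) {
        // Point 1: no need to split a pure node.
        if (isPure(classCounts) || merits.length < 2) return -1;

        // Point 2: best and second best start from scratch on every attempt.
        int best = -1;
        int second = -1;
        for (int i = 0; i < merits.length; i++) {
            if (best < 0 || merits[i] > merits[best]) {
                second = best;
                best = i;
            } else if (second < 0 || merits[i] > merits[second]) {
                second = i;
            }
        }

        double n = Arrays.stream(classCounts).sum();
        // Assumed merit range for information gain: log2(number of classes).
        double range = Math.log(classCounts.length) / Math.log(2.0);
        double eps = hoeffdingBound(range, delta, n);

        boolean split = merits[best] - merits[second] > eps || eps < tieThreshold;
        return split ? best : -1;
    }

    public static void main(String[] args) {
        double[] counts = {500, 500};
        double[] merits = {0.1, 0.5, 0.2};
        // The gap between the two best merits (0.3) exceeds the bound, so this picks index 1.
        System.out.println(attemptSplit(counts, merits, 1e-7, 0.05));
    }
}
```

Because `best` and `second` live only inside `attemptSplit`, a stale candidate from an earlier attempt cannot influence the current comparison, which is the behaviour point 2 asks for.
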
>
> You can merge this pull request into a Git repository by running:
>
>     $ git pull https://github.com/abifet/incubator-samoa SAMOA-52
>
> Alternatively you can review and apply these changes as the patch at:
>
>     https://github.com/apache/incubator-samoa/pull/42.patch
>
> To close this pull request, make a commit to your master/trunk branch
> with (at least) the following in the commit message:
>
>     This closes #42
>
> ----
> commit 2b8da26d2aa0c9c5b2db63c2ec114ae82be77a5e
> Author: Albert Bifet <[email protected]>
> Date:   2015-12-07T13:20:00Z
>
>     SAMOA-52: Fix nominal attributes problem in VHT
>
> ----
>
> > VHT performance problem with nominal attributes
> > -----------------------------------------------
> >
> >                 Key: SAMOA-52
> >                 URL: https://issues.apache.org/jira/browse/SAMOA-52
> >             Project: SAMOA
> >          Issue Type: Bug
> >          Components: SAMOA-API
> >            Reporter: Albert Bifet
> >            Assignee: Albert Bifet
> >
> > When running artificial experiments using only nominal attributes with
> > {code}
> > generators.RandomTreeGenerator -c 2 -o 50 -u 0
> > {code}
> > 2 classes, 50 nominal attributes, 0 numerical attributes, accuracy goes
> > down to 53%. However, in the case
> > {code}
> > generators.RandomTreeGenerator -c 2 -o 0 -u 50
> > {code}
> > with 50 numerical attributes, 0 nominal attributes, VHT is performing
> > well with 95.42% of accuracy.
> > To reproduce:
> > {code}
> > bin/samoa local target/SAMOA-Local-0.4.0-incubating-SNAPSHOT.jar "PrequentialEvaluation -d /tmp/dump.csv -i 1000000 -f 100000 -l (classifiers.trees.VerticalHoeffdingTree -p 4) -s (generators.RandomTreeGenerator -c 2 -o 50 -u 0)"
> > {code}
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>

-- 
Nicolas Kourtellis
