[ https://issues.apache.org/jira/browse/LUCENE-7196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15238738#comment-15238738 ]
ASF subversion and git services commented on LUCENE-7196:
---------------------------------------------------------

Commit 67f6283ce418357938fc12d82783a3504ba700d7 in lucene-solr's branch refs/heads/master from [~shalinmangar]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=67f6283 ]

LUCENE-7196: Add dependency on grouping and misc modules to avoid compile failures in IntelliJ IDEA

> DataSplitter should be providing class centric doc sets in all generated indexes
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-7196
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7196
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/classification
>            Reporter: Tommaso Teofili
>            Assignee: Tommaso Teofili
>            Priority: Minor
>             Fix For: 6.1
>
>
> {{DataSplitter}} currently creates 3 indexes (train/test/cv) out of an
> _original_ index for evaluation of {{Classifiers}} however "class coverage"
> in such generated indexes is not guaranteed; that means e.g. in _training
> index_ only documents belonging to 50% of the class set could be indexed and
> hence classifiers may not be very effective. In order to provide more
> consistent evaluation the generated index should contain
> _split-ratio * |docs in c|_ documents for each class _c_.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
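
For readers following the issue, the per-class split described in the quoted text (each generated index receiving split-ratio * |docs in c| documents for every class c) can be sketched roughly as below. This is only an illustration in plain Java and is not the code from the commit: the class name StratifiedSplitSketch, the Split holder, and the testRatio / cvRatio parameters are all hypothetical, and documents are represented by integer ids rather than Lucene documents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; none of these names come from Lucene.
public class StratifiedSplitSketch {

    /** Hypothetical holder for the doc ids assigned to each generated index. */
    public static final class Split {
        public final List<Integer> train = new ArrayList<>();
        public final List<Integer> test = new ArrayList<>();
        public final List<Integer> crossValidation = new ArrayList<>();
    }

    /**
     * Splits doc ids class by class, so that every class c contributes about
     * testRatio * |docs in c| ids to the test split and cvRatio * |docs in c|
     * ids to the cross-validation split; the remainder goes to training.
     */
    public static Split split(Map<String, List<Integer>> docsByClass,
                              double testRatio, double cvRatio) {
        if (testRatio < 0 || cvRatio < 0 || testRatio + cvRatio > 1) {
            throw new IllegalArgumentException("ratios must be >= 0 and sum to at most 1");
        }
        Split out = new Split();
        for (List<Integer> docs : docsByClass.values()) {
            int n = docs.size();
            // Round down so that small classes never overshoot their quota.
            int testCount = (int) Math.floor(testRatio * n);
            int cvCount = Math.min((int) Math.floor(cvRatio * n), n - testCount);
            out.test.addAll(docs.subList(0, testCount));
            out.crossValidation.addAll(docs.subList(testCount, testCount + cvCount));
            out.train.addAll(docs.subList(testCount + cvCount, n));
        }
        return out;
    }
}
```

Because the counts are rounded down, a class with very few documents may end up only in the training split; a real implementation would have to decide how to treat such classes, and would likely shuffle the documents of each class before slicing.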