[ https://issues.apache.org/jira/browse/DATAFU-149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthew Hayes updated DATAFU-149:
---------------------------------
    Fix Version/s:     (was: 1.6.0)

> Add MutliLabelStratifiedSample to DataFu.spark
> ----------------------------------------------
>
>                 Key: DATAFU-149
>                 URL: https://issues.apache.org/jira/browse/DATAFU-149
>             Project: DataFu
>          Issue Type: Improvement
>    Affects Versions: 1.5.0
>            Reporter: Russell Jurney
>            Assignee: Russell Jurney
>            Priority: Major
>              Labels: datafu, spark
>
> I'm working on an implementation of "On the Stratification of Multi-Label
> Data" to create a stratified (balanced, in my case) sample of highly skewed
> labels for a multi-label, multi-class classification problem. This isn't
> straightforward, because adding one record adds multiple labels to the
> balance. A greedy algorithm that adds records with the least common labels
> first works, and since I'm writing it anyway, it would probably make a good
> feature.
> http://lpis.csd.auth.gr/publications/sechidis-ecmlpkdd-2011.pdf
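
To illustrate the greedy idea described in the issue, here is a minimal in-memory sketch in plain Python. It is not the DataFu implementation, and it is not the full iterative stratification procedure from the linked paper; it only shows one simple way to "serve the least common label first" when each record carries several labels. The function name `greedy_balanced_sample`, the `target_per_label` parameter, and the toy records are all hypothetical, chosen for illustration.

```python
from collections import Counter


def greedy_balanced_sample(records, target_per_label):
    """Greedily pick records until each label reaches target_per_label if possible.

    records: list of (record_id, labels) pairs (hypothetical input shape).
    Returns (selected_ids, label_counts) for the chosen sample.
    """
    remaining = {rid: frozenset(labels) for rid, labels in records}
    counts = Counter()  # label -> occurrences in the sample so far
    selected = []

    while True:
        # Labels still below target that some unused record can supply.
        needy = {
            label
            for labels in remaining.values()
            for label in labels
            if counts[label] < target_per_label
        }
        if not needy:
            break

        # Serve the label that is least represented in the sample so far.
        label = min(needy, key=lambda l: (counts[l], str(l)))

        # One record adds several labels at once, so among the candidates
        # prefer the record that pushes the fewest labels past the target.
        def overshoot(rid):
            return sum(1 for l in remaining[rid] if counts[l] >= target_per_label)

        candidates = [rid for rid, labels in remaining.items() if label in labels]
        best = min(candidates, key=lambda rid: (overshoot(rid), str(rid)))

        counts.update(remaining.pop(best))
        selected.append(best)

    return selected, counts


# Toy data: label "x" is common, label "y" is rare.
records = [
    ("a", {"x"}), ("b", {"x"}), ("c", {"x"}),
    ("d", {"x", "y"}), ("e", {"y"}), ("f", {"x"}),
]
selected, counts = greedy_balanced_sample(records, target_per_label=2)
print(selected, dict(counts))
```

On skewed data a label may never reach the target if too few records carry it; the loop then simply stops once no unused record can help. A Spark version would have to distribute the counting and selection steps, which this sketch does not attempt.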