William, in my last project using doccat, I extended DocumentSample and simply added a generic Map to hold additional key-value pairs. Adding that to the baseline might be natural.
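A minimal sketch of that approach, assuming a `Map<String, Object>` for the extra key-value pairs. The class names `MetadataDocumentSample` and `MetadataFeatureGenerator` and the `bow=`/`meta=` feature prefixes are hypothetical and not part of the OpenNLP API. The sketch uses plain Java collections rather than OpenNLP's actual DocumentSample, and shows one way a feature generator could consume the metadata.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sample class: keeps the two existing attributes
// (category and text tokens) and adds a generic metadata map.
final class MetadataDocumentSample {
    private final String category;
    private final List<String> text;
    private final Map<String, Object> metadata;

    MetadataDocumentSample(String category, List<String> text, Map<String, Object> metadata) {
        this.category = category;
        // Defensive, unmodifiable copies keep the sample immutable;
        // TreeMap gives a stable key order for feature extraction.
        this.text = Collections.unmodifiableList(new ArrayList<>(text));
        this.metadata = Collections.unmodifiableMap(new TreeMap<>(metadata));
    }

    // Convenience constructor for samples without metadata.
    MetadataDocumentSample(String category, List<String> text) {
        this(category, text, new HashMap<String, Object>());
    }

    String getCategory() { return category; }
    List<String> getText() { return text; }
    Map<String, Object> getMetadata() { return metadata; }
}

// Hypothetical feature generator: emits bag-of-words features for the
// tokens, then one "meta=key=value" feature per metadata entry.
final class MetadataFeatureGenerator {
    List<String> extractFeatures(MetadataDocumentSample sample) {
        List<String> features = new ArrayList<>();
        for (String token : sample.getText()) {
            features.add("bow=" + token);
        }
        for (Map.Entry<String, Object> e : sample.getMetadata().entrySet()) {
            features.add("meta=" + e.getKey() + "=" + e.getValue());
        }
        return features;
    }
}
```

Using `Object` values keeps the map general enough for any extra context, at the cost of type safety; a `Map<String, String>` would be a simpler alternative if only string metadata is needed.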
> On Apr 15, 2014, at 11:45 AM, William Colen <william.co...@gmail.com> wrote:
>
> Hello,
>
> I've been working with the Doccat module and I am wondering if we could
> improve its data structure for the 1.6.0 release.
>
> Today the DocumentSample has the following attributes:
>
> - String category
> - List<String> text
>
> I would suggest adding an attribute to hold metadata, or additional
> context information. What do you think?
>
> Also, what do you think of including sentence and paragraph information? I
> don't know if there is anything a feature generator can extract from it to
> improve the classification.
>
> Thank you,
> William