[ https://issues.apache.org/jira/browse/TIKA-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Filip Bednárik updated TIKA-1315:
---------------------------------
    Attachment:     (was: WordExtractor.java)

> Basic list support in WordExtractor
> -----------------------------------
>
>                 Key: TIKA-1315
>                 URL: https://issues.apache.org/jira/browse/TIKA-1315
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.6
>            Reporter: Filip Bednárik
>            Priority: Minor
>             Fix For: 1.6
>
>         Attachments: ListUtils.java, WordExtractor.java.patch,
> WordParserTest.java.patch
>
>
> Hello guys, I am really sorry to post an issue like this, but I have no other
> way of contacting you and I don't quite understand how you manage forks and
> pull requests (I don't think you do that). Plus, I don't know your coding
> style and conventions.
> In my project I needed Tika to parse numbered lists from Word .doc
> documents, but Tika doesn't support that. So I looked for a solution and found
> one here:
> http://developerhints.blog.com/2010/08/28/finding-out-list-numbers-in-word-document-using-poi-hwpf/
> . I adapted this solution to Apache Tika with a few fixes and improvements.
> Anyway, feel free to use any of it so it can help people who struggle with
> lists in Tika like I did.
> Attached files are:
> Updated test
> Fixed WordExtractor
> Added ListUtils

--
This message was sent by Atlassian JIRA
(v6.2#6252)