[jira] [Updated] (TIKA-1315) Basic list support in WordExtractor
[ https://issues.apache.org/jira/browse/TIKA-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Moritz Dorka updated TIKA-1315:
---
Attachment: complex_list_test.doc

Basic list support in WordExtractor
---
Key: TIKA-1315
URL: https://issues.apache.org/jira/browse/TIKA-1315
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.6
Reporter: Filip Bednárik
Assignee: Tim Allison
Priority: Minor
Fix For: 1.9
Attachments: ListManager.tar.bz2, ListNumbering.patch, ListUtils.java, WordExtractor.java.patch, WordParserTest.java.patch, complex_list_test.doc

Hello guys,

I am really sorry to post an issue like this, but I have no other way of contacting you, and I don't quite understand how you manage forks and pull requests (I don't think you use them). I also don't know your coding style or conventions.

In my project I needed Tika to parse numbered lists from Word .doc documents, but Tika doesn't support that. I looked for a solution, found one at http://developerhints.blog.com/2010/08/28/finding-out-list-numbers-in-word-document-using-poi-hwpf/ and adapted it to Apache Tika with a few fixes and improvements. Feel free to use any of it, so that it can help other people who struggle with lists in Tika like I did.

Attached files are:
- Updated test
- Fixed WordExtractor
- Added ListUtils

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
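The request is about recovering the numbers Word shows in front of list paragraphs. Word generates these numbers when it displays the document rather than storing them in the paragraph text. The hypothetical `ListNumberer` class below is a rough illustration of the counting involved. It is not the code in ListNumbering.patch or ListUtils.java. It keeps one counter per level for each list and restarts the deeper levels whenever a shallower item appears. The sketch assumes every level uses plain decimal "1.2." numbering. In a real extractor, the list id and level would have to come from the HWPF paragraph properties (for example `Paragraph.getIlfo()` and `Paragraph.getIlvl()` in Apache POI). This sketch does not read those properties.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Tika or POI): tracks multi-level list
 * counters and renders decimal labels such as "1.", "1.2.", "2.".
 * Assumes every level uses plain decimal numbering.
 */
public class ListNumberer {
    // Word lists support nine levels (0-8).
    private static final int MAX_LEVELS = 9;

    // One counter array per list id, so separate lists are numbered independently.
    private final Map<Integer, int[]> counters = new HashMap<>();

    /** Returns the label for the next paragraph of list {@code listId} at {@code level}. */
    public String next(int listId, int level) {
        if (level < 0 || level >= MAX_LEVELS) {
            throw new IllegalArgumentException("level out of range: " + level);
        }
        int[] c = counters.computeIfAbsent(listId, k -> new int[MAX_LEVELS]);
        c[level]++;
        // A new item at this level restarts the numbering of all deeper levels.
        for (int i = level + 1; i < MAX_LEVELS; i++) {
            c[i] = 0;
        }
        // Join the counters from the top level down to the current level.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i <= level; i++) {
            sb.append(c[i]).append('.');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ListNumberer n = new ListNumberer();
        System.out.println(n.next(1, 0)); // 1.
        System.out.println(n.next(1, 1)); // 1.1.
        System.out.println(n.next(1, 1)); // 1.2.
        System.out.println(n.next(1, 0)); // 2.
        System.out.println(n.next(1, 1)); // 2.1.
        System.out.println(n.next(2, 0)); // 1.  (a different list starts over)
    }
}
```

Real Word lists can also use letters, Roman numerals or bullets, and they can carry start-at or restart overrides. This sketch handles none of these.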
[jira] [Updated] (TIKA-1315) Basic list support in WordExtractor
[ https://issues.apache.org/jira/browse/TIKA-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated TIKA-1315:
Fix Version/s: 1.8 (was: 1.7)

push to 1.8

Basic list support in WordExtractor
---
Key: TIKA-1315
URL: https://issues.apache.org/jira/browse/TIKA-1315
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.6
Reporter: Filip Bednárik
Priority: Minor
Fix For: 1.8
Attachments: ListManager.tar.bz2, ListNumbering.patch, ListUtils.java, WordExtractor.java.patch, WordParserTest.java.patch
[jira] [Updated] (TIKA-1315) Basic list support in WordExtractor
[ https://issues.apache.org/jira/browse/TIKA-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Moritz Dorka updated TIKA-1315:
---
Attachment: ListNumbering.patch, ListManager.tar.bz2

File paths are relative to the tika-parsers subproject.

Basic list support in WordExtractor
---
Key: TIKA-1315
URL: https://issues.apache.org/jira/browse/TIKA-1315
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.6
Reporter: Filip Bednárik
Priority: Minor
Fix For: 1.7
Attachments: ListManager.tar.bz2, ListNumbering.patch, ListUtils.java, WordExtractor.java.patch, WordParserTest.java.patch
[jira] [Updated] (TIKA-1315) Basic list support in WordExtractor
[ https://issues.apache.org/jira/browse/TIKA-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filip Bednárik updated TIKA-1315:
---
Attachment: ListUtils.java, WordExtractor.java, WordParserTest.java

Basic list support in WordExtractor
---
Key: TIKA-1315
URL: https://issues.apache.org/jira/browse/TIKA-1315
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.6
Reporter: Filip Bednárik
Priority: Minor
Fix For: 1.6
Attachments: ListUtils.java, WordExtractor.java, WordParserTest.java
[jira] [Updated] (TIKA-1315) Basic list support in WordExtractor
[ https://issues.apache.org/jira/browse/TIKA-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filip Bednárik updated TIKA-1315:
---
Attachment: (was: WordExtractor.java)

Basic list support in WordExtractor
---
Key: TIKA-1315
URL: https://issues.apache.org/jira/browse/TIKA-1315
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.6
Reporter: Filip Bednárik
Priority: Minor
Fix For: 1.6
Attachments: ListUtils.java, WordExtractor.java, WordExtractor.java.patch, WordParserTest.java, WordParserTest.java.patch