[ https://issues.apache.org/jira/browse/TIKA-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filip Bednárik updated TIKA-1315:
---------------------------------

    Description: 
Hello guys, I am really sorry to post an issue like this, but I have no other 
way of contacting you and I don't quite understand how you manage forks and 
pull requests (I don't think you do that). Plus, I don't know your coding style 
and conventions.

In my project I needed Tika to parse numbered lists from Word .doc 
documents, but Tika doesn't support that. So I looked for a solution and found 
one here: 
http://developerhints.blog.com/2010/08/28/finding-out-list-numbers-in-word-document-using-poi-hwpf/
I adapted this solution to Apache Tika with a few fixes and improvements. 
Feel free to use any of it so it can help people who struggle with lists 
in Tika like I did.

Attached files are:
Updated test
Fixed WordExtractor
Added ListUtils

  was:
Hello guys, I am really sorry to post an issue like this, but I have no other 
way of contacting you and I don't quite understand how you manage forks and 
pull requests (I don't think you do that).

In my project I needed Tika to parse numbered lists from Word .doc 
documents, but Tika doesn't support that. So I looked for a solution and found 
one here: 
http://developerhints.blog.com/2010/08/28/finding-out-list-numbers-in-word-document-using-poi-hwpf/
I adapted this solution to Apache Tika with a few fixes and improvements. 
Feel free to use any of it so it can help people who struggle with lists 
in Tika like I did.

Attached files are:
Updated test
Fixed WordExtractor
Added ListUtils


> Basic list support in WordExtractor
> -----------------------------------
>
>                 Key: TIKA-1315
>                 URL: https://issues.apache.org/jira/browse/TIKA-1315
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.6
>            Reporter: Filip Bednárik
>            Priority: Minor
>             Fix For: 1.6
>
>         Attachments: ListUtils.java, WordExtractor.java, WordParserTest.java
>
>
> Hello guys, I am really sorry to post an issue like this, but I have no other 
> way of contacting you and I don't quite understand how you manage forks and 
> pull requests (I don't think you do that). Plus, I don't know your coding 
> style and conventions.
> In my project I needed Tika to parse numbered lists from Word .doc 
> documents, but Tika doesn't support that. So I looked for a solution and 
> found one here: 
> http://developerhints.blog.com/2010/08/28/finding-out-list-numbers-in-word-document-using-poi-hwpf/
> I adapted this solution to Apache Tika with a few fixes and improvements. 
> Feel free to use any of it so it can help people who struggle with 
> lists in Tika like I did.
> Attached files are:
> Updated test
> Fixed WordExtractor
> Added ListUtils
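
For readers unfamiliar with the problem, the core of numbered-list support can be sketched as a set of per-list counters: each list item carries a list id and a nesting level, and its label is built from the counters up to that level. The sketch below is only an illustration under stated assumptions. The class name ListNumberSketch and the method nextLabel are hypothetical and are not taken from the attached ListUtils.java; the list id and level are assumed to be supplied by whatever parser reads the paragraph; and only plain decimal labels are produced, whereas real Word lists also define bullets, roman numerals and custom label templates.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical illustration of list numbering; NOT the attached ListUtils.java.
 * Tracks one counter per nesting level for every list id.
 */
public class ListNumberSketch {
    // Assumption: at most nine nesting levels (0..8) per list.
    private static final int MAX_LEVELS = 9;

    // list id -> counters, indexed by 0-based nesting level
    private final Map<Integer, int[]> counters = new HashMap<>();

    /** Returns a decimal label such as "2.1." for the next item of a list. */
    public String nextLabel(int listId, int level) {
        if (level < 0 || level >= MAX_LEVELS) {
            throw new IllegalArgumentException("level out of range: " + level);
        }
        int[] c = counters.computeIfAbsent(listId, id -> new int[MAX_LEVELS]);

        // Advance this level and restart every deeper level.
        c[level]++;
        for (int i = level + 1; i < MAX_LEVELS; i++) {
            c[i] = 0;
        }

        StringBuilder label = new StringBuilder();
        for (int i = 0; i <= level; i++) {
            // A nested item that appears before any parent item counts as
            // belonging to parent number 1.
            if (c[i] == 0) {
                c[i] = 1;
            }
            label.append(c[i]).append('.');
        }
        return label.toString();
    }

    public static void main(String[] args) {
        ListNumberSketch numbering = new ListNumberSketch();
        System.out.println(numbering.nextLabel(1, 0)); // first top-level item
        System.out.println(numbering.nextLabel(1, 1)); // first nested item
        System.out.println(numbering.nextLabel(1, 0)); // second top-level item
    }
}
```

Keeping the counters keyed by list id lets two independent lists in the same document be numbered separately, and resetting the deeper levels makes a nested list restart under each new parent item.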



--
This message was sent by Atlassian JIRA
(v6.2#6252)
