[jira] [Updated] (TIKA-1315) Basic list support in WordExtractor

2015-05-05 Thread Moritz Dorka (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Moritz Dorka updated TIKA-1315:
---
Attachment: complex_list_test.doc

 Basic list support in WordExtractor
 ---

 Key: TIKA-1315
 URL: https://issues.apache.org/jira/browse/TIKA-1315
 Project: Tika
  Issue Type: Improvement
  Components: parser
Affects Versions: 1.6
Reporter: Filip Bednárik
Assignee: Tim Allison
Priority: Minor
 Fix For: 1.9

 Attachments: ListManager.tar.bz2, ListNumbering.patch, 
 ListUtils.java, WordExtractor.java.patch, WordParserTest.java.patch, 
 complex_list_test.doc


 Hello guys, I am really sorry to post an issue like this, but I have no other 
 way of contacting you, and I don't quite understand how you manage forks and 
 pull requests (I don't think you do that). Plus, I don't know your coding 
 style.
 In my project I needed Tika to parse numbered lists from Word .doc 
 documents, but Tika doesn't support it. So I looked for a solution and found 
 one here: 
 http://developerhints.blog.com/2010/08/28/finding-out-list-numbers-in-word-document-using-poi-hwpf/
  . I adapted this solution to Apache Tika with a few fixes and improvements. 
 Feel free to use any of it so it can help people who struggle with 
 lists in Tika like I did.
 Attached files are:
 Updated test
 Fixed WordExtractor
 Added ListUtils



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TIKA-1315) Basic list support in WordExtractor

2014-10-24 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann updated TIKA-1315:

Fix Version/s: 1.8 (was: 1.7)

- push to 1.8

 Basic list support in WordExtractor
 ---

 Key: TIKA-1315
 URL: https://issues.apache.org/jira/browse/TIKA-1315
 Project: Tika
  Issue Type: Improvement
  Components: parser
Affects Versions: 1.6
Reporter: Filip Bednárik
Priority: Minor
 Fix For: 1.8

 Attachments: ListManager.tar.bz2, ListNumbering.patch, 
 ListUtils.java, WordExtractor.java.patch, WordParserTest.java.patch







[jira] [Updated] (TIKA-1315) Basic list support in WordExtractor

2014-09-21 Thread Moritz Dorka (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Moritz Dorka updated TIKA-1315:
---
Attachment: ListNumbering.patch, ListManager.tar.bz2

File paths are relative to the tika-parsers subproject

 Basic list support in WordExtractor
 ---

 Key: TIKA-1315
 URL: https://issues.apache.org/jira/browse/TIKA-1315
 Project: Tika
  Issue Type: Improvement
  Components: parser
Affects Versions: 1.6
Reporter: Filip Bednárik
Priority: Minor
 Fix For: 1.7

 Attachments: ListManager.tar.bz2, ListNumbering.patch, 
 ListUtils.java, WordExtractor.java.patch, WordParserTest.java.patch







[jira] [Updated] (TIKA-1315) Basic list support in WordExtractor

2014-05-30 Thread Filip Bednárik (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Filip Bednárik updated TIKA-1315:
-

Attachment: ListUtils.java, WordExtractor.java, WordParserTest.java

 Basic list support in WordExtractor
 ---

 Key: TIKA-1315
 URL: https://issues.apache.org/jira/browse/TIKA-1315
 Project: Tika
  Issue Type: Improvement
  Components: parser
Affects Versions: 1.6
Reporter: Filip Bednárik
Priority: Minor
 Fix For: 1.6

 Attachments: ListUtils.java, WordExtractor.java, WordParserTest.java


 Hello guys, I am really sorry to post an issue like this, but I have no other 
 way of contacting you, and I don't quite understand how you manage forks and 
 pull requests (I don't think you do that).
 In my project I needed Tika to parse numbered lists from Word .doc 
 documents, but Tika doesn't support it. So I looked for a solution and found 
 one here: 
 http://developerhints.blog.com/2010/08/28/finding-out-list-numbers-in-word-document-using-poi-hwpf/
  . I adapted this solution to Apache Tika with a few fixes and improvements. 
 Feel free to use any of it so it can help people who struggle with 
 lists in Tika like I did.
 Attached files are:
 Updated test
 Fixed WordExtractor
 Added ListUtils



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (TIKA-1315) Basic list support in WordExtractor

2014-05-30 Thread Filip Bednárik (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Filip Bednárik updated TIKA-1315:
-

Attachment: (was: WordExtractor.java)

 Basic list support in WordExtractor
 ---

 Key: TIKA-1315
 URL: https://issues.apache.org/jira/browse/TIKA-1315
 Project: Tika
  Issue Type: Improvement
  Components: parser
Affects Versions: 1.6
Reporter: Filip Bednárik
Priority: Minor
 Fix For: 1.6

 Attachments: ListUtils.java, WordExtractor.java, 
 WordExtractor.java.patch, WordParserTest.java, WordParserTest.java.patch




