[jira] [Commented] (TIKA-1447) CHM parser: wrong directory list

2014-11-22 Thread Bin Hawking (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221949#comment-14221949
 ] 

Bin Hawking commented on TIKA-1447:
---

Yes, it was meant to fix TIKA-1430, 1446, 1447, 1448.

I have added a new test case to validate the extracted directory list. Please 
see the messages in TIKA-1446 and the code.

 CHM parser: wrong directory list
 

 Key: TIKA-1447
 URL: https://issues.apache.org/jira/browse/TIKA-1447
 Project: Tika
  Issue Type: Bug
Affects Versions: 1.7
Reporter: Bin Hawking
Priority: Critical

 The CHM parser produces a wrong directory list for a CHM file (e.g. testChm2.chm 
 in tika-parser's test-resources):
 1. Duplicate entries (mostly from PMGI chunks, which should have been 
 ignored).
 2. Invalid entries (usually with unreadable entry names).
 3. Missing entries (sometimes similar to TIKA-1176).
 I have fixed it (to some degree) by using the PMGL header to find the directory 
 chunks and their respective meaningful parts.
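
 The idea described above can be illustrated with a minimal sketch. It is not 
 the patch attached to this issue and does not use Tika's classes. It assumes 
 the chunk layout given in the commonly circulated unofficial CHM format notes: 
 fixed-size directory chunks, a 20-byte PMGL header whose second field is the 
 length of the free space at the end of the chunk, and entries made of a 
 length-prefixed UTF-8 name followed by three variable-length integers. The 
 names PmglSketch, listEntryNames and readEncInt are hypothetical.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class PmglSketch {
    // Assumed PMGL header: 4-byte signature plus four 32-bit fields.
    static final int PMGL_HEADER_LEN = 20;

    /** Reads a variable-length integer: 7 bits per byte, high bit means "more bytes follow". */
    static long readEncInt(ByteBuffer buf) {
        long value = 0;
        byte b;
        do {
            b = buf.get();
            value = (value << 7) | (b & 0x7F);
        } while ((b & 0x80) != 0);
        return value;
    }

    /** Collects entry names from PMGL (listing) chunks only; PMGI (index) chunks are skipped. */
    static List<String> listEntryNames(byte[] dir, int chunkSize) {
        List<String> names = new ArrayList<>();
        for (int start = 0; start + chunkSize <= dir.length; start += chunkSize) {
            String sig = new String(dir, start, 4, StandardCharsets.US_ASCII);
            if (!"PMGL".equals(sig)) {
                continue; // PMGI or unknown chunk: its content would only yield duplicates or garbage
            }
            ByteBuffer buf = ByteBuffer.wrap(dir).order(ByteOrder.LITTLE_ENDIAN);
            int freeSpace = buf.getInt(start + 4);
            if (freeSpace < 0 || freeSpace > chunkSize - PMGL_HEADER_LEN) {
                continue; // implausible header, skip the chunk
            }
            // Only the bytes between the header and the free space hold entries.
            buf.position(start + PMGL_HEADER_LEN);
            buf.limit(start + chunkSize - freeSpace);
            try {
                while (buf.hasRemaining()) {
                    long nameLen = readEncInt(buf);
                    if (nameLen <= 0 || nameLen > buf.remaining()) {
                        break; // malformed entry, stop reading this chunk
                    }
                    byte[] name = new byte[(int) nameLen];
                    buf.get(name);
                    names.add(new String(name, StandardCharsets.UTF_8));
                    readEncInt(buf); // content section (unused here)
                    readEncInt(buf); // offset (unused here)
                    readEncInt(buf); // length (unused here)
                }
            } catch (BufferUnderflowException e) {
                // Truncated entry at the end of the meaningful part: keep what was read.
            }
        }
        return names;
    }
}
```

 Restricting the scan to PMGL chunks addresses the duplicates, and stopping at 
 the start of the free space avoids decoding trailing bytes as entry names.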



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TIKA-1447) CHM parser: wrong directory list

2014-11-17 Thread Hong-Thai Nguyen (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214535#comment-14214535
 ] 

Hong-Thai Nguyen commented on TIKA-1447:


[~binhawking], did the work on TIKA-1446 fix this issue? Any chance you could 
double-check?

Thanks,
