[ https://issues.apache.org/jira/browse/TIKA-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336150#comment-14336150 ]
Chris A. Mattmann commented on TIKA-1483:
-----------------------------------------

That fixed it, [~gostep]! Thanks, all tests are passing for me. +1 to commit this. If there are no objections in the next 24 hours, I'll commit it.

{noformat}
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Tika parent ................................. SUCCESS [  1.821 s]
[INFO] Apache Tika core ................................... SUCCESS [ 21.645 s]
[INFO] Apache Tika parsers ................................ SUCCESS [02:06 min]
[INFO] Apache Tika XMP .................................... SUCCESS [  2.072 s]
[INFO] Apache Tika serialization .......................... SUCCESS [  2.382 s]
[INFO] Apache Tika application ............................ SUCCESS [ 14.697 s]
[INFO] Apache Tika OSGi bundle ............................ SUCCESS [ 17.896 s]
[INFO] Apache Tika server ................................. SUCCESS [ 21.473 s]
[INFO] Apache Tika translate .............................. SUCCESS [  2.746 s]
[INFO] Apache Tika examples ............................... SUCCESS [  5.429 s]
[INFO] Apache Tika Java-7 Components ...................... SUCCESS [  2.680 s]
[INFO] Apache Tika ........................................ SUCCESS [  0.038 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 03:39 min
[INFO] Finished at: 2015-02-24T23:32:12-08:00
[INFO] Final Memory: 100M/1653M
[INFO] ------------------------------------------------------------------------
[chipotle:~/tmp/tika] mattmann%
{noformat}

> Create a general raw string parser
> ----------------------------------
>
>                 Key: TIKA-1483
>                 URL: https://issues.apache.org/jira/browse/TIKA-1483
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 1.6
>            Reporter: Luis Filipe Nassif
>         Attachments: TIKA-1483.patch, TIKA-1483_v2.patch
>
>
> I think it would be very useful to add a general parser able to extract raw
> strings from files (like the strings command), which could be used as the
> fallback parser for all MIME types that have no specific parser
> implementation, such as application/octet-stream. It could also be used as a
> fallback for corrupt files that throw a TikaException.
> It must be configured with the script/language to be extracted from the files
> (currently I have implemented one specific to Latin1).
> It can use heuristics to extract strings encoded with different charsets
> within the same file, mainly the common ISO-8859-1, UTF-8 and UTF-16.
> What does the community think about that?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
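
For readers unfamiliar with the strings-style extraction that the quoted proposal describes, here is a minimal sketch of the basic idea for the single-byte ISO-8859-1 case only. It is not the code in TIKA-1483.patch and does not use any Tika API: the class name `RawStringSketch`, the method `extractLatin1Strings`, the `minLength` threshold, and the choice of which bytes count as printable are all assumptions made for illustration. The charset heuristics for UTF-8 and UTF-16 mentioned in the proposal are not attempted here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Tika or of the attached patches.
class RawStringSketch {

    // Collects runs of printable ISO-8859-1 bytes that are at least minLength long.
    public static List<String> extractLatin1Strings(byte[] data, int minLength) {
        List<String> found = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (byte b : data) {
            int c = b & 0xFF; // unsigned byte value, which equals the Latin-1 code point
            if (isPrintableLatin1(c)) {
                run.append((char) c);
            } else {
                flush(run, minLength, found);
            }
        }
        flush(run, minLength, found); // keep a run that reaches the end of the input
        return found;
    }

    // Assumption: tab, ASCII 0x20-0x7E and Latin-1 0xA0-0xFF count as printable.
    public static boolean isPrintableLatin1(int c) {
        return c == '\t' || (c >= 0x20 && c <= 0x7E) || (c >= 0xA0 && c <= 0xFF);
    }

    // Emits the current run if it is long enough, then resets it.
    private static void flush(StringBuilder run, int minLength, List<String> out) {
        if (run.length() >= minLength) {
            out.add(run.toString());
        }
        run.setLength(0);
    }

    public static void main(String[] args) {
        byte[] sample = {0x00, 0x01, 'T', 'i', 'k', 'a', 0x00, 'o', 'k', 0x02,
                         'c', 'a', 'f', (byte) 0xE9, 0x00};
        // With minLength 4, the two-character run "ok" is discarded.
        System.out.println(extractLatin1Strings(sample, 4));
    }
}
```

The minimum run length plays the same role as the length threshold of the strings command: it filters out short byte sequences that happen to look printable. A real fallback parser would presumably stream the input rather than hold the whole file in a byte array.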