[jira] [Commented] (TIKA-1224) Adding Source code (Java, Groovy, C) parser
[ https://issues.apache.org/jira/browse/TIKA-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877343#comment-13877343 ]

Hong-Thai Nguyen commented on TIKA-1224:

I agree that deeply parsing each language is not simple. This work (already done) only provides an HTML rendering of the source code, plus whatever metadata can be extracted from Javadoc comments (author, version, ...) and possibly other useful values such as LoC. When we need more detailed results for a language, we must implement a dedicated parser. This parser is useful in search applications.

Adding Source code (Java, Groovy, C) parser
---
Key: TIKA-1224
URL: https://issues.apache.org/jira/browse/TIKA-1224
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.5
Reporter: Hong-Thai Nguyen
Priority: Minor

We can parse some source code file formats: text/x-java-source, text/x-groovy, text/x-c. For HTML rendering of the code, we can use jhighlight: http://www.ohloh.net/p/jhighlight

-- This message was sent by Atlassian JIRA (v6.1.5#6160)
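Editor's sketch: the comment above describes pulling simple metadata (author, version) out of Javadoc comments and reporting a line count. The Java below is only a rough illustration of that idea, not the code attached to TIKA-1224. The class name SourceMetadataSketch, the method extract, and the metadata keys are all hypothetical, and the regular expression is a simplification that assumes each tag and its value sit on one line. It uses String.lines(), so it assumes Java 11 or later.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the parser attached to TIKA-1224.
final class SourceMetadataSketch {

    // Matches "@author <value>" or "@version <value>" up to the end of the line.
    private static final Pattern TAG =
            Pattern.compile("@(author|version)\\s+([^\\r\\n*]+)");

    // Returns the first @author and @version values found, plus a raw line count.
    static Map<String, String> extract(String source) {
        Map<String, String> meta = new LinkedHashMap<>();
        Matcher m = TAG.matcher(source);
        while (m.find()) {
            // Keep only the first occurrence of each tag.
            meta.putIfAbsent(m.group(1), m.group(2).trim());
        }
        // "LoC" here counts every line, including blank and comment lines.
        meta.put("LoC", String.valueOf(source.lines().count()));
        return meta;
    }
}
```

A dedicated parser, as the comment notes, would be needed for anything deeper than this kind of surface scan (for example, telling a real Javadoc tag from the same text inside a string literal).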
[jira] [Commented] (TIKA-1198) Consider optionally utilizing CXF JAX-RS Attachment support
[ https://issues.apache.org/jira/browse/TIKA-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877382#comment-13877382 ]

Sergey Beryozkin commented on TIKA-1198:

Hi Dave,

Yes, I agree. All methods accepting multipart/form-data now have /form Path qualifiers. Please try the snapshots/trunk.

Cheers, Sergey

Consider optionally utilizing CXF JAX-RS Attachment support
---
Key: TIKA-1198
URL: https://issues.apache.org/jira/browse/TIKA-1198
Project: Tika
Issue Type: Wish
Components: server
Reporter: Sergey Beryozkin
Priority: Minor

CXF offers fairly extensive support for multiparts: http://cxf.apache.org/docs/jax-rs-multiparts.html
Perhaps some of that can help the server offer more options for uploading/downloading files.
[jira] [Created] (TIKA-1225) MDI files detection
Marco Quaranta created TIKA-1225:

Summary: MDI files detection
Key: TIKA-1225
URL: https://issues.apache.org/jira/browse/TIKA-1225
Project: Tika
Issue Type: Improvement
Components: detector, mime
Affects Versions: 1.4
Reporter: Marco Quaranta
Priority: Minor

As stated by IANA, the Microsoft Document Imaging magic number is 0x45502A00: http://www.iana.org/assignments/media-types/image/vnd.ms-modi

Please add the following magic number to the Tika registry:

{noformat}
<mime-type type="image/vnd.ms-modi">
  <glob pattern="*.mdi"/>
  <_comment>Microsoft Document Imaging</_comment>
  <magic priority="50">
    <match value="0x45502A00" type="string" offset="0"/>
  </magic>
</mime-type>
{noformat}

Thank you, Marco
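Editor's sketch: the registry entry requested above amounts to comparing the first four bytes of a file (offset 0) with 0x45 0x50 0x2A 0x00. The standalone Java below illustrates that check only. It is not Tika's detector code, and the class name MdiMagic and method isMdi are hypothetical names chosen for this example.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Tika: checks the magic number quoted in TIKA-1225.
final class MdiMagic {

    // 0x45502A00 as four bytes: 'E', 'P', '*', NUL.
    private static final byte[] MAGIC = {0x45, 0x50, 0x2A, 0x00};

    // Returns true if the given header bytes start with the MDI magic number at offset 0.
    static boolean isMdi(byte[] header) {
        return header.length >= MAGIC.length
                && Arrays.equals(Arrays.copyOf(header, MAGIC.length), MAGIC);
    }
}
```

Passing the first few bytes read from a candidate file to isMdi is enough, since only offset 0 through 3 are inspected. Note that a TIFF header such as 0x49 0x49 0x2A 0x00 differs in its first two bytes, so it does not match.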
[jira] [Commented] (TIKA-1198) Consider optionally utilizing CXF JAX-RS Attachment support
[ https://issues.apache.org/jira/browse/TIKA-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877646#comment-13877646 ]

Sergey Beryozkin commented on TIKA-1198:

We've got an early agreement that it makes sense to sort out the issue of defaulting Content-Type to application/octet-stream earlier than is currently suggested. I can fix it in CXF right now, but that would leave it somewhat 'exposed' to TCK test restrictions if JAX-RS 2.1 does not actually fix it. As such, I think we can indeed settle on a unique path for multipart/form-data payloads, to cover the cases where the client does not provide a Content-Type.

Cheers, Sergey

Consider optionally utilizing CXF JAX-RS Attachment support
---
Key: TIKA-1198
URL: https://issues.apache.org/jira/browse/TIKA-1198
Project: Tika
Issue Type: Wish
Components: server
Reporter: Sergey Beryozkin
Priority: Minor

CXF offers fairly extensive support for multiparts: http://cxf.apache.org/docs/jax-rs-multiparts.html
Perhaps some of that can help the server offer more options for uploading/downloading files.