[jira] [Commented] (TIKA-1224) Adding Source code (Java, Groovy, C) parser

2014-01-21 Thread Hong-Thai Nguyen (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877343#comment-13877343
 ] 

Hong-Thai Nguyen commented on TIKA-1224:


I agree that deeply parsing each language is not simple. This work (already 
done) just provides an HTML rendering of the source code plus whatever metadata 
can be extracted from Javadoc comments (author, version, ...), and possibly 
other interesting values such as LoC. When we need more detailed results for a 
language, we must implement a dedicated parser.
This parser is useful in search applications.
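
A minimal sketch of the kind of lightweight extraction described above, assuming metadata means the first @author and @version Javadoc tags and LoC means non-blank lines. The class and method names (SourceSummary, javadocTags, linesOfCode) are hypothetical and are not taken from the attached patch:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SourceSummary {
    // Matches a comment line carrying "@author ..." or "@version ...",
    // with an optional leading "/**" or "*" and an optional trailing "*/".
    private static final Pattern TAG = Pattern.compile(
            "^\\s*(?:/\\*\\*|\\*)?\\s*@(author|version)\\s+(.+?)\\s*(?:\\*/)?\\s*$");

    /** Collects the first @author and @version values found in the source text. */
    public static Map<String, String> javadocTags(String source) {
        Map<String, String> tags = new LinkedHashMap<>();
        for (String line : source.split("\\R")) {
            Matcher m = TAG.matcher(line);
            if (m.matches()) {
                tags.putIfAbsent(m.group(1), m.group(2));
            }
        }
        return tags;
    }

    /** Counts non-blank lines; a deliberately simple definition of LoC (assumption). */
    public static int linesOfCode(String source) {
        int count = 0;
        for (String line : source.split("\\R")) {
            if (!line.trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }
}
```

Anything beyond this, such as resolving types or method signatures, is where a dedicated per-language parser would be needed.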

 Adding Source code (Java, Groovy, C) parser
 ---

 Key: TIKA-1224
 URL: https://issues.apache.org/jira/browse/TIKA-1224
 Project: Tika
  Issue Type: Improvement
  Components: parser
Affects Versions: 1.5
Reporter: Hong-Thai Nguyen
Priority: Minor

 We can parse some source code file formats:
 text/x-java-source
 text/x-groovy
 text/x-c
 For HTML rendering of the code, we can use jhighlight: 
 http://www.ohloh.net/p/jhighlight



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (TIKA-1198) Consider optionally utilizing CXF JAX-RS Attachment support

2014-01-21 Thread Sergey Beryozkin (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877382#comment-13877382
 ] 

Sergey Beryozkin commented on TIKA-1198:


Hi Dave, yes, I agree.
All methods accepting multipart/form-data now have /form Path qualifiers.
Please try the snapshots/trunk.
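
For readers unfamiliar with the payload type, here is a rough sketch of what a client would send to one of the /form paths: a single-part multipart/form-data body built with the JDK only. The class name MultipartBody and the field and file names are hypothetical, and the exact server paths are not shown here because the comment does not list them:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class MultipartBody {
    /** Value for the request's Content-Type header. */
    public static String contentType(String boundary) {
        return "multipart/form-data; boundary=" + boundary;
    }

    /** Builds a body with one file part; the part is labelled application/octet-stream. */
    public static byte[] build(String boundary, String fieldName,
                               String fileName, byte[] content) {
        String head = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + fieldName
                + "\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: application/octet-stream\r\n"
                + "\r\n";
        String tail = "\r\n--" + boundary + "--\r\n";

        byte[] h = head.getBytes(StandardCharsets.US_ASCII);
        byte[] t = tail.getBytes(StandardCharsets.US_ASCII);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(h, 0, h.length);          // part headers
        out.write(content, 0, content.length); // raw file bytes
        out.write(t, 0, t.length);          // closing boundary
        return out.toByteArray();
    }
}
```

The bytes returned by build would be posted with the header from contentType to whichever /form path the server exposes.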

Cheers, Sergey  

 Consider optionally utilizing CXF JAX-RS Attachment support
 ---

 Key: TIKA-1198
 URL: https://issues.apache.org/jira/browse/TIKA-1198
 Project: Tika
  Issue Type: Wish
  Components: server
Reporter: Sergey Beryozkin
Priority: Minor

 CXF offers fairly extensive support for multiparts:
 http://cxf.apache.org/docs/jax-rs-multiparts.html
 Perhaps some of that can help the server offer more options for 
 uploading/downloading files.





[jira] [Created] (TIKA-1225) MDI files detection

2014-01-21 Thread Marco Quaranta (JIRA)
Marco Quaranta created TIKA-1225:


 Summary: MDI files detection
 Key: TIKA-1225
 URL: https://issues.apache.org/jira/browse/TIKA-1225
 Project: Tika
  Issue Type: Improvement
  Components: detector, mime
Affects Versions: 1.4
Reporter: Marco Quaranta
Priority: Minor


As stated by IANA, the Microsoft Document Imaging magic number is 0x45502A00: 
http://www.iana.org/assignments/media-types/image/vnd.ms-modi 
Please add the following magic number to the Tika registry:
{noformat}
  <mime-type type="image/vnd.ms-modi">
    <glob pattern="*.mdi"/>
    <_comment>Microsoft Document Imaging</_comment>
    <magic priority="50">
      <match value="0x45502A00" type="string" offset="0"/>
    </magic>
  </mime-type>
{noformat}
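
As an illustration of what the proposed magic entry checks, here is a small standalone sketch that compares the first four bytes of a file prefix against 0x45 0x50 0x2A 0x00 at offset 0. The class and method names (MdiMagic, looksLikeMdi) are hypothetical; this is not how Tika's detector is implemented, only the same comparison written out by hand:

```java
import java.util.Arrays;

public class MdiMagic {
    // Magic bytes from the issue: 0x45502A00 at offset 0.
    private static final byte[] MAGIC = {0x45, 0x50, 0x2A, 0x00};

    /** Returns true if the given file prefix starts with the MDI magic bytes. */
    public static boolean looksLikeMdi(byte[] prefix) {
        if (prefix == null || prefix.length < MAGIC.length) {
            return false; // too short to contain the magic
        }
        return Arrays.equals(Arrays.copyOfRange(prefix, 0, MAGIC.length), MAGIC);
    }
}
```

Note that a TIFF header (0x49 0x49 0x2A 0x00) differs only in its first two bytes, so all four bytes need to be compared.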

Thank you,
Marco





[jira] [Commented] (TIKA-1198) Consider optionally utilizing CXF JAX-RS Attachment support

2014-01-21 Thread Sergey Beryozkin (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877646#comment-13877646
 ] 

Sergey Beryozkin commented on TIKA-1198:


We've got an early agreement that it makes sense to sort out the issue of 
defaulting Content-Type to application/octet-stream earlier than is currently 
suggested. I can fix it in CXF right now, but that would leave it a bit 
'exposed' to TCK test restrictions if JAX-RS 2.1 does not actually fix it. As 
such, I think we can indeed settle on supporting a unique path for 
multipart/form-data payloads to cover the cases where the client does not 
provide a Content-Type.

Cheers, Sergey

 Consider optionally utilizing CXF JAX-RS Attachment support
 ---

 Key: TIKA-1198
 URL: https://issues.apache.org/jira/browse/TIKA-1198
 Project: Tika
  Issue Type: Wish
  Components: server
Reporter: Sergey Beryozkin
Priority: Minor

 CXF offers fairly extensive support for multiparts:
 http://cxf.apache.org/docs/jax-rs-multiparts.html
 Perhaps some of that can help the server offer more options for 
 uploading/downloading files.


