[ 
https://issues.apache.org/jira/browse/TIKA-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877343#comment-13877343
 ] 

Hong-Thai Nguyen commented on TIKA-1224:
----------------------------------------

I agree that deep parsing of each language is not simple. This work (already 
done) only provides an HTML rendering of the source code, plus whatever 
metadata can be extracted from Javadoc comments (such as author and version) 
and possibly other useful values such as lines of code (LoC). When we need 
more detailed results for a particular language, we must implement a 
dedicated parser.
This parser is useful in search applications.
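
To make the kind of metadata mentioned above concrete, here is a minimal 
sketch in plain Java. It is not the code attached to this issue and does not 
use any Tika API: the class name SourceMetadataSketch and its two methods are 
hypothetical, the tag matching is a simple regular expression rather than a 
real Javadoc parser, and "LoC" is approximated as the number of non-blank 
lines (comment lines included).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika: pulls a few facts out of raw source text.
public class SourceMetadataSketch {

    // Matches Javadoc-style tags such as "@author Jane Doe" or "@version 1.0".
    // The value runs to the end of the line (or to a '*', to skip comment closers).
    private static final Pattern TAG =
            Pattern.compile("@(author|version)\\s+([^\\r\\n*]+)");

    /** Returns the first value found for each of the author/version tags. */
    public static Map<String, String> extractTags(String source) {
        Map<String, String> tags = new LinkedHashMap<>();
        Matcher m = TAG.matcher(source);
        while (m.find()) {
            // Keep only the first occurrence of each tag name.
            tags.putIfAbsent(m.group(1), m.group(2).trim());
        }
        return tags;
    }

    /** Counts non-blank lines; a crude stand-in for a lines-of-code metric. */
    public static int countLoc(String source) {
        int count = 0;
        for (String line : source.split("\\R")) {
            if (!line.trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String source = "/**\n * @author Jane Doe\n * @version 1.0\n */\n\n"
                + "public class Demo {}\n";
        System.out.println(extractTags(source));
        System.out.println(countLoc(source));
    }
}
```

In a search application, values like these would typically be stored as 
metadata fields next to the rendered HTML, so that documents can be filtered 
by author or version without a full language parser.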

> Adding Source code (Java, Groovy, C) parser
> -------------------------------------------
>
>                 Key: TIKA-1224
>                 URL: https://issues.apache.org/jira/browse/TIKA-1224
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.5
>            Reporter: Hong-Thai Nguyen
>            Priority: Minor
>
> We can parse some source code file formats:
> text/x-java-source
> text/x-groovy
> text/x-c
> For HTML rendering of the code, we can use jhighlight: 
> http://www.ohloh.net/p/jhighlight



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
