[ https://issues.apache.org/jira/browse/TIKA-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979379#comment-13979379 ]
Benoit Moreau commented on TIKA-1224:
-------------------------------------
While debugging, I see that Tika uses org.apache.tika.SourceCodeParser for the
"x-java-source" mime-type. It strips all line terminators (why? is this a
mistake? readLine() does not return \n and/or \r), then passes the result to
JHighlight. The JHighlight output (an entire HTML document) is then passed as
the argument of the ContentHandler's characters() method.
I am just getting started with Tika, but I don't think that is right.
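The line-terminator problem can be reproduced outside Tika with a small standalone sketch. The class and method names below are hypothetical illustrations, not Tika's actual code. BufferedReader.readLine() returns each line without its terminator, so appending the lines directly merges them. Re-appending a '\n' after each line is one way to keep the line structure.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of Tika.
public class LineJoinDemo {

    // Mimics the behaviour described above: each line from readLine() is
    // appended as-is, so the terminators are lost and lines run together.
    static String joinDroppingTerminators(String source) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new StringReader(source))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Re-appends '\n' after every line so the line structure survives.
    // Note: "\r\n" and "\r" are normalized to "\n", and a trailing '\n'
    // is added even if the source did not end with one.
    static String joinKeepingTerminators(String source) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new StringReader(source))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String code = "int a = 1; // first\r\nint b = 2;\n";
        System.out.println(joinDroppingTerminators(code));
        System.out.print(joinKeepingTerminators(code));
    }
}
```

With the terminators dropped, the sample becomes `int a = 1; // firstint b = 2;`, so the `//` comment now swallows the second statement and a highlighter would likely render `int b = 2;` as part of the comment.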
> Adding Source code (Java, Groovy, C) parser
> -------------------------------------------
>
> Key: TIKA-1224
> URL: https://issues.apache.org/jira/browse/TIKA-1224
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 1.5
> Reporter: Hong-Thai Nguyen
> Priority: Minor
>
> We could parse the following source code file formats:
> text/x-java-source
> text/x-groovy
> text/x-c
> For HTML rendering of the code, we can use JHighlight:
> http://www.ohloh.net/p/jhighlight
--
This message was sent by Atlassian JIRA
(v6.2#6252)