[ https://issues.apache.org/jira/browse/TIKA-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877343#comment-13877343 ]
Hong-Thai Nguyen commented on TIKA-1224:
----------------------------------------

I agree that deeply parsing each language is not simple. This work (already done) just provides an HTML rendering of the source code, plus whatever metadata can be extracted from Javadoc comments (such as author and version) and possibly other items of interest, such as lines of code (LoC). When we need more detailed results for a language, we must implement a dedicated parser. This parser is useful in search applications.

> Adding Source code (Java, Groovy, C) parser
> -------------------------------------------
>
>                 Key: TIKA-1224
>                 URL: https://issues.apache.org/jira/browse/TIKA-1224
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.5
>            Reporter: Hong-Thai Nguyen
>            Priority: Minor
>
> We can parse some source code file formats:
> text/x-java-source
> text/x-groovy
> text/x-c
> for HTML rendering from code, we can use jhighlight:
> http://www.ohloh.net/p/jhighlight

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
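
The lightweight extraction described in the comment (author and version from Javadoc comments, plus LoC) could look roughly like the sketch below. This is an illustration only, not the code attached to TIKA-1224: it uses only the JDK, the class name SourceMetadataSketch and the metadata keys are hypothetical, and LoC is assumed here to mean non-blank lines.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika: pulls a little metadata out of source text.
public class SourceMetadataSketch {

    // Matches a Javadoc tag on its own comment line, e.g. " * @author Jane Doe".
    private static final Pattern TAG =
            Pattern.compile("^\\s*\\*?\\s*@(author|version)\\s+(.+?)\\s*$");

    public static Map<String, String> extract(String source) {
        Map<String, String> meta = new LinkedHashMap<>();
        int loc = 0;
        for (String line : source.split("\\R")) {
            // Assumption: LoC counts every non-blank line, comments included.
            if (!line.trim().isEmpty()) {
                loc++;
            }
            Matcher m = TAG.matcher(line);
            if (m.matches()) {
                // Keep only the first occurrence of each tag.
                meta.putIfAbsent(m.group(1), m.group(2));
            }
        }
        meta.put("loc", Integer.toString(loc));
        return meta;
    }

    public static void main(String[] args) {
        String source = String.join("\n",
                "/**",
                " * @author Jane Doe",
                " * @version 1.5",
                " */",
                "",
                "public class A {}");
        System.out.println(extract(source));
    }
}
```

A line-based regular expression is enough for this level of detail and works the same way for Java, Groovy, and C comment blocks that use Javadoc-style tags. It misses tags written on the same line as the opening `/**`, which is the kind of case a dedicated per-language parser would handle.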