[ https://issues.apache.org/jira/browse/TIKA-1165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946836#comment-13946836 ]

David Pilato commented on TIKA-1165:
------------------------------------

Looks like I never replied to your comment! Shame on me! :(

In TIKA-1123, which is marked as fixed, we say that asciidoc should have the
{{text/x-asciidoc}} MIME type.
That's one of the reasons I thought autodetection was already working for asciidoc.

As for a library, AsciidoctorJ looks like it could help a lot here:
https://github.com/asciidoctor/asciidoctorj#document-header

I have not looked into it any further, though.

Thanks!

> Autodetect and parse Asciidoc
> -----------------------------
>
>                 Key: TIKA-1165
>                 URL: https://issues.apache.org/jira/browse/TIKA-1165
>             Project: Tika
>          Issue Type: Wish
>          Components: languageidentifier, parser
>    Affects Versions: 1.4
>            Reporter: David Pilato
>            Priority: Trivial
>
> When extracting metadata from an asciidoc file, we currently get the following:
> {noformat}
> Content-Encoding: ISO-8859-1
> Content-Length: 66363
> Content-Type: text/plain; charset=ISO-8859-1
> resourceName: asciidoc.adoc
> {noformat}
> Steps to reproduce:
> {code:title=asciidoc.sh|borderStyle=solid}
> curl https://raw.github.com/asciidoctor/asciidoctor.org/master/docs/asciidoc-syntax-quick-reference.adoc -O -s
> java -jar tika-app-1.4.jar -m asciidoc-syntax-quick-reference.adoc
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
