[ 
https://issues.apache.org/jira/browse/TIKA-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146370#comment-17146370
 ] 

Lewis John McGibbney commented on TIKA-3113:
--------------------------------------------

After a wee bit of research I understand a .aux file to be a file created by 
TeX and LaTeX, which are typesetting systems often used to generate academic 
papers and other technical documentation. It contains information about a 
document such as footnotes, bibliography entries, and cross-references.

_More Information_
AUX files are written when a .tex file is typeset (formatted to an output 
document) by LaTeX. Since generating a LaTeX document can take multiple passes 
before the document is complete (because of file and citation 
cross-referencing), the AUX file is used to store information between runs of 
the LaTeX compilation process.
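
To make the description above concrete, here is a rough sketch of how text 
could be checked for LaTeX .aux content. This is only an illustration under 
my own assumptions, not anything that exists in Tika: the class name 
AuxSniffer, the method looksLikeLatexAux and the "at least half of the 
non-blank lines" threshold are all hypothetical. The markers are control 
sequences that LaTeX and BibTeX commonly write to .aux files.

```java
// Hypothetical helper, NOT part of Tika: guesses whether text looks like a LaTeX .aux file.
final class AuxSniffer {

    // Control sequences commonly written to .aux files by LaTeX/BibTeX runs.
    private static final String[] MARKERS = {
        "\\relax", "\\@writefile", "\\newlabel", "\\citation",
        "\\bibdata", "\\bibstyle", "\\bibcite", "\\@input"
    };

    // Returns true when at least half of the non-blank lines start with a known marker.
    // The 50% threshold is an arbitrary assumption chosen for this sketch.
    static boolean looksLikeLatexAux(String text) {
        int nonBlank = 0;
        int hits = 0;
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // blank lines carry no signal
            }
            nonBlank++;
            for (String marker : MARKERS) {
                if (trimmed.startsWith(marker)) {
                    hits++;
                    break; // count each line at most once
                }
            }
        }
        return nonBlank > 0 && hits * 2 >= nonBlank;
    }
}
```

A check like this would reject text made up of <Header> and <Data> elements, 
such as the reporter describes, so on its own it says nothing about what the 
attached file actually is.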


So it appears to be a temporary (intermediate) file.


I haven't been able to find a Java parser for this family of data formats but I 
did find https://github.com/nzhagen/bibulous

> Currently Tika is detecting a .aux file as text/html
> ----------------------------------------------------
>
>                 Key: TIKA-3113
>                 URL: https://issues.apache.org/jira/browse/TIKA-3113
>             Project: Tika
>          Issue Type: Bug
>          Components: detector
>    Affects Versions: 1.24
>            Reporter: Danny McKinney
>            Priority: Minor
>         Attachments: TES.PC.00010363.1.aux
>
>
> While processing files from an Enron test data set a file with extension aux 
> was detected to be MediaType of text/html. The file contains elements 
> <Header> and <Data> but is a type of LaTeX file I believe. I am attaching a 
> sample file. [^TES.PC.00010363.1.aux]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
