Re: Detection problem: Parsing scientific source codes for geoscientists

Lewis John Mcgibbney Tue, 21 Apr 2015 16:28:39 -0700

Hi Ji-Hyun,

On Tue, Apr 21, 2015 at 4:15 PM, <dev-digest-h...@tika.apache.org> wrote:


>
> FORTRAN (.f, .f90, .f77)
> Python (.py)
> R (.R)
> Matlab (.m)
> GrADS (Grid Analysis and Display System) (.gs)
> NCL (NCAR Command Language) (.ncl)
> IDL (Interactive Data Language) (.pro)
>

Nice list.


>
> I checked that Fortran and Matlab are included in tika-mimetypes.xml, but
> when I used Tika to obtain the content type of these files (with suffixes
> .f, .f90, and .m), Tika detected them as text/plain:
>
> ohjihyun% tika -m spctime.f
>
> Content-Encoding: ISO-8859-1
> Content-Length: 16613
> Content-Type: text/plain; charset=ISO-8859-1
> X-Parsed-By: org.apache.tika.parser.DefaultParser
> X-Parsed-By: org.apache.tika.parser.txt.TXTParser
> resourceName: spctime.f
>
>
[SNIP]
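One possible stopgap is to keep Tika's content detection and refine a bare text/plain result using the file extension. The sketch below is a hypothetical Python helper, not part of Tika. The extensions come from the list above, but the MIME type strings in the table are assumptions for illustration and should be checked against tika-mimetypes.xml.

```python
import os

# Hypothetical extension table built from the list in this thread.
# The MIME type strings are ASSUMPTIONS for illustration; check
# tika-mimetypes.xml for the types Tika actually registers.
EXTENSION_TYPES = {
    ".f": "text/x-fortran",
    ".f77": "text/x-fortran",
    ".f90": "text/x-fortran",
    ".py": "text/x-python",
    ".r": "text/x-r",
    ".m": "text/x-matlab",
    ".gs": "text/x-grads",
    ".ncl": "text/x-ncl",
    ".pro": "text/x-idl",
}


def refine_type(detected: str, filename: str) -> str:
    """Refine a generic text/plain result using the file name's extension.

    Only text/plain (with or without parameters such as charset) is refined;
    any more specific detected type is returned unchanged.
    """
    base = detected.split(";", 1)[0].strip().lower()
    if base != "text/plain":
        return detected
    _, ext = os.path.splitext(filename)
    # Case-insensitive lookup so that both .R and .r map to the same entry.
    return EXTENSION_TYPES.get(ext.lower(), detected)


print(refine_type("text/plain; charset=ISO-8859-1", "spctime.f"))  # text/x-fortran
```

An extension is only a hint: .m is also used for Objective-C and .pro for Qt project files, so a table like this can mislabel files from those ecosystems. The refined value also drops the charset parameter, which a real integration would probably want to keep.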


> Should I build a parser for each file format to get an exact content type,
> as there is a SourceCodeParser for Java?
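Short of writing a full parser, a content heuristic can sometimes flag a source language. The sketch below is a hypothetical toy check for Fortran, not Tika's detector: it counts non-blank lines that are fixed-form comments (C, c, or * in column 1 followed by whitespace) or that begin with a few Fortran keywords, and reports a match when that fraction reaches a threshold. The keyword list and the 0.2 threshold are arbitrary assumptions.

```python
import re

# Fixed-form Fortran comment: 'C', 'c' or '*' in column 1, then whitespace or end of line.
FIXED_FORM_COMMENT = re.compile(r"^[Cc*](\s|$)")

# Hypothetical keyword list; some of these words also appear in IDL and other languages.
FORTRAN_KEYWORDS = re.compile(
    r"^\s*(PROGRAM|SUBROUTINE|IMPLICIT|INTEGER|DIMENSION|CALL)\b",
    re.IGNORECASE,
)


def looks_like_fortran(text: str, threshold: float = 0.2) -> bool:
    """Return True if at least `threshold` of the non-blank lines look like Fortran."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return False
    hits = sum(
        1
        for line in lines
        if FIXED_FORM_COMMENT.match(line) or FORTRAN_KEYWORDS.match(line)
    )
    return hits / len(lines) >= threshold


sample = """\
      PROGRAM HELLO
C     Print a greeting
      PRINT *, 'Hello'
      END
"""
print(looks_like_fortran(sample))                               # True
print(looks_like_fortran("import os\nprint(os.getcwd())\n"))   # False
```

A check like this is cheap but easily fooled; it only suggests a type and does not replace a real grammar-based parser.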


As far as I know, we have no parser for Fortran documents.
You could try using the following Java project:
http://sourceforge.net/projects/fortran-parser/
It is dual-licensed under the Eclipse and BSD licenses.
Hope this helps.
Lewis
