RE: Detection problem: Parsing scientific source codes for geoscientists

Oh, Ji-Hyun (329F-Affiliate) Wed, 22 Apr 2015 12:32:38 -0700

Hi Lewis,

Thank you for the help :)
I will try the fortran-parser on SourceForge to see how it works.
But as Nick pointed out, could I also modify SourceCodeParser for our purpose?


Ji-Hyun
________________________________________
From: Lewis John Mcgibbney [[email protected]]
Sent: Tuesday, April 21, 2015 4:26 PM
To: [email protected]
Subject: Re: Detection problem: Parsing scientific source codes for 
geoscientists

Hi Ji-Hyun,

On Tue, Apr 21, 2015 at 4:15 PM, <[email protected]> wrote:

>
> FORTRAN (.f, .f90, .f77)
> Python (.py)
> R (.R)
> Matlab (.m)
> GrADS (Grid Analysis and Display System) (.gs)
> NCL (NCAR Command Language) (.ncl)
> IDL (Interactive Data Language) (.pro)
>

NICE list


>
> I checked that Fortran and Matlab are included in tika-mimetypes.xml, but when
> I used Tika to obtain the content type of the files (with suffixes .f, .f90, .m),
> Tika detected these files as text/plain:
>
> ohjihyun% tika -m spctime.f
>
> Content-Encoding: ISO-8859-1
> Content-Length: 16613
> Content-Type: text/plain; charset=ISO-8859-1
> X-Parsed-By: org.apache.tika.parser.DefaultParser
> X-Parsed-By: org.apache.tika.parser.txt.TXTParser
> resourceName: spctime.f
>
>
[SNIP]


> Should I build a parser for each file format to get an exact content type,
> as SourceCodeParser does for Java?


As far as I know we have no parser for Fortran documents.
You could try using the following Java project
http://sourceforge.net/projects/fortran-parser/
It is dual licensed under Eclipse and BSD licenses.
Hope this helps.
Lewis
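
As a stopgap while the parser question is open, the extensions listed above can be mapped to media types by file name alone. The following is a minimal sketch using only Python's standard `mimetypes` module, independent of Tika. The media type names in the table are illustrative assumptions (they are not taken from tika-mimetypes.xml), and `detect` and `build_registry` are hypothetical helper names.

```python
import mimetypes

# Extension-to-type table for the languages listed in the thread.
# All media type names here are illustrative assumptions, not names
# confirmed from tika-mimetypes.xml.
SOURCE_TYPES = {
    ".f": "text/x-fortran",
    ".f77": "text/x-fortran",
    ".f90": "text/x-fortran",
    ".py": "text/x-python",
    ".r": "text/x-rsrc",
    ".m": "text/x-matlab",
    ".gs": "text/x-grads",
    ".ncl": "text/x-ncl",
    ".pro": "text/x-idl",
}

# Returned when the extension is not recognised at all.
FALLBACK = "text/plain"


def build_registry():
    """Create a private MimeTypes registry and add the source-code types."""
    registry = mimetypes.MimeTypes()
    for ext, media_type in SOURCE_TYPES.items():
        registry.add_type(media_type, ext)
    return registry


_REGISTRY = build_registry()


def detect(filename):
    """Guess a media type from the file name only (no content sniffing)."""
    media_type, _encoding = _REGISTRY.guess_type(filename)
    return media_type or FALLBACK


if __name__ == "__main__":
    for name in ("spctime.f", "model.f90", "plot.R", "notes.zzzz"):
        print(name, "->", detect(name))
```

A lookup like this only inspects the file name, so it cannot tell apart languages that share an extension (for example, .m is also used for Objective-C sources). It yields a type label but does not parse the source.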
