Thanks for sharing your work, Ji-Hyun. Glad Ken, Lewis, and Nick
have replied. Thanks!

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++






-----Original Message-----
From: Ken Krugler <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Wednesday, April 22, 2015 at 10:38 AM
To: "[email protected]" <[email protected]>
Subject: RE: Detection problem: Parsing scientific source codes for
geoscientists

>I'm looking into whether detection & parsing code from a previous project
>could be open-sourced.
>
>If that happened, we'd get support for many, many languages - though not
>GrADS or NCAR.
>
>But the infrastructure would be there to easily add support for any
>missing languages.
>
>-- Ken
>
>> From: Oh, Ji-Hyun (329F-Affiliate)
>> Sent: April 21, 2015 10:54:16am PDT
>> To: [email protected]
>> Subject: Detection problem: Parsing scientific source codes for
>>geoscientists
>> 
>> Hi Tika friends,
>> 
>> I am currently engaged in a project funded by the National Science
>>Foundation. Our goal is to develop a research-friendly environment where
>>geoscientists, like me, can easily find the source code they need.
>>According to a survey, scientists spend a considerable amount of their
>>time processing data instead of doing actual science. Based on my
>>experience as a climate scientist, a set of analysis tools is used
>>very frequently in atmospheric science, so it would be helpful if these
>>tools could be easily shared among scientists. The difficulty is that
>>the tools are written in various scientific languages, so we are
>>trying to provide metadata for source code stored in public
>>repositories to help scientists select source code for their own use.
>> 
>> As a first step, I listed the file formats that are widely used in
>>climate science:
>> 
>> FORTRAN (.f, .f90, .f77)
>> Python (.py)
>> R (.R)
>> Matlab (.m)
>> GrADS (Grid Analysis and Display System) (.gs)
>> NCL (NCAR Command Language) (.ncl)
>> IDL (Interactive Data Language) (.pro)
>> 
>> I checked that Fortran and Matlab are included in tika-mimetypes.xml, but
>>when I used Tika to obtain the content type of the files (with suffixes .f,
>>.f90, .m), Tika detected them as text/plain:
>> 
>> ohjihyun% tika -m spctime.f
>> 
>> Content-Encoding: ISO-8859-1
>> Content-Length: 16613
>> Content-Type: text/plain; charset=ISO-8859-1
>> X-Parsed-By: org.apache.tika.parser.DefaultParser
>> X-Parsed-By: org.apache.tika.parser.txt.TXTParser
>> resourceName: spctime.f
>> 
>> ohjihyun% tika -m wavelet.m
>> Content-Encoding: ISO-8859-1
>> Content-Length: 5868
>> Content-Type: text/plain; charset=ISO-8859-1
>> X-Parsed-By: org.apache.tika.parser.DefaultParser
>> X-Parsed-By: org.apache.tika.parser.txt.TXTParser
>> resourceName: wavelet.m
>> 
>> I checked that Tika gives the correct content type (text/x-java-source) for
>>a Java file:
>> ohjihyun% tika -m UrlParser.java
>> Content-Encoding: ISO-8859-1
>> Content-Length: 2178
>> Content-Type: text/x-java-source
>> LoC: 70
>> X-Parsed-By: org.apache.tika.parser.DefaultParser
>> X-Parsed-By: org.apache.tika.parser.code.SourceCodeParser
>> resourceName: UrlParser.java
>> 
>> Should I build a parser for each file format to get an exact
>>content-type, as Java has SourceCodeParser?
>> Thank you in advance for your insightful comments.
>> 
>> Ji-Hyun
>
>--------------------------
>Ken Krugler
>+1 530-210-6378
>http://www.scaleunlimited.com
>custom big data solutions & training
>Hadoop, Cascading, Cassandra & Solr
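
The extension list in the original message can be read as a simple lookup table. The following is a minimal sketch of that first step only, assuming nothing beyond the extensions listed above. The names EXTENSION_TO_LANGUAGE and guess_language are hypothetical and are not part of Tika; this does not reproduce how Tika itself performs detection.

```python
from pathlib import PurePath
from typing import Optional

# Extensions taken from the list in the original message. The mapping
# itself is a hypothetical illustration, not part of Tika.
EXTENSION_TO_LANGUAGE = {
    ".f": "Fortran",
    ".f90": "Fortran",
    ".f77": "Fortran",
    ".py": "Python",
    ".r": "R",
    ".m": "Matlab",
    ".gs": "GrADS",
    ".ncl": "NCL",
    ".pro": "IDL",
}


def guess_language(file_name: str) -> Optional[str]:
    """Return the language suggested by the file extension, or None."""
    # Lower-case the suffix so that "plot.R" and "plot.r" are treated alike.
    suffix = PurePath(file_name).suffix.lower()
    return EXTENSION_TO_LANGUAGE.get(suffix)


if __name__ == "__main__":
    for name in ("spctime.f", "wavelet.m", "UrlParser.java"):
        print(name, "->", guess_language(name))
```

An extension alone can be ambiguous (for example, .m is also used for Objective-C sources), so a lookup like this would likely need to be combined with checks on the file content.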
