Yes! I've also explored their website and tried searching source code (http://opensearch.krugle.org).. great!
________________________________________
From: Mattmann, Chris A (3980) [[email protected]]
Sent: Wednesday, April 22, 2015 1:18 PM
To: [email protected]
Subject: Re: Detection problem: Parsing scientific source codes for geoscientists

Wow Ken that would be stellar. Ji-Hyun and I are doing this work as part of the NSF EarthCube project, working with Yolanda Gil at USC/ISI:

http://geosoft-earthcube.org/

Our part is Tika + Nutch + Solr over GitHub and geosciences software. The purpose of Ji-Hyun's postdoc is to work in that area, so if Krugle would be willing to do that, it would be awesomeness. Cheers mate.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Ken Krugler <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Wednesday, April 22, 2015 at 3:58 PM
To: "[email protected]" <[email protected]>
Subject: RE: Detection problem: Parsing scientific source codes for geoscientists

>
>> From: Oh, Ji-Hyun (329F-Affiliate)
>> Sent: April 22, 2015 12:36:28pm PDT
>> To: [email protected]
>> Subject: RE: Detection problem: Parsing scientific source codes for geoscientists
>>
>> Hi Ken,
>> Thank you very much for your comment.
>> Could you tell me what kind of previous project you are looking into?
>
> It's the Krugle code search product.
>
> It's sold as enterprise software, but they might be willing to open source the parsing code.
> -- Ken
>
>
>> ________________________________________
>> From: Ken Krugler [[email protected]]
>> Sent: Wednesday, April 22, 2015 7:38 AM
>> To: [email protected]
>> Subject: RE: Detection problem: Parsing scientific source codes for geoscientists
>>
>> I'm looking into whether detection & parsing code from a previous project could be open-sourced.
>>
>> If that happened, we'd get support for many, many languages - though not GrADS or NCAR.
>>
>> But the infrastructure would be there to easily add support for any missing languages.
>>
>> -- Ken
>>
>>> From: Oh, Ji-Hyun (329F-Affiliate)
>>> Sent: April 21, 2015 10:54:16am PDT
>>> To: [email protected]
>>> Subject: Detection problem: Parsing scientific source codes for geoscientists
>>>
>>> Hi Tika friends,
>>>
>>> I am currently engaged in a project funded by the National Science Foundation. Our goal is to develop a research-friendly environment where geoscientists, like me, can easily find the source code they need. According to a survey, scientists spend a considerable amount of their time processing data instead of doing actual science. Based on my experience as a climate scientist, there are certain analysis tools that are most frequently and typically used in atmospheric science. Therefore, it could be helpful if these tools could be easily shared among scientists. The thing is that the tools are written in various scientific languages, so we are trying to provide metadata for source code stored in public repositories to help scientists select source code for their own use.
>>>
>>> As a first step, I listed the file formats that are widely used in climate science.
>>>
>>> FORTRAN (.f, .f90, .f77)
>>> Python (.py)
>>> R (.R)
>>> Matlab (.m)
>>> GrADS (Grid Analysis and Display System) (.gs)
>>> NCL (NCAR Command Language) (.ncl)
>>> IDL (Interactive Data Language) (.pro)
>>>
>>> I checked that Fortran and Matlab are included in tika-mimetypes.xml, but when I used Tika to obtain the content type of the files (with suffixes .f, .f90, .m), Tika detected these files as text/plain:
>>>
>>> ohjihyun% tika -m spctime.f
>>>
>>> Content-Encoding: ISO-8859-1
>>> Content-Length: 16613
>>> Content-Type: text/plain; charset=ISO-8859-1
>>> X-Parsed-By: org.apache.tika.parser.DefaultParser
>>> X-Parsed-By: org.apache.tika.parser.txt.TXTParser
>>> resourceName: spctime.f
>>>
>>> ohjihyun% tika -m wavelet.m
>>> Content-Encoding: ISO-8859-1
>>> Content-Length: 5868
>>> Content-Type: text/plain; charset=ISO-8859-1
>>> X-Parsed-By: org.apache.tika.parser.DefaultParser
>>> X-Parsed-By: org.apache.tika.parser.txt.TXTParser
>>> resourceName: wavelet.m
>>>
>>> I confirmed that Tika gives the correct content type (text/x-java-source) for a Java file:
>>>
>>> ohjihyun% tika -m UrlParser.java
>>> Content-Encoding: ISO-8859-1
>>> Content-Length: 2178
>>> Content-Type: text/x-java-source
>>> LoC: 70
>>> X-Parsed-By: org.apache.tika.parser.DefaultParser
>>> X-Parsed-By: org.apache.tika.parser.code.SourceCodeParser
>>> resourceName: UrlParser.java
>>>
>>> Should I build a parser for each file format to get an exact content type, the way Java has SourceCodeParser?
>>> Thank you in advance for your insightful comments.
>>>
>>> Ji-Hyun
>
> --------------------------
> Ken Krugler
> +1 530-210-6378
> http://www.scaleunlimited.com
> custom big data solutions & training
> Hadoop, Cascading, Cassandra & Solr
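[Editor's note] For the detection half of Ji-Hyun's question, Tika does not require a new parser per format: Tika picks up extra type definitions from a `custom-mimetypes.xml` file placed on the classpath at `org/apache/tika/mime/custom-mimetypes.xml`, using the same freedesktop.org mime-info syntax as `tika-mimetypes.xml`. A minimal sketch for the suffixes listed above — the `text/x-fortran` and `text/x-matlab` type names are plausible choices on my part, not necessarily the names that ship in a given Tika release:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<mime-info>
  <!-- Filename-glob detection for Fortran sources -->
  <mime-type type="text/x-fortran">
    <glob pattern="*.f"/>
    <glob pattern="*.f77"/>
    <glob pattern="*.f90"/>
  </mime-type>
  <!-- .m is ambiguous (Matlab vs. Objective-C); glob-only detection
       will simply take the last registration, so use with care -->
  <mime-type type="text/x-matlab">
    <glob pattern="*.m"/>
  </mime-type>
</mime-info>
```

With this file on the classpath, `tika -m spctime.f` should report the registered type instead of falling back to `text/plain`. A custom parser (like `SourceCodeParser` for Java) is only needed for the richer per-language metadata such as the `LoC` field above; detection alone is handled by the MIME registry.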
