[jira] [Commented] (TIKA-443) Geographic Information Parser
[ https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522953#comment-14522953 ] Hudson commented on TIKA-443: - SUCCESS: Integrated in tika-trunk-jdk1.7 #657 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/657/]) Fix for TIKA-443 Geographic Information Parser contributed by unknown this closes #47. (mattmann: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1677100) * /tika/trunk/CHANGES.txt * /tika/trunk/tika-bundle/pom.xml * /tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml * /tika/trunk/tika-parsers/pom.xml * /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/geoinfo * /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/geoinfo/GeographicInformationParser.java * /tika/trunk/tika-parsers/src/main/resources/META-INF/services/org.apache.tika.parser.Parser * /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/geoinfo * /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/geoinfo/GeographicInformationParserTest.java * /tika/trunk/tika-parsers/src/test/resources/test-documents/sampleFile.iso19139 > Geographic Information Parser > - > > Key: TIKA-443 > URL: https://issues.apache.org/jira/browse/TIKA-443 > Project: Tika > Issue Type: New Feature > Components: parser >Reporter: Arturo Beltran >Assignee: Chris A. Mattmann > Labels: memex, new-parser > Fix For: 1.9 > > Attachments: getFDOMetadata.xml > > > I'm working in the automatic description of geospatial resources, and I think > that might be interesting to incorporate new parser/s to Tika in order to > manage and describe some geo-formats. These geo-formats include files, > services and databases. > If anyone is interested in this issue or want to collaborate do not hesitate > to contact me. Any help is welcome. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TIKA-443) Geographic Information Parser
[ https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522911#comment-14522911 ] Gautham Gowrishankar commented on TIKA-443: --- You're welcome, Professor Mattmann :)
[jira] [Commented] (TIKA-443) Geographic Information Parser
[ https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522907#comment-14522907 ] ASF GitHub Bot commented on TIKA-443: - Github user asfgit closed the pull request at: https://github.com/apache/tika/pull/47
[jira] [Commented] (TIKA-443) Geographic Information Parser
[ https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522905#comment-14522905 ] Chris A. Mattmann commented on TIKA-443:
I fixed it by moving the extractContent function so that it runs after the metadata extraction.
{noformat}
[mattmann-0420740:~/tmp/tika1.9] mattmann% java -jar tika-app/target/tika-app-1.9-SNAPSHOT.jar -m tika-parsers/src/test/resources/test-documents/sampleFile.iso19139
May 01, 2015 12:57:16 AM org.apache.sis.internal.jaxb.gml.TM_Primitive setTimePeriod
WARNING: This operation requires the “sis-temporal” module.
AccessContraints : OTHER_RESTRICTIONS
CharacterSet: UTF-8
CitationDate : CREATION-->Mon Dec 16 00:00:00 PST 2013
CitationDate : modified-->Wed Mar 11 00:00:00 PDT 2015
CitedResponsiblePartyEMail : holli...@gvsu.edu
CitedResponsiblePartyName : Robert Hollister
CitedResponsiblePartyName : Robert Hollister
CitedResponsiblePartyRole : Role[POINT_OF_CONTACT]
CitedResponsiblePartyRole : Role[AUTHOR]
ContactPartyName-: UCAR/NCAR - CISL - ACADIS
ContactRole: RESOURCE_PROVIDER
Content-Length: 19370
Content-Type: text/iso19139+xml
DateInfo : CREATION Mon Dec 16 05:26:08 PST 2013
DistributionFormatSpecificationAlternativeTitle : Other ASCII
Distributor Contact : RESOURCE_PROVIDER
Distributor Organization Name : UCAR/NCAR - CISL - ACADIS
GeographicIdentifierAuthorityAlternativeTitle : Locations
GeographicIdentifierAuthorityDate : REVISION Thu Aug 28 00:00:00 PDT 2014
GeographicIdentifierAuthorityTitle : NASA/GCMD Earth Science Keywords
GeographicIdentifierCode : UNITED STATES OF AMERICA > ALASKA
IdentificationInfoAbstract : These files contain data representing the periodic plant measures of species within each plot in a text tab delimited format. The data presented are seasonal growth of graminoids (length of leaf and length of inflorescence) and seasonal flowering of all species (number of inflorescences in flower within a plot), collected weekly during the summers of 2012-20XX for a subset of 30 grid plots at two sites (Barrow ARCSS grid and Atqasuk ARCSS grid).
IdentificationInfoCitationTitle : Barrow Atqasuk ARCSS Plant
IdentificationInfoLanguage-->: English
IdentificationInfoStatus : ON_GOING
IdentificationInfoTopicCategory-->: BIOTA
Keywords 2: EARTH SCIENCE > BIOSPHERE > TERRESTRIAL ECOSYSTEMS > ALPINE/TUNDRA
Keywords 3: FIELD SURVEY
Keywords 4: POINT
Keywords 5: LESS THAN 1 METER
Keywords 6: DAILY TO WEEKLY
KeywordsType 2: THEME
KeywordsType 3: THEME
KeywordsType 4: THEME
KeywordsType 5: THEME
KeywordsType 6: THEME
MetaDataIdentifierCode: urn:x-wmo:md:org.aoncadis.www::4c1a919d-6690-11e3-9147-00c0f03d5b7c
MetaDataResourceScope : DATASET
MetaDataStandardEdition : ISO 19115:2003(E)
MetaDataStandardTitle : ISO 19115 Geographic information - Metadata
OtherConstraints : Access Constraints: No Access Constraints. Use Constraints: No Use Constraints.
ParentMetaDataTitle: urn:x-wmo:md:org.aoncadis.www::d2e4e808-6830-11df-abb3-00c0f03d5b7c
ResourceFormatSpecificationAlternativeTitle : Other ASCII
ThesaurusNameAlternativeTitle 2: [Science and Services Keywords]
ThesaurusNameAlternativeTitle 3: [Platforms]
ThesaurusNameAlternativeTitle 4: [Spatial Data Type]
ThesaurusNameAlternativeTitle 5: [Horizontal Data Resolution]
ThesaurusNameAlternativeTitle 6: [Temporal Data Resolution]
ThesaurusNameDate : REVISION-->Wed May 21 00:00:00 PDT 2014
ThesaurusNameDate : REVISION-->Tue Oct 07 00:00:00 PDT 2014
ThesaurusNameDate : REVISION-->Tue Oct 07 00:00:00 PDT 2014
ThesaurusNameDate : REVISION-->Wed May 21 00:00:00 PDT 2014
ThesaurusNameDate : REVISION-->Wed May 21 00:00:00 PDT 2014
ThesaurusNameTitle 2: NASA/GCMD Earth Science Keywords
ThesaurusNameTitle 3: ACADIS Keywords
ThesaurusNameTitle 4: ACADIS Keywords
ThesaurusNameTitle 5: NASA/GCMD Earth Science Keywords
ThesaurusNameTitle 6: NASA/GCMD Earth Science Keywords
TransferOptionsOnlineDescription : Metadata Link
TransferOptionsOnlineFunction : DOWNLOAD
TransferOptionsOnlineLinkage : https://www.aoncadis.org/dataset/id/4c1a919d-6690-11e3-9147-00c0f03d5b7c.html
TransferOptionsOnlineName : Barrow Atqasuk ARCSS Plant
TransferOptionsOnlineProfile : browser
TransferOptionsOnlineProtocol : https
UserConstraints : OTHER_RESTRICTIONS
X-Parsed-By: org.apache.tika.parser.DefaultParser
X-Parsed-By: org.apache.tika.parser.geoinfo.GeographicInformationParser
resourceName: sampleFile.iso19139
[mattmann-0420740:~/tmp/tika1.9] mattmann%
{noformat}
Works great! Committing.
[jira] [Commented] (TIKA-443) Geographic Information Parser
[ https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522896#comment-14522896 ] Chris A. Mattmann commented on TIKA-443:
OK after combining with my patch, we have success!
{noformat}
[INFO] Skipping execution for packaging "pom"
[INFO]
[INFO] --- forbiddenapis:1.7:testCheck (default) @ tika ---
[INFO] Skipping execution for packaging "pom"
[INFO]
[INFO] --- maven-site-plugin:3.0:attach-descriptor (attach-descriptor) @ tika ---
[INFO]
[INFO] --- maven-install-plugin:2.3.1:install (default-install) @ tika ---
[INFO] Installing /Users/mattmann/tmp/tika1.9/pom.xml to /Users/mattmann/.m2/repository/org/apache/tika/tika/1.9-SNAPSHOT/tika-1.9-SNAPSHOT.pom
[INFO]
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Tika parent . SUCCESS [ 1.500 s]
[INFO] Apache Tika core ... SUCCESS [ 19.538 s]
[INFO] Apache Tika parsers SUCCESS [02:17 min]
[INFO] Apache Tika XMP SUCCESS [ 2.995 s]
[INFO] Apache Tika serialization .. SUCCESS [ 2.224 s]
[INFO] Apache Tika batch .. SUCCESS [01:58 min]
[INFO] Apache Tika application SUCCESS [ 40.534 s]
[INFO] Apache Tika OSGi bundle SUCCESS [ 22.864 s]
[INFO] Apache Tika server . SUCCESS [ 21.619 s]
[INFO] Apache Tika translate .. SUCCESS [ 3.870 s]
[INFO] Apache Tika examples ... SUCCESS [ 5.872 s]
[INFO] Apache Tika Java-7 Components .. SUCCESS [ 2.427 s]
[INFO] Apache Tika SUCCESS [ 0.037 s]
[INFO]
[INFO] BUILD SUCCESS
[INFO]
[INFO] Total time: 06:20 min
[INFO] Finished at: 2015-05-01T00:39:38-07:00
[INFO] Final Memory: 109M/1658M
[INFO]
[mattmann-0420740:~/tmp/tika1.9] mattmann%
{noformat}
Ran a simple test too:
h2. Detect
{noformat}
[mattmann-0420740:~/tmp/tika1.9] mattmann% java -jar tika-app/target/tika-app-1.9-SNAPSHOT.jar -d tika-parsers/src/test/resources/test-documents/sampleFile.iso19139
text/iso19139+xml
[mattmann-0420740:~/tmp/tika1.9] mattmann%
{noformat}
h2. Parse Text
{noformat}
[mattmann-0420740:~/tmp/tika1.9] mattmann% java -jar tika-app/target/tika-app-1.9-SNAPSHOT.jar -t tika-parsers/src/test/resources/test-documents/sampleFile.iso19139
May 01, 2015 12:45:29 AM org.apache.sis.internal.jaxb.gml.TM_Primitive setTimePeriod
WARNING: This operation requires the “sis-temporal” module.
Barrow Atqasuk ARCSS Plant
CitedResponsiblePartyRole Role[POINT_OF_CONTACT]CitedResponsiblePartyName Robert Hollister
CitedResponsiblePartyRole Role[AUTHOR]CitedResponsiblePartyName Robert Hollister
IdentificationInfoAbstract These files contain data representing the periodic plant measures of species within each plot in a text tab delimited format. The data presented are seasonal growth of graminoids (length of leaf and length of inflorescence) and seasonal flowering of all species (number of inflorescences in flower within a plot), collected weekly during the summers of 2012-20XX for a subset of 30 grid plots at two sites (Barrow ARCSS grid and Atqasuk ARCSS grid).
GeographicElementWestBoundLatitude -157.24
GeographicElementEastBoundLatitude -156.4
GeographicElementNorthBoundLatitude 71.18
GeographicElementSouthBoundLatitude 70.27
[mattmann-0420740:~/tmp/tika1.9] mattmann%
{noformat}
h2. Parse Met
{noformat}
[mattmann-0420740:~/tmp/tika1.9] mattmann% java -jar tika-app/target/tika-app-1.9-SNAPSHOT.jar -m tika-parsers/src/test/resources/test-documents/sampleFile.iso19139
May 01, 2015 12:46:25 AM org.apache.sis.internal.jaxb.gml.TM_Primitive setTimePeriod
WARNING: This operation requires the “sis-temporal” module.
Content-Length: 19370
Content-Type: text/iso19139+xml
X-Parsed-By: org.apache.tika.parser.DefaultParser
X-Parsed-By: org.apache.tika.parser.geoinfo.GeographicInformationParser
resourceName: sampleFile.iso19139
[mattmann-0420740:~/tmp/tika1.9] mattmann%
{noformat}
Something is weird here, met not getting added. Going to commit and investigate.
[jira] [Commented] (TIKA-443) Geographic Information Parser
[ https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522869#comment-14522869 ] Chris A. Mattmann commented on TIKA-443: OK I also had to grab: https://github.com/gautham4/GeographicDR/commit/e04a7824ab3d9fb8517479007b545d7e8fcee704.patch since the tika-bundle stuff I helped you with wasn't part of your PR. Re-testing.
[jira] [Commented] (TIKA-443) Geographic Information Parser
[ https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522853#comment-14522853 ] Chris A. Mattmann commented on TIKA-443: OK thanks [~gautham4] going to test this out now.
[jira] [Commented] (TIKA-443) Geographic Information Parser
[ https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522848#comment-14522848 ] ASF GitHub Bot commented on TIKA-443:
GitHub user gautham4 opened a pull request: https://github.com/apache/tika/pull/47
PULL REQUEST for TIKA-443
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gautham4/tika TIKA-443
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/tika/pull/47.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #47
commit 6bfdbcd869455bbae7a4547b738e5a0b249053e8 Author: unknown Date: 2015-04-22T04:35:03Z fix for TIKA-443 contributed by gautham4
commit 66ba03ee85946d7babf9815b9734f0ee83b4767f Author: unknown Date: 2015-05-01T05:34:38Z fix for TIKA-443 contributed by gautham@gmail.com
[jira] [Commented] (TIKA-443) Geographic Information Parser
[ https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201379#comment-14201379 ] Martin Desruisseaux commented on TIKA-443:
For Tika to ISO 19115, I see these choices:
* Some core Tika classes could implement some {{org.opengis.metadata}} interfaces. For example, if there is a Tika class somewhere that contains the (latitude, longitude) coordinates of a rectangle, that class could implement the {{GeographicBoundingBox}} interface. All the {{org.opengis.metadata}} interfaces follow the ISO 19115 model, so this is not a purely arbitrary API.
* Alternatively, if Tika prefers not to modify its core classes, the data could be copied from the Tika class to a separate {{GeographicBoundingBox}} implementation just before marshalling. That separate implementation could be the SIS one, or another one if the Tika group prefers. However, using the SIS one would avoid another copy, since SIS will need to copy the data into its own implementation before marshalling anyway (because of the way JAXB works).
Once Tika has identified the information of interest ({{GeographicBoundingBox}}, maybe {{DataIdentification}}, etc.), those data need to be put together into an {{org.opengis.metadata.Metadata}} implementation, which is usually the root of the ISO 19115 hierarchy. Again, it can be either a core Tika class implementing {{Metadata}} or a separate implementation like the SIS one, at your choice. Once you have a {{Metadata}} instance, the easiest way to marshal it is with {{org.apache.sis.xml.XML}}. This convenience class provides several {{marshal}} methods, so you can pick the most convenient. An easy one for testing purposes is:
{code:java}
System.out.println(XML.marshal(metadata));
{code}
For the reverse operation (ISO 19115 to Tika), the starting point could be:
{code:java}
Metadata md = (Metadata) XML.unmarshal(inputStream);
{code}
but the next issue is how to use that {{Metadata}} information. Again I see two choices:
* Tika may copy the information into its own internal structure.
* Or alternatively, some Tika API may be designed to accept {{Metadata}}, {{GeographicBoundingBox}}, etc. arguments. Again, they are GeoAPI interfaces, so not necessarily SIS implementations. If Tika implemented those interfaces as a result of the above discussion, the modified API would work with Tika classes.
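As a rough illustration of the "copy the information into Tika's own internal structure" option, here is a minimal Java sketch. It makes several assumptions that are not in the thread: the local `BoundingBox` interface is only a stand-in that mirrors the four bound accessors of GeoAPI's `GeographicBoundingBox` (so the example compiles without GeoAPI or SIS on the classpath), a plain `Map` stands in for Tika's metadata object, and the class name and key strings are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Stand-in for org.opengis.metadata.extent.GeographicBoundingBox.
 * Only the four bound accessors are mirrored here (assumption for the sketch).
 */
interface BoundingBox {
    double getWestBoundLongitude();
    double getEastBoundLongitude();
    double getSouthBoundLatitude();
    double getNorthBoundLatitude();
}

/** Hypothetical helper: flattens a bounding box into simple key/value metadata. */
final class BoundingBoxCopier {
    private BoundingBoxCopier() {
    }

    /** Copies the four bounds into a flat map; the key names are illustrative only. */
    static Map<String, String> toFlatMetadata(BoundingBox box) {
        Map<String, String> out = new LinkedHashMap<>();
        out.put("geo:westBoundLongitude", Double.toString(box.getWestBoundLongitude()));
        out.put("geo:eastBoundLongitude", Double.toString(box.getEastBoundLongitude()));
        out.put("geo:southBoundLatitude", Double.toString(box.getSouthBoundLatitude()));
        out.put("geo:northBoundLatitude", Double.toString(box.getNorthBoundLatitude()));
        return out;
    }
}
```

With the real libraries, the same copy step would presumably take the `GeographicBoundingBox` found inside the unmarshalled `Metadata` instead of the stand-in; how that object is located in the ISO 19115 tree is not shown here.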
[jira] [Commented] (TIKA-443) Geographic Information Parser
[ https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199006#comment-14199006 ] Chris A. Mattmann commented on TIKA-443: Thanks Martin. I think a great use case here would be something like:
tika < geofile (e.g., ISO-19115) > Tika XHTML
tika -m < geofile (e.g., ISO-19115) > ISO-19115 metadata
Any thoughts on easy ways of accomplishing the above?
[jira] [Commented] (TIKA-443) Geographic Information Parser
[ https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198059#comment-14198059 ] Martin Desruisseaux commented on TIKA-443: -- A note just in case: Tika does not need to have a strong dependency on SIS if you prefer to avoid it. The ISO 19115 metadata are defined by interfaces in a separate JAR file ({{geoapi-3.0.0.jar}}), which is in turn implemented by SIS. But the Tika project could decide to implement a subset of those interfaces itself, the ones considered most pertinent to Tika's needs (e.g. {{GeographicBoundingBox}}, {{DataIdentification}}, etc.), which should allow Tika to switch between its own implementation and the SIS implementation transparently. For example, Tika could have basic geographic information support as a standalone application, and delegate to SIS only for more advanced needs if the user wishes. I'm just mentioning that as one possible strategy.
[jira] [Commented] (TIKA-443) Geographic Information Parser
[ https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14182462#comment-14182462 ] Chris A. Mattmann commented on TIKA-443: Guys, I wonder if we should (now 4 years later) standardize on Apache SIS (http://sis.apache.org/) and incorporate its support for parsing ISO 19115 metadata. It seems to have the same types of properties that FDO metadata XML has. I'm going to give it a whirl and create a GeoParser that extracts information from ISO 19115 XML files. [~desruisseaux] FYI [~adamestrada] FYI.
[jira] Commented: (TIKA-443) Geographic Information Parser
[ https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883822#action_12883822 ] Arturo Beltran commented on TIKA-443: - As I commented in the issue TIKA-445, after a few days off I found a pleasant surprise. Good job. Greetings and thanks for your work > Geographic Information Parser > - > > Key: TIKA-443 > URL: https://issues.apache.org/jira/browse/TIKA-443 > Project: Tika > Issue Type: New Feature > Components: parser >Reporter: Arturo Beltran > Attachments: getFDOMetadata.xml > > > I'm working in the automatic description of geospatial resources, and I think > that might be interesting to incorporate new parser/s to Tika in order to > manage and describe some geo-formats. These geo-formats include files, > services and databases. > If anyone is interested in this issue or want to collaborate do not hesitate > to contact me. Any help is welcome. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (TIKA-443) Geographic Information Parser
[ https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883196#action_12883196 ] Nick Burch commented on TIKA-443: - I've opened TIKA-445 and uploaded a first stab at a patch to implement it. Feedback appreciated!
[jira] Commented: (TIKA-443) Geographic Information Parser
[ https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883182#action_12883182 ] Chris A. Mattmann commented on TIKA-443: Hey Nick, Yep +1 on having the new namespace called "Geographic" with the given 2 fields as a starting point. We should probably track it and commit in a new issue. Thanks for your thoughts on this! Cheers, Chris
[jira] Commented: (TIKA-443) Geographic Information Parser
[ https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883171#action_12883171 ] Nick Burch commented on TIKA-443: - I was thinking that making sure you put in the right matching pairs, and remove them again, is a little fiddly, but that's nothing a little wrapper library wouldn't fix for you. With that in mind, I think your proposed solution is likely to be much better than changing Tika to support composite values, with the problems that would bring. Any objections to creating a new Metadata keyspace of Geographic, starting with LATITUDE = geo:latitude & LONGITUDE = geo:longitude? I can think of a few others we might want in future (height, bearing etc), which makes me think its own space might make sense.
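The "little wrapper" idea might look roughly like the Java sketch below. It is not Tika code: a plain `Map<String, List<String>>` stands in for Tika's multi-valued metadata object, the class name `GeoPointWriter` is invented for the example, and the key strings are simply the ones proposed in the comment above. The only point is that latitude and longitude are always added and removed together, so the two value lists cannot drift apart.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; a Map stands in for Tika's multi-valued Metadata. */
final class GeoPointWriter {
    // Key names as proposed in the comment above (assumed, not confirmed Tika constants).
    static final String LATITUDE = "geo:latitude";
    static final String LONGITUDE = "geo:longitude";

    private final Map<String, List<String>> metadata;

    GeoPointWriter(Map<String, List<String>> metadata) {
        this.metadata = metadata;
    }

    /** Adds one point; both keys grow by exactly one value. */
    void addPoint(double lat, double lon) {
        metadata.computeIfAbsent(LATITUDE, k -> new ArrayList<>()).add(Double.toString(lat));
        metadata.computeIfAbsent(LONGITUDE, k -> new ArrayList<>()).add(Double.toString(lon));
    }

    /** Removes the point at the given index from both lists; returns false if there is no such point. */
    boolean removePoint(int index) {
        List<String> lats = metadata.get(LATITUDE);
        List<String> lons = metadata.get(LONGITUDE);
        if (lats == null || lons == null || index < 0
                || index >= lats.size() || index >= lons.size()) {
            return false;
        }
        lats.remove(index); // int argument, so this removes by position
        lons.remove(index);
        return true;
    }

    /** Number of stored points (the length of the latitude list). */
    int size() {
        List<String> lats = metadata.get(LATITUDE);
        return lats == null ? 0 : lats.size();
    }
}
```

A caller would create the writer over an empty map, call `addPoint` once per location, and never touch the two keys directly.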
[jira] Commented: (TIKA-443) Geographic Information Parser
[ https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883159#action_12883159 ] Chris A. Mattmann commented on TIKA-443: Hey Nick, I think we need to support both cases (a single lat/lon per document as well as many lat/lon pairs per document). In the case of the former, it's easy, we have:
key: Metadata.LATITUDE val: some lat
key: Metadata.LONGITUDE val: some lon
And, in the case of the latter, we have:
key: Metadata.LATITUDE val: some lat, some lat2, some lat3, some lat n...
key: Metadata.LONGITUDE val: some lon, some lon2, some lon3, some lon n...
Because the keys are ordered in the Metadata object, I think that we can make sure they match up and treat single points the same as multiple points. It's great to have support for both on a per-Metadata-object basis too, since many scientific data formats have both scenarios in them (e.g., NetCDF and HDF typically have arrays of lats and lons and, sometimes, single point values as well). The reason we need to support both is that distance computation (point/radius, bounding box, and polygon) would require both scenarios to be supported. I've been thinking that, once this work is prototyped, we could integrate Tika with the work in SIS to build out a computational spatial library. I think Tika could be used to feed lats/lons into SIS. Cheers, Chris
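To make the "values match up by position" idea concrete, here is a small Java sketch under stated assumptions. The two `String[]` arguments stand for the ordered values stored under the latitude and longitude keys (in Tika they would presumably come from a multi-valued lookup on the metadata object); the class `GeoPairs`, its method names, and the simple bounding-box check are illustrative only and not part of any patch in this thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader for index-matched latitude/longitude values. */
final class GeoPairs {
    private GeoPairs() {
    }

    /** Pairs lats[i] with lons[i]; a single point is just the one-element case. */
    static List<double[]> pair(String[] lats, String[] lons) {
        if (lats.length != lons.length) {
            throw new IllegalArgumentException(
                    "latitude/longitude counts differ: " + lats.length + " vs " + lons.length);
        }
        List<double[]> points = new ArrayList<>(lats.length);
        for (int i = 0; i < lats.length; i++) {
            points.add(new double[] {
                Double.parseDouble(lats[i].trim()),
                Double.parseDouble(lons[i].trim())
            });
        }
        return points;
    }

    /**
     * Returns true if any point lies inside the box (bounds inclusive).
     * Simplification: boxes that cross the antimeridian are not handled.
     */
    static boolean anyInBox(List<double[]> points,
                            double south, double west, double north, double east) {
        for (double[] p : points) {
            if (p[0] >= south && p[0] <= north && p[1] >= west && p[1] <= east) {
                return true;
            }
        }
        return false;
    }
}
```

Because a single point and a list of points go through the same `pair` call, downstream computations such as the bounding-box check do not need to distinguish the two scenarios.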
[jira] Commented: (TIKA-443) Geographic Information Parser
[ https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883153#action_12883153 ] Nick Burch commented on TIKA-443: - I was wondering about extracting geo data from JPEG EXIF tags. For this, we'd probably want dedicated metadata properties for lat and long. (Other files can have a single lat+long in them too, e.g. HTML pages with the ICBM meta tags.) I'm not sure how well that might integrate with this work though, since shapefiles will typically contain a large number of lats+longs (or similar geographic points). Anyone have any ideas about a single created-at position vs a stream of locations from geo formats?
[jira] Commented: (TIKA-443) Geographic Information Parser
[ https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881146#action_12881146 ] Arturo Beltran commented on TIKA-443: I'm not convinced about using OGDI. From what I understand from reading the documentation, OGDI offers an API in C, so we would run into the same problem integrating it with Java. In addition, the project has not been updated since 2008, so newer geographic formats (e.g., KML) are not supported. Also, I think OGDI does not support databases or services. However, you could do a proof of concept to see how difficult it would be to integrate with Java and exactly what metadata can be extracted using OGDI. Then we can compare those results with mine and decide. As you can see, I've attached a sample XML file (getFDOMetadata.xml) that contains the information extracted from a SHP file by my proof-of-concept server based on FDO. This is the result of a simple HTTP call (http://localhost:12345/getFDOMetadata?source=C:\ExampleData\shp_world_countries\country.shp&provider=SHP). For now, I'll keep trying to get my "Hello world" Tika parser running. Regards, Arturo
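For anyone wanting to reproduce the HTTP call from Arturo's comment in Java, here is a minimal sketch of building the request URL. The endpoint path and the `source` and `provider` parameter names are taken from the URL quoted above; the class and method names (`FdoMetadataRequest`, `buildUrl`) are hypothetical. The sketch percent-encodes the Windows path, and whether the proof-of-concept .NET server expects an encoded path rather than the raw one shown in the comment is an assumption.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class FdoMetadataRequest {
    // Builds the query URL for the proof-of-concept metadata server.
    // Backslashes and the colon in a Windows path are not safe in a query
    // string, so both parameter values are percent-encoded.
    static String buildUrl(String baseUrl, String source, String provider) {
        return baseUrl + "/getFDOMetadata"
                + "?source=" + URLEncoder.encode(source, StandardCharsets.UTF_8)
                + "&provider=" + URLEncoder.encode(provider, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String url = buildUrl(
                "http://localhost:12345",
                "C:\\ExampleData\\shp_world_countries\\country.shp",
                "SHP");
        System.out.println(url);
    }
}
```

A dummy Tika parser could then fetch this URL and copy the fields of the returned XML into its metadata, but that step depends on the response format in getFDOMetadata.xml and is not sketched here.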
[jira] Commented: (TIKA-443) Geographic Information Parser
[ https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880921#action_12880921 ] Mayank Singh commented on TIKA-443: Arturo, I am not very comfortable with C++ and have no knowledge of the .NET platform (I'm a Java guy), so my help will be very limited if you plan on using FDO. However, I was looking around for alternatives and found OGDI (http://ogdi.sourceforge.net/), which can act as a middle layer between various data sources and has almost the same capabilities as FDO for data dissemination over the network (more info here: http://www.gisdevelopment.net/technology/gis/techgi0057b.htm). So what I am suggesting is that we look into it; once we get the heterogeneous data into the uniform data structure that OGDI supports, we can use Java to integrate it with Tika. I'll keep searching for more info. Do tell me your views on this. Regards, Mayank
[jira] Commented: (TIKA-443) Geographic Information Parser
[ https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880844#action_12880844 ] Arturo Beltran commented on TIKA-443: You are right, Chris. From now on, I will try to keep the discussions on the list or here. I will try to explain briefly what exactly I'm working on so that you can get involved. The first piece is what allows us to access resources: we need a platform that gives the most homogeneous access possible to heterogeneous resources. The best approach I've found is FDO (http://fdo.osgeo.org/). In short, FDO is an API for manipulating, defining and analyzing geospatial information regardless of where it is stored. So it looks simple: I only have to integrate FDO as a Tika parser and I'm done. The problem appeared when trying to connect this C++ API with Java. I have worked with SWIG and directly with JNI, but I have not gotten it to work. Finally, as a temporary measure and to serve as a proof of concept, I implemented a simple HTTP server in .NET that offers resource descriptions using FDO. And now I'm trying to create a dummy parser for Tika that makes calls to that server. I hope I explained this well enough to follow; otherwise, feel free to ask again. Greetings and thanks for your interest: Arturo
[jira] Commented: (TIKA-443) Geographic Information Parser
[ https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880836#action_12880836 ] Chris A. Mattmann commented on TIKA-443: Hi Guys, Thanks for the effort here. Please try hard to keep the discussions on list as the community will benefit from them and can help provide feedback incrementally. Thanks, Chris
[jira] Commented: (TIKA-443) Geographic Information Parser
[ https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880762#action_12880762 ] Arturo Beltran commented on TIKA-443: Hi all, I am pleased by the interest the community has shown in my proposal. As I said, any help is welcome. I have sent Mayank all the details about my work on this issue. If anyone else is interested in collaborating, or simply wants to share ideas/comments, do not hesitate to contact me. Cheers, Arturo
[jira] Commented: (TIKA-443) Geographic Information Parser
[ https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880722#action_12880722 ] Mayank Singh commented on TIKA-443: Hi Arturo, I would like to collaborate on this issue. I have also sent you a mail regarding the same. Thanks and regards, Mayank
[jira] Commented: (TIKA-443) Geographic Information Parser
[ https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879788#action_12879788 ] Chris A. Mattmann commented on TIKA-443: Hi Arturo, Thanks for reporting this issue and it sounds awesome! I'm definitely interested in this topic and will be sure to help however I can. Cheers, Chris