[jira] [Commented] (TIKA-443) Geographic Information Parser

2015-05-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522953#comment-14522953
 ] 

Hudson commented on TIKA-443:
-

SUCCESS: Integrated in tika-trunk-jdk1.7 #657 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/657/])
Fix for TIKA-443 Geographic Information Parser contributed by unknown 
 this closes #47. (mattmann: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1677100)
* /tika/trunk/CHANGES.txt
* /tika/trunk/tika-bundle/pom.xml
* 
/tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
* /tika/trunk/tika-parsers/pom.xml
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/geoinfo
* 
/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/geoinfo/GeographicInformationParser.java
* 
/tika/trunk/tika-parsers/src/main/resources/META-INF/services/org.apache.tika.parser.Parser
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/geoinfo
* 
/tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/geoinfo/GeographicInformationParserTest.java
* /tika/trunk/tika-parsers/src/test/resources/test-documents/sampleFile.iso19139


> Geographic Information Parser
> -
>
> Key: TIKA-443
> URL: https://issues.apache.org/jira/browse/TIKA-443
> Project: Tika
> Issue Type: New Feature
> Components: parser
> Reporter: Arturo Beltran
> Assignee: Chris A. Mattmann
> Labels: memex, new-parser
> Fix For: 1.9
>
> Attachments: getFDOMetadata.xml
>
>
> I'm working on the automatic description of geospatial resources, and I think
> it might be interesting to incorporate new parser/s into Tika in order to
> manage and describe some geo-formats. These geo-formats include files,
> services and databases.
> If anyone is interested in this issue or wants to collaborate, do not hesitate
> to contact me. Any help is welcome.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TIKA-443) Geographic Information Parser

2015-05-01 Thread Gautham Gowrishankar (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522911#comment-14522911
 ] 

Gautham Gowrishankar commented on TIKA-443:
---

You're welcome, Professor Mattmann :)



[jira] [Commented] (TIKA-443) Geographic Information Parser

2015-05-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522907#comment-14522907
 ] 

ASF GitHub Bot commented on TIKA-443:
-

Github user asfgit closed the pull request at:

https://github.com/apache/tika/pull/47




[jira] [Commented] (TIKA-443) Geographic Information Parser

2015-05-01 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522905#comment-14522905
 ] 

Chris A. Mattmann commented on TIKA-443:


I fixed it by moving the extractContent call so that it runs after the
metadata extraction.

{noformat}
[mattmann-0420740:~/tmp/tika1.9] mattmann% java -jar 
tika-app/target/tika-app-1.9-SNAPSHOT.jar -m 
tika-parsers/src/test/resources/test-documents/sampleFile.iso19139
May 01, 2015 12:57:16 AM org.apache.sis.internal.jaxb.gml.TM_Primitive 
setTimePeriod
WARNING: This operation requires the “sis-temporal” module.
AccessContraints : OTHER_RESTRICTIONS
CharacterSet: UTF-8
CitationDate : CREATION-->Mon Dec 16 00:00:00 PST 2013
CitationDate : modified-->Wed Mar 11 00:00:00 PDT 2015
CitedResponsiblePartyEMail : holli...@gvsu.edu
CitedResponsiblePartyName : Robert Hollister
CitedResponsiblePartyName : Robert Hollister
CitedResponsiblePartyRole : Role[POINT_OF_CONTACT]
CitedResponsiblePartyRole : Role[AUTHOR]
ContactPartyName-: UCAR/NCAR - CISL - ACADIS
ContactRole: RESOURCE_PROVIDER
Content-Length: 19370
Content-Type: text/iso19139+xml
DateInfo : CREATION Mon Dec 16 05:26:08 PST 2013
DistributionFormatSpecificationAlternativeTitle : Other ASCII
Distributor Contact : RESOURCE_PROVIDER
Distributor Organization Name : UCAR/NCAR - CISL - ACADIS
GeographicIdentifierAuthorityAlternativeTitle : Locations
GeographicIdentifierAuthorityDate : REVISION Thu Aug 28 00:00:00 PDT 2014
GeographicIdentifierAuthorityTitle : NASA/GCMD Earth Science Keywords
GeographicIdentifierCode : UNITED STATES OF AMERICA > ALASKA
IdentificationInfoAbstract : These files contain data representing the periodic 
plant measures of species within each plot in a text tab delimited format. The 
data presented are seasonal growth of graminoids (length of leaf and length of 
inflorescence) and seasonal flowering of all species (number of inflorescences 
in flower within a plot), collected weekly during the summers of 2012-20XX for 
a subset of 30 grid plots at two sites (Barrow ARCSS grid and Atqasuk ARCSS 
grid).
IdentificationInfoCitationTitle : Barrow Atqasuk ARCSS Plant
IdentificationInfoLanguage-->: English
IdentificationInfoStatus : ON_GOING
IdentificationInfoTopicCategory-->: BIOTA
Keywords 2: EARTH SCIENCE > BIOSPHERE > TERRESTRIAL ECOSYSTEMS > ALPINE/TUNDRA
Keywords 3: FIELD SURVEY
Keywords 4: POINT
Keywords 5: LESS THAN 1 METER
Keywords 6: DAILY TO WEEKLY
KeywordsType 2: THEME
KeywordsType 3: THEME
KeywordsType 4: THEME
KeywordsType 5: THEME
KeywordsType 6: THEME
MetaDataIdentifierCode: 
urn:x-wmo:md:org.aoncadis.www::4c1a919d-6690-11e3-9147-00c0f03d5b7c
MetaDataResourceScope : DATASET
MetaDataStandardEdition : ISO 19115:2003(E)
MetaDataStandardTitle : ISO 19115 Geographic information - Metadata
OtherConstraints : Access Constraints: No Access Constraints. Use Constraints: 
No Use Constraints.
ParentMetaDataTitle: 
urn:x-wmo:md:org.aoncadis.www::d2e4e808-6830-11df-abb3-00c0f03d5b7c
ResourceFormatSpecificationAlternativeTitle : Other ASCII
ThesaurusNameAlternativeTitle 2: [Science and Services Keywords]
ThesaurusNameAlternativeTitle 3: [Platforms]
ThesaurusNameAlternativeTitle 4: [Spatial Data Type]
ThesaurusNameAlternativeTitle 5: [Horizontal Data Resolution]
ThesaurusNameAlternativeTitle 6: [Temporal Data Resolution]
ThesaurusNameDate : REVISION-->Wed May 21 00:00:00 PDT 2014
ThesaurusNameDate : REVISION-->Tue Oct 07 00:00:00 PDT 2014
ThesaurusNameDate : REVISION-->Tue Oct 07 00:00:00 PDT 2014
ThesaurusNameDate : REVISION-->Wed May 21 00:00:00 PDT 2014
ThesaurusNameDate : REVISION-->Wed May 21 00:00:00 PDT 2014
ThesaurusNameTitle 2: NASA/GCMD Earth Science Keywords
ThesaurusNameTitle 3: ACADIS Keywords
ThesaurusNameTitle 4: ACADIS Keywords
ThesaurusNameTitle 5: NASA/GCMD Earth Science Keywords
ThesaurusNameTitle 6: NASA/GCMD Earth Science Keywords
TransferOptionsOnlineDescription : Metadata Link
TransferOptionsOnlineFunction : DOWNLOAD
TransferOptionsOnlineLinkage : 
https://www.aoncadis.org/dataset/id/4c1a919d-6690-11e3-9147-00c0f03d5b7c.html
TransferOptionsOnlineName : Barrow Atqasuk ARCSS Plant
TransferOptionsOnlineProfile : browser
TransferOptionsOnlineProtocol : https
UserConstraints : OTHER_RESTRICTIONS
X-Parsed-By: org.apache.tika.parser.DefaultParser
X-Parsed-By: org.apache.tika.parser.geoinfo.GeographicInformationParser
resourceName: sampleFile.iso19139
[mattmann-0420740:~/tmp/tika1.9] mattmann% 
{noformat}

Works great!
Committing. 
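
The comment does not say why the order mattered. One possible reading, offered purely as an assumption, is that the consumer reads the metadata at the moment the content events end, so anything the parser adds afterwards is never seen. The Java sketch below illustrates only that ordering effect; `SnapshotSink`, `OrderingDemo`, and the method bodies are hypothetical and are not Tika's classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sink that snapshots the metadata when the content stream ends. */
final class SnapshotSink {
    final List<String> snapshot = new ArrayList<>();

    void endDocument(Map<String, String> metadata) {
        metadata.forEach((k, v) -> snapshot.add(k + ": " + v));
    }
}

/** Hypothetical parser skeleton showing the two possible call orders. */
final class OrderingDemo {
    static void extractMetadata(Map<String, String> metadata) {
        metadata.put("IdentificationInfoStatus", "ON_GOING");
    }

    static void extractContent(Map<String, String> metadata, SnapshotSink sink) {
        // Content events would be emitted here; the stream is then closed.
        sink.endDocument(metadata);
    }

    static List<String> parse(boolean metadataFirst) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("Content-Type", "text/iso19139+xml");
        SnapshotSink sink = new SnapshotSink();
        if (metadataFirst) {
            extractMetadata(metadata);
            extractContent(metadata, sink);
        } else {
            extractContent(metadata, sink);
            extractMetadata(metadata); // too late: the sink has already taken its snapshot
        }
        return sink.snapshot;
    }

    public static void main(String[] args) {
        System.out.println("content first:  " + parse(false));
        System.out.println("metadata first: " + parse(true));
    }
}
```

With content extraction first, the snapshot holds only the entry that existed before parsing; with metadata extraction first, it also holds the extracted field.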


[jira] [Commented] (TIKA-443) Geographic Information Parser

2015-05-01 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522896#comment-14522896
 ] 

Chris A. Mattmann commented on TIKA-443:


OK after combining with my patch, we have success!

{noformat}
[INFO] Skipping execution for packaging "pom"
[INFO] 
[INFO] --- forbiddenapis:1.7:testCheck (default) @ tika ---
[INFO] Skipping execution for packaging "pom"
[INFO] 
[INFO] --- maven-site-plugin:3.0:attach-descriptor (attach-descriptor) @ tika 
---
[INFO] 
[INFO] --- maven-install-plugin:2.3.1:install (default-install) @ tika ---
[INFO] Installing /Users/mattmann/tmp/tika1.9/pom.xml to 
/Users/mattmann/.m2/repository/org/apache/tika/tika/1.9-SNAPSHOT/tika-1.9-SNAPSHOT.pom
[INFO] 
[INFO] Reactor Summary:
[INFO] 
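[INFO] Apache Tika parent ................................ SUCCESS [  1.500 s]
[INFO] Apache Tika core .................................. SUCCESS [ 19.538 s]
[INFO] Apache Tika parsers ............................... SUCCESS [02:17 min]
[INFO] Apache Tika XMP ................................... SUCCESS [  2.995 s]
[INFO] Apache Tika serialization ......................... SUCCESS [  2.224 s]
[INFO] Apache Tika batch ................................. SUCCESS [01:58 min]
[INFO] Apache Tika application ........................... SUCCESS [ 40.534 s]
[INFO] Apache Tika OSGi bundle ........................... SUCCESS [ 22.864 s]
[INFO] Apache Tika server ................................ SUCCESS [ 21.619 s]
[INFO] Apache Tika translate ............................. SUCCESS [  3.870 s]
[INFO] Apache Tika examples .............................. SUCCESS [  5.872 s]
[INFO] Apache Tika Java-7 Components ..................... SUCCESS [  2.427 s]
[INFO] Apache Tika ....................................... SUCCESS [  0.037 s]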
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 06:20 min
[INFO] Finished at: 2015-05-01T00:39:38-07:00
[INFO] Final Memory: 109M/1658M
[INFO] 
[mattmann-0420740:~/tmp/tika1.9] mattmann% 
{noformat}

Ran a simple test too:

h2. Detect

{noformat}
[mattmann-0420740:~/tmp/tika1.9] mattmann% java -jar 
tika-app/target/tika-app-1.9-SNAPSHOT.jar -d 
tika-parsers/src/test/resources/test-documents/sampleFile.iso19139
text/iso19139+xml
[mattmann-0420740:~/tmp/tika1.9] mattmann% 
{noformat}

h2. Parse Text
{noformat}
[mattmann-0420740:~/tmp/tika1.9] mattmann% java -jar 
tika-app/target/tika-app-1.9-SNAPSHOT.jar -t 
tika-parsers/src/test/resources/test-documents/sampleFile.iso19139
May 01, 2015 12:45:29 AM org.apache.sis.internal.jaxb.gml.TM_Primitive 
setTimePeriod
WARNING: This operation requires the “sis-temporal” module.
Barrow Atqasuk ARCSS Plant


CitedResponsiblePartyRole Role[POINT_OF_CONTACT]CitedResponsiblePartyName 
Robert Hollister


CitedResponsiblePartyRole Role[AUTHOR]CitedResponsiblePartyName Robert Hollister


IdentificationInfoAbstract These files contain data representing the periodic 
plant measures of species within each plot in a text tab delimited format. The 
data presented are seasonal growth of graminoids (length of leaf and length of 
inflorescence) and seasonal flowering of all species (number of inflorescences 
in flower within a plot), collected weekly during the summers of 2012-20XX for 
a subset of 30 grid plots at two sites (Barrow ARCSS grid and Atqasuk ARCSS 
grid).

GeographicElementWestBoundLatitude  -157.24
GeographicElementEastBoundLatitude  -156.4
GeographicElementNorthBoundLatitude 71.18
GeographicElementSouthBoundLatitude 70.27

[mattmann-0420740:~/tmp/tika1.9] mattmann% 
{noformat}

h2. Parse Met
{noformat}
[mattmann-0420740:~/tmp/tika1.9] mattmann% java -jar 
tika-app/target/tika-app-1.9-SNAPSHOT.jar -m 
tika-parsers/src/test/resources/test-documents/sampleFile.iso19139
May 01, 2015 12:46:25 AM org.apache.sis.internal.jaxb.gml.TM_Primitive 
setTimePeriod
WARNING: This operation requires the “sis-temporal” module.
Content-Length: 19370
Content-Type: text/iso19139+xml
X-Parsed-By: org.apache.tika.parser.DefaultParser
X-Parsed-By: org.apache.tika.parser.geoinfo.GeographicInformationParser
resourceName: sampleFile.iso19139
[mattmann-0420740:~/tmp/tika1.9] mattmann% 
{noformat}

Something is weird here: the metadata is not getting added. Going to commit and investigate.



[jira] [Commented] (TIKA-443) Geographic Information Parser

2015-05-01 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522869#comment-14522869
 ] 

Chris A. Mattmann commented on TIKA-443:


OK I also had to grab: 
https://github.com/gautham4/GeographicDR/commit/e04a7824ab3d9fb8517479007b545d7e8fcee704.patch
 since the tika-bundle stuff I helped you with wasn't part of your PR. 
Re-testing.



[jira] [Commented] (TIKA-443) Geographic Information Parser

2015-05-01 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522853#comment-14522853
 ] 

Chris A. Mattmann commented on TIKA-443:


OK thanks [~gautham4] going to test this out now.



[jira] [Commented] (TIKA-443) Geographic Information Parser

2015-04-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522848#comment-14522848
 ] 

ASF GitHub Bot commented on TIKA-443:
-

GitHub user gautham4 opened a pull request:

https://github.com/apache/tika/pull/47

PULL REQUEST for TIKA-443



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gautham4/tika TIKA-443

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/tika/pull/47.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #47


commit 6bfdbcd869455bbae7a4547b738e5a0b249053e8
Author: unknown 
Date:   2015-04-22T04:35:03Z

fix for TIKA-443 contributed by gautham4

commit 66ba03ee85946d7babf9815b9734f0ee83b4767f
Author: unknown 
Date:   2015-05-01T05:34:38Z

fix for TIKA-443 contributed by gautham@gmail.com






[jira] [Commented] (TIKA-443) Geographic Information Parser

2014-11-06 Thread Martin Desruisseaux (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201379#comment-14201379
 ] 

Martin Desruisseaux commented on TIKA-443:
--

For Tika to ISO 19115, I see these choices:

* Some core Tika classes could implement some {{org.opengis.metadata}}
interfaces. For example, if there is a Tika class somewhere that contains the
(latitude, longitude) coordinates of a rectangle, that class could implement
the {{GeographicBoundingBox}} interface. All the {{org.opengis.metadata}}
interfaces follow the ISO 19115 model, so this is not a purely arbitrary API.
* Alternatively, if Tika prefers not to modify its core classes, the data
could be copied from the Tika class to a separate {{GeographicBoundingBox}}
implementation just before marshalling. That separate implementation could be
the SIS one, or another one if the Tika group prefers. However, using the SIS
one would avoid another copy, since SIS needs to copy the data into its own
implementation before marshalling anyway (because of the way JAXB works).

Once Tika has identified the information of interest
({{GeographicBoundingBox}}, maybe {{DataIdentification}}, etc.), those data
need to be put together into an {{org.opengis.metadata.Metadata}}
implementation, which is usually the root of the ISO 19115 hierarchy. Again, it
can be either a core Tika class implementing {{Metadata}}, or a separate
implementation like the SIS one, at your choice.

Once you have a {{Metadata}} instance, the easiest way to marshal it is using
{{org.apache.sis.xml.XML}}. This convenience class provides several {{marshal}}
methods, so you can pick the most convenient. An easy one for testing purposes
is:

{code:java}
System.out.println(XML.marshal(metadata));
{code}

For the reverse operation (ISO 19115 to Tika), the starting point could be:

{code:java}
Metadata md = (Metadata) XML.unmarshal(inputStream);
{code}

but the next issue is how to use that {{Metadata}} information. Again, I see
two choices:

* Tika may copy the information into its own internal structure.
* Alternatively, some Tika API may be designed to accept {{Metadata}},
{{GeographicBoundingBox}}, etc. arguments. Again, these are GeoAPI interfaces,
so not necessarily SIS implementations. If Tika implemented those interfaces as
a result of the discussion above, the modified API would work with Tika classes.
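
The first option in each list (a Tika class implementing the interface directly, and an API written against the interface) can be sketched as follows. This is a minimal illustration under assumptions: the `GeographicBoundingBox` interface here is a local stand-in declared so the sketch compiles without GeoAPI on the classpath, mirroring only the four coordinate accessors; `TikaBoundingBox` and `BoundingBoxDemo` are hypothetical names, and the coordinates are taken from the sample output elsewhere in this thread.

```java
/** Local stand-in for org.opengis.metadata.extent.GeographicBoundingBox (assumption: only these accessors are mirrored). */
interface GeographicBoundingBox {
    double getWestBoundLongitude();
    double getEastBoundLongitude();
    double getSouthBoundLatitude();
    double getNorthBoundLatitude();
}

/** Hypothetical Tika-side value class; implementing the interface avoids a copy into another implementation. */
final class TikaBoundingBox implements GeographicBoundingBox {
    private final double west;
    private final double east;
    private final double south;
    private final double north;

    TikaBoundingBox(double west, double east, double south, double north) {
        if (south > north) {
            throw new IllegalArgumentException("south bound exceeds north bound");
        }
        this.west = west;
        this.east = east;
        this.south = south;
        this.north = north;
    }

    @Override public double getWestBoundLongitude() { return west; }
    @Override public double getEastBoundLongitude() { return east; }
    @Override public double getSouthBoundLatitude() { return south; }
    @Override public double getNorthBoundLatitude() { return north; }
}

final class BoundingBoxDemo {
    /** Written against the interface only, so any implementation can be passed in. */
    static String describe(GeographicBoundingBox box) {
        return "W=" + box.getWestBoundLongitude()
             + " E=" + box.getEastBoundLongitude()
             + " S=" + box.getSouthBoundLatitude()
             + " N=" + box.getNorthBoundLatitude();
    }

    public static void main(String[] args) {
        System.out.println(describe(new TikaBoundingBox(-157.24, -156.4, 70.27, 71.18)));
    }
}
```

Because `describe` depends only on the interface, swapping the hypothetical Tika class for another implementation would not require changing the caller.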




[jira] [Commented] (TIKA-443) Geographic Information Parser

2014-11-05 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199006#comment-14199006
 ] 

Chris A. Mattmann commented on TIKA-443:


Thanks, Martin. I think a great use case here would be something like:

tika < geofile (e.g., ISO-19115) > Tika XHTML
tika -m < geofile (e.g., ISO-19115) > ISO-19115 metadata

Any thoughts on easy ways of accomplishing the above?



[jira] [Commented] (TIKA-443) Geographic Information Parser

2014-11-05 Thread Martin Desruisseaux (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198059#comment-14198059
 ] 

Martin Desruisseaux commented on TIKA-443:
--

A note just in case: Tika does not need to have a strong dependency on SIS if
you prefer to avoid it. The ISO 19115 metadata are defined by interfaces in a
separate JAR file ({{geoapi-3.0.0.jar}}), which is in turn implemented by
SIS. But the Tika project could decide to implement for itself the subset of
the interfaces considered most pertinent to Tika's needs (e.g.
{{GeographicBoundingBox}}, {{DataIdentification}}, etc.), which should allow
Tika to switch between its own implementation and the SIS implementation
transparently. For example, Tika could have basic geographic information
support as a standalone application, and delegate to SIS only for more advanced
needs if the user wishes.

I'm just mentioning that as one possible strategy.




[jira] [Commented] (TIKA-443) Geographic Information Parser

2014-10-23 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14182462#comment-14182462
 ] 

Chris A. Mattmann commented on TIKA-443:


Guys, I wonder if we should (now 4 years later) standardize on Apache SIS 
(http://sis.apache.org/) and incorporate its support for parsing ISO19115 
metadata. It seems to have the same types of properties that FDO metadata XML 
has. 

I'm going to take a stab at creating a GeoParser that extracts information
from ISO 19115 XML files. [~desruisseaux] FYI [~adamestrada] FYI.



[jira] Commented: (TIKA-443) Geographic Information Parser

2010-06-30 Thread Arturo Beltran (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883822#action_12883822
 ] 

Arturo Beltran commented on TIKA-443:
-

As I commented in the issue TIKA-445, after a few days off I found a pleasant 
surprise. Good job. 
 
Greetings and thanks for your work

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (TIKA-443) Geographic Information Parser

2010-06-28 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883196#action_12883196
 ] 

Nick Burch commented on TIKA-443:
-

I've opened TIKA-445 and uploaded a first stab at a patch to implement it. 
Feedback appreciated!




[jira] Commented: (TIKA-443) Geographic Information Parser

2010-06-28 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883182#action_12883182
 ] 

Chris A. Mattmann commented on TIKA-443:


Hey Nick,

Yep +1 on having the new namespace called "Geographic" with the given 2 fields 
as a starting point. We should probably track it and commit in a new issue. 

Thanks for your thoughts on this!

Cheers,
Chris





[jira] Commented: (TIKA-443) Geographic Information Parser

2010-06-28 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883171#action_12883171
 ] 

Nick Burch commented on TIKA-443:
-

I was thinking that making sure you put in the right matching pairs, and remove
them again, is a little fiddly, but that's nothing that a little wrapper library
wouldn't fix for you. With that in mind, I think your proposed solution is
likely to be much better than changing Tika to support composite values, with
the problems that that would bring.

Any objections to creating a new Metadata keyspace, Geographic, starting
with LATITUDE = geo:latitude and LONGITUDE = geo:longitude? I can think of a few
others we might want in future (height, bearing, etc.), which makes me think its
own space might make sense.
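
The "little wrapper library" idea could look roughly like the Java sketch below. It is only an illustration: the `Geographic` constants follow the key names proposed above, while `GeoPoints` and its methods are hypothetical and not part of Tika. The point is that a latitude is never added or removed without its matching longitude.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Key names as proposed in the comment; the interface itself is a hypothetical placement. */
interface Geographic {
    String LATITUDE = "geo:latitude";
    String LONGITUDE = "geo:longitude";
}

/** Hypothetical wrapper that keeps latitude and longitude values in matching positions. */
final class GeoPoints {
    private final List<String> latitudes = new ArrayList<>();
    private final List<String> longitudes = new ArrayList<>();

    /** Adds both halves of a point, or neither if a coordinate is invalid. */
    void addPoint(double lat, double lon) {
        if (Double.isNaN(lat) || Double.isNaN(lon)
                || lat < -90 || lat > 90 || lon < -180 || lon > 180) {
            throw new IllegalArgumentException("coordinate out of range");
        }
        latitudes.add(Double.toString(lat));
        longitudes.add(Double.toString(lon));
    }

    /** Removes both halves of the point at the given position. */
    void removePoint(int index) {
        latitudes.remove(index);
        longitudes.remove(index);
    }

    int size() {
        return latitudes.size();
    }

    /** Read-only view of the values stored under one of the Geographic keys. */
    List<String> values(String key) {
        if (Geographic.LATITUDE.equals(key)) {
            return Collections.unmodifiableList(latitudes);
        }
        if (Geographic.LONGITUDE.equals(key)) {
            return Collections.unmodifiableList(longitudes);
        }
        return Collections.emptyList();
    }
}
```

Further keys such as height or bearing would need the same paired handling, which is one argument for giving them a keyspace of their own.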




[jira] Commented: (TIKA-443) Geographic Information Parser

2010-06-28 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883159#action_12883159
 ] 

Chris A. Mattmann commented on TIKA-443:


Hey Nick,

I think we need to support both cases (single lat/lon per document as well as 
many lat/lon pairs per document). In the case of the former, it's easy, we have:

key: Metadata.LATITUDE
val:  some lat

key: Metadata.LONGITUDE
val:  some lon

And, in the case of the latter, we have:

key: Metadata.LATITUDE
val:  some lat, some lat2, some lat3, some lat n...

key: Metadata.LONGITUDE
val:  some lon, some lon2, some lon3, some lon n...

Because the values under each key keep their order in the Metadata object, I
think that we can make sure they match up and treat single points the same as
multiple points. It's great to have support for both on a per Metadata object
basis too, since many scientific data formats have both scenarios in them (e.g.,
NetCDF and HDF typically have arrays of lats and lons, and sometimes single
point values as well).
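
The pairing described above can be sketched in a few lines of Java. This is only an illustration under assumptions: `MultiValuedMetadata` is a hypothetical stand-in (not Tika's `Metadata` class) and the key strings are placeholders. It relies on each key keeping its values in insertion order, so the i-th latitude lines up with the i-th longitude.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a multi-valued metadata object. */
final class MultiValuedMetadata {
    static final String LATITUDE = "geo:latitude";   // placeholder key name
    static final String LONGITUDE = "geo:longitude"; // placeholder key name

    private final Map<String, List<String>> values = new LinkedHashMap<>();

    /** Appends a value under the key, preserving insertion order. */
    void add(String key, String value) {
        values.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    /** Returns all values for the key in insertion order (empty if absent). */
    List<String> getValues(String key) {
        return values.getOrDefault(key, new ArrayList<>());
    }

    /** Pairs the i-th latitude with the i-th longitude; a single point is the n = 1 case. */
    List<double[]> points() {
        List<String> lats = getValues(LATITUDE);
        List<String> lons = getValues(LONGITUDE);
        if (lats.size() != lons.size()) {
            throw new IllegalStateException("latitude/longitude counts differ");
        }
        List<double[]> result = new ArrayList<>();
        for (int i = 0; i < lats.size(); i++) {
            result.add(new double[] {
                Double.parseDouble(lats.get(i)),
                Double.parseDouble(lons.get(i))
            });
        }
        return result;
    }
}
```

A document with one position and a document with thousands go through the same `points()` call; only the list length differs.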

The reason we need to support both is that distance computation (point/radius,
bounding box, and polygon) would require both scenarios to be supported. I've
been thinking that, once this work is prototyped, we could integrate Tika with
the work in SIS to build out a computational spatial library. I think Tika
could be used to feed lats/lons into SIS.
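
As a rough illustration of the point/radius case, here is a Java sketch using the haversine formula on a spherical Earth. It is an assumption-laden example, not SIS or Tika code: `GeoDistance` and its methods are hypothetical, and a real spatial library would handle ellipsoids, bounding boxes, and polygons as well.

```java
/** Hypothetical great-circle helpers for extracted (lat, lon) points. */
final class GeoDistance {
    /** Mean Earth radius in kilometres (spherical approximation). */
    static final double EARTH_RADIUS_KM = 6371.0;

    /** Haversine distance in kilometres between two points given in degrees. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** Point/radius test: true if any point lies within radiusKm of the centre. */
    static boolean anyWithinRadius(double[][] points, double centreLat,
                                   double centreLon, double radiusKm) {
        for (double[] p : points) {
            if (haversineKm(p[0], p[1], centreLat, centreLon) <= radiusKm) {
                return true;
            }
        }
        return false;
    }
}
```

The `points` argument can hold one pair or many, which is why both the single-point and the multi-point metadata layouts need to be readable in the same way.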

Cheers,
Chris





[jira] Commented: (TIKA-443) Geographic Information Parser

2010-06-28 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883153#action_12883153
 ] 

Nick Burch commented on TIKA-443:
-

I was wondering about extracting geo data from jpeg exif tags. For this, we'd 
probably want dedicated metadata properties for lat and long

(Other files can have a single lat+long in them too, eg html pages with the 
icbm meta tags)

Not sure how well that might integrate with this work though, since shapefiles 
will typically contain a large number of lats+longs (or similar geographic points).

Anyone have any ideas about a single created-at position vs stream of locations 
from geo formats?
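
For the single-position case, EXIF stores GPS latitude and longitude as degrees, minutes and seconds plus a hemisphere reference (N/S or E/W), so a dedicated lat/long property would need a decimal conversion along these lines. This is only a sketch: the DmsToDecimal class is a hypothetical helper, and reading the tags out of the JPEG is left out entirely:

```java
class DmsToDecimal {
    /**
     * Converts degrees/minutes/seconds plus a hemisphere reference
     * ('N', 'S', 'E' or 'W') into signed decimal degrees.
     */
    static double toDecimalDegrees(double degrees, double minutes,
                                   double seconds, char ref) {
        double value = degrees + minutes / 60.0 + seconds / 3600.0;
        switch (Character.toUpperCase(ref)) {
            case 'N':
            case 'E':
                return value;   // northern and eastern hemispheres are positive
            case 'S':
            case 'W':
                return -value;  // southern and western hemispheres are negative
            default:
                throw new IllegalArgumentException("unknown hemisphere reference: " + ref);
        }
    }

    public static void main(String[] args) {
        System.out.println(toDecimalDegrees(40, 30, 0, 'N'));
        System.out.println(toDecimalDegrees(73, 15, 0, 'W'));
    }
}
```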




[jira] Commented: (TIKA-443) Geographic Information Parser

2010-06-22 Thread Arturo Beltran (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881146#action_12881146
 ] 

Arturo Beltran commented on TIKA-443:
-

I'm not convinced about using OGDI. From what I understand from reading the 
documentation, OGDI offers a C API, so we would encounter the same problem 
integrating it with Java. In addition, the project has not been updated since 
2008, so newer geographic formats (e.g., KML) are not supported. Also, I think 
OGDI does not support databases or services.

However, you could do a proof of concept to see how difficult it would be 
to integrate with Java and exactly what metadata can be extracted using 
OGDI. Then we can compare those results with mine and decide. 

As you can see, I've attached a sample XML file (getFDOMetadata.xml) that 
contains the information extracted from a SHP file by my proof-of-concept server 
based on FDO. This is the result of a simple HTTP call 
(http://localhost:12345/getFDOMetadata?source=C:\ExampleData\shp_world_countries\country.shp&provider=SHP).
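
As a rough illustration of how a Java client could build that request, the sketch below assembles the URL from the endpoint and the two query parameters shown above. Percent-encoding the Windows path is an assumption of this example (whether the proof-of-concept server expects the raw or the encoded path is not stated), and the FdoMetadataUrl class is hypothetical:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class FdoMetadataUrl {
    /** Builds the request URL, percent-encoding both query parameter values. */
    static String build(String baseUrl, String source, String provider) {
        try {
            return baseUrl
                + "?source=" + URLEncoder.encode(source, "UTF-8")
                + "&provider=" + URLEncoder.encode(provider, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 not available", e);
        }
    }

    public static void main(String[] args) {
        String url = build(
            "http://localhost:12345/getFDOMetadata",
            "C:\\ExampleData\\shp_world_countries\\country.shp",
            "SHP");
        System.out.println(url);
    }
}
```

Encoding matters here because the backslashes and the colon in the path are not safe to send unescaped in a query string.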

For now, I'll keep trying to get my "Hello world" Tika parser running.

Regards,
 Arturo




[jira] Commented: (TIKA-443) Geographic Information Parser

2010-06-21 Thread Mayank Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880921#action_12880921
 ] 

Mayank Singh commented on TIKA-443:
---

Arturo, I am not very comfortable with C++ and have no knowledge of the .NET 
platform (I'm a Java guy), so my help will be very limited if you plan on using 
FDO. However, I was looking around for alternatives and 
found OGDI (http://ogdi.sourceforge.net/), which can act as a middle layer 
between various data sources and has almost the same capabilities for data 
dissemination over the network as FDO (more info here: 
http://www.gisdevelopment.net/technology/gis/techgi0057b.htm).
So what I am suggesting is that we look into it; once we get the heterogeneous 
data into the uniform data structure that OGDI supports, we can use Java to 
integrate it with Tika.
I'll keep searching for more info. Do tell me your views on this.
Regards,
Mayank




[jira] Commented: (TIKA-443) Geographic Information Parser

2010-06-21 Thread Arturo Beltran (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880844#action_12880844
 ] 

Arturo Beltran commented on TIKA-443:
-

You are right, Chris. From now on, I will try to keep the discussions on the list 
or here.

I will try to explain briefly what exactly I'm working on so that you can 
get involved.
The first piece is what allows us to access resources: we need a platform that 
accesses heterogeneous resources in the most homogeneous way possible. The best 
approach I've found is FDO (http://fdo.osgeo.org/). In short, FDO is an API for 
manipulating, defining and analyzing geospatial information regardless of where 
it is stored.

So it looks simple: I only have to integrate FDO as a Tika parser and I'm 
done. The problem appeared when I tried to connect this C++ API with Java. I have 
worked with SWIG and directly with JNI, but I have not gotten it to work.
Finally, as a temporary proof of concept, I implemented a simple 
HTTP server in .NET that offers resource descriptions using FDO. Now I'm 
trying to create a dummy parser for Tika that makes calls to that server.

I hope I have explained this well enough to follow; otherwise, 
feel free to ask again.

Greetings and thanks for your interest:
 Arturo




[jira] Commented: (TIKA-443) Geographic Information Parser

2010-06-21 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880836#action_12880836
 ] 

Chris A. Mattmann commented on TIKA-443:


Hi Guys,

Thanks for the effort here. Please try hard to keep the discussions on list as 
the community will benefit from them and can help provide feedback 
incrementally.

Thanks,
Chris





[jira] Commented: (TIKA-443) Geographic Information Parser

2010-06-21 Thread Arturo Beltran (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880762#action_12880762
 ] 

Arturo Beltran commented on TIKA-443:
-

Hi all,

I am pleased by the interest shown by the community on my proposal. As I said, 
any help is welcome.
I have sent Mayank all the details about my work on this issue. If anyone else 
is interested in collaborating or simply wants to share ideas or comments, do not 
hesitate to contact me.

Cheers,
Arturo




[jira] Commented: (TIKA-443) Geographic Information Parser

2010-06-20 Thread Mayank Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880722#action_12880722
 ] 

Mayank Singh commented on TIKA-443:
---

Hi Arturo
I would like to collaborate on this issue. I have also sent you a mail regarding 
the same.
Thanks and regards
Mayank




[jira] Commented: (TIKA-443) Geographic Information Parser

2010-06-17 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879788#action_12879788
 ] 

Chris A. Mattmann commented on TIKA-443:


Hi Arturo,

Thanks for reporting this issue and it sounds awesome! I'm definitely 
interested in this topic and will be sure to help however I can.

Cheers,
Chris

