[jira] [Commented] (TIKA-3948) Migrate to jakarta in Tika 3.x
[ https://issues.apache.org/jira/browse/TIKA-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773363#comment-17773363 ]

Martin Desruisseaux commented on TIKA-3948:
-------------------------------------------

Thanks. If anyone has a chance to test with the staged repositories (copied below for convenience), please let me know if there is any issue.

{code:xml}
<repository>
  <id>sis.staging.parent</id>
  <name>SIS staging repository of Parent POM</name>
  <url>https://repository.apache.org/content/repositories/orgapachesis-1043</url>
</repository>
<repository>
  <id>sis.staging.main</id>
  <name>SIS staging repository of main artifacts</name>
  <url>https://repository.apache.org/content/repositories/orgapachesis-1049</url>
</repository>
<repository>
  <id>sis.staging.non-free</id>
  <name>SIS staging repository of non-free resources</name>
  <url>https://repository.apache.org/content/repositories/orgapachesis-1050</url>
</repository>
{code}

> Migrate to jakarta in Tika 3.x
> ------------------------------
>
>                 Key: TIKA-3948
>                 URL: https://issues.apache.org/jira/browse/TIKA-3948
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>              Labels: tika-3x
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (TIKA-3948) Migrate to jakarta in Tika 3.x
[ https://issues.apache.org/jira/browse/TIKA-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17765680#comment-17765680 ]

Martin Desruisseaux commented on TIKA-3948:
-------------------------------------------

Hello Tim. Snapshots have been deployed on [https://repository.apache.org/content/repositories/snapshots/]

I still have to work on signing and on bundling the source code and javadoc (this is my first deployment with Gradle instead of Maven, so I still have to learn). After that I can start a thread for the SIS release. It may take about 2 weeks for discussion, vote, etc.
[jira] [Commented] (TIKA-3948) Migrate to jakarta in Tika 3.x
[ https://issues.apache.org/jira/browse/TIKA-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17765621#comment-17765621 ]

Martin Desruisseaux commented on TIKA-3948:
-------------------------------------------

Hello Tim.

This is an error in the SIS {{build.gradle.kts}} file. I already have a local fix, not yet pushed (I will push it in a few hours). Sorry for the delay, and glad that you could work around it!

I'm working right now on getting the CI to work so that snapshots can be deployed again. I will post a new comment when it is ready.
[jira] [Updated] (TIKA-443) Geographic Information Parser
[ https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Desruisseaux updated TIKA-443:
-------------------------------------

    Attachment: Demo.java

Attached an example of mapping some ISO 19115 elements to Dublin Core. This is only a proof of concept for the sake of discussion on the mailing list - this demo could not be included in Tika without work by a Tika developer.

This demo fetches the following information from ISO 19115 metadata. The extraction is quite primitive. For example, ISO 19115 metadata could have many titles, many creators, and many occurrences of basically everything. This demo just fetches an arbitrary occurrence of each property. Production code would need to make a more elaborate choice.

* title
* creator
* contributor
* created
* modified
* description
* latitude
* longitude
* altitude

Latitude, longitude and altitude are set to the center of the geographic bounding box, if any.

> Geographic Information Parser
> -----------------------------
>
>                 Key: TIKA-443
>                 URL: https://issues.apache.org/jira/browse/TIKA-443
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Arturo Beltran
>            Assignee: Chris A. Mattmann
>              Labels: memex, new-parser
>             Fix For: 1.9
>
>         Attachments: Demo.java, getFDOMetadata.xml
>
>
> I'm working on the automatic description of geospatial resources, and I think
> that it might be interesting to incorporate new parser(s) into Tika in order to
> manage and describe some geo-formats. These geo-formats include files,
> services and databases.
> If anyone is interested in this issue or wants to collaborate, do not hesitate
> to contact me. Any help is welcome.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
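As an illustration of the last point in the comment above (latitude and longitude set to the center of the geographic bounding box), here is a minimal sketch. It is not the attached Demo.java. The class {{BoundingBoxCenter}} and its method are hypothetical names; the four arguments are assumed to come from the {{getWestBoundLongitude()}}, {{getEastBoundLongitude()}}, {{getSouthBoundLatitude()}} and {{getNorthBoundLatitude()}} getters of GeoAPI's {{GeographicBoundingBox}}. Treating "east < west" as a box spanning the antimeridian is an assumption of this sketch, and altitude is left out.

```java
/** Hypothetical helper: center of a geographic bounding box, in decimal degrees. */
final class BoundingBoxCenter {
    private BoundingBoxCenter() {}

    /**
     * Returns {latitude, longitude} of the box center.
     * The arguments are assumed to be the west/east bound longitudes and
     * south/north bound latitudes of an ISO 19115 geographic bounding box.
     */
    static double[] center(double west, double east, double south, double north) {
        double latitude = (south + north) / 2;
        // Assumption of this sketch: east < west means the box spans the antimeridian.
        if (east < west) {
            east += 360;
        }
        double longitude = (west + east) / 2;
        // Bring the result back into the [-180, 180] range.
        if (longitude > 180) {
            longitude -= 360;
        }
        return new double[] {latitude, longitude};
    }
}
```

For a box from 10°W to 30°E and 40°N to 50°N, {{center(-10, 30, 40, 50)}} gives latitude 45 and longitude 10.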
[jira] [Commented] (TIKA-443) Geographic Information Parser
[ https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201379#comment-14201379 ]

Martin Desruisseaux commented on TIKA-443:
------------------------------------------

For Tika to ISO 19115, I see these choices:

* Some core Tika classes could implement some {{org.opengis.metadata}} interfaces. For example, if there is a Tika class somewhere which contains the (latitude, longitude) coordinates of a rectangle, that class could implement the {{GeographicBoundingBox}} interface. All the {{org.opengis.metadata}} interfaces follow the ISO 19115 model, so this is not a purely arbitrary API.
* Alternatively, if Tika prefers not to modify its core classes, the data could be copied from the Tika class to a separate {{GeographicBoundingBox}} implementation just before marshalling. That separate implementation could be the SIS one, or another one if the Tika group prefers. However, using the SIS one would avoid another copy, since SIS will need to copy the data into its own implementation before marshalling anyway (because of the way JAXB works).

Once Tika has identified the information of interest ({{GeographicBoundingBox}}, maybe {{DataIdentification}}, etc.), those data need to be put together into an {{org.opengis.metadata.Metadata}} implementation, which is usually the root of the ISO 19115 hierarchy. Again it can be either a core Tika class implementing {{Metadata}}, or a separate implementation like the SIS one, at your choice.

Once you have a {{Metadata}} instance, the easiest way to marshal it is using {{org.apache.sis.xml.XML}}. This convenience class provides several {{marshal}} methods, so you can pick the most convenient. An easy one for testing purposes is:

{code:java}
System.out.println(XML.marshal(metadata));
{code}

For the reverse operation (ISO 19115 to Tika), the starting point could be:

{code:java}
Metadata md = (Metadata) XML.unmarshal(inputStream);
{code}

but the next issue is how to use that {{Metadata}} information. Again I see two choices:

* Tika may copy the information into its own internal structure.
* Or alternatively, some Tika API may be designed to accept {{Metadata}}, {{GeographicBoundingBox}}, etc. arguments. Again they are GeoAPI interfaces, so not necessarily SIS implementations. If Tika implemented those interfaces as a result of the above discussion, the modified API would also work with Tika classes.
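The two "Tika to ISO 19115" choices discussed in this comment can be sketched as follows. To keep the sketch compilable without GeoAPI on the classpath, it declares a local stand-in interface carrying only the four coordinate getters; the real {{org.opengis.metadata.extent.GeographicBoundingBox}} declares more (for example {{getInclusion()}}, inherited from {{GeographicExtent}}). The classes {{GeoRectangle}}, {{LegacyRectangle}} and {{Extents}} are hypothetical and do not exist in Tika or SIS.

```java
/** Local stand-in for GeoAPI's GeographicBoundingBox (coordinate getters only). */
interface GeographicBoundingBox {
    double getWestBoundLongitude();
    double getEastBoundLongitude();
    double getSouthBoundLatitude();
    double getNorthBoundLatitude();
}

/** Choice 1: a hypothetical core class implements the interface directly. */
final class GeoRectangle implements GeographicBoundingBox {
    private final double west, east, south, north;

    GeoRectangle(double west, double east, double south, double north) {
        this.west = west; this.east = east; this.south = south; this.north = north;
    }
    @Override public double getWestBoundLongitude() { return west; }
    @Override public double getEastBoundLongitude() { return east; }
    @Override public double getSouthBoundLatitude() { return south; }
    @Override public double getNorthBoundLatitude() { return north; }
}

/** Choice 2: a hypothetical core class that is left unmodified. */
final class LegacyRectangle {
    final double minLat, maxLat, minLon, maxLon;

    LegacyRectangle(double minLat, double maxLat, double minLon, double maxLon) {
        this.minLat = minLat; this.maxLat = maxLat; this.minLon = minLon; this.maxLon = maxLon;
    }
}

final class Extents {
    private Extents() {}

    /** Choice 2, continued: copy the data into a separate implementation before marshalling. */
    static GeographicBoundingBox copyOf(LegacyRectangle r) {
        return new GeoRectangle(r.minLon, r.maxLon, r.minLat, r.maxLat);
    }

    /** An API written against the interface accepts either kind of object. */
    static String describe(GeographicBoundingBox box) {
        return "lon [" + box.getWestBoundLongitude() + ", " + box.getEastBoundLongitude()
             + "], lat [" + box.getSouthBoundLatitude() + ", " + box.getNorthBoundLatitude() + "]";
    }
}
```

Because {{describe}} depends only on the interface, it also illustrates the second choice for the reverse direction: an API that accepts {{GeographicBoundingBox}} arguments works with any implementation, whether from Tika, SIS or elsewhere.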
[jira] [Comment Edited] (TIKA-443) Geographic Information Parser
[ https://issues.apache.org/jira/browse/TIKA-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198059#comment-14198059 ]

Martin Desruisseaux edited comment on TIKA-443 at 11/5/14 10:07 AM:
--------------------------------------------------------------------

A note just in case: Tika does not need to have a strong dependency on SIS if you prefer to avoid it. The ISO 19115 metadata are defined by interfaces in a separate JAR file ({{geoapi-3.0.0.jar}}), which is in turn implemented by SIS. But the Tika project could decide to implement by itself a subset of those interfaces considered most pertinent to Tika needs (e.g. {{GeographicBoundingBox}}, {{DataIdentification}}, etc.), which should allow Tika to switch between its own implementation and the SIS implementation transparently. For example, Tika could have basic geographic information support as a standalone application, and delegate to SIS only for more advanced needs if the user wishes. I'm just mentioning that as one possible strategy.

was (Author: desruisseaux):
A note just in case: Tika does not need to have a strong dependency to SIS if you prefer to avoid it. The ISO 19115 metadata are defined by interfaces in a separated JAR file, ({{geoapi-3.0.0.jar}}), which is in turn implemented by SIS. But the Tika project could decide to implement itself a subset of the interface considered most pertinent to Tika needs (e.g. {{GeographicBoundingBox}}, {{DataIdentification}}, etc.), which should allow Tika to switch between its own implementation as SIS implementation transparently. For example Tika could have basic geographic information support as a standalone application, and delegate to SIS only for more advanced needs if the user wish. I'm just mentioning that as one possible strategy.
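The strategy in this comment (basic built-in support, with delegation to SIS only when the user wishes) could be sketched along the following lines. This is only one possible approach, not something proposed in the thread: it detects an optional library by probing for a class name at runtime. The class {{GeoSupport}} and the backend labels are hypothetical, and using {{org.apache.sis.xml.XML}} as the marker class for detecting SIS is an assumption of this sketch.

```java
/** Hypothetical selector between a built-in implementation and an optional SIS-backed one. */
final class GeoSupport {
    /** Class probed to detect Apache SIS; choosing this marker is an assumption. */
    static final String SIS_MARKER = "org.apache.sis.xml.XML";

    private GeoSupport() {}

    /** Returns true if the named class can be found, without initializing it. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, GeoSupport.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Returns "sis" only when the user asked for it and SIS is present, else "builtin". */
    static String selectBackend(boolean userWantsSis) {
        return (userWantsSis && isOnClasspath(SIS_MARKER)) ? "sis" : "builtin";
    }
}
```

Code written against the GeoAPI interfaces would then not need to know which backend was selected, which is what makes the switch transparent.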