[jira] [Updated] (TIKA-1208) Migrate Any23 mime contributions to Tika
[ https://issues.apache.org/jira/browse/TIKA-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison updated TIKA-1208:
------------------------------
    Fix Version/s:     (was: 2.0.0)
                   2.0.1

> Migrate Any23 mime contributions to Tika
> ----------------------------------------
>
>                 Key: TIKA-1208
>                 URL: https://issues.apache.org/jira/browse/TIKA-1208
>             Project: Tika
>          Issue Type: Sub-task
>          Components: mime
>            Reporter: Lewis John McGibbney
>            Priority: Major
>             Fix For: 1.17, 2.0.0-BETA, 2.0.1
>
>         Attachments: TIKA-1208.patch
>
>
> We begin with one of the most obvious areas in which there is overlap.
> In short, the appeal of this package is the addition of detection
> for the following types:
> - text/n3
> - text/rdf+n3
> - application/n3
> - text/x-nquads
> - text/rdf+nq
> - text/nq
> - application/nq
> - text/turtle
> - application/x-turtle
> - application/turtle
> - application/trix
>
> Therefore, although both Tika and Any23 perform MIME-type-related
> tasks, there is a contribution to be made. This involves the transfer of
> code pertaining to pattern recognition, MIME type XML definitions within
> tika-mimetypes.xml, and a Purifier implementation that removes any
> blank characters at the head of a file that might
> prevent its MIME type detection.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
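As a rough illustration of the Purifier idea in the issue description, the following minimal Java sketch skips blank bytes at the head of a stream so that a later magic-byte check starts at the first meaningful byte. The class name `LeadingWhitespaceSkipper` and its method are hypothetical, rely only on the JDK, and are not taken from TIKA-1208.patch or from Any23.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not the class from TIKA-1208.patch. */
public final class LeadingWhitespaceSkipper {

    private LeadingWhitespaceSkipper() {}

    /**
     * Advances a mark-supporting stream past leading blank bytes and
     * leaves it positioned at the first non-blank byte (or at end of stream).
     */
    public static void skipLeadingWhitespace(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        while (true) {
            in.mark(1);            // remember the position before each byte
            int b = in.read();
            if (b == -1) {
                return;            // empty stream, or nothing but blanks
            }
            if (!isBlank(b)) {
                in.reset();        // un-read the first meaningful byte
                return;
            }
        }
    }

    /** Assumption: "blank" means ASCII space, tab, newline, carriage return, form feed. */
    private static boolean isBlank(int b) {
        return b == ' ' || b == '\t' || b == '\n' || b == '\r' || b == '\f';
    }

    public static void main(String[] args) throws IOException {
        byte[] doc = "\n\n  \t@prefix ex: <http://example.org/> ."
                .getBytes(StandardCharsets.UTF_8);
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(doc));
        skipLeadingWhitespace(in);
        System.out.println((char) in.read()); // prints "@"
    }
}
```

The sketch uses mark/reset rather than copying the document, so it only works on streams that support marking; wrapping the input in a `BufferedInputStream` is one way to guarantee that.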
[jira] [Updated] (TIKA-1208) Migrate Any23 mime contributions to Tika
[ https://issues.apache.org/jira/browse/TIKA-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison updated TIKA-1208:
------------------------------
    Fix Version/s:     (was: 2.0.0)
                   2.0.0-BETA
[jira] [Updated] (TIKA-1208) Migrate Any23 mime contributions to Tika
[ https://issues.apache.org/jira/browse/TIKA-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated TIKA-1208:
------------------------------------
    Fix Version/s:     (was: 1.15)
                   1.16
[jira] [Updated] (TIKA-1208) Migrate Any23 mime contributions to Tika
[ https://issues.apache.org/jira/browse/TIKA-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated TIKA-1208:
------------------------------------
    Fix Version/s:     (was: 1.14)
                   1.15
[jira] [Updated] (TIKA-1208) Migrate Any23 mime contributions to Tika
[ https://issues.apache.org/jira/browse/TIKA-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated TIKA-1208:
------------------------------------
    Fix Version/s:     (was: 1.13)
                   1.14
[jira] [Updated] (TIKA-1208) Migrate Any23 mime contributions to Tika
[ https://issues.apache.org/jira/browse/TIKA-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated TIKA-1208:
------------------------------------
    Fix Version/s:     (was: 1.12)
                   1.13
[jira] [Updated] (TIKA-1208) Migrate Any23 mime contributions to Tika
[ https://issues.apache.org/jira/browse/TIKA-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated TIKA-1208:
------------------------------------
    Fix Version/s:     (was: 1.11)
                   1.12
[jira] [Updated] (TIKA-1208) Migrate Any23 mime contributions to Tika
[ https://issues.apache.org/jira/browse/TIKA-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Meikle updated TIKA-1208:
------------------------------
    Fix Version/s:     (was: 1.10)
                   1.11

* Pushed to 1.11 following 1.10 release
[jira] [Updated] (TIKA-1208) Migrate Any23 mime contributions to Tika
[ https://issues.apache.org/jira/browse/TIKA-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated TIKA-1208:
------------------------------------
    Fix Version/s:     (was: 1.7)
                   1.8

- push to 1.8
[jira] [Updated] (TIKA-1208) Migrate Any23 mime contributions to Tika
[ https://issues.apache.org/jira/browse/TIKA-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated TIKA-1208:
------------------------------------
    Fix Version/s:     (was: 1.6)
                   1.7
[jira] [Updated] (TIKA-1208) Migrate Any23 mime contributions to Tika
[ https://issues.apache.org/jira/browse/TIKA-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Meikle updated TIKA-1208:
------------------------------
    Fix Version/s:     (was: 1.5)
                   1.6

Pushed out to 1.6, preparing for 1.5 RC
[jira] [Updated] (TIKA-1208) Migrate Any23 mime contributions to Tika
[ https://issues.apache.org/jira/browse/TIKA-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated TIKA-1208:
---------------------------------------
    Attachment: TIKA-1208.patch

Hi [~p_ansell],
I have been working on a patch for this issue, which I did not wish to push to Jira; however, I've been taken off course by bugs in a Gora branch.
I attach a patch for migrating the Any23 mime package to Tika which retains the Purifier concept of cleaning documents before they are processed for mime/mediaType detection.
I've not touched the Tika API or the Detect API within this implementation as (I personally think) it would be harder to succeed in the code migration if we also attempt to change the well-known and well-designed 'detect' and base 'Tika' APIs.
This therefore means that Purifier implementations are detector specific. Right now all we can offer is the WhiteSpacePurifier, which is OK, but it is NOT configurable, e.g. if someone wished to pass a Purifier as a parameter to detect(InputStream, Metadata, Purifier). I think that if other Purifiers were to be introduced then we could revisit this issue.
Apart from that, this (WIP) patch introduces an Any23Detector which basically stems from the Tika detector we maintained in Any23. Please comment on this as I am not sure if this is the right way to proceed.
THIS PATCH IS MERELY A START. I need input from the Any23 team to see if I am 'attempting' to implement the Any23 mime code in the correct way.
It should also be noted that the last time I ran this patch with Tika trunk there were issues with detection of 'semantic' mime types.
Hopefully this is a start which we can build from. I am committed to getting this code suitable for proposal to Tika.
Any comments are VERY appreciated.
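The trade-off raised in the comment, a purifier fixed inside the detector versus one passed in by the caller, can be sketched roughly as follows. This is a self-contained illustration under stated assumptions: `PurifierParameterSketch`, its `Purifier` interface, and both `detect` overloads are hypothetical stand-ins, the `Metadata` argument is omitted, a plain `String` is returned instead of a Tika media type, and the "@prefix" check is a toy substitute for real pattern matching. None of it is taken from TIKA-1208.patch.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Illustrative only: these types are stand-ins, not Tika or Any23 classes. */
public final class PurifierParameterSketch {

    private PurifierParameterSketch() {}

    /** Hypothetical hook that cleans the head of a mark-supporting stream. */
    public interface Purifier {
        void purify(InputStream in) throws IOException;
    }

    /** Hypothetical purifier: consumes leading blanks, stops before the first other byte. */
    public static final Purifier SKIP_LEADING_BLANKS = in -> {
        int b;
        do {
            in.mark(1);
            b = in.read();
        } while (b == ' ' || b == '\t' || b == '\n' || b == '\r');
        if (b != -1) {
            in.reset(); // put back the first non-blank byte
        }
    };

    /** Detector-specific variant: the purifier is fixed by the detector. */
    public static String detect(InputStream in) throws IOException {
        return detect(in, SKIP_LEADING_BLANKS);
    }

    /** Configurable variant: the caller chooses the purifier. */
    public static String detect(InputStream in, Purifier purifier) throws IOException {
        InputStream stream = in.markSupported() ? in : new BufferedInputStream(in);
        purifier.purify(stream);
        return startsWith(stream, "@prefix") ? "text/turtle" : "application/octet-stream";
    }

    /** Toy magic check that peeks at the head of the stream and then rewinds. */
    private static boolean startsWith(InputStream in, String magic) throws IOException {
        byte[] expected = magic.getBytes(StandardCharsets.US_ASCII);
        byte[] head = new byte[expected.length];
        in.mark(expected.length);
        int total = 0;
        while (total < head.length) {
            int n = in.read(head, total, head.length - total);
            if (n == -1) {
                break;
            }
            total += n;
        }
        in.reset();
        return total == expected.length && Arrays.equals(head, expected);
    }

    public static void main(String[] args) throws IOException {
        byte[] doc = "\n\n  @prefix ex: <http://example.org/> ."
                .getBytes(StandardCharsets.UTF_8);
        Purifier noOp = in -> { };
        // Without purification the leading blanks hide the magic bytes.
        System.out.println(detect(new ByteArrayInputStream(doc), noOp));
        // With the default purifier the same document is recognised.
        System.out.println(detect(new ByteArrayInputStream(doc)));
    }
}
```

Keeping the one-argument overload means existing callers are unaffected, while the two-argument overload is where additional Purifier implementations could be plugged in if they were ever introduced.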