[ 
https://issues.apache.org/jira/browse/OODT-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated OODT-630:
--------------------------------------
    Labels: memex  (was: )

> Upgrade OODT components from using Tika 0.8 to Tika 1.6
> -------------------------------------------------------
>
>                 Key: OODT-630
>                 URL: https://issues.apache.org/jira/browse/OODT-630
>             Project: OODT
>          Issue Type: Improvement
>          Components: file manager, metadata container, product server
>    Affects Versions: 0.6
>            Reporter: Rishi Verma
>            Assignee: Tyler Palsulich
>              Labels: memex
>             Fix For: 0.8
>
>         Attachments: OODT-630.Palsulich.101014.patch, 
> OODT-630.Palsulich.101014.v3.patch, OODT-630.Palsulich.101014.v4.patch
>
>
> Currently, OODT makes use of Tika v0.8 (tika-core) for mime-detection 
> purposes. This version is quite out-of-date, and is incompatible with the use 
> of a tika-core or tika-app v1.3 JAR.
> Tika v1.3 contains numerous upgrades since 0.8 (see [1]), including improved 
> metadata generation for common files. These improvements are extremely 
> useful for metadata gathering.
> If a project using OODT needs features provided by the v1.3 tika-core or 
> tika-app JAR (e.g. for a custom met extractor), it currently cannot use that 
> version when interacting with OODT server-side components like filemgr, 
> crawler etc., since it is incompatible with OODT's use of v0.8.
> One of the incompatibilities is the 'getMimeType(URL)' method of 
> org.apache.tika.mime.MimeTypes, which was deprecated and is no longer 
> available in v1.3. It has been superseded by Tika.detect(URL.getPath()) and 
> MimeTypes.getRegisteredMimeType(String).
> See the example exception below, thrown when crawler 0.6-SNAPSHOT was invoked 
> while a 'tika-app-1.3.jar' was placed in the crawler's lib directory:
> ---
> Jun 18, 2013 3:40:07 PM org.apache.oodt.cas.crawl.ProductCrawler ingest
> INFO: ProductCrawler: Ready to ingest product: [/data/staging/IMG_2590.jpg]: ProductType: [GenericFile]
> Jun 18, 2013 3:40:07 PM org.apache.oodt.cas.filemgr.ingest.StdIngester setFileManager
> INFO: StdIngester: connected to file manager: [http://localhost:9000]
> Jun 18, 2013 3:40:07 PM org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferer setFileManagerUrl
> INFO: In Place Data Transfer to: [http://localhost:9000] enabled
> Exception in thread "main" java.lang.NoSuchMethodError: org.apache.tika.mime.MimeTypes.getMimeType(Ljava/net/URL;)Lorg/apache/tika/mime/MimeType;
> at org.apache.oodt.cas.filemgr.structs.Reference.<init>(Reference.java:115)
> at org.apache.oodt.cas.filemgr.versioning.VersioningUtils.addRefsFromUris(VersioningUtils.java:251)
> at org.apache.oodt.cas.filemgr.ingest.StdIngester.ingest(StdIngester.java:189)
> at org.apache.oodt.cas.crawl.ProductCrawler.ingest(ProductCrawler.java:304)
> at org.apache.oodt.cas.crawl.ProductCrawler.handleFile(ProductCrawler.java:188)
> at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:108)
> at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:75)
> at org.apache.oodt.cas.crawl.daemon.CrawlDaemon.startCrawling(CrawlDaemon.java:82)
> at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:55)
> at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)
> at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:187)
> at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:36)
> ---
> This JIRA issue seeks to document efforts to upgrade OODT's use of Tika 
> from 0.8 to 1.3.
> ---
> [1] http://www.apache.org/dist/tika/CHANGES-1.3.txt
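
The NoSuchMethodError quoted above names a method by its JVM descriptor: 
(Ljava/net/URL;) is a single java.net.URL parameter, and the trailing 
Lorg/apache/tika/mime/MimeType; is the return type. One way to check which 
MimeTypes methods a given classpath actually provides is a small reflective 
probe. The following is only a sketch: the class TikaApiProbe and its hasMethod 
helper are hypothetical names, not part of OODT or Tika, and the result depends 
entirely on which Tika JAR (if any) is on the classpath.

```java
import java.net.URL;

// Hypothetical diagnostic helper; not part of OODT or Tika.
final class TikaApiProbe {

    /**
     * Returns true if the named class can be loaded and exposes a public
     * method with the given name and parameter types; false otherwise.
     */
    static boolean hasMethod(String className, String methodName,
                             Class<?>... parameterTypes) {
        try {
            // initialize=false: load the class without running static initializers
            Class<?> cls = Class.forName(
                    className, false, TikaApiProbe.class.getClassLoader());
            cls.getMethod(methodName, parameterTypes);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            // Class missing, method missing, or class could not be linked
            return false;
        }
    }

    public static void main(String[] args) {
        String mimeTypes = "org.apache.tika.mime.MimeTypes";
        // The method OODT 0.6 calls, per the stack trace in the issue
        System.out.println("getMimeType(URL): "
                + hasMethod(mimeTypes, "getMimeType", URL.class));
        // One of the replacement methods named in the issue description
        System.out.println("getRegisteredMimeType(String): "
                + hasMethod(mimeTypes, "getRegisteredMimeType", String.class));
    }
}
```

Running the probe with the crawler's lib directory on the classpath would show 
whether the JAR that wins class loading still carries getMimeType(URL). The 
probe reports false for both methods when no Tika JAR is present at all.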



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
