Dumping a .met file and calling the filemgr client ingest routine works
fine, so it appears something is either broken in the crawler or I'm doing
something wrong with it.

Tom

On Mon, Nov 23, 2015 at 3:45 PM, Tom Barber <[email protected]> wrote:

> I'll give it a go. Thanks.
>
> On Mon, Nov 23, 2015 at 3:44 PM, Chris Mattmann <[email protected]>
> wrote:
>
>> Doesn’t look weird. Hmm. Can you generate a metadata file
>> using TikaCmdLine extractor and then use that metadata file
>> to ingest into File Manager by hand? Does that work?
>>
>> —
>> Chris Mattmann
>> [email protected]
>>
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: Tom Barber <[email protected]>
>> Reply-To: <[email protected]>
>> Date: Monday, November 23, 2015 at 7:43 AM
>> To: "[email protected]" <[email protected]>
>> Subject: Re: Crawling / Archiving binary data with Solr backend
>>
>> >Author: Alun Davis - Loudmouth
>> >Content-Length: 3273160
>> >Content-Type: audio/mpeg
>> >X-Parsed-By: org.apache.tika.parser.DefaultParser
>> >X-TIKA:digest:MD5: 5f374012180e94778346619515152f74
>> >X-TIKA:digest:SHA256: 34d8bf9da8feb848922138eb7807c0d71ed92376422fb28c8cbbffe788574ab0
>> >channels: 2
>> >creator: Alun Davis - Loudmouth
>> >dc:creator: Alun Davis - Loudmouth
>> >dc:title: Teenage Baghead
>> >meta:author: Alun Davis - Loudmouth
>> >resourceName: Teenage Baghead.mp3
>> >samplerate: 44100
>> >title: Teenage Baghead
>> >version: MPEG 3 Layer III Version 1
>> >xmpDM:album:
>> >xmpDM:artist: Alun Davis - Loudmouth
>> >xmpDM:audioChannelType: Stereo
>> >xmpDM:audioCompressor: MP3
>> >xmpDM:audioSampleRate: 44100
>> >xmpDM:duration: 204577.046875
>> >xmpDM:genre: Pop
>> >xmpDM:logComment: www.maimthattune.com for more!
>> >xmpDM:releaseDate: 2001
>> >
>> >
>> >Nothing that should scare a parser in the mp3 at least.
>> >
>> >On Mon, Nov 23, 2015 at 3:33 PM, Chris Mattmann <[email protected]>
>> >wrote:
>> >
>> >> yeah check the metadata. Any weird UTF-8 encoding?
>> >>
>> >> (aka run tika on the file outside of OODT what do you see?)
>> >>
>> >> —
>> >> Chris Mattmann
>> >> [email protected]
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> -----Original Message-----
>> >> From: Tom Barber <[email protected]>
>> >> Reply-To: <[email protected]>
>> >> Date: Monday, November 23, 2015 at 7:23 AM
>> >> To: "[email protected]" <[email protected]>
>> >> Subject: Re: Crawling / Archiving binary data with Solr backend
>> >>
>> >> >./crawler/bin/crawler_launcher     --filemgrUrl http://localhost:9000
>> >> >--operation --launchMetCrawler     --clientTransferer
>> >> >org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory
>> >> >--productPath $OODT_HOME/data/staging     --metExtractor
>> >> >org.apache.oodt.cas.metadata.extractors.TikaCmdLineMetExtractor
>> >> >--metExtractorConfig /home/bugg/Projects/surrey100/oodt/data/met/tika.conf
>> >> >
>> >> >I'm running that. It runs fine with the default Lucene setup and with a
>> >> >txt file, but not over a random picture I took or over an mp3 I tested
>> >> >it on.
>> >> >
>> >> >
>> >> >On Mon, Nov 23, 2015 at 3:12 PM, Mattmann, Chris A (3980) <
>> >> >[email protected]> wrote:
>> >> >
>> >> >> Encoding issues with the extracted metadata? What are you getting
>> >> >> just running Tika on the files?
>> >> >>
>> >> >> The actual data shouldn’t matter since it’s not being ingested
>> >> >> (are you doing it in place, or what data transferer are you using)?
>> >> >>
>> >> >> Cheers,
>> >> >> Chris
>> >> >>
>> >> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> >> >> Chris Mattmann, Ph.D.
>> >> >> Chief Architect
>> >> >> Instrument Software and Science Data Systems Section (398)
>> >> >> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> >> >> Office: 168-519, Mailstop: 168-527
>> >> >> Email: [email protected]
>> >> >> WWW:  http://sunset.usc.edu/~mattmann/
>> >> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> >> >> Adjunct Associate Professor, Computer Science Department
>> >> >> University of Southern California, Los Angeles, CA 90089 USA
>> >> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> -----Original Message-----
>> >> >> From: Tom Barber <[email protected]>
>> >> >> Reply-To: "[email protected]" <[email protected]>
>> >> >> Date: Monday, November 23, 2015 at 6:36 AM
>> >> >> To: "[email protected]" <[email protected]>
>> >> >> Subject: Crawling / Archiving binary data with Solr backend
>> >> >>
>> >> >> >Hello,
>> >> >> >
>> >> >> >Looks like I've never tried it before with binary data. If I swap the
>> >> >> >filemgr defaults to use Solr and then try to crawl my staging directory
>> >> >> >using the Tika extractor, I get a lot of
>> >> >> >
>> >> >> >org.apache.xmlrpc.XmlRpcException: java.lang.Exception: org.apache.oodt.cas.filemgr.structs.exceptions.CatalogException: Error ingesting product [org.apache.oodt.cas.filemgr.structs.Product@62b19476] : null
>> >> >> >at org.apache.xmlrpc.XmlRpcClientResponseProcessor.decodeException(XmlRpcClientResponseProcessor.java:104)
>> >> >> >at org.apache.xmlrpc.XmlRpcClientResponseProcessor.decodeResponse(XmlRpcClientResponseProcessor.java:71)
>> >> >> >at org.apache.xmlrpc.XmlRpcClientWorker.execute(XmlRpcClientWorker.java:73)
>> >> >> >
>> >> >> >
>> >> >> >Type things.
>> >> >> >
>> >> >> >Any ideas?
>> >> >> >
>> >> >> >Tom
>> >> >>
>> >> >>
>> >>
>> >>
>> >>
>>
>>
>>
>
