One more thing. Run this:

    ./dspace filter-media -h

and see what is available. I have version 3, so I'm not sure what is in your version, but mine has the options below. One of them (-i) processes only the bitstreams belonging to a particular item, so you could try that and see what happens.

./dspace filter-media -h
usage: MediaFilterManager
 -p,--plugins      ONLY run the specified Media Filter plugin(s)
                   listed from 'filter.plugins' in dspace.cfg.
                   Separate multiple with a comma (,)
                   (e.g. MediaFilterManager -p
                   "Word Text Extractor","PDF Text Extractor")
 -s,--skip         SKIP the bitstreams belonging to identifier
                   Separate multiple identifiers with a comma (,)
                   (e.g. MediaFilterManager -s
                   123456789/34,123456789/323)
 -f,--force        force all bitstreams to be processed
 -h,--help         help
 -i,--identifier   ONLY process bitstreams belonging to identifier
 -m,--maximum      process no more than maximum items
 -n,--noindex      do NOT update the search index after filtering
                   bitstreams
 -q,--quiet        do not print anything except in the event of errors.
 -v,--verbose      print all extracted text and other details to STDOUT

On Thu, Sep 19, 2013 at 9:49 AM, Jose Blanco <blan...@umich.edu> wrote:
> Bill, when you view an item as an admin, you should be able to see
> the txt file created from the pdf file. I suppose you can see
> these for the pdf files media-filter actually got to, but not for the
> others, right? I also wonder if media filter choked along the way,
> but you said you did not get any error messages. What about in the
> logs? Look at some items as admin and see if this gives you any clue.
>
> -Jose
>
> On Thu, Sep 19, 2013 at 9:03 AM, Bill Tantzen <wile...@gmail.com> wrote:
>> Still working on this media filter issue -- maybe this might point me in the
>> right direction: how are bitstreams selected for filtering? Is it
>> something like SELECT * FROM bitstream WHERE ???
>> What is in the WHERE clause? Or is there some other basis for selection?
>>
>> Thanks,
>> Bill
>>
>>
>> On Wed, Sep 18, 2013 at 2:09 PM, Bill Tantzen <wile...@gmail.com> wrote:
>>>
>>> Here's a snip from my dspace.cfg:
>>>
>>> #Names of the enabled MediaFilter or FormatFilter plugins
>>> filter.plugins = \
>>>     PDF Text Extractor, \
>>>     PDF Thumbnail, \
>>>     HTML Text Extractor, \
>>>     Word Text Extractor, \
>>>     JPEG Thumbnail, \
>>>     Branded Preview JPEG, \
>>>     PowerPoint Text Extractor
>>>
>>> # [To enable Branded Preview]: remove last line above, and uncomment 2 lines below
>>> #     Word Text Extractor, JPEG Thumbnail, \
>>> #     Branded Preview JPEG
>>>
>>> #Assign 'human-understandable' names to each filter
>>> plugin.named.org.dspace.app.mediafilter.FormatFilter = \
>>>     org.dspace.app.mediafilter.XPDF2Text = PDF Text Extractor, \
>>>     org.dspace.app.mediafilter.XPDF2Thumbnail = PDF Thumbnail, \
>>>     org.dspace.app.mediafilter.HTMLFilter = HTML Text Extractor, \
>>>     org.dspace.app.mediafilter.WordFilter = Word Text Extractor, \
>>>     org.dspace.app.mediafilter.JPEGFilter = JPEG Thumbnail, \
>>>     org.dspace.app.mediafilter.BrandedPreviewJPEGFilter = Branded Preview JPEG, \
>>>     org.dspace.app.mediafilter.PowerPointFilter = PowerPoint Text Extractor
>>>
>>> Specifically, I *think* the pdf filter should be enabled... As I said,
>>> the majority of the files are .pdf...
>>> Bill
>>>
>>>
>>> On Wed, Sep 18, 2013 at 2:00 PM, helix84 <heli...@centrum.sk> wrote:
>>>>
>>>> Hi Bill,
>>>>
>>>> check your configuration to see which media filters you actually have
>>>> enabled:
>>>>
>>>> https://wiki.duraspace.org/pages/viewpage.action?pageId=32474041#TransformingDSpaceContent(MediaFilters)-AvailableMediaFilters
>>>>
>>>> It's possible that you have only a mediafilter for one file type
>>>> enabled and thus it skips the majority of your files.
>>>>
>>>>
>>>> Regards,
>>>> ~~helix84
>>>>
>>>> Compulsory reading: DSpace Mailing List Etiquette
>>>> https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
>>>
>>>
>>
>>
>> _______________________________________________
>> DSpace-tech mailing list
>> DSpace-tech@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/dspace-tech
>> List Etiquette:
>> https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
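
For the single-item test suggested at the top of this thread, here is a minimal sketch of what the invocation could look like. It uses only flags that appear in the help output quoted in the thread (-i, -f, -v). The handle 123456789/34 is just the placeholder from that help text, and DSPACE_BIN, HANDLE, RUN and filter_cmd are names made up for this example. By default it only prints the command; it runs it only when RUN=1 is set.

```shell
#!/bin/sh
# Sketch: re-run the media filter for one item, verbosely.
# Assumptions: DSPACE_BIN, HANDLE, RUN and filter_cmd are illustrative names;
# replace HANDLE with the handle of an item whose PDF was skipped.
DSPACE_BIN="${DSPACE_BIN:-./dspace}"
HANDLE="${HANDLE:-123456789/34}"

# Build the command line from flags listed in the help output:
#   -i  ONLY process bitstreams belonging to identifier
#   -f  force all bitstreams to be processed
#   -v  print all extracted text and other details to STDOUT
filter_cmd() {
    printf '%s filter-media -i %s -f -v\n' "$DSPACE_BIN" "$1"
}

if [ "${RUN:-0}" = "1" ]; then
    # Execute for real (run from the directory that contains the dspace script,
    # or point DSPACE_BIN at it).
    "$DSPACE_BIN" filter-media -i "$HANDLE" -f -v
else
    # Dry run: just show what would be executed.
    filter_cmd "$HANDLE"
fi
```

If the verbose output shows extracted text for that item, the PDF filter itself is probably fine and the question becomes why the item was passed over in the full run. If it prints nothing or an error, that points more toward the filter or the file.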