One more thing. Run this:

./dspace filter-media -h

and see what is available.  I have version 3, so I'm not sure what is
in your version, but mine has these options. One of them (-i) processes
only the bitstreams belonging to a particular item, so you could try
that on one of the skipped items and see what happens.

 ./dspace filter-media -h
usage: MediaFilterManager

 -p,--plugins       ONLY run the specified Media Filter plugin(s)
                    listed from 'filter.plugins' in dspace.cfg.
                    Separate multiple with a comma (,)
                    (e.g. MediaFilterManager -p
                    "Word Text Extractor","PDF Text Extractor")
 -s,--skip          SKIP the bitstreams belonging to identifier
                    Separate multiple identifiers with a comma (,)
                    (e.g. MediaFilterManager -s
                    123456789/34,123456789/323)
 -f,--force         force all bitstreams to be processed
 -h,--help          help
 -i,--identifier    ONLY process bitstreams belonging to identifier
 -m,--maximum       process no more than maximum items
 -n,--noindex       do NOT update the search index after filtering
                    bitstreams
 -q,--quiet         do not print anything except in the event of errors.
 -v,--verbose       print all extracted text and other details to STDOUT
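
For example, something along these lines should run the filter against a
single item and print what it does. This is only a sketch: it uses just
the -i and -v options from the help output above, the handle is the
placeholder from that help text (substitute one of your unprocessed
items), and DSPACE_BIN and filter_media_cmd are names I made up for
illustration.

```shell
#!/bin/sh
# Sketch: build the filter-media command line for one item.
# Assumptions (not from the thread): DSPACE_BIN is the directory holding
# the dspace launcher; HANDLE is the handle of an item to test.
DSPACE_BIN="${DSPACE_BIN:-.}"
HANDLE="${HANDLE:-123456789/34}"   # placeholder handle from the help text

# -i limits the run to one identifier; -v prints extracted text and
# other details to STDOUT.
filter_media_cmd() {
    printf '%s/dspace filter-media -v -i %s\n' "$DSPACE_BIN" "$1"
}

# Print the command so it can be checked before running it.
filter_media_cmd "$HANDLE"

# To run it and keep the output for later inspection:
#   $(filter_media_cmd "$HANDLE") > filter-media.log 2>&1
```

If the item already has extracted text, adding -f (force) may be needed
to make the filter process it again.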

On Thu, Sep 19, 2013 at 9:49 AM, Jose Blanco <blan...@umich.edu> wrote:
> Bill, When you go view an item as an admin, you should be able to see
> the txt file created based off the pdf file.  I suppose you can see
> these for the pdf files media-filter actually got to, but not to the
> others, right?  I also wonder if media filter choked along the way,
> but you said you did not get any error messages.  What about in the
> logs? Look at some items as admin and see if this gives you any clue.
>
> -Jose
>
> On Thu, Sep 19, 2013 at 9:03 AM, Bill Tantzen <wile...@gmail.com> wrote:
>> Still working on this media filter issue -- maybe this might point me in the
>> right direction:  how are bitstreams selected for filtering?  Is it
>> something like SELECT * FROM bitstream WHERE ???
>> What is in the WHERE clause?  Or is there some other basis for selection?
>>
>> Thanks,
>> Bill
>>
>>
>> On Wed, Sep 18, 2013 at 2:09 PM, Bill Tantzen <wile...@gmail.com> wrote:
>>>
>>> Here's a snip from my dspace.cfg:
>>>
>>> #Names of the enabled MediaFilter or FormatFilter plugins
>>> filter.plugins = \
>>>   PDF Text Extractor, \
>>>   PDF Thumbnail, \
>>>   HTML Text Extractor, \
>>>   Word Text Extractor, \
>>>   JPEG Thumbnail, \
>>>   Branded Preview JPEG, \
>>>   PowerPoint Text Extractor
>>>
>>> # [To enable Branded Preview]: remove last line above, and uncomment 2 lines below
>>> #                        Word Text Extractor, JPEG Thumbnail, \
>>> #                        Branded Preview JPEG
>>>
>>> #Assign 'human-understandable' names to each filter
>>> plugin.named.org.dspace.app.mediafilter.FormatFilter = \
>>>   org.dspace.app.mediafilter.XPDF2Text = PDF Text Extractor, \
>>>   org.dspace.app.mediafilter.XPDF2Thumbnail = PDF Thumbnail, \
>>>   org.dspace.app.mediafilter.HTMLFilter = HTML Text Extractor, \
>>>   org.dspace.app.mediafilter.WordFilter = Word Text Extractor, \
>>>   org.dspace.app.mediafilter.JPEGFilter = JPEG Thumbnail, \
>>>   org.dspace.app.mediafilter.BrandedPreviewJPEGFilter = Branded Preview JPEG, \
>>>   org.dspace.app.mediafilter.PowerPointFilter = PowerPoint Text Extractor
>>>
>>> Specifically, I *think* the pdf filter should be enabled...  As I said,
>>> the majority of the files are .pdf...
>>> Bill
>>>
>>>
>>> On Wed, Sep 18, 2013 at 2:00 PM, helix84 <heli...@centrum.sk> wrote:
>>>>
>>>> Hi Bill,
>>>>
>>>> check your configuration to see which media filters you actually have
>>>> enabled:
>>>>
>>>> https://wiki.duraspace.org/pages/viewpage.action?pageId=32474041#TransformingDSpaceContent(MediaFilters)-AvailableMediaFilters
>>>>
>>>> It's possible that you have only a mediafilter for one file type
>>>> enabled and thus it skips the majority of your files.
>>>>
>>>>
>>>> Regards,
>>>> ~~helix84
>>>>
>>>> Compulsory reading: DSpace Mailing List Etiquette
>>>> https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
>>>
>>>
>>
>>
>> _______________________________________________
>> DSpace-tech mailing list
>> DSpace-tech@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/dspace-tech
>> List Etiquette:
>> https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
