Re: [Dspace-tech] media filter question
Solved.

In v3.2, bitstreamformatregistry.short_description for mimetype application/pdf is 'Adobe PDF'. However, in my installation (for some long-lost reason) the short_description is simply 'PDF'. Therefore, in MediaFilterManager.java::filterBitstream(), the test at line 556:

    if (fmts.contains(myBitstream.getFormat().getShortDescription()))

never returns true, so no pdf files are ever processed.

As a workaround, in dspace.cfg, I changed

    filter.org.dspace.app.mediafilter.PDFFilter.inputFormats = Adobe PDF

to

    filter.org.dspace.app.mediafilter.PDFFilter.inputFormats = Adobe PDF, PDF

and voila! Everything works. I could just as easily have updated bitstreamformatregistry, but I was wary of breaking something else.

Cheers!
Bill
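For anyone hitting the same symptom, the effect of the mismatch can be reproduced in a few lines. This is only a sketch of the membership test described in the message above, written in Python rather than taken from DSpace: the helper names `parse_input_formats` and `is_filterable` are made up for illustration, and it assumes the comma-separated inputFormats value is split and trimmed before the exact-match comparison.

```python
from __future__ import annotations


def parse_input_formats(value: str) -> list[str]:
    """Split a comma-separated inputFormats value into trimmed names (assumed behaviour)."""
    return [name.strip() for name in value.split(",") if name.strip()]


def is_filterable(short_description: str, input_formats: list[str]) -> bool:
    """Exact, case-sensitive membership test, like a List.contains() check."""
    return short_description in input_formats


# short_description as found in the upgraded installation (stock v3.2 has 'Adobe PDF').
local_short_description = "PDF"

before = parse_input_formats("Adobe PDF")        # original dspace.cfg value
after = parse_input_formats("Adobe PDF, PDF")    # value after the workaround

print(is_filterable(local_short_description, before))  # False
print(is_filterable(local_short_description, after))   # True
```

Because the comparison is an exact string match, 'PDF' and 'Adobe PDF' are unrelated values as far as the filter is concerned, which is why the run completes without errors and simply never touches the pdf bitstreams.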
Re: [Dspace-tech] media filter question
Ivan,

Thanks for checking in...

dspace filter-media returns with exit status 0. The dspace log shows no errors, just entries of the form:

    2013-09-23 10:37:41,012 INFO org.dspace.search.DSIndexer @ Writing Community: 2408/104859 to Index

or:

    2013-09-23 10:37:40,336 INFO org.dspace.search.DSIndexer @ Writing Collection: 2408/55874 to Index

The output from the command line is short. Normally, I would expect to see a log of each bitstream examined beginning with 'FILTERED' or 'SKIPPED'. Instead I see only a few errors for .doc files (Invalid Format) followed by a couple of SKIPPED entries for bitstreams with an existing .txt file.

All the .pdf files are in the ORIGINAL bundle. For instance:

    dspace=> select * from item2bundle where item_id = 34950;
    -[ RECORD 1 ]
    id        | 39982
    item_id   | 34950
    bundle_id | 39983
    -[ RECORD 2 ]
    id        | 39983
    item_id   | 34950
    bundle_id | 39984

    dspace=> select * from bundle where bundle_id in ( 39983, 39984 );
    -[ RECORD 1 ]+-
    bundle_id            | 39983
    name                 | LICENSE
    primary_bitstream_id |
    -[ RECORD 2 ]+-
    bundle_id            | 39984
    name                 | ORIGINAL
    primary_bitstream_id |

    dspace=> select * from bundle2bitstream where bundle_id = 39984;
    -[ RECORD 1 ]---+--
    id              | 40042
    bundle_id       | 39984
    bitstream_id    | 40065
    bitstream_order | 2

    dspace=> select * from bitstream where bitstream_id = 40065;
    -[ RECORD 1 ]---+
    bitstream_id            | 40065
    bitstream_format_id     | 3
    name                    | 8175706.pdf
    size_bytes              | 6587102
    checksum                | 164de17195af1d0de45cd17a431fc2b9
    checksum_algorithm      | MD5
    description             |
    user_format_description |
    source                  | /dspace/assetstore/dspace-sr/upload/8175706.pdf
    internal_id             | 104968051252620967298398595849898250327
    deleted                 | f
    store_number            | 0
    sequence_id             | 2

This bitstream, however, is neither FILTERED nor SKIPPED. This database has been recently updated from v1.42 to v3, and I suspect the problem is somewhere in the db rather than a bug in the code, but everything *looks* right to me. I can trace the relations from the community to collection to item, but for some reason the bitstreams are simply not checked.

What do you think?
Bill
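The four separate lookups in the message above can be collapsed into a single join, which also makes it easy to pull in the format registry entry for each bitstream. The sketch below runs against a toy in-memory SQLite database that copies only the columns needed from the tables shown in the thread; a real DSpace database runs on PostgreSQL or Oracle and has more columns. The row inserted into bitstreamformatregistry is an assumption for illustration, since the thread does not show that table's contents for format id 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item2bundle (id INTEGER, item_id INTEGER, bundle_id INTEGER);
    CREATE TABLE bundle (bundle_id INTEGER, name TEXT);
    CREATE TABLE bundle2bitstream (id INTEGER, bundle_id INTEGER, bitstream_id INTEGER);
    CREATE TABLE bitstream (bitstream_id INTEGER, bitstream_format_id INTEGER, name TEXT);
    CREATE TABLE bitstreamformatregistry (
        bitstream_format_id INTEGER, mimetype TEXT, short_description TEXT);

    -- Values copied from the psql output in the thread.
    INSERT INTO item2bundle VALUES (39982, 34950, 39983), (39983, 34950, 39984);
    INSERT INTO bundle VALUES (39983, 'LICENSE'), (39984, 'ORIGINAL');
    INSERT INTO bundle2bitstream VALUES (40042, 39984, 40065);
    INSERT INTO bitstream VALUES (40065, 3, '8175706.pdf');
    -- Assumed registry row, for illustration only.
    INSERT INTO bitstreamformatregistry VALUES (3, 'application/pdf', 'PDF');
""")

# Trace item -> bundle -> bitstream -> format in one query.
rows = conn.execute(
    """
    SELECT b.name, bs.name, f.short_description
    FROM item2bundle i2b
    JOIN bundle b ON b.bundle_id = i2b.bundle_id
    JOIN bundle2bitstream b2b ON b2b.bundle_id = b.bundle_id
    JOIN bitstream bs ON bs.bitstream_id = b2b.bitstream_id
    JOIN bitstreamformatregistry f
      ON f.bitstream_format_id = bs.bitstream_format_id
    WHERE i2b.item_id = ?
    """,
    (34950,),
).fetchall()

for bundle_name, bitstream_name, short_description in rows:
    print(bundle_name, bitstream_name, short_description)
```

The last column is the value worth comparing against the inputFormats settings in dspace.cfg, since the bundle and bitstream relations alone can look correct while the format name does not match.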
Re: [Dspace-tech] media filter question
Hi Bill,

please remember to keep dspace-tech in CC. Can you please tell me what the result of each of my suggestions was?

1) What was the errorlevel of your filter-media command?
2) Did you look at the log while it was running using tail -f?
3) Were all the bitstreams you expected to be filtered in the ORIGINAL bundle? (check at least a few)

On Fri, Sep 20, 2013 at 10:09 PM, Bill Tantzen wile...@gmail.com wrote:
> Hi Ivan!
> I've tried all these suggestions, and still, no success. There are no errors in the log, only entries of the form:
>
>     2013-09-20 15:00:24,802 INFO org.dspace.search.DSIndexer @ Writing Community: 2408/36293 to Index
>
> And
>
>     2013-09-20 15:00:17,990 INFO org.dspace.search.DSIndexer @ Writing Collection: 2408/35292 to Index
>
> One for each community and collection.
>
> The bundles are ORIGINAL, nothing special here... The database seems OK, I am able to follow the communities to collections to items just fine, but no bitstreams are being filtered.
>
> I'll keep debugging on my end, but if you have any other ideas, do pass them my way!
> Bill

Regards,
~~helix84

Compulsory reading: DSpace Mailing List Etiquette
https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette

___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
Re: [Dspace-tech] media filter question
Still working on this media filter issue -- maybe this might point me in the right direction: how are bitstreams selected for filtering? Is it something like

    SELECT * FROM bitstream WHERE ???

What is in the WHERE clause? Or is there some other basis for selection?

Thanks,
Bill
Re: [Dspace-tech] media filter question
Bill,

When you view an item as an admin, you should be able to see the txt file created from the pdf file. I suppose you can see these for the pdf files media-filter actually got to, but not for the others, right? I also wonder if media filter choked along the way, but you said you did not get any error messages. What about in the logs? Look at some items as admin and see if this gives you any clue.

-Jose
Re: [Dspace-tech] media filter question
One more thing. Do this:

    ./dspace filter-media -h

and see what is available. I have version 3 so I'm not sure what is in your version, but mine has these options, and one of them (-i) is to process only a particular item, so you could try that and see what happens.

    ./dspace filter-media -h
    usage: MediaFilterManager
     -p,--plugins      ONLY run the specified Media Filter plugin(s)
                       listed from 'filter.plugins' in dspace.cfg.
                       Separate multiple with a comma (,)
                       (e.g. MediaFilterManager -p Word Text Extractor,PDF Text Extractor)
     -s,--skip         SKIP the bitstreams belonging to identifier
                       Separate multiple identifiers with a comma (,)
                       (e.g. MediaFilterManager -s 123456789/34,123456789/323)
     -f,--force        force all bitstreams to be processed
     -h,--help         help
     -i,--identifier   ONLY process bitstreams belonging to identifier
     -m,--maximum      process no more than maximum items
     -n,--noindex      do NOT update the search index after filtering bitstreams
     -q,--quiet        do not print anything except in the event of errors.
     -v,--verbose      print all extracted text and other details to STDOUT
Re: [Dspace-tech] media filter question
Hi Bill,

Jose's suggestion to look at the logs for errors is a good one. First of all, we should determine whether the filtering failed while processing some item or whether it completed with nothing else to process. Also check the errorlevel of the command. 1 means error, 0 means success.

On Thu, Sep 19, 2013 at 3:03 PM, Bill Tantzen wile...@gmail.com wrote:
> Still working on this media filter issue -- maybe this might point me in the right direction: how are bitstreams selected for filtering? Is it something like
>     SELECT * FROM bitstream WHERE ???
> What is in the WHERE clause? Or is there some other basis for selection?

No, it's not SQL. It's a recursive call down the hierarchy, as you can see in this method and the few following it: [1]

However, your WHERE suggestion got me thinking about which bitstreams are being processed, and the answer is bitstreams in the ORIGINAL bundle. So please check that your content bundles are called ORIGINAL and not something else (e.g. THUMBNAIL or something custom).

[1] https://github.com/DSpace/DSpace/blob/dspace-3.2/dspace-api/src/main/java/org/dspace/app/mediafilter/MediaFilterManager.java#L393
[2] https://github.com/DSpace/DSpace/blob/dspace-3.2/dspace-api/src/main/java/org/dspace/app/mediafilter/MediaFilterManager.java#L502

Regards,
~~helix84
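The selection described above (a recursive walk down the hierarchy that only looks inside bundles named ORIGINAL) can be pictured roughly as follows. This is a sketch with made-up stand-in classes, not the DSpace API; the class names, fields, the function name `candidate_bitstreams`, and the order in which subcommunities and collections are visited are all assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Bundle:
    name: str
    bitstreams: list[str] = field(default_factory=list)


@dataclass
class Item:
    bundles: list[Bundle] = field(default_factory=list)


@dataclass
class Collection:
    items: list[Item] = field(default_factory=list)


@dataclass
class Community:
    subcommunities: list[Community] = field(default_factory=list)
    collections: list[Collection] = field(default_factory=list)


def candidate_bitstreams(community: Community) -> Iterator[str]:
    """Yield bitstream names found in ORIGINAL bundles, recursing into subcommunities."""
    for sub in community.subcommunities:
        yield from candidate_bitstreams(sub)
    for collection in community.collections:
        for item in collection.items:
            for bundle in item.bundles:
                if bundle.name == "ORIGINAL":  # other bundles are never examined
                    yield from bundle.bitstreams


tree = Community(
    subcommunities=[
        Community(collections=[Collection(items=[Item(bundles=[
            Bundle("ORIGINAL", ["a.pdf"]),
            Bundle("TEXT", ["a.pdf.txt"]),
        ])])])
    ],
    collections=[Collection(items=[Item(bundles=[
        Bundle("LICENSE", ["license"]),
        Bundle("ORIGINAL", ["8175706.pdf"]),
    ])])],
)

print(list(candidate_bitstreams(tree)))  # ['a.pdf', '8175706.pdf']
```

Nothing in this walk depends on a database query, so a bitstream can be reachable through every relation and still be passed over if a later per-bitstream check (such as the format test) rejects it.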
Re: [Dspace-tech] media filter question
Here's a snip from my dspace.cfg:

    #Names of the enabled MediaFilter or FormatFilter plugins
    filter.plugins = \
        PDF Text Extractor, \
        PDF Thumbnail, \
        HTML Text Extractor, \
        Word Text Extractor, \
        JPEG Thumbnail, \
        Branded Preview JPEG, \
        PowerPoint Text Extractor

    # [To enable Branded Preview]: remove last line above, and uncomment 2 lines below
    #Word Text Extractor, JPEG Thumbnail, \
    #Branded Preview JPEG

    #Assign 'human-understandable' names to each filter
    plugin.named.org.dspace.app.mediafilter.FormatFilter = \
        org.dspace.app.mediafilter.XPDF2Text = PDF Text Extractor, \
        org.dspace.app.mediafilter.XPDF2Thumbnail = PDF Thumbnail, \
        org.dspace.app.mediafilter.HTMLFilter = HTML Text Extractor, \
        org.dspace.app.mediafilter.WordFilter = Word Text Extractor, \
        org.dspace.app.mediafilter.JPEGFilter = JPEG Thumbnail, \
        org.dspace.app.mediafilter.BrandedPreviewJPEGFilter = Branded Preview JPEG, \
        org.dspace.app.mediafilter.PowerPointFilter = PowerPoint Text Extractor

Specifically, I *think* the pdf filter should be enabled... As I said, the majority of the files are .pdf...

Bill
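One quick sanity check on a configuration like this is to confirm that every name listed in filter.plugins also appears as a name in the plugin.named mapping. The sketch below does that in Python with a deliberately simplified parser: it only handles comments, blank lines, and backslash continuations, and is not a full java.util.Properties implementation. The helper names `logical_lines` and `parse` are invented for illustration, and the sample is a shortened copy of the config above.

```python
from __future__ import annotations

SAMPLE = r"""
#Names of the enabled MediaFilter or FormatFilter plugins
filter.plugins = \
    PDF Text Extractor, \
    Word Text Extractor

#Assign 'human-understandable' names to each filter
plugin.named.org.dspace.app.mediafilter.FormatFilter = \
    org.dspace.app.mediafilter.XPDF2Text = PDF Text Extractor, \
    org.dspace.app.mediafilter.WordFilter = Word Text Extractor
"""


def logical_lines(text: str) -> list[str]:
    """Join backslash-continued lines; skip blank and comment lines between entries."""
    lines, buffer = [], ""
    for raw in text.splitlines():
        line = raw.strip()
        if not buffer and (not line or line.startswith("#")):
            continue
        if line.endswith("\\"):
            buffer += line[:-1].strip() + " "
        else:
            lines.append(buffer + line)
            buffer = ""
    return lines


def parse(text: str) -> dict[str, str]:
    """Split each logical line at the first '=' into key and value."""
    props = {}
    for line in logical_lines(text):
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


props = parse(SAMPLE)
enabled = [name.strip() for name in props["filter.plugins"].split(",")]

# Each mapping entry has the form "<class> = <human-readable name>".
name_to_class = {}
for entry in props["plugin.named.org.dspace.app.mediafilter.FormatFilter"].split(","):
    cls, _, name = entry.partition("=")
    name_to_class[name.strip()] = cls.strip()

missing = [name for name in enabled if name not in name_to_class]
print(missing)  # []
```

An empty list here only shows that the plugin names line up; it says nothing about whether each filter's inputFormats setting matches the format names stored in the database.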
Re: [Dspace-tech] media filter question
Hi Bill,

check your configuration to see which media filters you actually have enabled:

https://wiki.duraspace.org/pages/viewpage.action?pageId=32474041#TransformingDSpaceContent(MediaFilters)-AvailableMediaFilters

It's possible that you have only a mediafilter for one file type enabled and thus it skips the majority of your files.

Regards,
~~helix84