[
http://jira.dspace.org/jira/browse/DS-49?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=10523#action_10523
]
Van Ly commented on DS-49:
--------------------------
[14:12] <stuartlewis> DS-49 - Major/Improvement - Add support for
DjVu-documents - ID: 2234659 - http://jira.dspace.org/jira/browse/DS-49 -
[unassigned / Charles Kiplagat]
[14:12] <canderson34> 0
[14:12] <ClaudiaJuergen> -1 it's up to the admin to provide other media filter
plugins, we should stick to a basic default set
[14:12] <stuartlewis> -1 out of scope (should be in wiki)
[14:12] <fnkepler> 0
[14:12] <mhwood> 0
[14:12] <jat_ysu> 0
[14:13] <bollini> -1
[14:13] <lcs> -1 out of scope, make it an add-on
[14:13] <pketienne> 0
[14:13] <bradmc> One minute: -3 out of scope, mark 'won't fix' on DS-49
> Add support for DjVu-documents - ID: 2234659
> --------------------------------------------
>
> Key: DS-49
> URL: http://jira.dspace.org/jira/browse/DS-49
> Project: DSpace 1.x
> Issue Type: Improvement
> Affects Versions: 1.5.0, 1.5.1, 1.5.2
> Reporter: Charles Kiplagat
>
> Hello All
> This patch is based on
> http://mailman.mit.edu/pipermail/dspace-general/2007-May/001513.html
> In DSpace 1.5.0+ we need to do the following (before compilation):
> 1) Install the djvutxt utility (package djvulibre); on Debian:
> apt-get install djvulibre-bin
> 2) Edit [dspace-source]/dspace/config/dspace.cfg, text block "### Media
> Filter / Format Filter plugins",
> and add DjVu support in 3 places:
> filter.plugins = ... \
> DjVu Text Extractor
> plugin.named.org.dspace.app.mediafilter.FormatFilter = ... \
> org.dspace.app.mediafilter.DjVuFilter = DjVu Text Extractor
> filter.org.dspace.app.mediafilter.DjVuFilter.inputFormats = DjVu
> 3) Edit [dspace-source]/dspace/config/registries/bitstream-formats.xml
> and add the following:
> <bitstream-type>
> <mimetype>image/vnd.djvu</mimetype>
> <short_description>DjVu</short_description>
> <description>DjVu</description>
> <support_level>1</support_level>
> <internal>false</internal>
> <extension>djvu</extension>
> <extension>djv</extension>
> </bitstream-type>
> 4) Create the file
> [dspace-source]/dspace-api/src/main/java/org/dspace/app/mediafilter/DjVuFilter.java
> with the following content:
> /*
> DjVuFilter.java
> Version: 0.1
> DSpace version: 1.4.2 beta
> Author: Ivan Penev
> e-mail: inpenev at gmail.com
> */
> package org.dspace.app.mediafilter;
> import java.io.InputStream;
> import java.io.FileInputStream;
> import java.io.BufferedInputStream;
> import java.io.ByteArrayInputStream;
> import java.io.OutputStream;
> import java.io.FileOutputStream;
> import java.io.BufferedOutputStream;
> import java.io.FileReader;
> import java.io.BufferedReader;
> import java.io.File;
> /**
> * This class provides a media filter for processing files of type DjVu.
> * <p>The current implementation uses a program called
> * <code>djvutxt</code>, which extracts the text layer from a previously
> * OCR-ed DjVu file and saves it into a UTF-8 text document. The program
> * is distributed with the <code>djvulibre</code> package, which is freely
> * available under the GPL license from <a
> * href="http://djvu.sourceforge.net/">http://djvu.sourceforge.net/</a>
> * for both Unix and Windows operating systems. Hence, for the media
> * filter to work, <code>djvutxt</code> must be a valid
> * command (in the working environment).</p>
> */
> public class DjVuFilter extends MediaFilter
> {
> /**
> * Get a filename for a newly created filtered bitstream.
> *
> * @param sourceName
> * name of source bitstream
> * @return filename generated by the filter - for example, document.djvu
> * becomes document.djvu.txt
> */
> public String getFilteredName(String sourceName)
> {
> return sourceName + ".txt";
> }
> /**
> * Get the name of the bundle in which this filter will store its
> * generated bitstreams.
> *
> * @return "TEXT"
> */
> public String getBundleName()
> {
> return "TEXT";
> }
> /**
> * Get name of the bitstream format returned by this filter.
> *
> * @return "Text"
> */
> public String getFormatString()
> {
> return "Text";
> }
> /**
> * Get a string describing the newly-generated bitstream.
> *
> * @return "Extracted text"
> */
> public String getDescription()
> {
> return "Extracted text";
> }
> /**
> * Get a bitstream filled with the extracted text from a DjVu bitstream.
> * <p>The bitstream supplied as a parameter is written to a DjVu
> * file on the file system (in the working directory), and the system
> * command <code>djvutxt</code> is called on the latter to produce a
> * UTF-8 text file containing the extracted text. The file is then copied
> * to a bitstream. Finally, the auxiliary files are removed from the file
> * system, and the generated bitstream is returned as a result.</p>
> * <p>WARNING! Write access to the working directory is needed for
> * this method to operate! No exception handling provided!</p>
> *
> * @param source
> * input stream
> *
> * @return result of filter's transformation, written out to a bitstream
> */
> public InputStream getDestinationStream(InputStream source) throws
> Exception
> {
> /* Some convenience initializations. */
> final String cmd = "djvutxt";
> final String fileName = "aux";
> final String djvuFileName = fileName + ".djvu";
> final String txtFileName = fileName + ".txt";
> /* Store input bitstream to auxiliary DjVu file. */
> File djvuFile = streamToFile(source, djvuFileName);
> /* Invoke external command djvutxt with appropriate arguments
> to do the actual job... */
> final String[] cmdArray = {cmd, djvuFileName, txtFileName};
> Process p = Runtime.getRuntime().exec(cmdArray);
> /* ...and wait for it to terminate */
> p.waitFor();
> /* Copy extracted text from file to an independent bitstream,
> and optionally print the text to standard output. */
> File txtFile = new File(txtFileName);
> InputStream dest = fileToStream(txtFile, MediaFilterManager.isVerbose);
> /* Then remove auxiliary files...*/
> djvuFile.delete();
> txtFile.delete();
> /* ...and return resulting bitstream. */
> return dest;
> }
> /**
> * Write given input stream to a file on the file system.
> * <p>WARNING! No exception handling!</p>
> *
> * @param inStream input stream
> * @param fileName name of the file to be generated
> *
> * @return <code>File</code> object associated with the generated file
> *
> * @throws Exception
> */
> private File streamToFile(InputStream inStream, String fileName)
> throws Exception
> {
> /* Data will be read from input stream in chunks of size e.g. 4KB. */
> final int chunkSize = 4096;
> byte[] byteArray = new byte[chunkSize];
> /* Open the stream for buffered reading. */
> InputStream bufInStream = new BufferedInputStream(inStream);
> /* Create the file if it does not exist yet (createNewFile() does nothing
> when the file already exists; the FileOutputStream opened below
> overwrites any previous contents)... */
> File file = new File(fileName);
> file.createNewFile();
> /* ...and associate a buffered output stream with it. */
> OutputStream bufOutStream = new BufferedOutputStream(new
> FileOutputStream(file));
> /* Copy data from input stream to newly generated file. */
> int readBytes = -1;
> while ((readBytes = bufInStream.read(byteArray, 0, chunkSize)) != -1)
> bufOutStream.write(byteArray, 0, readBytes);
> /* Stop transactions to the file system... */
> bufOutStream.close();
> /* ...and return result. */
> return file;
> }
> /**
> * Produce input stream from a given file on the file system.
> * <p>WARNING! No exception handling!</p>
> *
> * @param file <code>File</code> object associated with the given file
> * @param verbose whether to also print the file contents to standard output
> *
> * @return input stream containing the data read from file
> *
> * @throws Exception
> */
> private InputStream fileToStream(File file, boolean verbose) throws
> Exception
> {
> /* Open the stream for reading. */
> InputStream inStream = new FileInputStream(file);
> /* Allocate necessary memory for data buffer. */
> byte[] byteArray = new byte[(int)file.length()];
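> /* Load file contents into buffer; read() may return fewer bytes than
> requested, so loop until the buffer is full or end of file is reached. */
> int offset = 0;
> int count;
> while (offset < byteArray.length
> && (count = inStream.read(byteArray, offset, byteArray.length - offset)) != -1)
> offset += count;
> /* And immediately close transactions with the file system. */
> inStream.close();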
> /* If required to send the retrieved data to standard output... */
> if (verbose)
> {
> /* Open the file again, but this time handle it as a character stream... */
> BufferedReader bufReader = new BufferedReader(new FileReader(file));
> /* ...then print its contents line by line to the standard output... */
> String lineOfText = null;
> while ((lineOfText = bufReader.readLine()) != null)
> System.out.println(lineOfText);
> /* ...and close connection to the file. */
> bufReader.close();
> }
> /* Finally, generate and return input stream containing desired data. */
> return new ByteArrayInputStream(byteArray);
> }
> }
> 5) Compilation/recompilation
> cd [dspace-source]/dspace/dspace-1.5.0-src-release/dspace/
> mvn package
> 6) Install; or, when recompiling an existing installation, edit the working
> bitstream-formats.xml and dspace.cfg as above and replace
> dspace-api-1.5.0.jar in the folders webapps/jspui/WEB-INF/lib/, lib/,
> webapps/lni/WEB-INF/lib/, webapps/oai/WEB-INF/lib/ and
> webapps/xmlui/WEB-INF/lib/ with the compiled
> [dspace-source]/dspace-api/target/dspace-api-1.5.0.jar
> 7) Don't forget to restart Tomcat and run
> /usr/share/dspace/bin/filter-media
> With best regards
> Serhij Dubyk
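
The quoted filter writes its auxiliary files under the fixed names aux.djvu and aux.txt in the working directory, and its own Javadoc warns that it needs write access there and has no exception handling. A minimal sketch of one way around both points follows. It assumes Java 7 or later (java.nio.file, ProcessBuilder.inheritIO), which is newer than what DSpace 1.5 targeted. The class name DjVuTextSketch and its two methods are hypothetical and are not part of DSpace or of the patch. The sketch uses uniquely named temporary files, checks the exit status of the external command, and cleans up in a finally block. The command is invoked as "command input output", the same argument order the patch uses for djvutxt.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration only; not part of DSpace or the patch.
class DjVuTextSketch
{
    /** Copy a stream into a uniquely named temporary file and return its path. */
    static Path streamToTempFile(InputStream in, String suffix) throws IOException
    {
        Path tmp = Files.createTempFile("djvufilter", suffix);
        // createTempFile already created the file, so allow overwriting it.
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    /**
     * Run "command djvuFile txtFile" and return the bytes written to txtFile.
     * The command name (e.g. "djvutxt") is supplied by the caller.
     */
    static byte[] extractText(String command, InputStream source)
            throws IOException, InterruptedException
    {
        Path djvu = streamToTempFile(source, ".djvu");
        Path txt = Files.createTempFile("djvufilter", ".txt");
        try
        {
            Process p = new ProcessBuilder(command, djvu.toString(), txt.toString())
                    .inheritIO() // let the child share our stdout/stderr so it cannot block
                    .start();
            int status = p.waitFor();
            if (status != 0)
            {
                throw new IOException(command + " exited with status " + status);
            }
            return Files.readAllBytes(txt);
        }
        finally
        {
            // Remove the auxiliary files whether or not the command succeeded.
            Files.deleteIfExists(djvu);
            Files.deleteIfExists(txt);
        }
    }
}
```

Inside getDestinationStream, such a helper could be used as new ByteArrayInputStream(DjVuTextSketch.extractText("djvutxt", source)). Whether this fits the MediaFilter contract of a given DSpace release has not been checked here.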