DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG-RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT <http://issues.apache.org/bugzilla/show_bug.cgi?id=35208>. ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND INSERTED IN THE BUG DATABASE.
http://issues.apache.org/bugzilla/show_bug.cgi?id=35208

           Summary: [PATCH] HSLF Update: new (quicker but greedy) text extractor
           Product: POI
           Version: unspecified
          Platform: Other
        OS/Version: other
            Status: NEW
          Severity: normal
          Priority: P2
         Component: POI Overall
        AssignedTo: poi-dev@jakarta.apache.org
        ReportedBy: [EMAIL PROTECTED]

To quote from the javadoc of this single class:

 * This class will get all the text from a Powerpoint Document, including
 *  all the bits you didn't want, and in a somewhat random order, but will
 *  do it very fast.
 * The class ignores most of the hslf classes, and doesn't use
 *  HSLFSlideShow. Instead, it just does a very basic scan through the
 *  file, grabbing all the text records as it goes. It then returns the
 *  text, either as a single string, or as a vector of all the individual
 *  strings.
 * Because of how it works, it will return a lot of "crud" text that you
 *  probably didn't want! It will return text from master slides. It will
 *  return duplicate text, and some mangled text (powerpoint files often
 *  have duplicate copies of slide text in them). You don't get any idea
 *  what the text was associated with.
 * Almost everyone will want to use @see PowerPointExtractor instead. There
 *  are only a very small number of cases (eg some performance sensitive
 *  lucene indexers) that would ever want to use this!

File should go in org.apache.poi.hslf.extractor.
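
For readers who want a feel for the "very basic scan" the javadoc describes, here is a rough sketch. It is not the code attached to this bug. It assumes you already hold the raw bytes of the "PowerPoint Document" stream (getting them out of the OLE2 file is not shown), that every record starts with an 8 byte little-endian header (2 bytes version/instance, 2 bytes type, 4 bytes length), and that text lives in TextCharsAtom (type 4000, UTF-16LE) and TextBytesAtom (type 4008, one byte per character) records. The class and method names (GreedyTextScan, extractText) are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class GreedyTextScan {
    // Record type ids, assumed from the PowerPoint binary format
    static final int TEXT_CHARS_ATOM = 4000; // text stored as UTF-16LE
    static final int TEXT_BYTES_ATOM = 4008; // text stored as one byte per character

    /**
     * Walks the record headers front to back and collects every text atom
     * it meets. No attempt is made to work out which slide the text is from.
     */
    public static List<String> extractText(byte[] stream) {
        List<String> found = new ArrayList<>();
        int pos = 0;
        while (pos + 8 <= stream.length) {
            int version = stream[pos] & 0x0F;
            int type = (stream[pos + 2] & 0xFF) | ((stream[pos + 3] & 0xFF) << 8);
            long len = readUInt32(stream, pos + 4);

            if (version == 0x0F) {
                // Container record: skip only its header, so that the
                // child records inside it are visited next
                pos += 8;
                continue;
            }

            long end = pos + 8 + len;
            if (end > stream.length) {
                break; // truncated or corrupt record, stop scanning
            }
            if (type == TEXT_CHARS_ATOM) {
                found.add(new String(stream, pos + 8, (int) len, StandardCharsets.UTF_16LE));
            } else if (type == TEXT_BYTES_ATOM) {
                found.add(new String(stream, pos + 8, (int) len, StandardCharsets.ISO_8859_1));
            }
            pos = (int) end; // atom record: jump over its whole body
        }
        return found;
    }

    /** Reads an unsigned 32 bit little-endian value. */
    static long readUInt32(byte[] b, int off) {
        return (b[off] & 0xFFL)
             | ((b[off + 1] & 0xFFL) << 8)
             | ((b[off + 2] & 0xFFL) << 16)
             | ((b[off + 3] & 0xFFL) << 24);
    }
}
```

Because the scan never builds the record tree, it has no way to tell slide text from master slide text or to drop duplicates, which is where the "crud" mentioned in the javadoc comes from.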
Also needs a single line change in org.apache.poi.hslf.record.Record:

Index: Record.java
===================================================================
RCS file: /home/cvspublic/jakarta-poi/src/scratchpad/src/org/apache/poi/hslf/record/Record.java,v
retrieving revision 1.1
diff -u -r1.1 Record.java
--- Record.java 28 May 2005 05:36:00 -0000      1.1
+++ Record.java 3 Jun 2005 16:31:00 -0000
@@ -122,7 +122,7 @@
         * (not including the size of the header), this code assumes you're
         *  passing in corrected lengths
         */
-       protected static Record createRecordForType(long type, byte[] b, int start, int len) {
+       public static Record createRecordForType(long type, byte[] b, int start, int len) {
                // Default is to use UnknownRecordPlaceholder
                // When you create classes for new Records, add them here
                switch((int)type) {