Re: doc on hpsf thumbnails for macintosh
Hi Craig,

Very interesting. I can vouch for most of this from a PICT generator I wrote years ago in Fortran.

Other resolutions than 72 dpi are possible. My code also produces 300 dpi PICTs.

It's a pretty nice drawing file format, but I would not compare it with SVG - it's from the original Mac and more like WMF. Also, you must be really aware of your raster, particularly if you are aligning characters at a small font size.

The only correction I found - the 512 byte null header should have non-null content in the first 8 bytes.

6 bytes - 'PICTMD'
2 bytes - integer value [00 06]

But perhaps Office doesn't care about that and the Mac's Clipboard handles that.

Regards,
Dave

On Oct 6, 2010, at 3:01 PM, Craig Stires wrote:

Hi dev team,

This is a bit of a long email, but I wanted to pass on the research that I've been doing, and some recommendations for changes to the HPSF thumbnailing API.

I have needed to extract thumbnails from a set of Microsoft Office docs. They have been produced on Windows, and on Mac. The existing org.apache.poi.hpsf.Thumbnail class handles the Windows case (CFTAG_WINDOWS CF_METAFILEPICT). However, it does not handle the Macintosh case (CFTAG_MACINTOSH CF_MACQD).

The Macintosh thumbnails are stored in QuickDraw format (extended version 2). This is the Mac-proprietary SVG equivalent. The thumbnail has a marker at the beginning of the clipboard data, "PICT". It needs to be replaced with 512 null bytes.

References:
http://www.fileformat.info/format/macpict/egff.htm
http://developer.apple.com/legacy/mac/library/documentation/mac/QuickDraw/QuickDraw-462.html#HEADING462-0

I have managed to create readable files, after a bit of manipulation of the clipboard data. Here is the high-level process for getting a file in a valid format.

Overview of extraction steps

01. Get the summary information from the document (\005SummaryInformation)
02. Get the thumbnail object from summary information
03. Get the clipboard format tag from the thumbnail object
04. Confirm that cftag==CFTAG_MACINTOSH
05. Get the thumbnail data from the thumbnail object
06. Confirm that substr(thumbdata,Thumbnail.OFFSET_CF,"PICT".length())=="PICT"
07. Create a byte array with a 512-byte x00 header
08. Append the byte array with substr(thumbdata, Thumbnail.OFFSET_CF + "PICT".length(), thumbdata.length() - Thumbnail.OFFSET_CF - "PICT".length())
09. Return the byte array, or write to file (extension PICT, PCT, or PIC; MIME type image/x-pict)

Specifications of the Macintosh clipboard formats

4 byte (ascii) - clipboard data format [PICT]
2 byte - picture size (byte count)
8 byte - bounding rectangle of picture [ x1 y1 x2 y2 ]
2 byte - VersionOp opcode [00 11]
2 byte - Version opcode [02 FF]
2 byte - Header opcode [0C 00]
24 byte - header information
  - 2 byte - picture version ( -1 = version 2 ; -2 = extended version 2 )
  - 2 byte - reserved (unused) [ 00 00 ]
  - 4 byte - horizontal res [ 00 48 00 00 = 72 dpi ]
  - 4 byte - vertical res [ 00 48 00 00 = 72 dpi ]
  - 8 byte - source rectangle of picture [ x1 y1 x2 y2 ]
  - 2 byte - reserved (unused) [ 00 00 ]
  - 2 byte - reserved (unused) [ 00 00 ]

Recommendations for change to org.apache.poi.hpsf.Thumbnail

    public static int CF_MACQD = 15;
    public static int OFFSET_MACQDDATA = 12;
    private static String TAG_MACQD = "PICT";

    public long getClipboardFormat() throws HPSFException
    {
        long clipboardformat = 0;
        if (getClipboardFormatTag() == CFTAG_WINDOWS)
        {
            clipboardformat = LittleEndian.getUInt(getThumbnail(), OFFSET_CF);
        }
        else if (getClipboardFormatTag() == CFTAG_MACINTOSH)
        {
            String cftype = new String(getThumbnail(), Thumbnail.OFFSET_CF, TAG_MACQD.length());
            if (cftype.matches(TAG_MACQD))
            {
                clipboardformat = CF_MACQD;
            }
            else
            {
                throw new HPSFException("Clipboard Format Tag of Thumbnail must be " +
                                        TAG_MACQD + " for CFTAG_MACINTOSH");
            }
        }
        else
        {
            throw new HPSFException("Clipboard Format Tag of Thumbnail must be " +
                                    "CFTAG_WINDOWS or CFTAG_MACINTOSH");
        }
        return clipboardformat;
    }

    public byte[] getThumbnailAsPICT() throws HPSFException
    {
        if (!(getClipboardFormatTag() == CFTAG_MACINTOSH))
            throw new HPSFException("Clipboard Format Tag of Thumbnail must " +
                                    "be CFTAG_MACINTOSH.");
        if (!(getClipboardFormat() == CF_MACQD))
            throw new HPSFException("Clipboard Format of Thumbnail must " +
                                    "be CF_MACQD.");
        else
        {
            byte[] thumbnail = getThumbnail();
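
Steps 06 to 08 of the extraction overview can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not the proposed patch: the class name MacThumbnailSketch and both method names are hypothetical, the offset of the "PICT" marker is passed in by the caller rather than taken from Thumbnail.OFFSET_CF, and only JDK classes are used. The second helper shows the arithmetic behind the "00 48 00 00 = 72 dpi" entry in the specification, reading the resolution as a 16.16 fixed-point number.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, not part of POI.
class MacThumbnailSketch {
    // Length of the header that precedes the picture data in a PICT file.
    static final int PICT_FILE_HEADER_LEN = 512;
    // ASCII marker expected at the start of the Macintosh clipboard data.
    static final byte[] TAG = "PICT".getBytes(StandardCharsets.US_ASCII);

    // Steps 06-08: check the marker at offsetCf, then return 512 zero bytes
    // followed by everything after the marker.
    static byte[] toPictFile(byte[] thumbData, int offsetCf) {
        int pictureStart = offsetCf + TAG.length;
        if (thumbData.length < pictureStart) {
            throw new IllegalArgumentException("thumbnail data too short");
        }
        byte[] tag = Arrays.copyOfRange(thumbData, offsetCf, pictureStart);
        if (!Arrays.equals(tag, TAG)) {
            throw new IllegalArgumentException("no PICT marker at offset " + offsetCf);
        }
        int pictureLen = thumbData.length - pictureStart;
        // A new byte array is zero-filled, which gives the null header for free.
        byte[] out = new byte[PICT_FILE_HEADER_LEN + pictureLen];
        System.arraycopy(thumbData, pictureStart, out, PICT_FILE_HEADER_LEN, pictureLen);
        return out;
    }

    // Resolution fields are 16.16 fixed point: the high 16 bits are the integer part.
    static double fixedToDpi(int raw) {
        return raw / 65536.0;
    }
}
```

With this reading, 0x00480000 decodes to 72 dpi, and a 300 dpi picture such as the ones Dave mentions would carry 0x012C0000 in the same field.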
DO NOT REPLY [Bug 49908] Add API for processing of symbols
https://issues.apache.org/bugzilla/show_bug.cgi?id=49908

--- Comment #3 from a6537...@bofthew.com 2010-10-07 02:27:24 EDT ---
Created an attachment (id=26130)
-- (https://issues.apache.org/bugzilla/attachment.cgi?id=26130)
Patch to support symbols (with testcase)

I added the methods to CharacterRun, as processing of symbols is directly associated with a particular character run and the processing does not need other information (as is the case with pictures). Methods are documented.

--
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.

-
To unsubscribe, e-mail: dev-unsubscr...@poi.apache.org
For additional commands, e-mail: dev-h...@poi.apache.org
DO NOT REPLY [Bug 49908] Add API for processing of symbols
https://issues.apache.org/bugzilla/show_bug.cgi?id=49908

--- Comment #4 from a6537...@bofthew.com 2010-10-07 02:27:45 EDT ---
Created an attachment (id=26131)
-- (https://issues.apache.org/bugzilla/attachment.cgi?id=26131)
New files not included in the diff
DO NOT REPLY [Bug 49908] Add API for processing of symbols
https://issues.apache.org/bugzilla/show_bug.cgi?id=49908

a6537...@bofthew.com changed:

What    |Removed   |Added
Status  |NEEDINFO  |NEW
DO NOT REPLY [Bug 50052] New: Add support for formating of list numbers
https://issues.apache.org/bugzilla/show_bug.cgi?id=50052

Summary: Add support for formating of list numbers
Product: POI
Version: 3.7-dev
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: HWPF
AssignedTo: dev@poi.apache.org
ReportedBy: a6537...@bofthew.com

The LVLF structure contains both PAPX and CHPX. PAPX contains additional SPRMs to be applied to the paragraph; CHPX contains SPRMs for formatting the list number.

I will upload a patch that modifies the ListEntry class: in the constructor the internal _props variable will be modified so that the SPRMs from ListLevel are applied first, then the SPRMs from Paragraph.

Additionally, a getter for NumberProperties will be added to ListLevel.
[GUMP@vmgump]: Project poi (in module poi) failed
To whom it may engage...

This is an automated request, but not an unsolicited one. For more information please visit http://gump.apache.org/nagged.html, and/or contact the folk at gene...@gump.apache.org.

Project poi has an issue affecting its community integration.
This issue affects 3 projects.
The current state of this project is 'Failed', with reason 'Missing Build Outputs'.
For reference only, the following projects are affected by this:
- org.apache.poi : POI
- poi : POI
- poi-test : POI

Full details are available at:
http://vmgump.apache.org/gump/public/poi/poi/index.html

That said, some information snippets are provided here.

The following annotations (debug/informational/warning/error messages) were provided:
-INFO- Failed with reason missing build outputs
-ERROR- Missing Output: /srv/gump/public/workspace/poi/build/dist/poi-contrib-gump-07102010.jar
-ERROR- See Directory Listing Work for Missing Outputs
-DEBUG- Extracted fallback artifacts from Gump Repository

The following work was performed:
http://vmgump.apache.org/gump/public/poi/poi/gump_work/build_poi_poi.html
Work Name: build_poi_poi (Type: Build)
Work ended in a state of : Success
Elapsed: 2 mins 46 secs
Command Line: /usr/lib/jvm/java-6-openjdk/bin/java -Djava.awt.headless=true -Xbootclasspath/p:/srv/gump/public/workspace/xml-commons/java/external/build/xml-apis.jar:/srv/gump/public/workspace/xml-xerces2/build/xercesImpl.jar:/srv/gump/public/workspace/xml-xalan/build/xalan-unbundled.jar:/srv/gump/public/workspace/xml-xalan/build/serializer.jar org.apache.tools.ant.Main -Dgump.merge=/srv/gump/public/gump/work/merge.xml -Dbuild.sysclasspath=only -Dversion.id=gump -DDSTAMP=07102010 jar
[Working Directory: /srv/gump/public/workspace/poi]
CLASSPATH:
/usr/lib/jvm/java-6-openjdk/lib/tools.jar:/srv/gump/public/workspace/poi/ooxml-lib/openxml4j-1.0-beta.jar:/srv/gump/public/workspace/poi/build/classes:/srv/gump/public/workspace/poi/build/contrib-classes:/srv/gump/public/workspace/poi/build/scratchpad-classes:/srv/gump/public/workspace/poi/build/ooxml-classes:/srv/gump/public/workspace/poi/build/test-classes:/srv/gump/public/workspace/poi/build/scratchpad-test-classes:/srv/gump/public/workspace/poi/build/ooxml-test-classes:/srv/gump/public/workspace/ant/dist/lib/ant.jar:/srv/gump/public/workspace/ant/dist/lib/ant-launcher.jar:/srv/gump/public/workspace/ant/dist/lib/ant-jmf.jar:/srv/gump/public/workspace/ant/dist/lib/ant-junit.jar:/srv/gump/public/workspace/ant/dist/lib/ant-swing.jar:/srv/gump/public/workspace/ant/dist/lib/ant-apache-resolver.jar:/srv/gump/public/workspace/ant/dist/lib/ant-apache-xalan2.jar:/srv/gump/public/workspace/logging-log4j-12/dist/lib/log4j-07102010.jar:/srv/gump/public/workspace/apache-commons/logging/target/commons-logging-07102010.jar:/srv/gump/public/workspace/apache-commons/logging/target/commons-logging-api-07102010.jar:/srv/gump/public/workspace/apache-commons/beanutils/dist/commons-beanutils-07102010.jar:/srv/gump/public/workspace/commons-collections-3.x/target/commons-collections-3.3-SNAPSHOT.jar:/srv/gump/public/workspace/commons-lang-2.x/target/commons-lang-2.6-SNAPSHOT.jar:/srv/gump/public/workspace/junit/dist/junit-07102010.jar:/srv/gump/public/workspace/junit/dist/junit-dep-07102010.jar:/srv/gump/public/workspace/xml-commons/java/external/build/xml-apis-ext.jar:/srv/gump/public/workspace/dom4j/build/dom4j.jar:/srv/gump/public/workspace/poi/ooxml-lib/geronimo-stax-api_1.0_spec-1.0.jar:/srv/gump/public/workspace/poi/ooxml-lib/xmlbeans-2.3.0.jar:/srv/gump/public/workspace/poi/ooxml-lib/ooxml-schemas-1.1.jar
-
compile-examples:
[javac] Compiling 109 source files to /srv/gump/public/workspace/poi/build/examples-classes
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
[copy] Copying 1 file to /srv/gump/public/workspace/poi/build/examples-classes

compile:

compile-ooxml-lite:
[java] Collecting unit tests from /srv/gump/public/workspace/poi/build/ooxml-test-classes
[java] .
[java] .
[java] .
[java] .
[java] .
[java] .
[java] .
[java] .
[java] .
[java] .
[java] .
[java] .
[java] ...WARNING: DateFormatTests.xlsx: Flag AllColors = false [not true]
[java] WARNING: DateFormatTests.xlsx: Flag Categories = Debug [not ]
[java] ..
[java]
DO NOT REPLY [Bug 50052] Add support for formating of list numbers
https://issues.apache.org/bugzilla/show_bug.cgi?id=50052

--- Comment #1 from a6537...@bofthew.com 2010-10-07 04:56:05 EDT ---
Created an attachment (id=26133)
-- (https://issues.apache.org/bugzilla/attachment.cgi?id=26133)
Patch to support list number format (with testcase)

The patch contains one inconsistency, but I don't know how to handle it. See the FIXME comment in the test and in the ListEntry class.
DO NOT REPLY [Bug 50052] Add support for formating of list numbers
https://issues.apache.org/bugzilla/show_bug.cgi?id=50052

--- Comment #2 from a6537...@bofthew.com 2010-10-07 04:56:26 EDT ---
Created an attachment (id=26134)
-- (https://issues.apache.org/bugzilla/attachment.cgi?id=26134)
Added files for the patch
Re: how to add functions to POI
On 10/6/2010 8:10 AM, Jon Svede wrote:

I am willing to take a stab at implementing this. I think I can do the properties file part on my own based on what you've provided below. In this case, I assume it will search for a specific file name? Or should it be something like a -D argument passed in? Or maybe both? (a default file name and property to override it?)

A system property set via -D is OK for a start. Let's stick to the name poi.udf.properties.

The error message for unknown functions sounds a little more involved, can you give me some ideas of where to look to understand the intricacies?

It requires some research. I will post my ideas later, when I have a minute to look into it.

Yegor

Jon

- Original Message
From: Yegor Kozlov ye...@dinom.ru
To: dev@poi.apache.org
Sent: Tue, October 5, 2010 7:02:19 AM
Subject: Re: how to add functions to POI

As far as patching it, I don't think it *needs* a patch. One suggestion might be that when POI encounters a custom function and it throws an exception, the comments of the exception point users to the UDFFinder docs.

This sounds like a good idea. If the formula evaluator stumbles on an unknown function then the exception should suggest a workaround with UDF. There can be tricky details - we need to tell unknown defined names from unknown functions, this detection should work transparently for .xls and .xlsx formats, etc. But anyway, I like the idea.

Your idea with configuration of UDFs also makes sense but I don't want to make it too complicated. POI does not depend on Spring and I don't think we will add a new dependency just for configuration of UDFs. So, Java properties is the way to go.

In the current implementation, if a UDFFinder is not specified then a default instance is used, see UDFFinder.java:

    public static final UDFFinder DEFAULT = new AggregatingUDFFinder(AnalysisToolPak.instance);

I think it should be re-written into something like this

    public static final UDFFinder DEFAULT = AggregatingUDFFinder.getInstance();

where getInstance() will search for a configuration file (system property or classpath) and programmatically register UDFs, just like in the test case.

This way future versions of POI will be compatible with existing user code. If support for custom VBA functions is needed then all the user needs to do is to implement the VBA code as a FreeRefFunction and register it in the configuration file.

Yegor
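
As a rough illustration of the configuration idea discussed above, the lookup could be sketched like this. Everything here is an assumption made for the example: the class name UdfConfigSketch, the method loadUdfs(), and the file layout (one FUNCTION_NAME=fully.qualified.ClassName entry per line) are hypothetical, and the sketch deliberately uses only JDK classes, so it returns plain objects instead of registering FreeRefFunction instances with a UDFFinder. Only the property name poi.udf.properties comes from the thread.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper class, not part of POI.
class UdfConfigSketch {
    // System property naming the configuration file, e.g. -Dpoi.udf.properties=/path/to/file
    static final String CONFIG_PROPERTY = "poi.udf.properties";

    // Reads "FUNCTION_NAME=fully.qualified.ClassName" entries and instantiates
    // each class through its no-argument constructor.
    static Map<String, Object> loadUdfs() throws IOException, ReflectiveOperationException {
        Map<String, Object> udfs = new TreeMap<>();
        String path = System.getProperty(CONFIG_PROPERTY);
        if (path == null) {
            return udfs; // nothing configured: the caller keeps its built-in defaults
        }
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            props.load(in);
        }
        for (String functionName : props.stringPropertyNames()) {
            // Properties.load keeps trailing whitespace in values, so trim the class name.
            String className = props.getProperty(functionName).trim();
            Class<?> impl = Class.forName(className);
            udfs.put(functionName, impl.getDeclaredConstructor().newInstance());
        }
        return udfs;
    }
}
```

In a real getInstance() the resulting objects would presumably be cast to FreeRefFunction and handed to the finder alongside AnalysisToolPak.instance; that wiring is left out because its exact shape was still under discussion.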
DO NOT REPLY [Bug 49908] Add API for processing of symbols
https://issues.apache.org/bugzilla/show_bug.cgi?id=49908

Yegor Kozlov ye...@dinom.ru changed:

What        |Removed  |Added
Status      |NEW      |RESOLVED
Resolution  |         |FIXED

--- Comment #5 from Yegor Kozlov ye...@dinom.ru 2010-10-07 09:42:38 EDT ---
Applied in r1005443

Thanks,
Yegor
DO NOT REPLY [Bug 49919] Implement support for BorderCode
https://issues.apache.org/bugzilla/show_bug.cgi?id=49919

Yegor Kozlov ye...@dinom.ru changed:

What        |Removed  |Added
Status      |NEW      |RESOLVED
Resolution  |         |FIXED

--- Comment #6 from Yegor Kozlov ye...@dinom.ru 2010-10-07 09:58:13 EDT ---
Applied in r1005447

Thanks,
Yegor
DO NOT REPLY [Bug 50052] Add support for formating of list numbers
https://issues.apache.org/bugzilla/show_bug.cgi?id=50052

Yegor Kozlov ye...@dinom.ru changed:

What    |Removed  |Added
Status  |NEW      |NEEDINFO

--- Comment #3 from Yegor Kozlov ye...@dinom.ru 2010-10-07 10:14:25 EDT ---
The patch breaks a unit test. I'm getting the following exception if I apply the supplied code to trunk (r1005447):

java.lang.IndexOutOfBoundsException: Index: 8, Size: 8
    at java.util.ArrayList.RangeCheck(ArrayList.java:547)
    at java.util.ArrayList.get(ArrayList.java:322)
    at org.apache.poi.hwpf.usermodel.Range.findRange(Range.java:976)
    at org.apache.poi.hwpf.usermodel.Range.initCharacterRuns(Range.java:925)
    at org.apache.poi.hwpf.usermodel.Range.getCharacterRun(Range.java:785)
    at org.apache.poi.hwpf.usermodel.ListEntry.<init>(ListEntry.java:61)
    at org.apache.poi.hwpf.usermodel.Range.getParagraph(Range.java:831)
    at org.apache.poi.hwpf.usermodel.TestLists.testWriteRead(TestLists.java:206)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

Yegor
RE: doc on hpsf thumbnails for macintosh
Hi David,

For anyone else who is looking to pursue extracting the Mac Office thumbnails, I have been able to render them to jpg (using Graphics2D). Found an old project by Matthias Wiesmann (JavaQuickdraw, circa 1999). Got in touch with him, and he redirected me to a great set of plugins which have been released under the BSD license by Harald Kuhr, to expand the formats available for R/W by ImageIO.

https://twelvemonkeys-imageio.dev.java.net/download.html

Great stuff.

If Rainer Klute (or the person committing on this module) is willing to adopt the changes I recommended earlier in this thread, then users will have a very simple set of code to pull these thumbnails into a BufferedImage.

--
    Thumbnail thumbobj = new Thumbnail(si.getThumbnail());
    if (thumbobj.getClipboardFormatTag() == Thumbnail.CFTAG_MACINTOSH) {
        byte[] thumbdata = thumbobj.getThumbnailAsPICT();
        BufferedImage bi = ImageIO.read(new ByteArrayInputStream(thumbdata));
    }
--

I've attached a patch with the changes initially proposed.

David, the patch does not include the 8 bytes on the header, as I am getting the BufferedImage to work without it, and to open the saved files with QTViewer. I don't have sample files that need the header change yet, but will post if I find them.

Thanks,
-Craig

-Original Message-
From: David Fisher [mailto:dfis...@jmlafferty.com]
Sent: Thursday, 7 October 2010 1:12 PM
To: POI Developers List
Cc: kl...@apache.org
Subject: Re: doc on hpsf thumbnails for macintosh
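
For readers who do run into files that need it, the 8-byte prefix Dave describes ('PICTMD' followed by the bytes 00 06) could be written into the 512-byte header roughly as below. This is a sketch under assumptions, not part of the attached patch: the class name PictHeaderSketch and the method buildHeader() are hypothetical, and whether any consumer actually requires these bytes is untested here, as noted above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of POI or of the attached patch.
class PictHeaderSketch {
    static final int HEADER_LEN = 512;

    // Builds the 512-byte PICT file header. With withPictMd == false the header
    // is all zero bytes; with true, the first 8 bytes carry 'PICTMD' and [00 06].
    static byte[] buildHeader(boolean withPictMd) {
        byte[] header = new byte[HEADER_LEN]; // zero-filled by default
        if (withPictMd) {
            byte[] magic = "PICTMD".getBytes(StandardCharsets.US_ASCII);
            System.arraycopy(magic, 0, header, 0, magic.length);
            header[6] = 0x00; // two-byte integer value [00 06]
            header[7] = 0x06;
        }
        return header;
    }
}
```

The picture data (everything after the "PICT" marker in the clipboard data) would then be appended after this header, exactly as with the all-zero variant.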