RE: Getting a PIC - And errors found in EscherDump (got fixes if want)
No, they are stored in the Data stream, BUT not in the format that the documentation states. Using the Escher format you should be able to grab MOST (not all) picture data, as images inserted into a Word file from Word '97 or later are now stored as Escher objects, even if they're not drawings but JPEGs, etc.

The documentation states that the file is saved as a PIC header followed by the filename as a Pascal string and then the file data. That is not even remotely close to what actually exists there. Instead, there's the PIC header structure, then, IF it's an Escher object, you've got the insane Escher heading structure (similar to, but even worse than, the grppls of sprms) and then the actual file data.

Hope this helps!

(BTW, did anyone notice the oddly sexual nature of the Word naming structure? A whole host of sprms everywhere, which are linked to STDs, which of course require a PAP to discern, and it was all preceded by a whole lot of grppl-ing.)

-JK

From: Kais Dukes [EMAIL PROTECTED]
Reply-To: POI Developers List poi-dev@jakarta.apache.org
To: POI Developers List poi-dev@jakarta.apache.org
Subject: RE: Getting a PIC
Date: Wed, 20 Apr 2005 23:33:59 +0100

Hi Robert,

I am most interested in what you have found. Are you saying that the picture data for some Escher images is not stored in the expected place (the document's Data stream) but is instead embedded as part of the complex stream?

Kind Regards,
Kais

-----Original Message-----
From: Robert Paris [mailto:[EMAIL PROTECTED]
Sent: 20 April 2005 22:06
To: poi-dev@jakarta.apache.org
Subject: RE: Getting a PIC

Thanks for the reply. OH, if only it were so simple. I believe I found it, and as with all other Word formats, the thing is a mess. You have to loop through the records and, when you find the right one (and check a thousand fWhateverBooleans and option shorts), you then have to parse the complex data, and it appears to be stored in there. Of course, none of this follows the MS Binary Format writings, and it is found pretty much nowhere on the web. Ugh. But thankfully it appears the good folks at POI (non-scratchpad area) have done some great work in this area to get me started. Thanks again!

From: Kais Dukes [EMAIL PROTECTED]
Reply-To: POI Developers List poi-dev@jakarta.apache.org
To: POI Developers List poi-dev@jakarta.apache.org
Subject: RE: Getting a PIC
Date: Wed, 20 Apr 2005 18:49:02 +0100

Hi Robert,

Although I have not looked at the BSE record code myself, I have some information from my own work on Escher diagrams. A BSE record contains a fixed-size header, and may then be followed by an optional string (2 bytes per character). Could this string be the file name you have described?

-- Kais

-----Original Message-----
From: Robert Paris [mailto:[EMAIL PROTECTED]
Sent: 20 April 2005 18:26
To: poi-dev@jakarta.apache.org
Subject: Re: Getting a PIC

Thanks for the reply. Yes, it does appear to be an Escher BSE record; however, there seems to be an issue with grabbing some of the info inside it. When I look at the actual data in the byte stream, I can see the file path and name in the data (e.g. D : \ F i l e s \ S o m e I m a g e . j p g ), yet I cannot find that data anywhere, either in POI's Escher BSE record reading (from 0xF007) or in any other documentation I've found on it. None of the tags seem to hold that info. Any idea where I read it from? Attempts to read from the case 0xF007 don't work because by the time it hits that tag marker, it's already past the path/filename string, and when it reads the name length (at offset 33), it always has length = 0.

Thanks again for your help and time!

From: Avik Sengupta [EMAIL PROTECTED]
Reply-To: POI Developers List poi-dev@jakarta.apache.org
To: POI Developers List poi-dev@jakarta.apache.org
Subject: Re: Getting a PIC
Date: Wed, 20 Apr 2005 12:39:53 +0530

Have you seen the drawing code in HSSF? Maybe it's similar/same?

On Wed, 2005-04-20 at 03:02 +, Robert Paris wrote:

I'm working on the part of Word that stores pictures and I've run into a problem. I'm able to grab the PIC structure (from the SPRM sprmCPicLocation). However, once I've gone through that, I have a chunk of data that I believe is in the Office Shape Format. Unfortunately, I am unable to find the definition for this structure anywhere. Does anyone know where it is? The documentation for Word 97 says that all pictures inserted with Word 97 are in the new Office shape format (documented elsewhere). Without that documentation, I have no way to read this data! Anyone?

To unsubscribe, e-mail: [EMAIL PROTECTED]
Mailing List: http://jakarta.apache.org/site/mail2.html#poi
The Apache Jakarta POI Project: http://jakarta.apache.org/poi/
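The "loop through and find the right record" step described in this thread can be sketched roughly as follows. This is a minimal illustration, not POI's implementation: it assumes the commonly described 8-byte Escher record header (a 16-bit options field, a 16-bit record id, and a 32-bit length, all little-endian) and that a record is a container when the low four bits of the options field are 0xF. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of POI: scans a byte range for an Escher record id.
final class EscherWalkSketch {
    static final int HEADER_SIZE = 8;       // options(2) + record id(2) + length(4)
    static final int BSE_RECORD_ID = 0xF007; // the BSE tag mentioned in the thread

    /** Returns the offset of the first record header with the wanted id, or -1. */
    static int findRecord(byte[] data, int start, int end, int wantedId) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        int pos = start;
        while (pos >= 0 && pos + HEADER_SIZE <= end) {
            int options = buf.getShort(pos) & 0xFFFF;
            int recordId = buf.getShort(pos + 2) & 0xFFFF;
            int length = buf.getInt(pos + 4);
            if (recordId == wantedId) {
                return pos;
            }
            // Assumption: version nibble 0xF marks a container record.
            boolean isContainer = (options & 0x000F) == 0x000F;
            if (isContainer) {
                // Children follow the header directly, so step into the container.
                pos += HEADER_SIZE;
            } else {
                // Atom: skip its payload; give up on an implausible length.
                if (length < 0 || length > end - pos - HEADER_SIZE) {
                    return -1;
                }
                pos += HEADER_SIZE + length;
            }
        }
        return -1;
    }
}
```

Calling `EscherWalkSketch.findRecord(bytes, 0, bytes.length, EscherWalkSketch.BSE_RECORD_ID)` on the chunk that follows the PIC header would return the offset of the first 0xF007 header, if there is one. As noted in the thread, Word may place data ahead of that header, so a hit here is only a starting point.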
Re: Getting a PIC
Have you seen the drawing code in HSSF? Maybe it's similar/same?
RE: Getting a PIC
Hi Robert,

Although I have not looked at the BSE record code myself, I have some information from my own work on Escher diagrams. A BSE record contains a fixed-size header, and may then be followed by an optional string (2 bytes per character). Could this string be the file name you have described?

-- Kais
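The BSE layout described here (a fixed-size part, then an optional name at 2 bytes per character) could be read along these lines. This is a sketch under stated assumptions, not a confirmed description of what Word writes: the name length is taken to be a single byte at offset 33 of the record payload (the offset quoted in this thread), counted in bytes; the fixed part is assumed to be 36 bytes long; and the name is assumed to be UTF-16LE. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of POI: reads the optional name that may follow a BSE's fixed part.
final class BseNameSketch {
    static final int ESCHER_HEADER_SIZE = 8; // options(2) + record id(2) + length(4)
    static final int CB_NAME_OFFSET = 33;    // offset quoted in the thread (assumed relative to the payload)
    static final int FIXED_PART_SIZE = 36;   // assumption: size of the fixed part before the name

    /** Returns the name stored after the fixed part, or "" when the length byte is 0. */
    static String readName(byte[] data, int recordStart) {
        int payload = recordStart + ESCHER_HEADER_SIZE;
        if (payload + FIXED_PART_SIZE > data.length) {
            throw new IllegalArgumentException("record too short for a BSE fixed part");
        }
        int cbName = data[payload + CB_NAME_OFFSET] & 0xFF; // assumed to be a byte count
        int nameStart = payload + FIXED_PART_SIZE;
        if (nameStart + cbName > data.length) {
            throw new IllegalArgumentException("name runs past the end of the data");
        }
        // Two bytes per character, little-endian.
        String name = new String(data, nameStart, cbName, StandardCharsets.UTF_16LE);
        // Drop any trailing NUL terminator(s).
        int end = name.length();
        while (end > 0 && name.charAt(end - 1) == '\u0000') {
            end--;
        }
        return name.substring(0, end);
    }
}
```

When the length byte is 0, which is what Robert reports seeing in Word files, the method returns an empty string; in that case the path he sees in the byte stream must be stored somewhere other than this field.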
RE: Getting a PIC
Thanks for the reply. OH, if only it were so simple. I believe I found it, and as with all other Word formats, the thing is a mess. You have to loop through the records and, when you find the right one (and check a thousand fWhateverBooleans and option shorts), you then have to parse the complex data, and it appears to be stored in there. Of course, none of this follows the MS Binary Format writings, and it is found pretty much nowhere on the web. Ugh. But thankfully it appears the good folks at POI (non-scratchpad area) have done some great work in this area to get me started. Thanks again!