ID:               27238
 User updated by:  philip at nancarrow dot net
 Reported By:      philip at nancarrow dot net
 Status:           Open
 Bug Type:         Feature/Change Request
 Operating System: Windows and Linux
 PHP Version:      4.3.4
 New Comment:

Pierre,

OK sure, I've put two JPEGs that include IIM record 1 at:

http://www.nancarrow.net/download/testpic1_latin1.jpg

[Latin1 encoded English]

and

http://www.nancarrow.net/download/testpic2_utf8.jpg

[UTF8 encoded Chinese]



The IPTC/NAA (aka "IIM") spec is freely downloadable from
http://www.iptc.org/download/download.php?fn=IIMV4.1.pdf and it
details all records, including record 1.



Appendix C lists the currently defined character sets; the one in use
is specified in dataset 1:90. Note the strange IPTC terminology - an
"octet" is a byte, and "octet 2/5" means 0x25. The character set
sequence starts with ESC, so where it says ISO-8859-1 is "intermediate
character 2/12 to 2/15" followed by "octet 4/1" this would be something
like:

ESC,0x2F,0x41

or "ESC/A". Similarly UTF8 is ESC,2/5,4/7 or "ESC%G".

Where the spec says "intermediate character 2/12 to 2/15", most
applications that write these files use the last value in the range,
i.e. 2/15 in this case.
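
To make the notation concrete, here is a minimal sketch of the
column/row arithmetic described above. It is written in Python purely
for illustration; the helper names octet() and escape_sequence() are
made up for this example and are not part of PHP or of the spec.

```python
ESC = 0x1B  # the ESC control character that starts every sequence


def octet(notation: str) -> int:
    """Convert IIM 'column/row' notation such as '2/5' to a byte value."""
    column, row = (int(part) for part in notation.split("/"))
    if not (0 <= column <= 15 and 0 <= row <= 15):
        raise ValueError(f"invalid octet notation: {notation!r}")
    return column * 16 + row


def escape_sequence(*notations: str) -> bytes:
    """Build ESC followed by the octets given in column/row notation."""
    return bytes([ESC] + [octet(n) for n in notations])


# The two sequences discussed above.
LATIN1 = escape_sequence("2/15", "4/1")  # ESC / A
UTF8 = escape_sequence("2/5", "4/7")     # ESC % G
```

The column is simply the high nibble and the row the low nibble, which
is why "2/5" comes out as 0x25.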



I'm not sure that PHP really needs to know about the encoding, does it?
Since strings are just byte sequences in PHP, I guess it's down to the
application to do the appropriate encoding/decoding... as long as it
has access to the character set, of course!
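
As a rough sketch of what an application could do once it has the 1:90
value (Python for illustration only): the function name, the lookup
table and the Latin1 fallback are all assumptions of this example. Only
the two character sets mentioned above are covered, and all four
permitted intermediate characters (2/12 to 2/15) are accepted for
ISO-8859-1.

```python
from typing import Optional

# Escape sequence from dataset 1:90 -> Python codec name.
# Only the two character sets discussed in this report are listed.
CHARSETS = {
    b"\x1b%G": "utf-8",    # ESC 2/5 4/7
    b"\x1b,A": "latin-1",  # ESC 2/12 4/1
    b"\x1b-A": "latin-1",  # ESC 2/13 4/1
    b"\x1b.A": "latin-1",  # ESC 2/14 4/1
    b"\x1b/A": "latin-1",  # ESC 2/15 4/1
}


def decode_iptc_text(raw: bytes,
                     coded_character_set: Optional[bytes],
                     default: str = "latin-1") -> str:
    """Decode a dataset value using the charset announced in 1:90.

    Falling back to Latin1 when 1:90 is absent or unrecognised is an
    assumption of this sketch, not something the spec mandates.
    """
    codec = CHARSETS.get(coded_character_set or b"", default)
    return raw.decode(codec)
```

Without the 1:90 value the application can only guess, which is the
whole problem.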



Thanks

Philip


Previous Comments:
------------------------------------------------------------------------

[2004-02-13 09:29:23] [EMAIL PROTECTED]

> I can provide you with JPEG files containing IIM record 1

> if required; they're quite common in the news industry.



Please do :)

Could you provide a URL with some images containing the required fields,
and a txt file with the expected result?



Note that I have never read about the charset part in any docs on the
IPTC standard. Do you have a link that describes it?



pierre

------------------------------------------------------------------------

[2004-02-13 06:27:10] philip at nancarrow dot net

Description:
------------
The iptcparse() function (GD extension) only returns IPTC/NAA records 2
and upward, skipping past record 1. This appears to be by design, but
it means that the returned data is incomplete; for example, the
"destination" dataset 1:05 is missing. Worse than this, the "coded
character set" dataset (1:90) is missing, and without this value the
encoding of the data is unknown (for example, if 1:90 specifies ESC,%,G
the data is UTF8 encoded). I assume that the current implementation is
defaulting to ASCII or Latin1 encoding.
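
For illustration, a minimal sketch (in Python, not the PHP source) of a
dataset walker that keeps record 1 alongside the others. It assumes the
input is the raw IIM block already extracted from the JPEG, and that
each dataset uses the standard layout: tag marker 0x1C, record number,
dataset number, then a two-byte big-endian length. The parse_iim() name
is made up for this example; the "record#dataset" key format simply
mirrors what iptcparse() returns today.

```python
import struct
from collections import defaultdict
from typing import Dict, List

TAG_MARKER = 0x1C  # every IIM dataset starts with this byte


def parse_iim(data: bytes) -> Dict[str, List[bytes]]:
    """Collect all standard datasets, including record 1, keyed like '1#090'."""
    result = defaultdict(list)
    pos = 0
    while pos + 5 <= len(data) and data[pos] == TAG_MARKER:
        # Record number, dataset number, 16-bit big-endian length.
        record, dataset, length = struct.unpack_from(">BBH", data, pos + 1)
        if length & 0x8000:
            raise ValueError("extended datasets are not handled in this sketch")
        start = pos + 5
        end = start + length
        if end > len(data):
            raise ValueError("truncated dataset")
        result[f"{record}#{dataset:03d}"].append(data[start:end])
        pos = end
    return dict(result)


# Hand-built sample: 1:90 announcing UTF8, then a 2:120 caption.
sample = (b"\x1c\x01\x5a\x00\x03\x1b%G"
          + b"\x1c\x02\x78\x00\x06" + "中文".encode("utf-8"))
datasets = parse_iim(sample)
```

With record 1 in the result, the caller can look up "1#090" and decide
how to decode everything else.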

I can provide you with JPEG files containing IIM record 1 if required;
they're quite common in the news industry.

Thank you





------------------------------------------------------------------------


-- 
Edit this bug report at http://bugs.php.net/?id=27238&edit=1
