Some encoding bugs in shape file writer
---------------------------------------
Key: GEOT-2999
URL: http://jira.codehaus.org/browse/GEOT-2999
Project: GeoTools
Issue Type: Bug
Components: data shapefile
Affects Versions: 2.6.2
Environment: Windows XP, Java 6 Update 12, default encoding is
windows-1257
Reporter: Jaan Vajakas
Assignee: Andrea Aime
Attachments: UTF8SHPWritingTest.java
There are at least three bugs in
DbaseFileWriter.FieldFormatter.getFieldString(int size, String s):
* if the encoding of the output shapefile is UTF-8 and the value written to it
has more bytes than the field length but fewer characters than the field
length, then a StringIndexOutOfBoundsException may occur;
* if the encoding of the output shapefile is UTF-8 and the value written to it
has more bytes than the field length, then an empty string may be written, for
two different reasons (counted here as two separate bugs).
See the attached testcase (the source is in UTF-8 encoding).
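To make the triggering condition concrete, here is a minimal sketch of the
mismatch between character count and UTF-8 byte count. It is not the attached
testcase and does not call any GeoTools code; the class name
FieldLengthSketch, its helper methods, the sample value and the field size of
6 are all assumptions chosen for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of GeoTools.
final class FieldLengthSketch {

    /** Number of bytes the value occupies once encoded in the given charset. */
    static int byteLength(String value, Charset charset) {
        return value.getBytes(charset).length;
    }

    /** True when the value has fewer chars than the field but more bytes. */
    static boolean fitsInCharsButNotBytes(String value, int fieldSize, Charset charset) {
        return value.length() < fieldSize && byteLength(value, charset) > fieldSize;
    }

    public static void main(String[] args) {
        // "õäöü" written with escapes so the source encoding does not matter.
        String value = "\u00f5\u00e4\u00f6\u00fc";
        int fieldSize = 6; // assumed DBF field length, for illustration only

        System.out.println("chars = " + value.length());
        System.out.println("bytes = " + byteLength(value, StandardCharsets.UTF_8));
        System.out.println("problem case = "
                + fitsInCharsButNotBytes(value, fieldSize, StandardCharsets.UTF_8));
    }
}
```

Each of the four characters needs two bytes in UTF-8, so the value has 4
characters but 8 bytes, which is exactly the situation described in the first
bullet for a field of length 6.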
It seems to me that it would be better to use byte arrays or byte buffers
instead of strings as the return values of the getFieldString(...) methods
because
* this way we could use java.nio.charset.CharsetEncoder.encode(CharBuffer,
ByteBuffer, boolean) to encode as many characters as possible, so that
getFieldString(int size, String s) would only have to pad the remaining spaces;
* performance might improve, as encoding would take place only once per
value instead of the two or three times it takes now, and fewer temporary
objects would be created;
* the nature of the return values of the getFieldString(...) methods seems to
be bytes rather than characters, as suggested e.g. by the following comment in
DbaseFileWriter.java: "Adding the charset to getBytes causes the output to get
altered for the '@: Timestamp' field. And using getBytes returns a different
array in 64-bit platforms so we expect chars and cast to byte just before
writing.".
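The byte-oriented approach proposed above could look roughly like the
following sketch. It is not a patch for DbaseFileWriter: the class
FieldEncodingSketch and the method encodeField are hypothetical names, padding
with the ASCII space byte is an assumption about the field format, and null
values are not handled. Only java.nio.charset.CharsetEncoder.encode(CharBuffer,
ByteBuffer, boolean) is taken from the suggestion itself.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch, not part of GeoTools.
final class FieldEncodingSketch {

    /**
     * Encodes as many whole characters of value as fit into size bytes and
     * pads the rest of the field with spaces (assumed padding byte 0x20).
     */
    static byte[] encodeField(String value, int size, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);

        ByteBuffer out = ByteBuffer.allocate(size);
        // The encoder stops with an OVERFLOW result once the next character
        // no longer fits, so the value is encoded only once and no
        // multi-byte character is cut in half.
        // A stateful charset would also need encoder.flush(out); omitted here.
        encoder.encode(CharBuffer.wrap(value), out, true);

        byte[] field = out.array(); // backing array has exactly size bytes
        Arrays.fill(field, out.position(), size, (byte) ' ');
        return field;
    }

    public static void main(String[] args) {
        // "õäöü" needs 8 bytes in UTF-8; only two characters fit in 5 bytes.
        byte[] field = encodeField("\u00f5\u00e4\u00f6\u00fc", 5, StandardCharsets.UTF_8);
        System.out.println(field.length);
        System.out.println("[" + new String(field, StandardCharsets.UTF_8) + "]");
    }
}
```

With this shape the caller would write the returned bytes directly, and the
field always has exactly the declared length regardless of how many bytes each
character takes.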
And what about UTF-16? Can a DBF file be encoded in an encoding that is not
ASCII-compatible? It seems that GeoTools currently lets you set UTF-16 as the
DBF output encoding, but it then writes only zeros instead of data.
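As a small illustration of why UTF-16 is not ASCII-compatible, the sketch
below counts the zero bytes produced when plain ASCII text is encoded. It does
not reproduce or explain the GeoTools behaviour reported above; the class name
Utf16ZeroBytesSketch and its method are hypothetical, and UTF-16BE is used
only to avoid the byte order mark.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not part of GeoTools.
final class Utf16ZeroBytesSketch {

    /** Counts the zero bytes in the encoded form of the value. */
    static int countZeroBytes(String value, Charset charset) {
        int zeros = 0;
        for (byte b : value.getBytes(charset)) {
            if (b == 0) {
                zeros++;
            }
        }
        return zeros;
    }

    public static void main(String[] args) {
        String value = "AB";
        // UTF-8 encodes ASCII text as the same single bytes as ASCII.
        System.out.println(countZeroBytes(value, StandardCharsets.UTF_8));
        // UTF-16BE uses two bytes per ASCII character, the first being zero.
        System.out.println(countZeroBytes(value, StandardCharsets.UTF_16BE));
    }
}
```

Any code that assumes one byte per ASCII character, or that treats a zero byte
as padding or a terminator, would therefore misread UTF-16 field data.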