Re: [Qgis-user] [1.8.0] Broken UTF-8 support in shapefiles - and workaround

2012-06-29 Thread 欧阳乐岩


On 06/29/2012 05:44 PM, Alexander Bruy wrote:

> Hi Even,
>
> 2012/6/29 Even Rouault :
>> the situation, and we should strive for more constructive cooperation. I think..
>
> Dmitry Baryshnikov and I are already working on this issue. Ticket
> #4650 is part of our work. We are also running a lot of tests in different
> environments and with different data to find all the parts of the code that
> need improvement. We hope to have a patch, or patches, soon that completely
> solve this issue.



That's great news. This issue is a blocker in a large part of the world,
and I can't wait to see it solved! Right now I have to stop pushing for
QGIS, as 1.8 is currently impossible to use in China...

___
Qgis-user mailing list
Qgis-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/qgis-user


Re: [Qgis-user] [1.8.0] Broken UTF-8 support in shapefiles - and workaround

2012-06-29 Thread Alexander Bruy
Hi Even,

2012/6/29 Even Rouault :
> the situation, and we should strive for more constructive cooperation. I 
> think..

Dmitry Baryshnikov and I are already working on this issue. Ticket
#4650 is part of our work. We are also running a lot of tests in different
environments and with different data to find all the parts of the code that
need improvement. We hope to have a patch, or patches, soon that completely
solve this issue.

-- 
Alexander Bruy


Re: [Qgis-user] [1.8.0] Broken UTF-8 support in shapefiles - and workaround

2012-06-29 Thread Even Rouault
Hi,

(Disclaimer: I'm a GDAL contributor)

First, I'd like to say that pointing the finger at GDAL will not help improve
the situation, and we should strive for more constructive cooperation. I think
there are various issues involved, and I'll try to summarize my view of things:
- Before GDAL 1.9, the Shapefile driver had no knowledge of the shapefile
encoding. In both read and write operations it passed raw bytes to and from
the .dbf file unchanged.
- Starting with GDAL 1.9, the Shapefile driver will:
   * for write operations: recode from UTF-8 to the encoding specified by the
ENCODING layer creation option (-lco option of ogr2ogr) or, for an existing
shapefile, by the LDID field of the .dbf header or the .cpg file. If the value
is of the form LDID/xx, then xx is written as the LDID field in the .dbf
header. If it is of another form, it is written as a plain string in the
accompanying .cpg file. If no value for ENCODING is specified, LDID/87 is
assumed. This value is supposed to mean the "current ANSI codepage", a concept
that does not actually make sense on all platforms, nor when moving shapefiles
from one system to another. An assumption is then made that LDID/87 is
actually ISO-8859-1 (Latin1), which is indeed strongly biased towards Western
European languages. As far as QGIS is concerned, when creating a shapefile it
might be prudent to specify ENCODING=UTF-8 if the strings passed to OGR
CreateFeature() are in UTF-8. As a consequence, no recoding will occur, and a
.cpg file containing UTF-8 will be written.
   * for read operations: recode to UTF-8 from the encoding specified in the
LDID field of the .dbf header or in the .cpg file (the .cpg file has priority
over the LDID field). Several issues can then occur:
      - The actual content of the .dbf may not match the declared LDID value
or .cpg, in which case the recoding to UTF-8 will fail. This can be worked
around by setting the SHAPE_ENCODING environment variable to the appropriate
value, when it is known. You can also set SHAPE_ENCODING to the empty string,
in which case no recoding at all will occur. That might be the solution for
QGIS if QGIS wants to do the recoding on its side, based on user input for
example.
      - Even if the .dbf, LDID and .cpg are consistent, you can have issues if
the GDAL build does not use the iconv library for recoding (without iconv,
only the built-in conversion between Latin1 and UTF-8 is available). Until
recent fixes in GDAL (not yet released, see
http://trac.osgeo.org/gdal/ticket/4650), there was indeed a bug in the
TestCapability(OLCStringsAsUTF8) method, which returned TRUE as soon as the
shape encoding was found, without checking that the recoding services were
actually available.
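
As a rough illustration of the read-side lookup described above, here is a
small Python sketch. It is not GDAL's code: the function name
declared_encoding is made up for this example, it relies on the standard
dBASE layout in which byte 29 of the .dbf header holds the LDID, and it
assumes that an LDID of 0 means "nothing declared". It does not model the
SHAPE_ENCODING override or the recoding itself.

```python
import tempfile
from pathlib import Path
from typing import Optional


def declared_encoding(shp_path: str) -> Optional[str]:
    """Return the encoding a shapefile declares, or None if it declares none.

    Hypothetical helper: the .cpg sidecar is checked first because it has
    priority over the LDID byte in the .dbf header.
    """
    shp = Path(shp_path)

    cpg = shp.with_suffix(".cpg")
    if cpg.is_file():
        value = cpg.read_text(encoding="ascii", errors="replace").strip()
        if value:
            return value

    dbf = shp.with_suffix(".dbf")
    if dbf.is_file():
        with dbf.open("rb") as handle:
            header = handle.read(32)
        # Byte 29 of the dBASE header is the language driver ID (LDID).
        # Assumption: 0 is treated as "no encoding declared".
        if len(header) == 32 and header[29] != 0:
            return f"LDID/{header[29]}"

    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        shp = Path(tmp) / "roads.shp"
        header = bytearray(32)
        header[29] = 87  # the "current ANSI codepage" value discussed above
        shp.with_suffix(".dbf").write_bytes(bytes(header))
        print(declared_encoding(str(shp)))  # only the LDID is present

        shp.with_suffix(".cpg").write_text("UTF-8", encoding="ascii")
        print(declared_encoding(str(shp)))  # the .cpg now takes priority
```

The sketch only looks for lower-case .cpg and .dbf extensions; a real
implementation would also have to cope with upper-case file names.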

I hope this makes the workings of the shapefile driver clearer and that the
QGIS team can find the best way to integrate with it.

Best regards,

Even



Re: [Qgis-user] [1.8.0] Broken UTF-8 support in shapefiles - and workaround

2012-06-29 Thread Andre Joost

Hi,

On 29.06.12 09:28, Alexander Bruy wrote:

>> The only thing that does not work as expected:
>> All text with non-ASCII-characters (e.g. ä ö ü) is broken.
>
> First of all, this is a known issue, and it is not a QGIS bug but a GDAL one.
> See http://hub.qgis.org/issues/5255,

It looks as if QGIS and GDAL are not talking properly to each other ;-)



> Also, your solution will work only in some cases, namely when the shapefiles
> are in UTF-8 encoding.

That was more or less standard with QGIS 1.7.4. So most end users are now
stuck with UTF-8 shapefiles that display broken, create new shapefiles with
the wrong encoding, and may not find the solution in the Russian blog post
mentioned in issue #5255.

I hoped this issue would be settled for the "stable" version 1.8.0.



> So if you use shapefiles with different encodings in one project, it will
> not work.

The other solution would be a separate .cpg file alongside the .shp.
But I didn't test it, and I don't know which one is chosen if the environment
variable and the .cpg have different values.



> It also will not work if the shapefile encoding is not supported by GDAL's
> recoding method.

OK, then you are out of luck after all, but you would not get those cases
from properly working older QGIS projects anyway.

Greetings,
Andre Joost





Re: [Qgis-user] [1.8.0] Broken UTF-8 support in shapefiles - and workaround

2012-06-29 Thread Alexander Bruy
Hi Andre,

2012/6/28 Andre Joost :
> The only thing that does not work as expected:
> All text with non-ASCII-characters (e.g. ä ö ü) is broken.
First of all, this is a known issue, and it is not a QGIS bug but a GDAL one.
See http://hub.qgis.org/issues/5255, http://hub.qgis.org/issues/5340
and http://hub.qgis.org/issues/5508.

Also, your solution will work only in some cases, namely when the shapefiles
are in UTF-8 encoding. So if you use shapefiles with different encodings in
one project, it will not work. It also will not work if the shapefile encoding
is not supported by GDAL's recoding method.

-- 
Alexander Bruy


[Qgis-user] [1.8.0] Broken UTF-8 support in shapefiles - and workaround

2012-06-28 Thread Andre Joost

Hi all,

since 1.8.0 is now the official stable version, I installed it on Windows
7, alongside the old 1.7.4.


The only thing that does not work as expected:
All text with non-ASCII-characters (e.g. ä ö ü) is broken.
The shapefiles are encoded in UTF-8 and read fine in OpenOffice and QGIS
1.7.4. It seems that 1.8.0 interprets them as codepage "System", which is
1252 on Windows.
If I create a new shapefile in 1.8.0 with encoding=utf-8, it shows up broken
in OpenOffice and QGIS 1.7.4.
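
The symptom above is what one would expect when UTF-8 bytes are decoded as
Windows-1252. A minimal Python sketch of that misreading follows; the function
names are made up for illustration and have nothing to do with the QGIS or
GDAL code.

```python
def misread_utf8_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then decode the same bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair_mojibake(garbled: str) -> str:
    """Undo the misreading by reversing the two steps."""
    return garbled.encode("cp1252").decode("utf-8")


if __name__ == "__main__":
    garbled = misread_utf8_as_cp1252("ä ö ü")
    print(garbled)                   # Ã¤ Ã¶ Ã¼
    print(repair_mojibake(garbled))  # ä ö ü
```

Each umlaut takes two bytes in UTF-8, so it shows up as two characters when
the bytes are read as a single-byte codepage.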


Layers from spatialite and postgis databases are not broken.

I found this workaround:
1. Look for the folder
   C:\Program Files (x86)\Quantum GIS Lisboa\bin\
   (or wherever the program has been installed).
2. Open qgis.bat with a suitable text editor.
3. Insert the line
   SET SHAPE_ENCODING=UTF-8
   as line 4.
4. Save it, and shapefiles in UTF-8 will open correctly.
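
For anyone who has to apply this edit on several machines, the same change can
be sketched in Python. This is only an illustration: the function name
add_shape_encoding is made up, and the batch content in the demo is a
placeholder, not the real qgis.bat.

```python
def add_shape_encoding(batch_text: str, encoding: str = "UTF-8",
                       line_no: int = 4) -> str:
    """Return batch_text with a SET SHAPE_ENCODING line inserted at line_no.

    Hypothetical helper. The text is returned unchanged if it already sets
    SHAPE_ENCODING, so applying it twice is harmless.
    """
    lines = batch_text.splitlines()
    for line in lines:
        if line.strip().upper().startswith("SET SHAPE_ENCODING="):
            return batch_text

    # Keep the line-ending style the file already uses.
    newline = "\r\n" if "\r\n" in batch_text else "\n"
    index = min(line_no - 1, len(lines))
    lines.insert(index, f"SET SHAPE_ENCODING={encoding}")
    return newline.join(lines) + newline


if __name__ == "__main__":
    # Placeholder content standing in for a launcher script.
    sample = "@echo off\nREM line 2\nREM line 3\nREM line 4\n"
    print(add_shape_encoding(sample))
```

To patch a real file, read it, pass the text through the function, and write
the result back, preferably after making a backup copy.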

Hoping this might be useful to others...

And perhaps someone can put this into the windows installer.

Greetings,
André Joost
