Russell,
I have committed a patch in HEAD that is similar, but it just hammers
the wide character to narrow rather than trying to pass it around.
Are the data you are trying to label actually wide character, or is it
just an instance of SDE trying to be smart and putting all of your
string columns in Unicode? I confess to not knowing very much about
character issues like this, so maybe this approach is the wrong way
to go. I left a FIXME note in the code to do something smarter
when MapServer becomes more aware of these issues.
Howard
#ifdef SE_NSTRING_TYPE
      case SE_NSTRING_TYPE:
        // one extra element in each buffer for the terminator
        shape->values[i] = (char *)malloc((itemdefs[i].size + 1) * sizeof(char));
        wide = (SE_WCHAR *)malloc((itemdefs[i].size + 1) * sizeof(SE_WCHAR));
        status = SE_stream_get_nstring( sde->stream,
                                        (short) (i+1),
                                        wide);
        if(status == SE_NULL_VALUE)
          shape->values[i][0] = '\0'; /* empty string */
        else if(status != SE_SUCCESS) {
          free(wide);
          sde_error( status,
                     "sdeGetRecord()",
                     "SE_stream_get_nstring()");
          return(MS_FAILURE);
        } else {
          // hammer the wide character to narrow
          // FIXME: do the right thing when MapServer becomes more
          // unicode aware.
          // The limit is the buffer capacity, not strlen() of the
          // still uninitialized destination.
          wcstombs( shape->values[i],
                    wide,
                    itemdefs[i].size);
          shape->values[i][itemdefs[i].size] = '\0';
        }
        free(wide);
        break;
#endif
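For anyone who wants to try the narrowing step outside of mapsde.c, here is a minimal standalone sketch. It assumes the wide data is a NUL-terminated wchar_t string (SE_WCHAR may be a different type on some platforms, so treat that as an assumption), and narrow_from_wide() is a hypothetical helper name, not anything in MapServer or the SDE API. It relies on the POSIX behaviour where wcstombs() with a NULL destination reports the required length.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, not part of MapServer or the SDE API.
 * Converts a NUL-terminated wide string to a newly allocated narrow
 * string in the current C locale. Returns NULL if allocation fails or a
 * character cannot be represented. The caller frees the result. */
char *narrow_from_wide(const wchar_t *wide)
{
    size_t needed, written;
    char *narrow;

    assert(wide != NULL);

    /* POSIX: with a NULL destination, wcstombs returns the number of
     * bytes required (excluding the terminator), or (size_t)-1 if a
     * character has no representation in the current locale. */
    needed = wcstombs(NULL, wide, 0);
    if (needed == (size_t)-1)
        return NULL;

    narrow = (char *)malloc(needed + 1);
    if (narrow == NULL)
        return NULL;

    written = wcstombs(narrow, wide, needed + 1);
    if (written == (size_t)-1) {
        free(narrow);
        return NULL;
    }
    narrow[written] = '\0';
    return narrow;
}
```

Sizing the destination from the conversion itself avoids guessing how many bytes each wide character needs. In the default "C" locale only ASCII-range characters are likely to convert, which matches the "hammer it to narrow" intent but loses anything else.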
On Feb 20, 2007, at 1:48 PM, Russell de Grove wrote:
I have map layers in ArcSDE on SQL Server 2005 and I have been trying to
label features from a field with Unicode data (type nvarchar).
To get around the "Unknown SDE column type" error I had to add the
following to the sdeGetRecord method in mapsde.c, in the
"switch(itemdefs[i].sde_type)" block:
#ifdef SE_NSTRING_TYPE
      case SE_NSTRING_TYPE:
        shape->values[i] = (char *)malloc((itemdefs[i].size + 1) *
                                          sizeof(unsigned short));
        status = SE_stream_get_nstring(sde->stream,
                                       (short) (i+1),
                                       (unsigned short *)shape->values[i]);
        if(status == SE_NULL_VALUE)
          ((unsigned short *)shape->values[i])[0] = (unsigned short)0; /* empty string */
        else if(status != SE_SUCCESS) {
          sde_error(status, "sdeGetRecord()", "SE_stream_get_nstring()");
          return(MS_FAILURE);
        }
        break;
#endif
So far, so good, but I only see the first character of each label.
If I explicitly include a Unicode "preamble", I see two garbage
characters followed by the first expected character. As it happens, my
data is in UTF-16 and my characters are all ASCII-type characters that
use only the low byte. I believe what is causing my problem is the
"msGetEncodedString" method in mapgd.c:
char *msGetEncodedString(const char *string, const char *encoding)
{
#ifdef USE_ICONV
  iconv_t cd = NULL;
  char *in, *inp;
  char *outp, *out = NULL;
  size_t len, bufsize, bufleft, status;

  cd = iconv_open("UTF-8", encoding);
  if(cd == (iconv_t)-1) {
    msSetError(MS_IDENTERR, "Encoding not supported by libiconv (%s).",
               "msGetEncodedString()", encoding);
    return NULL;
  }

  len = strlen(string);
  /* Problem point: strlen will return the count up to the first null
     byte, so "Shape #0" as Unicode will return 1 for the S stored
     little-endian, or 3 if a Unicode "preamble" is used. */
  bufsize = len * 4;
  in = strdup(string);
  inp = in;
  out = (char*) malloc(bufsize);
  if(in == NULL || out == NULL){
    msSetError(MS_MEMERR, NULL, "msGetEncodedString()");
    msFree(in);
    iconv_close(cd);
    return NULL;
  }
  strcpy(out, in);
  outp = out;
  bufleft = bufsize;
  status = -1;
  while (len > 0){
    status = iconv(cd, (const char**)&inp, &len, &outp, &bufleft);
    /* Problem point: since this expects byte pairs, a byte length of
       1 or 3 is going to cause problems. */
    if(status == -1){
      msFree(in);
      msFree(out);
      iconv_close(cd);
      return strdup(string);
      /* Problem point: since there was a problem, strdup returns the
         original "string" up to the first null byte... so I get "S",
         possibly with a couple of preceding garbage characters if I
         used a preamble. */
    }
  }
  out[bufsize - bufleft] = '\0';

  msFree(in);
  iconv_close(cd);
  return out;
#else
  msSetError(MS_MISCERR, "Not implemeted since Iconv is not enabled.",
             "msGetEncodedString()");
  return NULL;
#endif
}
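To make the strlen behaviour described in the problem points concrete, here is a small standalone sketch. The byte arrays are hand-written UTF-16LE for "Shape #0", with and without the FF FE byte order mark, and utf16_byte_length() is a hypothetical helper (not MapServer code) that looks for a two-byte terminator instead of a single null byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "Shape #0" as UTF-16LE: every ASCII character is followed by a zero
 * high byte, and the string ends with a two-byte terminator. */
const unsigned char utf16le_plain[] = {
    'S', 0, 'h', 0, 'a', 0, 'p', 0, 'e', 0, ' ', 0, '#', 0, '0', 0,
    0, 0
};

/* The same text preceded by the UTF-16LE byte order mark FF FE. */
const unsigned char utf16le_bom[] = {
    0xFF, 0xFE,
    'S', 0, 'h', 0, 'a', 0, 'p', 0, 'e', 0, ' ', 0, '#', 0, '0', 0,
    0, 0
};

/* Hypothetical helper: byte length of a UTF-16 string, excluding the
 * terminator. Scans aligned 16-bit units until both bytes are zero. */
size_t utf16_byte_length(const unsigned char *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (s[n] != 0 || s[n + 1] != 0)
        n += 2;
    return n;
}
```

Calling strlen((const char *)utf16le_plain) stops at the zero high byte of "S", while strlen((const char *)utf16le_bom) runs over FF, FE and "S" before stopping. The helper instead reports the full 16 and 18 bytes respectively, which is the kind of length iconv() would need for its input byte count.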
Has anyone else encountered similar problems? Does anyone know how I can
determine the correct width of characters based on the "encoding"
parameter?
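One possible direction, sketched here under assumptions: as far as I know iconv has no call that reports the code unit width of an encoding, so the sketch below guesses it from the encoding name. Both encoding_unit_width() and encoded_byte_length() are hypothetical helpers, and the name list is deliberately small (UTF-16/UCS-2 and UTF-32/UCS-4 prefixes, everything else treated as byte-oriented). It also assumes the input is terminated by a full zero code unit of that width.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive test for whether s begins with prefix. */
static int has_prefix_ci(const char *s, const char *prefix)
{
    while (*prefix != '\0') {
        if (toupper((unsigned char)*s) != toupper((unsigned char)*prefix))
            return 0;
        s++;
        prefix++;
    }
    return 1;
}

/* Hypothetical helper: guess the code unit width in bytes from an
 * encoding name. Only a handful of names are recognised (assumption);
 * anything else is treated as a byte-oriented encoding. */
size_t encoding_unit_width(const char *encoding)
{
    assert(encoding != NULL);
    if (has_prefix_ci(encoding, "UTF-16") || has_prefix_ci(encoding, "UCS-2"))
        return 2;
    if (has_prefix_ci(encoding, "UTF-32") || has_prefix_ci(encoding, "UCS-4"))
        return 4;
    return 1;
}

/* Hypothetical helper: byte length of s, excluding the terminator, where
 * the terminator is one all-zero code unit of the encoding's width. */
size_t encoded_byte_length(const char *s, const char *encoding)
{
    size_t width = encoding_unit_width(encoding);
    size_t n = 0;
    size_t k;

    assert(s != NULL);
    for (;;) {
        /* Count how many leading bytes of this code unit are zero. */
        for (k = 0; k < width && s[n + k] == 0; k++)
            ;
        if (k == width)
            return n;
        n += width;
    }
}
```

With something like this, the len passed to iconv() could come from encoded_byte_length(string, encoding) rather than strlen(string). The error path that returns strdup(string) would still truncate wide data, so that would need separate handling.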