Re: [UMN_MAPSERVER-USERS] Issues with SDE and Unicode

Howard Butler Thu, 01 Mar 2007 18:10:24 -0800

Russell,

I have committed a patch in HEAD that is similar, but it just hammers the wide characters to narrow rather than trying to pass them around. Are the data you are trying to label actually wide character, or is it just an instance of SDE trying to be smart and putting all of your string columns in Unicode? I confess to not knowing very much about character issues like this, so maybe this approach is the wrong way to go. I left a note for posterity to do something smarter when MapServer becomes more cognizant of these issues.


Howard

#ifdef SE_NSTRING_TYPE
            case SE_NSTRING_TYPE:

                shape->values[i] = (char *)malloc(itemdefs[i].size*sizeof(char)+1);
                wide = (SE_WCHAR *)malloc(itemdefs[i].size*sizeof(SE_WCHAR)+1);

                status = SE_stream_get_nstring( sde->stream,
                                                (short) (i+1),
                                                wide);

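                // hammer the wide character to narrow
                // FIXME: do the right thing when MapServer becomes more
                // unicode aware.
                // The destination was just malloc'd, so strlen() on it would
                // read uninitialized memory; bound the copy by the column size.
                wcstombs(   shape->values[i],
                            wide,
                            itemdefs[i].size);
                shape->values[i][itemdefs[i].size] = '\0';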

                if(status == SE_NULL_VALUE)
                    shape->values[i][0] = '\0'; /* empty string */
                else if(status != SE_SUCCESS) {
                    sde_error(  status,
                                "sdeGetRecord()",
                                "SE_stream_get_nstring()");
                    return(MS_FAILURE);
                }
                break;
#endif
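
The narrowing step in the patch above can be sketched in isolation. This is only an illustration under stated assumptions: `narrow_copy` is a hypothetical helper, not part of MapServer or the SDE API, and it takes `wchar_t` input, whereas `SE_WCHAR` may be a different type on some platforms. It sizes the copy from the known capacity rather than from the contents of the destination buffer, and it always terminates the result. Note that `wcstombs` depends on the current locale, so characters that cannot be represented make the conversion fail.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert a wide string into a newly allocated narrow
 * string of at most max_bytes bytes (plus the terminator). Returns NULL only
 * if the allocation fails; an unconvertible input yields an empty string. */
char *narrow_copy(const wchar_t *wide, size_t max_bytes)
{
    char *out = malloc(max_bytes + 1);
    size_t n;

    if (out == NULL)
        return NULL;

    /* wcstombs writes at most max_bytes bytes and returns the number of
     * bytes written, not counting any terminator. */
    n = wcstombs(out, wide, max_bytes);
    if (n == (size_t)-1) {
        out[0] = '\0';   /* a character had no representation in this locale */
        return out;
    }

    out[n] = '\0';       /* n <= max_bytes, so this index is always in range */
    return out;
}
```

With ASCII-only input such as `L"Shape #0"` and a large enough `max_bytes`, the result is the same text as a narrow string; with a smaller `max_bytes` the text is truncated rather than overrun.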





On Feb 20, 2007, at 1:48 PM, Russell de Grove wrote:

I have map layers in ArcSDE on SQL Server 2005 and I have been trying to label features from a field with Unicode data (type nvarchar).

To get around the "Unknown SDE column type" error I had to add the following to the sdeGetRecord method in mapsde.c, in the "switch(itemdefs[i].sde_type)" block:

#ifdef SE_NSTRING_TYPE
    case SE_NSTRING_TYPE:
      shape->values[i] = (char *)malloc( (itemdefs[i].size + 1) * sizeof(unsigned short));
      status = SE_stream_get_nstring(sde->stream,
                                    (short) (i+1),
                                    (unsigned short *)shape->values[i]);
      if(status == SE_NULL_VALUE)
        ((unsigned short *)shape->values[i])[0] = (unsigned short)0; /* empty string */
      else if(status != SE_SUCCESS) {
        sde_error(status, "sdeGetRecord()", "SE_stream_get_nstring()");
        return(MS_FAILURE);
      }
      break;
#endif

So far, so good, but I only see the first character of each label. If I explicitly include a Unicode "preamble", I see two garbage characters followed by the first expected character. As it happens, my data is in UTF-16 and my characters are all ASCII-type characters that use only the low byte. I believe what is causing my problem is the "msGetEncodedString" method in mapgd.c.

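
The truncation described above can be reproduced without MapServer. The sketch below is illustrative only: the byte arrays and the helper names `narrow_len` and `utf16_byte_len` are made up for this example. It shows why a byte-oriented `strlen` stops after the first character of ASCII text stored as UTF-16LE, and how a scan over 16-bit units finds the real length.

```c
#include <stddef.h>
#include <string.h>

/* "Shape #0" as UTF-16LE: every ASCII character is followed by a 0x00 high
 * byte, and the string ends with a two-byte terminator. */
static const unsigned char shape_utf16le[] = {
    'S', 0, 'h', 0, 'a', 0, 'p', 0, 'e', 0, ' ', 0, '#', 0, '0', 0, 0, 0
};

/* The same text preceded by the UTF-16LE byte order mark (0xFF 0xFE). */
static const unsigned char shape_utf16le_bom[] = {
    0xFF, 0xFE,
    'S', 0, 'h', 0, 'a', 0, 'p', 0, 'e', 0, ' ', 0, '#', 0, '0', 0, 0, 0
};

/* Byte-oriented length: stops at the first 0x00 byte. */
size_t narrow_len(const unsigned char *s)
{
    return strlen((const char *)s);
}

/* Length in bytes of a UTF-16 string, excluding the two-byte terminator.
 * Scans one 16-bit unit at a time, so a lone 0x00 byte does not end it. */
size_t utf16_byte_len(const unsigned char *s)
{
    size_t n = 0;
    while (s[n] != 0 || s[n + 1] != 0)
        n += 2;
    return n;
}
```

Here `narrow_len` gives 1 for the plain buffer and 3 for the one with the byte order mark, while `utf16_byte_len` gives 16 and 18 respectively.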
char *msGetEncodedString(const char *string, const char *encoding)
{
#ifdef USE_ICONV
  iconv_t cd = NULL;
  char *in, *inp;
  char *outp, *out = NULL;
  size_t len, bufsize, bufleft, status;
  cd = iconv_open("UTF-8", encoding);
  if(cd == (iconv_t)-1) {
    msSetError(MS_IDENTERR, "Encoding not supported by libiconv (%s).",
               "msGetEncodedString()", encoding);
    return NULL;
  }
  len = strlen(string);
  // Problem point: strlen will return the count up to the first null byte,
  // so "Shape #0" as Unicode will return 1 for the S stored little-endian,
  // or 3 if a Unicode "preamble" is used

  bufsize = len * 4;
  in = strdup(string);
  inp = in;
  out = (char*) malloc(bufsize);
  if(in == NULL || out == NULL){
    msSetError(MS_MEMERR, NULL, "msGetEncodedString()");
    msFree(in);
    iconv_close(cd);
    return NULL;
  }
  strcpy(out, in);
  outp = out;

  bufleft = bufsize;
  status = -1;
  while (len > 0){
    status = iconv(cd, (const char**)&inp, &len, &outp, &bufleft);
    // Problem point: since this expects byte pairs, a byte length of 1 or 3
    // is going to cause problems.

    if(status == -1){
      msFree(in);
      msFree(out);
      iconv_close(cd);
      return strdup(string);
      // Problem point: since there was a problem, strdup returns the original
      // "string" up to the first null byte... so I get "S", possibly with a
      // couple of preceding garbage characters if I used a preamble

    }
  }
  out[bufsize - bufleft] = '\0';

  msFree(in);
  iconv_close(cd);

  return out;
#else
  msSetError(MS_MISCERR, "Not implemeted since Iconv is not enabled.",
             "msGetEncodedString()");
  return NULL;
#endif
}
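
One way around the problem points marked in the function above is to stop deriving the input length from `strlen` and have the caller pass the byte count. The following is a minimal sketch under that assumption, not MapServer code: `to_utf8_n` is a hypothetical name, and it uses only the standard `iconv_open`, `iconv`, and `iconv_close` calls.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant: convert in_bytes bytes of `in`, encoded as
 * `encoding`, to a newly allocated NUL-terminated UTF-8 string.
 * Returns NULL if the encoding is unsupported or the conversion fails. */
char *to_utf8_n(const char *in, size_t in_bytes, const char *encoding)
{
    iconv_t cd;
    size_t bufsize = in_bytes * 4 + 1;   /* generous bound plus terminator */
    size_t inleft = in_bytes;
    size_t outleft = bufsize - 1;
    char *inp = (char *)in;              /* iconv does not modify the input */
    char *out, *outp;

    cd = iconv_open("UTF-8", encoding);
    if (cd == (iconv_t)-1)
        return NULL;

    out = malloc(bufsize);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }
    outp = out;

    /* Any error here (invalid or incomplete input, full output buffer)
     * aborts the conversion instead of returning a truncated copy. */
    while (inleft > 0) {
        if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1) {
            free(out);
            iconv_close(cd);
            return NULL;
        }
    }

    *outp = '\0';
    iconv_close(cd);
    return out;
}
```

Given the four bytes `'S', 0, 'h', 0` and the encoding name "UTF-16LE", this returns "Sh", because the embedded zero bytes are covered by the explicit length.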

Has anyone else encountered similar problems? Does anyone know how I can determine the correct width of characters based on the "encoding" parameter?
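
As far as I know, iconv itself offers no call that reports the code unit width of an encoding, so one option is a small lookup keyed on the encoding name. The sketch below is an assumption-laden illustration: `encoding_unit_width` and `encoded_byte_len` are hypothetical helpers, and the name prefixes cover only a few common encoding names. Anything unrecognized is treated as byte-oriented.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive test for whether s begins with prefix. */
static int has_prefix_nocase(const char *s, const char *prefix)
{
    while (*prefix) {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
        s++;
        prefix++;
    }
    return 1;
}

/* Hypothetical lookup: size in bytes of one code unit (and therefore of the
 * string terminator) for a given encoding name. */
size_t encoding_unit_width(const char *encoding)
{
    if (has_prefix_nocase(encoding, "UTF-16") ||
        has_prefix_nocase(encoding, "UCS-2"))
        return 2;
    if (has_prefix_nocase(encoding, "UTF-32") ||
        has_prefix_nocase(encoding, "UCS-4"))
        return 4;
    return 1;   /* assume a byte-oriented encoding such as UTF-8 or Latin-1 */
}

/* Length in bytes of s, excluding its terminator, where the terminator is
 * one all-zero code unit of the width implied by the encoding. */
size_t encoded_byte_len(const char *s, const char *encoding)
{
    size_t width = encoding_unit_width(encoding);
    size_t n = 0, k;

    for (;;) {
        for (k = 0; k < width && s[n + k] == 0; k++)
            ;
        if (k == width)
            return n;
        n += width;
    }
}
```

The result of `encoded_byte_len` could then replace the `strlen` call when the buffer is known to be terminated with a full-width zero unit.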
