-----Original Message-----
From: David K. Taylor (JIRA) [mailto:j...@apache.org] 
Sent: Friday, May 08, 2009 11:28 AM
To: axis-c-dev@ws.apache.org
Subject: [jira] Updated: (AXIS2C-1265) guththila does not support Chinese and 
the Japanese.


     [ 
https://issues.apache.org/jira/browse/AXIS2C-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David K. Taylor updated AXIS2C-1265:
------------------------------------

    Attachment: utf8-patch.txt

This patch adds UTF-8 support when reading SOAP messages through Guththila. 
 Since libiconv is optional, I hand-coded a UTF-8 transcoder.  I don't use 
libiconv myself, so I didn't add optional code to use it; that would be a good 
addition.

This patch was built successfully against the official 1.6.0 release.  It also 
includes unit tests under guththila/tests for the new transcoder (both decode 
and encode, though only decode is actually used).  These tests are not run by 
the regular "make check" target; to run them, use these commands:

cd guththila/tests
./s
./reader

The decoder test takes a few minutes since it covers the entire Unicode code 
point space.
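
For readers who want to see what such an exhaustive test looks like, here is a 
minimal sketch of a UTF-8 encoder, decoder, and a round trip over every code 
point.  It is not the code in utf8-patch.txt; all sketch_* names are 
hypothetical, and the decoder assumes it should reject overlong forms, 
surrogates, and values above U+10FFFF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the patch's API): encode one code point as UTF-8.
 * Returns the number of bytes written (1-4), or 0 for a surrogate or a value
 * above U+10FFFF. */
static size_t sketch_utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                       /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical helper: decode one code point from in[0..len).
 * Returns the number of bytes consumed, or 0 if the sequence is invalid
 * or truncated. */
static size_t sketch_utf8_decode(const unsigned char *in, size_t len,
                                 uint32_t *cp)
{
    /* Smallest code point that legitimately needs a sequence of n bytes. */
    static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t need, i;
    uint32_t v;

    assert(in != NULL && cp != NULL);
    if (len == 0)
        return 0;
    if (in[0] < 0x80) {
        *cp = in[0];
        return 1;
    }
    if ((in[0] & 0xE0) == 0xC0)      { need = 2; v = in[0] & 0x1F; }
    else if ((in[0] & 0xF0) == 0xE0) { need = 3; v = in[0] & 0x0F; }
    else if ((in[0] & 0xF8) == 0xF0) { need = 4; v = in[0] & 0x07; }
    else
        return 0;                           /* stray continuation or 0xF8+ */
    if (len < need)
        return 0;
    for (i = 1; i < need; i++) {
        if ((in[i] & 0xC0) != 0x80)
            return 0;                       /* not a continuation byte */
        v = (v << 6) | (uint32_t)(in[i] & 0x3F);
    }
    if (v < min_cp[need] || v > 0x10FFFF || (v >= 0xD800 && v <= 0xDFFF))
        return 0;                           /* overlong, too large, surrogate */
    *cp = v;
    return need;
}

/* Round-trip every code point; returns the number of mismatches. */
static unsigned long sketch_utf8_roundtrip_failures(void)
{
    unsigned long failures = 0;
    unsigned char buf[4];
    uint32_t cp, back = 0;

    for (cp = 0; cp <= 0x10FFFF; cp++) {
        size_t n = sketch_utf8_encode(cp, buf);
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            if (n != 0)
                failures++;                 /* surrogates must be rejected */
            continue;
        }
        if (n == 0 || sketch_utf8_decode(buf, n, &back) != n || back != cp)
            failures++;
    }
    return failures;
}
```

A test driver would call sketch_utf8_roundtrip_failures() and expect 0.  The 
character 門 from the issue (U+9580) encodes to the three bytes E9 96 80.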

This patch does not completely solve the UTF-8 issue, but handles the most 
common case.  These issues remain:

1) Still uses isspace and isalpha for XML tag names and attribute names, which 
depend on the locale set in the environment.

2) Only accepts UTF-8, not other encodings.  (Using iconv could improve this as 
well.)

3) Ignores any encoding set in the XML declaration.

4) Ignores possible encoding set in HTTP Content-Type.

5) Invalid UTF-8 bytes can only be ignored.  There should be an option to 
escape them instead.
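
As an illustration of item 1, here is a minimal sketch of character checks 
that do not consult the C locale.  The function names are hypothetical and are 
not part of Guththila.  The whitespace test follows the XML 1.0 definition 
(space, tab, CR, LF).  The name-start test is a deliberate simplification: it 
accepts the ASCII name-start characters and, as an assumption, any byte >= 0x80 
so that UTF-8 encoded names pass through, which is looser than the XML rule.

```c
#include <assert.h>

/* XML 1.0 whitespace: space, tab, carriage return, line feed.
 * Unlike isspace(), this never accepts '\v' or '\f' and never changes
 * with setlocale(). */
static int sketch_xml_is_space(unsigned char c)
{
    return c == 0x20 || c == 0x09 || c == 0x0D || c == 0x0A;
}

/* Simplified name-start check working on single bytes.
 * ASSUMPTION: every byte >= 0x80 is treated as part of a UTF-8 encoded
 * name character; a full implementation would decode the code point and
 * check it against the XML NameStartChar ranges. */
static int sketch_xml_is_name_start_byte(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           c == '_' || c == ':' || c >= 0x80;
}

/* Small self-check showing the intended behaviour. */
static void sketch_xml_selfcheck(void)
{
    assert(sketch_xml_is_space(' '));
    assert(!sketch_xml_is_space('\v'));       /* isspace('\v') is true in "C" */
    assert(sketch_xml_is_name_start_byte('n'));
    assert(!sketch_xml_is_name_start_byte('1'));
    assert(sketch_xml_is_name_start_byte(0xE9)); /* first byte of 門 in UTF-8 */
}
```

Taking an unsigned char argument also avoids passing a negative value to the 
check, which is undefined behaviour for the <ctype.h> functions.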

> guththila does not support Chinese and the Japanese.
> ----------------------------------------------------
>
>                 Key: AXIS2C-1265
>                 URL: https://issues.apache.org/jira/browse/AXIS2C-1265
>             Project: Axis2-C
>          Issue Type: Bug
>          Components: guththila
>    Affects Versions: 1.5.0
>         Environment: windows xp sp2 japan
>            Reporter: songlei
>         Attachments: utf8-patch.txt
>
>
> data:
> a.xml
> <?xml version='1.0' encoding='UTF-8'?>
> <ns:parameter xmlns:ns="urn:ns">
> <ns:unit xmlns:ns="urn:ns">
>       <ns:name>name</ns:name>
>       <ns:type>1</ns:type>
>       <ns:displayname>門雷:名前</ns:displayname>
>       <ns:value>2</ns:value>
> </ns:unit>
> </ns:parameter>
> ---------------------------------------------------------------------
> code:
> axiom_node_t *root_node = NULL;
> axiom_node_t *child = NULL;
> axiom_document_t *document = NULL;
> axiom_stax_builder_t *om_builder = NULL;
> axiom_xml_reader_t *xml_reader = NULL;
> f = fopen("a.xml","r");
> xml_reader = axiom_xml_reader_create_for_io(env, read_input_callback, 
> close_input_callback, NULL, "UTF-8");
> om_builder = axiom_stax_builder_create(env, xml_reader);
> document = axiom_stax_builder_get_document(om_builder, env);
> root_node = axiom_document_get_root_element(document, env);
> axiom_document_build_all(document, env);
> child = axiom_node_get_first_child(root_node, env);
> --------------------------------------------------------------------------------------------
> result:
> The parse result is shown below:
> <ns:parameter xmlns:ns="urn:ns">
> <ns:unit xmlns:ns="urn:ns">
>       <ns:name>name</ns:name>
>       <ns:type>1</ns:type>
>       <ns:displayname></ns:displayname>
> </ns:unit>
> </ns:parameter>
> The content of displayname and the value element are lost.
> ---------------------------------------------------------------------------------------------------------------
> debug:
> .\axis2c\guththila\src\guththila_xml_parser.c (lines 1532-1535):
>             c = m->buffer.buff[m->buffer.cur_buff][m->next++ -
>                     GUTHTHILA_BUFFER_PRE_DATA_SIZE(m->buffer)];
>             return c >= 0 ? c : -1;
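> c is an int, but
> m->buffer.buff[m->buffer.cur_buff][m->next++ - GUTHTHILA_BUFFER_PRE_DATA_SIZE 
> (m->buffer)] is a char.
> The range of a signed char is -128 to 127.
> char[i] and char[i+1] are bytes of 門.
> The byte value of char[i] is greater than 127,
> so when the char is converted to int, c < 0 and the function returns -1.
> As a result, om_builder->done = true.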
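
The sign problem described in the quoted debug notes can be reproduced in 
isolation.  The following is a minimal sketch, not the Guththila source and not 
the change made in utf8-patch.txt; both function names are hypothetical.  It 
assumes a platform where plain char is signed, which is the case the reporter 
hit.

```c
#include <assert.h>

/* Same pattern as in the report: a plain char is widened to int and any
 * negative value is mapped to -1.  Where char is signed, every byte
 * >= 0x80 (all bytes of a multi-byte UTF-8 sequence) becomes negative
 * and is reported as -1. */
static int sketch_next_char_buggy(const char *buf, int *pos)
{
    int c = buf[(*pos)++];
    return c >= 0 ? c : -1;
}

/* Converting through unsigned char first keeps the byte value in the
 * range 0..255 regardless of whether plain char is signed. */
static int sketch_next_char_fixed(const char *buf, int *pos)
{
    return (unsigned char)buf[(*pos)++];
}
```

With the UTF-8 bytes of 門 (E9 96 80) as input, the second function returns 
0xE9, 0x96, 0x80 in turn, while the first returns -1 for each byte when char is 
signed.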

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
