[ https://issues.apache.org/jira/browse/AXIS2C-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David K. Taylor updated AXIS2C-1265:
------------------------------------
Attachment: utf8-patch.txt
This patch adds UTF-8 support when reading SOAP messages through Guththila.
Because libiconv is an optional dependency, I hand-coded a UTF-8 transcoder.
I don't use libiconv myself, so I did not add optional code to use it when it
is available. That would be a good addition.
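
For readers unfamiliar with what a hand-coded UTF-8 decoder involves, here is
a minimal sketch of decoding one code point. It is not taken from
utf8-patch.txt; the function name utf8_decode_one and its return convention
are assumptions made for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the code in utf8-patch.txt.
 * Decodes one code point from s[0..len). Returns the number of bytes
 * consumed (1 to 4) and stores the code point in *cp, or returns 0 if the
 * sequence is invalid or truncated. */
static size_t utf8_decode_one(const unsigned char *s, size_t len, uint32_t *cp)
{
    /* Smallest code point that legitimately needs n bytes, indexed by n. */
    static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t n, i;
    uint32_t c;

    if (len == 0)
        return 0;
    if (s[0] < 0x80) {                      /* plain ASCII */
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {     /* 110xxxxx: 2-byte sequence */
        n = 2; c = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) {     /* 1110xxxx: 3-byte sequence */
        n = 3; c = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) {     /* 11110xxx: 4-byte sequence */
        n = 4; c = s[0] & 0x07;
    } else {
        return 0;                           /* stray continuation or 0xF8..0xFF */
    }

    if (len < n)
        return 0;                           /* truncated input */
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)          /* continuation must be 10xxxxxx */
            return 0;
        c = (c << 6) | (uint32_t)(s[i] & 0x3F);
    }
    /* Reject overlong forms, UTF-16 surrogates, and values above U+10FFFF. */
    if (c < min_cp[n] || (c >= 0xD800 && c <= 0xDFFF) || c > 0x10FFFF)
        return 0;
    *cp = c;
    return n;
}
```

The input is taken as unsigned char so that bytes at or above 0x80 compare as
positive values regardless of whether plain char is signed on the platform.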
This patch was built successfully against the official 1.6.0 release. It also
includes unit tests under guththila/tests for the new transcoder (both decode
and encode, though only decode is actually used). These tests are not executed
as part of the regular "make check" target. To run them, use these commands:
cd guththila/tests
./s
./reader
The decoder test takes a few minutes since it covers the entire Unicode code
point space.
This patch does not completely solve the UTF-8 issue, but it handles the most
common case. These issues remain:
1) The parser still uses isspace and isalpha for XML tag names and attribute
names, and their results depend on the locale set in the environment.
2) Only UTF-8 is accepted, not other encodings. (Using iconv could improve
this as well.)
3) Any encoding set in the XML declaration is ignored.
4) Any encoding set in the HTTP Content-Type header is ignored.
5) Invalid UTF-8 bytes can only be ignored. There should be an option to
escape them instead.
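
As an illustration of item 1, character classification can be made
independent of the locale by testing the characters directly rather than
calling isspace and isalpha. The sketch below is an assumption about how this
might look, not code from Guththila or from the patch. The helper names are
hypothetical, and the name-start check covers only the ASCII subset of what
XML 1.0 allows.

```c
/* Hypothetical helpers; the names are illustrative and not from Guththila. */

/* XML 1.0 whitespace is exactly space, tab, carriage return and line feed.
 * Unlike isspace(), this never changes with the current locale. */
static int xml_is_space(int c)
{
    return c == 0x20 || c == 0x09 || c == 0x0D || c == 0x0A;
}

/* ASCII subset of the characters that may start an XML name:
 * letters, '_' and ':'. Non-ASCII name characters would require decoding
 * code points first and are deliberately not handled here. */
static int xml_is_name_start_ascii(int c)
{
    return (c >= 'A' && c <= 'Z') ||
           (c >= 'a' && c <= 'z') ||
           c == '_' || c == ':';
}
```

Both functions take an int holding an unsigned byte value, the same
convention the standard ctype functions use.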
> guththila does not support Chinese and Japanese.
> ----------------------------------------------------
>
> Key: AXIS2C-1265
> URL: https://issues.apache.org/jira/browse/AXIS2C-1265
> Project: Axis2-C
> Issue Type: Bug
> Components: guththila
> Affects Versions: 1.5.0
> Environment: windows xp sp2 japan
> Reporter: songlei
> Attachments: utf8-patch.txt
>
>
> data:
> a.xml
> <?xml version='1.0' encoding='UTF-8'?>
> <ns:parameter xmlns:ns="urn:ns">
> <ns:unit xmlns:ns="urn:ns">
> <ns:name>name</ns:name>
> <ns:type>1</ns:type>
> <ns:displayname>門雷:名前</ns:displayname>
> <ns:value>2</ns:value>
> </ns:unit>
> </ns:parameter>
> ---------------------------------------------------------------------
> code:
> axiom_node_t *root_node = NULL;
> axiom_node_t *child = NULL;
> axiom_document_t *document = NULL;
> axiom_stax_builder_t *om_builder = NULL;
> axiom_xml_reader_t *xml_reader = NULL;
> f = fopen("a.xml","r");
> xml_reader = axiom_xml_reader_create_for_io(env, read_input_callback,
> close_input_callback, NULL, "UTF-8");
> om_builder = axiom_stax_builder_create(env, xml_reader);
> document = axiom_stax_builder_get_document(om_builder, env);
> root_node = axiom_document_get_root_element(document, env);
> axiom_document_build_all(document, env);
> child = axiom_node_get_first_child(root_node, env);
> --------------------------------------------------------------------------------------------
> result:
> The parse result is shown below:
> <ns:parameter xmlns:ns="urn:ns">
> <ns:unit xmlns:ns="urn:ns">
> <ns:name>name</ns:name>
> <ns:type>1</ns:type>
> <ns:displayname></ns:displayname>
> </ns:unit>
> </ns:parameter>
> The displayname text and the value element are lost.
> ---------------------------------------------------------------------------------------------------------------
> debug:
> .\axis2c\guththila\src\guththila_xml_parser.c
> c = m->buffer.buff[m->buffer.cur_buff][m->next++ -
>     GUTHTHILA_BUFFER_PRE_DATA_SIZE(m->buffer)];
> return c >= 0 ? c : -1;
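> c is an int.
> m->buffer.buff[m->buffer.cur_buff][m->next++ -
> GUTHTHILA_BUFFER_PRE_DATA_SIZE(m->buffer)] is a char.
> The range of a signed char is -128 to 127.
> The bytes starting at char[i] encode 門.
> As an unsigned byte, char[i] is greater than 127.
> When that char is converted to int, c < 0, so -1 is returned
> and om_builder->done is set to true.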
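
The debug notes above describe a sign-extension problem: on platforms where
char is signed, any byte at or above 0x80 becomes a negative int and is
mistaken for end of input. The sketch below reproduces that behavior in
isolation and shows one conventional remedy, reading the byte as unsigned
char. The function names and the flat buffer are hypothetical stand-ins, not
Guththila code, and this is not necessarily how the attached patch addresses
it.

```c
/* Hypothetical stand-ins for the buffer read quoted in the report. */

/* Mirrors the reported behavior. The explicit signed char cast models a
 * platform where plain char is signed: bytes >= 0x80 turn negative and
 * are reported as -1, which the caller treats as end of input. */
static int next_char_buggy(const char *buf, int i)
{
    int c = (signed char)buf[i];
    return c >= 0 ? c : -1;
}

/* Reads the byte as unsigned char first, so every byte maps to 0..255
 * and can never collide with the -1 end-of-input marker. */
static int next_char_fixed(const char *buf, int i)
{
    return (unsigned char)buf[i];
}
```

With the UTF-8 bytes of 門 (0xE9 0x96 0x80), the first function returns -1
for the lead byte, while the second returns 0xE9.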