[ 
https://issues.apache.org/jira/browse/COUCHDB-345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749216#action_12749216
 ] 

Curt Arnold commented on COUCHDB-345:
-------------------------------------

I'm guessing that the performance difference would be reduced if 
couch_db:json_decode were modified so it only called xmerl_ucs:from_utf8 when 
it encountered a byte value > 0x7F in the binary.  I've leave to Adam to code 
and benchmark it to avoid demonstrating my Erlang newbieness.


> "High ASCII" can be inserted into db but not retrieved
> ------------------------------------------------------
>
>                 Key: COUCHDB-345
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-345
>             Project: CouchDB
>          Issue Type: Bug
>    Affects Versions: 0.9
>         Environment: OSX 10.5.6
>            Reporter: Joan Touzet
>         Attachments: badenc1.patch, badtext.tar.gz, enctest.zip, 
> reject_invalid_utf8.patch
>
>
> It is possible to PUT/POST a document into CouchDB with a "high ASCII" value 
> that cannot be retrieved. This results from not escaping a non-ASCII value 
> into \u#### when PUT/POSTing the document.
> The attached sample code will recreate the problem using the hex value D8 (Ø) 
> in a possibly unsavoury test string.
> Sample output against 0.9.0 is as follows:
> ================================================
> {
>     "ok": true
> }
> {
>     "id": "fail", 
>     "ok": true, 
>     "rev": "1-76726372"
> }
> {
>     "error": "ucs", 
>     "reason": "{bad_utf8_character_code}"
> }
> ================================================
> Please note this defect turned up another problem, namely that the 
> bad_utf8_character_code exception thrown by a design document attempting to 
> map() the bad document caused Futon to fail silently in building the view, 
> with no indication (except via debug log) that there was a failure. The log 
> indicated two attempts to build the view, both failing, followed by an 
> uncaught exception error for Futon.
> Based on this, there are likely other areas in the codebase that do not 
> handle the bad_utf8_character_code exception correctly.
> My belief is that CouchDB shouldn't accept this input and should have 
> rejected the PUT/POST, or should have escaped the input itself before the 
> insertion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to