Re: [Haskell-cafe] Text.JSON and utf8
Hello Martin,

the change that you propose seems to already be in json-0.7. Perhaps you just need to 'cabal update' and install the most recent version?

About your other question: I have not used CouchDB, but a common mistake is to mix up strings and bytes. Perhaps the `getDoc` function does not do UTF-8 decoding, and so it is giving you back a list of bytes (as a String)? In general, the JSON package only converts between JSON and String, and is agnostic to what encoding is used to represent the strings. There are other packages that convert Strings into bytes (e.g., http://hackage.haskell.org/package/utf8-string), so typically you want to encode the string to bytes just before you export it (say to CouchDB), and decode it back into a string just after you've imported it.

-Iavor

On Mon, Feb 11, 2013 at 5:56 AM, Martin Hilbig li...@mhilbig.de wrote:
> hi, tl;dr: i propose this patch to Text/JSON/String.hs and would like
> to know why it is needed: [...]

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
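A minimal sketch of the encode-before-export / decode-after-import step Iavor describes, assuming (as he guesses, unconfirmed) that `getDoc` hands back a String whose Chars are really UTF-8 bytes. The helper names `decodeBytesAsString` and `encodeStringAsBytes` are hypothetical. The sketch uses the text and bytestring packages that ship with GHC instead of utf8-string; the `Codec.Binary.UTF8.String` module of utf8-string offers `decodeString` and `encodeString` for the same purpose.

```haskell
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical helper: treat every Char of the input as one raw byte and
-- decode the byte sequence as UTF-8 into a proper Unicode String.
-- Note: TE.decodeUtf8 throws an exception on invalid UTF-8 input.
decodeBytesAsString :: String -> String
decodeBytesAsString = T.unpack . TE.decodeUtf8 . BC.pack

-- Hypothetical helper: the inverse direction. Encode a Unicode String as
-- UTF-8 and return the bytes, one byte per Char.
encodeStringAsBytes :: String -> String
encodeStringAsBytes = BC.unpack . TE.encodeUtf8 . T.pack

main :: IO ()
main = do
  -- "\195\182" holds the two UTF-8 bytes of 'ö' (U+00F6), one per Char.
  print (decodeBytesAsString "\195\182")  -- prints "\246"
  print (encodeStringAsBytes "\246")      -- prints "\195\182"
```

If the String really is bytes-in-disguise, applying `decodeBytesAsString` right after the import should give back the single character; applying it to an already decoded String would corrupt it or fail, so it only makes sense at the boundary.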
[Haskell-cafe] Text.JSON and utf8
hi,

tl;dr: i propose this patch to Text/JSON/String.hs and would like to know why it is needed:

    @@ -375,7 +375,7 @@
       where
       go s1 =
         case s1 of
    -      (x   :xs) | x < '\x20' || x > '\x7e' -> '\\' : encControl x (go xs)
    +      (x   :xs) | x < '\x20' -> '\\' : encControl x (go xs)
           ('"' :xs) -> '\\' : '"' : go xs
           ('\\':xs) -> '\\' : '\\' : go xs
           (x   :xs) -> x : go xs

i recently stumbled upon CouchDB telling me i'm sending invalid json.

i basically read lines from a utf8 file with german umlauts and send them to CouchDB using Text.JSON and Database.CouchDB.

    $ file lines.txt
    lines.txt: UTF-8 Unicode text

let's take 'ö' as an example. i use LANG=de_DE.utf8

ghci tells

    > 'ö'
    '\246'
    > putChar '\246'
    ö
    > putChar 'ö'
    ö
    > :m + Text.JSON Database.CouchDB
    > runCouchDB' $ newNamedDoc (db "foo") (doc "bar") (showJSON $ toJSObject [("test","ö")])
    *** Exception: HTTP/1.1 400 Bad Request
    Server: CouchDB/1.2.1 (Erlang OTP/R15B03)
    Date: Mon, 11 Feb 2013 13:24:49 GMT
    Content-Type: text/plain; charset=utf-8
    Content-Length: 48
    Cache-Control: must-revalidate

couchdb log says:

    Invalid JSON: {{error,{10,lexical error: invalid bytes in UTF8 string.\n}},{\test\:\F6\}}

this is indeed hex ö:

    > :m + Numeric
    > putChar $ toEnum $ fst $ head $ readHex "f6"
    ö

if i apply the above patch and reinstall JSON and CouchDB the doc creation works:

    > runCouchDB' $ newNamedDoc (db "db") (doc "foo") (showJSON $ toJSObject [("test", "ö")])
    Right someRev

but i don't get back the ö i expected:

    > Just (_,_,x) <- runCouchDB' $ getDoc (db "foo") (doc "bar") :: IO (Maybe (Doc,Rev,JSObject String))
    > let Ok y = valFromObj "test" =<< readJSON x :: Result String
    > y
    "\195\188"
    > putStrLn y
    ü

apparently with curl everything works fine:

    $ curl localhost:5984/db/foo -XPUT -d '{"test": "ö"}'
    {"ok":true,"id":"foo","rev":"someOtherRev"}
    $ curl localhost:5984/db/foo
    {"_id":"bars","_rev":"someOtherRev","test":"ö"}

so how can i get my precious ö back? what am i doing wrong or does Text.JSON need another patch?

another question: why does encControl in Text/JSON/String.hs handle the cases x < '\x100' and x < '\x1000' even though they can never be reached with the old predicate in encJSString (x < '\x20')?

finally: is '\x7e' the right literal for the job?

thanks for reading

have fun
martin
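To make the two predicates in the patch concrete, here is a small standalone sketch of JSON string escaping. It is not the json package's source: `unicodeEscape`, `escapeWith`, `asciiOnly` and `controlOnly` are hypothetical names, and the named control escapes (`\n`, `\t`, ...) are left out for brevity. It only illustrates that escaping everything above '\x7e' yields pure-ASCII output, that escaping only below '\x20' passes 'ö' through as a Char, and that a `\uXXXX` escape needs different amounts of zero padding depending on how large the code point is.

```haskell
import Numeric (showHex)

-- Hypothetical helper: render a Char as a JSON \uXXXX escape, zero-padded
-- to four hex digits. Chars above U+FFFF would need a surrogate pair,
-- which this sketch does not handle.
unicodeEscape :: Char -> String
unicodeEscape c = "\\u" ++ replicate (4 - length hex) '0' ++ hex
  where
    hex = showHex (fromEnum c) ""

-- Escape a string for use inside JSON double quotes. The predicate decides
-- which characters (other than '"' and '\\') become \uXXXX escapes.
escapeWith :: (Char -> Bool) -> String -> String
escapeWith needsEscape = concatMap step
  where
    step '"'  = "\\\""
    step '\\' = "\\\\"
    step c
      | needsEscape c = unicodeEscape c
      | otherwise     = [c]

-- The predicate on the '-' line of the patch: escape control characters
-- and everything outside printable ASCII.
asciiOnly :: Char -> Bool
asciiOnly c = c < '\x20' || c > '\x7e'

-- The predicate on the '+' line of the patch: escape control characters only.
controlOnly :: Char -> Bool
controlOnly c = c < '\x20'

main :: IO ()
main = do
  putStrLn (escapeWith asciiOnly "\246")   -- prints \u00f6
  print (escapeWith controlOnly "\246")    -- prints "\246"
```

With `controlOnly` the output still contains the Char 'ö', so whatever turns the String into bytes afterwards decides what goes over the wire; with `asciiOnly` that question does not arise because only ASCII remains.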
Re: [Haskell-cafe] Text.JSON and utf8
Don't use the json package, use aeson instead. (It's much faster and handles encoding issues correctly.)

G

On Mon, Feb 11, 2013 at 2:56 PM, Martin Hilbig li...@mhilbig.de wrote:
> hi, tl;dr: i propose this patch to Text/JSON/String.hs and would like
> to know why it is needed: [...]

--
Gregory Collins g...@gregorycollins.net
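For comparison, a rough sketch of the same round trip with aeson, assuming the aeson, text and bytestring packages are installed. The document value `doc` is made up for illustration and no CouchDB client is involved; the point is only that `encode` produces a lazy ByteString and `decode` consumes one, so the String/bytes boundary is explicit in the types.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson (Value, decode, encode, object, (.=))
import qualified Data.ByteString.Lazy as BL
import Data.Text (Text)

-- Made-up example document: {"test": "ö"}
doc :: Value
doc = object ["test" .= ("\246" :: Text)]

main :: IO ()
main = do
  -- encode yields the serialized JSON as bytes, ready to send over HTTP.
  let bytes = encode doc
  print (BL.unpack bytes)
  -- decode parses bytes back into a Value; a lossless round trip is
  -- expected to give back the original document.
  print (decode bytes == Just doc)
```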