On Sat, Oct 25, 2008 at 12:46 AM, Ho-Sheng Hsiao <[EMAIL PROTECTED]> wrote:
>
> I don't know which record it is barfing on. Pulling a single record out:
>
> {
> "unihan_version": "5.1.0",
> "unihan": {
> "kIRG_GSource":"HZ",
> "kOtherNumeric":"7",
> "kIRGHanyuDaZidian":"10004.020",
> "kDefinition":"the original form for \u4e03 U+4E03",
> "kCihaiT":"10.601",
> "kPhonetic":"1635",
> "kMandarin":"QI1",
> "kCantonese":"cat1",
> "kRSKangXi":"1.1",
> "kHanYu":"10004.020",
> "kRSUnicode":"1.1",
> "kIRGKangXi":"0076.021"},
> "_id":"U+20001"
> }
> }
>
> Seems to work fine even with the bulk uploader.
>
> I'm going to attempt to insert the records one by one. Maybe I can find
> out which record it is barfing on, maybe the json was invalid. It seems
> to me though, that something is barfing on utf8 on bulk uploads over a
> certain limit.
>
> If someone wants to try it out, I can supply the json file I used. Any
> help is appreciated.

If you don't mind, I'll take a look at it. The error you showed sure
looks like a utf8 error, but with such a big bulk upload it's hard to
be sure.

Perhaps you can put the Unihan-5.1.0.json file online somewhere, or if
you have it boiled down to records that are causing the problem,
singling those out would of course be helpful.
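
One way to single out suspect records offline, before touching CouchDB
at all, might look like the rough sketch below. It only tests one guess
at the failure mode: a record containing a code point that can't be
encoded as strict UTF-8 (for example a lone surrogate left over from a
bad \uXXXX escape). The function name find_bad_records is made up for
illustration, and I'm assuming the file has the usual bulk-upload shape
of {"docs": [...]}, which I haven't confirmed.

```python
import json


def find_bad_records(docs):
    """Return (index, _id, reason) for each doc that cannot be
    serialized as strict UTF-8 JSON. Hypothetical helper, not part of
    CouchDB or any client library."""
    bad = []
    for i, doc in enumerate(docs):
        try:
            # ensure_ascii=False leaves non-ASCII characters unescaped,
            # so an unencodable code point (such as a lone surrogate)
            # makes the encode() step raise UnicodeEncodeError.
            json.dumps(doc, ensure_ascii=False).encode("utf-8")
        except (UnicodeEncodeError, TypeError, ValueError) as exc:
            doc_id = doc.get("_id") if isinstance(doc, dict) else None
            bad.append((i, doc_id, str(exc)))
    return bad


# Possible usage, assuming the file is shaped like {"docs": [...]}:
#
#     with open("Unihan-5.1.0.json", encoding="utf-8") as fh:
#         payload = json.load(fh)
#     for index, doc_id, reason in find_bad_records(payload["docs"]):
#         print(index, doc_id, reason)
```

If this turns up nothing, the records themselves are probably fine as
far as encoding goes, and inserting them one by one (or in smaller
batches) would be the next thing to try.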

Thanks,
Chris
--
Chris Anderson
http://jchris.mfdz.com