OK, I'll commit it (I forgot FFFE/FFFF, so I'll add those too). I think there is a build option that compiles Python with a different level of Unicode support in regular expressions, or something like that, and the problem shows up if Python was installed with that option.
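
For reference, a rough sketch of what the pattern might look like with FFFE/FFFF added to the negated class. This is an illustration, not necessarily what gets committed. The `is_narrow_build` helper is hypothetical and rests on my guess that the option in question is the narrow/wide Unicode build setting, where `sys.maxunicode` is 0xFFFF on a narrow build.

```python
import re
import sys

# Reject what XML 1.0 forbids instead of listing what it allows, so the
# pattern needs no supplementary ranges (\U00010000-\U0010FFFF).
# Like the patch quoted below, this does not exclude the surrogate blocks.
reValidChar = re.compile(
    "^[^\u0000-\u0008\u000B-\u000C\u000E-\u001F\uFFFE\uFFFF]*$")


def is_narrow_build():
    """Hypothetical helper: True when this interpreter uses 16-bit code units.

    Assumption: narrow builds (possible before Python 3.3) report
    sys.maxunicode == 0xFFFF; later versions always report 0x10FFFF.
    """
    return sys.maxunicode == 0xFFFF


if __name__ == "__main__":
    print("narrow build:", is_narrow_build())
    print("plain text ok:", bool(reValidChar.match("some javadoc text\n")))
    print("NUL rejected:", reValidChar.match("bad\u0000char") is None)
```

Tab, newline and carriage return stay legal because they fall in the gaps between the excluded ranges.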

On Mon, Sep 10, 2012 at 11:42 AM, Yonik Seeley <[email protected]> wrote:
> Yep, that works!
>
> -Yonik
> http://lucidworks.com
>
>
> On Mon, Sep 10, 2012 at 11:25 AM, Robert Muir <[email protected]> wrote:
>> Basically here is what I'm proposing (if it works on your machine):
>>
>> Index: dev-tools/scripts/checkJavadocLinks.py
>> ===================================================================
>> --- dev-tools/scripts/checkJavadocLinks.py (revision 1382919)
>> +++ dev-tools/scripts/checkJavadocLinks.py (working copy)
>> @@ -24,7 +24,7 @@
>>  reAtt = re.compile(r"""(?:\s+([a-z]+)\s*=\s*("[^"]*"|'[^']?'|[^'"\s]+))+""", re.I)
>>
>>  # Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
>> -reValidChar = re.compile("^[\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]*$")
>> +reValidChar = re.compile("^[^\u0000-\u0008\u000B-\u000C\u000E-\u001F]*$")
>>
>>  # silly emacs: '
>>
>>
>> On Mon, Sep 10, 2012 at 11:14 AM, Robert Muir <[email protected]> wrote:
>>> Hmm this looks my regular expression to look for valid characters (we
>>> had some javadocs that intended \u0000 and so on but java preprocesses
>>> these, actually giving us invalid xml).
>>>
>>> Can you try removing the supplementary ranges from the regex just as a
>>> test? I don't really fully understand the state of python's unicode
>>> support.
>>>
>>> On Mon, Sep 10, 2012 at 11:10 AM, Yonik Seeley <[email protected]> wrote:
>>>> Thanks for fixing that.
>>>>
>>>> I'm trying to run javadocs-lint myself, but it's not working:
>>>>
>>>> javadocs-lint:
>>>> [exec] Traceback (most recent call last):
>>>> [exec]   File "/usr/local/bin/../Cellar/python3/3.2/lib/python3.2/functools.py", line 176, in wrapper
>>>> [exec]     result = cache[key]
>>>> [exec] KeyError: (<class 'str'>, '^[\t\n\r -\ud7ff\ue000-�𐀀-\U0010ffff]*$', 0)
>>>> [exec]
>>>> [exec] During handling of the above exception, another exception occurred:
>>>> [exec]
>>>> [exec] Traceback (most recent call last):
>>>> [exec]   File "/opt/code/lusolr_clean2/lucene/../dev-tools/scripts/checkJavadocLinks.py", line 27, in <module>
>>>> [exec]     reValidChar = re.compile("^[\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]*$")
>>>> [exec]   File "/usr/local/bin/../Cellar/python3/3.2/lib/python3.2/re.py", line 206, in compile
>>>> [exec]     return _compile(pattern, flags)
>>>>
>>>> Anyone have any pointers?
>>>>
>>>> -Yonik
>>>> http://lucidworks.com
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [email protected]
>>>> For additional commands, e-mail: [email protected]
>>>>
>>>
>>> --
>>> lucidworks.com
>>
>> --
>> lucidworks.com
>

--
lucidworks.com
