Yep, that works! -Yonik http://lucidworks.com
On Mon, Sep 10, 2012 at 11:25 AM, Robert Muir <[email protected]> wrote: > Basically here is what I'm proposing (if it works on your machine): > > Index: dev-tools/scripts/checkJavadocLinks.py > =================================================================== > --- dev-tools/scripts/checkJavadocLinks.py (revision 1382919) > +++ dev-tools/scripts/checkJavadocLinks.py (working copy) > @@ -24,7 +24,7 @@ > reAtt = re.compile(r"""(?:\s+([a-z]+)\s*=\s*("[^"]*"|'[^']?'|[^'"\s]+))+""", > re.I) > > # Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | > [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate > blocks, FFFE, and FFFF. */ > -reValidChar = > re.compile("^[\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]*$") > +reValidChar = re.compile("^[^\u0000-\u0008\u000B-\u000C\u000E-\u001F]*$") > > # silly emacs: ' > > > > On Mon, Sep 10, 2012 at 11:14 AM, Robert Muir <[email protected]> wrote: >> Hmm this looks my regular expression to look for valid characters (we >> had some javadocs that intended \u0000 and so on but java preprocesses >> these, actually giving us invalid xml). >> >> Can you try removing the supplementary ranges from the regex just as a >> test? I don't really fully understand the state of python's unicode >> support. >> >> On Mon, Sep 10, 2012 at 11:10 AM, Yonik Seeley <[email protected]> wrote: >>> Thanks for fixing that. >>> >>> I'm trying to run javadocs-lint myself, but it's not working: >>> >>> javadocs-lint: >>> [exec] Traceback (most recent call last): >>> [exec] File >>> "/usr/local/bin/../Cellar/python3/3.2/lib/python3.2/functools.py", >>> line 176, in wrapper >>> [exec] result = cache[key] >>> [exec] KeyError: (<class 'str'>, '^[\t\n\r >>> -\ud7ff\ue000-�𐀀-\U0010ffff]*$', 0) >>> [exec] >>> [exec] During handling of the above exception, another exception >>> occurred: >>> [exec] >>> [exec] Traceback (most recent call last): >>> [exec] File >>> "/opt/code/lusolr_clean2/lucene/../dev-tools/scripts/checkJavadocLinks.py", >>> line 27, in <module> >>> [exec] reValidChar = >>> re.compile("^[\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]*$") >>> [exec] File >>> "/usr/local/bin/../Cellar/python3/3.2/lib/python3.2/re.py", line 206, >>> in compile >>> [exec] return _compile(pattern, flags) >>> >>> Anyone have any pointers? >>> >>> -Yonik >>> http://lucidworks.com >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: [email protected] >>> For additional commands, e-mail: [email protected] >>> >> >> >> >> -- >> lucidworks.com > > > > -- > lucidworks.com > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [email protected] > For additional commands, e-mail: [email protected] > --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
