[ http://jira.codehaus.org/browse/QDOX-82?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=178873#action_178873 ]

Mark Jenner commented on QDOX-82:
---------------------------------

Hi,

I had the same issue parsing something else with StreamTokenizer, and I found
your issue while searching for solutions. I could not find one, so I cooked up
my own and thought you might be interested in applying it to your problem as
well. Basically, I needed to parse strings contained in double quotes.
StreamTokenizer does this for you, but it ends the string at the first newline
inside it. So instead of letting StreamTokenizer do the string parsing, I tell
it that the double quote is not special, and when I reach one I reconfigure the
tokenizer into my own "string mode", where the only special chars are the double
quote and the backslash. When I reach the end of the string, I switch back to my
normal tokenizer config (for a format called fvar). Here are the methods:

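        // Character constants used by these methods. The original post does not
        // show their declarations; these values are inferred from the names.
        // (Needs java.io.IOException and java.io.StreamTokenizer imported.)
        private static final int UNDER_SCORE = '_';
        private static final int SPACE = ' ';
        private static final int DOUBLE_QUOTE = '"';
        private static final int ESCAPE = '\\';

        private void setUpTokenizerForFvar(StreamTokenizer tokenizer) {
                // Set up the tokenizer like a new one, as described in the
                // StreamTokenizer constructor comment, minus the quote chars
                tokenizer.resetSyntax();
                tokenizer.wordChars('a', 'z');
                tokenizer.wordChars('A', 'Z');
                tokenizer.wordChars(128 + 32, 255);
                tokenizer.whitespaceChars(0, ' ');
                tokenizer.commentChar('/');
                tokenizer.parseNumbers();

                // Attribute names in fvar can include underscores, and spaces!
                tokenizer.wordChars(UNDER_SCORE, UNDER_SCORE);
                tokenizer.wordChars(SPACE, SPACE);
                // Return '"' as an ordinary token so the caller can switch to string mode
                tokenizer.ordinaryChar(DOUBLE_QUOTE);
        }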

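        private void setUpTokenizerForQuotedValue(StreamTokenizer tokenizer) {
                // Reset the tokenizer to treat everything (newlines included) as a
                // word char, except the double quote char and the escape char.
                // Use 0-255 rather than 0-127: after resetSyntax() chars 128-255
                // are ordinary, so e.g. 'é' would come back as a single-char token
                // and be dropped by parseQuotedString.
                tokenizer.resetSyntax();
                tokenizer.wordChars(0, 255);
                tokenizer.ordinaryChar(ESCAPE);
                tokenizer.ordinaryChar(DOUBLE_QUOTE);
        }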

        // Because StreamTokenizer does not parse quoted strings that contain
        // newlines properly, we have to do it ourselves. Reads everything up to
        // the matching closing quote, ignoring any quote preceded by an escape
        // char '\'.
        private String parseQuotedString(int openQuote, StreamTokenizer tokenizer)
                        throws IOException {
                StringBuilder value = new StringBuilder();
                setUpTokenizerForQuotedValue(tokenizer);
                int nextToken = tokenizer.nextToken();
                boolean escaped = false;
                while (escaped || nextToken != openQuote) {
                        if (nextToken == StreamTokenizer.TT_EOF) {
                                // Without this check a missing closing quote loops forever
                                throw new IOException("Unterminated quoted string");
                        }
                        if (nextToken == StreamTokenizer.TT_WORD) {
                                value.append(tokenizer.sval);
                                escaped = false;
                        } else if (nextToken == ESCAPE) {
                                value.append((char) nextToken);
                                // A second backslash cancels the first, so \\" still closes the string
                                escaped = !escaped;
                        } else if (nextToken == openQuote) {
                                // An escaped quote: keep it as part of the value
                                value.append((char) nextToken);
                                escaped = false;
                        }
                        nextToken = tokenizer.nextToken();
                }
                setUpTokenizerForFvar(tokenizer);
                return value.toString();
        }

These are used in code like this:
[...]
                                nextToken = tokenizer.nextToken();
                                if (nextToken == DOUBLE_QUOTE) {
                                        String value = parseQuotedString(nextToken, tokenizer);
[...]
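
For anyone who wants to try the technique outside QDox, here is a minimal self-contained sketch of the same mode switch. It is not the methods above verbatim: the class QuotedStringDemo and its helpers are hypothetical, and normalMode is a simplified stand-in for the fvar configuration. It also runs a default-configured StreamTokenizer over the input from this issue for comparison.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;

// Hypothetical demo class: shows the "string mode" switch on a StreamTokenizer.
public class QuotedStringDemo {
    private static final int DOUBLE_QUOTE = '"';
    private static final int ESCAPE = '\\';

    // Normal mode: a simplified stand-in for the fvar setup. '"' is left as an
    // ordinary char, so the tokenizer never tries to read a string by itself.
    static void normalMode(StreamTokenizer t) {
        t.resetSyntax();
        t.wordChars('a', 'z');
        t.wordChars('A', 'Z');
        t.wordChars('.', '.');
        t.whitespaceChars(0, ' ');
        t.ordinaryChar(DOUBLE_QUOTE);
    }

    // String mode: every char (newlines included) is a word char,
    // except the double quote and the backslash.
    static void stringMode(StreamTokenizer t) {
        t.resetSyntax();
        t.wordChars(0, 255);
        t.ordinaryChar(ESCAPE);
        t.ordinaryChar(DOUBLE_QUOTE);
    }

    // Called right after the opening quote has been returned as a token.
    static String readQuoted(StreamTokenizer t) throws IOException {
        StringBuilder value = new StringBuilder();
        stringMode(t);
        boolean escaped = false;
        int token = t.nextToken();
        while (escaped || token != DOUBLE_QUOTE) {
            if (token == StreamTokenizer.TT_EOF) {
                throw new IOException("Unterminated quoted string");
            }
            if (token == StreamTokenizer.TT_WORD) {
                value.append(t.sval);
                escaped = false;
            } else if (token == ESCAPE) {
                value.append('\\');
                escaped = !escaped; // "\\" is an escaped backslash, not an escape
            } else { // only other ordinary token here: an escaped double quote
                value.append('"');
                escaped = false;
            }
            token = t.nextToken();
        }
        normalMode(t); // back to normal mode for whatever follows the string
        return value.toString();
    }

    // Returns the first quoted value in the input using the mode switch, or null.
    static String firstQuotedValue(String input) throws IOException {
        StreamTokenizer t = new StreamTokenizer(new StringReader(input));
        normalMode(t);
        for (int tok = t.nextToken(); tok != StreamTokenizer.TT_EOF; tok = t.nextToken()) {
            if (tok == DOUBLE_QUOTE) {
                return readQuoted(t);
            }
        }
        return null;
    }

    // Same search with a default StreamTokenizer, where '"' is a quote char.
    static String defaultTokenizerValue(String input) throws IOException {
        StreamTokenizer t = new StreamTokenizer(new StringReader(input));
        for (int tok = t.nextToken(); tok != StreamTokenizer.TT_EOF; tok = t.nextToken()) {
            if (tok == DOUBLE_QUOTE) {
                return t.sval;
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        String input = "@bar.baz foo=\"this is\n *       multilined\"";
        System.out.println("[" + defaultTokenizerValue(input) + "]");
        System.out.println("[" + firstQuotedValue(input) + "]");
    }
}
```

Running main prints "[this is]" for the default tokenizer, which stops at the line terminator, and the full two-line value (newline and " *" continuation included) for the mode-switching version.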

Hope that is of some value to you.


> multiline'd tag attribute values not working anymore
> ----------------------------------------------------
>
>                 Key: QDOX-82
>                 URL: http://jira.codehaus.org/browse/QDOX-82
>             Project: QDox
>          Issue Type: Bug
>          Components: Parser
>            Reporter: Grégory Joseph (old account)
>             Fix For: 1.10
>
>         Attachments: MultineLineAttributeValuesWithQDoxTestCase.java, qdox82-test.patch
>
>
> Some time ago, it was possible to parse the following source
> with QDox and retrieve a sensible value for the "foo" attribute of the
> "bar.baz" tag -> "this is multilined"
> /**
> * @bar.baz foo="this is
> *       multilined"
> */
> With the latest snapshot, this unfortunately doesn't work anymore.
> I haven't found an open related JIRA issue, but before creating one, I
> wanted to make sure this wasn't intentional.
> I think allowing this makes sense, for instance for XDoclet, because
> some attributes might have long values, like the "description"
> elements for servlets and such.
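
Note that parseQuotedString above returns the raw text between the quotes, so for this input it also yields the newline and the " *" javadoc continuation marker. To get the value the issue expects ("this is multilined"), a caller would still need to fold each line break, together with the surrounding whitespace and the optional leading "*", into a single space. A minimal sketch, assuming that folding rule; the class JavadocValueNormalizer and its normalize method are hypothetical, not QDox API:

```java
import java.util.regex.Pattern;

// Hypothetical helper: folds javadoc continuation lines in an attribute value.
public class JavadocValueNormalizer {
    // A line break, plus any whitespace around it and an optional leading '*'
    // on the next line (the javadoc continuation marker).
    private static final Pattern CONTINUATION = Pattern.compile("\\s*\\R\\s*\\*?\\s*");

    // Replaces each continuation with a single space and trims the result.
    static String normalize(String raw) {
        return CONTINUATION.matcher(raw).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        System.out.println(normalize("this is\n *       multilined")); // this is multilined
    }
}
```

The \R construct (Java 8+) matches any line break, including "\r\n", so CRLF sources fold the same way.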

-- 
This message is automatically generated by JIRA.