[ https://issues.apache.org/jira/browse/CASSANDRA-17667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17571175#comment-17571175 ]
Brad Schoening edited comment on CASSANDRA-17667 at 7/26/22 3:52 AM:
---------------------------------------------------------------------

[~ahomoki], looking at this a little further, saferscanner.py, which inherits from the Python re.Scanner, is tokenizing the input. An example from [https://mail.python.org/pipermail/python-dev/2003-April/035075.html] shows how input is parsed:

{code:python}
import re

def s_ident(scanner, token): return token
def s_operator(scanner, token): return "op%s" % token
def s_float(scanner, token): return float(token)
def s_int(scanner, token): return int(token)

scanner = re.Scanner([
    (r"[a-zA-Z_]\w*", s_ident),
    (r"\d+\.\d*", s_float),
    (r"\d+", s_int),
    (r"=|\+|-|\*|/", s_operator),
    (r"\s+", None),
])

# sanity check
test('scanner.scan("sum = 3*foo + 312.50 + bar")',
     (['sum', 'op=', 3, 'op*', 'foo', 'op+', 312.5, 'op+', 'bar'], ''))
{code}

In pylexotron this is implemented as:

{code:python}
RuleSpecScanner = SaferScanner([
    (r'::=', lambda s, t: t),
    (r'\[[a-z0-9_]+\]=', lambda s, t: ('named_collector', t[1:-2])),
    (r'[a-z0-9_]+=', lambda s, t: ('named_symbol', t[:-1])),
    (r'/(\[\^?.[^]]*\]|[^/]|\\.)*/', lambda s, t: ('regex', t[1:-1].replace(r'\/', '/'))),
    (r'"([^"]|\\.)*"', lambda s, t: ('litstring', t)),
    (r'<[^>]*>', lambda s, t: ('reference', t[1:-1])),
    (r'\bJUNK\b', lambda s, t: ('junk', t)),
    (r'[@()|?*;]', lambda s, t: t),
    (r'\s+', None),
    (r'#[^\n]*', None),
], re.I | re.S | re.U)
{code}

r'\s+' is skipping whitespace; I'm uncertain what r'#[^\n]*' and r'\bJUNK\b' are doing. Adding comments could be helpful.

The Scanner flags used are re.I | re.S | re.U, for IGNORECASE, DOTALL and UNICODE, but it doesn't use re.M for MULTILINE. So either that could be added if it doesn't break anything (note that re.M only changes the meaning of ^ and $, and none of these patterns use anchors), or the tokenizer would have to emit start-comment and end-comment tokens.

There don't seem to be any unit tests for pylexotron or SaferScanner, however. That might be a good thing to add.
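For what it's worth, a tiny standalone scanner with just those two skip rules (a sketch using plain re.Scanner, not the actual SaferScanner, with a made-up "word" token rule) suggests that r'#[^\n]*' silently drops a '#' comment through end of line, the same way r'\s+' drops whitespace, since a rule whose action is None discards the matched text:

```python
import re

# Minimal re.Scanner with the same two "skip" rules as RuleSpecScanner.
# A rule whose action is None drops the matched text entirely.
scanner = re.Scanner([
    (r"[a-z_]\w*", lambda s, t: ("word", t)),  # hypothetical token rule
    (r"\s+", None),                            # skip whitespace
    (r"#[^\n]*", None),                        # skip '#' up to end of line
])

tokens, remainder = scanner.scan("foo # a comment\nbar")
print(tokens)     # [('word', 'foo'), ('word', 'bar')]
print(remainder)  # '' (everything consumed)
```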
There could be tests for each type of token: named_collector, named_symbol, regex, litstring, reference and junk.
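As a sketch of what such tests could look like, the same rule table can be rebuilt on plain re.Scanner (an assumption for illustration; the real tests would exercise cqlshlib's SaferScanner, which inherits re.Scanner's interface) and each token type asserted individually:

```python
import re

# Hypothetical standalone re-creation of RuleSpecScanner on re.Scanner,
# purely to sketch per-token-type unit tests.
rule_scanner = re.Scanner([
    (r'::=', lambda s, t: t),
    (r'\[[a-z0-9_]+\]=', lambda s, t: ('named_collector', t[1:-2])),
    (r'[a-z0-9_]+=', lambda s, t: ('named_symbol', t[:-1])),
    (r'/(\[\^?.[^]]*\]|[^/]|\\.)*/', lambda s, t: ('regex', t[1:-1].replace(r'\/', '/'))),
    (r'"([^"]|\\.)*"', lambda s, t: ('litstring', t)),
    (r'<[^>]*>', lambda s, t: ('reference', t[1:-1])),
    (r'\bJUNK\b', lambda s, t: ('junk', t)),
    (r'[@()|?*;]', lambda s, t: t),
    (r'\s+', None),
    (r'#[^\n]*', None),
], re.I | re.S | re.U)

def first_token(text):
    # Scan a snippet and return its first token, requiring full consumption.
    tokens, remainder = rule_scanner.scan(text)
    assert remainder == '', remainder
    return tokens[0]

# One assertion per token type the grammar produces:
assert first_token('::=') == '::='
assert first_token('[name]=') == ('named_collector', 'name')
assert first_token('name=') == ('named_symbol', 'name')
assert first_token('/a+b/') == ('regex', 'a+b')
assert first_token('"abc"') == ('litstring', '"abc"')
assert first_token('<ident>') == ('reference', 'ident')
assert first_token('JUNK') == ('junk', 'JUNK')
```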
> Text value containing "/*" interpreted as multiline comment in cqlsh
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-17667
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17667
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CQL/Interpreter
>            Reporter: ANOOP THOMAS
>            Assignee: Attila Homoki
>            Priority: Normal
>             Fix For: 4.0.x, 4.1.x
>
> I use the CQLSH command line utility to load some DDLs. The version of the utility I use is this:
> {noformat}
> [cqlsh 6.0.0 | Cassandra 4.0.0.47 | CQL spec 3.4.5 | Native protocol v5]{noformat}
> Command that loads DDL.cql:
> {noformat}
> cqlsh -u username -p password cassandra.example.com 65503 --ssl -f DDL.cql
> {noformat}
> I have a line in the CQL script that breaks the syntax:
> {noformat}
> INSERT into tablename (key,columnname1,columnname2) VALUES ('keyName','value1','/value2/*/value3');{noformat}
> {{/*}} here is interpreted as the start of a multi-line comment. It used to work on older versions of cqlsh. The error I see looks like this:
> {noformat}
> SyntaxException: line 4:2 mismatched input 'Update' expecting ')' (...,'value1','/value2INSERT into tablename(INSERT into tablename (key,columnname1,columnname2)) VALUES ('[Update]-...) SyntaxException: line 1:0 no viable alternative at input '(' ([(]...)
> {noformat}
> Same behavior while running in interactive mode too. {{/*}} inside a CQL statement should not be interpreted as the start of a multi-line comment.