[ https://issues.apache.org/jira/browse/TINKERPOP-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17941739#comment-17941739 ]
ASF GitHub Bot commented on TINKERPOP-3047:
-------------------------------------------

Cole-Greer commented on code in PR #3091:
URL: https://github.com/apache/tinkerpop/pull/3091#discussion_r2032174244

##########
gremlin-language/src/main/java/org/apache/tinkerpop/gremlin/language/corpus/GrammarReader.java:
##########
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.tinkerpop.gremlin.language.corpus;
+
+import java.io.IOException;
+import java.nio.charset.StandardCharsets;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Set;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+/**
+ * Reads ANTLR grammar file to extract Gremlin keyword string representations.
+ */
+public class GrammarReader {
+
+    // Pattern to match token definitions like: TADDALL: 'addAll';
+    private static final Pattern tokenPattern = Pattern.compile("T[A-Z0-9_]+:\\s*'([^']+)'\\s*;");
+
+    /**
+     * Parse a grammar file to extract all keyword string representations.
+     *
+     * @param grammarFilePath The path to the ANTLR grammar file
+     * @return A set of keyword string representations
+     * @throws IOException If there's an error reading the file
+     */
+    public static Set<String> parse(final String grammarFilePath) throws IOException {

Review Comment:
I'm not entirely sure of the purpose of this method yet, but I'm worried that the regex-based approach may be fragile if any keyword definitions in the grammar are split over multiple lines. If the goal is simply to obtain a list of all keyword tokens from the grammar, it may be more robust to parse them out of the gremlin-language generated sources, such as `gremlin-language/target/generated-sources/antlr4/org/apache/tinkerpop/gremlin/language/grammar/Gremlin.tokens`.

> Grammar does not parse keywords into Map keys
> ---------------------------------------------
>
>                 Key: TINKERPOP-3047
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-3047
>             Project: TinkerPop
>          Issue Type: Bug
>          Components: language
>    Affects Versions: 3.7.1
>            Reporter: Stephen Mallette
>            Priority: Critical
>
> {{[keys: ["a","b"]]}} won't work because "keys" ends up being parsed as
> {{Column.keys}}. Another issue at play is the use of parentheses to wrap certain
> key definitions but not others, which doesn't feel consistent. For example, it
> works for {{T}} values but not for something like "edges", which is just a
> keyword token. This may not be wrong, but it requires some examination.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
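The sketch below illustrates both the reviewer's concern and the suggested alternative. It assumes the ANTLR 4 `.tokens` format, where each line is either a symbolic entry such as `K_ADDALL=2` or a literal entry such as `'addAll'=2`. It also assumes that a "keyword" is any identifier-like literal. The class `TokensFileKeywordReader`, its methods, and the sample token lines are hypothetical and are not taken from PR #3091 or from the real `Gremlin.tokens`. The `main` method first shows that the PR's regex matches a single-line rule but misses the same rule when it is split across lines. It then extracts keywords, including `keys` from the issue, from the sample lines.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: collect keyword literals from an ANTLR 4 generated .tokens file
 * (e.g. Gremlin.tokens) instead of regex-scanning the .g4 grammar source.
 */
public class TokensFileKeywordReader {

    // Assumed .tokens format: literal entries look like 'addAll'=2, symbolic ones like K_ADDALL=2.
    private static final Pattern LITERAL_LINE = Pattern.compile("^'(.*)'=(\\d+)$");

    // Assumption: a keyword is an identifier-like literal, so punctuation such as '(' is skipped.
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z][A-Za-z0-9_]*");

    /** Returns identifier-like literals from the given .tokens lines, in file order. */
    public static Set<String> keywordsFromLines(final List<String> lines) {
        final Set<String> keywords = new LinkedHashSet<>();
        for (final String line : lines) {
            final Matcher m = LITERAL_LINE.matcher(line.trim());
            if (m.matches() && IDENTIFIER.matcher(m.group(1)).matches()) {
                keywords.add(m.group(1));
            }
        }
        return keywords;
    }

    /** Reads a generated .tokens file and extracts its keyword literals. */
    public static Set<String> parse(final Path tokensFile) throws IOException {
        return keywordsFromLines(Files.readAllLines(tokensFile, StandardCharsets.UTF_8));
    }

    public static void main(final String[] args) {
        // The regex from the PR under review, copied verbatim.
        final Pattern prPattern = Pattern.compile("T[A-Z0-9_]+:\\s*'([^']+)'\\s*;");
        System.out.println(prPattern.matcher("TADDALL: 'addAll';").find());            // true
        System.out.println(prPattern.matcher("TADDALL\n    : 'addAll'\n    ;").find()); // false

        // Made-up sample lines in the assumed .tokens format (not the real Gremlin.tokens).
        final List<String> sample = List.of(
                "T__0=1", "K_ADDALL=2", "K_KEYS=3", "'('=1", "'addAll'=2", "'keys'=3");
        System.out.println(keywordsFromLines(sample)); // [addAll, keys]
    }
}
```

Reading the generated file avoids depending on how the `.g4` source is formatted. The trade-off is that the ANTLR code-generation step must have run before the file can be read.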