[ https://issues.apache.org/jira/browse/DOXIA-616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255931#comment-17255931 ]

ASF GitHub Bot commented on DOXIA-616:
--------------------------------------

bertysentry commented on a change in pull request #49:
URL: https://github.com/apache/maven-doxia/pull/49#discussion_r549660631



##########
File path: doxia-modules/doxia-module-markdown/src/main/java/org/apache/maven/doxia/module/markdown/MarkdownParser.java
##########
@@ -74,48 +80,103 @@
 
     /**
      * Regex that identifies a multimarkdown-style metadata section at the start of the document
+     *
+     * In order to ensure that we have minimal risk of false positives when slurping metadata sections, the
+     * first key in the metadata section must be one of these standard keys or else the entire metadata section is
+     * ignored.
      */
-    private static final String MULTI_MARKDOWN_METADATA_SECTION =
-        "^(((?:[^\\s:][^:]*):(?:.*(?:\r?\n\\p{Blank}+[^\\s].*)*\r?\n))+)(?:\\s*\r?\n)";
+    private static final Pattern METADATA_SECTION_PATTERN = Pattern.compile(
+            "\\A^\\s*"
+            + "(?:title|author|date|address|affiliation|copyright|email|keywords|language|phone|subtitle)"
+            + "\\h*:\\h*\\V*\\h*$\\v+"
+            + "(?:^\\h*[^:\\v]+\\h*:\\h*\\V*\\h*$\\v+)*",
+            Pattern.MULTILINE | Pattern.CASE_INSENSITIVE );
 
     /**
      * Regex that captures the key and value of a multimarkdown-style metadata entry.
      */
-    private static final String MULTI_MARKDOWN_METADATA_ENTRY =
-        "([^\\s:][^:]*):(.*(?:\r?\n\\p{Blank}+[^\\s].*)*)\r?\n";
-
-    /**
-     * In order to ensure that we have minimal risk of false positives when slurping metadata sections, the
-     * first key in the metadata section must be one of these standard keys or else the entire metadata section is
-     * ignored.
-     */
-    private static final String[] STANDARD_METADATA_KEYS =
-        { "title", "author", "date", "address", "affiliation", "copyright", "email", "keywords", "language", "phone",
-            "subtitle" };
+    private static final Pattern METADATA_ENTRY_PATTERN = Pattern.compile(
+            "^\\h*([^:\\v]+?)\\h*:\\h*(\\V*)\\h*$",
+            Pattern.MULTILINE );
 
     /**
      * <p>getType.</p>
      *
      * @return a int.
      */
+    @Override
     public int getType()
     {
         return TXT_TYPE;
     }
 
+    /**
+     * The parser of the HTML produced by Flexmark, that we will
+     * use to convert this HTML to Sink events
+     */
     @Requirement
     private MarkdownHtmlParser parser;
 
+    /**
+     * Flexmark's Markdown parser (one static instance fits all)
+     */
+    private static final com.vladsch.flexmark.parser.Parser FLEXMARK_PARSER;
+
+    /**
+     * Flexmark's HTML renderer (its output will be re-parsed and converted to Sink events)
+     */
+    private static final HtmlRenderer FLEXMARK_HTML_RENDERER;
+
+    // Initialize the Flexmark parser and renderer, once and for all
+    static
+    {
+        MutableDataSet flexmarkOptions = new MutableDataSet();
+
+        // Emulate Pegdown's behavior

Review comment:
      Good question.
   
   Flexmark's default behavior follows [CommonMark 0.28](https://spec.commonmark.org/0.28/).
   
   The unit tests didn't cover the edge cases where Pegdown and CommonMark differ, so I was concerned about breaking people's existing documentation and kept the Pegdown profile. It's really a balance between the risk of breaking existing content and moving forward with something cleaner and properly specified.
   
   I'm personally in favor of moving forward with CommonMark (just like Web developers are happy that we finally got rid of Microsoft Internet Explorer... :-D)
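To see what the two new patterns in the MarkdownParser diff accept, here is a minimal sketch that copies `METADATA_SECTION_PATTERN` and `METADATA_ENTRY_PATTERN` verbatim into a standalone class. The class name `MetadataSketch` and the helper `extractMetadata` are hypothetical and not part of Doxia; the real `MarkdownParser` may use the matched section differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetadataSketch {

    // Copied from the diff: the section must start at the very beginning of the
    // document (\A) and its first key must be one of the standard MultiMarkdown keys.
    private static final Pattern METADATA_SECTION_PATTERN = Pattern.compile(
            "\\A^\\s*"
            + "(?:title|author|date|address|affiliation|copyright|email|keywords|language|phone|subtitle)"
            + "\\h*:\\h*\\V*\\h*$\\v+"
            + "(?:^\\h*[^:\\v]+\\h*:\\h*\\V*\\h*$\\v+)*",
            Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);

    // Copied from the diff: group 1 is the key, group 2 the value of one metadata line.
    private static final Pattern METADATA_ENTRY_PATTERN = Pattern.compile(
            "^\\h*([^:\\v]+?)\\h*:\\h*(\\V*)\\h*$",
            Pattern.MULTILINE);

    // Hypothetical helper: returns the key/value pairs of the leading metadata
    // section, or an empty map when the document has no recognized section.
    static Map<String, String> extractMetadata(String text) {
        Map<String, String> metadata = new LinkedHashMap<>();
        Matcher section = METADATA_SECTION_PATTERN.matcher(text);
        if (section.find()) {
            Matcher entry = METADATA_ENTRY_PATTERN.matcher(section.group());
            while (entry.find()) {
                // \V* is greedy, so trailing blanks land in group 2; trim them here
                metadata.put(entry.group(1), entry.group(2).trim());
            }
        }
        return metadata;
    }

    public static void main(String[] args) {
        String doc = "Title: My Doc\nauthor: Jane\n\n# Heading\nBody text\n";
        System.out.println(extractMetadata(doc)); // prints {Title=My Doc, author=Jane}
        System.out.println(extractMetadata("foo: bar\n\n# Heading\n")); // prints {}
    }
}
```

Because the section regex is anchored with `\A` and requires a standard key first, a document whose first line is `foo: bar` yields no metadata at all, which is the false-positive guard described in the Javadoc. Values may themselves contain colons (e.g. `date: 12:30`), since the key group stops at the first colon.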





> Markdown: Properly expose the language specified in fenced code blocks
> ----------------------------------------------------------------------
>
>                 Key: DOXIA-616
>                 URL: https://issues.apache.org/jira/browse/DOXIA-616
>             Project: Maven Doxia
>          Issue Type: Improvement
>          Components: Module - Markdown
>    Affects Versions: 1.8, 1.9, 1.9.1
>            Reporter: Bertrand Martin
>            Priority: Major
>
> h1. Use Case
> Writers can specify the language used in a fenced code block (typically for 
> syntax highlighting), as in the example below:
> {code}
> ```java
> System.out.println("Beautiful\n");
> ```
> {code}
> Currently, the Doxia module for Markdown does not expose this information 
> ("java") in the produced HTML, so a Maven skin (or frontend renderer) cannot 
> leverage it.
> Produced HTML:
> {code:html}
> <div class="source"> <!-- No mention of Java :-( -->
> <pre>
> System.out.println("Beautiful\n");
> </pre>
> </div>
> {code}
> Wanted result:
> {code:html}
> <div class="source java"> <!-- :-) -->
> <pre>
> System.out.println("Beautiful\n");
> </pre>
> </div>
> {code}
> h1. Specification
> Un-comment this block:
> https://github.com/apache/maven-doxia/blob/c439714e8f4a9e86f9962ac6be9a0077ae9b4d30/doxia-modules/doxia-module-markdown/src/main/java/org/apache/maven/doxia/module/markdown/FlexmarkDoxiaNodeRenderer.java#L103
> This should do the trick.
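The wanted result boils down to copying the first word of the fence's info string (which CommonMark describes as typically naming the language) into the `class` attribute of the enclosing `div`. The standalone sketch below illustrates only that mapping on plain strings; `FencedBlockSketch`, `language` and `renderFencedBlock` are hypothetical names, whereas the actual fix is the commented-out block in `FlexmarkDoxiaNodeRenderer` linked above, which works on Flexmark nodes.

```java
public class FencedBlockSketch {

    // Escape the characters that are significant in HTML text content.
    static String escapeText(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // The first whitespace-delimited word of the info string ("java" in ```java).
    static String language(String info) {
        if (info == null) {
            return "";
        }
        return info.trim().split("\\s+")[0];
    }

    // Hypothetical renderer producing the markup requested in DOXIA-616.
    static String renderFencedBlock(String info, String code) {
        String lang = language(info);
        String cssClass = lang.isEmpty()
                ? "source"
                : "source " + escapeText(lang).replace("\"", "&quot;");
        return "<div class=\"" + cssClass + "\">\n<pre>\n" + escapeText(code) + "</pre>\n</div>";
    }

    public static void main(String[] args) {
        String code = "System.out.println(\"Beautiful\\n\");\n";
        System.out.println(renderFencedBlock("java", code));
    }
}
```

With no info string, the block falls back to the current `<div class="source">` output, so existing pages keep rendering as before.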



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
