lmhelp wrote:
>
> Hi,
>
> Thank you for reading my post.
>
> I am wondering if there exists a "grammar" for the "Wikicode"/"Wikitext"
> language (or an *exhaustive* (and formal) set of rules about how
> a "Wikitext" is constructed).
> I've looked for such a grammar/set of rules on the Web but I couldn't find
> one...
No. But see http://www.mediawiki.org/wiki/Markup_spec for grammars which "kind of work".

> I need to automatically extract the first paragraph of a Wiki article...
>
> I did it from the HTML version of a Wiki article (because
> I noticed the first paragraph was the first <p> element
> child of a <div> element whose id is "bodyContent"...)
> but I need to work with the "Wikitext" itself...
>
> - Is a grammar available somewhere?
> - Do you have any idea how to extract the first paragraph of a Wiki article?
> - Any advice?
> - Does a Java "Wikitext" "parser" exist which would do it?

Take the text before the first double newline (\n\n), which is what separates paragraphs in wikitext. However, pages commonly begin with templates, so if the page begins with {{, remove everything up to the matching }} (and then remove leading whitespace).

_______________________________________________
MediaWiki-l mailing list
MediaWiki-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
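The heuristic from the reply (skip a leading {{...}} template, then cut at the first \n\n) can be sketched in Java, since a Java solution was asked about. This is a minimal sketch, not a real wikitext parser. The class FirstParagraph and its methods are hypothetical names, not part of MediaWiki or any library. It assumes Java 11+ for String.stripLeading(), and it extends the reply slightly by repeating the template removal when several templates appear in a row. The matching }} is found by simply counting {{ and }} pairs, which ignores edge cases such as braces inside <nowiki> or comments.

```java
/**
 * Hypothetical helper (not a MediaWiki API): skips leading {{...}} templates,
 * then returns the text before the first blank line ("\n\n").
 */
public final class FirstParagraph {

    private FirstParagraph() {}

    /** Index just past the "}}" closing the "{{" at position 0, or -1 if unbalanced. */
    static int endOfLeadingTemplate(String text) {
        int depth = 0;
        int i = 0;
        while (i < text.length()) {
            if (text.startsWith("{{", i)) {
                depth++;          // entering a (possibly nested) template
                i += 2;
            } else if (text.startsWith("}}", i)) {
                depth--;          // leaving a template
                i += 2;
                if (depth == 0) {
                    return i;     // closed the outermost template
                }
            } else {
                i++;
            }
        }
        return -1; // no matching "}}"
    }

    public static String extract(String wikitext) {
        // Normalize Windows line endings so "\n\n" detection works.
        String text = wikitext.replace("\r\n", "\n").stripLeading();
        // Remove templates at the very start of the page, one after another.
        while (text.startsWith("{{")) {
            int end = endOfLeadingTemplate(text);
            if (end < 0) {
                break; // unbalanced braces: leave the text as-is
            }
            text = text.substring(end).stripLeading();
        }
        // A blank line separates paragraphs in wikitext.
        int blank = text.indexOf("\n\n");
        return blank < 0 ? text : text.substring(0, blank);
    }

    public static void main(String[] args) {
        String page = "{{Infobox person\n| name = {{PAGENAME}}\n}}\n"
                + "'''Ada''' was a mathematician.\n\nShe wrote notes.";
        System.out.println(extract(page));
    }
}
```

Running main prints '''Ada''' was a mathematician. Other things that can precede the first paragraph, such as [[File:...]] image links, <!-- ... --> comments, or magic words like __NOTOC__, are not handled here and would need similar skipping rules.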