I'm trying to use the WikiParser to determine the category list of a
Wikipedia page.
The category tags are represented as TextNode objects, but when I print out
their toWikiText, I get an empty string. Should categories be TextNodes, and
if so, what's the correct way to extract the category name from the wiki page?

Input data:
{{Colonial Colleges}}

[[Category:New York| ]]
[[Category:Former British colonies]]
[[Category:States of the United States]]

{{Link FA|es}}
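
To make the goal concrete: from the input above I would like to end up with the names "New York", "Former British colonies" and "States of the United States". The sketch below gets them with a plain regular expression over the raw wikitext. It does not use WikiParser at all, so it only illustrates the result I'm after and is not how the framework is meant to be used. The names CategorySketch and categoryNames are made up for this example.

```scala
object CategorySketch {
  // Matches [[Category:Name]] and [[Category:Name|sort key]];
  // capture group 1 is the category name without the sort key.
  private val CategoryLink = """\[\[Category:([^\]|]+)(?:\|[^\]]*)?\]\]""".r

  // Returns the category names in the order they appear in the wikitext.
  def categoryNames(wikiText: String): List[String] =
    CategoryLink.findAllMatchIn(wikiText).map(_.group(1).trim).toList

  def main(args: Array[String]): Unit = {
    val input =
      """{{Colonial Colleges}}
        |
        |[[Category:New York| ]]
        |[[Category:Former British colonies]]
        |[[Category:States of the United States]]
        |
        |{{Link FA|es}}""".stripMargin

    // Prints one category name per line.
    categoryNames(input).foreach(println)
  }
}
```

This assumes the English "Category:" prefix and ignores namespace aliases and other language editions, which is part of why I would rather get the names from the parsed nodes.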

My code snippet:
    val testFile = new java.io.File("src/test/resources/datasource/wikipedia/xml/new_york.xml")
    val parser = WikiParser()
    val xmlSource = XMLSource.fromFile(testFile)
    xmlSource.foreach { wikiPage =>
      val page = parser.apply(wikiPage)
      // Walk the top-level nodes of the parsed page and print what each one is.
      page.children.foreach { node =>
        node match {
          case template: TemplateNode => {
            println("template:" + template.title + " with " + template.children.size + " children")
          }
          case section: SectionNode => println("section:" + section.toWikiText)
          case text: TextNode => {
            println("text:" + text.toWikiText + " line: " + text.line)
          }
          case link: LinkNode => {
            // The label is built here but not printed.
            val label = link.children.map(_.toWikiText).mkString("")
          }
          case x => println("class= " + x.getClass)
        }
      }
    }

Output:
text:
 line: 939
template:en:Template:Colonial Colleges with 0 children
text:

 line: 940
text:
 line: 942
text:
 line: 943
text:

 line: 944
template:en:Template:Link FA with 1 children
text:


-- 
@tommychheng
http://tommy.chheng.com
_______________________________________________
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
