On 23 April 2013 12:01, Julien Plu <julien....@redaction-developpez.com> wrote:
> Sorry, but I really don't understand how the AST works (or Scala either).
> I am trying to retrieve all the PropertyNodes contained in a PageNode, so I do:
>
>
> override def extract(page: PageNode, subjectUri: String, pageContext: PageContext): Seq[Quad] = {
>   if (page.title.namespace != Namespace.Template || page.isRedirect ||
>       !page.title.decoded.contains("évolution population")) return Seq.empty
>
I think it would be good if you could get a picture of the structure
of the tree. It's usually not complicated, but a bit hard to explain
in text. Can you use a debugger? If so, set a breakpoint at the
following line and let the debugger show the page variable. Then click
into it, look at its children, and so on. We should add a toString()
method to Node.scala (and some sub-classes) that shows the structure.

>   for (node <- page.children) {
>     for (property <- allPropertiesNode(node)) {
>       println(property.toWikiText)
>     }
>   }
> }
>
> private def allPropertiesNode(node : Node) : List[PropertyNode] = {
>   node match {
>     case propertyNode : PropertyNode => List(propertyNode)
>     case _ = node.children
>   }

This is almost right. If I understand correctly, you want to walk
through the whole tree and collect all property nodes. Change this
line:

    case _ = node.children

(does that even compile? I don't understand how... :-) )

to

    case _ => node.children.flatMap(allPropertiesNode)

(I think that should work, I'm not 100% sure.)

Oh by the way, the method name should be allPropertyNodes. :-) Or
maybe findPropertyNodes is even better.

Once the method works, you can drop the main loop in extract().
Instead of

    for (node <- page.children) {
      for (property <- allPropertiesNode(node)) {
        println(property.toWikiText)
      }
    }

you can just write

    for (property <- findPropertyNodes(page)) {
      println(property.toWikiText)
    }

But that's just cosmetic surgery, it has the same effect.

Cheers,
JC

> }
>
>
> And nothing is displayed on my screen :-(
>
> Any idea what I'm doing wrong?
>
> Best.
>
> Julien.
>
>
> 2013/4/23 Julien Plu <julien....@redaction-developpez.com>
>>
>> Hi,
>>
>> param comes from a bad copy-paste; the correct variable is "pop".
>>
>> By the way, thank you for the hint about the AST. I will take a look at
>> these classes and see how I can use them. I won't hesitate to ask if I'm
>> blocked :-)
>>
>> Best.
>>
>> Julien.
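
The recursive collection suggested in the reply above can be tried outside the framework. What follows is only a minimal sketch under stated assumptions: the node classes are simplified stand-ins written for this example, not the real classes from org.dbpedia.extraction.wikiparser (which carry more fields), and the sample page is made up from the an=2010 and pop=61793 branches mentioned in the thread.

```scala
object FindPropertyNodesSketch {
  // Hypothetical, simplified stand-ins for the framework's node classes.
  sealed trait Node { def children: List[Node] }
  final case class TextNode(text: String) extends Node {
    val children: List[Node] = Nil
  }
  final case class PropertyNode(key: String, children: List[Node]) extends Node
  final case class ParserFunctionNode(title: String, children: List[Node]) extends Node
  final case class PageNode(title: String, children: List[Node]) extends Node

  // Walk the tree depth-first. A PropertyNode is returned as-is and its own
  // children are not searched further; any other node contributes whatever
  // its children yield.
  def findPropertyNodes(node: Node): List[PropertyNode] = node match {
    case property: PropertyNode => List(property)
    case _                      => node.children.flatMap(findPropertyNodes)
  }

  // Made-up sample tree shaped like the #switch template from the thread.
  val page: PageNode = PageNode(
    "Données/Antony/évolution population",
    List(
      TextNode("\n"),
      ParserFunctionNode(
        "#switch",
        List(
          PropertyNode("an", List(TextNode("2010"))),
          PropertyNode("pop", List(TextNode("61793")))
        )
      )
    )
  )

  def main(args: Array[String]): Unit =
    for (property <- findPropertyNodes(page)) println(property.key)
}
```

Calling findPropertyNodes on the page itself, rather than on each child in a loop, works because a node that is not a PropertyNode simply delegates to its children.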
>>
>>
>> 2013/4/22 Jona Christopher Sahnwaldt <j...@sahnwaldt.de>
>>>
>>> Hi Julien,
>>>
>>> On 22 April 2013 21:43, Julien Plu <julien....@redaction-developpez.com>
>>> wrote:
>>> > I started the code for the extractor and I have a problem with the
>>> > regex in Scala. The string is:
>>> >
>>> > http://fr.wikipedia.org/w/index.php?title=Mod%C3%A8le:Donn%C3%A9es/Antony/%C3%A9volution_population&action=edit
>>> >
>>> > And my regex is: val populationRegex = """|pop=(\d+)""".r
>>> >
>>> > And I use this piece of code:
>>> >
>>> > populationRegex findAllIn page.children.toString foreach (_ match {
>>> >   case populationRegex (pop) => println(page.title.decoded + " : pop : " + param)
>>>
>>> What is param?
>>>
>>> But more generally - did you try using the AST (abstract syntax tree)
>>> built by the parser, i.e. the tree whose root node is the PageNode?
>>> I'm not sure how good our parser is at dealing with stuff like
>>> "<includeonly>" and "{{#switch ...}}", but I think it works and
>>> page.children should contain a ParserFunctionNode [1] object for the
>>> #switch, which in turn has a child for each branch, e.g. one child for
>>> an=2010 and one for pop=61793. These children are PropertyNode [2]
>>> objects, which have a key and (who would have thought) more children.
>>> Well, in this case, just one child, which is a TextNode. In a
>>> nutshell: Find the "#switch" node, find children with keys "an" and
>>> "pop", and generate triples for their values.
>>>
>>> >   case _ =>
>>> > })
>>> >
>>> > And instead of getting "Données/Antony/évolution population : pop :
>>> > 61793" just once,
>>> >
>>> > I get "Données/Antony/évolution population : pop : null" many times,
>>> > as many as there are lines in the string.
>>> >
>>> > Any idea what I'm doing wrong?
>>> >
>>> > I'm a total beginner in Scala :-( sorry.
>>>
>>> Your code excerpt looks pretty good to me. :-)
>>>
>>> The AST is usually much safer and cleaner than regexes.
>>> Regexes are more suitable for unstructured strings, but here you're
>>> dealing with pretty clean structures. So I would suggest you write
>>> some code that walks through the PageNode tree. If you have any
>>> questions, don't hesitate to ask. We're looking forward to your
>>> contributions. Thanks!
>>>
>>> Cheers,
>>> JC
>>>
>>> [1] https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/wikiparser/ParserFunctionNode.scala
>>> [2] https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/wikiparser/PropertyNode.scala
>>>
>>> >
>>> > Best.
>>> >
>>> > Julien.
>>> >
>>> >
>>> > 2013/4/22 Jona Christopher Sahnwaldt <j...@sahnwaldt.de>
>>> >>
>>> >> The templates where data is stored are not used directly in the main
>>> >> pages. It's a complicated process: page Toulouse uses template X, X
>>> >> uses Y, Y uses Z, and Z contains the data. Something like that, I'm
>>> >> not 100% sure, but the details don't matter. This means that
>>> >> wikiPageUsesTemplate and InfoboxExtractor won't help.
>>> >>
>>> >> Generating a separate file is probably the best idea. We could also
>>> >> send these new triples to the main mapping based file, but that might
>>> >> be confusing: first, they're not mapping based; second, new triples
>>> >> about a city would be added in a completely different place in the
>>> >> file. (That's not a big problem though.)
>>> >>
>>> >> Cheers,
>>> >> JC

_______________________________________________
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
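
On the regex from the oldest message in this thread: one likely reason for the repeated "pop : null" output is that the leading "|" in """|pop=(\d+)""" is not escaped. An unescaped pipe is the alternation operator, so the pattern reads as "empty string OR pop=(\d+)", and the empty alternative matches first at every position, leaving the capture group unset. The sketch below compares that pattern with an escaped one. The sample text is an assumption made up for this example, in the shape of the #switch template described above; it is not the real template source.

```scala
import scala.util.matching.Regex

object PopulationRegexSketch {
  // Pattern as posted: the bare "|" means "empty string OR pop=(\d+)".
  val unescaped: Regex = """|pop=(\d+)""".r

  // Escaping the pipe makes it match a literal "|" in the wiki text.
  val escaped: Regex = """\|pop=(\d+)""".r

  // Hypothetical sample text, not copied from the actual template page.
  val sampleText: String = "{{#switch: {{{1}}}\n|an=2010\n|pop=61793\n}}"

  // Collect the digits captured by group 1 for every match of the escaped pattern.
  def populations(text: String): List[String] =
    escaped.findAllMatchIn(text).map(_.group(1)).toList

  def main(args: Array[String]): Unit = {
    // With the unescaped pattern, group 1 never takes part in a match.
    println(unescaped.findAllMatchIn(sampleText).map(_.group(1)).toList)
    println(populations(sampleText))
  }
}
```

Walking the AST, as recommended earlier in the thread, avoids this class of problem altogether; the escaped pattern is only useful when working on the raw text.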