Thanks, Sebastian, for the suggestions; I was not aware of Parsoid. Feel free to get back to me if you need support for that GSoC idea. The research unit I collaborate with at DERI focuses heavily on massive data acquisition from the web, so we may be able to help.

Cheers,

On 4/22/13 9:14 PM, Sebastian Hellmann wrote:
> Hi Marco,
> did you have a look at Parsoid [1]?
>
> I am really not sure what the best way is to parse data out of
> wiki syntax. The XML configs produced for Wiktionary in Jonas's master's
> thesis seem to be quite alright for "normal" users, so I would hope that
> we can take it from there and just build an infrastructure around this.
> It might make sense to combine both sources and ways of parsing Wiktionary
> to get better results in the end.
>
> The most important thing, however, seems to be a user-friendly process
> for involving Wiktionary users.
>
> All the best,
> Sebastian
>
> [1] http://www.mediawiki.org/wiki/Parsoid
>
> On 18.04.2013 16:50, Marco Fossati wrote:
>> Definitely, that's why Sebastian's idea could become a very interesting
>> GSoC project.
>>
>> On 4/18/13 4:41 PM, Pablo N. Mendes wrote:
>>>
>>> The difference between JSON and HTML is 15 minutes, Scala and IntelliJ. :)
>>>
>>> I'd think the important part is how the markup is parsed, how templates
>>> are resolved, etc.
>>>
>>> On Thu, Apr 18, 2013 at 4:39 PM, Marco Fossati <hell.j....@gmail.com> wrote:
>>>
>>> I can't say if it's a competitor. The main difference lies in the
>>> output, which is structured data (JSON) instead of semi-structured
>>> data (HTML).
>>> For more details, see the slides [1].
>>> Cheers!
>>>
>>> [1] http://www.slideshare.net/spaziodati/introducing-jsonpedia
>>>
>>> On 4/18/13 4:23 PM, Pablo N. Mendes wrote:
>>>
>>> Is JSONPedia a competitor of gwtwiki and Sweble?
>>>
>>> https://code.google.com/p/gwtwiki/
>>> http://en.wikipedia.org/wiki/Sweble#The_current_state_of_parsing
>>>
>>> On Thu, Apr 18, 2013 at 4:18 PM, Marco Fossati <hell.j....@gmail.com> wrote:
>>>
>>> Hi Pablo,
>>>
>>> It's a low-level generic parser for MediaWiki content.
>>> It converts all the content of any MediaWiki resource into
>>> structured data. The output could be JSON (as it is now),
>>> JSON-LD or RDF, i.e., it can be modeled to suit our needs.
>>> Unlike the DBpedia extraction framework, it does not do any
>>> processing of the semantics of the data (e.g., of infoboxes), but
>>> handles every content item, e.g., the article body, tables, etc.
>>> I see some similarities with the Wiktionary extraction project [1]
>>> that Sebastian mentioned in the GSoC idea.
>>> Since Sebastian proposed configuring the Wiktionary extractor
>>> to parse other wikis, I was just wondering whether these two projects
>>> are complementary, could be merged, or could otherwise help each other.
>>> Of course, JSONpedia will be released under an open-source
>>> licence.
>>>
>>> @Sebastian, can you give us some more thoughts about that?
>>> Cheers!
>>>
>>> [1] http://dbpedia.org/Wiktionary
>>>
>>> On 4/18/13 11:32 AM, Pablo N. Mendes wrote:
>>>
>>> What does it offer that the DEF does not have?
>>>
>>> Cheers,
>>> Pablo
>>>
>>> On Wed, Apr 17, 2013 at 10:33 PM, Marco Fossati <hell.j....@gmail.com> wrote:
>>>
>>> Hi Sebastian,
>>>
>>> I was wondering if the JSONpedia project [1] could be
>>> helpful for the idea you are mentoring for GSoC 2013.
>>> Have a look at the slides [2].
>>> What do you think?
>>> Let me know.
>>> Cheers,
>>>
>>> [1] http://json.it.dbpedia.org/frontend/form.html
>>> [2] http://www.slideshare.net/spaziodati/introducing-jsonpedia
>>> --
>>> Marco Fossati
>>> http://about.me/marco.fossati
>>> Twitter: @hjfocs
>>> Skype: hell_j
>>>
>>> _______________________________________________
>>> Dbpedia-gsoc mailing list
>>> Dbpedia-gsoc@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
>>>
>>> --
>>> Pablo N. Mendes
>>> http://pablomendes.com