Thanks, Sebastian, for the suggestions; I was not aware of Parsoid.
Feel free to come back to me if you need support for that GSoC idea. The
research unit I collaborate with at DERI focuses largely on massive data
acquisition from the web, so we may be of help.
Cheers,

On 4/22/13 9:14 PM, Sebastian Hellmann wrote:
> Hi Marco,
> did you have a look at Parsoid [1]?
>
> I am really not sure what the best way is to parse data out of
> Wikisyntax. The XML configs produced for Wiktionary in Jonas' Master's
> thesis seem to be quite alright for "normal" users. So I would hope that
> we can take it from there and just build an infrastructure around this.
> It might make sense to combine both sources and ways to parse Wiktionary
> to get better results in the end.
>
> The most important thing, however, seems to be a user-friendly process
> to involve Wiktionary users.
>
> All the best,
> Sebastian
>
>
>
> [1] http://www.mediawiki.org/wiki/Parsoid
>
>
>
> On 18.04.2013 16:50, Marco Fossati wrote:
>> Definitely, that's why Sebastian's idea can become a very interesting
>> GSoC project.
>>
>> On 4/18/13 4:41 PM, Pablo N. Mendes wrote:
>>>
>>> The difference between JSON and HTML is 15min, Scala and IntelliJ. :)
>>>
>>> I'd think the important part is how the markup is parsed, templates
>>> resolved, etc.
>>>
>>>
>>> On Thu, Apr 18, 2013 at 4:39 PM, Marco Fossati <hell.j....@gmail.com> wrote:
>>>
>>>     I can't say if it's a competitor. The main difference lies in the
>>>     output, which is structured data (JSON) instead of semi-structured
>>>     data (HTML).
>>>     For more details, see the slides [1].
>>>     Cheers!
>>>
>>>     [1] http://www.slideshare.net/spaziodati/introducing-jsonpedia
>>>
>>>
>>>     On 4/18/13 4:23 PM, Pablo N. Mendes wrote:
>>>
>>>
>>>         Is JSONPedia a competitor of gwtwiki and Sweble?
>>>
>>>         https://code.google.com/p/gwtwiki/
>>>         http://en.wikipedia.org/wiki/Sweble#The_current_state_of_parsing
>>>
>>>
>>>         On Thu, Apr 18, 2013 at 4:18 PM, Marco Fossati
>>>         <hell.j....@gmail.com> wrote:
>>>
>>>              Hi Pablo,
>>>
>>>              It's a low-level generic parser for MediaWiki content.
>>>              It converts all the content of any MediaWiki resource into
>>>              structured data. The output could be JSON (as it is now),
>>>              JSON-LD or RDF, i.e., it can be modeled for our needs.
>>>              Compared to the DBpedia extraction framework, it does not
>>>              process the semantics of the data (e.g. infoboxes), but
>>>              handles every content item, e.g. article body, tables, etc.
>>>              I see some similarities with the Wiktionary extraction
>>>              project [1] that Sebastian mentioned in the GSoC idea.
>>>              Since Sebastian proposed to configure the Wiktionary
>>>              extractor in order to parse other wikis, I was just
>>>              wondering whether these 2 projects are complementary,
>>>              could be merged, or could otherwise help each other.
>>>              Of course, JSONpedia will be released with an open source
>>>              licence.
>>>
>>>              @Sebastian, can you give us some more thoughts about that?
>>>              Cheers!
>>>
>>>              [1] http://dbpedia.org/Wiktionary
>>>
>>>
>>>              On 4/18/13 11:32 AM, Pablo N. Mendes wrote:
>>>
>>>
>>>                  What does it offer that the DEF does not have?
>>>
>>>                  Cheers,
>>>                  Pablo
>>>
>>>
>>>                  On Wed, Apr 17, 2013 at 10:33 PM, Marco Fossati
>>>                  <hell.j....@gmail.com> wrote:
>>>
>>>                       Hi Sebastian,
>>>
>>>                       I was wondering if the JSONpedia project [1] could be
>>>                       helpful for the idea you are mentoring for GSoC 2013.
>>>                       Have a look at the slides [2].
>>>                       What do you think?
>>>                       Let me know.
>>>                       Cheers,
>>>
>>>                       [1] http://json.it.dbpedia.org/frontend/form.html
>>>                       [2] http://www.slideshare.net/spaziodati/introducing-jsonpedia
>>>                       --
>>>                       Marco Fossati
>>>                       http://about.me/marco.fossati
>>>                       Twitter: @hjfocs
>>>                       Skype: hell_j
>>>
>>>
>>>
>>>                  --
>>>
>>>                  Pablo N. Mendes
>>>                  http://pablomendes.com
>>>
>>>
>>>              --
>>>              Marco Fossati
>>>              http://about.me/marco.fossati
>>>              Twitter: @hjfocs
>>>              Skype: hell_j
>>>
>>>
>>>
>>>
>>>         --
>>>
>>>         Pablo N. Mendes
>>>         http://pablomendes.com
>>>
>>>
>>>     --
>>>     Marco Fossati
>>>     http://about.me/marco.fossati
>>>     Twitter: @hjfocs
>>>     Skype: hell_j
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Pablo N. Mendes
>>> http://pablomendes.com
>>
>
>

-- 
Marco Fossati
http://about.me/marco.fossati
Twitter: @hjfocs
Skype: hell_j

_______________________________________________
Dbpedia-gsoc mailing list
Dbpedia-gsoc@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
