Re: [MarkLogic Dev General] Wikimedia parse

2012-06-27 Thread David Lee
Ah, the API does! Woo hoo.
Maybe I can get XML out of this after all ... I smell an xmlsh extension in the 
making :)

I actually have a similar problem with the xmlsh docs ... they are all currently in 
WakiWiki ... but that's a black box ... I want to turn them into XML like 
DocBook ...
Someone (Dave Pawson, I think?) wrote me a Python lib to do that, but it only gets 
90% of the way ... the last 10%, as usual, is 99% of the work.
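
As a rough illustration of what "getting XML out of this" could look like for the infobox sample quoted below, here is a minimal Python sketch. It is not the gwtwiki API discussed in this thread; the function names (`split_top_level`, `infobox_to_xml`) and the `<template>`/`<param>` element names are made up for the example, and it assumes the input is a single template with no triple-brace arguments.

```python
import re
import xml.etree.ElementTree as ET

# HTML-style comments are dropped before the parameters are split.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def split_top_level(body):
    """Split template text on '|' that is not nested in {{...}} or [[...]]."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts


def infobox_to_xml(wikitext):
    """Turn one {{Name |key=value ...}} template into an XML element."""
    text = COMMENT.sub("", wikitext).strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError("expected a single template")
    name, *params = split_top_level(text[2:-2])
    root = ET.Element("template", name=name.strip())
    for param in params:
        key, sep, value = param.partition("=")
        if sep and value.strip():  # skip empty and unnamed parameters
            field = ET.SubElement(root, "param", name=key.strip())
            field.text = value.strip()
    return root


# A shortened version of the infobox from this thread.
SAMPLE = """{{Infobox settlement
<!-- Basic info -->
|name  = Teichibe
|settlement_type=Village
|motto  =
|subdivision_name   = {{flag|Mali}}
|subdivision_type1  = [[Regions of Mali|Region]]
}}"""

root = infobox_to_xml(SAMPLE)
print(ET.tostring(root, encoding="unicode"))
```

Nested templates and links such as `{{flag|Mali}}` are kept as raw strings here; turning those into structure is exactly the part where you "have to decide what to do with it".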

-
David Lee
Lead Engineer
MarkLogic Corporation
d...@marklogic.com
Phone: +1 650-287-2531
Cell:  +1 812-630-7622
www.marklogic.com

This e-mail and any accompanying attachments are confidential. The information 
is intended solely for the use of the individual to whom it is addressed. Any 
review, disclosure, copying, distribution, or use of this e-mail communication 
by others is strictly prohibited. If you are not the intended recipient, please 
notify us immediately by returning this message to the sender and delete all 
copies. Thank you for your cooperation.


> -----Original Message-----
> From: general-boun...@developer.marklogic.com [mailto:general-
> boun...@developer.marklogic.com] On Behalf Of Michael Blakeley
> Sent: Wednesday, June 27, 2012 1:23 PM
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Wikimedia parse
> 
> Yes, it does. The API gives you access to much of that wiki-structured markup,
> but you have to decide what to do with it. Naturally the online tool and much
> of the sample code doesn't do anything interesting.
> 
> -- Mike
> 
> On 27 Jun 2012, at 10:19 , David Lee wrote:
> 
> > Thanks, I tried the online tool on a sample I have and it strips out much of
> > the meaningful stuff :(
> >
> > ---  Input
> > {{Infobox settlement
> > <!--See the Table at Infobox Settlement for all fields and descriptions of usage-->
> > <!-- Basic info -->
> > |name  = Teichibe
> > |other_name =
> > |native_name=  <!-- for cities whose native name is not in English -->
> > |nickname   =
> > |settlement_type=Village
> > |motto  =
> > <!-- images and maps  --->
> > |image_skyline  =
> > |imagesize  =
> > |image_caption  =
> > |image_flag =
> > |flag_size  =
> > |image_seal =
> > |seal_size  =
> > |image_shield   =
> > |shield_size=
> > |image_map  =
> > |mapsize=
> > |map_caption=
> > |pushpin_map=Mali<!-- the name of a location map as per http://en.wikipedia.org/wiki/Template:Location_map -->
> > |pushpin_label_position =bottom
> > |pushpin_mapsize=300
> > |pushpin_map_caption=Location in Mali
> > <!-- Location -->
> > |coordinates_display= inline,title
> > |coordinates_region = ML
> > |subdivision_type   = Country
> > |subdivision_name   = {{flag|Mali}}
> > |subdivision_type1  = [[Regions of Mali|Region]]
> > |subdivision_name1  = [[Kayes Region]]
> > |subdivision_type2  =[[Cercles of Mali|Cercle]]
> > |subdivision_name2  = [[Kayes Cercle]]
> > |subdivision_type3  =[[Communes of Mali|Commune]]
> > |subdivision_name3  = [[Karakoro]]
> > |<!-- Politics -->
> > |government_footnotes   =
> > |government_type=
> > |leader_title   =
> > |leader_name=
> > |leader_title1  =  <!-- for places with, say, both a mayor and a city manager -->
> > |leader_name1   =
> > |established_title  =  <!-- Settled -->
> > |established_date   =
> > <!-- Area -->
> > |area_magnitude =
> > |unit_pref=Imperial <!--Enter: Imperial, if Imperial (metric) is desired-->
> > |area_footnotes   =
> > |area_total_km2   =  <!-- ALL fields dealing with a measurements are subject to automatic unit conversion-->
> > |area_land_km2= <!--See table @ Template:Infobox Settlement for details on automatic unit conversion-->
> > <!-- Population   --->
> > |population_as_of   =
> > |population_footnotes   =
> > |population_note=
> > |population_total   =
> > |population_density_km2 =
> > |population_density_sq_mi   =

Re: [MarkLogic Dev General] Wikimedia parse

2012-06-27 Thread Michael Blakeley
y:Populated places in the Kayes Region]]
> 
> 
> {{Kayes-geo-stub}}
> 
> --  Output
> 
> {{Infobox settlement
> <!--See the Table at Infobox Settlement for all fields and descriptions of usage-->
> <!-- Basic info  >}} 
> Teichibe is a village and principal settlement ( href="Chef-lieu" title="chef-lieu">chef-lieu) of the  href="Karakoro" title="Karakoro">commune of Karakoro in the  href="Kayes_Cercle" title="Kayes Cercle">Cercle of Kayes in the  href="Kayes_Region" title="Kayes Region">Kayes Region of south-western  href="Mali" title="Mali">Mali.<ref>{{citation}}.</ref>
> 
> References
> {{reflist}}
> 
> {{Kayes-geo-stub}} 
> 
> 

___
General mailing list
General@developer.marklogic.com
http://community.marklogic.com/mailman/listinfo/general


Re: [MarkLogic Dev General] Wikimedia parse

2012-06-27 Thread David Lee


Re: [MarkLogic Dev General] Wikimedia parse

2012-06-27 Thread Michael Blakeley
Not in XQuery: it would be much too ugly for my taste. I've used 
http://code.google.com/p/gwtwiki/ and contributed a couple of patches. Hsiao 
could show you some sample code with xhtml-like output.

If you need to use it from XQuery, I suppose you could wrap it in a web service.

-- Mike
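
To make the "wrap it in a web service" idea concrete, here is a minimal Python sketch of the shape such a wrapper could take. It is only an illustration: `render` is a hypothetical placeholder that escapes the input rather than parsing it, where a real service would call an actual wiki parser such as the Java library mentioned above. The handler name, port and content type are arbitrary choices.

```python
from html import escape
from http.server import BaseHTTPRequestHandler, HTTPServer


def render(wikitext):
    """Placeholder converter: wraps the escaped source in a <pre> element.

    A real service would call an actual wiki parser here instead.
    """
    return "<pre>" + escape(wikitext) + "</pre>"


class ConvertHandler(BaseHTTPRequestHandler):
    """Accepts wikitext in a POST body and returns the converted markup."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        wikitext = self.rfile.read(length).decode("utf-8")
        body = render(wikitext).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/xml; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Run the service on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), ConvertHandler).serve_forever()
```

Calling `serve()` starts the listener; an XQuery module could then POST each article body to it, for example with MarkLogic's `xdmp:http-post`, and store the response.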



[MarkLogic Dev General] Wikimedia parse

2012-06-27 Thread David Lee
Has anyone seen an XQuery or XSLT parser for MediaWiki markup (the markup used by Wikipedia)?

I found this list:

http://www.mediawiki.org/wiki/Alternative_parsers


What I'm looking for is a way to take the XML dump of Wikipedia and enrich it 
into something more useful.  Right now the whole body of an article is in MediaWiki 
markup and largely opaque to ML except as one long string.
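
For readers unfamiliar with the dump layout, the following Python sketch shows the "one long string" problem: each `<page>` carries its title and a single `<text>` element holding the raw markup. The sample document and its namespace URI are a hand-written stand-in, not real dump data, and `iter_pages` is a hypothetical helper name.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) for each <page> in a MediaWiki XML export.

    `source` is a filename or a binary file object. Namespaces are ignored
    because the export schema version differs between dumps.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        yield title, text
        elem.clear()  # free memory; full dumps are very large


# A tiny stand-in for one page of a dump (not real dump data).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.6/">
  <page>
    <title>Teichibe</title>
    <revision>
      <text>{{Infobox settlement
|name = Teichibe
}}
'''Teichibe''' is a village.</text>
    </revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
for title, text in pages:
    print(title, len(text))
```

Everything inside `text` is still unparsed markup at this point, which is why a separate wiki parser is needed before the content becomes useful as XML.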

