Dear Wikimedians, I'm sharing with all of you a digest email from the "offline-l" list, to which I subscribe, because I believe it has two items of wider interest than is usual for that list.
The first one concerns a new facility called Open ZIMfarm, which automates the building and publishing of offline archives of web material (not just wikis). I believe this brings a step change in the possibilities for decentralised content hosting. The second describes the recent deployment of the Kiwix-serve application in a commercial telecoms network in West Africa (Kiwix stores and presents the very same ZIM files created by ZIMfarm). Given that the Wikimedia Foundation no longer funds zero-rating of its products, this represents a new way to bring content to people free of charge. Not only cellular networks but also local-government-supported WiFi providers such as Project Isizwe, as well as community-owned and -operated networks, can do this. I would be interested to hear the community's views on these.

Regards,

---------- Forwarded message ---------
From: <offline-l-requ...@lists.wikimedia.org>
Date: Wed, Jun 24, 2020 at 6:01 PM
Subject: Offline-l Digest, Vol 100, Issue 10
To: <offlin...@lists.wikimedia.org>

Today's Topics:

   1. Re: [KIWIX] Our openZIM farm... (Samuel Klein)
   2. Re: [KIWIX] Our openZIM farm... (Emmanuel Engelhart)
   3. [AAR] Interesting online/offline use case across West Africa (Stephane Coillet-Matillon)
   4. Re: [AAR] Interesting online/offline use case across West Africa (Federico Leva (Nemo))

----------------------------------------------------------------------

Message: 1
Date: Wed, 24 Jun 2020 10:09:04 -0400
From: Samuel Klein <meta...@gmail.com>
To: Using Wikimedia projects and MediaWiki offline <offlin...@lists.wikimedia.org>
Subject: Re: [Offline-l] [KIWIX] Our openZIM farm...
Message-ID: <caatu9wj14ypajqikyqownk2nh1mbnstc5tprou8h+ijcdfu...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Wow, this is fabulous. If a new zimfarm starts up, can it coordinate with existing ones?

On Tue, Jun 23, 2020 at 3:23 AM Emmanuel Engelhart <kel...@kiwix.org> wrote:

> Hi
>
> There is a topic I have wanted to talk about here for a long time, but
> I never managed to take the time to write something. A few recent
> events have been a healthy reminder that I should present one of our
> most recent and most useful tools: Zimfarm.
>
> The Zimfarm is the online tool which is in charge of building and
> publishing all our ZIM files. After years of creating ZIM files by
> launching scrapers more or less manually, we had to automate the
> process just to be able to scale the operations, i.e. publishing more
> ZIM files, more often.
>
> The effort started 3 years ago with the support of the WMF, but we have
> used it in production only since Spring 2019. The tool is now running
> perfectly and we fully rely on it. If we can publish an update of all
> our wikis once a month, this is thanks to this piece of software too.
>
> The Zimfarm is a half-decentralized solution which has a central node
> (called "dispatcher") in charge of orchestrating the work to do and
> multiple decentralized nodes (called "workers") which run the scraping
> tasks.
>
> The dispatcher provides an API to manage the ZIM recipes and tasks;
> have a look at https://api.farm.openzim.org/. We have set up a Web
> frontend on this API to allow easy management through a Web browser.
> For better transparency, even anonymous users can have a look and
> monitor what is going on. Look at https://farm.openzim.org/.
>
> One important point is that, like all the rest of our infrastructure,
> the whole system is Dockerized. This means it is really easy to
> install a Zimfarm worker, and we invite anybody with a spare server to
> help us provide offline snapshots of the best of the Web. The
> procedure is documented and a few volunteers have already joined in.
> Look at https://farm.openzim.org/about for more details.
>
> The development is fully transparent at
> https://github.com/openzim/zimfarm. We have a few things on the
> roadmap for which volunteer Python developers would be welcome. Look
> at the good first issues and make your first PR!
>
> https://github.com/openzim/zimfarm/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22
>
> Regards
> Emmanuel
>
> --
> Kiwix - Wikipedia Offline & more
> * Web: https://kiwix.org/
> * Twitter: https://twitter.com/KiwixOffline
> * Wiki: https://wiki.kiwix.org/
>
> _______________________________________________
> Offline-l mailing list
> offlin...@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/offline-l
>

--
Samuel Klein @metasj w:user:sj +1 617 529 4266

------------------------------

Message: 2
Date: Wed, 24 Jun 2020 16:36:44 +0200
From: Emmanuel Engelhart <kel...@kiwix.org>
To: Using Wikimedia projects and MediaWiki offline <offlin...@lists.wikimedia.org>, Samuel Klein <meta...@gmail.com>
Subject: Re: [Offline-l] [KIWIX] Our openZIM farm...
Message-ID: <e8c2648b-2090-1312-53f3-d1cb93036...@kiwix.org>
Content-Type: text/plain; charset="utf-8"

Not sure what you mean exactly by "coordinate". You can write a tool which can do that because both expose everything via API.
On our side, with farm.openzim.org, we already have a few third-party orgs which use it for their own stuff.

Emmanuel

On 24.06.20 16:09, Samuel Klein wrote:
> Wow, this is fabulous. If a new zimfarm starts up, can it coordinate
> with existing ones?

--
Kiwix - Wikipedia Offline & more
* Web: https://kiwix.org/
* Twitter: https://twitter.com/KiwixOffline
* Wiki: https://wiki.kiwix.org/

------------------------------

Message: 3
Date: Wed, 24 Jun 2020 17:08:34 +0200
From: Stephane Coillet-Matillon <steph...@kiwix.org>
To: Using Wikimedia projects and MediaWiki offline <offlin...@lists.wikimedia.org>
Subject: [Offline-l] [AAR] Interesting online/offline use case across West Africa
Message-ID: <5e9a1a97-5422-426d-a03d-5ff969e12...@kiwix.org>
Content-Type: text/plain; charset="utf-8"

Hello everyone,

We might as well keep this list rolling - it’s been an eventful couple of months and there’s plenty to tell.

Just as COVID-19 lockdowns started to roll out across much of the world, our good friends at Orange (the French telco) reached out asking us to roll out Kiwix directly onto their West African network. So yes, here’s the short story of us making offline available online (!)

Background
In a nutshell, it’s easier/faster for a telco to carry data on its own, local network than it is to carry that same amount of data internationally. It does make sense in hindsight, particularly if you think of the internets as a series of tubes.

Mission
They asked that we roll out Kiwix and a collection of ZIMs in Arabic, French and English onto their Ivory Coast hub: Orange customers were to be directed to a specific page[1] and would be offered the content at zero-rating or a special low rate (markets could choose their pricing model). 11 markets were selected for the operation (mostly sub-Saharan Africa).

We rolled out the whole thing in a few days using Kiwix-serve[2] - most of the time needed was for them to secure a big-ass server and grant us root access. It’s been running smoothly ever since - up to 100,000 users/month at peak, which was nice.
Contents deployed were Wikipedia, Khan Academy, Wiktionary, Vikidia and a couple of video channels we also serve as ZIMs.

So what did we learn?
- Kiwix-serve is super easy to install, and can manage large loads robustly;
- Most demanded contents: Wikipedia and Khan Academy, then Wiktionary & Gutenberg library;
- Information circulated somehow: we’ve had users from 130 countries so far (about 20-30% of total traffic), definitely not bots. A gentleman from An-Najah University in Palestine even reached out asking that we deploy the same thing on their local network.
- The URL that Orange set up was overly long, which probably impacted adoption. We lobbied to get https://kiwix.orange (they own the TLD) but to no avail :-/

There is also a huge difference between markets that communicated on the initiative in a sustained manner (e.g. Liberia) and those that did it as a one-off.

Cookie points
They made a simple but sweet video[3] - in French only but you’ll get the idea.

[1] https://kiwix.campusafrica.gos.orange.com/
[2] https://github.com/kiwix/kiwix-tools
[3] https://www.youtube.com/watch?v=2Ug0XEFhByc

------------------------------

Message: 4
Date: Wed, 24 Jun 2020 19:00:44 +0300
From: "Federico Leva (Nemo)" <nemow...@gmail.com>
To: Using Wikimedia projects and MediaWiki offline <offlin...@lists.wikimedia.org>, Stephane Coillet-Matillon <steph...@kiwix.org>
Subject: Re: [Offline-l] [AAR] Interesting online/offline use case across West Africa
Message-ID: <a392b364-3bbb-51b5-c967-b3c7f27a6...@gmail.com>
Content-Type: text/plain; charset=utf-8

Stephane Coillet-Matillon, 24/06/20 18:08:
> We rolled out the whole thing in a few days using Kiwix-serve[2] - most of the time needed was for them to secure a big-ass server and grant us root access. It’s been running smoothly ever since - up to 100,000 users/month at peak, which was nice. Contents deployed were Wikipedia, Khan Academy, Wiktionary, Vikidia and a couple of video channels we also serve as ZIMs.
>
> So what did we learn?
> - Kiwix-serve is super easy to install, and can manage large loads robustly;

This is excellent! I think it's good news for digital preservation purposes, too. When a dynamic website is retired, in the future you can "just" archive it as a static website in HTML and serve it with a proxy. Currently this is only possible with WARC-proxy and I'm not aware of anyone using such technologies at scale before this.

Also, compare to the cost of running the Wikipedia Zero initiative, which needed a lot of software configuration in MediaWiki and Wikimedia clusters. Serving a dump is not as good as serving dynamic content, but being able to do it independently from Wikimedia Foundation is a giant plus.

> They made a simple but sweet video[3] - in French only but you’ll get the idea.
> [3] https://www.youtube.com/watch?v=2Ug0XEFhByc

Cute.
Federico

------------------------------

End of Offline-l Digest, Vol 100, Issue 10
******************************************

--
Michael Graaf, M.I.T.(UCT)
Researcher, Editor & Community Informatics Practitioner
Mob +27795487242
WhatsApp +27647754342
ORCID 0000-0002-1951-5739
_______________________________________________
WikimediaZA mailing list
WikimediaZA@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikimediaza
To unsubscribe, send an email to wikimediaza-unsubscr...@lists.wikimedia.org