We discussed a potential approach to scaling content extraction on Wiktionary across languages in https://phabricator.wikimedia.org/T138709. The idea was to define HTML microformats that can be integrated into templates to provide a uniform marker for specific bits of content.
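To illustrate the idea, a marker class emitted by a template could be matched by one generic extractor instead of per-wiki HTML scraping. Below is a minimal sketch in Python using only the standard library; the class name `mf-definition` and the sample HTML are hypothetical (the actual microformat names were still to be agreed in T138709), and the sketch assumes well-formed markup without void elements inside marked spans.

```python
from html.parser import HTMLParser

# Hypothetical template output: one item carries the agreed marker class.
SAMPLE = (
    '<ol><li class="mf-definition">A small domesticated feline.</li>'
    '<li>Unmarked trivia.</li></ol>'
)

class MarkerExtractor(HTMLParser):
    """Collects the text content of elements tagged with a marker class."""

    def __init__(self, marker):
        super().__init__()
        self.marker = marker
        self.depth = 0       # nesting depth inside a marked element (0 = outside)
        self.results = []

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class", "").split()
        if self.depth:
            self.depth += 1  # a child tag inside an already-marked element
        elif self.marker in classes:
            self.depth = 1   # entering a marked element
            self.results.append("")

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1  # leaving a tag inside (or closing) the marked element

    def handle_data(self, data):
        if self.depth:
            self.results[-1] += data  # accumulate text only while inside a marker

extractor = MarkerExtractor("mf-definition")
extractor.feed(SAMPLE)
print(extractor.results)  # ['A small domesticated feline.']
```

The point of the sketch: once templates emit a stable marker, the extractor never needs to know the surrounding page structure, so the same code scales across wikis and languages.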
On Jan 12, 2017 1:17 PM, "Corey Floyd" <cfl...@wikimedia.org> wrote:

> Hey all, just want to drop in some thoughts on this thread…
>
> I would say the premise of this email is absolutely right: parsing out this data by hand-coding against specific HTML structures is untenable.
>
> On Wikipedia we have a wealth of community-curated information: featured articles, in the news, on this day, etc.
>
> Over the past year, the Reading team has been working with Services to set up some basic infrastructure for ingesting some of this data and making it available as structured data via an API that any client can ingest. This includes our own WMF-maintained projects (apps, mobile web, desktop).
>
> Because of the nature of our projects, it is difficult to extract this information in a uniform way across all wikis. Though this is clearly the target (and in line with our mission), before we invest significant time in developing such a standardized method for each service, we need to first deploy an API to test whether these services are actually a good direction for our products, services, and mission.
>
> To that end, we develop each new service on en.wikipedia first, in a way that we do not intend to scale.
>
> Now that some of these services have proven useful, we have begun efforts to develop a way for all project maintainers to opt in to making their curated content available for consumption in these APIs.
> You can see some tickets focused on this effort here:
> https://phabricator.wikimedia.org/T150806
> https://phabricator.wikimedia.org/T152284
> https://phabricator.wikimedia.org/T148680
>
> We have also created some draft documentation, on which we are currently gathering feedback to see if it is viable for all projects, here:
> https://www.mediawiki.org/wiki/User:CFloyd_(WMF)/Feed_Markup_Documentation
>
> Additionally, we have added resources to our Reading Infrastructure team (which maintains our services), in part to help with this effort.
>
> All this is to say: creating and scaling these services to multiple wikis is a continuing effort. While we would love to deploy a solution to all projects at once, in order to make the problem tractable we are tackling it in steps and re-evaluating our assumptions along the way. Hopefully this explains our thinking and the projects in a way that makes sense.
>
> Because this is a large project, we are looking for solutions and help to spread these services across all wikis. If you or anyone else has time and would like to help, the tickets and documentation above are a great place to contribute to the process.
>
> Thanks for any help,
> Corey
>
>
> On Thu, Jan 12, 2017 at 10:45 AM Monte Hurd <mh...@wikimedia.org> wrote:
>
>> Thanks Florian.
>>
>> I see your point about ease of forking. Would you have time later, perhaps off thread, to detail the challenges you faced?
>>
>> Regarding the data endpoint topic of this thread, it isn't app-specific despite being part of 'mediawiki/services/mobileapps'. We'll probably want a more generic name, with only truly app-specific endpoints named accordingly.
>> On Thu, Jan 12, 2017 at 10:05 AM, Florian Schmidt <florian.schmidt.wel...@t-online.de> wrote:
>>
>>> What is also on my mind: the app wasn't easy to fork and use on other (third-party) projects before this either. However, moving in this direction (parsing of content specific to Wikimedia projects only) makes it, in reality, impossible to fork (and maintain) the app for other, MediaWiki-based, third-party projects. I understand that this might not be a goal of the WMF, which in my personal opinion isn't quite right, but it's very, very sad. I tried to maintain an up-to-date version of the Wikipedia app some time ago, but it takes so much time, as the app is so Wikimedia-specific rather than MediaWiki-specific, that I mostly stopped my efforts in this direction.
>>>
>>> This is probably not a response which should be in this thread, and I apologize for that, but I wanted to say it somewhere.
>>>
>>> Best,
>>> Florian
_______________________________________________
Mobile-l mailing list
Mobile-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mobile-l