We discussed a potential approach to scaling content extraction on
Wiktionary across languages in https://phabricator.wikimedia.org/T138709.
The idea was to define HTML microformats that can be integrated in
templates to provide a uniform marker for specific bits of content.
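
To make the idea concrete, here is a minimal sketch of what a consumer
could look like if templates on every wiki emitted the same marker class.
The class name "mf-definition" and the helper names are hypothetical and
used only for illustration; nothing in the ticket settles on them. The
sketch uses only Python's standard library.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class MarkerExtractor(HTMLParser):
    """Collect the text of every element carrying a given marker class."""

    def __init__(self, marker_class):
        super().__init__()
        self.marker_class = marker_class  # hypothetical microformat class
        self.depth = 0      # nesting depth inside a marked element
        self.chunks = []    # text pieces of the element being collected
        self.results = []   # finished strings, one per marked element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1
        elif self.marker_class in classes:
            self.depth = 1
            self.chunks = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.results.append("".join(self.chunks).strip())

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract(html, marker_class):
    """Return the text of all elements marked with marker_class."""
    parser = MarkerExtractor(marker_class)
    parser.feed(html)
    parser.close()
    return parser.results


if __name__ == "__main__":
    sample = '<div><span class="mf-definition">a domesticated <b>feline</b></span></div>'
    print(extract(sample, "mf-definition"))
```

The point of the sketch is that the extractor depends only on the marker
class, not on the surrounding HTML structure, so the same code would work
on any wiki whose templates emit that marker.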

On Jan 12, 2017 1:17 PM, "Corey Floyd" <cfl...@wikimedia.org> wrote:

> Hey all, just want to drop in some thoughts on this thread…
>
> I would say the premise of this email is absolutely right: Parsing out
> this data by hand coding against specific HTML structures is untenable.
>
> On Wikipedia we have a wealth of community curated information: featured
> articles, in the news, on this day, etc…
>
> Over the past year, the Reading team has been working with Services to
> set up some basic infrastructure for ingesting some of this data and making
> it available as structured data via an API that any client can ingest. This
> includes our own WMF-maintained projects (apps, mobile web, desktop).
>
> Because of the nature of our projects, it is difficult to extract this
> information in a uniform way across all wikis. Though this is clearly the
> target (and in line with our mission), before we invest significant time in
> developing such a standardized method for each service, we need to first
> deploy an API to test whether these services are actually a good direction
> for our products, services, and mission.
>
> To that end, we develop each new service on en.wikipedia first, using a
> method that we do not intend to scale.
>
> Now that some of these services have proven useful, we have begun efforts
> to develop a way for all project maintainers to opt in to making their
> curated content available for consumption in these APIs.
>
> You can see some tickets focused on this effort here:
> https://phabricator.wikimedia.org/T150806
> https://phabricator.wikimedia.org/T152284
> https://phabricator.wikimedia.org/T148680
>
> We have also created some draft documentation that we are currently
> gathering feedback on to see if it is viable for all projects here:
> https://www.mediawiki.org/wiki/User:CFloyd_(WMF)/Feed_Markup_Documentation
>
> We have also added resources to our Reading Infrastructure team (which
> maintains our services), in part to help with this effort.
>
> All this is to say that creating and scaling these services to
> multiple wikis is a continuing effort. While we would love to deploy a
> solution to all projects at once, in order to make the problem tractable,
> we are tackling it in steps and re-evaluating our assumptions along the
> way. Hopefully this explains our thinking and the projects in a way that
> makes sense.
>
> Because this is a large project, we are looking for solutions and help to
> spread these services across all wikis. If you or anyone else has time and
> would like to help, the tickets and documentation above are a great place
> to contribute to the process.
>
> Thanks for any help
> Corey
>
>
> On Thu, Jan 12, 2017 at 10:45 AM Monte Hurd <mh...@wikimedia.org> wrote:
>
> Thanks Florian.
>
> I see your point about ease of forking. Would you have time later, perhaps
> off thread, to detail the challenges you faced?
>
> Regarding the data endpoint topic of this thread, it isn't app-specific
> despite being part of 'mediawiki/services/mobileapps'. We'll probably
> want a more generic name, with only truly app-specific endpoints named
> accordingly.
>
>
> On Thu, Jan 12, 2017 at 10:05 AM, Florian Schmidt <
> florian.schmidt.wel...@t-online.de> wrote:
>
> Something else that is on my mind: the app was not easy to fork and use on
> other (third-party) projects before this either, but moving in this
> direction (specific parsing of content that applies to Wikimedia projects
> only) makes it, in reality, impossible to fork (and maintain) the app for
> other MediaWiki-based third-party projects. I understand that this might
> not be a goal of the WMF, which in my personal opinion isn't quite right,
> but it is still very sad. I tried to maintain an up-to-date version of
> the Wikipedia app some time ago, but it took so much time, because the app
> is so Wikimedia-specific rather than MediaWiki-specific, that I ended up
> mostly abandoning the effort.
>
> This is probably not a response which should be in this thread, and I
> apologize for that, however I wanted to say that somewhere.
>
> Best,
> Florian
>
> _______________________________________________
> Mobile-l mailing list
> Mobile-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/mobile-l
