Re: [WikimediaMobile] Wiki-specific parsing in mobile "services"

2017-01-12 Thread Gabriel Wicke
We discussed a potential approach to scaling content extraction on
Wiktionary across languages in https://phabricator.wikimedia.org/T138709.
The idea was to define HTML microformats that can be integrated in
templates to provide a uniform marker for specific bits of content.
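
As a rough illustration of how a client could consume such markers, here is
a minimal sketch. It assumes a hypothetical class name, "mf-onthisday-event",
that each wiki's templates would add to the relevant elements; neither the
name nor the helper functions are defined in T138709 or in this thread.

```python
from html.parser import HTMLParser

# Hypothetical marker class; the thread does not define an actual microformat.
MARKER_CLASS = "mf-onthisday-event"

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class MarkerExtractor(HTMLParser):
    """Collect the text of every element carrying MARKER_CLASS."""

    def __init__(self):
        super().__init__()
        self.items = []    # extracted text, one entry per marked element
        self._depth = 0    # nesting depth inside the current marked element
        self._buffer = []  # text fragments of the current marked element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            # Already inside a marked element: just track nesting.
            self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if MARKER_CLASS in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            # Marked element closed: store its text with whitespace normalized.
            self.items.append(" ".join("".join(self._buffer).split()))

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_marked(html):
    """Return the text of all marked elements, whatever the page layout."""
    parser = MarkerExtractor()
    parser.feed(html)
    parser.close()
    return parser.items
```

The point of the sketch is that the extractor only looks for the marker, so
the same code works whether a wiki renders its entries as list items, table
cells, or spans; only the templates need to agree on the class name.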

On Jan 12, 2017 1:17 PM, "Corey Floyd"  wrote:

> Hey all, just want to drop in some thoughts on this thread…
>
> I would say the premise of this email is absolutely right: Parsing out
> this data by hand coding against specific HTML structures is untenable.
>
> On Wikipedia we have a wealth of community curated information: featured
> articles, in the news, on this day, etc…
>
> Over the past year, the Reading team has been working with Services to
> set up some basic infrastructure for ingesting some of this data and making
> it available as structured data via an API that any client can ingest. This
> includes our own WMF-maintained projects (apps, mobile web, desktop).
>
> Because of the nature of our projects, it is difficult to extract this
> information in a uniform way across all wikis. Though this is clearly the
> target (and in line with our mission), before we invest significant time in
> developing such a standardized method for each service, we need to first
> deploy an API to test whether these services are actually a good direction
> for our products, services, and mission.
>
> To that end, we develop each new service on en.wikipedia first, using a
> method that we do not intend to scale.
>
> Now that some of these services have proven useful, we have begun efforts
> to develop a way for all project maintainers to opt in to making their
> curated content available for consumption in these APIs.
>
> You can see some tickets focused on this effort here:
> https://phabricator.wikimedia.org/T150806
> https://phabricator.wikimedia.org/T152284
> https://phabricator.wikimedia.org/T148680
>
> We have also created some draft documentation that we are currently
> gathering feedback on to see if it is viable for all projects here:
> https://www.mediawiki.org/wiki/User:CFloyd_(WMF)/Feed_Markup_Documentation
>
> We have also added resources to our Reading Infrastructure team (which
> maintains our services), in part to help with this effort.
>
> All this is to say that creating and scaling these services to multiple
> wikis is a continuing effort. While we would love to deploy a solution to
> all projects at once, in order to make the problem tractable, we are
> tackling it in steps and re-evaluating our assumptions along the way.
> Hopefully this explains our thinking and the projects in a way that makes
> sense.
>
> Because this is a large project, we are looking for solutions and help to
> spread these services across all wikis. If you or anyone else has time and
> would like to help, the tickets and documentation above are a great place
> for you to contribute to the process.
>
> Thanks for any help
> Corey
>
>
> On Thu, Jan 12, 2017 at 10:45 AM Monte Hurd  wrote:
>
> Thanks Florian.
>
> I see your point about ease of forking. Would you have time later, perhaps
> off thread, to detail the challenges you faced?
>
> Regarding the data endpoint topic of this thread, it isn't app-specific
> despite being part of 'mediawiki/services/mobileapps'. We'll probably
> want a more generic name with only truly app-specific endpoints named
> accordingly.
>
>
> On Thu, Jan 12, 2017 at 10:05 AM, Florian Schmidt <
> florian.schmidt.wel...@t-online.de> wrote:
>
> What is also on my mind: the app wasn't easy to fork and use on other
> (third-party) projects before this either; however, the current direction
> (specific parsing of content, which applies to Wikimedia projects only)
> makes it, in reality, impossible to fork (and maintain) the app for other
> MediaWiki-based third-party projects. I understand that this might not be
> a goal of the WMF, which in my personal opinion isn't quite right; still,
> it's very, very sad. I tried to maintain an up-to-date version of the
> Wikipedia app some time ago, but it takes so much time, as it is so
> Wikimedia-specific rather than MediaWiki-specific, that I ended up mostly
> stopping the effort in this direction.
>
> This is probably not a response which should be in this thread, and I
> apologize for that; however, I wanted to say it somewhere.
>
> Best,
> Florian
>
> ___
> Mobile-l mailing list
> Mobile-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/mobile-l

Re: [WikimediaMobile] Wiki-specific parsing in mobile "services"

2017-01-12 Thread Corey Floyd
Hey all, just want to drop in some thoughts on this thread…

I would say the premise of this email is absolutely right: Parsing out this
data by hand coding against specific HTML structures is untenable.

On Wikipedia we have a wealth of community curated information: featured
articles, in the news, on this day, etc…

Over the past year, the Reading team has been working with Services to
set up some basic infrastructure for ingesting some of this data and making
it available as structured data via an API that any client can ingest. This
includes our own WMF-maintained projects (apps, mobile web, desktop).

Because of the nature of our projects, it is difficult to extract this
information in a uniform way across all wikis. Though this is clearly the
target (and in line with our mission), before we invest significant time in
developing such a standardized method for each service, we need to first
deploy an API to test whether these services are actually a good direction
for our products, services, and mission.

To that end, we develop each new service on en.wikipedia first, using a
method that we do not intend to scale.

Now that some of these services have proven useful, we have begun efforts
to develop a way for all project maintainers to opt in to making their
curated content available for consumption in these APIs.

You can see some tickets focused on this effort here:
https://phabricator.wikimedia.org/T150806
https://phabricator.wikimedia.org/T152284
https://phabricator.wikimedia.org/T148680

We have also created some draft documentation that we are currently
gathering feedback on to see if it is viable for all projects here:
https://www.mediawiki.org/wiki/User:CFloyd_(WMF)/Feed_Markup_Documentation

We have also added resources to our Reading Infrastructure team (which
maintains our services), in part to help with this effort.

All this is to say that creating and scaling these services to multiple
wikis is a continuing effort. While we would love to deploy a solution to
all projects at once, in order to make the problem tractable, we are
tackling it in steps and re-evaluating our assumptions along the way.
Hopefully this explains our thinking and the projects in a way that makes
sense.

Because this is a large project, we are looking for solutions and help to
spread these services across all wikis. If you or anyone else has time and
would like to help, the tickets and documentation above are a great place
for you to contribute to the process.

Thanks for any help
Corey


On Thu, Jan 12, 2017 at 10:45 AM Monte Hurd  wrote:

Thanks Florian.

I see your point about ease of forking. Would you have time later, perhaps
off thread, to detail the challenges you faced?

Regarding the data endpoint topic of this thread, it isn't app-specific
despite being part of 'mediawiki/services/mobileapps'. We'll probably want
a more generic name with only truly app-specific endpoints named
accordingly.


On Thu, Jan 12, 2017 at 10:05 AM, Florian Schmidt <
florian.schmidt.wel...@t-online.de> wrote:

What is also on my mind: the app wasn't easy to fork and use on other
(third-party) projects before this either; however, the current direction
(specific parsing of content, which applies to Wikimedia projects only)
makes it, in reality, impossible to fork (and maintain) the app for other
MediaWiki-based third-party projects. I understand that this might not be
a goal of the WMF, which in my personal opinion isn't quite right; still,
it's very, very sad. I tried to maintain an up-to-date version of the
Wikipedia app some time ago, but it takes so much time, as it is so
Wikimedia-specific rather than MediaWiki-specific, that I ended up mostly
stopping the effort in this direction.

This is probably not a response which should be in this thread, and I
apologize for that; however, I wanted to say it somewhere.

Best,
Florian


Re: [WikimediaMobile] Wiki-specific parsing in mobile "services"

2017-01-12 Thread Monte Hurd
Thanks Florian.

I see your point about ease of forking. Would you have time later, perhaps
off thread, to detail the challenges you faced?

Regarding the data endpoint topic of this thread, it isn't app-specific
despite being part of 'mediawiki/services/mobileapps'. We'll probably want
a more generic name with only truly app-specific endpoints named
accordingly.


On Thu, Jan 12, 2017 at 10:05 AM, Florian Schmidt <
florian.schmidt.wel...@t-online.de> wrote:

> What is also on my mind: the app wasn't easy to fork and use on other
> (third-party) projects before this either; however, the current direction
> (specific parsing of content, which applies to Wikimedia projects only)
> makes it, in reality, impossible to fork (and maintain) the app for other
> MediaWiki-based third-party projects. I understand that this might not be
> a goal of the WMF, which in my personal opinion isn't quite right; still,
> it's very, very sad. I tried to maintain an up-to-date version of the
> Wikipedia app some time ago, but it takes so much time, as it is so
> Wikimedia-specific rather than MediaWiki-specific, that I ended up mostly
> stopping the effort in this direction.
>
> This is probably not a response which should be in this thread, and I
> apologize for that; however, I wanted to say it somewhere.
>
> Best,
> Florian


Re: [WikimediaMobile] Wiki-specific parsing in mobile "services"

2017-01-12 Thread Florian Schmidt
What is also on my mind: the app wasn't easy to fork and use on other (third-party) projects before this either; however, the current direction (specific parsing of content, which applies to Wikimedia projects only) makes it, in reality, impossible to fork (and maintain) the app for other MediaWiki-based third-party projects. I understand that this might not be a goal of the WMF, which in my personal opinion isn't quite right; still, it's very, very sad. I tried to maintain an up-to-date version of the Wikipedia app some time ago, but it takes so much time, as it is so Wikimedia-specific rather than MediaWiki-specific, that I ended up mostly stopping the effort in this direction.

This is probably not a response which should be in this thread, and I apologize for that; however, I wanted to say it somewhere.

Best,
Florian


Re: [WikimediaMobile] Wiki-specific parsing in mobile "services"

2017-01-12 Thread Monte Hurd
> I see little effort on finding solutions potentially able to scale to all
> our projects and languages

See my reply to your initial comment on that ticket. This was just a first
hack at implementing this functionality. If you had simply asked if there
were plans to expand this to other projects/languages the answer you would
have received would have been "absolutely! this is just a first pass".

I have pasted the referenced comment and response from that ticket below:



> It's inappropriate and unsustainable to hardcode such wiki-specific
> parsing in our code.

In the future please consider the tone and impact of your comments before
clicking "post".

Imagine you were, say, a first-time volunteer contributor and the first
piece of feedback you received was the comment you posted above. How would
that make you feel about even trying to contribute?

I'm not saying there's not a kernel of truth in your comment, because there
is*, but the way you phrased it actually inclines me to take your opinion,
and you, far less seriously.

*I agree it would indeed be better to have this endpoint work across all
language wikis. I intend to examine such functionality, but as a first pass
I chose to implement the core logic for my native language. My
implementation should be fairly easy to modify, as well, because I had this
eventuality in mind throughout development.

On Thu, Jan 12, 2017 at 12:41 AM, Federico Leva (Nemo) 
wrote:

> Is it considered acceptable now to produce a service or API that hardcodes
> wiki-specific parsing of certain wikitext or HTML patterns in certain wiki
> pages (such as the "On this day" section of the main page of one wiki)?
>
> I'm confused by the status of things and after my comment
> https://phabricator.wikimedia.org/T143408#2919000 I see little effort on
> finding solutions potentially able to scale to all our projects and
> languages (which I assume to be the mission, see "globally" in
> https://wikimediafoundation.org/wiki/Mission_statement ; please point it
> out if this assumption is incorrect).
>
> It might be that wiki-specific parsing hardcoded in MediaWiki/Wikimedia
> code is actually able to scale, if written correctly; a comment on the
> associated patch seemed to imply so. This would be a very surprising
> finding, and one which goes against 15 years of experience, so if we have
> some examples or evidence of this it would be very worthwhile to point them
> out.
>
> Nemo


[WikimediaMobile] Wiki-specific parsing in mobile "services"

2017-01-12 Thread Federico Leva (Nemo)
Is it considered acceptable now to produce a service or API that 
hardcodes wiki-specific parsing of certain wikitext or HTML patterns in 
certain wiki pages (such as the "On this day" section of the main page 
of one wiki)?


I'm confused by the status of things and after my comment 
https://phabricator.wikimedia.org/T143408#2919000 I see little effort on 
finding solutions potentially able to scale to all our projects and 
languages (which I assume to be the mission, see "globally" in 
https://wikimediafoundation.org/wiki/Mission_statement ; please point it 
out if this assumption is incorrect).


It might be that wiki-specific parsing hardcoded in MediaWiki/Wikimedia 
code is actually able to scale, if written correctly; a comment on the 
associated patch seemed to imply so. This would be a very surprising 
finding, and one which goes against 15 years of experience, so if we 
have some examples or evidence of this it would be very worthwhile to 
point them out.


Nemo
