Great discussion and feedback in this thread - plenty to act on.

Thanks Ted Clancy for kicking this off with an impassioned reality
check. And thanks in particular to Benjamin Francis for summarizing
product requirements and use-cases, and especially to both Ted and Ben
for taking the time last week in Whistler to discuss all of this in person
- I definitely came away with a better understanding of the data,
problem space, and perspectives for Gaia's use-cases. I've also
followed up with Gregor and jst to broaden and double-check my
understanding and possible paths forward.


tl;dr: It's time. Let's land microformats parsing support in Gecko as
a Q3 Platform deliverable that Gaia can use.


Specifically:

On Mon, Jun 29, 2015 at 2:47 PM, Benjamin Francis <bfran...@mozilla.com> wrote:
> Thanks for the responses,
>
> Let me reiterate the Product requirements:
>
>    1. Support for a syntax and vocabulary already in wide use on the web to
>    allow the creation of cards for the largest possible volume of existing
>    pinnable content

I think there's rough consensus that a subset of OG, as described by
Ted, satisfies this. Minimizing our exposure to OG (including Twitter
Cards) is ideal for a number of reasons (backcompat/proprietary
maintenance etc.).


>    2. Support for a syntax with a large enough and/or extensible vocabulary
>    to allow cards to be created for all the types of pinnable content and
>    associated actions we need in Gaia

There appear to be multiple options for this, with the best (most
open, aligned with our mission, already interoperably implemented in
open source, etc.) being microformats.

On that in particular:


> *Gaia Content*
>
> Open Graph does not have a large enough vocabulary, or (as Kelly says) the
> ability to associate actions with content, needed for the second requirement

The "associate actions with content" use-case is an interesting one
that's worthy of more specific follow-up on Kelly's response. More on
that separately.


> Schema.org has a large existing vocabulary which basically
> fulfils these use cases, though some parts are more tested than others,

"fulfils" mostly in theory. Schema is 99% overdesigned and
aspirational, most objects and properties not showing up anywhere even
in search results (except generic testing tools perhaps).

A small handful of Schema objects and subset of properties are
actually implemented by anyone in anything user-facing.

Everything else is untested, and claiming "fulfils these use cases"
puts far too much faith in a company known for abandoning their
overdesigned efforts (APIs, vocabularies, syntaxes!) every few years.
Google Base / gData / etc. likely "fulfilled" these use cases too.


> with examples given in Microdata, RDFa and JSON-LD syntaxes, eg:
>
>    - Contact - http://schema.org/Person
>    - Event - http://schema.org/Event
>    - Photo - http://schema.org/Photograph
>    - Song - http://schema.org/MusicRecording
>    - Video - http://schema.org/VideoObject
>    - Radio station - http://schema.org/RadioChannel
>    - Email - http://schema.org/EmailMessage
>    - Message - http://schema.org/Comment

This explicit list of use-cases is very helpful.

Existing interoperably implemented microformats support most of these:

- Contact - http://microformats.org/wiki/h-card
- Event - http://microformats.org/wiki/h-event
- Photo - http://microformats.org/wiki/h-entry with u-photo property
- Song - no current vocabulary - classic hAudio vocabulary could be
simplified for this
- Video - http://microformats.org/wiki/h-entry with u-video property
- Radio station - no current vocabulary - worth researching with
schema RadioChannel as input
- Email - http://microformats.org/wiki/h-entry with u-in-reply-to property
- Message - http://microformats.org/wiki/h-entry

For Song and Radio Station in particular - I will take the action of
bringing these use-cases to the microformats community and see what
the community can come up with, and how quickly. Discussion will be on
#microformats on Freenode (archived, see microformats.org/wiki/irc) if
anyone wants to contribute or just lurk.
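
To make the mapping above concrete, here is a minimal sketch (Python,
purely illustrative) of how a consumer might classify one item from a
microformats2 parser's JSON output into these card types. The "type"
and "properties" keys follow the microformats2 parsed JSON shape; the
helper name card_kind and the precedence order among h-entry
properties are my assumptions, not part of any spec.

```python
from typing import Optional


def card_kind(item: dict) -> Optional[str]:
    """Guess a card type for one parsed microformats2 item.

    Hypothetical helper: the order in which h-entry properties are
    checked is an assumption made for this sketch.
    """
    types = item.get("type", [])
    props = item.get("properties", {})
    if "h-card" in types:
        return "Contact"
    if "h-event" in types:
        return "Event"
    if "h-entry" in types:
        if "photo" in props:
            return "Photo"
        if "video" in props:
            return "Video"
        if "in-reply-to" in props:
            return "Email"
        return "Message"
    # Song and Radio station have no current vocabulary to match on.
    return None


# Made-up sample item, shaped like microformats2 parser output.
example = {
    "type": ["h-entry"],
    "properties": {
        "name": ["Sunset"],
        "photo": ["https://example.com/sunset.jpg"],
    },
}
print(card_kind(example))
```

Anything that is not one of the three root types falls through to
None, which is where Song and Radio station would land today.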


> Schema.org also provides existing schemas for actions associated with items
> (https://schema.org/docs/actions.html),

The "actions" space has been a difficult and challenging one.

Google's (abandoned) "web intents" was one such effort.

Currently the IndieWeb community is pursuing Web Actions (and has them
working across sites)

http://indiewebcamp.com/webactions

There's likely potential there to make webactions part of the format
of the post/page, so they can be parsed, consumed, and re-used.

Again, this is something I'll take to the #microformats community and
we can see what people there come up with.


> although examples are only given in
> JSON-LD syntax. Schema.org is just a vocabulary and Tantek tells me it's
> theoretically possible to express this vocabulary in Microformats syntax
> too - it's possible to create new vendor prefixed types, or suggest new
> standard types to be added to the Microformats wiki.

Yes.

> This would be required
> because Microformats does not have a big enough existing vocabulary for
> Gaia's needs.

Per the analysis above, two objects (Song and RadioStation) and an
approach to "actions" are needed.


> Microdata, RDFa and JSON-LD use URL namespaces so are
> extensible by design with a non-centralised vocabulary (this is seen as a
> strength by some, as a weakness by others).

Indeed. In CSS, -vendor- prefixes have had some success, and some
downsides as well.

Ironically, the ease-of-use of -vendor- prefixes over URL based
namespaces led to perhaps more popularity than desired for vendor
specific things (witness our -webkit- compat headaches), whereas the
web seems to be (mostly?) surviving URL based namespace pollution.

microformats2 takes the CSS approach of non-centralized -vendor-
prefixes for the same ease-of-use reasons as CSS.


> There is resistance to implementing a full Microdata or RDFa parser in
> Gecko due to its complexity.

It's not just that, but the experience (that any Mozilla engineer who
was here before Firefox will relay, e.g. ping jst sometime if you want
to hear horror stories) of RDF, triple-stores etc. being a disaster
for Mozilla, performance, etc. and taking ages to undo.


> JSON-LD is more self-contained by design (for
> better or worse)

Note: this is purely *in theory*.

In practice, if you're actually bothering with JSON-LD (not just plain
JSON), and using or depending on anything triples related, you're
likely to run into similar problems and objections. It's a very high
risk path. If you're ignoring all the "LD"ness of JSON-LD, then just
admit that upfront and use some one-off JSON.


> Microformats is possibly
> less Gecko work to implement than Microdata or RDFa, but more than JSON-LD.

There are multiple open source interoperable microformats parsers,
including in JavaScript (node-compatible even), verified with a test
suite.

Landing an existing open source modern microformats parser is very
much doable, and is something we have been incrementally working
towards for some time, in particular with aspirational use-cases for
Gaia! (Gordon and Josh worked on this years ago).


> *Conclusions*
>
> My conclusion is that the least required work in Gecko for the highest
> return would be:
>
>    1. *Open Graph* (bug 1178484) - Extending the existing metachange
>    Browser API event to include all meta tags with a "property" attribute.
>    This would allow Gaia to add support for all of the Open Graph types,
>    fulfilling requirement 1.

I'd still like to go with Ted's recommendation on this, and minimize
exposure, minimize Open Graph implementation/vocab surface etc. for
all the reasons we avoid adding backcompat tech debt.


>    2. *JSON-LD* (bug 1178491) - Adding a linkeddatachange event to the
>    Browser API which is dispatched by Gecko whenever it encounters a script
>    tag with a type of "application/ld+json" (as per the W3C recommendation
>    [5]), including the JSON content in the payload of the event. This would
>    allow the Gaia system app to support existing schema.org schemas
>    (including actions), with the least amount of work in Gecko, and already in
>    a JSON format it can store directly in the Places database
>    (DataStore/IndexedDB).
>
> …
>
> It's clear that there's not a consensus amongst everyone that JSON-LD is
> the best format for Mozilla to promote for structured data on the web going
> forward

In fact quite the opposite!

There *is* a pretty strong engineering consensus, in both this thread,
and other threads *against* any use of JSON-LD, or anything Linked
Data or otherwise rebranded RDF / Semantic Web, and for good reason.

Ted's email provides the high-level outline for why. Annevk debunked
the assumption of "W3C Spec = must be good".


> I would suggest that they go ahead with implementing Microformats in Gecko
> and we can use it in Gaia when it's ready.

Suggestion accepted. It's time.

We have been supporting microformats to some degree or other in
Firefox for years, incrementally, since Firefox 3.

In the meantime, microformats matured (indexed by search engines
since 2006, rich snippets since 2009), and microformats2 was designed
(based on lessons learned from microdata and RDFa, focused by
real-world use-cases), developed, implemented, tested, and shipped on
thousands of sites (mostly IndieWeb based, withknown.com etc.), and
consumed by various indie readers and other sites.

There are now several microformats2 open source parsing libraries
across languages, deployed live and testable:
http://microformats.org/wiki/microformats2#Parsers


> I would recommend exposing it to
> Gaia via a getStructuredData() method on the Browser API (bug 1169634)
> which returns a Promise which resolves with the canonical JSON
> representation of any Microformats data present in a document.

The update to this bug makes sense:
* To support getStructuredData, using the canonical JSON
representation of microformats on the page.

This will allow us to move forward with a much simpler JSON based
model, and hopefully avoid all the LinkedData / triples pitfalls.
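
As a rough sketch of what consuming that JSON could look like (in
Python for brevity; the real consumer would be JS in the Gaia system
app), the following walks a parsed result and collects every item of
a given root type, including nested ones. It assumes the resolved
value has the microformats2 parsed JSON shape (top-level "items", each
with "type", "properties", and optional "children"); the helper name
find_items and the sample data are made up for illustration.

```python
def find_items(parsed: dict, root_type: str) -> list:
    """Collect every item of the given root type, including nested ones."""
    found = []

    def walk(item: dict) -> None:
        if root_type in item.get("type", []):
            found.append(item)
        # Nested microformats can appear as children...
        for child in item.get("children", []):
            walk(child)
        # ...or as structured values of a property.
        for values in item.get("properties", {}).values():
            for value in values:
                if isinstance(value, dict) and "type" in value:
                    walk(value)

    for top in parsed.get("items", []):
        walk(top)
    return found


# Made-up sample of the kind of JSON a microformats2 parser emits.
parsed = {
    "items": [
        {
            "type": ["h-event"],
            "properties": {
                "name": ["Platform meetup"],
                "location": [
                    {
                        "type": ["h-card"],
                        "properties": {"name": ["Whistler"]},
                        "value": "Whistler",
                    }
                ],
            },
        }
    ],
    "rels": {},
}

cards = find_items(parsed, "h-card")
print([card["properties"]["name"][0] for card in cards])
```

Plain dictionary and list traversal is all that is needed here; no
triples, contexts, or graph processing are involved.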


> This will
> then allow us to add the necessary support in the Gaia system app. (When
> implementing this it might also make sense to hook it up to the Open Graph
> and JSON-LD support to create a single API with support for multiple
> formats).

From all evidence so far, the simpler canonical JSON should suffice for this.

I can also pose the question to the #microformats community of how to
reinterpret Open Graph "og:" meta tag markup as what they would mean
in canonical microformats JSON, as members of that community have
already been working on parsing *both* OG: and microformats, for all
the similar pragmatic reasons we've discussed here.
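
To illustrate the kind of reinterpretation I mean, here is a
deliberately small sketch. The property mapping and the choice of
h-entry as the root type are my own placeholder assumptions, not an
agreed convention; the actual mapping is exactly the open question for
the community. The names OG_TO_MF2 and og_to_mf2 are hypothetical.

```python
# Hypothetical og: -> microformats2 property mapping, for illustration only.
OG_TO_MF2 = {
    "og:title": "name",
    "og:url": "url",
    "og:image": "photo",
    "og:description": "summary",
}


def og_to_mf2(meta: list) -> dict:
    """Turn (property, content) pairs from <meta> tags into mf2-style JSON."""
    properties: dict = {}
    for prop, content in meta:
        key = OG_TO_MF2.get(prop)
        if key is not None:
            # Property values are always lists in microformats2 JSON.
            properties.setdefault(key, []).append(content)
    # Treating the page as an h-entry is an assumption, not an agreed rule.
    return {"items": [{"type": ["h-entry"], "properties": properties}]}


# Made-up input, as if read from a page's <meta property="og:..."> tags.
meta = [
    ("og:title", "Example"),
    ("og:image", "https://example.com/a.png"),
    ("og:site_name", "Example Site"),  # unmapped, so it is dropped
]
result = og_to_mf2(meta)
print(result)
```

Unmapped properties are simply ignored, which keeps the OG surface we
depend on as small as the mapping table itself.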



> In the mean time, given our tight schedule, I would be grateful if we could
> not to block the Gaia work on the implementation of Microformats or any
> more discussion on which formats we'd like to promote going forward.

From my understanding in discussing with Ted, there's nothing being
blocked on the pragmatic minimal implementation support of OG: meta
tags. If worst comes to worst we could even make up our own "og:moz:…"
vendor specific markup should we need to for Gaia (unexposed to web
platform).

I'm happy to sync-up with Ted to make sure that we continue to not
block the Gaia work.


> Thanks for everyone's input on this so far, I hope we can now get to work.

Thanks to you Ben for continuing to pursue and iteratively analyze the
various options, and providing the data, with continued critical
(re)analysis.

Q3 has begun, let's get to work.

Tantek
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
