On 26 June 2015 at 12:58, Ted Clancy <tcla...@mozilla.com> wrote:

> My apologies for the fact that this is such an essay, but I think this has
> become necessary.
>
> Firefox OS 2.5 will be unveiling a new feature called Pinning The Web, and
> there's been some discussion about whether we should leverage technologies
> like RDFa, Microdata, JSON-LD, Open Graph, and Microformats for this
> purpose.
>
> First, I'd like to give some background on these technologies.
>
> In 2001, Tim Berners-Lee said that the "Semantic Web" was the future of
> the web and was going to revolutionize our world. (
> http://www.scientificamerican.com/article/the-semantic-web/)
>
> The Semantic Web was a doomed idea, for reasons best articulated in an essay
> by Cory Doctorow entitled "Metacrap", also written in 2001. (
> http://www.well.com/~doctorow/metacrap.htm) After 14 years of the
> Semantic Web not revolutionizing our world, I think history suggests that
> Cory Doctorow was right.
>
> But because the Semantic Web was "the next big thing", millions of dollars
> were poured into it (mostly in the form of research grants and crappy
> specs, from what I can gather). In 2004, RDFa became the first big standard
> to emerge from this work. RDFa is a W3C Recommendation, and work is still
> proceeding on it.
>
> JSON-LD was started in 2008 as a JSON-based alternative to RDFa. As the
> author of JSON-LD, Manu Sporny, states:
>
> "RDF is a shitty data model. It doesn’t have native support for lists.
> LISTS for fuck’s sake! [...] to work with RDF you typically needed a quad
> store, a SPARQL engine, and some hefty libraries. Your standard web
> developer has no interest in that toolchain because it adds more complexity
> to the solution than is necessary." (
> http://manu.sporny.org/2014/json-ld-origins-2/)
>
> However, though it originally wanted to distance itself from RDFa, JSON-LD
> ended up being chosen as a serialization for RDFa:
>
> "Around mid-2012, the JSON-LD stuff was going pretty well and the newly
> chartered RDF Working Group was going to start work on RDF 1.1. One of the
> work items was a serialization of RDF for JSON. [...] The biggest problem
> being that many of the participants in the RDF Working Group at the time
> didn’t understand JSON." (ibid)
>
> (I just want everyone to note that in 2012, *THE AUTHORS OF RDFa DID NOT
> KNOW JSON*. This is in a spec that casually throws around propositional
> logic terms like "entails", and "subject-predicate-object triples".)
>
> JSON-LD is now a W3C recommendation, and has undergone added complexity to
> align it with RDFa. As Manu Sporny states, "Nobody was happy with the
> result" (ibid).
>
> Microdata is similar to RDFa, but without the benefit of being a W3C
> recommendation.
>
> Open Graph is a technology developed by Facebook. It's putatively a subset
> of RDFa. There is a small subset of Open Graph tags (og:title, og:type,
> og:url, and og:image) which are widely used for sharing content on social
> media like Facebook and Twitter.
>
> RDFa, Microdata, and JSON-LD can collectively be described as "Linked
> Data" technologies, so called because their intention is that semantic
> objects across different web pages would "link" to each other to create a
> "Semantic Web".
>
> Microformats was developed circa 2005 as a lightweight way of putting
> semantic information into web pages, but does not aim to be a "Linked Data"
> or "Semantic Web" technology. It does not have an official standards body
> behind it, instead being maintained by a community of volunteers. One of
> our Mozilla employees, Tantek Çelik, was instrumental in its development.
>
>
Thanks for the history lesson :) When I started to research this area I
learnt very quickly that there are a lot of strong feelings on all sides
about which format is the "best", and many formats claim to supersede each
other. The reality is that there's still no clear winner on the web. So
what I've tried to do is take a data-driven approach and look at which
syntaxes and vocabularies are getting the most traction according to
research papers based on the Common Crawl corpus, the Bing corpus and the
Yahoo corpus (all the data I've found so far).

There are two high-level requirements for the Pin the Web features:
1) Getting the most possible user value out of the data that already exists
on the web today
2) Finding the best solution for the use cases we have in Gaia apps which
can be implemented in the time frame we have for the 2.5 release (Feature
Landing on 21st September)

Based on the data available and the level of implementation effort, my
most recent conclusions for those two requirements were, respectively:

1) Open Graph
2) JSON-LD

However, there's also a case for bonus points for a solution that we as
Mozilla actually want to see used in the future!


> Okay, now I'd like to discuss whether or not we should use these
> technologies for Pinning The Web.
>
> Open Graph: I think we need to use the four tags "og:title", "og:type",
> "og:url" and "og:image", since they are widely used. Apart from that, I
> don't think we need to support the rest of Open Graph.
>

I agree that Open Graph gets us the volume. I would prefer to get events
for any meta tag with a "property" attribute rather than hard code these
strings in Gecko. We can handle the rest in Gaia. I would argue there are
other Open Graph tags in wide usage which have value for our use cases and
that would be the most flexible solution.
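
A minimal sketch of the "any meta tag with a property attribute" idea, in
Python for brevity rather than as actual Gecko or Gaia code. The class and
function names are hypothetical; the point is only that the collector
hard-codes no Open Graph strings and leaves filtering to the consumer:

```python
from html.parser import HTMLParser


class MetaPropertyCollector(HTMLParser):
    """Collects every <meta property="..." content="..."> pair in a page.

    Hypothetical helper for illustration; nothing here is a Gecko API.
    """

    def __init__(self):
        super().__init__()
        # (property, content) tuples in document order.
        self.properties = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property")
        content = attr_map.get("content")
        # Only report tags that carry both attributes with a value.
        if prop is not None and content is not None:
            self.properties.append((prop, content))


def collect_meta_properties(html):
    """Returns all (property, content) pairs found in the given HTML string."""
    parser = MetaPropertyCollector()
    parser.feed(html)
    parser.close()
    return parser.properties


def select_by_prefix(properties, prefix):
    """Consumer-side filter, e.g. prefix "og:" for Open Graph tags."""
    return [(p, c) for (p, c) in properties if p.startswith(prefix)]
```

The consumer could then call something like
`select_by_prefix(collect_meta_properties(page), "og:")` and decide for
itself which properties matter, without the collector knowing the list.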


>
> RDFa, Microdata, and JSON-LD: I'd be afraid of using these. They were
> designed for something much bigger and more complicated than just pinning
> websites/contacts/events. I'd be afraid of people getting the idea that
> "Mozilla supports RDFa", because that would give the wrong idea and just
> lead to disappointment and/or headache. Also, they are complex, and our
> developer effort is limited.
>

I agree, we don't need the advanced Linked Data graph capabilities for our
use cases, we just need useful structured data about web pages.


>
> JSON-LD has the additional problem that it exists separately from the
> content of the webpage, meaning that the JSON-LD data can get out-of-sync
> with the webpage, leading to confusion for users. (We've all seen the way
> code comments quickly get out-of-sync with the code they purport to
> describe.)
>

This could actually be a benefit for us due to a quirk of the
implementation of the new Gaia architecture in 2.5, but this is only a
temporary benefit and we shouldn't choose a flawed solution on that basis.


>
> The argument has been made on this discussion list that RDFa and Microdata
> data is abundant, and so we should take advantage of it. But it's
> questionable how much of that data is actually good. The main use of RDFa
> and Microdata right now is for search engine optimization, which means the
> data isn't necessarily in a form presentable to the user. (Also, it might
> be all lies.)
>

This is difficult to quantify, but yes Open Graph is much easier to
validate. Facebook provides a validator and just pasting a link into
Facebook will show you how your data will be represented. By contrast, how
RDFa and Microdata are going to be used is largely guesswork. The
difference is largely a result of Open Graph being more centralised.


>
> Microformats: Yes, we should use these. We've had support for Microformats
> in Firefox since Firefox 3 (
> https://developer.mozilla.org/en-US/docs/Using_microformats), so it's
> just a matter of updating and expanding what we already have.
>
> Microformats is becoming more widely used. Facebook includes Microformats
> data for its Events and Places. There are Wikipedia templates that use
> Microformats, and Wordpress plugins (the most recent of which was just
> published last month).
>

The data I have does not back this up: Microdata is shown to be growing
fast, whereas Microformats usage has remained relatively stable. Also, we
didn't find Microformats usage on any of the example high-profile sites we
used during prototyping. It seems to be more commonly used on Wordpress
blogs and Indie Web style web sites.

However, Tantek and I have sat down together and taken a deeper look at
this data and it does have some flaws. It doesn't include some of the most
commonly used Microformats, or many of the new Microformats 2.0
vocabularies. We also don't have any data for JSON-LD. It's difficult to
make a definitive call on this without more data.


>
> When I look at RDFa and Microdata, I see large corporations trying to make
> something happen, with results that must be disappointing. When I look at
> Microformats, I see an enthusiastic community of volunteers who are getting
> results (like Mike Kaply, who added Microformats.js to our gecko tree).
>

When I look at RDFa, Microdata and JSON-LD I see formal W3C
recommendations, extensive vocabularies which (at least on the surface) are
agreed on by all the big search engines, and I see a clean engineering
solution (albeit fairly complex). When I look at Microformats I see an
informally written and opinionated wiki, a limited existing vocabulary and
what I regard as a fairly hacky solution from an engineering point of view.
But it is the grassroots solution.


>
> And if nothing else, Mozilla's own Web Standards employees, like Marcos
> Cáceres and Tantek Çelik, who we hire and pay good money to be experts in
> such matters, are telling us to not use Linked Data and to use Microformats
> instead.
>

I think we're all agreed on Open Graph, let's do it.

My main reason for suggesting JSON-LD over Microformats was that it's super
easy for us to implement in our short timeframe (less work for you on the
Gecko side). It fulfills all of our use cases for Gaia, has an extensive
existing vocabulary, and is already in JSON, which is the format we'd like
to parse and store on the Gaia side.
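
To illustrate why JSON-LD is cheap to consume, here is a rough sketch (in
Python, not real Gecko or Gaia code) that pulls the contents of
`<script type="application/ld+json">` blocks out of a page and parses them
as plain JSON. The class and function names are hypothetical, and the
sketch deliberately ignores `@context` processing and the rest of the
Linked Data machinery:

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects parsed JSON from <script type="application/ld+json"> blocks.

    Hypothetical helper for illustration; it treats JSON-LD as plain JSON.
    """

    def __init__(self):
        super().__init__()
        self.documents = []      # successfully parsed JSON values
        self._in_jsonld = False  # True while inside a JSON-LD script
        self._buffer = []        # text chunks of the current script

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script text may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.documents.append(json.loads("".join(self._buffer)))
            except ValueError:
                # Skip malformed blocks rather than failing the whole page.
                pass


def extract_json_ld(html):
    """Returns a list of JSON values, one per well-formed JSON-LD block."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    return extractor.documents
```

Each returned value is already an ordinary JSON object, so a consumer can
read fields such as `@type` or `name` directly and store the result
without a separate conversion step.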

However, if the people we have at Mozilla who are experts at this stuff are
recommending Microformats, and you think it's achievable to implement in
our timeframe (Gecko can provide Gaia with the structured data in the
canonical JSON format), then I'm more than happy to defer to their
expertise and go with that. If you think that's feasible, then let's get it
done :)


> And I find it abominable that people on this discussion list have
> suggested that Tantek shouldn't be listened to because Microformats is
> something he helped develop. That is shitty teamwork and A-grade
> paranoia. This whole Pinning-The-Web concept is something that Ben Francis
> has been developing. Does that mean we shouldn't listen to what Ben says
> regarding it?
>

I'm sorry if that's how I came across. I was just trying to leave aside the
passionate views on all sides and take a data-driven approach to decision
making. I have the utmost respect for Tantek and the extensive work he's
already done in this area :)

All the Best

Ben
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
