Re: Formats and icing (Was Re: [ESWC 2015] First Call for Paper)
Hi all,

Thanks John for the references to my project. It seems that what you need here is a solution that pleases both those who want a PDF to comply with existing processes and those who want a machine-readable format for better Web accessibility.

The DITA standard (https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=dita) is an OASIS standard, like OpenDocument. It's an XML framework for creating documents by assembling content components called topics. Think of it as an evolved DocBook. The Wikipedia page (https://en.wikipedia.org/wiki/Darwin_Information_Typing_Architecture) is a good introduction.

In the DITA ecosystem, the community has developed a processing engine, the DITA Open Toolkit (http://dita-ot.github.io/). Through its plugin system, it can publish DITA content to a wide range of output formats:

* PDF
* Simple HTML
* HTML WebHelp (fancy example: http://purl.org/dita/ditardf-project)
* ePub and Kindle (through the dita4publishers plugin: http://dita4publishers.sourceforge.net/)
* ...and RDF/XML, through the plugin that is part of the DITA RDF project (http://purl.org/dita/ditardf-project).

The plugin extracts the metadata of the documentation (author, title, creation date, links, variables), not the meaning of the content (output example: https://github.com/ColinMaudry/dita-rdf/blob/ditaot-plugin/dita2rdf/demo/out/ditaot-userguide.rdf). It could be extended to extract certain facts from the content.

DITA has a nice feature: its core vocabulary can be extended via specialization, so that it can support specific purposes such as learning content, troubleshooting documents, etc.

Those who want a PDF would make a PDF rendition, and those who want machine-readable formats would use a flavour of HTML or give me a hand with the RDF output.

What do you think?

Colin

On 02/10/2014 11:08, John Walker wrote:
[...]
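For readers unfamiliar with DITA, the kind of metadata extraction Colin describes (title, author, creation date, links, not the meaning of the content) can be illustrated in a few lines. This is not the DITA RDF plugin, which runs inside the DITA Open Toolkit; it is a minimal Python sketch, assuming a standalone topic file, that maps the topic title, `prolog` authors, `critdates/created` date and external `xref` links onto Dublin Core terms. The function names, the base IRI and the choice of Dublin Core properties are hypothetical.

```python
import xml.etree.ElementTree as ET

DCT = "http://purl.org/dc/terms/"  # Dublin Core terms namespace (assumed target vocabulary)
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def nt_literal(value):
    """Quote a string as an N-Triples literal (escapes backslash, quote, newline)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def dita_topic_to_ntriples(xml_text, base_iri):
    """Return N-Triples lines describing a DITA topic's metadata, not its content.

    base_iri is a hypothetical namespace under which the topic @id becomes an IRI.
    """
    root = ET.fromstring(xml_text)
    subject = f"<{base_iri}{root.get('id')}>"
    triples = []

    title = root.findtext("title")
    if title:
        triples.append(f"{subject} <{DCT}title> {nt_literal(title.strip())} .")

    # Authors and critical dates live in the topic's <prolog>.
    for author in root.findall("prolog/author"):
        if author.text:
            triples.append(f"{subject} <{DCT}creator> {nt_literal(author.text.strip())} .")
    created = root.find("prolog/critdates/created")
    if created is not None and created.get("date"):
        date_literal = f"{nt_literal(created.get('date'))}^^<{XSD_DATE}>"
        triples.append(f"{subject} <{DCT}created> {date_literal} .")

    # External cross-references anywhere in the topic become links.
    for xref in root.iter("xref"):
        if xref.get("scope") == "external" and xref.get("href"):
            triples.append(f"{subject} <{DCT}references> <{xref.get('href')}> .")
    return triples


# Illustrative sample topic (made-up values).
SAMPLE_TOPIC = """<topic id="install">
  <title>Installing the toolkit</title>
  <prolog>
    <author>A. Author</author>
    <critdates><created date="2014-08-12"/></critdates>
  </prolog>
  <body><p>See <xref href="http://dita-ot.github.io/" scope="external"/>.</p></body>
</topic>"""

for line in dita_topic_to_ntriples(SAMPLE_TOPIC, "http://example.org/docs/"):
    print(line)
```

Emitting N-Triples as plain strings keeps the sketch dependency-free; a real pipeline would more likely use an RDF library and serialize to RDF/XML as the plugin does.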
Re: Formats and icing (Was Re: [ESWC 2015] First Call for Paper)
Dear Sarven et al,

I'd like to say that I'm an HTML/CSS/JavaScript aficionado, so I'd be the first to embrace Web standards for producing publications. I'm simply playing devil's advocate here because I think LaTeX is still more mature than HTML for writing papers. However, I must admit I'd like to see a future where that is different.

But before we ask conferences to embrace this still immature HTML world (at least for producing papers), we must write the frameworks, the libraries and the CSS templates that enable the same level of publication quality that LaTeX enables. JavaScript, for example, can help with the kerning issue (http://kerningjs.com/), and this should be part of the HTML publisher's toolkit. To solve the browser inconsistencies, standalone tools (based on a WebKit engine, for example) must be built that produce a consistent printable layout regardless of the operating system (browser fonts render differently on Mac/Windows/Linux).

So yes, we can get there, but there's some work to be done to prove that HTML is up to the task. And once we get there, we can start going crazy and adding interactivity, which is where the real power of the Web platform lies. Phillip Lord, by interactions I don't mean simple animations, I mean this: http://worrydream.com/LadderOfAbstraction/ (use the scrolling on the right-hand side to instantly see the output for different inputs). That's powerful stuff.

Best,
Luca

On Thu, Oct 2, 2014 at 4:02 PM, Colin Maudry co...@maudry.com wrote:
[...]
Re: Formats and icing (Was Re: [ESWC 2015] First Call for Paper)
Hi Luca,

I'll admit my opinion is probably skewed by nearly 15 years working in and around technical documentation environments using structured authoring tools like FrameMaker and Oxygen, based on XML/SGML technologies. I'm a firm convert from WYSIWYG environments like MS Word to more structured, 'semantic' markup made possible with XML, sometimes referred to as WYSIWYM, or What You See Is What You Mean.

There are some great tools out there that make editing a doddle and allow the use of vector images (SVG) and mathematical formulas (MathML) directly in your XML document. As it is XML, weaving in some RDFa is also possible if you are so inclined.

Going to the rendered publication format, whether that be page-based (PDF), web-based (HTML) or whatever else, is possible via a myriad of approaches, whether you prefer LaTeX, HTML+CSS+JS or XSL-FO (for the masochists out there :)

Certainly most technical authors I know would run a mile were you to suggest they edit directly in LaTeX or XSL-FO, or even raw XML/HTML for that matter, but perhaps developers would be more comfortable with it.

On top of this, DITA offers specialization, as Colin mentioned, but also a myriad of different (direct and indirect) referencing possibilities to pull and push content between different documents. HTML imports [1] and custom elements [2] might offer some of these options in HTML at some point in the future.

Cheers,
John

[1] http://www.w3.org/TR/html-imports/
[2] http://w3c.github.io/webcomponents/spec/custom/

On October 3, 2014 at 10:59 AM Luca Matteis lmatt...@gmail.com wrote:
[...]
Re: Formats and icing (Was Re: [ESWC 2015] First Call for Paper)
Luca Matteis lmatt...@gmail.com writes:

> I'd like to say that I'm an HTML/CSS/JavaScript aficionado so I'd be the first to embrace Web standards to produce publications. I'm simply playing a bit of the devil's advocate here because I think that Latex is still more mature than HTML for writing papers. However, I must admit I'd like to see a future where that is different.

The conference does not want LaTeX, it wants PDF. So write your documents in LaTeX, publish in HTML. The only thing that needs to change is the tools in the middle.

> But before we ask conferences to embrace this still immature HTML world (at least for producing papers) we must write the frameworks, the libraries, the CSS templates that enable the same level of publication that Latex enables.

Well, that's already been done. As for "the same level of publication", I profoundly disagree. The LNCS format is very poor for anything other than printing. I want a form of publication that allows me, the reader, to switch layout.

> For solving the browser inconsistencies, standalone tools (based on a Webkit engine for example) must be built that produce a consistent printable layout no matter the operating system (browser fonts render differently on Mac/Windows/Linux).

Seriously? You want to build another browser? My experience is that the web is more consistent than PDF. Font problems with PDFs used to be the norm. I tend not to use them now, so perhaps that's changed. And, again, printable? At least some of us want to move away from that. Styling is a reader issue, not an authorial one.

> So yes, we can get there, but there's some work to be done to prove that HTML is up for task.

No. There is work to be done to prove that we can break the habit of a lifetime. HTML is far from immature. We move, and then we fix any problems that we may have. Why would we bother before?

> Phillip Lord, by interactions I don't mean simple animations, I mean this: http://worrydream.com/LadderOfAbstraction/ - use the right side scrolling to instantly see the output given different inputs. That's powerful stuff.

Colour figures and animations would be a nice start, though.

Phil
Re: Formats and icing (Was Re: [ESWC 2015] First Call for Paper)
On 01/10/2014 21:55, Luca Matteis wrote:
> But why is it backwards? We have different formats serving different purposes. Diversity is healthy. Simply because PDF is not in the Web stack it doesn't make it Web-unfriendly.

In 2013, PDF was mentioned during the ODW2013 workshop [0], and I quote below part of the final report [1] regarding PDF:

(...) PDF - often referred to as the format where data goes to die. In the open data world, PDF has a bad name as it is not deemed machine processable. As Adobe's Jim King pointed out in his presentation [2], this is perhaps unfair. PDF can include structured tables, can carry associated metadata, extractable text and more. It is the way that PDFs are generated - using basic tools that don't support all the features - that renders PDF documents opaque to machine processes.

This could be an opportunity to work more closely with Adobe's folks to see how the Web stack can help process data in PDF...

Best,
Ghislain

[0] http://www.w3.org/2013/04/odw/
[1] http://www.w3.org/2013/04/odw/report
[2] http://www.w3.org/2013/04/odw/Role_of_PDF_and_Opendata_final.pdf

-- 
Ghislain Atemezing
EURECOM, Multimedia Communications Department
Campus SophiaTech 450, route des Chappes, 06410 Biot, France.
e-mail: auguste.atemez...@eurecom.fr ghislain.atemez...@gmail.com
Tel: +33 (0)4 - 9300 8178
Fax: +33 (0)4 - 9000 8200
Web: http://www.eurecom.fr/~atemezin
Google+: http://google.com/+GhislainATEMEZING
Twitter: @gatemezing
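The point that PDF "can carry associated metadata" has a concrete, standardized form: XMP (Extensible Metadata Platform, ISO 16684-1), which stores an RDF/XML packet in a PDF's metadata stream. It is also one answer to the question, raised elsewhere in the thread, of whether there is a standard way to put RDF into a PDF. The Python sketch below only builds such a packet with the standard library; writing it into an actual PDF requires a PDF library and is not shown. The function name `xmp_packet` and the sample values are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)  # so the output uses x:, rdf:, dc: prefixes

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def q(prefix, local):
    """Build an ElementTree qualified name like {uri}local."""
    return f"{{{NS[prefix]}}}{local}"


def xmp_packet(title, creators):
    """Return an XMP packet (RDF/XML wrapped in xpacket markers) with a title and creators."""
    meta = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(meta, q("rdf", "RDF"))
    desc = ET.SubElement(rdf, q("rdf", "Description"), {q("rdf", "about"): ""})

    # In XMP, dc:title is a language alternative (rdf:Alt) ...
    alt = ET.SubElement(ET.SubElement(desc, q("dc", "title")), q("rdf", "Alt"))
    ET.SubElement(alt, q("rdf", "li"), {XML_LANG: "x-default"}).text = title

    # ... and dc:creator is an ordered array (rdf:Seq).
    seq = ET.SubElement(ET.SubElement(desc, q("dc", "creator")), q("rdf", "Seq"))
    for name in creators:
        ET.SubElement(seq, q("rdf", "li")).text = name

    body = ET.tostring(meta, encoding="unicode")
    return ('<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
            + body + '\n<?xpacket end="w"?>')


print(xmp_packet("A Paper Title", ["A. Author", "B. Author"]))
```

Only Dublin Core fields are shown; the same `rdf:Description` could carry properties from other vocabularies, which is what makes XMP a plausible bridge between PDF and Linked Data.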
Re: Formats and icing (Was Re: [ESWC 2015] First Call for Paper)
Hi All,

I know LaTeX is the norm in academic circles, but the DITA XML standard is widely used in industry and gaining traction in publishing. Colin Maudry (@CMaudry) has a project for extracting RDF metadata from DITA content [1]. It seems to be attracting interest from MarkLogic and HarperCollins [2] and others [3].

Cheers,
John

[1] http://purl.org/dita/ditardf-project
[2] http://files.meetup.com/1645603/meetup-2014-08-12.pptx
[3] http://de.slideshare.net/TheresaGrotendorst/towards-dynamic-and-smart-content-semantic-technologies-for-adaptive-technical-documentation

On October 2, 2014 at 12:03 AM Norman Gray nor...@astro.gla.ac.uk wrote:

> Greetings.
>
> On 2014 Oct 1, at 22:36, Luca Matteis lmatt...@gmail.com wrote:
>
>> So forget PDF. Perhaps we can add markup to Latex documents and make them linked data friendly? That would be cool. A Latex RDF serialization :)
>
> There exists http://www.siegfried-handschuh.net/pub/2007/salt_eswc2007.pdf:
>
>> SALT: Semantically Annotated LaTeX
>> Tudor Groza, Siegfried Handschuh, Hak Lae Kim
>> Digital Enterprise Research Institute, IDA Business Park, Lower Dangan, Galway, Ireland
>> {tudor.groza, siegfried.handschuh, haklae.kim}@deri.org
>>
>> ABSTRACT
>> Machine-understandable data constitutes the basis for the Semantic Desktop. We provide in this paper means to author and annotate Semantic Documents on the Desktop. In our approach, the PDF file format is the basis for semantic documents, which store both a document and the related metadata in a single file. To achieve this we provide a framework, SALT, that extends the LaTeX writing environment and supports the creation of metadata for scientific publications. SALT lets the scientific author create metadata while putting together the content of a research paper. We discuss some of the requirements one has to meet when developing such an ontology-based writing environment and we describe a usage scenario.
>
> That describes a very thorough approach to embedding some semantics within LaTeX documents. Yes, 'thorough'; very thorough; verging on the intimidating.
>
> I dimly recall that there was a rather more lightweight approach which was used for proceedings in ISWC or ESWC -- I remember marking up a LaTeX document in something less comprehensive than SALT -- but I can't remember enough to be able to re-find it.
>
> All the best,
>
> Norman
>
> --
> Norman Gray : http://nxg.me.uk
> SUPA School of Physics and Astronomy, University of Glasgow, UK
Re: Formats and icing (Was Re: [ESWC 2015] First Call for Paper)
On 2014-10-02 10:36, Ghislain Atemezing wrote:
[...]

Thanks for sharing, Ghislain.

Let's not forget that we have SW/LD supporters who go after public institutions to aim for 5-star Linked Data, or who ask for public funding to support their SW/LD research.

Ironic facts:

* The majority of SW/LD research output is publicly funded.
* The majority of SW/LD research venues promote 1-star Linked Data.

So, yes, we can do a lot of different things, and in fact a lot of people are doing different things to improve open science and communication. The question is: what efforts are the SW/LD research venues making? How are they compromising on, or improving, the state of things? What has changed in recent memory?

-Sarven
http://csarven.ca/#i
Re: Formats and icing (Was Re: [ESWC 2015] First Call for Paper)
Hi everybody,

I won't interrupt the discussion much, but what you are saying is really interesting! From my academic, physical-sciences bias (as a master's student at Université Paris-Sud), this is just NOT the way publication is considered by the researchers at my university. So may I relay what is exchanged here to another discussion (on Facebook, sorry…)? It is HackYourPhD, a French group working on open science: http://hackyourphd.org https://www.facebook.com/groups/499463776745202/

Best,
Jibé

On 01/10/2014 21:26, Sarven Capadisli wrote:

> On 2014-10-01 21:05, Luca Matteis wrote:
>> Dear Sarven, This stuff is really cool: http://linked-research.270a.info/ Couple of questions: How did you come up with such a close CSS/HTML template as the LCNS latex version? Did you hand code the CSS to make it look as close as possible or was it automated by some tool I'm not aware of?
>
> As venues always give precise instructions on what template to follow, e.g.: http://static.springer.com/sgw/documents/1121537/application/pdf/SPLNPROC+Author+Instructions_Aug2014.pdf, that's exactly what I did: I read it line by line and wrote the CSS for it. There is no doubt that the CSS can be better. Different browsers, for instance, have varying CSS3 print support.
>
> If you thought http://linked-research.270a.info/ looked cool, why not change the link href=lncs.css to acm.css from your browser's developer tools?
>
> What's demanded by the conferences/publishers is an archaic presentation. Fixed page length. Fixed view. So be it. That is a small subset of what we can achieve using the Web stack.
>
>> What you're saying about moving towards RDFa for publishing papers should definitely be discussed more, however, CSS/HTML still fails in a lot of things that Latex on the other hand excels at. For example typography and font kerning/spacing. All that works really well in latex/pdf, while in HTML you get different results in different browsers. Journals certainly can't expect inconsistencies. I've seen templates built in PDF using Latex that you can dream of using HTML/CSS. It's just a better set of tools for when it comes to publishing *static* documents, because they were built for static documents. The Web on the other hand is rarely static. It's an interactive playground better suited for a DOM structure such as HTML.
>
> Let me ask you to take a step back for a second. Are you convinced that there are far more possibilities with LaTeX/PDF for data representation, presentation and interaction than with HTML+CSS+JavaScript+RDFa+SVG+MathML...? Do we really need to battle that out? :) Don't worry, I will, as I'll demonstrate in my final PhD dissertation ;)
>
> If PDF were so good at static documents, we'd have the Web of PDFs instead of the Web of HTMLs. I disagree that the Web is rarely static.
>
> As far as print precision goes, I agree: CSS3 and browser support for printing have a lot of work to do. But what level of precision are the SW/LD conferences worried about providing to publishers? I recall that Springer, for instance, asks only for the PDF. What that practically means is that one can go from LaTeX, HTML+CSS, or, dare I say, JPEG to PDF. There is no precision police for rendering. Most people and organizations have printers with 300-600 DPI support.
>
>> Isn't there just a standard way to add RDF markup to a PDF file?
>
> Maybe. But, that's totally backwards, IMO.
>
> -Sarven
Re: Formats and icing (Was Re: [ESWC 2015] First Call for Paper)
Luca Matteis lmatt...@gmail.com writes:

> So until we start building interactive publications, I see no reason to move away from the wonders that Latex/PDF can accomplish.

Because PDF is rubbish on the web. Because almost all of the software tools for data visualisation are being written in JS these days. Because PDF is hard to extract from. Because embedding metadata is easy in HTML. Because the reason we do not make interactive publications is that the technology we are using is antiquated and does not let us, not that we do not want to. I've even had journals try to charge me extra for colour.

And, besides, we are making interactive publications. The bioinformatics community does this all the time, often with data and downloadable VMs so you can rerun the analysis.

>> Maybe. But, that's totally backwards, IMO.
>
> But why is it backwards? We have different formats serving different purposes. Diversity is healthy. Simply because PDF is not in the Web stack it doesn't make it Web-unfriendly.

Yes, actually, it does.

Phil
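To make "embedding metadata is easy in HTML" concrete: with RDFa, metadata rides on ordinary attributes (`vocab`, `property`, `content`) of the same markup the reader sees. The Python parser below is a deliberately minimal sketch built on the standard `html.parser` module; it is not a conforming RDFa processor (it ignores `typeof`, `resource`, `prefix` and nested subjects), and the schema.org property names in the sample page are just one possible choice.

```python
from html.parser import HTMLParser

# Elements that never get an end tag in HTML, so they must not be pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class TinyRDFaLite(HTMLParser):
    """Toy extractor for RDFa Lite `property` values (not a conforming RDFa processor).

    Simplifications: the most recent `vocab` applies to every later property, and
    a property's value is its `content` attribute or the text inside the element.
    """

    def __init__(self):
        super().__init__()
        self.vocab = ""
        self.pairs = []   # (property IRI, value) in the order values are completed
        self._stack = []  # one (property or None, text chunks) entry per open element

    def _iri(self, prop):
        # Anything containing ':' is taken as an absolute IRI; bare terms use the vocab.
        return prop if ":" in prop else self.vocab + prop

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("vocab"):
            self.vocab = attrs["vocab"]
        prop = attrs.get("property")
        if prop and attrs.get("content") is not None:
            # A content attribute supplies the value directly (e.g. on <meta>).
            self.pairs.append((self._iri(prop), attrs["content"]))
            prop = None
        if tag not in VOID_ELEMENTS:
            self._stack.append((prop, []))

    def handle_data(self, data):
        # Text counts towards every enclosing element that carries a property.
        for prop, chunks in self._stack:
            if prop:
                chunks.append(data)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or not self._stack:
            return
        prop, chunks = self._stack.pop()
        if prop:
            # Collapse runs of whitespace in the collected text.
            self.pairs.append((self._iri(prop), " ".join("".join(chunks).split())))


def extract(html_text):
    parser = TinyRDFaLite()
    parser.feed(html_text)
    parser.close()
    return parser.pairs


# Illustrative sample page (made-up values).
SAMPLE_PAGE = """<article vocab="http://schema.org/" typeof="ScholarlyArticle">
  <h1 property="name">Formats and <em>icing</em></h1>
  <p>By <span property="author">A. Author</span></p>
  <meta property="datePublished" content="2014-10-01">
</article>"""

for iri, value in extract(SAMPLE_PAGE):
    print(iri, "=", value)
```

The markup that carries the metadata is the same markup a browser renders, which is the contrast being drawn with extracting structure back out of a PDF.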
Formats and icing (Was Re: [ESWC 2015] First Call for Paper)
On 2014-10-01 19:10, Laura Dawson wrote:
> What about EPUB, which is xHTML and has support for Schema.org markup? It also provides for fixed-layout.

IMO, this particular discussion is not what we should be focusing on, and it almost always distracts from the main topic.

There are a number of ways to get to Web-friendly representations and presentations. EPUB? Sure. Whatever floats the author's boat. As long as we can precisely identify and discover the items in research papers, that's all fine. I personally don't see the need to set any hard limitations on (X)HTML or on which vocabularies to use. That said, schema.org is not granular enough at this time. There are more appropriate ones out there, e.g., http://lists.w3.org/Archives/Public/public-lod/2014Jul/0179.html , but that doesn't mean that we can't use them along with schema.org.

I favour plain HTML+CSS+RDFa to get things going, e.g.: https://github.com/csarven/linked-research (I will not dwell on the use of SVG, MathML, JavaScript etc. at this point, but you get the picture).

The primary focus right now is to have SW/LD venues compromise, i.e., not insist only on Adobe's PDF, but welcome Web-native technologies. Debating which Doctype or vocabulary or whatever is like the icing on the cake. Can we first bring the flour into our kitchen?

-Sarven
http://csarven.ca/#i
Re: Formats and icing (Was Re: [ESWC 2015] First Call for Paper)
Apologies, Sarven. I was just trying to point out some options and resources for those who were interested.

On 10/1/14, 2:42 PM, Sarven Capadisli i...@csarven.ca wrote:
[...]
Re: Formats and icing (Was Re: [ESWC 2015] First Call for Paper)
Not to pile on Sarven, said Gannon, piling on, but I recently tripped over the XSD validation tests I did a couple of years ago. They are XHTML 1.1 + RDFa, all zipped up so you won't have to stitch the schema together (it is 20+ little files). If anyone wants them, please contact me off-list. (If you are the person who stole my Nook, contact me. Then run.)

--Gannon

On Wed, 10/1/14, Sarven Capadisli i...@csarven.ca wrote:
[...]
Re: Formats and icing (Was Re: [ESWC 2015] First Call for Paper)
Dear Sarven, This stuff is really cool: http://linked-research.270a.info/ Couple of questions: How did you come up with such a close CSS/HTML template to the LNCS LaTeX version? Did you hand-code the CSS to make it look as close as possible, or was it automated by some tool I'm not aware of? What you're saying about moving towards RDFa for publishing papers should definitely be discussed more; however, CSS/HTML still fails at a lot of things that LaTeX excels at, for example typography and font kerning/spacing. All of that works really well in LaTeX/PDF, while in HTML you get different results in different browsers. Journals certainly can't accept inconsistencies. I've seen templates built in PDF using LaTeX that you can only dream of achieving with HTML/CSS. It's just a better set of tools when it comes to publishing *static* documents, because they were built for static documents. The Web, on the other hand, is rarely static. It's an interactive playground better suited to a DOM structure such as HTML. Isn't there just a standard way to add RDF markup to a PDF file? Best, Luca On Wed, Oct 1, 2014 at 8:46 PM, Laura Dawson laura.daw...@bowker.com wrote: Apologies, Sarven. I was just trying to point out some options and resources for those who were interested. On 10/1/14, 2:42 PM, Sarven Capadisli i...@csarven.ca wrote: On 2014-10-01 19:10, Laura Dawson wrote: What about EPUB, which is XHTML and has support for Schema.org markup? It also provides for fixed layout. IMO, this particular discussion is not what we should be focusing on, and it almost always distracts from the main topic. There are a number of ways to get to Web-friendly representations and presentations. EPUB? Sure. Whatever floats the author's boat. As long as we can precisely identify and discover the items in research papers, that's all fine. I personally don't see the need to set any hard limitations on (X)HTML or on which vocabularies to use.
Re: Formats and icing (Was Re: [ESWC 2015] First Call for Paper)
On 2014-10-01 21:05, Luca Matteis wrote: Dear Sarven, This stuff is really cool: http://linked-research.270a.info/ Couple of questions: How did you come up with such a close CSS/HTML template to the LNCS LaTeX version? Did you hand-code the CSS to make it look as close as possible, or was it automated by some tool I'm not aware of? As venues always give precise instructions on what template to follow, e.g.: http://static.springer.com/sgw/documents/1121537/application/pdf/SPLNPROC+Author+Instructions_Aug2014.pdf , that's exactly what I did: I read it line by line and wrote the CSS for it. There is no doubt that the CSS could be better. Different browsers, for instance, have varying CSS3 print support. If you thought http://linked-research.270a.info/ looked cool, why not change the <link> element's href from lncs.css to acm.css using your browser's developer tools? What's demanded by the conferences/publishers is an archaic presentation: fixed page length, fixed view. So be it. That is a small subset of what we can achieve using the Web stack. What you're saying about moving towards RDFa for publishing papers should definitely be discussed more; however, CSS/HTML still fails at a lot of things that LaTeX excels at, for example typography and font kerning/spacing. All of that works really well in LaTeX/PDF, while in HTML you get different results in different browsers. Journals certainly can't accept inconsistencies. I've seen templates built in PDF using LaTeX that you can only dream of achieving with HTML/CSS. It's just a better set of tools when it comes to publishing *static* documents, because they were built for static documents. The Web, on the other hand, is rarely static. It's an interactive playground better suited to a DOM structure such as HTML. Let me ask you to take a step back for a second. Are you convinced that there are far more possibilities with LaTeX/PDF for data representation, presentation and interaction than with HTML+CSS+JavaScript+RDFa+SVG+MathML..?
Do we really need to battle that out? :) Don't worry, I will, as I'll demonstrate in my final PhD dissertation ;) If PDF were so good at static documents, we'd have a Web of PDFs instead of a Web of HTML documents. I disagree that the Web is rarely static. As far as print precision goes, I agree: CSS3 and browser support for printing have a lot of work to do. But what level of precision are the SW/LD conferences worried about providing to publishers? I recall that Springer, for instance, asks only for the PDF. What that practically means is that one can go from LaTeX, HTML+CSS, or, dare I say, JPEG to PDF. There is no precision police for rendering. Most people and organizations have printers that support 300-600 DPI. Isn't there just a standard way to add RDF markup to a PDF file? Maybe. But that's totally backwards, IMO. -Sarven
Re: Formats and icing (Was Re: [ESWC 2015] First Call for Paper)
On 10/1/14 2:42 PM, Sarven Capadisli wrote: can't use them along with schema.org. I favour plain HTML+CSS+RDFa to get things going, e.g.: https://github.com/csarven/linked-research What about: HTML+CSS+(RDFa | Microdata | JSON-LD | Turtle)? Basically, we have to get to: HTML+CSS+(Any RDF Notation). This is possible because HTML now standardizes elements such as <link/> and <script/>. I wouldn't single out RDFa in this quest. History has shown that whenever we single anything out at the notation level, we inevitably open up a new format-centric war. These wars simply protract all the confusion that swirls around RDF :) -- Regards, Kingsley Idehen Founder & CEO OpenLink Software Company Web: http://www.openlinksw.com Personal Weblog 1: http://kidehen.blogspot.com Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen Twitter Profile: https://twitter.com/kidehen Google+ Profile: https://plus.google.com/+KingsleyIdehen/about LinkedIn Profile: http://www.linkedin.com/in/kidehen Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this
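As a rough illustration of the HTML+CSS+(Any RDF Notation) idea, the following Python sketch (standard library only) shows how a consumer might locate RDF data islands in a page: payloads carried in <script> elements typed application/ld+json or text/turtle, and separate RDF documents advertised through <link rel="alternate">. The set of MIME types, the names DataIslandExtractor and extract_data_islands, the sample page and the paper.ttl file name are all assumptions made for illustration. The sketch only finds the islands; parsing the RDF itself is left to a proper RDF library.

```python
import json
from html.parser import HTMLParser

# Assumed MIME types for embedded RDF "data islands"; not an exhaustive list.
RDF_SCRIPT_TYPES = {"application/ld+json", "text/turtle"}


class DataIslandExtractor(HTMLParser):
    """Collects RDF payloads from <script> elements and RDF documents
    advertised with <link rel="alternate" type="..." href="...">."""

    def __init__(self):
        super().__init__()
        self.islands = []      # (mime type, raw payload text)
        self.alternates = []   # (mime type, href)
        self._current_type = None
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        mime = (attrs.get("type") or "").lower()
        rels = (attrs.get("rel") or "").lower().split()
        if tag == "script" and mime in RDF_SCRIPT_TYPES:
            self._current_type = mime
            self._buffer = []
        elif tag == "link" and "alternate" in rels and attrs.get("href"):
            self.alternates.append((mime, attrs["href"]))

    def handle_data(self, data):
        # Script content arrives here verbatim; keep it only inside RDF scripts.
        if self._current_type is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._current_type is not None:
            self.islands.append((self._current_type, "".join(self._buffer).strip()))
            self._current_type = None


def extract_data_islands(html):
    """Return (islands, alternates) found in an HTML string."""
    parser = DataIslandExtractor()
    parser.feed(html)
    parser.close()
    return parser.islands, parser.alternates


# Hypothetical paper page carrying JSON-LD inline and pointing to a Turtle file.
SAMPLE_PAGE = """<html><head><title>Paper</title>
<link rel="alternate" type="text/turtle" href="paper.ttl" />
<script type="application/ld+json">
{"@context": "http://schema.org/", "@type": "ScholarlyArticle", "name": "Linked Research"}
</script>
</head><body><p>Body text for human readers.</p></body></html>"""

if __name__ == "__main__":
    islands, alternates = extract_data_islands(SAMPLE_PAGE)
    for mime, text in islands:
        if mime == "application/ld+json":
            doc = json.loads(text)
            print(doc["@type"], "-", doc["name"])
    print(alternates)
```

A Turtle-only consumer could follow the alternate link while a JSON-LD-aware one reads the inline island; the human-readable body is the same either way, which is the loose coupling of presentation and data being discussed.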
Re: Formats and icing (Was Re: [ESWC 2015] First Call for Paper)
On Wed, Oct 1, 2014 at 9:26 PM, Sarven Capadisli i...@csarven.ca wrote: Let me ask you to take a step back for a second. Are you convinced that there are far more possibilities with LaTeX/PDF for data representation, presentation and interaction than HTML+CSS+JavaScript+RDFa+SVG+MathML.. ? Do we really need to battle that out? :) Don't worry, I will. As I'll demonstrate in my final PhD dissertation ;) No battling :) but I'd be glad to discuss. For interaction, no doubt, HTML has a lot more to offer than PDF, which has barely any interaction capabilities at all compared to things like WebGL, CSS animations and so on in the Web world. But publications specifically are still static documents. One of the main goals of publications is to be able to combine them into a journal, right? Imagine doing that in HTML. You'd have inconsistencies all over the journal. So until we start building interactive publications, I see no reason to move away from the wonders that LaTeX/PDF can accomplish. Just think of graphing libraries and the entire vibrant TeX community that builds templates for specific visualizations - http://tex.stackexchange.com/questions/158668/nice-scientific-pictures-show-off - some of this stuff is available in HTML, but it's far from the maturity LaTeX/PDF has achieved. If PDF was so good at static documents, we'd have the Web of PDFs instead of Web of HTMLs. I disagree that the Web is rarely static. Sorry, but that's a bad analogy. HTML was designed to be lightweight because it needs to be transferred across the wire and rendered immediately (HTTP request). PDF is not lightweight at all. Maybe. But, that's totally backwards, IMO. But why is it backwards? We have different formats serving different purposes. Diversity is healthy. PDF not being part of the Web stack doesn't make it Web-unfriendly. Best, Luca
Re: Formats and icing (Was Re: [ESWC 2015] First Call for Paper)
On 10/1/14 3:26 PM, Sarven Capadisli wrote: [...] Isn't there just a standard way to add RDF markup to a PDF file? Maybe. But, that's totally backwards, IMO. +1 Packing Linked Open Data into a PDF (meta)data slot == packing the Linked Open Data into a document silo. Remember, the tools for embedding the data are also platform-specific, ditto the tools for extracting this embedded (meta)data. Net effect: a total contradiction with regard to Linked Open Data. In my eyes, PDF really stands for (P)aper (D)efinition (F)ormat: it regurgitates all the limitations of paper in digital form, especially with regard to data. -- Regards, Kingsley Idehen
Re: Formats and icing (Was Re: [ESWC 2015] First Call for Paper)
On 10/1/14 3:55 PM, Luca Matteis wrote: But why is it backwards? We have different formats serving different purposes. Diversity is healthy. Simply because PDF is not in the Web stack it doesn't make it Web-unfriendly. The issue arises when Linked Open Data is packed into a PDF silo. The big elephant in the room, with regard to Linked Open Data, is the fact that PDFs are inherently contradictory to it. -- Regards, Kingsley Idehen
Re: Formats and icing (Was Re: [ESWC 2015] First Call for Paper)
I guess PDF is sort of the compiled version, while we're talking about the source of the document (HTML). You can convert both LaTeX and HTML to PDF. So I guess what I'm saying is that LaTeX is a better tool than HTML for writing publications, simply because it was built for that. So forget PDF. Perhaps we can add markup to LaTeX documents and make them Linked Data friendly? That would be cool. A LaTeX RDF serialization :)
Re: Formats and icing (Was Re: [ESWC 2015] First Call for Paper)
On 2014-10-01 21:51, Kingsley Idehen wrote: On 10/1/14 2:42 PM, Sarven Capadisli wrote: can't use them along with schema.org. I favour plain HTML+CSS+RDFa to get things going e.g.: https://github.com/csarven/linked-research What about: HTML+CSS+(RDFa | Microdata | JSON-LD | TURTLE) ? Basically, we have to get to: HTML+CSS+(Any RDF Notation) . Sure, why not! The above is possible because we now have standardization of <link/>, <script/> etc. in HTML that makes this possible. I wouldn't single out RDFa in this quest. History has shown that whenever we single anything out, at the notation level, we inevitably open up a new format centric war. These wars simply protract all the confusion that swirls around RDF :) I agree, and that's all fine. I've only proposed the one particular solution that made the most sense to me. Going from one to another is not an issue either. People are going to do whatever is convenient or suitable for them in the end (just like LaTeX-PDF). The primary problem is not solving x in HTML+CSS+x, but that HTML+CSS is not even an option to begin with at major international Semantic Web conferences, which could use it to better preserve and foster smart identification and discovery of research components. Reproducibility suffers along the way. There is absolutely nothing worthwhile we can query from past SW/LD research. -Sarven
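To make "querying past research" concrete, here is a deliberately tiny Python sketch of pulling statements out of an RDFa-annotated paper and filtering them, assuming well-formed (XHTML-style) markup. It handles only three RDFa attributes (about for the subject, property for the predicate, content or element text for the object) and does not expand prefixes such as schema:, so it is not a conforming RDFa processor; a real pipeline would use an RDFa parser and a SPARQL store. The names TinyRDFaExtractor, extract_triples and query, the sample markup and the #paper/#results identifiers are hypothetical.

```python
from html.parser import HTMLParser

# HTML elements without end tags; they must never be pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class TinyRDFaExtractor(HTMLParser):
    """A very small RDFa subset: 'about' sets the subject, 'property'
    names the predicate, and the object is the 'content' attribute or the
    element's text. No prefix expansion, vocab, typeof, rel or datatypes."""

    def __init__(self, base=""):
        super().__init__()
        self.triples = []
        # One frame per open element: (subject, pending property, text buffer).
        self._stack = [(base, None, None)]

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        subject = attrs.get("about", self._stack[-1][0])
        prop = attrs.get("property")
        if prop and attrs.get("content") is not None:
            self.triples.append((subject, prop, attrs["content"]))
            prop = None  # object already known, so no text needs collecting
        if tag in VOID_ELEMENTS:
            return
        self._stack.append((subject, prop, [] if prop else None))

    def handle_data(self, data):
        # Text belongs to every open element that is still collecting a literal.
        for _subject, _prop, buffer in self._stack:
            if buffer is not None:
                buffer.append(data)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or len(self._stack) == 1:
            return
        subject, prop, buffer = self._stack.pop()
        if prop is not None:
            text = " ".join("".join(buffer).split())  # collapse whitespace
            self.triples.append((subject, prop, text))


def extract_triples(html, base=""):
    parser = TinyRDFaExtractor(base)
    parser.feed(html)
    parser.close()
    return parser.triples


def query(triples, prop):
    """Return (subject, value) pairs for one property; a stand-in for SPARQL."""
    return [(s, o) for s, p, o in triples if p == prop]


# Hypothetical fragment of an HTML+RDFa paper.
SAMPLE_PAPER = """<article about="#paper">
  <h1 property="schema:name">Linked Research</h1>
  <meta property="schema:datePublished" content="2014-10-01" />
  <section about="#results">
    <p property="schema:description">Findings <em>stated</em> as data.</p>
  </section>
</article>"""

if __name__ == "__main__":
    triples = extract_triples(SAMPLE_PAPER)
    for triple in triples:
        print(triple)
    print(query(triples, "schema:description"))
```

The point of the sketch is that each statement gets a subject such as #results that can be linked to and filtered, which is the kind of identification and discovery a PDF-only submission does not expose in its markup.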
Re: Formats and icing (Was Re: [ESWC 2015] First Call for Paper)
On 10/1/14 5:36 PM, Luca Matteis wrote: I guess PDF is sort of the compiled version while we're talking about the source of the document (HTML). No, structured data is locked inside (inaccessible from) the PDF. In HTML+(RDF-based Structured Data Island Notation) you have loose coupling of presentation and fine-grained structured data, increasingly in Linked Open Data form. You can convert both Latex and HTML to PDF. But you do that using platform-specific tools, in either direction. We don't want that kind of specificity circa 2014. So I guess perhaps what I'm saying is that Latex is a better tool than HTML for writing publications simply because it was built for that. It might be, but that's really beside the point. Our big issue here is unshackling data, i.e., data de-silo-fication. Sadly, we have conferences closely aligned with Linked Open Data that, right at the front door, completely contradict its fundamental essence and value proposition. So forget PDF. Perhaps we can add markup to Latex documents and make them linked data friendly? Only if the end product is as open and loosely coupled as HTML+(RDF-based Structured Data Islands). That would be cool. A Latex RDF serialization :) If it were accessible and usable over the HTTP network, without any platform-specific tools. Kingsley
-- Regards, Kingsley Idehen