Re: Advancing the DBpedia ontology
Sure, the data and the ontology have to line up. However, just because all the windmills in Wikipedia happen to be buildings doesn't mean that windmill should be a subcategory of building in DBpedia. Similarly, if the DBpedia class Church is a subcategory of buildings, then there is pressure to consider a church to be a building. Some of this is just (the pejorative sense of) semantics. What is wrong with defining Church to be a building that is also a place of Christian worship? That's why I suggested that DBpedia classes be tied to Wikipedia articles. (Wikipedia does identify churches with buildings, but at least using this informal definition of a church would let DBpedia contributors know what a DBpedia church should be.)

peter

On 02/24/2015 08:15 PM, Vladimir Alexiev wrote:
> From: Peter F. Patel-Schneider [mailto:pfpschnei...@gmail.com]
>> I agree that there are problems with the mappings. However, how can the
>> mappings be fixed without fixing the ontology?
>
> I could ask you a converse question: how can you make an accurate ontology
> without looking at the data? And to look at the data, you need mappings (if
> not to execute, then to document what you've examined).
>
> But more constructively: there is a large number of mapping problems
> independent of the ontology. E.g. when a Singer (Person) is mapped to Band
> (Organisation) due to a wrong check of the background field, I don't care
> how the classes are organized; it already hurts that the direct type is
> wrong.
>
> Of course, having a good ontology would help! E.g.
> https://github.com/dbpedia/mappings-tracker/issues/49: some guy named Admin
> made two props in 2010, occupation and personFunction, with nearly
> identical role history.
> - No documentation, of course.
> - occupation has 100-250 uses; personFunction has 20-50 uses.
> - Which of the two should be used?
> - More importantly, which have already been used correctly, and which are
>   wrong?
>
> I suspect that most uses of occupation are as a DataProp, even though it's
> declared as an ObjectProp. DBpedia adopts an Object/DataProp dichotomy that
> IMHO does not work well. See
> http://vladimiralexiev.github.io/pres/20150209-dbpedia/dbpedia-problems-long.html#sec-3-2
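The Object/DataProp dichotomy problem can be made concrete with a small sketch. The function names (`split_field`, `classify_field`) and the regex are hypothetical illustrations, not part of the DBpedia Extraction Framework: they show how a single infobox field value can contain both wiki-links (object values) and plain text (literal values), which a property declared as only owl:ObjectProperty or only owl:DatatypeProperty cannot faithfully capture.

```python
import re

# Matches [[Target]] and [[Target|label]] wiki-links in a raw infobox value.
LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")

def split_field(value):
    """Split a raw infobox field value into link targets and leftover text."""
    links = LINK_RE.findall(value)
    text = LINK_RE.sub("", value).strip(" ,;")
    return {"links": links, "text": text}

def classify_field(value):
    """Classify a field value as object-like, data-like, or mixed."""
    parts = split_field(value)
    if parts["links"] and parts["text"]:
        return "mixed"   # neither owl:ObjectProperty nor owl:DatatypeProperty fits alone
    return "object" if parts["links"] else "data"

print(classify_field("[[Queen Victoria]] of [[England]]"))  # mixed
print(classify_field("[[Queen Victoria]]"))                 # object
print(classify_field("singer, songwriter"))                 # data
```

A field like `| mother = [[Queen Victoria]] of [[England]]` comes out "mixed": mapping it with a single object property silently drops the text, and mapping it as a datatype property drops the links.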
Re: [Dbpedia-ontology] [Dbpedia-discussion] Advancing the DBpedia ontology
I agree that there are problems with the mappings. However, how can the mappings be fixed without fixing the ontology?

peter

On 02/18/2015 05:03 AM, Vladimir Alexiev wrote:
> Hi everyone! My presentations from the Dublin meeting are at
> - http://VladimirAlexiev.github.io/pres/20150209-dbpedia/add-mapping-long.html
>   An example of adding a mapping, while making a couple of props along the
>   way and reporting a couple of problems.
> - http://VladimirAlexiev.github.io/pres/20150209-dbpedia/dbpedia-problems-long.html
>   Provides a wider perspective: data problems are due not only to the
>   ontology but to many other areas:
>   3. Mapping Language Issues
>   4. Mapping Server Deficiencies
>   5. Mapping Wiki Deficiencies
>   6. Mapping Issues
>   7. Extraction Framework Issues
>   8. External Mapping Problems
>   9. Ontology Problems
>   Almost all of these are also reported in the two trackers described in
>   sec. 2.
>
> Heiko Paulheim he...@informatik.uni-mannheim.de wrote:
>> I am currently working with Aldo Gangemi on exploiting the mappings to
>> DOLCE (and the high-level disjointness axioms in DOLCE) for finding
>> modeling issues both in the instances and the ontology.
>
> Sounds very interesting! I've been quite active in the last couple of
> months, but I've been pecking at random here and there. More systematic
> approaches are definitely needed, as long as they are not limited to a
> theoretical experiment or a one-time effort that's quickly closed down.
> I've observed many error patterns, and if people smarter than me can devise
> ways to leverage and amplify these observations using algorithmic or ML
> approaches, that could create fast progress. I give some examples of the
> need for research on specific problems:
> http://VladimirAlexiev.github.io/pres/20150209-dbpedia/dbpedia-problems-long.html#sec-6-5
> and the next section.
>
> Harald Sack:
>> apply the DBpedia ontology to detect inconsistencies and flaws in DBpedia
>> facts. This should not only be possible in a retroactive way, but should
>> take place much earlier. Besides the detection of inconsistencies during
>> the mapping process or afterwards in the extracted data
>
> Sounds very promising! If I can help somehow with manual ontological wiki
> labor, let me know. Data vs ontology validation can provide
> - mapping defect lists
> - useful hints that the Extraction Framework can use.
> The most important feature would be "Use Domain & Range to Guide
> Extraction":
> http://vladimiralexiev.github.io/pres/20150209-dbpedia/dbpedia-problems-long.html#sec-7-4
>
>> this could already be possible right from the start when the user is
>> changing the wikipedia infobox content (in the sense of type checking for
>> domain/range, checking of class disjointness and further constraints,
>> plausibility checks for dates)
>
> I'm doubtful of the utility of error lists to Wikipedia (or it needs to be
> done with skill and tact):
> 1. The mapping wiki adopts an Object vs DataProp dichotomy (it uses
>    owl:ObjectProperty and owl:DatatypeProperty and never rdf:Property). But
>    MANY Wikipedia fields include both links and text, and in many cases
>    BOTH are useful:
>    http://vladimiralexiev.github.io/pres/20150209-dbpedia/dbpedia-problems-long.html#sec-3-2
> 2. At the end of the day, Wikipedia is highly crafted text, so telling
>    Wikipedia editors that they can't write something will not sit well with
>    them. For example, who should resolve this contradiction?
>      DBO:       dbo:parent rdfs:range dbo:Person
>      Wikipedia: | mother = [[Queen Victoria]] of [[England]]
>    I think the Extraction Framework should (by filtering out the link that
>    is not a Person), not Wikipedians.
>
>> a tool that makes inconsistencies/flaws in wikipedia data visible directly
>> in the wikipedia interface, where users could either correct them or
>> confirm facts that are originally in doubt.
>
> But Wikipedia is moving towards using Wikidata props in template fields,
> through {{#property}}.
>
> Cheers!

___
Dbpedia-ontology mailing list
dbpedia-ontol...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-ontology
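The "use domain/range to guide extraction" idea above can be sketched in a few lines. This is an illustrative toy, not the actual Extraction Framework: `TYPES` stands in for the DBpedia instance-types data, and `extract_with_range` is a hypothetical helper that keeps only the links compatible with a property's declared rdfs:range, so the contradiction is resolved by the extractor rather than by Wikipedia editors.

```python
# Toy type index standing in for the DBpedia instance-types dataset.
TYPES = {
    "Queen Victoria": "Person",
    "England": "Country",
}

def extract_with_range(field_links, expected_range):
    """Keep only the links whose known type matches the property's range.

    For dbo:parent (rdfs:range dbo:Person), the field
    '| mother = [[Queen Victoria]] of [[England]]' should yield only
    Queen Victoria; the England link is filtered out, with no Wikipedia
    edit required.
    """
    return [link for link in field_links if TYPES.get(link) == expected_range]

print(extract_with_range(["Queen Victoria", "England"], "Person"))
```

The same check, run at mapping time instead of extraction time, would also produce the "mapping defect lists" mentioned above.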
Re: [Dbpedia-discussion] Advancing the DBpedia ontology
Is there going to be the possibility of at least listening in remotely?

peter

On 01/24/2015 12:02 AM, Dimitris Kontokostas wrote:
> Hi Peter,
> ATM I can only answer for disjointness axioms. We plan to use them for
> cleaning up extracted data, so we definitely want them. For the rest, we
> are open to suggestions, and this is one of the reasons we invite ontology
> experts to participate.
> Best, Dimitris
>
> On Sat, Jan 24, 2015 at 5:47 AM, Peter F. Patel-Schneider
> pfpschnei...@gmail.com wrote:
>> Good points all, I think. As well, I would like to know what expressive
>> power is being considered for the ontology language. For example, will
>> disjointness axioms be allowed, or local ranges, or constraints?
>> peter
>>
>> On 01/23/2015 12:23 PM, Nicolas Torzec wrote:
>>> Hi Dimitris et al.,
>>> A) What is the specific use you have in mind?
>>> B) Are you thinking about a centralized ontology managed by editors, a
>>>    user-contributed ontology, or an automatically generated taxonomy?
>>> C) How will it relate to other ontologies, taxonomies, and schemas? Also,
>>>    will it relate to Wikidata, Wikipedia, schema.org, Facebook OG, etc.?
>>> D) How will you categorize Wiki pages (and possibly other documents)
>>>    against this ontology?
>>> Cheers.
>>> -N.
>>> --
>>> Nicolas Torzec
>>> Yahoo Labs.
>
> --
> Dimitris Kontokostas
> Department of Computer Science, University of Leipzig
> Research Group: http://aksw.org
> Homepage: http://aksw.org/DimitrisKontokostas

___
Dbpedia-discussion mailing list
dbpedia-discuss...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
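Using disjointness axioms to clean extracted data, as Dimitris describes, can be sketched as a simple consistency check. The `DISJOINT` table below is invented for illustration (the real axioms would come from the ontology, or from the DOLCE mappings discussed elsewhere in this thread), and `disjointness_violations` is a hypothetical helper, not DBpedia code.

```python
# Illustrative disjointness axioms: each pair of classes may not share
# an instance. In practice these would be read from the ontology.
DISJOINT = {
    frozenset({"Person", "Organisation"}),
    frozenset({"Person", "Place"}),
}

def disjointness_violations(instance_types):
    """Return (instance, sorted class pair) for every disjointness conflict.

    instance_types maps an instance name to the list of classes it was
    extracted with; a violation flags a candidate for cleanup, e.g. a
    Singer wrongly typed as a Band (Organisation).
    """
    violations = []
    for instance, types in instance_types.items():
        for pair in DISJOINT:
            if pair <= set(types):
                violations.append((instance, tuple(sorted(pair))))
    return violations

data = {"Some Singer": ["Person", "Organisation"]}  # wrong direct type
print(disjointness_violations(data))
```

Each reported pair points at an extraction or mapping defect: either one of the type assertions is wrong, or the disjointness axiom is too strong, and both outcomes are useful feedback.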
Re: [Dbpedia-discussion] Advancing the DBpedia ontology
Good points all, I think. As well, I would like to know what expressive power is being considered for the ontology language. For example, will disjointness axioms be allowed, or local ranges, or constraints?

peter

On 01/23/2015 12:23 PM, Nicolas Torzec wrote:
> Hi Dimitris et al.,
> A) What is the specific use you have in mind?
> B) Are you thinking about a centralized ontology managed by editors, a
>    user-contributed ontology, or an automatically generated taxonomy?
> C) How will it relate to other ontologies, taxonomies, and schemas? Also,
>    will it relate to Wikidata, Wikipedia, schema.org, Facebook OG, etc.?
> D) How will you categorize Wiki pages (and possibly other documents)
>    against this ontology?
> Cheers.
> -N.
> --
> Nicolas Torzec
> Yahoo Labs.
Re: scientific publishing process (was Re: Cost and access)
Done. The goal of a new paper-preparation and display system should, however, be to be better than what is currently available. Most HTML-based solutions do not exploit the benefits of HTML, strangely enough. Consider, for example, citation links. They generally jump you to the references section. They should instead pop up the reference, as is done in Wikipedia. Similarly for links to figures. Instead of blindly jumping to the figure, they should do something better, perhaps popping up the figure or, if the figure is already visible, just highlighting it. I have put in both of these as issues.

peter

On 10/08/2014 03:18 AM, Sarven Capadisli wrote:
> On 2014-10-07 15:44, Peter F. Patel-Schneider wrote:
>> Well, I remain totally unconvinced that any current HTML solution is as
>> good as the current PDF setup. Certainly htlatex is not suitable. There may
>> be some way to get tex4ht to do better, but no one has provided a solution.
>> Sarven Capadisli sent me some HTML that looks much better, but even on a
>> math-light paper I could see a number of glitches. I haven't seen anything
>> better than that.
>
> Would you mind creating an issue for the glitches that you are
> experiencing? https://github.com/csarven/linked-research/issues
> Please mention your environment and the documents you've looked at. Also
> keep in mind the LNCS and ACM SIG authoring guidelines. The purpose of the
> LNCS and ACM CSS is to adhere to the authoring guidelines so that the
> generated PDF file or print output looks as expected (within reason).
> Much appreciated!
> -Sarven
> http://csarven.ca/#i
Re: scientific publishing process (was Re: Cost and access)
On 10/08/2014 05:31 AM, Phillip Lord wrote:
> Peter F. Patel-Schneider pfpschnei...@gmail.com writes:
>> PLOS is an interesting case. The HTML for PLOS articles is relatively
>> readable. However, the HTML that the PLOS setup produces is failing at
>> math, even for articles from August 2014. As well, sometimes when I zoom in
>> or out (so that I can see the math better) Firefox stops displaying the
>> paper, and I have to reload the whole page.
> Interesting bug, that. Worth reporting to PLoS.

PLoS doesn't appear to have a bug-reporting system in place. Even their general assistance email is obfuscated. I sent them a message anyway.

>> Strangely, PLOS accepts low-resolution figures, which in one paper I looked
>> at are quite difficult to read.
> Yep. Although it often provides several links to download higher-resolution
> images, including in the original file format. Quite handy.

In this case, even the original was low resolution.

>> However, maybe the PLOS method can be improved to the point where the HTML
>> is competitive with PDF.
> Indeed. For the moment, HTML views are about 1/5 of PDF. Partly this is
> because scientists are used to viewing in print format, I suspect, but
> partly not. I'm hoping that, eventually, PLoS will stop using image-based
> maths. I'd like to be able to zoom maths independently, and copy and paste
> it as either MathML or TeX. MathJax does this now already.

I would suggest that this should have been one of their highest priorities.

> Phil

peter
Re: scientific publishing process (was Re: Cost and access)
If you mean that published papers have to be in PDF, but that they can optionally have a second format, then I have no problem with this proposal. I also have no problem with encouraging the use of other formats. However, this is an added burden on conference organizers. Someone would have to volunteer to handle the extra work, particularly the work involved in checking that papers using the second format abide by the publishing requirements.

peter

On 10/07/2014 05:52 AM, Robert Stevens wrote:
> What I'd suggest for conference organisers is something like the following:
> 1. Keep the PDF as the main thing, as it's not going anywhere soon.
> 2. Also allow submission in some alternative form, including semantic
>    content, and have the conference run a competition for alternative
>    publishing forms, including voting by delegates on what they like and
>    what they want. This could promote such alternative forms and offer a
>    migration route over time.
> Robert.
>
> On 07/10/2014 13:27, Phillip Lord wrote:
>> Peter F. Patel-Schneider pfpschnei...@gmail.com writes:
>>> So, you believe that there is an excellent set of tools for preparing,
>>> reviewing, and reading scientific publishing. Package them up and make
>>> them widely available. If they are good, people will use them. Convince
>>> those who run conferences. If these people are convinced, then they will
>>> allow their use in conferences or maybe even require their use.
>> Is that not the point of the discussion? Unfortunately, we do not know why
>> ISWC and ESWC insist on PDF.
>>> I'm not convinced by what I'm seeing right now, however.
>> Sure, but at least the discussion has meant that you have looked at some
>> of the tools again. That's no bad thing. My question would be: are you
>> more convinced than you were the last time you looked, or less?
>> Phil
Re: scientific publishing process (was Re: Cost and access)
On 10/07/2014 05:27 AM, Phillip Lord wrote:
> Peter F. Patel-Schneider pfpschnei...@gmail.com writes:
>> So, you believe that there is an excellent set of tools for preparing,
>> reviewing, and reading scientific publishing. Package them up and make them
>> widely available. If they are good, people will use them. Convince those
>> who run conferences. If these people are convinced, then they will allow
>> their use in conferences or maybe even require their use.
> Is that not the point of the discussion?

Not at all. Where was the proposal to put together something that met the requirements of preparing, reviewing, and publishing scientific papers? To me, the initial discussion was about how much better HTML was for carrying data. Other aspects of paper preparation, review, and publishing were not being considered. Now, maybe, aspects of presentation and review and ease of use are part of the discussion. A change in the paper submission process needs to take into account what the paper submission process is about, not just some aspect of what might be included in submitted papers.

> Unfortunately, we do not know why ISWC and ESWC insist on PDF.

As far as I am concerned, ISWC and ESWC insist on PDF for submissions because the reviewing process is so much better with PDF than with anything else.

>> I'm not convinced by what I'm seeing right now, however.
> Sure, but at least the discussion has meant that you have looked at some of
> the tools again. That's no bad thing. My question would be: are you more
> convinced than you were the last time you looked, or less?

Well, I remain totally unconvinced that any current HTML solution is as good as the current PDF setup. Certainly htlatex is not suitable. There may be some way to get tex4ht to do better, but no one has provided a solution. Sarven Capadisli sent me some HTML that looks much better, but even on a math-light paper I could see a number of glitches. I haven't seen anything better than that. It's not as if the basics (MathML, CSS, etc.) are unavailable to put together most, or maybe even all, of an HTML-based solution. These basics have been around for some time now. However, I haven't seen a setup that is as good as LaTeX and PDF for the preparation, review, and publishing of scientific papers.

Yes, it took a lot of effort to get to the current state with respect to LaTeX and PDF. In the past, I experienced quite a number of problems with using LaTeX and PDF for writing, reviewing, and publishing scientific papers, but most of these are in the past. Yes, there are still some problems with using LaTeX and PDF. Produce something better and people will use it, eventually.

> Phil

peter
Re: scientific publishing process (was Re: Cost and access)
On 10/07/2014 05:23 AM, Phillip Lord wrote:
> Peter F. Patel-Schneider pfpschnei...@gmail.com writes:
>> On 10/06/2014 11:00 AM, Phillip Lord wrote:
>>> Peter F. Patel-Schneider pfpschnei...@gmail.com writes:
>>>> On 10/06/2014 09:32 AM, Phillip Lord wrote:
>>>>> Peter F. Patel-Schneider pfpschnei...@gmail.com writes:
>>>>>>> Who cares what the authors intend? I mean, they are not reading the
>>>>>>> paper, are they?
>>>>>> For reviewing, what the authors intend is extremely important. Having
>>>>>> different renderings of the paper interfere with the authors' message
>>>>>> is something that should be avoided at all costs.
>>>>> Really? So, for example, you think that a reviewer with impaired vision
>>>>> should, for example, be forced to review a paper using the authors'
>>>>> rendering, regardless of whether they can read it or not?
>>>> No, but this is not what I was talking about. I was talking about
>>>> interfering with the authors' message via changes from the rendering
>>>> that the authors set up.
>>> It *is* exactly what you are talking about.
>> Well, maybe I was not being clear, but I thought that I was talking about
>> rendering changes interfering with comprehension of the authors' intent.
> And if only you had a definition of "rendering changes that interfere with
> the authors' intent", as opposed to just "rendering changes". I can
> guarantee that rendering a paper to speech WILL change at least some of the
> authors' intent because, for example, figures will not reproduce. You state
> that this should be avoided at all costs. I think this is wrong. There are
> many reasons to change rendering. That should be the reader's choice.
> Phil

I think that for reviewing the authors should be able to dictate how their submission looks, within the bounds of the submission requirements. If the reviewer wants, or needs, to change the way a submission is presented, then it is up to the reviewer to ensure that their review is not coloured by this change. When I review papers I routinely point out presentation problems. Sometimes I take presentation problems into account when I evaluate papers. However, I try very hard to evaluate the submission based on what the authors submitted, not on any changes that I made to the submission. For example, I will point out problems with using colours in graphs, but I will evaluate the paper based on the coloured version of the graphs, not a black-and-white version. However, if the authors submitted low-resolution figures and something is missing because of this, then I feel free to take this into account in my evaluation. In a situation where I do not know what presentation the authors wanted, for example if explicit line breaks and indentation are sometimes preserved but not always, the evaluation of submissions can become very much harder.

peter
Re: scientific publishing process (was Re: Cost and access)
On 10/07/2014 05:20 AM, Phillip Lord wrote:
> Peter F. Patel-Schneider pfpschnei...@gmail.com writes:
>>> tex4ht takes the slightly strange approach of having a cryptic and
>>> incomprehensible command line, and then lots of scripts which supply
>>> default options, of which xhmlatex is one. In my installation, they've
>>> only put the basic ones into the path, so I ran this with
>>> /usr/share/tex4ht/xhmlatex.
>>> Phil
>> So someone has to package this up so that it can be easily used. Before
>> then, how can it be required for conferences?
> http://svn.gnu.org.ua/sources/tex4ht/trunk/bin/ht/unix/xhmlatex

Somehow this is not in my tex4ht package. In any case, the HTML output it produces is dreadful. Text characters, even outside math, are replaced by numeric XML character entity references.

peter
Re: scientific publishing process (was Re: Cost and access)
Sure, I have lots of papers (none for ESWC, though) that could serve as test cases.

peter

On 10/07/2014 07:49 AM, Phillip Lord wrote:
> Peter F. Patel-Schneider pfpschnei...@gmail.com writes:
>>>> tex4ht takes the slightly strange approach of having a cryptic and
>>>> incomprehensible command line, and then lots of scripts which supply
>>>> default options, of which xhmlatex is one. In my installation, they've
>>>> only put the basic ones into the path, so I ran this with
>>>> /usr/share/tex4ht/xhmlatex.
>>>> Phil
>>> So someone has to package this up so that it can be easily used. Before
>>> then, how can it be required for conferences?
>> http://svn.gnu.org.ua/sources/tex4ht/trunk/bin/ht/unix/xhmlatex
> Somehow this is not in my tex4ht package. In any case, the HTML output it
> produces is dreadful. Text characters, even outside math, are replaced by
> numeric XML character entity references.
>
> So, I am willing to spend some time getting this to work. I would like to
> plug some ESWC papers into tex4ht, to get some HTML which works plain and
> also with Sarven's templates so that it *looks* like a PDF. Would you be
> willing to a) try it and b) give worked, short test cases for things that
> do not work?
> Phil
Re: scientific publishing process (was Re: Cost and access)
PLOS is an interesting case. The HTML for PLOS articles is relatively readable. However, the HTML that the PLOS setup produces is failing at math, even for articles from August 2014. As well, sometimes when I zoom in or out (so that I can see the math better) Firefox stops displaying the paper, and I have to reload the whole page. Strangely, PLOS accepts low-resolution figures, which in one paper I looked at are quite difficult to read. However, maybe the PLOS method can be improved to the point where the HTML is competitive with PDF.

peter

> This makes me think of PLoS. For example, PLoS has published format
> guidelines using Word and LaTeX (http://www.plosone.org/static/guidelines),
> a workflow for semantically structuring the resulting output, and final
> output that is well structured and available in XML based on a known
> standard (http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd), in
> PDF, and as published HTML on their website
> (http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0011233).
> This results in semantically meaningful XML that is transformed to HTML:
> http://www.plosone.org/article/fetchObjectAttachment.action?uri=info%3Adoi%2F10.1371%2Fjournal.pone.0011233representation=XML
> Interestingly as well, they have provided this framework in open source
> form: http://www.ambraproject.org/
>
> Clearly the publication process can support a semantic solution when it is
> in the best interest of the publisher. They will adopt and drive their own
> markup processes to meet external demand. Providing tools that both the
> publisher and the author may use independently could simplify such an
> effort, but that is not a main driver in achieving the final result you see
> in PLoS. This is especially the case given even the debate concerning file
> formats here. For PLoS, the solution that is currently successful is the
> one that worked to solve today's immediate local need with today's tools.
>
> Cheers,
> Mark
>
> p.s. Finally, on the reference to moving repositories such as EPrints and
> DSpace towards supporting semantic markup of their contents: being somewhat
> of a participant in LoD on the DSpace side, I note that these efforts are
> inherently just repository-centric, describing the structure of the
> repository (i.e. collections of items), not the semantic structure
> contained within the item contents (articles, citations, formulas, data
> tables, figures, ideas). In both platforms, these capabilities are in their
> infancy; lacking any rendering other than offering the original file for
> download, they ultimately suffer from the absence of semantic structure in
> the content going into them.
> --
> Mark R. Diggory
Re: scientific publishing process (was Re: Cost and access)
On 10/06/2014 04:15 AM, Phillip Lord wrote:
> Peter F. Patel-Schneider pfpschnei...@gmail.com writes:
>> One problem with allowing HTML submission is ensuring that reviewers can
>> correctly view the submission as the authors intended it to be viewed. How
>> would you feel if your paper was rejected because one of the reviewers
>> could not view portions of it? At least with PDF there is a reasonably good
>> chance that every paper can be correctly viewed by all its reviewers, even
>> if they have to print it out. I don't think that the same claim can be made
>> for HTML-based systems.
> I don't think this is a valid point. It is certainly possible to write HTML
> that will not look good on every machine, but these days it is easier to
> write HTML that does. The same is true with PDF. Font problems used to be
> routine. And, as other people have said, it's very hard to write a PDF that
> looks good on anything other than paper.

My aesthetics are different. I routinely view PDFs on my laptop, and find that they indeed look great. As I said before, I prefer PDF to HTML for viewing just about any technical material on my computers. Yes, on limited displays two-column PDF may not be viewable at all. Single-column PDF should look good on displays with resolution of HD or better. When I view HTML documents, even the ones I have written, I have to do a lot of adjusting to get something that looks even half-decent on the screen. And when I print HTML documents, the result is invariably bad, and often very bad.

However, my point was not about looking good. It was about being able to see the paper in the way that the author intended. My experience is that this is generally possible with PDF, but generally not possible with HTML. I do write papers with considerable math in them, so my experience may not be typical, but whenever I have tried to produce HTML versions of my papers, I have ended up quite frustrated because even I cannot get them to display the way I want them to. It may be that there are now good tools for producing HTML that carries the intent of the author. htlatex has been mentioned in this thread. A solution that uses htlatex would have the benefit of building on much of the work that has been done to make LaTeX a reasonable technology for producing papers. If someone wants to create the necessary infrastructure to make htlatex work as well as pdflatex does, then feel free.

>> Further, why should there be any technical preference for HTML at all?
>> (Yes, HTML is an open standard and PDF is a closed one, but is there
>> anything else besides that?) Web conferences vitally use the web in their
>> reviewing and publishing processes. Doesn't that show their allegiance to
>> the web? Would the use of HTML make a conference more webby?
> PDF is, I think, open these days. But, yes, I do think that conferences
> should dogfood. I mean, what would you think if the W3C produced all of
> their documents in PDF? Would that make sense?

Actually, I would have been very happy if the W3C had produced all its technical documents in PDF. It would have made my life much easier.

> Phil

peter
Re: scientific publishing process (was Re: Cost and access)
On 10/06/2014 04:27 AM, Phillip Lord wrote: [On using htlatex for conferences.]
> So, as well as providing an LNCS stylesheet, we'd need an htlatex cf.cfg and
> one CSS, and it's done. It would be good to have another CSS for on-screen
> viewing; LNCS's back-of-a-postage-stamp format is very poor for that.
> Phil

I would be totally astonished if using htlatex as the main way to produce conference papers were as simple as this. I just tried htlatex on my ISWC paper, and the result was, to put it mildly, horrible. (One of my AAAI papers was about the same; the other caused an undefined control sequence and produced only one page of output.) Several parts of the paper were rendered in fixed-width fonts. There was no attempt to limit line length. Footnotes were in separate files. Many non-scalable images were included, even for simple math. My carefully designed layout for examples was modified in ways that made the examples harder to understand. The footnotes did not show up at all in the printed version.

That said, the result was better than I expected. If someone upgrades htlatex to work well, I'm quite willing to use it, but I expect that a lot of work is going to be needed.

peter
Re: scientific publishing process (was Re: Cost and access)
On 10/06/2014 08:38 AM, Phillip Lord wrote: Peter F. Patel-Schneider pfpschnei...@gmail.com writes: I would be totally astonished if using htlatex as the main way to produce conference papers were as simple as this. I just tried htlatex on my ISWC paper, and the result was, to put it mildly, horrible. (One of my AAAI papers was about the same, the other one caused an undefined control sequence and only produced one page of output.) Several parts of the paper were rendered in fixed-width fonts. There was no attempt to limit line length. Footnotes were in separate files. The footnote thing is pretty strange, I have to agree. Although footnotes are a fairly alien concept with respect to the web. Probably hover-overs would be a reasonable presentation for this. Many non-scalable images were included, even for simple math. It does MathML I think, which is then rendered client side. Or you could drop math-mode straight through and render client side with mathjax. Well, somehow png files are being produced for some math, which is a failure. I don't know what the way to do this right would be, I just know that the version of htlatex for Fedora 20 fails to reasonably handle the math in this paper. My carefully designed layout for examples was modified in ways that made the examples harder to understand. Perhaps this is a key difference between us. I don't care about the layout, and want someone to do it for me; it's one of the reasons I use latex as well. There are many cases where line breaks and indentation are important for understanding. Getting this sort of presentation right in latex is a pain for starters, but when it has been done, having the htlatex toolchain mess it up is a failure. That said, the result was better than I expected. If someone upgrades htlatex to work well I'm quite willing to use it, but I expect that a lot of work is going to be needed. Which gets us back to the chicken and egg situation.
I would probably do this; but, at the moment, ESWC and ISWC won't let me submit it. So, I'll end up with the PDF output anyway. Well, I'm with ESWC and ISWC here. The review process should be designed to make reviewing easy for reviewers. Until viewing HTML output is as trouble-free as viewing PDF output, then PDF should be the required format. This is why it is important that web conferences allow HTML, which is where the argument started. If you want something that prints just right, PDF is the thing for you. If you want to read your papers in the bath, likewise, PDF is the thing for you. And that's fine by me (so long as you don't mind me reading your papers in the bath!). But it needs to not be the only option. Why? What are the benefits of HTML reviewing, right now? What are the benefits of HTML publishing, right now? If there were HTML-based tools that worked well for preparing, reviewing, and reading scientific papers, then maybe conferences would use them. However, conference organizers and reviewers have limited time, and are thus going for the simplest solution that works well. If some group thinks that a good HTML-based solution is possible, then let them produce this solution. If the group can get pre-approval of some conference, then more power to them. However, I'm not going to vote for any pre-approval of some future solution when the current situation is satisficing. Phil peter
Re: scientific publishing process (was Re: Cost and access)
On 10/06/2014 08:29 AM, Phillip Lord wrote: Peter F. Patel-Schneider pfpschnei...@gmail.com writes: However, my point was not about looking good. It was about being able to see the paper in the way that the author intended. Yes, I understand this. It's not something that I consider at all important, which perhaps represents our different viewpoints. Readers have different preferences. I prefer reading in inverse video; I like to be able to change font size to zoom in and out. I quite like fixed width fonts. Other people like the two column thing. Other people want things read to them. Who cares what the authors intend? I mean, they are not reading the paper, are they? For reviewing, what the authors intend is extremely important. Having different rendering of the paper interfere with the authors' message is something that should be avoided at all costs. Similarly for reading papers, if the rendering of the paper interferes with the authors' message, that is a failure of the process. I do write papers with considerable math in them, so my experience may not be typical, but whenever I have tried to produce HTML versions of my papers, I have ended up quite frustrated because even I cannot get them to display the way I want them to. I've been using mathjax on my website for a long time and it seems to work well, although I am not maths heavy. It may be that there are now good tools for producing HTML that carries the intent of the author. htlatex has been mentioned in this thread. A solution that uses htlatex would have the benefit of building on much of the work that has been done to make latex a reasonable technology for producing papers. If someone wants to create the necessary infrastructure to make htlatex work as well as pdflatex does, then feel free. It's more to make htlatex work as well as lncs.sty works. htlatex produces reasonable, if dull, HTML off the bat. My experience is that htlatex produces very bad output. Phil peter
Re: scientific publishing process (was Re: Cost and access)
It's not hard to query PDFs with SPARQL. All you have to do is extract the metadata from the document and turn it into RDF, if needed. Lots of programs extract and display this metadata already. No, I don't think that viewing this issue from the reviewer perspective is too narrow. Reviewers form a vital part of the scientific publishing process. Anything that makes their jobs harder or their results worse will have to offer very large benefits over the current setup. In any case, I haven't been looking at the reviewer perspective only, even in the message quoted below. peter PS: This is *not* to say that I think that the reviewing process is anywhere near ideal. On the contrary, I think that the reviewing process has many problems, particularly as it is performed in CS conferences. On 10/06/2014 09:19 AM, Martynas Jusevičius wrote: Dear Peter, please show me how to query PDFs with SPARQL. Then I'll believe there are no benefits of XHTML+RDFa over PDF. Addressing the issue from the reviewer perspective only is too narrow, don't you think? Martynas On Mon, Oct 6, 2014 at 6:08 PM, Peter F. Patel-Schneider pfpschnei...@gmail.com wrote: On 10/06/2014 08:38 AM, Phillip Lord wrote: Peter F. Patel-Schneider pfpschnei...@gmail.com writes: I would be totally astonished if using htlatex as the main way to produce conference papers were as simple as this. I just tried htlatex on my ISWC paper, and the result was, to put it mildly, horrible. (One of my AAAI papers was about the same, the other one caused an undefined control sequence and only produced one page of output.) Several parts of the paper were rendered in fixed-width fonts. There was no attempt to limit line length. Footnotes were in separate files. The footnote thing is pretty strange, I have to agree. Although footnotes are a fairly alien concept with respect to the web. Probably hover-overs would be a reasonable presentation for this. Many non-scalable images were included, even for simple math.
It does MathML I think, which is then rendered client side. Or you could drop math-mode straight through and render client side with mathjax. Well, somehow png files are being produced for some math, which is a failure. I don't know what the way to do this right would be, I just know that the version of htlatex for Fedora 20 fails to reasonably handle the math in this paper. My carefully designed layout for examples was modified in ways that made the examples harder to understand. Perhaps this is a key difference between us. I don't care about the layout, and want someone to do it for me; it's one of the reasons I use latex as well. There are many cases where line breaks and indentation are important for understanding. Getting this sort of presentation right in latex is a pain for starters, but when it has been done, having the htlatex toolchain mess it up is a failure. That said, the result was better than I expected. If someone upgrades htlatex to work well I'm quite willing to use it, but I expect that a lot of work is going to be needed. Which gets us back to the chicken and egg situation. I would probably do this; but, at the moment, ESWC and ISWC won't let me submit it. So, I'll end up with the PDF output anyway. Well, I'm with ESWC and ISWC here. The review process should be designed to make reviewing easy for reviewers. Until viewing HTML output is as trouble-free as viewing PDF output, then PDF should be the required format. This is why it is important that web conferences allow HTML, which is where the argument started. If you want something that prints just right, PDF is the thing for you. If you want to read your papers in the bath, likewise, PDF is the thing for you. And that's fine by me (so long as you don't mind me reading your papers in the bath!). But it needs to not be the only option. Why? What are the benefits of HTML reviewing, right now? What are the benefits of HTML publishing, right now?
If there were HTML-based tools that worked well for preparing, reviewing, and reading scientific papers, then maybe conferences would use them. However, conference organizers and reviewers have limited time, and are thus going for the simplest solution that works well. If some group thinks that a good HTML-based solution is possible, then let them produce this solution. If the group can get pre-approval of some conference, then more power to them. However, I'm not going to vote for any pre-approval of some future solution when the current situation is satisficing. Phil peter
Re: scientific publishing process (was Re: Cost and access)
On 10/06/2014 09:28 AM, Phillip Lord wrote: Peter F. Patel-Schneider pfpschnei...@gmail.com writes: It does MathML I think, which is then rendered client side. Or you could drop math-mode straight through and render client side with mathjax. Well, somehow png files are being produced for some math, which is a failure. Yeah, you have to tell it to do mathml. The problem is that older versions of the browsers don't render mathml, and image rendering was the only option. Well, then someone is going to have to tell people how to do this. What I saw for htlatex was that it just did the right thing. I don't know what the way to do this right would be, I just know that the [...] There are many cases where line breaks and indentation are important for understanding. Getting this sort of presentation right in latex is a pain for starters, but when it has been done, having the htlatex toolchain mess it up is a failure. Indeed. I believe that there are plans in future versions of HTML to introduce a pre tag which preserves indentation and line breaks. Which gets us back to the chicken and egg situation. I would probably do this; but, at the moment, ESWC and ISWC won't let me submit it. So, I'll end up with the PDF output anyway. Well, I'm with ESWC and ISWC here. The review process should be designed to make reviewing easy for reviewers. I *only* use PDF when reviewing. I never use it for viewing anything else. I only use it for reviewing since I am forced to. Experiences differ, so I find this a far from compelling argument. It may not be a compelling argument when choosing between two new alternatives, but it is a much more compelling argument against change. This is why it is important that web conferences allow HTML, which is where the argument started. Why? What are the benefits of HTML reviewing, right now? What are the benefits of HTML publishing, right now? Well, we've been through this before, so I'll not repeat myself.
Phil Yes, and I haven't seen any benefits over the current setup. peter
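[Editor's note: the client-side route Phillip mentions (leaving the TeX markup in the page and letting MathJax typeset it in the browser) can be sketched with a minimal page like the one below. This is an illustrative sketch only; the CDN URL and configuration name reflect how MathJax was commonly distributed around the time of this thread and may since have changed.]

```html
<!DOCTYPE html>
<html>
<head>
  <!-- Illustrative sketch: load MathJax and let it typeset the TeX
       delimiters \( ... \) client side, instead of baking the math
       into PNG images at conversion time. -->
  <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
</head>
<body>
  <p>Mass-energy equivalence: \(e = mc^2\)</p>
</body>
</html>
```

The point of this approach is that the source math survives in the page, so the reader's browser (or a mining tool) still has access to it, rather than only to a rasterized image.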
Re: scientific publishing process (was Re: Cost and access)
On 10/06/2014 09:32 AM, Phillip Lord wrote: Peter F. Patel-Schneider pfpschnei...@gmail.com writes: Who cares what the authors intend? I mean, they are not reading the paper, are they? For reviewing, what the authors intend is extremely important. Having different rendering of the paper interfere with the authors' message is something that should be avoided at all costs. Really? So, for example, you think that a reviewer with impaired vision should, for example, be forced to review a paper using the authors' rendering, regardless of whether they can read it or not? No, but this is not what I was talking about. I was talking about interfering with the authors' message via changes from the rendering that the authors set up. Of course, this is an extreme example, although not an unrealistic one. Is it fundamentally any different from my desire as I get older to be able to change font size and refill paragraphs with ease? I see a difference of scale, that is all. I see these as completely different. There are some aspects of rendering that generally do not interfere with intent. There are other aspects of rendering that can easily interfere with intent. Similarly for reading papers, if the rendering of the paper interferes with the authors' message, that is a failure of the process. Yes, I agree. Which is why, I believe, the rendering of a paper should be up to the reader. And this is why I believe that authors should be able to specify the rendering of their paper to the extent that they feel is needed to convey the intent of the paper. Phil peter
Re: scientific publishing process (was Re: Cost and access)
I don't think that scanning a printout retains any metadata that was in the electronic source so, no, this would not follow using the same logic. I do agree that dissemination of results is one of the most important parts of the scientific process. The argument here is, I think, what is the best way to support dissemination. Eating your own dog food is a separate matter, I think. Eating your own dog food may help with uptake, but on the other hand it may interfere with dissemination, by making preparation of papers harder or making them harder to review or read. peter On 10/06/2014 10:09 AM, Martynas Jusevičius wrote: Following the same logic, we still could have been using paper submissions? All you have to do is to scan them to turn them into PDFs. It's been a while since I was in the university, but wasn't dissemination an important part of science? What about dogfooding after all? Martynas On Mon, Oct 6, 2014 at 6:48 PM, Peter F. Patel-Schneider pfpschnei...@gmail.com wrote: It's not hard to query PDFs with SPARQL. All you have to do is extract the metadata from the document and turn it into RDF, if needed. Lots of programs extract and display this metadata already. No, I don't think that viewing this issue from the reviewer perspective is too narrow. Reviewers form a vital part of the scientific publishing process. Anything that makes their jobs harder or their results worse will have to offer very large benefits over the current setup. In any case, I haven't been looking at the reviewer perspective only, even in the message quoted below. peter PS: This is *not* to say that I think that the reviewing process is anywhere near ideal. On the contrary, I think that the reviewing process has many problems, particularly as it is performed in CS conferences. On 10/06/2014 09:19 AM, Martynas Jusevičius wrote: Dear Peter, please show me how to query PDFs with SPARQL. Then I'll believe there are no benefits of XHTML+RDFa over PDF.
Addressing the issue from the reviewer perspective only is too narrow, don't you think? Martynas [...]
Re: scientific publishing process (was Re: Cost and access)
Sure. So extract the text from the PDF and query that. It also would be nice to have access to the LaTeX sources. What HTML publishing *might* have that is better than the above is to more easily embed some extra information into papers that can be queried. Is this just metadata that could also be easily injected into PDFs? If given this capability, will a significant number of authors use it? Is it instead better to have a separate document that has the information and not use HTML for publishing? peter On 10/06/2014 10:42 AM, Alexander Garcia Castro wrote: It's not hard to query PDFs with SPARQL. All you have to do is extract the metadata from the document and turn it into RDF, if needed. Lots of programs extract and display this metadata already. In the age of the web of data, why should I restrict my search just to metadata? I want the full content, open access or not; once I have the document I should be able to mine its content. I don't want to limit my search just to simple metadata. On Mon, Oct 6, 2014 at 9:48 AM, Peter F. Patel-Schneider pfpschnei...@gmail.com mailto:pfpschnei...@gmail.com wrote: It's not hard to query PDFs with SPARQL. All you have to do is extract the metadata from the document and turn it into RDF, if needed. Lots of programs extract and display this metadata already. No, I don't think that viewing this issue from the reviewer perspective is too narrow. Reviewers form a vital part of the scientific publishing process. Anything that makes their jobs harder or their results worse will have to offer very large benefits over the current setup. In any case, I haven't been looking at the reviewer perspective only, even in the message quoted below. peter PS: This is *not* to say that I think that the reviewing process is anywhere near ideal. On the contrary, I think that the reviewing process has many problems, particularly as it is performed in CS conferences.
On 10/06/2014 09:19 AM, Martynas Jusevičius wrote: Dear Peter, please show me how to query PDFs with SPARQL. Then I'll believe there are no benefits of XHTML+RDFa over PDF. Addressing the issue from the reviewer perspective only is too narrow, don't you think? Martynas On Mon, Oct 6, 2014 at 6:08 PM, Peter F. Patel-Schneider pfpschnei...@gmail.com mailto:pfpschnei...@gmail.com wrote: On 10/06/2014 08:38 AM, Phillip Lord wrote: Peter F. Patel-Schneider pfpschnei...@gmail.com mailto:pfpschnei...@gmail.com writes: I would be totally astonished if using htlatex as the main way to produce conference papers were as simple as this. I just tried htlatex on my ISWC paper, and the result was, to put it mildly, horrible. (One of my AAAI papers was about the same, the other one caused an undefined control sequence and only produced one page of output.) Several parts of the paper were rendered in fixed-width fonts. There was no attempt to limit line length. Footnotes were in separate files. The footnote thing is pretty strange, I have to agree. Although footnotes are a fairly alien concept wrt to the web. Probably hover overs would be a reasonable presentation for this. Many non-scalable images were included, even for simple math. It does MathML I think, which is then rendered client side. Or you could drop math-mode straight through and render client side with mathjax. Well, somehow png files are being produced for some math, which is a failure. I don't know what the way to do this right would be, I just know that the version of htlatex for Fedora 20 fails to reasonably handle the math in this paper. My carefully designed layout for examples was modified in ways that made the examples harder to understand. Perhaps this is a key difference between us. I don't care about the layout, and want someone to do it for me; it's one of the reasons I use latex as well. There are many cases where line breaks and indentation are important for understanding. 
Getting this sort of presentation right in latex is a pain for starters, but when it has been done [...]
Re: scientific publishing process (was Re: Cost and access)
On 10/06/2014 10:44 AM, Phillip Lord wrote: Peter F. Patel-Schneider pfpschnei...@gmail.com writes: On 10/06/2014 09:28 AM, Phillip Lord wrote: Peter F. Patel-Schneider pfpschnei...@gmail.com writes: It does MathML I think, which is then rendered client side. Or you could drop math-mode straight through and render client side with mathjax. Well, somehow png files are being produced for some math, which is a failure. Yeah, you have to tell it to do mathml. The problem is that older versions of the browsers don't render mathml, and image rendering was the only option. Well, then someone is going to have to tell people how to do this. What I saw for htlatex was that it just did the right thing. So, htlatex is part of TeX4ht which does HTML. If you do xhmlatex then you get XHTML with, indeed, math mode in MathML. So, for example, this output comes with the default xhmlatex. <math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><mi>e</mi><mo class="MathClass-rel">=</mo><mi>m</mi><msup><mrow><mi>c</mi></mrow><mrow><mn>2</mn></mrow></msup></math> tex4ht takes the slightly strange approach of having a strange and incomprehensible command line, and then lots of scripts which do default options, of which xhmlatex is one. In my installation, they've only put the basic ones into the path, so I ran this with /usr/share/tex4ht/xhmlatex. Phil So someone has to package this up so that it can be easily used. Before then, how can it be required for conferences? I have tex4ht installed, but there is no xhmlatex file to be found. I managed to find what appears to be a good command line: htlatex schema-org-analysis.tex xhtml,mathml -cunihtf -cvalidate This looks better when viewed, but the resultant HTML is unintelligible. There is definitely more work needed here before this can be considered as a potential solution. peter
Re: scientific publishing process (was Re: Cost and access)
On 10/06/2014 11:00 AM, Phillip Lord wrote: Peter F. Patel-Schneider pfpschnei...@gmail.com writes: On 10/06/2014 09:32 AM, Phillip Lord wrote: Peter F. Patel-Schneider pfpschnei...@gmail.com writes: Who cares what the authors intend? I mean, they are not reading the paper, are they? For reviewing, what the authors intend is extremely important. Having different rendering of the paper interfere with the authors' message is something that should be avoided at all costs. Really? So, for example, you think that a reviewer with impaired vision should, for example, be forced to review a paper using the authors' rendering, regardless of whether they can read it or not? No, but this is not what I was talking about. I was talking about interfering with the authors' message via changes from the rendering that the authors set up. It *is* exactly what you are talking about. Well, maybe I was not being clear, but I thought that I was talking about rendering changes interfering with comprehension of the authors' intent. peter [...]
Re: scientific publishing process (was Re: Cost and access)
On 10/06/2014 11:00 AM, Phillip Lord wrote: Peter F. Patel-Schneider pfpschnei...@gmail.com writes: On 10/06/2014 09:32 AM, Phillip Lord wrote: Peter F. Patel-Schneider pfpschnei...@gmail.com writes: Who cares what the authors intend? I mean, they are not reading the paper, are they? For reviewing, what the authors intend is extremely important. Having different rendering of the paper interfere with the authors' message is something that should be avoided at all costs. Really? So, for example, you think that a reviewer with impaired vision should, for example, be forced to review a paper using the authors' rendering, regardless of whether they can read it or not? No, but this is not what I was talking about. I was talking about interfering with the authors' message via changes from the rendering that the authors set up. It *is* exactly what you are talking about. If I want to render your document to speech, then why should I not? What I am saying is that you, the author, should not wish to constrain the rendering, only really the content. Effectively, if you are using latex, you are already doing this, since latex defines the layout and not you. But, I think we are talking in too abstract terms here. Should you be able to constrain indentation for code blocks? Yes, of course, you should. But, a quick look at the web shows that people do this all the time. Sure, and htlatex appears to interfere with this indentation. At least it does in my ISWC paper. Similarly for reading papers, if the rendering of the paper interferes with the authors' message, that is a failure of the process. Yes, I agree. Which is why, I believe, the rendering of a paper should be up to the reader. And this is why I believe that authors should be able to specify the rendering of their paper to the extent that they feel is needed to convey the intent of the paper. For scientific papers, I think this really is not very far.
I mean, a scientific paper is not a fashion store; it's a story designed to persuade with data. I would like to see papers which are in the hands of the reader as much as possible. Citation format should be for the reader. Math presentation. Graphs should be interactive and zoomable, with the data underneath as CSV. All of these are possible and routine with HTML now. I want to be free to choose the organisation of my papers so that I can convey what I want. At the moment, I cannot. The PDF is not reasonable for all, maybe not even most, of this. But some. Phil So, you believe that there is an excellent set of tools for preparing, reviewing, and reading scientific publications. Package them up and make them widely available. If they are good, people will use them. Convince those who run conferences. If these people are convinced, then they will allow their use in conferences or maybe even require their use. I'm not convinced by what I'm seeing right now, however. peter
Re: scientific publishing process (was Re: Cost and access)
On 10/06/2014 11:03 AM, Kingsley Idehen wrote: On 10/6/14 12:48 PM, Peter F. Patel-Schneider wrote: It's not hard to query PDFs with SPARQL. All you have to do is extract the metadata from the document and turn it into RDF, if needed. Lots of programs extract and display this metadata already. Peter, Having had 200+ (some-non-rdf-doc) to RDF document transformers built under my direct guidance, there are issues with your claim above: Huh? Every single PDF reader that I use can extract the PDF metadata and display it. The metadata that I see in PDF documents uses a core set of properties that are easy to transform into RDF. Of course, this core set is very small (title, author, and a few other things) so you don't get all that much out of the core set. 1. The extractors are platform specific -- AWWW is about platform agnosticism (I don't want to mandate an OS for experiencing the power of Linked Open Data transformers / rdfizers) Well, the extractors would be specific to PDF, but that's hardly surprising, I think. 2. It isn't solely about metadata -- we also have raw data inside these documents confined to tables and paragraphs of sentences Well, sure, but is extracting information directly from the figures or tables or text being considered here? I sure would like this to be possible. How would it work in an HTML context? 3. If querying a PDF were marginally simple, I would be demonstrating that using a SPARQL results URL in response to this post :-) I'm not saying that it is so simple. You do have to find the metadata block in the PDF and then look for the /Title, /Author, ... stuff. Possible != Simple and Productive. Yes, but there are lots of tools that display PDF metadata, so there are some who believe that the benefit is greater than the cost. We want to leverage the productivity and simplicity that AWWW brings to data representation, access, interaction, and integration.
Sure, but the additional costs, if any, on paper authors, reviewers, and readers have to be considered. If these costs are eliminated or at least minimized then this good is much more likely to be realized. peter
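[Editor's note: Peter's point that the small core of PDF Info metadata maps easily onto RDF can be illustrated with a short sketch. The property mapping and the input values below are hypothetical, and a real pipeline would first pull the Info dictionary out of the PDF with a PDF library; the sketch only shows the shape of the transformation, emitting Dublin Core N-Triples.]

```python
# Sketch: map a PDF Info dictionary (here just a Python dict) to
# Dublin Core N-Triples. The key-to-predicate mapping and the sample
# metadata are illustrative, not a standard.

# Core PDF Info keys mapped to Dublin Core term predicates.
PDF_TO_DC = {
    "/Title": "http://purl.org/dc/terms/title",
    "/Author": "http://purl.org/dc/terms/creator",
    "/Subject": "http://purl.org/dc/terms/subject",
}

def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes per N-Triples literal syntax."""
    return value.replace("\\", "\\\\").replace('"', '\\"')

def info_to_ntriples(doc_uri: str, info: dict) -> list:
    """Turn the core PDF metadata keys into N-Triples statements."""
    triples = []
    for key, predicate in PDF_TO_DC.items():
        if key in info:
            literal = escape_literal(info[key])
            triples.append(f'<{doc_uri}> <{predicate}> "{literal}" .')
    return triples

# Hypothetical metadata, as a PDF reader would report it.
info = {"/Title": "An ISWC Paper", "/Author": "P. F. Patel-Schneider"}
for line in info_to_ntriples("http://example.org/paper", info):
    print(line)
```

Once the triples are loaded into any store, the "query PDFs with SPARQL" step Peter describes is just an ordinary SPARQL query over them.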
Re: scientific publishing process (was Re: Cost and access)
Neat. This could be extended to putting a full table of contents into the metadata, and in lots of other ways. The other nice thing about it is that it would be possible to push the same data through a LaTeX to HTML toolchain for those who want HTML output. peter On 10/06/2014 03:18 PM, Norman Gray wrote: Greetings. On 2014 Oct 6, at 19:19, Alexander Garcia Castro alexgarc...@gmail.com wrote: querying PDFs is NOT simple and requires a lot of work -and usually produces lots of errors. just querying metadata is not enough. As I said before, I understand the PDF as something that gives me a uniform layout. that is ok and necessary, but not enough or sufficient within the context of the web of data and scientific publications. I would like to have the content readily available for mining purposes. if I pay for the publication I should get access to the publication in every format it is available. the content should be presented in a way so that it makes sense within the web of data. if it is the full content of the paper represented in RDF or XML fine. also, I would like to have well annotated content, this is simple and something that could quite easily be part of existing publication workflows. it may also be part of the guidelines for authors -for instance, identify and annotate rhetorical structures. The following might add something to this conversation. It illustrates getting the metadata from a LaTeX file, putting it into an XMP packet in a PDF, and getting it out of the PDF as RDF. Pace Peter's mention of /Author, /Title, etc, this just focuses on the XMP packet. This has the document metadata, the abstract, and an illustrative bit of argumentation. Adding details about the document structure, and (RDF) pointers to any figures would be feasible, as would, I suspect, incorporating CSV files directly into the PDF. Incorporating \begin{tabular} tables would be rather tricky, but not impossible. 
I can't help feeling that the XHTML+RDFa equivalent would be longer and need more documentation to instruct the author where to put the RDFa magic. It's not very fancy, and still has rough edges, but it only took me 100 minutes, from a standing start. Generating and querying this PDF seems pretty simple to me. [...]
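[Editor's note: for readers who have not seen one, the XMP packet Norman mentions is an XML blob embedded in the PDF file. A minimal illustrative packet carrying Dublin Core metadata might look roughly like this; the title, creator, and abstract values are made up, and only the packet framing follows the XMP convention.]

```xml
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
    <rdf:Description rdf:about="">
      <dc:title>
        <rdf:Alt><rdf:li xml:lang="x-default">An Example Paper</rdf:li></rdf:Alt>
      </dc:title>
      <dc:creator>
        <rdf:Seq><rdf:li>A. N. Author</rdf:li></rdf:Seq>
      </dc:creator>
      <dc:description>
        <rdf:Alt><rdf:li xml:lang="x-default">Abstract text here.</rdf:li></rdf:Alt>
      </dc:description>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
```

Because the payload is RDF/XML, extracting it from the PDF yields RDF directly, which is what makes Norman's round trip (LaTeX metadata in, RDF out) straightforward.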
scientific publishing process (was Re: Cost and access)
In my opinion PDF is currently the clear winner over HTML in both the ability to produce readable documents and the ability to display readable documents in the way that the author wants them to display. In the past I have tried various means to produce good-looking HTML and I've always gone back to a setup that produces PDF. If a document is available in both HTML and PDF I almost always choose to view it in PDF. This is the case even though I have particular preferences in how I view documents. If someone wants to change the format of conference submissions, then they are going to have to cater to the preferences of authors, like me, and reviewers, like me. If someone wants to change the format of conference papers, then they are going to have to cater to the preferences of authors, like me, attendees, like me, and readers, like me. I'm all for *better* methods for preparing, submitting, reviewing, and publishing conference (and journal) papers. So go ahead, create one. But just saying that HTML is better than PDF in some dimension, even if it were true, doesn't mean that HTML is better than PDF for this purpose. So I would say that the semantic web community is saying that there are better formats and tools for creating, reviewing, and publishing scientific papers than HTML and tools that create and view HTML. If there weren't these better ways then an HTML-based solution might be tenable, but why use a worse solution when a better one is available? peter On 10/03/2014 08:02 AM, Phillip Lord wrote: [...] As it stands, the only statement that the semantic web community are making is that web formats are too poor for scientific usage. [...] Phil
Re: scientific publishing process (was Re: Cost and access)
One problem with allowing HTML submission is ensuring that reviewers can correctly view the submission as the authors intended it to be viewed. How would you feel if your paper was rejected because one of the reviewers could not view portions of it? At least with PDF there is a reasonably good chance that every paper can be correctly viewed by all its reviewers, even if they have to print it out. I don't think that the same claim can be made for HTML-based systems. Further, why should there be any technical preference for HTML at all? (Yes, HTML is an open standard and PDF is a closed one, but is there anything else besides that?) Web conferences vitally use the web in their reviewing and publishing processes. Doesn't that show their allegiance to the web? Would the use of HTML make a conference more webby? peter On 10/03/2014 09:11 AM, Phillip Lord wrote: In my opinion, the opposite is true. PDF I almost always end up printing out. This isn't the point though. Necessity is the mother of invention. In the ideal world, a web conference would allow only HTML submission. Failing that, it would at least allow HTML submission. But, currently, we cannot submit HTML at all. What is the point of creating a better method, if we can't use it? The only argument that seems at all plausible to me is, well, we've always done it like this, and it's too much effort to change. I could appreciate that. Anyway, the argument is going round in circles. Peter F. Patel-Schneider pfpschnei...@gmail.com writes: In my opinion PDF is currently the clear winner over HTML in both the ability to produce readable documents and the ability to display readable documents in the way that the author wants them to display. In the past I have tried various means to produce good-looking HTML and I've always gone back to a setup that produces PDF. If a document is available in both HTML and PDF I almost always choose to view it in PDF. This is the case even though I have particular preferences in how I view documents.
If someone wants to change the format of conference submissions, then they are going to have to cater to the preferences of authors, like me, and reviewers, like me. If someone wants to change the format of conference papers, then they are going to have to cater to the preferences of authors, like me, attendees, like me, and readers, like me. I'm all for *better* methods for preparing, submitting, reviewing, and publishing conference (and journal) papers. So go ahead, create one. But just saying that HTML is better than PDF in some dimension, even if it were true, doesn't mean that HTML is better than PDF for this purpose. So I would say that the semantic web community is saying that there are better formats and tools for creating, reviewing, and publishing scientific papers than HTML and tools that create and view HTML. If there weren't these better ways then an HTML-based solution might be tenable, but why use a worse solution when a better one is available? peter On 10/03/2014 08:02 AM, Phillip Lord wrote: [...] As it stands, the only statement that the semantic web community are making is that web formats are too poor for scientific usage. [...] Phil
Re: scientific publishing process (was Re: Cost and access)
On 10/03/2014 10:25 AM, Diogo FC Patrao wrote: On Fri, Oct 3, 2014 at 1:38 PM, Peter F. Patel-Schneider pfpschnei...@gmail.com wrote: One problem with allowing HTML submission is ensuring that reviewers can correctly view the submission as the authors intended it to be viewed. How would you feel if your paper was rejected because one of the reviewers could not view portions of it? At least with PDF there is a reasonably good chance that every paper can be correctly viewed by all its reviewers, even if they have to print it out. I don't think that the same claim can be made for HTML-based systems. The majority of journals I'm familiar with mandate a certain format for submission: font size, figure format, etc. So, in an HTML submission, there should be rules as well, a standard CSS and the right elements and classes. Not different from getting a word(c) or latex template. This might help. However, someone has to do this, and ensure that the result is generally viewable. Web conferences vitally use the web in their reviewing and publishing processes. Doesn't that show their allegiance to the web? Would the use of HTML make a conference more webby? As someone said, this is leading by example. Yes, but what makes HTML better for being webby than PDF? dfcp peter
Re: scientific publishing process (was Re: Cost and access)
Does ease of processing make something more webby? If so, LaTeX should be preferred to HTML. peter On 10/03/2014 02:01 PM, john.nj.dav...@bt.com wrote: Yes, but what makes HTML better for being webby than PDF? Because it is a mark-up language (albeit largely syntactic) which makes it much more amenable to machine processing? -Original Message- From: Peter F. Patel-Schneider [mailto:pfpschnei...@gmail.com] Sent: 03 October 2014 21:15 To: Diogo FC Patrao Cc: Phillip Lord; semantic-...@w3.org; public-lod@w3.org Subject: Re: scientific publishing process (was Re: Cost and access) On 10/03/2014 10:25 AM, Diogo FC Patrao wrote: On Fri, Oct 3, 2014 at 1:38 PM, Peter F. Patel-Schneider pfpschnei...@gmail.com wrote: One problem with allowing HTML submission is ensuring that reviewers can correctly view the submission as the authors intended it to be viewed. How would you feel if your paper was rejected because one of the reviewers could not view portions of it? At least with PDF there is a reasonably good chance that every paper can be correctly viewed by all its reviewers, even if they have to print it out. I don't think that the same claim can be made for HTML-based systems. The majority of journals I'm familiar with mandate a certain format for submission: font size, figure format, etc. So, in an HTML submission, there should be rules as well, a standard CSS and the right elements and classes. Not different from getting a word(c) or latex template. This might help. However, someone has to do this, and ensure that the result is generally viewable. Web conferences vitally use the web in their reviewing and publishing processes. Doesn't that show their allegiance to the web? Would the use of HTML make a conference more webby? As someone said, this is leading by example. Yes, but what makes HTML better for being webby than PDF? dfcp peter
Re: scientific publishing process (was Re: Cost and access)
Hmm. Are these semantic? All these seem to do is to signal parts of a document. What I would consider to be semantic would be a way of extracting the mathematical content of a document. peter On 10/03/2014 02:32 PM, Diogo FC Patrao wrote: html5 has so-called semantic tags, like header, section. -- diogo patrão On Fri, Oct 3, 2014 at 6:01 PM, john.nj.dav...@bt.com wrote: Yes, but what makes HTML better for being webby than PDF? Because it is a mark-up language (albeit largely syntactic) which makes it much more amenable to machine processing? -Original Message- From: Peter F. Patel-Schneider [mailto:pfpschnei...@gmail.com] Sent: 03 October 2014 21:15 To: Diogo FC Patrao Cc: Phillip Lord; semantic-...@w3.org; public-lod@w3.org Subject: Re: scientific publishing process (was Re: Cost and access) On 10/03/2014 10:25 AM, Diogo FC Patrao wrote: On Fri, Oct 3, 2014 at 1:38 PM, Peter F. Patel-Schneider pfpschnei...@gmail.com wrote: One problem with allowing HTML submission is ensuring that reviewers can correctly view the submission as the authors intended it to be viewed. How would you feel if your paper was rejected because one of the reviewers could not view portions of it? At least with PDF there is a reasonably good chance that every paper can be correctly viewed by all its reviewers, even if they have to print it out. I don't think that the same claim can be made for HTML-based systems. The majority of journals I'm familiar with mandate a certain format for submission: font size, figure format, etc. So, in an HTML submission, there should be rules as well, a standard CSS and the right elements and classes. Not different from getting a word(c) or latex template. This might help. However, someone has to do this, and ensure that the result is generally viewable.
Web conferences vitally use the web in their reviewing and publishing processes. Doesn't that show their allegiance to the web? Would the use of HTML make a conference more webby? As someone said, this is leading by example. Yes, but what makes HTML better for being webby than PDF? dfcp peter
Re: How to avoid that collections break relationships
On 04/09/2014 12:57 AM, Ruben Verborgh wrote: What then is RDF for you? The Resource Description Framework. It is a framework to describe resources, and this includes predicates. Anybody can define predicates the way they want, otherwise RDF is useless to express semantics. Ok, I describe ex:BaseballPlayer as ex:BaseballPlayer owl:equivalentClass _:x . _:x owl:intersectionOf ( ex:Person [ owl:onProperty ex:plays; owl:hasValue ex:Baseball ] ) Is this RDF? Should all consumers of RDF understand all of this? For example, do you consider N3 to be RDF? No, quantification is not part of RDF. Why not? I could certainly define an encoding of quantification in RDF and use it to define predicates. Can predicates have non-local effects? A predicate indicates a relationship between a subject and an object. What this relationship means is described in the ontology to which the predicate belongs. Predicates may not influence non-related triples, however, other triples might be influenced through a cascade of relations. Why not? I can define predicates however I want, after all? What does using owl:differentFrom in RDF commit you to? It says that two things are different. Clients that can interpret this predicate can apply its meaning. This application does not change the model. What model? Do you mean that all you care about is the abstract syntax? What about rdf:type? What about rdfs:domain? Do all consumers of RDF need to commit to the standard meaning of these predicates? To me, what RDF does not do is just as important as what it does do. This means that RDF captures only the RDF bit of the meaning of predicates - the rest of their meaning remains inaccessible from RDF. Any attempt to go beyond this is … going beyond RDF and it is very important to realize this. RDF is just the model. Giving a predicate meaning is not extending the model. How so? What else is giving a predicate meaning besides extending the model?
Best, Ruben I am really struggling to understand your view of RDF. peter
Re: How to avoid that collections break relationships
On 04/12/2014 05:20 PM, Ruben Verborgh wrote: Hi Peter, Ok, I describe ex:BaseballPlayer as ex:BaseballPlayer owl:equivalentClass _:x . _:x owl:intersectionOf ( ex:Person [ owl:onProperty ex:plays; owl:hasValue ex:Baseball ] ) Is this RDF? Yes. I would say that this is OWL in RDF clothing. Should all consumers of RDF understand all of this? Yes, depending on your interpretation of understand. All of them should parse the triples. This is where RDF ends. This I totally disagree with. RDF is much more than just triples. RDF includes a meaning for triples. Those that can interpret OWL will be able to infer additional things. This is OWL and not part of the RDF model (and thus also not extending the RDF model). <h1>Baseball player</h1> doesn't extend HTML. It just applies HTML to describe a baseball player. As using ex:BaseballPlayer doesn't extend RDF. However, using owl:disjointWith as a predicate in triples, and expecting it to have some relationship to disjointness of RDF class extensions, is an extension of RDF. No, quantification is not part of RDF. Why not? It is not in the spec. But you appear to be using only part of the RDF spec. Why just that part and not the whole RDF spec? If *you* leave parts out, surely it is just as legitimate for *me* to add parts. I could certainly define an encoding of quantification in RDF and use it to define predicates. You indeed can. Predicates may not influence non-related triples, however, other triples might be influenced through a cascade of relations. Why not? I can define predicates however I want, after all? Because, by definition of related, if your predicate is defined to influence a certain (kind of) triple, that triple is related to the usage of the predicate. Sure, but if I can add things to the RDF spec, then I could add something like: The triple a b c means that all subclassOf relationships are strict. What does using owl:differentFrom in RDF commit you to? It says that two things are different.
Clients that can interpret this predicate can apply its meaning. This application does not change the model. What model? The RDF model. Do you mean that all you care about is the abstract syntax? No. But, but, but, isn't that what you said above? All that counts is triples, i.e., the abstract syntax. What about rdf:type? What about rdfs:domain? Do all consumers of RDF need to commit to the standard meaning of these predicates? Yes. But this goes beyond triples. RDF is just the model. Giving a predicate meaning is not extending the model. How so? What else is giving a predicate meaning besides extending the model? It defines something on top of the model. Building a home with bricks does not extend the bricks; it uses them. Yes, sure, which is why using rdfs:domain to infer rdf:type triples is not going beyond RDF(S). However, using owl:sameAs as equality and inferring other triples from this is going beyond RDF(S). It's just like turning a little brick into a long I-beam - you are no longer working with a little brick. I am really struggling to understand your view of RDF. Likewise. But maybe further discussing this doesn't really help the community. My view on RDF works for what I want to do and in my opinion, it's by no means an unreasonable view. But there might be other views… and that might just be fine. Well that's a bit debatable. Standards, even W3C standards, are there so that there is commonality of understanding. If different people take different views of RDF, then its utility is weakened, particularly if everyone still thinks that they are all using the same thing. My view is that getting these differences of opinion out in the open is very helpful. My view is that RDF is defined by the W3C RDF recommendation and that going beyond the inferences sanctioned there is no longer RDF. Best, Ruben peter
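The distinction Peter draws here, that using rdfs:domain to infer rdf:type triples stays inside RDF(S), corresponds to entailment rule rdfs2 of the RDF Semantics recommendation. A minimal sketch in plain Python (triples as tuples; no RDF library, and the ex: names are illustrative):

```python
# A minimal sketch of RDFS entailment rule rdfs2: from (u, p, y) and
# (p, rdfs:domain, C), an RDFS reasoner may infer (u, rdf:type, C).
def apply_rdfs_domain(graph):
    out = set(graph)
    # Collect all declared domains: pairs (property, class).
    domains = [(s, o) for (s, p, o) in graph if p == "rdfs:domain"]
    for (u, p, y) in graph:
        for (prop, cls) in domains:
            if p == prop:
                out.add((u, "rdf:type", cls))
    return out

g = {
    ("ex:Anna", "ex:plays", "ex:Baseball"),
    ("ex:plays", "rdfs:domain", "ex:Person"),
}
closure = apply_rdfs_domain(g)
```

After the single pass, closure contains the inferred triple ("ex:Anna", "rdf:type", "ex:Person") alongside the two asserted ones; nothing beyond the RDFS rule itself is applied.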
Re: Inference for error checking [was Re: How to avoid that collections break relationships]
Well, certainly, one could do this if one wanted to. However, is this a useful thing to do, in general, particularly in the absence of constructs that actually sanction the inference, and particularly if the checking is done in a context where there is no way of actually getting the author to fix whatever problems are encountered? My feelings are that if you really want to do this, then the place to do it is during data entry or data importation. peter On 04/03/2014 03:12 PM, David Booth wrote: First of all, my sincere apologies to Pat, Peter and the rest of the readership for totally botching my last example, writing domain when I meant range *and* explaining it wrong. Sorry for all the confusion it caused! I was simply trying to demonstrate how a schema:domainIncludes assertion could be useful for error checking even if it had no formal entailments, by making selective use of the CWA. I'll try again. Suppose we are given these RDF statements, in which the author *may* have made a typo, writing ddd instead of ccc as the rdf:type of x: x ppp y . # Triple A x rdf:type ddd . # Triple B ppp schema:domainIncludes ccc . # Triple C As given, these statements are consistent, so a reasoner will not detect a problem. Indeed, they may or may not be what the author intended. If the author later added the statement: ccc owl:equivalentClass ddd . # Triple E then ddd probably was what the author intended in triple B. OTOH if the author later added: ccc owl:disjointWith ddd . # Triple F then ddd probably was not what the author intended in triple B. However, thus far we are only given triples {A,B,C} above, and an error checker wishes to check for *potential* typos by applying the rule: For all subgraphs of the form { x ppp y . ppp schema:domainIncludes ccc . } check whether { x rdf:type ccc . } is *provably* true. If not, then fail the error check. If all such subgraphs pass, then the error check as a whole passes. Under the OWA, the requirement: { x rdf:type ccc . 
} is neither provably true nor provably false given graph {A,B,C}. But under the CWA it is considered false, because it is not provably true. This is how the schema:domainIncludes can be useful for error checking even if it has no formal entailments: it tells the error checker which cases to check. I hope that now makes more sense. Again, sorry to have screwed up my example so badly last time, and I hope I've got it right this time. :) David On 04/02/2014 11:42 PM, Pat Hayes wrote: On Mar 31, 2014, at 10:31 AM, David Booth da...@dbooth.org wrote: On 03/30/2014 03:13 AM, Pat Hayes wrote: [...] What follows from knowing that ppp schema:domainIncludes ccc . ? Suppose you know this and you also know that x ppp y . Can you infer x rdf:type ccc? I presume not, since the domain might include other stuff outside ccc. So, what *can* be inferred about the relationship between x and ccc ? As far as I can see, nothing can be inferred. If I am wrong, please enlighten me. But if I am right, what possible utility is there in even making a schema:domainIncludes assertion? If inference is too strong, let me weaken my question: what possible utility **in any way whatsoever** is provided by knowing that schema:domainIncludes holds between ppp and ccc? What software can do what with this, that it could not do as well without this? I think I can answer this question quite easily, as I have seen it come up before in discussions of logic. ... Note that this categorization typically relies on making a closed world assumption (CWA), which is common for an application to make for a particular purpose -- especially error checking. Yes, of course. If you make the CWA with the information you have, then ppp schema:domainIncludes ccc . has exactly the same entailments as ppp rdfs:domain ccc . has in RDFS without the CWA. But that, of course, begs the question. 
If you are going to rely on the CWA, then (a) you are violating the basic assumptions of all Web notations and (b) you are using a fundamentally different semantics. And see below. None of this has anything to do with a distinction between entailment and error checking, by the way. Your hypothetical three-way classification task uses the same meanings of the RDF as any other entailment task would. In this example, let us suppose that to pass, the object of every predicate must be in the Known Domain of that predicate, where the Known Domain is the union of all declared schema:domainIncludes classes for that predicate. (Note the CWA here.) Given this error checking objective, if a system is given the facts: x ppp y . y a ccc . then without also knowing that ppp schema:domainIncludes ccc, the system may not be able to determine that these statements should be considered Passed or Failed: the result may be Indeterminate. But if the system is also told
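The checking rule under discussion in this thread, with its three-way Passed / Failed / Indeterminate outcome, can be sketched in plain Python. This is an illustrative reading, not schema.org tooling: KnownDomain(p) is taken as the union of p's declared schema:domainIncludes classes, and under the CWA "not provably in the known domain" counts as a failure.

```python
# A sketch of the CWA-based check: a use (x, p, y) Passes if some
# asserted type of x falls in KnownDomain(p), Fails otherwise (CWA:
# not provably true counts as false), and is Indeterminate when no
# schema:domainIncludes declarations exist for p at all.
def check_uses(graph):
    known_domain, types = {}, {}
    for (s, p, o) in graph:
        if p == "schema:domainIncludes":
            known_domain.setdefault(s, set()).add(o)
        elif p == "rdf:type":
            types.setdefault(s, set()).add(o)
    results = {}
    for (x, p, y) in graph:
        if p in ("schema:domainIncludes", "rdf:type"):
            continue  # schema-level and typing triples are not checked
        if p not in known_domain:
            results[(x, p, y)] = "Indeterminate"
        elif types.get(x, set()) & known_domain[p]:
            results[(x, p, y)] = "Passed"
        else:
            results[(x, p, y)] = "Failed"
    return results
```

On David's triples {A,B,C} the use of ppp Fails (ddd is not in KnownDomain(ppp) = {ccc}); adding x rdf:type ccc flips it to Passed; with no domainIncludes declaration at all it would be Indeterminate.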
Re: How to avoid that collections break relationships
On 03/31/2014 01:59 PM, Ruben Verborgh wrote: In actuality, defining things like owl:sameAs is indeed extending RDF. Defining things in terms of OWL connectives also goes beyond RDF. This is different from introducing domain predicates like foaf:friends. (Yes, it is sometimes a bit hard to figure out which side of the line one is on.) Thanks for clarifying, and this is indeed where we disagree. For me, such a line does not exist, nor was it ever defined. And even if there were, I don't see the need to draw it. RDF is the framework, the interpretation is semantics. All predicates have meaning associated with them, none has “more” meaning than the other; maybe some usually allow one to infer more triples, but that doesn't change the framework at all. Cheers, Ruben What then is RDF for you? For example, do you consider N3 to be RDF? Can the predicates be modal operators? Can predicates have non-local effects? What does using owl:differentFrom in RDF commit you to? To me, what RDF does not do is just as important as what it does do. This means that RDF captures only the RDF bit of the meaning of predicates - the rest of their meaning remains inaccessible from RDF. Any attempt to go beyond this is ... going beyond RDF and it is very important to realize this. peter
Re: Inference for error checking [was Re: How to avoid that collections break relationships]
On 03/31/2014 01:39 PM, David Booth wrote: On 03/31/2014 11:59 AM, Peter F. Patel-Schneider wrote: [...] Given this error checking objective, if a system is given the facts: x ppp y . y a ccc . then without also knowing that ppp schema:domainIncludes ccc, the system may not be able to determine that these statements should be considered Passed or Failed: the result may be Indeterminate. But if the system is also told that ppp schema:domainIncludes ccc . then it can safely categorize these statements as Passed (within the limits of this error checking). Sure, but it can be very tricky to determine just what facts to consider when making this determination, particularly with the upside-down nature of schema:domainIncludes. My assumption in this example is that the application already has a set of assertions that it intends to work with, and it wishes to error check them. Isn't it quite tricky to figure out what this set of assertions should be? For example, are consequences of other facts allowed? All of them? Thus, although schema:domainIncludes does not enable any new entailments under the open world assumption (OWA), it *does* enable some useful error checking inference under the closed world assumption (CWA), by enabling a shift from Indeterminate to Passed or Failed. The CWA actually works against you here. Given the following triples, x ppp y . # Triple A y rdf:type ddd . # Triple B ppp schema:domainIncludes ccc . # Triple C you are determining whether y rdf:type ccc . # Triple E is entailed, whether its negation is entailed, or neither. The relevant CWA would push these last two together, making it impossible to have a three-way determination, which you want. I don't think that's quite it. The error check that I described is not the same as checking whether NOT(y rdf:type ccc) is entailed. (Such a conclusion could be entailed if there were an owl:disjointWith assertion, for example.) It is checking whether (y rdf:type KnownDomain(ppp)). 
In other words, the CWA is not being made in testing whether (y rdf:type ccc); rather it is being made in computing KnownDomain(ppp). Huh? What is this KnownDomain construct? Where does it come from? How is it computed? The net effect of this is that the CWA is being used to distinguish between cases that would all be considered unknown under the OWA. I still don't see a role for the CWA here. David peter
Re: How to avoid that collections break relationships
I don't see how this solves the problem. Even if you augment RDF with this set construct (hydra:memberOf pointing to a separate document containing a collection of entities), what you are saying is that Markus knows some entity, and that that entity belongs to the set. It does not say which of the members of the set are friends of Markus, nor that any more than one of them has to be. peter On 03/31/2014 01:34 AM, Ruben Verborgh wrote: Dear all, Sorry for hijacking the discussion, but I think we should keep the discussion goal-focused. So let's therefore see what we want to achieve: 1. Having a way for clients to find out the members of a specific collection 2. Not breaking the RDF model while doing so A solution that satisfies 1 and 2 with minimal effort is good enough for Hydra; the rest can be discussed more deeply in other places. The easiest solution I could come up with that satisfies the above criteria is the following. Suppose a client needs to find Markus' friends, and the server uses foaf:knows for that (which has the restrictive range foaf:Person, disallowing a collection). If the representation contains all of Markus' friends, then it could look like: /people/markus foaf:knows /people/Anna. /people/markus foaf:knows /people/Bert. /people/markus foaf:knows /people/Carl. Now, more interestingly, if the list of Markus' friends is available as a separate resource /people/markus/friends, then it could look like: /people/markus foaf:knows [ hydra:memberOf /people/markus/friends ]. So we say that a blank node is one of Markus' friends, and where it can be found. This satisfies 1, because the client can follow the link and find all friends there. This satisfies 2, because the blank node is an actual person, not a collection. And that is all we need for hypermedia clients to work. 
Yes, I know this does not add a whole bunch of extensive semantics we might need for specific case X, but: a) that's not necessary in general; a Hydra client has all it needs; b) this solution is extensible to allow for that. If you like, you can add details about /people/markus/friends, say that they all have a foaf:knows relationship to /people/markus etc. Summarized: look at the minimum a client needs, implement that; the only thing we need is a blank node and a memberOf predicate. Hydra clients work; the model is happy too. Best, Ruben PS The only case this slightly breaks the model is if Markus has no friends yet; then you say Markus knows somebody while he actually doesn't. But saying something doesn't exist is a problem in RDF anyway. The easy way: just don't include any foaf:knows triple (or ignore slight breakage). If you insist on including _something_, we'd need to have an explicit empty list: /people/markus foaf:knowsList (). foaf:knowsList hydra:listPropertyOf foaf:knows. But then we'd be stuck with twice as many properties, which is not ideal either.
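The hypermedia-client behaviour Ruben describes can be sketched in plain Python. This is an illustrative sketch, not Hydra client code: the URLs are from his example, triples are (subject, predicate, object) tuples, and DOCUMENTS is a hypothetical stand-in for dereferencing a URL over HTTP.

```python
# Sketch of a client that, on seeing a blank-node friend carrying
# hydra:memberOf, follows the collection URL and reads the member
# triples there. DOCUMENTS stands in for HTTP GET.
DOCUMENTS = {
    "/people/markus": [
        ("/people/markus", "foaf:knows", "_:b0"),
        ("_:b0", "hydra:memberOf", "/people/markus/friends"),
    ],
    "/people/markus/friends": [
        ("/people/markus", "foaf:knows", "/people/Anna"),
        ("/people/markus", "foaf:knows", "/people/Bert"),
        ("/people/markus", "foaf:knows", "/people/Carl"),
    ],
}

def find_friends(person):
    triples = DOCUMENTS[person]
    friends = set()
    for (s, p, o) in triples:
        if s == person and p == "foaf:knows":
            # Does this friend point at a collection document?
            collection = next((obj for (subj, pred, obj) in triples
                               if subj == o and pred == "hydra:memberOf"),
                              None)
            if collection is not None:
                # Follow the link and collect the members listed there.
                friends.update(obj for (subj, pred, obj)
                               in DOCUMENTS[collection]
                               if subj == person and pred == "foaf:knows")
            else:
                friends.add(o)
    return friends
```

Calling find_friends("/people/markus") dereferences /people/markus/friends via the blank node and returns Anna, Bert, and Carl, which is the "all a hypermedia client needs" behaviour claimed in the message.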
Re: How to avoid that collections break relationships
On 03/31/2014 08:02 AM, Ruben Verborgh wrote: Hi Peter, I don't see how this solves the problem. Recall that “the problem” for Hydra clients is (in my opinion): 1. Having a way to find out the members of a specific collection 2. Not breaking the RDF model while doing so what you are saying is that Markus knows some entity, and that that entity belongs to the set. …and I give that set a URL, too. It does not say which of the members of the set are friends of Markus, nor that any more than one of them has to be …on purpose, since it is not needed at all to solve 1 and 2. /people/markus foaf:knows [ hydra:memberOf /people/markus/friends ]. Gives you the URL of the set, which allows you to retrieve it. Then if you retrieve it, it will list its members and say they are friends of Markus: /people/markus foaf:knows /people/Anna. /people/markus foaf:knows /people/Bert. /people/markus foaf:knows /people/Carl. But this is violating both the spirit and the letter of RDF. It would be better to introduce entirely new syntactic mechanisms, for example, something like /people/markus foaf:knows **http://.../people/markus/friends** which could be read as shorthand for replacing the **...** with an object list containing the objects in the document being pointed at. This is exactly the reason I propose this approach; I know many people want more semantics in there; and that's totally fine and even possible with this approach (like I said, you can further describe /people/markus/friends if you like.) But all we need for Hydra clients is a way to get Markus' friends. This solution offers it. No more semantics needed. Huh? If you want to be in the RDF camp, you have to play by RDF rules. However, maybe all you want is something that is syntactically RDF. In that case there are a multitude of solutions. Yours might be somewhat better than the other ones, but the differences look rather superficial when viewed through an RDF lens. Best, Ruben peter
Re: How to avoid that collections break relationships
On 03/31/2014 08:29 AM, Ruben Verborgh wrote: Hi Peter, This is why I started by saying the focus of the discussion should be on what we want to achieve. With my proposed solution, it is achieved. Furthermore, this solution allows you to add any metadata you might like; a Hydra client just wouldn't need it (even though others might). Right now, we don't need anything else than just finding the members of a collection. But this is violating both the spirit and the letter of RDF. It would be better to introduce entirely new syntactic mechanisms A new syntax would break everything that exists. How is that better? The proposed approach doesn't break anything and achieves what we need, without violating the RDF model. Huh? If you want to be in the RDF camp, you have to play by RDF rules. And we do that. /people/markus foaf:knows [ hydra:memberOf /people/markus/friends ]. means “Markus knows somebody who is a member of collection X.” But that's not what this says. It says that Markus knows some entity that is related by an unknown relationship to some unknown other entity. Check that collection X to find out if Markus knows more of them. I'm not saying there will be more in there… just saying that you could check it. Handy for a hypermedia client. Works in practice, doesn't break the model. If you want more semantics, just add them: /people/markus/friends :isACollectionOf [ :hasPredicate foaf:knows; :hasSubject /people/Markus ] But that is _not_ needed to achieve my 1 and 2. Well this certainly adds more triples. Whether it adds more meaning is a separate issue. Best, Ruben It appears that you feel that adding significant new expressive power is somehow less of a change than adding new syntax. I do not feel this way at all. peter
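Ruben's optional extra semantics (:isACollectionOf with :hasPredicate and :hasSubject) could, if a client chose to interpret them, be expanded mechanically into direct triples. A sketch under that assumption; the vocabulary terms come from his ad-hoc example, not from standard Hydra:

```python
# Sketch of expanding Ruben's optional collection description: given
# that a collection :isACollectionOf [ :hasPredicate P; :hasSubject S ],
# each member M of the collection yields the direct triple (S, P, M).
def expand_collection(description, members):
    subject = description["hasSubject"]
    predicate = description["hasPredicate"]
    return [(subject, predicate, m) for m in members]

desc = {"hasSubject": "/people/markus", "hasPredicate": "foaf:knows"}
triples = expand_collection(desc, ["/people/Anna", "/people/Bert"])
```

Whether the expanded triples add meaning beyond the original description is exactly the point Peter questions; the expansion itself is purely mechanical.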
Re: Inference for error checking [was Re: How to avoid that collections break relationships]
On 03/31/2014 08:31 AM, David Booth wrote: On 03/30/2014 03:13 AM, Pat Hayes wrote: [...] What follows from knowing that ppp schema:domainIncludes ccc . ? Suppose you know this and you also know that x ppp y . Can you infer x rdf:type ccc? I presume not, since the domain might include other stuff outside ccc. So, what *can* be inferred about the relationship between x and ccc ? As far as I can see, nothing can be inferred. If I am wrong, please enlighten me. But if I am right, what possible utility is there in even making a schema:domainIncludes assertion? If inference is too strong, let me weaken my question: what possible utility **in any way whatsoever** is provided by knowing that schema:domainIncludes holds between ppp and ccc? What software can do what with this, that it could not do as well without this? I think I can answer this question quite easily, as I have seen it come up before in discussions of logic. Entailment produces statements that are known to be true, given a set of facts and entailment rules. And indeed, adding the fact that ppp schema:domainIncludes ccc . to a set of facts produces no new entailments in that sense. Is it then your contention that schema:domainIncludes does not add any new entailments under the schema.org semantics? But it *does* enable another kind of very useful machine-processable inference that is useful in error checking, which I'll describe. In error checking, it is sometimes useful to classify a set of statements into three categories: Passed, Failed or Indeterminate. Passed means that the statements are fine (within the checkable limits anyway): sufficient information has been provided, and it is internally consistent. Failed means that there is something malformed about them (according to the application's purpose). 
Indeterminate means that the system does not have enough information to know whether the statements are okay or not: further work might need to be performed, such as manual examination or adding more information (facts) to the system. Hence, it is *useful* to be able to quickly and automatically establish that the statements fall into the Passed or Failed category. Note that this categorization typically relies on making a closed world assumption (CWA), which is common for an application to make for a particular purpose -- especially error checking. I don't see that the CWA is particularly germane here, except that most formalisms that do this sort of checking also utilize some sort of CWA. There is nothing wrong with performing this sort of analysis in formalisms that do not have any form of CWA. What does cause problems with this sort of analysis is the presence of non-trivial inference. In this example, let us suppose that to pass, the object of every predicate must be in the Known Domain of that predicate, where the Known Domain is the union of all declared schema:domainIncludes classes for that predicate. (Note the CWA here.) Given this error checking objective, if a system is given the facts: x ppp y . y a ccc . then without also knowing that ppp schema:domainIncludes ccc, the system may not be able to determine that these statements should be considered Passed or Failed: the result may be Indeterminate. But if the system is also told that ppp schema:domainIncludes ccc . then it can safely categorize these statements as Passed (within the limits of this error checking).
Sure, but it can be very tricky to determine just what facts to consider when making this determination, particularly with the upside-down nature of schema:domainIncludes. Thus, although schema:domainIncludes does not enable any new entailments under the open world assumption (OWA), it *does* enable some useful error checking inference under the closed world assumption (CWA), by enabling a shift from Indeterminate to Passed or Failed. The CWA actually works against you here. Given the following triples, x ppp y . y rdf:type ddd . ppp schema:domainIncludes ccc . you are determining whether y rdf:type ccc . is entailed, whether its negation is entailed, or neither. The relevant CWA would push these last two together, making it impossible to have a three-way determination, which you want. If anyone is concerned that this use of the CWA violates the spirit of RDF, which indeed is based on the OWA (for *very* good reason), please bear in mind that almost every application makes the CWA at some point, to do its job. David peter
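David's Passed/Failed/Indeterminate scheme can be sketched as a toy checker in plain Python. This is only an illustration of the idea, not anything from the thread itself: the predicate and class names (ppp, ccc, ddd) are the thread's placeholders, and, following David's description, the checker tests the *object* of each triple against the Known Domain, which is the upside-down reading Peter alludes to.

```python
# Toy three-way error checker in the spirit of the scheme described above.
# Predicate/class names (ppp, ccc, ddd) are the thread's placeholders.

# Known Domain: the union of all declared schema:domainIncludes classes
# for each predicate (closed-world: this map is assumed complete).
domain_includes = {
    "ppp": {"ccc"},
}

# rdf:type facts (closed-world: these are all the types we know about).
types = {
    "y": {"ccc"},   # y a ccc .
    "z": {"ddd"},   # z a ddd .
}

def check(subject, predicate, obj):
    """Classify the triple (subject, predicate, obj) as Passed, Failed,
    or Indeterminate against the Known Domain of the predicate."""
    known_domain = domain_includes.get(predicate)
    if known_domain is None:
        return "Indeterminate"          # no domainIncludes declared at all
    node_types = types.get(obj, set())
    if node_types & known_domain:
        return "Passed"                 # a known type is in the Known Domain
    if node_types:
        return "Failed"                 # typed, but outside it (CWA)
    return "Indeterminate"              # untyped node: not enough information

print(check("x", "ppp", "y"))           # Passed
print(check("x", "ppp", "z"))           # Failed
print(check("x", "ppp", "unknown"))     # Indeterminate
```

Note Peter's objection applies directly: the "Failed" branch only exists because the types map is assumed complete, i.e. it collapses "not entailed" and "negation entailed" into one case.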
Re: How to avoid that collections break relationships
On 03/31/2014 08:48 AM, Ruben Verborgh wrote: /people/markus foaf:knows [ hydra:memberOf /people/markus/friends ]. means “Markus knows somebody who is a member of collection X.” But that's not what this says. It says that Markus knows some entity that is related by an unknown relationship to some unknown other entity. Well, obviously we'd have to define the hydra:memberOf predicate… It's not helpful to interpret foaf:knows as “knows” but hydra:memberOf as “unknown relationship”. In this case it certainly is. You want to depend on a particular reading of this non-RDF predicate, and have this reading trigger inferences. A system that only uses the RDF semantics will have no knowledge of this extra semantics and will thus not perform these essential inferences. Yes, when one is being totally formal one should not change from foaf:knows to knows, but there is no formal fallout from this shadiness. And “unknown entity” is intended; this is why you have to fetch it if you're curious. If you want more semantics, just add them: /people/markus/friends :isACollectionOf [ :hasPredicate foaf:knows; :hasSubject /people/Markus ] But that is _not_ needed to achieve my 1 and 2. Well this certainly adds more triples. Whether it adds more meaning is a separate issue. Obviously, we'd define isACollectionOf as well. Again making a significant addition to RDF. It appears that you feel that adding significant new expressive power is somehow less of a change than adding new syntax. I'm not adding any new expressive power. Can you point exactly to where you think I'm doing that? Yes, I define a memberOf predicate that clients have to understand. That's new expressive power. But that's a given if we just define it as owl:inverseProperty hydra:member. Which is precisely my point. You are using OWL, not just RDF. If you want to do this in a way that fits in better with RDF, it would be better to add to the syntax of RDF without adding to the semantics of RDF. Best, Ruben peter
Re: How to avoid that collections break relationships
If you want a hydra solution, then you should do whatever is needed to make it a hydra solution. In actuality, defining things like owl:sameAs is indeed extending RDF. Defining things in terms of OWL connectives also goes beyond RDF. This is different from introducing domain predicates like foaf:friends. (Yes, it is sometimes a bit hard to figure out which side of the line one is on.) peter On 03/31/2014 09:26 AM, Ruben Verborgh wrote: Peter, Please, let's get the discussion back to what we want to achieve in the first place. Right now, the solution is being evaluated on a dozen other things that are not relevant. Proposal: let's discuss the whole abstract RDF container thing on public-lod@w3.org, and solutions to make clients work at public-hy...@w3.org. We're talking here about making clients able to get the members of something. Yes, they will need to interpret some properties. Just like an OWL reasoner needs to interpret owl:sameAs, a Hydra client needs to interpret hydra:member. That is how applications work. In no way is defining a vocabulary extending RDF. RDF is a framework. I'm not adding to the framework. I'm proposing a simple property hydra:memberOf owl:inverseProperty hydra:member. If you really don't like me introducing a property, here's an alternative way of saying the same thing: /people/markus foaf:knows _:x. /people/markus/friends hydra:member _:x. There you go. hydra:member was already defined, I'm not inventing or adding anything. You want to depend on a particular reading of this non-RDF predicate, and have this reading trigger inferences. No I don't want any of that. Why do you think I'd want that? Where did I say I want inferences? Where do I need them? Also, how could it possibly be a non-RDF predicate? RDF simply defines a predicate as an IRI [1]. Again making a significant addition to RDF. When did defining a vocabulary become adding to RDF? Which is precisely my point. You are using OWL, not just RDF.
If you want to do this in a way that fits in better with RDF, it would be better to add to the syntax of RDF without adding to the semantics of RDF. …but this has _never_ been about extending RDF in any way, nor has it been about only using RDF or only using OWL. We don't want any of that. We want: 1. Having a way for clients to find out the members of a specific collection 2. Not breaking the RDF model while doing so The proposed solution achieves both objectives. Best, Ruben [1] http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/#dfn-predicate
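For concreteness, the two formulations Ruben contrasts can be written down as triple sets, with the inverse-property rewriting that an OWL reasoner would perform made explicit. A small sketch in Python (everything here is illustrative; note also that the actual OWL term for declaring an inverse is owl:inverseOf, not the owl:inverseProperty the thread mentions):

```python
# Sketch (not normative): Ruben's two ways of saying "Markus knows
# someone who is in the collection /people/markus/friends".

# Formulation 1: via the proposed inverse property hydra:memberOf.
form1 = {
    ("/people/markus", "foaf:knows", "_:x"),
    ("_:x", "hydra:memberOf", "/people/markus/friends"),
}

# Formulation 2: only the existing hydra:member property, inverted by hand.
form2 = {
    ("/people/markus", "foaf:knows", "_:x"),
    ("/people/markus/friends", "hydra:member", "_:x"),
}

def expand_inverse(facts, prop, inverse):
    """Rewrite prop into its inverse: the step an OWL reasoner would take
    given the declaration `prop owl:inverseOf inverse`."""
    return {((o, inverse, s) if p == prop else (s, p, o))
            for (s, p, o) in facts}

# Applying the inverse declaration makes the two formulations coincide,
# which is Peter's point: the equivalence itself lives in OWL, not RDF.
print(expand_inverse(form1, "hydra:memberOf", "hydra:member") == form2)  # True
```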
Re: How to avoid that collections break relationships
On 03/30/2014 12:13 AM, Pat Hayes wrote: On Mar 29, 2014, at 8:10 PM, Peter F. Patel-Schneider pfpschnei...@gmail.com wrote: On 03/29/2014 03:30 PM, Markus Lanthaler wrote: On Wednesday, March 26, 2014 5:26 AM, Pat Hayes wrote: Hmm. I would be inclined to violate IRI opacity at this point and have a convention that says that any schema.org property schema:ppp can have a sister property called schema:pppList, for any character string ppp. So you ought to check schema:knowsList when you are asked to look for schema:knows. Then although there isn't a link in the conventional sense, there is a computable route from schema:knows to schema:knowsList, which as far as I am concerned amounts to a link. Schema.org doesn't suffer from this issue as much as other vocabularies do as it isn't defined with RDFS but uses its own, looser description mechanisms such as schema:domainIncludes and schema:rangeIncludes. So what I'm really looking for is a solution that would work in general, not just for some vocabularies. [...] -- Markus Lanthaler @markuslanthaler I would like to see some firm definition of just how these looser description mechanisms actually work. Yes, I agree. Let me put the question rather more sharply. What follows from knowing that ppp schema:domainIncludes ccc . ? Suppose you know this and you also know that x ppp y . Can you infer x rdf:type ccc? I presume not, since the domain might include other stuff outside ccc. So, what *can* be inferred about the relationship between x and ccc ? As far as I can see, nothing can be inferred. If I am wrong, please enlighten me. But if I am right, what possible utility is there in even making a schema:domainIncludes assertion? If inference is too strong, let me weaken my question: what possible utility **in any way whatsoever** is provided by knowing that schema:domainIncludes holds between ppp and ccc? What software can do what with this, that it could not do as well without this? 
Having a piece of formalism which claims to be a 'weak' assertion becomes simply ludicrous when it is so weak that it carries no content at all. This bears the same relation to axiom writing that miming does to wrestling. Pat Perhaps this could be sharpened somewhat: the relation that professional wrestling bears to wrestling. peter
Re: How to avoid that collections break relationships
On 03/29/2014 03:30 PM, Markus Lanthaler wrote: On Wednesday, March 26, 2014 5:26 AM, Pat Hayes wrote: Hmm. I would be inclined to violate IRI opacity at this point and have a convention that says that any schema.org property schema:ppp can have a sister property called schema:pppList, for any character string ppp. So you ought to check schema:knowsList when you are asked to look for schema:knows. Then although there isn't a link in the conventional sense, there is a computable route from schema:knows to schema:knowsList, which as far as I am concerned amounts to a link. Schema.org doesn't suffer from this issue as much as other vocabularies do as it isn't defined with RDFS but uses its own, looser description mechanisms such as schema:domainIncludes and schema:rangeIncludes. So what I'm really looking for is a solution that would work in general, not just for some vocabularies. [...] -- Markus Lanthaler @markuslanthaler I would like to see some firm definition of just how these looser description mechanisms actually work. peter
Re: How to avoid that collections break relationships
Let's see if I have this right. You are encountering a situation where the number of people Markus knows is too big (somehow). The proposed solution is to move this information to a separate location. I don't see how this helps in reducing the size of the information, which was the initial problem. Splitting this information into pieces might help. schema.org, along with just about every other RDF syntax, does not require that all the information about a particular entity is in the same spot. The problem then is to ensure that all the information is accessed together. schema.org, somewhat separate from other RDF syntaxes, does have facilities for this. All you need to do is to set up multiple pages, for example .../markus1 through .../markusn, and on each of these pages include schema.org markup with content like .../markusi schema:url .../markus .../markus schema:knows .../friendi1 ... .../markus schema:knows .../friendimi Then on .../markus you have .../markus schema:url .../markus1 ... .../markus schema:url .../markusn (Maybe schema:sameAs is a better relationship to use here, but they both should work.) Voila! (With the big proviso that I have no idea whether the schema.org processors actually do the right thing here, as there is no indication of what they do do.) peter PS: LDP?? On 03/24/2014 08:24 AM, Markus Lanthaler wrote: Hi all, We have an interesting discussion in the Hydra W3C Community Group [1] regarding collections and would like to hear more opinions and ideas. I'm sure this is an issue a lot of Linked Data applications face in practice. Let's assume we want to build a Web API that exposes information about persons and their friends. Using schema.org, your data would look somewhat like this: /markus a schema:Person ; schema:knows /alice ; ... schema:knows /zorro . All this information would be available in the document at /markus (please let's not talk about hash URLs etc. here, ok?).
Depending on the number of friends, the document however may grow too large. Web APIs typically solve that by introducing an intermediary (paged) resource such as /markus/friends/. In Schema.org we have ItemList to do so: /markus a schema:Person ; schema:knows /markus/friends/ . /markus/friends/ a schema:ItemList ; schema:itemListElement /alice ; ... schema:itemListElement /zorro . This works, but has two problems: 1) it breaks the /markus --[knows]-- /alice relationship 2) it says that /markus --[knows]-- /markus/friends While 1) can easily be fixed, 2) is much trickier--especially if we consider cases that don't use schema.org with its weak semantics but a vocabulary that uses rdfs:range, such as FOAF. In that case, the statement /markus foaf:knows /markus/friends/ . and the fact that foaf:knows rdfs:range foaf:Person . would yield the wrong inference that /markus/friends is a foaf:Person. How do you deal with such cases? How is schema.org intended to be used in cases like these? Is the above use of ItemList sensible or is this something that should better be avoided? Thanks, Markus P.S.: I'm aware of how LDP handles this issue, but, while I generally like the approach it takes, I don't like the fact that it imposes a specific interaction model. [1] http://bit.ly/HydraCG -- Markus Lanthaler @markuslanthaler
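The unwanted inference Markus describes can be reproduced with a few lines of plain Python: a toy forward application of the rdfs:range rule, with triples modeled as tuples. This is an illustration only; a real RDFS reasoner would reach the same conclusion on this input.

```python
# Toy illustration of the rdfs:range entailment rule:
#   if (p rdfs:range c) and (s p o) then (o rdf:type c).
triples = {
    ("/markus", "foaf:knows", "/markus/friends"),
    ("foaf:knows", "rdfs:range", "foaf:Person"),
}

def apply_range_rule(facts):
    """One round of the rdfs:range rule; returns facts plus entailments."""
    ranges = {s: o for (s, p, o) in facts if p == "rdfs:range"}
    inferred = {(o, "rdf:type", ranges[p])
                for (s, p, o) in facts if p in ranges}
    return facts | inferred

entailed = apply_range_rule(triples)
# The unwanted conclusion: the collection itself is inferred to be a person.
print(("/markus/friends", "rdf:type", "foaf:Person") in entailed)  # True
```

This is exactly why the problem is harder with FOAF than with schema.org: schema:rangeIncludes licenses no such entailment, so pointing schema:knows at an ItemList is merely odd, not contradictory.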
Re: How to avoid that collections break relationships
On 03/25/2014 10:40 AM, Markus Lanthaler wrote: On Tuesday, March 25, 2014 5:49 PM, Peter F. Patel-Schneider wrote: Let's see if I have this right. You are encountering a situation where the number of people Markus knows is too big (somehow). The proposed solution is to move this information to a separate location. I don't see how this helps in reducing the size of the information, which was the initial problem. Cynical as usual :-) Let's just assume that the vast majority of the clients aren't interested in Markus' friends but just in information about him. Thus, they shouldn't have to process megabytes of friend relationships that they are going to ignore anyway. Those few clients that are interested in those relationships, however, need a mechanism to find them. Aah. However, this is a new requirement. So what you want is to be able to cherry-pick the data associated with Markus, and not even have to pay for transmitting the unwanted bits. This is definitely not supported by schema.org. To do this in general would require specifying the data that you want in the request. Splitting this information into pieces might help. schema.org, along with just about every other RDF syntax, does not require that all the information about a particular entity is in the same spot. The problem then is to ensure that all the information is accessed together. schema.org, somewhat separate from other RDF syntaxes, does have facilities for this. All you need to do is to set up multiple pages, for example .../markus1 through .../markusn, and on each of these pages include schema.org markup with content like .../markusi schema:url .../markus I'm still wondering what schema:url is actually for and how it relates to Microdata's itemid, RDFa's resource and JSON-LD's @id... but that's a separate discussion. Yeah. I'm still waiting for the better documentation that was supposed to be coming shortly after ISWC 2013. .../markus schema:knows .../friendi1 ...
.../markus schema:knows .../friendimi Then on .../markus you have .../markus schema:url .../markus1 ... .../markus schema:url .../markusn (Maybe schema:sameAs is a better relationship to use here, but they both should work.) Yeah, this would of course work, but it doesn't tell the client at all why it should follow schema:url links to /markus{n}. The same is more or less true about schema:sameAs. -- Markus Lanthaler @markuslanthaler Voila! (With the big proviso that I have no idea whether the schema.org processors actually do the right thing here, as there is no indication of what they do do.) peter PS: LDP?? Linked Data Platform: http://www.w3.org/TR/ldp/
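Peter's page-splitting scheme can be simulated in a few lines of Python. The URIs and the two-page split below are purely illustrative, and, per Peter's own proviso, this says nothing about what actual schema.org processors do; it only shows the merge a client would have to perform by following the schema:url links.

```python
# Toy simulation of the multi-page scheme sketched above. Each "page"
# is a set of triples; URIs and the two-page split are illustrative.
pages = {
    "/markus": {("/markus", "schema:url", "/markus1"),
                ("/markus", "schema:url", "/markus2")},
    "/markus1": {("/markus1", "schema:url", "/markus"),
                 ("/markus", "schema:knows", "/alice")},
    "/markus2": {("/markus2", "schema:url", "/markus"),
                 ("/markus", "schema:knows", "/zorro")},
}

def gather(entity):
    """Follow schema:url links from the entity's main page and merge
    every schema:knows statement found on the subsidiary pages."""
    friends = set()
    for (s, p, o) in pages[entity]:
        if s == entity and p == "schema:url":
            for (s2, p2, o2) in pages.get(o, set()):
                if s2 == entity and p2 == "schema:knows":
                    friends.add(o2)
    return friends

print(sorted(gather("/markus")))  # ['/alice', '/zorro']
```

Markus's objection is visible here too: nothing in the data tells a generic client that it *should* run this traversal, i.e. that the schema:url links are where the friend list lives.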
Re: How to avoid that collections break relationships
On 03/24/2014 08:24 AM, Markus Lanthaler wrote: Hi all, [snip] Thanks, Markus P.S.: I'm aware of how LDP handles this issue, but, while I generally like the approach it takes, I don't like the fact that it imposes a specific interaction model. So LDP handles this issue, and is going through the W3C process. Why not hold your nose and use it? Or even better, participate and fix it? peter PS: It took me quite a while to figure out just what LDP was trying to do, and how it was proposing to do it. That's probably a sign that more user-facing documentation is needed.
Re: How to avoid that collections break relationships
On 03/25/2014 11:21 AM, Markus Lanthaler wrote: On Tuesday, March 25, 2014 7:04 PM, Peter F. Patel-Schneider wrote: On 03/24/2014 08:24 AM, Markus Lanthaler wrote: Hi all, [snip] Thanks, Markus P.S.: I'm aware of how LDP handles this issue, but, while I generally like the approach it takes, I don't like the fact that it imposes a specific interaction model. So LDP handles this issue, and is going through the W3C process. Why not hold your nose and use it? Or even better, participate and fix it? Because in my opinion the model LDP is based on is doomed to fail. I expressed that concern a couple of times (at conferences where LDP was presented). If I find the time (which will be tricky as I'm traveling from tomorrow onwards), I may even do a full review of the spec. Probably the best thing to do next is to elucidate all of what you do want, as it appears to be quite complex, involving not just the amount of data being served, but also how to access specific parts of the data under the control of the requester. Otherwise, you are going to continue to get solutions that address only what you are asking about, which do not appear to be meeting your needs. peter