Re: Document fragment vocabulary

2011-08-16 Thread Sebastian Hellmann

On 16.08.2011 14:12, Michael Hausenblas wrote:

It is not really LinkedData friendly.


Why?

It does not scale for large documents. Let's say you have a 200 MB text
file with an average of 3 annotations per line (200,000 lines, 600,000 triples).

Somebody attached an annotation on line 2:

<http://example.com/text.txt#line=2> my:comment "Please remove this line. It is
so negative!" .

When making a request with an RDF/XML Accept header, you would always need to
retrieve all annotations for all lines.
Then, after transferring the 200 MB, the client would throw away all
triples but that one.
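
A minimal Python sketch of this concern follows. It assumes a hypothetical in-memory list of (subject, predicate, object) tuples standing in for the server's full annotation document, scaled down to 1,000 lines; `request_target` and `select_annotations` are illustrative names only. The point it tries to show is that the fragment is split off before the request is made, so the filtering has to happen on the client.

```python
from urllib.parse import urldefrag


def request_target(uri):
    """Return the URL that is actually dereferenced and the fragment kept client-side."""
    url, fragment = urldefrag(uri)
    return url, fragment


def select_annotations(triples, subject_uri):
    """Client-side filtering: keep only triples whose subject is the fragment URI."""
    return [t for t in triples if t[0] == subject_uri]


# Hypothetical server response: 3 annotations for each of 1,000 lines
# (a scaled-down stand-in for the 200,000-line example above).
base = "http://example.com/text.txt"
all_triples = [
    (f"{base}#line={n}", "my:comment", f"annotation {k} on line {n}")
    for n in range(1, 1001)
    for k in range(3)
]

target = f"{base}#line=2"
url, fragment = request_target(target)
wanted = select_annotations(all_triples, target)

# The server only ever sees `url`; everything except `wanted` is discarded.
print(url, fragment, len(all_triples), len(wanted))
```

With the numbers from the example, the same ratio would mean receiving 600,000 triples in order to keep 3.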


@Michael: is there some standardisation regarding URIs for text
going on?


As you've rightly identified, an RFC already exists. What would this 
new standardisation activity be chartered for?


As an aside, this reminds me a bit of http://xkcd.com/927/
Hm, actually you created an extra standard yourself for CSV, because the
approach by Wilde and Dürst did not cover your use case.
It does not fully cover mine either. Potentially, there are a lot of
text-based formats, so there should be a way to extend the pattern somehow.

The approach by Wilde and Dürst[1] seems to lack stability.
I don't know what you mean by this. Lack of take-up, yes. Stability, 
what's that?
Wilde and Dürst provide integrity checks, but there is no proposal that
produces robust fragment IDs, i.e. something that works on the context
and not on line or position. A change at position 0 of the document
might render all fragment IDs obsolete. For example, #range=(574,585)
would no longer be valid if one character were inserted at the
beginning of the document.
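
The fragility can be sketched in a few lines of Python. This is plain string slicing, not the RFC 5147 syntax, and the sample sentence, `by_offset` and `by_context` are made up for illustration. The context-based variant is only one conceivable way of being more robust, not a proposal from any of the cited documents.

```python
def by_offset(text, start, end):
    """Resolve an offset-based fragment: the characters from start up to end."""
    return text[start:end]


def by_context(text, target, prefix):
    """Resolve a context-based fragment: find `target` directly after `prefix`."""
    pos = text.find(prefix + target)
    if pos == -1:
        return None
    begin = pos + len(prefix)
    return text[begin:begin + len(target)]


original = "Linked Data is about using the Web to connect related data."
start = original.index("the Web")
end = start + len("the Web")

# One character inserted at position 0 shifts every later offset by one.
edited = "X" + original

print(repr(by_offset(original, start, end)))         # the intended span
print(repr(by_offset(edited, start, end)))           # same offsets, shifted span
print(repr(by_context(edited, "the Web", "using ")))  # still finds the span
```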


Do you think we could do such standardisation for document fragments 
and text fragments within the Media Fragments Group[3] ?

No. Disclaimer: I'm a MF WG member. Look at our charter [1] ...


Ok, thanks for clarifying that.


Maybe this thread should slowly be moved over to u...@w3.org [2]?

The # part not being sent to the server might be interesting for this
list, as it is a Linked Data problem. I also think we should create an
OWL vocabulary to describe, document and standardize different fragment
identifiers, as Alexander has started to do. But we should only do it with
the W3C. Otherwise it will truly become competing standard 15.

The ontology could also just be descriptive, reflecting the RFCs.
Should we cross-post? Alternatively I could just start another thread there.
Sebastian



Cheers,
Michael

[1] http://www.w3.org/2008/01/media-fragments-wg.html
[2] http://lists.w3.org/Archives/Public/uri/
--
Dr. Michael Hausenblas, Research Fellow
LiDRC - Linked Data Research Centre
DERI - Digital Enterprise Research Institute
NUIG - National University of Ireland, Galway
Ireland, Europe
Tel. +353 91 495730
http://linkeddata.deri.ie/
http://sw-app.org/about.html


Re: Document fragment vocabulary

2011-08-16 Thread Sebastian Hellmann
I just forwarded this mail and my questions to u...@w3c.org without
cross-posting, as some parts are really not interesting for the linked data list.

http://lists.w3.org/Archives/Public/uri/2011Aug/

Regards,
Sebastian



Re: Document fragment vocabulary

2011-08-15 Thread Michael Hausenblas



It is not really LinkedData friendly.




Why?


@Michael: is there some standardisation regarding URIs for text
going on?



As you've rightly identified, an RFC already exists. What would this
new standardisation activity be chartered for?


As an aside, this reminds me a bit of http://xkcd.com/927/



The approach by Wilde and Dürst[1] seems to lack stability.



I don't know what you mean by this. Lack of take-up, yes. Stability,  
what's that?




Do you think we could do such standardisation for document fragments  
and text fragments within the Media Fragments Group[3] ?




No. Disclaimer: I'm a MF WG member. Look at our charter [1] ...


Maybe this thread should slowly be moved over to u...@w3.org [2]?


Cheers,
Michael

[1] http://www.w3.org/2008/01/media-fragments-wg.html
[2] http://lists.w3.org/Archives/Public/uri/

On 16 Aug 2011, at 05:40, Sebastian Hellmann wrote:


Hi Michael and Alex,
sorry for answering so late; I was on holiday in France.
I looked at the three resources provided [1,2,3] and still have
some comments and questions.


1. The part after the # is actually not sent to the server. Are
there any solutions for this? It is not really LinkedData friendly.

Compare
http://linkedgeodata.org/triplify/near/51.03,13.73/1000/class/Amenity
(Currently not working, but it returns all points within a 1000 m radius.)

With a fragment, the client would instead have to compute the addressed
subset of triples from the whole resource itself.


2. [1] is quite basic: it essentially uses character positions and
lines. I made a qualitative comparison of different fragment ID
approaches for text in [4], slide 7.
I was wondering if anybody has researched such properties of URI
fragments. Currently, I am benchmarking the stability of these URIs
using Wikipedia changes.

Has such work been done before?

3. @Alex: In my opinion, your proposed fragment ontology can only
be used to provide documentation for different fragments.

I would rather propose to just use one triple:
<http://www.w3.org/DesignIssues/LinkedData.html#offset__14406-14418>
a <http://nlp2rdf.lod2.eu/schema/string/OffsetBasedString> .
The ontology I made for strings might be generalized to formats
other than text-based ones [5].
One triple is much shorter. As you can see, I also tried to encode
the type of fragment right into the fragment ("offset"), although a
notation like type=offset might be better.


4. @Michael: is there some standardisation regarding URIs for
text going on?
I heard there would be a Language Technology W3C group. The approach
by Wilde and Dürst[1] seems to lack stability.
Do you think we could do such standardisation for document fragments
and text fragments within the Media Fragments Group[3]?
I really thought the liveUrl project was quite good, but it seems
dead[6].



In LOD2[7] and NIF[8] we will need some fragment identifiers to
standardize NLP tools for the LOD2 stack.
It would be great to reuse stuff instead of starting from scratch. I
had to extend [1], for example, because it did not produce stable
URIs and it also did not contain the type of algorithm used to
produce the URI.
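
As an illustration of point 3, a fragment that carries its own type could be taken apart as sketched below. The pattern `<type>__<begin>-<end>` is inferred only from the single example URI in this message; the actual NIF recipes may differ, and `FRAGMENT_PATTERN` and `parse_typed_fragment` are hypothetical names.

```python
import re
from urllib.parse import urldefrag

# Assumed shape, taken from the one example above: <type>__<begin>-<end>
FRAGMENT_PATTERN = re.compile(r"^(?P<kind>[a-z]+)__(?P<begin>\d+)-(?P<end>\d+)$")


def parse_typed_fragment(uri):
    """Split a URI into (document, kind, begin, end); return None if it does not match."""
    document, fragment = urldefrag(uri)
    match = FRAGMENT_PATTERN.match(fragment)
    if match is None:
        return None
    return document, match["kind"], int(match["begin"]), int(match["end"])


uri = "http://www.w3.org/DesignIssues/LinkedData.html#offset__14406-14418"
print(parse_typed_fragment(uri))
```

Because the type is part of the fragment, a consumer can tell which algorithm produced the URI without looking up any additional triples.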


All the best,
Sebastian


[1] http://tools.ietf.org/html/rfc5147
[2] http://tools.ietf.org/html/draft-hausenblas-csv-fragment
[3] http://www.w3.org/TR/media-frags/
[4] http://www.slideshare.net/kurzum/nif-nlp-interchange-format
[5] http://nlp2rdf.lod2.eu/schema/string/
[6] http://liveurls.mozdev.org/index.html
[7] http://lod2.eu
[8] http://aksw.org/Projects/NIF


Re: Document fragment vocabulary

2011-08-05 Thread Sebastian Hellmann
Hi,
I am currently benchmarking several properties for such identifiers. My work
targets strings and all string-based documents.
On http://aksw.org/Projects/NIF you can have a look at the proposed recipes;
there is also a link to some slides. We will propose a standard for this within
the LOD2 project, so everybody is welcome to help and provide use cases.
All the best,
Sebastian
--
Sent with my mobile phone, please excuse my brevity, Sebastian




Document fragment vocabulary

2011-08-04 Thread Alexander Dutton

Hi all,

Say I have an XML document, http://example.org/something.xml, and I
want to talk about some part of it in RDF. As this is XML, being
able to point into it using XPath sounds ideal, leading to something like:

<#fragment> a fragment:Fragment ;
  fragment:within <http://example.org/something.xml> ;
  fragment:locator "/some/path[1]"^^fragment:xpath .

(For now we can ignore whether we wanted a nodeset or a single node,
and how to handle XML namespaces.)

More generally, we might want other ways of locating fragments
(probably with a datatype for each):

* character offsets / ranges
* byte offsets / ranges
* line numbers / ranges
* some sub-rectangle of an image
* XML node IDs
* page ranges of a paginated document

Some of these will be IMT-specific and may need some more thinking
about, but the idea is there.
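
A rough Python sketch of how a consumer might dispatch on the locator's datatype follows. The datatype names ("xpath", "char-range", "line-range"), the comma-separated range syntax and the `Fragment`/`resolve` names are assumptions made for the example, not part of any existing vocabulary. Note that `xml.etree.ElementTree` only supports a subset of XPath and evaluates paths relative to the root element, so the locator is written as `./path[1]` here.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Fragment:
    locator: str   # lexical form of the typed literal
    datatype: str  # hypothetical datatype name, e.g. "xpath"


def resolve(document, fragment):
    """Extract the part of `document` (a str) that `fragment` points at."""
    if fragment.datatype == "xpath":
        root = ET.fromstring(document)
        # ElementTree's XPath subset; returns the text of the first match.
        node = root.find(fragment.locator)
        return None if node is None else node.text
    if fragment.datatype == "char-range":
        begin, end = (int(n) for n in fragment.locator.split(","))
        return document[begin:end]
    if fragment.datatype == "line-range":
        begin, end = (int(n) for n in fragment.locator.split(","))
        return "\n".join(document.splitlines()[begin:end])
    raise ValueError(f"no resolver for datatype {fragment.datatype!r}")


xml_doc = "<some><path>first</path><path>second</path></some>"
print(resolve(xml_doc, Fragment("./path[1]", "xpath")))
print(resolve("hello world", Fragment("6,11", "char-range")))
print(resolve("a\nb\nc", Fragment("1,3", "line-range")))
```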


Has something already done this? Is it even (mostly?) sane?


Yours,

Alex


NB. Our actual use-case is having pointers into an NLM XML file
(embodying a journal article) so we can hook up our in-text reference
pointer¹ URIs to the original XML elements (xref/s) they were
generated from. This will allow us to work out the context of each
citation for use in further analysis of the relationship between the
citing and cited articles.

¹ See
http://opencitations.wordpress.com/2011/07/01/nomenclature-for-citations-and-references/
for an explanation of the terminology.

--
Alexander Dutton
Developer, data.ox.ac.uk, InfoDev, Oxford University Computing Services
Open Citations Project, Department of Zoology, University of Oxford




Re: Document fragment vocabulary

2011-08-04 Thread Michael Hausenblas


Alex,


Has something already done this? Is it even (mostly?) sane?


Sane yes, IMO. Done, sort of, see:

+ URI Fragment Identifiers for the text/plain Media Type [1]
+ URI Fragment Identifiers for the text/csv Media Type [2]

Cheers,
Michael

[1] http://tools.ietf.org/html/rfc5147
[2] http://tools.ietf.org/html/draft-hausenblas-csv-fragment


Re: Document fragment vocabulary

2011-08-04 Thread Alexander Dutton
Hi Michael,

I'm not sure that the URI-style fragment identifiers are expressive or
generalisable enough.

There are a lot of cases where IMTs don't have defined fragment
resolution schemes. I may also want to choose between various ways to
point inside a file of a given IMT. For example, I may want to use XPath
instead of an @id to pick out a node in an HTML file (particularly if
the original author didn't provide any @id attributes).

As a further generalisation one could imagine chaining:

<#fragment> a fragment:Fragment ;
    fragment:within [
        fragment:within <http://example.org/some/archive.zip> ;
        fragment:locator "foo/bar.html"^^fragment:path
    ] ;
    fragment:locator "some-div"^^fragment:html-id .

Were this notional vocab to exist, it'd be the datatype that would
determine the process by which the fragment is extracted from the
original document, not the document's media type.
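
The chaining idea might be sketched in Python as below, using only the standard library. The datatype names "path" and "html-id" follow the example above, while the `Fragment` class, the `resolve` function and the in-memory archive are assumptions for illustration. The HTML handling is deliberately simplistic: it assumes well-nested markup without void elements inside the target.

```python
import io
import zipfile
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Union


@dataclass
class Fragment:
    within: Union[bytes, "Fragment"]  # raw document bytes, or an enclosing fragment
    locator: str
    datatype: str                     # hypothetical names: "path", "html-id"


class _IdFinder(HTMLParser):
    """Collect the text inside elements carrying a given id attribute."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted, self.depth, self.text = wanted, 0, []

    def handle_starttag(self, tag, attrs):
        if self.depth or dict(attrs).get("id") == self.wanted:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.text.append(data)


def resolve(fragment):
    """Resolve the container first, then apply this fragment's locator to it."""
    inner = fragment.within
    data = resolve(inner) if isinstance(inner, Fragment) else inner
    if fragment.datatype == "path":
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            return archive.read(fragment.locator)
    if fragment.datatype == "html-id":
        finder = _IdFinder(fragment.locator)
        finder.feed(data.decode("utf-8"))
        finder.close()
        return "".join(finder.text).encode("utf-8")
    raise ValueError(f"no resolver for datatype {fragment.datatype!r}")


# Build a small archive in memory as a stand-in for archive.zip.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "foo/bar.html",
        '<html><body><div id="some-div">Hello <b>there</b></div><p>other</p></body></html>',
    )

html_file = Fragment(buffer.getvalue(), "foo/bar.html", "path")
div = Fragment(html_file, "some-div", "html-id")
print(resolve(div))
```

Each resolver is selected by the datatype alone, which matches the point that the datatype, not the media type, determines how the fragment is extracted.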


All the best,

Alex




Re: Document fragment vocabulary

2011-08-04 Thread Alexander Dutton
Hi Damian,

On 04/08/11 15:41, Damian Steer wrote:
 
 On 4 Aug 2011, at 14:22, Alexander Dutton wrote:

 <#fragment> a fragment:Fragment ;
  fragment:within <http://example.org/something.xml> ;
  fragment:locator "/some/path[1]"^^fragment:path .
 
 I think you're mostly describing XPointer? [1][2]

Ooh, yes, that's maybe a better fit for what I meant when I said XPath.
I think I'd like a way of encoding "use this XPointer expression to pull
something out of that document" as RDF. As far as I know that
expression-in-RDF bit is missing.

 (For now we can ignore whether we wanted a nodeset or a single node,
 and how to handle XML namespaces.)
 
 XPointer certainly handles this. You might find it a little scary :-)

Pah, scariness is no bother ;-).

 By 'more generally' do you mean non-xml documents?

Yes.

 * character offsets / ranges
 
 [3]

I was thinking character ranges in any character stream (e.g. text
files; but this might not always make sense, such as in binary file
formats).

 Has something already done this? Is it even (mostly?) sane?
 
 a) Yes, b) I shall withhold comment concerning the sanity of xpointer.

The pointer to XPointer is certainly useful and has prompted more
investigation on my part. However, as I mentioned above, we're missing
the jump to RDF. (Sorry if I didn't make that overly clear the first
time around).


Yours,

Alex



Re: Document fragment vocabulary

2011-08-04 Thread Damian Steer

On 4 Aug 2011, at 15:58, Alexander Dutton wrote:

 Hi Damian,
 
 On 04/08/11 15:41, Damian Steer wrote:
 
 On 4 Aug 2011, at 14:22, Alexander Dutton wrote:
 
 <#fragment> a fragment:Fragment ;
 fragment:within <http://example.org/something.xml> ;
 fragment:locator "/some/path[1]"^^fragment:path .
 
 I think you're mostly describing XPointer? [1][2]
 
 Ooh, yes, that's maybe a better fit for what I meant when I said XPath.
 I think I'd like a way of encoding "use this XPointer expression to pull
 something out of that document" as RDF. As far as I know that
 expression-in-RDF bit is missing.

XPointer can be used as fragment-id-from-hell, so use in RDF is straightforward:

<http://www.example.com/something#xpointer(//para[1])> ex:mentions
<http://www.example.com/something-else> .
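
To illustrate how a consumer could use such a subject URI, here is a small Python sketch. The sample XML document and the `split_xpointer`/`select` helpers are made up, and `xml.etree.ElementTree` implements only a subset of XPath, so this covers simple expressions like the one above rather than the full xpointer() scheme. The expression is prefixed with "." because ElementTree evaluates paths relative to an element.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

XPOINTER = re.compile(r"^xpointer\((?P<expr>.*)\)$")


def split_xpointer(uri):
    """Return (document URL, XPath expression or None) for an xpointer() fragment."""
    document, fragment = urldefrag(uri)
    match = XPOINTER.match(fragment)
    return document, (match["expr"] if match else None)


def select(xml_text, expr):
    """Evaluate a simple expression with ElementTree's limited XPath support."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.findall("." + expr)]


subject = "http://www.example.com/something#xpointer(//para[1])"
document, expr = split_xpointer(subject)

# Made-up content standing in for what http://www.example.com/something returns.
sample = "<doc><para>one</para><para>two</para></doc>"
print(document, expr, select(sample, expr))
```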

 By 'more generally' do you mean non-xml documents?
 
 Yes.

Ah, sorry, I wasn't sure. There's quite a bit of work on fragments and
multimedia out there, the most familiar being YouTube's time offsets, and there
is [1] from the W3C.

Damian

[1] http://www.w3.org/TR/media-frags/