Re: Parsing namespaced XML with clojure.data.xml

2016-09-28 Thread Herwig Hochleitner
http://dev.clojure.org/jira/browse/CLJ-2030

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Parsing namespaced XML with clojure.data.xml

2016-09-28 Thread Herwig Hochleitner
So, your comment about using uri-encoding inspired me to just use that as
an encoding to fit in a kw-ns. It seems to work out:
https://github.com/bendlas/data.xml/commit/22cbe21181175d302c884b4ec9162bd5ebf336d7

There are a couple of open issues, which I commented on in the commit.

I'll open a dev thread about the possibility of making clojure.core/alias
auto-creating, with varargs, and exposing it as (ns (:alias al n ak )).
That would make this incarnation of data.xml very convenient to use, as
well as solve similar cases of creating a namespace just for the sake of
naming keywords in it.
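
A minimal sketch of what an auto-creating alias could look like, using only clojure.core. The name `alias-auto` and the example namespace names are hypothetical; this is not the API proposed for clojure.core, just an illustration of the idea of creating the target namespace on demand before aliasing it.

```clojure
;; Hypothetical helper, not part of clojure.core: create each target
;; namespace if it does not exist yet, then alias it in the current one.
(defn alias-auto
  "Takes alternating alias / namespace symbols."
  [& alias-ns-pairs]
  (doseq [[al n] (partition 2 alias-ns-pairs)]
    (create-ns n)
    (alias al n)))

;; The namespace names below are made up for illustration.
(alias-auto 'html 'example.xmlns.xhtml
            'svg  'example.xmlns.svg)

(ns-name (get (ns-aliases *ns*) 'html))
```

After this call, keywords written with the `html` or `svg` alias would resolve to the aliased namespaces in forms read afterwards.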



Re: Parsing namespaced XML with clojure.data.xml

2016-09-24 Thread Herwig Hochleitner
What about skipping the alphabet translation and just doing uri encoding?

{http://www.w3.org/1999/xhtml}pre => :http%3A%2F%2Fwww.w3.org%2F1999%2Fxhtml/pre
doesn't seem so bad and this way we would get uniformity without weird
corner cases.
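
A rough sketch of the URI-encoding idea, assuming java.net.URLEncoder as the encoder. The function names are hypothetical and this is not necessarily how data.xml does it. Note that URLEncoder implements HTML form encoding (for example, a space becomes "+"), so it only approximates strict percent-encoding.

```clojure
(import '(java.net URLEncoder URLDecoder))

;; Hypothetical helpers: encode a namespace URI so that it fits into the
;; namespace part of a keyword, and decode it back again.
(defn uri->kw-ns [uri]
  (URLEncoder/encode uri "UTF-8"))

(defn kw-ns->uri [kw-ns]
  (URLDecoder/decode kw-ns "UTF-8"))

(defn qualified-kw [uri local-name]
  (keyword (uri->kw-ns uri) local-name))

(def pre (qualified-kw "http://www.w3.org/1999/xhtml" "pre"))

(namespace pre)              ;; => "http%3A%2F%2Fwww.w3.org%2F1999%2Fxhtml"
(kw-ns->uri (namespace pre)) ;; => "http://www.w3.org/1999/xhtml"
```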



Re: Parsing namespaced XML with clojure.data.xml

2016-09-17 Thread Matching Socks
No escape needed; or, rather, no need to invent an escape.  A mapping from 
IRI to URI is already specified in RFC 3987, "Internationalized Resource 
Identifiers (IRIs)".  Just translate IRI to URI, and thence to keyword.  
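
As an illustration of the IRI-to-URI step on the JVM, java.net.URI can percent-encode non-ASCII characters as UTF-8 via toASCIIString. This is only a sketch: the name `iri->uri` is hypothetical, and java.net.URI is assumed to be a close enough stand-in for the RFC 3987 mapping (it does not, for instance, handle internationalized host names).

```clojure
(import 'java.net.URI)

;; Hypothetical helper: percent-encode the non-ASCII characters of an IRI.
(defn iri->uri [iri]
  (.toASCIIString (URI. iri)))

(iri->uri "http://example.org/caf\u00e9")
;; => "http://example.org/caf%C3%A9"

;; A plain ASCII URI passes through unchanged.
(iri->uri "http://www.w3.org/1999/xhtml")
;; => "http://www.w3.org/1999/xhtml"
```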



Re: Parsing namespaced XML with clojure.data.xml

2016-09-17 Thread Herwig Hochleitner
2016-09-17 15:10 GMT+02:00 Matching Socks :

> To make a URI into a Clojure keyword namespace, we may simply replace
> the 11 URI characters that are forbidden or problematic in keywords
> with Unicode-alphabetic characters outside Latin-1.
>

Yep, I've been thinking along those lines as well. We'd still need an
escape character, since unicode uris are a thing, but at least we could
substitute :,/,... without it.

> The substitutes should be present in common desktop fonts, and should
> not be mistaken for Latin-1 characters.  They should come from a
> single Unicode script, to avoid burdensome Unicode puns.  It should
> be a raster script that does not require decades of handwriting practice.
>
> Cyrillic fits the bill very well:  it's recognizable and out-of-band.
> You'd never type these URI keywords in, but Cyrillic is a
> software-selectable keyboard so you could if you felt like it.
>
>   http://www.cs.yale.edu/~perlis-alan/quotes.html
>   httpцЛЛwwwЯcsЯyaleЯeduЛжperlis-alanЛquotesЯhtml
>

Cyrillic might serve us well, but maybe there is a set of dedicated
substitution characters in Unicode?

I'm still concerned that doing this might be viewed as an ugly hack, but I
think being able to reuse namespace aliasing is a powerful proposition...



Re: Parsing namespaced XML with clojure.data.xml

2016-09-17 Thread Matching Socks
To make a URI into a Clojure keyword namespace, we may simply replace
the 11 URI characters that are forbidden or problematic in keywords
with Unicode-alphabetic characters outside Latin-1.

The substitutes should be present in common desktop fonts, and should
not be mistaken for Latin-1 characters.  They should come from a
single Unicode script, to avoid burdensome Unicode puns.  It should
be a raster script that does not require decades of handwriting practice.

Cyrillic fits the bill very well:  it's recognizable and out-of-band.  You'd
never type these URI keywords in, but Cyrillic is a software-selectable 
keyboard so you could if you felt like it.

  http://www.cs.yale.edu/~perlis-alan/quotes.html
  httpцЛЛwwwЯcsЯyaleЯeduЛжperlis-alanЛquotesЯhtml

Here is a demonstration of a simple URI <-> keyword translator 
and a keyword-namespace aliasing macro to facilitate relatively 
painless use of namespace literals in source code.

(To furthermore overcome the problem that "%" hex expressions compare
case-blind in URIs, but not keywords, we should norm %xx to %XX as RFC
3986 recommends before converting to a keyword namespace.)

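;; URI characters that are forbidden or problematic in keywords,
;; and their Cyrillic stand-ins (paired by position).
(def problems  [\. \~ \: \/ \[ \] \@ \( \) \, \;])
(def solutions [\Я \ж \ц \Л \П \Ю \Ж \ъ \Ъ \г \д])

;; Returns a fn that replaces each char of a with the matching char of b.
(defn- tr [a b]
  (let [m (zipmap a b)]
    (fn [s]
      (apply str (map #(m % %) s)))))

(def uri->kwns
  (tr problems solutions))

(def kwns->uri
  (tr solutions problems))

;; Creates the namespace named after the translated URI and aliases it.
(defmacro alias-xml-ns [sym uri]
  `(let [kwns# (symbol (uri->kwns ~uri))]
     (create-ns kwns#)
     (alias ~sym kwns#)))

;; Evaluate the forms below one at a time at the REPL: ::html/aside can
;; only be read once the html alias exists.
(comment

  (uri->kwns "http://www.w3.org/2000/01/rdf-schema#")

  (kwns->uri *1)

  (alias-xml-ns 'html "http://www.w3.org/1999/xhtml")

  (assert
   (identical? ::html/aside
               :httpцЛЛwwwЯw3ЯorgЛ1999Лxhtml/aside))

)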
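
The %xx normalization mentioned above could be done with a small regex pass before translating. This is a sketch only; the name `normalize-pct` is hypothetical and not part of the demonstration code.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: upper-case the hex digits of every percent-escape,
;; so that "%2f" and "%2F" end up in the same keyword namespace.
(defn normalize-pct [uri]
  (str/replace uri #"%[0-9a-fA-F]{2}" str/upper-case))

(normalize-pct "http://example.org/a%2fb%3a")
;; => "http://example.org/a%2Fb%3A"
```

Text outside of percent-escapes is left untouched, so the case of the rest of the URI is preserved.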




Re: Parsing namespaced XML with clojure.data.xml

2016-08-28 Thread Herwig Hochleitner
2016-08-25 1:53 GMT+02:00 Matching Socks :

> Namespaced XML is inherently value-comparable and unambiguous.  It would
> be a shame to give up on that, and disperse the burden throughout every
> layer of library and consumer.
>

That's a very good point. Disregarding concerns for edn syntax for a
moment, the best solution would seem to be to standardize on qnames, since
those are the jvm-wide canonical mapping. With reader tags, they are almost
there in terms of read/writability, but not quite as convenient as
::alias/keywords

When designing the namespacing support, I discarded the idea to cram uris
into keywords, because of the impedance mismatch. I thank you for bringing
it up again, though, because I overlooked a very important failure case,
when trying to preserve value semantics:

Say, a library chooses not to use keyword mapping for a given xmlns and
instead matches on qname instances, but then somebody within the system
establishes an alias for that xmlns. Said library will then silently get
keywordized data, that it won't recognize anymore. Unfortunately, I don't
know how to catch that with an explicit error either.
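
The failure case can be illustrated with javax.xml.namespace.QName. The predicate and the keyword below are hypothetical stand-ins for library code and for whatever keyword an alias would produce:

```clojure
(import 'javax.xml.namespace.QName)

;; Hypothetical library code that matches on QName instances only.
(def dav-multistatus (QName. "DAV:" "multistatus"))

(defn multistatus? [tag]
  (= tag dav-multistatus))

(multistatus? (QName. "DAV:" "multistatus")) ;; => true

;; Once someone declares a mapping, the same element arrives as a keyword.
;; The comparison quietly returns false and no error is raised.
(multistatus? :xmlns.dav/multistatus)        ;; => false
```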

So yes, I am open to experimenting with encoding uris into keywords.

> Pretty-printing need not be a concern of the XML parsing library.
> Everyone seems to be interested nowadays in easing the usage of
> namespaced keywords.  Perhaps printing could be improved (globally) to
> use the caller's keyword namespace aliases.
>

Yes, it doesn't feel right to let an easily adaptable concern like
pretty-printing dictate value semantics.

> By all means, use an encoding more legible than Base64.  URLEncoder could
> be an example in the way it uses %.  Pick an escape character that's legal
> in Clojure namespace names, but unusual in the best-known namespace URIs.
> Apostrophe?
>

Yes, or maybe map to unicode lookalikes and escape those, should they ever
occur in a ns-uri.
e.g.

/ -> ⧄
: -> ╎

Though, those are not alphanumeric characters, so probably illegal in
keywords.
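
A quick way to check how the JVM classifies such candidates is Character/isLetter. This only reports the Unicode classification; it is an assumption that this is a useful proxy for what the Clojure and edn keyword rules allow.

```clojure
;; The two lookalikes from above, plus two of the Cyrillic letters
;; suggested elsewhere in this thread.
(def candidates [\⧄ \╎ \Л \Я])

(mapv #(Character/isLetter (char %)) candidates)
;; => [false false true true]
```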

Any thoughts so far?



Re: Parsing namespaced XML with clojure.data.xml

2016-08-24 Thread Matching Socks
Namespaced XML is inherently value-comparable and unambiguous.  It would be 
a shame to give up on that, and disperse the burden throughout every layer of 
library and consumer.

Pretty-printing need not be a concern of the XML parsing library.  Everyone 
seems to be interested nowadays in easing the usage of namespaced 
keywords.  Perhaps printing could be improved (globally) to use the 
caller's keyword namespace aliases.  

Anyway, pretty-printing is always expensive.  If a keyword-conversion step 
must encumber either pretty-printing or everything else, better do it in 
pretty-printing.

Keyword *literals* make the source code easy to read, but composing 
keywords programmatically with a caller-provided namespace might be 
intolerable.  Moreover, providing those namespace mappings would be a messy 
headache for the consumer of XML processing libraries.  The mappings would 
have to pass through layer after layer.  No doubt, every library will 
provide different defaults.  One false step, and you would lose value 
comparability.

By contrast, with well-known keyword namespaces, computed by a well-known 
function from their respective well-known namespace URI, everyone could 
write source code using keyword literals with whatever keyword namespace 
alias they want, and XML structures would be value-comparable.  In the 
short run, the best pretty-print might be actual XML serialization.  In the 
long run, I predict, Clojure's namespaced keywords will go down as smooth 
as fudge.

By all means, use an encoding more legible than Base64.  URLEncoder could 
be an example in the way it uses %.  Pick an escape character that's legal 
in Clojure namespace names, but unusual in the best-known namespace URIs.  
Apostrophe?



Re: Parsing namespaced XML with clojure.data.xml

2016-08-22 Thread Herwig Hochleitner
I've been thinking this over. I'm starting to feel that you are right in
that the arbitrary, global mapping could cause more problems than it would
solve. Even if we could get by with a maintained registry, it would still
be a burden to maintain and to use. Also, there is the open question of
code expecting qnames when, suddenly, somebody declares a new xmlns mapping.

There is the possibility to canonicalize by cramming the xmlns uri into a
readable kw-ns and that would still neatly reuse clojure's ns-alias
facility. What I don't like about the approach is that it would make even
pretty-printed xml parse-trees quite unreadable. While
:xmlns.dav/multistatus vs :xmlns.REFWOgo=/multistatus might not look as
horrifying, consider :xmlns.aHR0cDovL3d3dy53My5vcmcvMTk5OS94aHRtbAo=/p for
an xhtml paragraph.
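
For reference, such namespace names can be produced with java.util.Base64. This is a sketch; the name `base64-ns` and the "xmlns." prefix handling are assumptions. The sample names above appear to encode the URI followed by a trailing newline, which is why the first result below differs slightly from the DAV example.

```clojure
(import 'java.util.Base64)

;; Hypothetical helper: Base64-encode the UTF-8 bytes of a namespace URI
;; and prefix the result with "xmlns.".
(defn base64-ns [uri]
  (str "xmlns."
       (.encodeToString (Base64/getEncoder) (.getBytes uri "UTF-8"))))

(base64-ns "DAV:")   ;; => "xmlns.REFWOg=="
(base64-ns "DAV:\n") ;; => "xmlns.REFWOgo="

(keyword (base64-ns "DAV:") "multistatus")
```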

Maybe it's time to give up on universal value equality of parsed xml and
instead make the keyword mapping à la carte, with a parser / emitter flag.
Technically, universal value equality is already challenged by the qname /
keyword dichotomy and given that we want to retain using ::alias/keywords
there is a decision to be made on whether to make qname the canonical
representation and embrace the multitude of keyword mappings or whether to
eliminate qnames and take the readability hit for canonicalizing the
keyword representation. Do you see any alternative?



Re: Parsing namespaced XML with clojure.data.xml

2016-08-21 Thread Matching Socks
Apps are cobbled together from sub-systems and libraries.  Some of those 
may use clojure.data.xml, either to share their products with their client 
or for their internal purposes.  As soon as two libraries on Clojars differ 
in their namespace-URI to keyword-namespace mapping, has the ship sunk?  

Nonetheless, value equality might sometimes be useful.  How to achieve it?  

There is already a globally distinct, agreed, and unambiguous way to refer 
to each well-known XML namespace URI.  It is the URI itself.  If the 
keyword-namespace must have the same properties, it ought to follow from 
the URI, not be left to the discretion of individual consumers.  

Could there be a well-known translation from namespace-URI to 
keyword-namespace and back?  These keyword namespaces would be cumbersome 
(as they must include the whole URI and also avoid colliding with the 
namespace of any other namespaced keywords anywhere), but consumers could 
alias them conveniently without impacting value comparisons:

(->> 'xmlns.http.www.w3.org.n1999.xhtml
     create-ns
     ns-name
     symbol
     (alias 'xhtml))

To account for the whole space of URIs, without violating the Clojure or 
EDN keyword namespace spec or compromising reverse translation back to the 
URI, you might have to go further.  For example, combine a legible symbol 
name computed with some loss (as an assertion) and a Base-64 encoding...

(->> 'xmlns.http.maven.apache.org.POM.4.0.0.aHR0cDovL21hdmVuLmFwYWNoZS5vcmcvUE9NLzQuMC4wCg==
     create-ns
     ns-name
     symbol
     (alias 'pom))

A well-known formula for namespace keywords representing XML namespaces 
could replace the ad-hoc mutable map and satisfy your dual aims that 
clojure.data.xml applications might use keywords for convenience while also 
maintaining strict value equality of the XML data structures all the way to 
the horizon.  (The data structure would use such keywords for all element 
tags.)



Re: Parsing namespaced XML with clojure.data.xml

2016-08-21 Thread Herwig Hochleitner
2016-08-20 21:43 GMT+02:00 Matching Socks :

>
> Could the same effect be obtained without the global state of namespace
> mappings?  Do all uses of clojure.data.xml in an app, even fully
> encapsulated uses, have to agree about the keyword for any given well-known
> XML namespace URI?
>

Currently, that is the case. The motivation is to ensure value-equality for
parse trees within an application.
Do you have a compelling use case for passing the namespace mapping into
the parse call?



Parsing namespaced XML with clojure.data.xml

2016-08-20 Thread Matching Socks
The future is XML-with-namespaces: POM files and whatnot.  Such cases are 
tricky because more than one notation is possible.  You need a 
namespace-enabled parser to figure out what the XML text really means.  
Luckily, a contributed project, clojure.data.xml, can read 
XML-with-namespaces, and in good idiom return Clojure-namespaced keywords 
for the element names.  (Its present version is 0.1.0-beta1, a 
work-in-progress.)  You configure the namespaces to keywordize as its 
README illustrates:

(declare-ns "xml.html" "http://www.w3.org/1999/xhtml")
(parse-str "
http://www.w3.org/1999/xhtml\;>
...

Could the same effect be obtained without the global state of namespace 
mappings?  Do all uses of clojure.data.xml in an app, even fully 
encapsulated uses, have to agree about the keyword for any given well-known 
XML namespace URI?
