Re: [NTG-context] Permissible characters in ConTeXt reference labels

2014-09-18 Thread Mark Szepieniec
OK, thanks to both of you. It looks like I need to sanitize all of the
characters mentioned, since the reference strings will generally originate
from formats other than ConTeXt, and we don't want ConTeXt to do any
processing on them aside from the comparisons needed to resolve references.

As for Aditya's examples, the first results in a compilation error on my
test file, while the second compiles without error, and gives the expected
result.
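
Since the labels only need to survive comparison, one option is a one-way
substitution applied identically wherever a label is defined or referenced.
The Python sketch below is illustrative only: the name sanitize_label and the
"ux" + hex-code replacement scheme are assumptions made for this example, not
something settled in this thread.

```python
# Candidate set proposed in this thread: \#[],{}%()|=
UNSAFE = set('\\#[],{}%()|=')


def sanitize_label(label):
    """Replace each unsafe character with 'ux' plus its hex code point.

    Hypothetical scheme: the result contains only characters outside
    UNSAFE plus the ASCII letters and digits of the replacement token.
    """
    return ''.join(
        'ux{:x}'.format(ord(ch)) if ch in UNSAFE else ch
        for ch in label
    )


if __name__ == '__main__':
    print(sanitize_label('fig#1'))
    print(sanitize_label('some, reference'))
```

The mapping is not strictly collision-free: a label that already contains the
literal text "ux23" would compare equal to one containing "#". Whether that
matters depends on the labels the source format can produce.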

___
If your question is of interest to others as well, please add an entry to the 
Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : http://foundry.supelec.fr/projects/contextrev/
wiki : http://contextgarden.net
___

Re: [NTG-context] Permissible characters in ConTeXt reference labels

2014-09-17 Thread Mark Szepieniec
Bump...

If it's not too much trouble, I would greatly appreciate some feedback on
this before I propose it to be merged into pandoc; even a "looks good to
me" from one of the ConTeXt gurus would be very helpful.

Thanks in advance,

Mark


Re: [NTG-context] Permissible characters in ConTeXt reference labels

2014-09-17 Thread Hans Hagen

On 9/18/2014 12:06 AM, Mark Szepieniec wrote:

Bump...

If it's not too much trouble, I would greatly appreciate some feedback
on this before I propose it to be merged into pandoc; even a "looks good
to me" from one of the ConTeXt gurus would be very helpful.

Thanks in advance,

Mark

On Tue, Sep 9, 2014 at 12:20 AM, Mark Szepieniec mszep...@gmail.com wrote:

I'm trying to fix a problem in pandoc (see
https://github.com/jgm/pandoc/pull/1589) where it doesn't properly
sanitize the reference labels in ConTeXt output, causing errors
during compilation when a label contains '#' for example. Note that
this sanitizing is needed in addition to the regular backslash
escaping used for control characters: '\#' is still illegal in a
label for example.

In the sanitizer function I'm writing, I'd like to properly escape
all illegal characters, but I couldn't find an explicit list of
allowed or illegal characters. Based on some testing I've conducted
(see attached file), I've arrived at the following set:

\#[],{}%()|=


it depends on where these characters end up

#  : always tricky as it denotes an argument, so escape
[] : depends if it gets fed into a macro that uses [] as delimiters
{} : only an issue when not balanced
%  : escaping needed as it's comment otherwise
() : depends on where it ends up, like []
|  : is special in context so needs escaping
\  : of course that one needs escaping
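
The per-character notes above could be captured in a small checker. This is
only a Python sketch: the names ALWAYS_ESCAPE, CONTEXT_DEPENDENT,
braces_balanced and label_issues are hypothetical, and the classification
simply restates the list above rather than any ConTeXt documentation.

```python
# Characters the list above says always need escaping.
ALWAYS_ESCAPE = {
    '#': 'denotes an argument',
    '%': 'starts a comment',
    '|': 'is special in ConTeXt',
    '\\': 'needs escaping',
}

# Characters that are only a problem depending on where they end up.
CONTEXT_DEPENDENT = {
    '[': 'may clash with [] delimiters',
    ']': 'may clash with [] delimiters',
    '(': 'depends on where it ends up',
    ')': 'depends on where it ends up',
}


def braces_balanced(label):
    """Return True if every '{' is closed by a later '}'."""
    depth = 0
    for ch in label:
        if ch == '{':
            depth += 1
        elif ch == '}':
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def label_issues(label):
    """Return (character, note) pairs for everything worth a second look."""
    issues = []
    for table in (ALWAYS_ESCAPE, CONTEXT_DEPENDENT):
        for ch, note in table.items():
            if ch in label:
                issues.append((ch, note))
    # Braces are only reported when they do not pair up.
    if not braces_balanced(label):
        issues.append(('{}', 'unbalanced braces'))
    return issues
```

The comma and "=" from the proposed set are left out here because the list
above does not mention them.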


1) Does this look like a reasonable set? Are there other characters
or sequences that should be included, or are worth testing?


keep in mind that escapes should end up unescaped at some point
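
One way to read this remark is that whatever escaping is applied, something
must be able to recover the original character later. The Python sketch below
shows a reversible encoding as one possibility. The marker character "Q", the
two-digit hex format, and the names encode and decode are all assumptions made
for illustration; the marker itself is encoded too, so decoding is unambiguous.

```python
# Candidate set proposed in this thread: \#[],{}%()|=
UNSAFE = set('\\#[],{}%()|=')
MARK = 'Q'  # hypothetical escape marker, chosen because it is a plain letter


def encode(label):
    """Encode unsafe characters (and the marker itself) as MARK + 2 hex digits."""
    out = []
    for ch in label:
        if ch in UNSAFE or ch == MARK:
            out.append('{}{:02x}'.format(MARK, ord(ch)))
        else:
            out.append(ch)
    return ''.join(out)


def decode(encoded):
    """Invert encode(): every MARK is followed by exactly two hex digits."""
    out = []
    i = 0
    while i < len(encoded):
        if encoded[i] == MARK:
            out.append(chr(int(encoded[i + 1:i + 3], 16)))
            i += 3
        else:
            out.append(encoded[i])
            i += 1
    return ''.join(out)
```

If the labels are only ever compared and never typeset, the decoding half may
not be needed at all.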


2) I was told (see
https://groups.google.com/forum/#!topic/pandoc-discuss/tYpXMUkmbEY)
that if the characters " " and "," didn't work, it would count as a
ConTeXt bug. Is there any truth to that? Please let me know if any
further info is needed on my part.


well, define bug ... one can say the same of < and > in xml -)

if the result ends up in a comma separated list then , can be an issue 
but one can always wrap an argument in {} to hide that



3) Does anyone see issues with this general approach? I'm relatively
new to ConTeXt, so I might be missing either a huge problem, or an
obviously easier way to do this.


i don't know ... i never used pandoc input

Hans

-
  Hans Hagen | PRAGMA ADE
  Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
 | www.pragma-pod.nl
-


Re: [NTG-context] Permissible characters in ConTeXt reference labels

2014-09-17 Thread Aditya Mahajan

On Thu, 18 Sep 2014, Hans Hagen wrote:


On 9/18/2014 12:06 AM, Mark Szepieniec wrote:

Bump...

If it's not too much trouble, I would greatly appreciate some feedback
on this before I propose it to be merged into pandoc; even a "looks good
to me" from one of the ConTeXt gurus would be very helpful.

Thanks in advance,

Mark

On Tue, Sep 9, 2014 at 12:20 AM, Mark Szepieniec mszep...@gmail.com wrote:

I'm trying to fix a problem in pandoc (see
https://github.com/jgm/pandoc/pull/1589) where it doesn't properly
sanitize the reference labels in ConTeXt output, causing errors
during compilation when a label contains '#' for example. Note that
this sanitizing is needed in addition to the regular backslash
escaping used for control characters: '\#' is still illegal in a
label for example.


(LaTeX label) = (ConTeXt reference). What Mark meant was references such as

\section[...]{...} or \startplacefigure[reference={...}].


In the sanitizer function I'm writing, I'd like to properly escape
all illegal characters, but I couldn't find an explicit list of
allowed or illegal characters. Based on some testing I've conducted
(see attached file), I've arrived at the following set:

\#[],{}%()|=


it depends on where these characters end up

#  : always tricky as it denotes an argument, so escape
[] : depends if it gets fed into a macro that uses [] as delimiters
{} : only an issue when not balanced
%  : escaping needed as it's comment otherwise
() : depends on where it ends up, like []
|  : is special in context so needs escaping
\  : of course that one needs escaping


1) Does this look like a reasonable set? Are there other characters
or sequences that should be included, or are worth testing?


keep in mind that escapes should end up unescaped at some point


2) I was told (see
https://groups.google.com/forum/#!topic/pandoc-discuss/tYpXMUkmbEY)
that if the characters " " and "," didn't work, it would count as a
ConTeXt bug. Is there any truth to that? Please let me know if any
further info is needed on my part.


well, define bug ... one can say the same of < and > in xml -)


Since I made that comment on the pandoc mailing list, let me explain.

Consider:

\section[some reference]{Title}

Given how " " behaves elsewhere in ConTeXt, a user would expect the above to
be valid input. If it is not, then it is a bug (or at least surprising).


The same goes for

\section[some, reference]{Title}

if the result ends up in a comma separated list then , can be an issue but 
one can always wrap an argument in {} to hide that
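
On the generating side, the brace-wrapping idea above might look like the
following Python sketch. The function name section_with_reference is
hypothetical, and the sketch assumes the label contains no other special
characters and no unbalanced braces. Whether ConTeXt then treats the braced
text as a single reference has not been checked here.

```python
def section_with_reference(label, title):
    """Emit a \\section line, bracing the label when it contains a comma."""
    # Wrapping in {} is meant to hide the comma from comma-list parsing.
    ref = '{' + label + '}' if ',' in label else label
    return '\\section[' + ref + ']{' + title + '}'


if __name__ == '__main__':
    print(section_with_reference('some, reference', 'Title'))
    print(section_with_reference('some reference', 'Title'))
```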



3) Does anyone see issues with this general approach? I'm relatively
new to ConTeXt, so I might be missing either a huge problem, or an
obviously easier way to do this.


i don't know ... i never used pandoc input


Aditya


[NTG-context] Permissible characters in ConTeXt reference labels

2014-09-08 Thread Mark Szepieniec
I'm trying to fix a problem in pandoc (see
https://github.com/jgm/pandoc/pull/1589) where it doesn't properly sanitize
the reference labels in ConTeXt output, causing errors during compilation
when a label contains '#' for example. Note that this sanitizing is needed
in addition to the regular backslash escaping used for control characters:
'\#' is still illegal in a label for example.

In the sanitizer function I'm writing, I'd like to properly escape all
illegal characters, but I couldn't find an explicit list of allowed or
illegal characters. Based on some testing I've conducted (see attached
file), I've arrived at the following set:

\#[],{}%()|=

1) Does this look like a reasonable set? Are there other characters or
sequences that should be included, or are worth testing?

2) I was told (see
https://groups.google.com/forum/#!topic/pandoc-discuss/tYpXMUkmbEY) that if
the characters " " and "," didn't work, it would count as a ConTeXt bug. Is
there any truth to that? Please let me know if any further info is needed
on my part.

3) Does anyone see issues with this general approach? I'm relatively new to
ConTeXt, so I might be missing either a huge problem, or an obviously
easier way to do this.

Thanks,

Mark


test.tex
Description: TeX document