Re: Exporting .org to .md for Sourcehut (sr.ht); ox-md not following Markdown spec?

2020-12-11 Thread TRS-80

An update!

I am very pleased to announce that my campaign achieved success in a
little less than 2 weeks!  :)

And so, from now on, not only myself but /everyone/ who prefers writing
in .org instead of .md now has a clear path to nice looking rendered
HTML which is on par with the default .md at Sourcehut![0]

This was the only real gripe I had with that platform, as I generally
love what Drew is trying to do over there and I really wanted to
continue my support and participation.  And now I can!

The main issue was that in-page links, generated by exporting .org ->
.md and then processed by Sourcehut's HTML renderer, were getting broken
in the process.  However, they are now working, as can be seen for
example in the README of one of my projects:

https://sr.ht/~trs-80/rofi-in-elisp/

Please feel free to go there and click in-page links!  In fact I cannot
remember the last time I got so much enjoyment from something so simple
as clicking on a working in-page link!  :D

Just to recap, I had taken a two-pronged approach.  The first was on the
sr.ht mailing list[1], trying to fix the id sanitization in anchor links.
After some discussion and a couple of patches, this approach is now live
and working.

However, simultaneously, I had also been pursuing a Pandoc based
solution (which, amazingly, was /also/ broken, but for a different
reason).  This second approach has now also borne fruit in the form of
a patch and a workaround, and eventually should also become fully
supported.[2]

I include both here for the benefit of anyone who comes searching along
later.

Special thanks to Tim Cross for helping me get the exact issue nailed
down in my head so I could go forth onto other mailing lists and fora
and explain the issue with clarity.

It really feels great to make some small contribution back to the larger
Orgmode ecosystem which has given me so much.

Cheers,
TRS-80

[0] https://sourcehut.org
[1] https://lists.sr.ht/~sircmpwn/sr.ht-discuss/%3Cfe7aa296-9c90-463d-b4e6-50eeb7e57428%40localhost%3E

[2] https://github.com/jgm/pandoc/issues/6916



Re: Exporting .org to .md for Sourcehut (sr.ht); ox-md not following Markdown spec?

2020-12-02 Thread Tim Cross


TRS-80  writes:

>> On 2020-12-02 16:59, Tim Cross wrote:
>>> TRS-80  writes:
>>>
>> I note that in the email thread you referenced, the last post suggests
>> setting up a custom readme format which would allow you to use HTML.
>> Maybe that is the easiest route to take - org -> html with custom
>> readme?
>
> Unfortunately, the Org HTML exporter (which is in fact the parent that
> the Markdown exporter was derived from) also makes extensive use of the
> id attribute and anchor links.  So I am afraid those would be sanitized
> out exactly the same.
>

OK. My reading of that response was that the custom approach would give
you full control over the HTML. If they still run some sort of
sanitiser, then you would still have an issue, as you don't actually have
full control. However, that does seem like an odd process. I mean, going
to the trouble of setting up a custom workflow and then having some
arbitrary sanitiser come in at the end and move the goal posts seems
broken to me.


--
Tim Cross



Re: Exporting .org to .md for Sourcehut (sr.ht); ox-md not following Markdown spec?

2020-12-02 Thread TRS-80

On 2020-12-02 16:59, Tim Cross wrote:

TRS-80  writes:

I think the problem is actually because Sourcehut are sanitizing the id
attribute out of links, as I have replied already to some other people
in this thread.


From what I can tell, yes, you're right. However, it also seems that this
is an arbitrary decision by sourcehut. There doesn't seem to be anything
in the CommonMark spec which prevents the id attribute. The CommonMark
spec explicitly supports raw HTML including attributes. This also makes
me think the problem is not with the org mode exporter either.


You know, as much as my last email may have sounded otherwise, I am now
also thinking this way.

Whitelisting the id attribute should (in theory) be the least amount of
work.  I have replied back on the thread at Sourcehut asking if there is
some (security or other) reason they are blocking it.  Hopefully that
approach bears fruit.


I note that in the email thread you referenced, the last post suggests
setting up a custom readme format which would allow you to use HTML.
Maybe that is the easiest route to take - org -> html with custom
readme?


Unfortunately, the Org HTML exporter (which is in fact the parent that
the Markdown exporter was derived from) also makes extensive use of the
id attribute and anchor links.  So I am afraid those would be sanitized
out exactly the same.

Cheers,
TRS-80



Re: Exporting .org to .md for Sourcehut (sr.ht); ox-md not following Markdown spec?

2020-12-02 Thread Tim Cross


TRS-80  writes:

>> On 2020-12-02 14:44, Tim Cross wrote:
>>
> I think the problem is actually because Sourcehut are sanitizing the id
> attribute out of links, as I have replied already to some other people
> in this thread.
>

From what I can tell, yes, you're right. However, it also seems that this
is an arbitrary decision by sourcehut. There doesn't seem to be anything
in the CommonMark spec which prevents the id attribute. The CommonMark
spec explicitly supports raw HTML including attributes. This also makes
me think the problem is not with the org mode exporter either.

Basically, sourcehut is using the CommonMark spec plus their own
'extensions/modifications' to that spec. The org exporter is not
breaking the CommonMark spec.

Org mode exports could use what could be argued a simpler link target
export style, but perhaps using the 'raw' HTML approach makes it more
flexible/compliant with different markdown flavors? Problem is, changing
this now could result in lots of breakage for others where it is now
working.

I note that in the email thread you referenced, the last post suggests
setting up a custom readme format which would allow you to use HTML.
Maybe that is the easiest route to take - org -> html with custom
readme?

--
Tim Cross



Re: Exporting .org to .md for Sourcehut (sr.ht); ox-md not following Markdown spec?

2020-12-02 Thread TRS-80

On 2020-12-02 14:56, TRS-80 wrote:

On 2020-12-02 14:12, Jean Louis wrote:
Try using pandoc Org to Markdown as that could help until Org
exporting start working how it should.


Great minds must think alike!  :)  I tried that already but in-page
links which look like:

```
[[*Setup][Setup]]
```

Somehow get exported to:

```
*Setup*
```

...which is not a link at all, but rather just italicised text.

In fact, I think I will go now and bring that up to pandoc project...


FWIW, I did go and post to their mailing list:

https://groups.google.com/g/pandoc-discuss/c/D-8J4RGiYsk/m/g45AutiNAAAJ

Cheers,
TRS-80



Re: Exporting .org to .md for Sourcehut (sr.ht); ox-md not following Markdown spec?

2020-12-02 Thread TRS-80

On 2020-12-02 14:44, Tim Cross wrote:

I could be completely wrong here, but I suspect this is a combination
of the evolving markdown spec (or more specifically, no one standard
spec) and the age of the org mode markdown exporter.


FWIW, it sort of feels that way to me, too.


One of the challenges with markdown is that there has never been one
universally accepted spec.


Yes, I am aware of the history.  And this is one of my main criticisms
of Markdown, and why I prefer Orgmode (by a wide margin).  In fact I
agree (strongly) with Karl Voit's article "Org-Mode Is One of the Most
Reasonable Markup Languages to Use for Text."[0]


It might be worth looking in the archive. I seem to recall other
discussions along these lines some months back. My flawed memory seems
to recall that it was probably time for org's markdown exporter to be
updated to fit with the more 'common' markdown standard, but I don't
recall which that was or whether anyone decided to take that
responsibility on.


Thanks, I'll have a look.  I was also sort of getting the sense that
updating the Markdown exporter might be the answer, hence me starting
this thread.


Org already has two markdown flavors - 'generic' markdown and github
flavoured markdown. Org's current markdown is based on
http://daringfireball.net/projects/markdown, which probably varies
enough from the one used by sourcehut to cause the problems you are
seeing.


I think the problem is actually because Sourcehut are sanitizing the id
attribute out of links, as I have replied already to some other people
in this thread.


Unfortunately, this fails to provide a clear path to fix your problem.


Yeah, this is like (at least) my second day into this now.  :)  Which is
why I paused to seek more counsel on the best way forward.  As there
seem to be no good/clear (or at least, easy) answers.

Thanks for the input.

Cheers,
TRS-80

[0] https://karl-voit.at/2017/09/23/orgmode-as-markup-only/



Re: Exporting .org to .md for Sourcehut (sr.ht); ox-md not following Markdown spec?

2020-12-02 Thread TRS-80

On Wed, Dec 2, 2020 at 7:54 PM TRS-80 wrote:

Some further digging revealed that the ox-md exporter (which itself
is derived from the HTML exporter(?)) makes extensive use of the id
attribute in links.  And Sourcehut's HTML sanitizer only allows href
and title attributes (not id).[1]

[1] https://man.sr.ht/markdown/#post-processing



On 2020-12-02 14:45, Diego Zamboni wrote:

(note: Markdown allows embedded HTML, so ox-md's behavior is not
incorrect)


Right.  However, unless I am missing something, they (Sourcehut) are
passing the HTML through, but their sanitizer is only allowing a subset
of attributes.  Look closely (on the sr.ht Markdown page we both have
now linked) at what attributes are allowed for links:

a: href, title

Only href and title.  No id!
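
To make the effect of such an allowlist concrete, here is a minimal
sketch in Python.  It is not Sourcehut's actual sanitizer; the
ALLOWED_ATTRS table, the AttrFilter class and the sanitize function are
hypothetical names, and the only rule taken from this thread is
"a: href, title".  It shows that an anchor loses its id while a link
keeps its href.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist mirroring the "a: href, title" rule quoted above.
ALLOWED_ATTRS = {"a": {"href", "title"}}


class AttrFilter(HTMLParser):
    """Re-emit HTML, dropping every attribute that is not allowlisted."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        allowed = ALLOWED_ATTRS.get(tag, set())
        kept = "".join(
            f' {name}="{escape(value, quote=True)}"'
            for name, value in attrs
            if name in allowed and value is not None
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def sanitize(html_text):
    """Return html_text with non-allowlisted attributes removed."""
    parser = AttrFilter()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    # The id that an in-page link would target is stripped ...
    print(sanitize('<a id="orgdbf2274"></a>'))
    # ... while the link pointing at it passes through unchanged.
    print(sanitize('<a href="#orgdbf2274">rofi-in-elisp</a>'))
```

With a filter like this, the link still points at #orgdbf2274 but
nothing on the page carries that id any more, which would explain a
link that renders fine and goes nowhere.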


Note that according to https://man.sr.ht/markdown/#post-processing,
Sourcehut uses CommonMark, not plain Markdown, so I guess that's why
it doesn't allow all HTML tags.


Yes, I am aware they use CommonMark.

However it seems to me that CommonMark is just a less ambiguously
defined version of Markdown, designed to address those particular
criticisms of the original.

I don't know if there is anything in the CommonMark spec itself that
forbids passing HTML through, and I did not bother to look that up (and
thus I could be wrong).  However, I think the problem is rather what I
already said above about id attributes being blocked by Sourcehut's
particular HTML sanitizer.


There seems to be no ox-commonmark (that I could find) but pandoc does
support it, so you could probably use ox-pandoc
(https://github.com/kawabata/ox-pandoc) to export your documents in
CommonMark format.


I did try using command line pandoc, and ran into some problems (which I
outlined already in a separate reply to Jean Louis), but maybe I will
give this a try, too.  Thanks.

Cheers,
TRS-80



Re: Exporting .org to .md for Sourcehut (sr.ht); ox-md not following Markdown spec?

2020-12-02 Thread TRS-80

On 2020-12-02 14:12, Jean Louis wrote:

Try using pandoc Org to Markdown as that could help until Org
exporting start working how it should.


Great minds must think alike!  :)  I tried that already but in-page
links which look like:

```
[[*Setup][Setup]]
```

Somehow get exported to:

```
*Setup*
```

...which is not a link at all, but rather just italicised text.
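
A quick way to spot this kind of loss is to compare the Org source with
the exported Markdown.  The following Python sketch does that under
stated assumptions: it only looks at headline links of the form
[[*Heading][description]], it expects a surviving link to look like
[description](#something), and the name lost_links is hypothetical.  It
does not call pandoc itself.

```python
import re

# Org links to a headline look like [[*Heading][description]].
ORG_HEADLINE_LINK = re.compile(r"\[\[\*([^\]]+)\]\[([^\]]+)\]\]")


def lost_links(org_text, md_text):
    """Return descriptions of Org headline links that have no
    corresponding [description](#...) link in the Markdown text."""
    lost = []
    for _target, desc in ORG_HEADLINE_LINK.findall(org_text):
        md_link = re.compile(r"\[" + re.escape(desc) + r"\]\(#[^)]*\)")
        if not md_link.search(md_text):
            lost.append(desc)
    return lost


if __name__ == "__main__":
    org = "See [[*Setup][Setup]] first."
    # The broken export from above: emphasis instead of a link.
    print(lost_links(org, "See *Setup* first."))
    # A hypothetical export where the link survived.
    print(lost_links(org, "See [Setup](#setup) first."))
```

Running a check like this over each exporter flavour makes it easy to
see which ones drop the link and which ones keep it.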

In fact, I think I will go now and bring that up to pandoc project...

They have several different flavored Markdown exporters available, I
tried all of them in fact.

FWIW, I also tried going the route of exporting to HTML (which is also
supported at Sourcehut)[0] however I ran into lots of other similar
issues like in OP, as the HTML exporter does the same things with id in
links.

I actually worked on this all day yesterday, still without success.  I
am of course still open to any suggestions, but I came to the conclusion
so far that fixing the ox-md exporter might be the Right Way and I was
willing to dig into that further, myself.

I work extensively in Orgmode and have plans to publish many more things
(likely on Sourcehut) and I would rather do whatever work is required to
fix ox-md than change all my existing (or new) work to Markdown.  I
really do hate Markdown that much.  :D

Also, maybe in the meantime I could simply hard code a ToC, but that
still will not fix other in-page links throughout the page (like the
*Setup example, above).

Cheers,
TRS-80

[0] https://man.sr.ht/git.sr.ht/#setting-a-custom-readme



Re: Exporting .org to .md for Sourcehut (sr.ht); ox-md not following Markdown spec?

2020-12-02 Thread Diego Zamboni

Hi TRS-80,

Note that according to https://man.sr.ht/markdown/#post-processing,
Sourcehut uses CommonMark, not plain Markdown, so I guess that's why it
doesn't allow all HTML tags.

(note: Markdown allows embedded HTML, so ox-md's behavior is not incorrect)

There seems to be no ox-commonmark (that I could find) but pandoc does
support it, so you could probably use ox-pandoc
(https://github.com/kawabata/ox-pandoc) to export your documents in
CommonMark format.

--Diego


On Wed, Dec 2, 2020 at 7:54 PM TRS-80  wrote:

> Hallo,
>
> I became quite interested in what Drew Devault was doing with his
> Sourcehut project, so I decided to join.  I was really enjoying
> everything except for the fact that .org files are not supported insofar
> as automatic rendering into nice looking HTML in the same way that
> Markdown files are for the README at the root of the project.  And the
> official word is that only Markdown is to be supported.[0]
>
> So I start digging into this, my first try was to use
> org-md-export-to-markdown function to generate the supported Markdown.
> However, doing it that way broke all inter-page links (to headings,
> footnotes, etc.).
>
> Some further digging revealed that the ox-md exporter (which itself is
> derived from the HTML exporter(?) makes extensive use of the id
> attribute in links.  And Sourcehut's HTML sanitizer only allows href and
> title attributes (not id).[1]
>
> For example, here are the sort of links that the ox-md exporter create:
>
> ToC:
>
> ```
> 1.  [rofi-in-elisp](#orgdbf2274)
> ```
>
> Body:
>
> ```
> <a id="orgdbf2274"></a>
>
> # rofi-in-elisp
> ```
>
> Above was copied straight from Eli Schwartz reply to me in my post to
> Sourcehut mailing list about this[0] (although I had already noticed the
> same thing as well).
>
> I tend to agree with him that this is not following the Markdown spec,
> where links should instead become simply:
>
> ToC:
>
> ```
> 1.  [rofi-in-elisp](#rofi-in-elisp)
> ```
>
> And if so, then the Right Thing to do would be to fix that in the ox-md
> exporter?
>
> However OTOH, I can't help but venture a guess that there must have been
> some reason to do it that way in the first place.
>
> So before I invest any more time going down this path, I thought I would
> take a step back and seek some advice whether this is actually the
> correct path or not?
>
> Cheers,
> TRS-80
>
> [0] https://lists.sr.ht/~sircmpwn/sr.ht-discuss/%3Cfe7aa296-9c90-463d-b4e6-50eeb7e57428%40localhost%3E
> [1] https://man.sr.ht/markdown/#post-processing
>
>


Re: Exporting .org to .md for Sourcehut (sr.ht); ox-md not following Markdown spec?

2020-12-02 Thread Tim Cross


TRS-80  writes:

> Hallo,
>
> I became quite interested in what Drew Devault was doing with his
> Sourcehut project, so I decided to join.  I was really enjoying
> everything except for the fact that .org files are not supported insofar
> as automatic rendering into nice looking HTML in the same way that
> Markdown files are for the README at the root of the project.  And the
> official word is that only Markdown is to be supported.[0]
>
> So I start digging into this, my first try was to use
> org-md-export-to-markdown function to generate the supported Markdown.
> However, doing it that way broke all inter-page links (to headings,
> footnotes, etc.).
>
> Some further digging revealed that the ox-md exporter (which itself is
> derived from the HTML exporter(?) makes extensive use of the id
> attribute in links.  And Sourcehut's HTML sanitizer only allows href and
> title attributes (not id).[1]
>
> For example, here are the sort of links that the ox-md exporter create:
>
> ToC:
>
> ```
> 1.  [rofi-in-elisp](#orgdbf2274)
> ```
>
> Body:
>
> ```
> <a id="orgdbf2274"></a>
>
> # rofi-in-elisp
> ```
>
> Above was copied straight from Eli Schwartz reply to me in my post to
> Sourcehut mailing list about this[0] (although I had already noticed the
> same thing as well).
>
> I tend to agree with him that this is not following the Markdown spec,
> where links should instead become simply:
>
> ToC:
>
> ```
> 1.  [rofi-in-elisp](#rofi-in-elisp)
> ```
>
> And if so, then the Right Thing to do would be to fix that in the ox-md
> exporter?
>
> However OTOH, I can't help but venture a guess that there must have been
> some reason to do it that way in the first place.
>
> So before I invest any more time going down this path, I thought I would
> take a step back and seek some advice whether this is actually the
> correct path or not?
>
> Cheers,
> TRS-80
>
> [0] https://lists.sr.ht/~sircmpwn/sr.ht-discuss/%3Cfe7aa296-9c90-463d-b4e6-50eeb7e57428%40localhost%3E
> [1] https://man.sr.ht/markdown/#post-processing

I could be completely wrong here, but I suspect this is a combination of
the evolving markdown spec (or more specifically, no one standard spec)
and the age of the org mode markdown exporter. This probably highlights
the advantages of a standardised spec.

One of the challenges with markdown is that there has never been one
universally accepted spec. While the situation has consolidated somewhat
since markdown first became popular, there is still some variation in
implementations and some of the decisions made when the org mode
exporter was first implemented may not be as correct/accepted now.

It might be worth looking in the archive. I seem to recall other
discussions along these lines some months back. My flawed memory seems
to recall that it was probably time for org's markdown exporter to be
updated to fit with the more 'common' markdown standard, but I don't
recall which that was or whether anyone decided to take that
responsibility on.

Org already has two markdown flavors - 'generic' markdown and github
flavoured markdown. Org's current markdown is based on
http://daringfireball.net/projects/markdown, which probably varies
enough from the one used by sourcehut to cause the problems you are
seeing.

Unfortunately, this fails to provide a clear path to fix your problem. I
guess the 'sane' thing to do would be to look at how the two specs
differ and then decide whether that difference can be managed by
providing additional customisation options to the existing markdown
exporter, or whether the differences are sufficient to warrant another
completely different markdown exporter along similar lines to the github
flavoured markdown one (it is probably also worth checking the
differences between sourcehut and github, in case github's flavour is
closer to what sourcehut expects).

--
Tim Cross



Re: Exporting .org to .md for Sourcehut (sr.ht); ox-md not following Markdown spec?

2020-12-02 Thread Jean Louis

Try using pandoc Org to Markdown as that could help until Org
exporting starts working how it should.



Exporting .org to .md for Sourcehut (sr.ht); ox-md not following Markdown spec?

2020-12-02 Thread TRS-80

Hallo,

I became quite interested in what Drew Devault was doing with his
Sourcehut project, so I decided to join.  I was really enjoying
everything except for the fact that .org files are not supported insofar
as automatic rendering into nice looking HTML in the same way that
Markdown files are for the README at the root of the project.  And the
official word is that only Markdown is to be supported.[0]

So I started digging into this.  My first try was to use the
org-md-export-to-markdown function to generate the supported Markdown.
However, doing it that way broke all in-page links (to headings,
footnotes, etc.).

Some further digging revealed that the ox-md exporter (which itself is
derived from the HTML exporter(?)) makes extensive use of the id
attribute in links.  And Sourcehut's HTML sanitizer only allows href and
title attributes (not id).[1]

For example, here are the sort of links that the ox-md exporter creates:

ToC:

```
1.  [rofi-in-elisp](#orgdbf2274)
```

Body:

```
<a id="orgdbf2274"></a>

# rofi-in-elisp
```

The above was copied straight from Eli Schwartz's reply to me in my post
to the Sourcehut mailing list about this[0] (although I had already
noticed the same thing as well).

I tend to agree with him that this is not following the Markdown spec,
where links should instead become simply:

ToC:

```
1.  [rofi-in-elisp](#rofi-in-elisp)
```

And if so, then the Right Thing to do would be to fix that in the ox-md
exporter?
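
As an illustration of what a title-derived anchor could look like, here
is a minimal Python sketch.  The slug rule (lowercase, whitespace to
hyphens, other punctuation dropped) is an assumption in the style of
common Markdown renderers, not Sourcehut's documented behaviour, and
heading_slug and toc_entry are hypothetical names, not part of ox-md.

```python
import re


def heading_slug(title):
    """Hypothetical slug rule: lowercase, whitespace runs become
    hyphens, anything other than word characters and hyphens is dropped."""
    slug = title.strip().lower()
    slug = re.sub(r"\s+", "-", slug)
    slug = re.sub(r"[^\w-]", "", slug)
    return slug


def toc_entry(title):
    """Build a Markdown ToC link that targets the title-derived slug."""
    return f"[{title}](#{heading_slug(title)})"


if __name__ == "__main__":
    print(toc_entry("rofi-in-elisp"))
    print(toc_entry("Getting Started!"))
```

One caveat with this scheme: two headings with the same title produce
the same slug, so a real implementation would need some way to
disambiguate them.  Whether that is why ox-md uses generated ids is a
guess, not something established in this thread.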

However OTOH, I can't help but venture a guess that there must have been
some reason to do it that way in the first place.

So before I invest any more time going down this path, I thought I would
take a step back and seek some advice whether this is actually the
correct path or not?

Cheers,
TRS-80

[0] https://lists.sr.ht/~sircmpwn/sr.ht-discuss/%3Cfe7aa296-9c90-463d-b4e6-50eeb7e57428%40localhost%3E

[1] https://man.sr.ht/markdown/#post-processing