Re: Exporting .org to .md for Sourcehut (sr.ht); ox-md not following Markdown spec?
An update! I am very pleased to announce that my campaign took a little less than 2 weeks to achieve success! :)

And so, from now on, not only I but /everyone/ who prefers writing in .org instead of .md has a clear path to nice-looking rendered HTML that is on par with the default .md at Sourcehut![0] This was the only real gripe I had with that platform, as I generally love what Drew is trying to do over there and I really wanted to continue my support and participation. And now I can!

The main issue was that in-page links, generated by exporting .org -> .md and then processed by Sourcehut's HTML renderer, were getting broken in the process. Now they work, as can be seen for example in the README of one of my projects:

https://sr.ht/~trs-80/rofi-in-elisp/

Please feel free to go there and click in-page links! In fact I cannot remember the last time I got so much enjoyment from something so simple as clicking on a working in-page link! :D

Just to recap, I had taken a two-pronged approach. The first was on the sr.ht mailing list[1], trying to fix the id sanitization in anchor links. After some discussion and a couple of patches, this approach is now live and working.

Simultaneously, I had also been pursuing a Pandoc-based solution (which, amazingly, was /also/ broken, but for a different reason). This second approach has now also borne fruit in the form of a patch and a workaround, and eventually should also become fully supported.[2]

I include both here for the benefit of anyone who comes searching later.

Special thanks to Tim Cross for helping me get the exact issue nailed down in my head so I could go forth onto other mailing lists and fora and explain the issue with clarity.

It really feels great to make some small contribution back to the larger Orgmode ecosystem which has given me so much.

Cheers,
TRS-80

[0] https://sourcehut.org
[1] https://lists.sr.ht/~sircmpwn/sr.ht-discuss/%3Cfe7aa296-9c90-463d-b4e6-50eeb7e57428%40localhost%3E
[2] https://github.com/jgm/pandoc/issues/6916
Re: Exporting .org to .md for Sourcehut (sr.ht); ox-md not following Markdown spec?
TRS-80 writes:

> On 2020-12-02 16:59, Tim Cross wrote:
>
>> I note that in the email thread you referenced, the last post suggests
>> setting up a custom readme format which would allow you to use HTML.
>> Maybe that is the easiest route to take - org -> html with custom
>> readme?
>
> Unfortunately, the Org HTML exporter (which is in fact the parent that
> the Markdown exporter was derived from) also makes extensive use of the
> id attribute and anchor links. So I am afraid those would be sanitized
> out exactly the same.

OK. My reading of that response was that the custom approach would give you full control over the HTML. If they still run some sort of sanitiser, then you would still have an issue, as you actually don't have full control. However, that does seem like an odd process. Going to the trouble of setting up a custom workflow and then having some arbitrary sanitiser come in at the end and move the goal posts seems broken to me.

--
Tim Cross
Re: Exporting .org to .md for Sourcehut (sr.ht); ox-md not following Markdown spec?
On 2020-12-02 16:59, Tim Cross wrote:

> TRS-80 writes:
>
>> I think the problem is actually because Sourcehut are sanitizing the id
>> attribute out of links, as I have replied already to some other people
>> in this thread.
>
> From what I can tell, yes you're right. However, it also seems that this
> is an arbitrary decision by sourcehut. There doesn't seem to be anything
> in the CommonMark spec which prevents the id attribute. The CommonMark
> spec explicitly supports raw HTML including attributes. This also makes
> me think the problem is not with the org mode exporter either.

You know, as much as my last email may have sounded otherwise, I am now also thinking this way. Whitelisting the id attribute should (in theory) be the least amount of work. I have replied back on the thread at Sourcehut asking if there is some (security or other) reason they are blocking it. Hopefully that approach bears fruit.

> I note that in the email thread you referenced, the last post suggests
> setting up a custom readme format which would allow you to use HTML.
> Maybe that is the easiest route to take - org -> html with custom
> readme?

Unfortunately, the Org HTML exporter (which is in fact the parent that the Markdown exporter was derived from) also makes extensive use of the id attribute and anchor links. So I am afraid those would be sanitized out exactly the same.

Cheers,
TRS-80
Re: Exporting .org to .md for Sourcehut (sr.ht); ox-md not following Markdown spec?
TRS-80 writes:

> On 2020-12-02 14:44, Tim Cross wrote:
>
> I think the problem is actually because Sourcehut are sanitizing the id
> attribute out of links, as I have replied already to some other people
> in this thread.

From what I can tell, yes you're right. However, it also seems that this is an arbitrary decision by sourcehut. There doesn't seem to be anything in the CommonMark spec which prevents the id attribute. The CommonMark spec explicitly supports raw HTML including attributes. This also makes me think the problem is not with the org mode exporter either.

Basically, sourcehut is using the CommonMark spec plus their own 'extensions/modifications' to that spec. The org exporter is not breaking the CommonMark spec. Org mode exports could use what could be argued is a simpler link target export style, but perhaps using the 'raw' HTML approach makes it more flexible/compliant with different markdown flavors? Problem is, changing this now could result in lots of breakage for others where it is now working.

I note that in the email thread you referenced, the last post suggests setting up a custom readme format which would allow you to use HTML. Maybe that is the easiest route to take - org -> html with custom readme?

--
Tim Cross
Re: Exporting .org to .md for Sourcehut (sr.ht); ox-md not following Markdown spec?
On 2020-12-02 14:56, TRS-80 wrote:

> On 2020-12-02 14:12, Jean Louis wrote:
>
>> Try using pandoc Org to Markdown as that could help until Org
>> exporting starts working how it should.
>
> Great minds must think alike! :) I tried that already, but in-page
> links which look like:
>
> ```
> [[*Setup][Setup]]
> ```
>
> Somehow get exported to:
>
> ```
> *Setup*
> ```
>
> ...which is not a link at all, but rather just italicised text. In
> fact, I think I will go now and bring that up to pandoc project...

FWIW, I did go and post to their mailing list:

https://groups.google.com/g/pandoc-discuss/c/D-8J4RGiYsk/m/g45AutiNAAAJ

Cheers,
TRS-80
Re: Exporting .org to .md for Sourcehut (sr.ht); ox-md not following Markdown spec?
On 2020-12-02 14:44, Tim Cross wrote:

> I could be completely wrong here, but I suspect this is a combination
> of the evolving markdown spec (or more specifically, no one standard
> spec) and the age of the org mode markdown exporter.

FWIW, it sort of feels that way to me, too.

> One of the challenges with markdown is that there has never been one
> universally accepted spec.

Yes, I am aware of the history. And this is one of my main criticisms of Markdown, and why I prefer Orgmode (by a wide margin). In fact I agree (strongly) with Karl Voit's article "Org-Mode Is One of the Most Reasonable Markup Languages to Use for Text."[0]

> It might be worth looking in the archive. I seem to recall other
> discussions along these lines some months back. My flawed memory seems
> to recall that it was probably time for org's markdown exporter to be
> updated to fit with the more 'common' markdown standard, but I don't
> recall which that was or whether anyone decided to take that
> responsibility on.

Thanks, I'll have a look. I was also sort of getting the sense that updating the Markdown exporter might be the answer, hence me starting this thread.

> Org already has two markdown flavors - 'generic' markdown and github
> flavoured markdown. Org's current markdown is based on
> http://daringfireball.net/projects/markdown, which probably varies
> enough from the one used by sourcehut to cause the problems you are
> seeing.

I think the problem is actually because Sourcehut are sanitizing the id attribute out of links, as I have replied already to some other people in this thread.

> Unfortunately, this fails to provide a clear path to fix your problem.

Yeah, this is like (at least) my second day into this now. :) Which is why I paused to seek more counsel on the best way forward, as there seem to be no good/clear (or at least, easy) answers. Thanks for the input.

Cheers,
TRS-80

[0] https://karl-voit.at/2017/09/23/orgmode-as-markup-only/
Re: Exporting .org to .md for Sourcehut (sr.ht); ox-md not following Markdown spec?
On Wed, Dec 2, 2020 at 7:54 PM TRS-80 wrote:

>> Some further digging revealed that the ox-md exporter (which itself is
>> derived from the HTML exporter(?)) makes extensive use of the id
>> attribute in links. And Sourcehut's HTML sanitizer only allows href and
>> title attributes (not id).[1]
>>
>> [1] https://man.sr.ht/markdown/#post-processing

On 2020-12-02 14:45, Diego Zamboni wrote:

> (note: Markdown allows embedded HTML, so ox-md's behavior is not
> incorrect)

Right. However, unless I am missing something, they (Sourcehut) are passing the HTML through, but their sanitizer is only allowing a subset of attributes. Look closely (on the sr.ht Markdown page we both have now linked) at what attributes are allowed for links:

a: href, title

Only href and title. No id!

> Note that according to https://man.sr.ht/markdown/#post-processing,
> Sourcehut uses CommonMark, not plain Markdown, so I guess that's why it
> doesn't allow all HTML tags.

Yes, I am aware they use CommonMark. However, it seems to me that CommonMark is just a less ambiguously defined version of Markdown, designed to address those particular criticisms of the original.

I don't know if there is anything in the CommonMark spec itself that forbids passing HTML through, and I did not bother to look that up (and thus I could be wrong). However, I think the problem is rather what I already said above: id attributes being blocked by Sourcehut's particular HTML sanitizer.

> There seems to be no ox-commonmark (that I could find) but pandoc does
> support it, so you could probably use ox-pandoc
> (https://github.com/kawabata/ox-pandoc) to export your documents in
> CommonMark format.

I did try using command line pandoc, and ran into some problems (which I outlined already in a separate reply to Jean Louis), but maybe I will give this a try, too. Thanks.

Cheers,
TRS-80
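
To make the effect of an attribute allowlist concrete, here is a minimal Python sketch. It is not Sourcehut's actual sanitizer; it only assumes the rule quoted above (for `a`, keep `href` and `title`) and uses the standard library's `html.parser`. The names `ALLOWED_ATTRS`, `AttributeAllowlist` and `sanitize` are made up for this illustration.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: this mirrors only the "a: href, title" rule quoted in the
# thread. It is NOT Sourcehut's real allowlist or implementation.
ALLOWED_ATTRS = {"a": {"href", "title"}}


class AttributeAllowlist(HTMLParser):
    """Re-emit HTML, keeping only allowlisted attributes on each tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        allowed = ALLOWED_ATTRS.get(tag, set())
        kept = "".join(
            f' {name}="{escape(value or "", quote=True)}"'
            for name, value in attrs
            if name in allowed
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Character references were decoded by the parser; escape again.
        self.out.append(escape(data, quote=False))


def sanitize(html):
    """Return html with every non-allowlisted attribute removed."""
    parser = AttributeAllowlist()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Under this rule the link itself survives, but an anchor such as `<a id="orgdbf2274"></a>` comes out as an empty `<a></a>`, so a link to `#orgdbf2274` no longer has a target on the page.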
Re: Exporting .org to .md for Sourcehut (sr.ht); ox-md not following Markdown spec?
On 2020-12-02 14:12, Jean Louis wrote:

> Try using pandoc Org to Markdown as that could help until Org
> exporting starts working how it should.

Great minds must think alike! :) I tried that already, but in-page links which look like:

```
[[*Setup][Setup]]
```

Somehow get exported to:

```
*Setup*
```

...which is not a link at all, but rather just italicised text. In fact, I think I will go now and bring that up to pandoc project...

They have several different flavored Markdown exporters available; I tried all of them, in fact.

FWIW, I also tried going the route of exporting to HTML (which is also supported at Sourcehut)[0]. However, I ran into lots of other similar issues like in the OP, as the HTML exporter does the same things with id in links.

I actually worked on this all day yesterday, still without success.

I am of course still open to any suggestions, but my conclusion so far is that fixing the ox-md exporter might be the Right Way, and I am willing to dig into that further myself.

I work extensively in Orgmode and have plans to publish many more things (likely on Sourcehut), and I would rather do whatever work is required to fix ox-md than change all my existing (or new) work to Markdown. I really do hate Markdown that much. :D

Also, maybe in the meantime I could simply hard code a ToC, but that still will not fix other in-page links throughout the page (like the *Setup example, above).

Cheers,
TRS-80

[0] https://man.sr.ht/git.sr.ht/#setting-a-custom-readme
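
For illustration, the mapping one would hope for from an Org heading link to a Markdown in-page link can be sketched in a few lines of Python. This is neither what pandoc does nor the patch or workaround from the pandoc issue. The function names are hypothetical, and the slug rule (lowercase, runs of other characters collapsed to `-`) is an assumption about how a renderer might name heading anchors.

```python
import re

# Matches Org heading links of the form [[*Heading][description]].
ORG_HEADING_LINK = re.compile(r"\[\[\*([^\]]+)\]\[([^\]]+)\]\]")


def slugify(heading):
    """Assumed slug rule: lowercase, non-alphanumeric runs become '-'."""
    return re.sub(r"[^a-z0-9]+", "-", heading.lower()).strip("-")


def org_heading_links_to_markdown(text):
    """Rewrite [[*Heading][desc]] as the Markdown link [desc](#heading)."""
    return ORG_HEADING_LINK.sub(
        lambda m: f"[{m.group(2)}](#{slugify(m.group(1))})", text
    )
```

Applied to the example above, `[[*Setup][Setup]]` becomes `[Setup](#setup)`, a real link rather than emphasised text. Whether `#setup` resolves still depends on the renderer generating a matching id for the heading.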
Re: Exporting .org to .md for Sourcehut (sr.ht); ox-md not following Markdown spec?
Hi TRS-80,

Note that according to https://man.sr.ht/markdown/#post-processing, Sourcehut uses CommonMark, not plain Markdown, so I guess that's why it doesn't allow all HTML tags. (note: Markdown allows embedded HTML, so ox-md's behavior is not incorrect)

There seems to be no ox-commonmark (that I could find) but pandoc does support it, so you could probably use ox-pandoc (https://github.com/kawabata/ox-pandoc) to export your documents in CommonMark format.

--Diego

On Wed, Dec 2, 2020 at 7:54 PM TRS-80 wrote:

> Hallo,
>
> I became quite interested in what Drew Devault was doing with his
> Sourcehut project, so I decided to join. I was really enjoying
> everything except for the fact that .org files are not supported insofar
> as automatic rendering into nice looking HTML in the same way that
> Markdown files are for the README at the root of the project. And the
> official word is that only Markdown is to be supported.[0]
>
> So I started digging into this; my first try was to use the
> org-md-export-to-markdown function to generate the supported Markdown.
> However, doing it that way broke all in-page links (to headings,
> footnotes, etc.).
>
> Some further digging revealed that the ox-md exporter (which itself is
> derived from the HTML exporter(?)) makes extensive use of the id
> attribute in links. And Sourcehut's HTML sanitizer only allows href and
> title attributes (not id).[1]
>
> For example, here are the sort of links that the ox-md exporter creates:
>
> ToC:
>
> ```
> 1. [rofi-in-elisp](#orgdbf2274)
> ```
>
> Body:
>
> ```
> <a id="orgdbf2274"></a>
>
> # rofi-in-elisp
> ```
>
> Above was copied straight from Eli Schwartz's reply to me in my post to
> the Sourcehut mailing list about this[0] (although I had already noticed
> the same thing as well).
>
> I tend to agree with him that this is not following the Markdown spec,
> where links should instead become simply:
>
> ToC:
>
> ```
> 1. [rofi-in-elisp](#rofi-in-elisp)
> ```
>
> And if so, then the Right Thing to do would be to fix that in the ox-md
> exporter?
>
> However OTOH, I can't help but venture a guess that there must have been
> some reason to do it that way in the first place.
>
> So before I invest any more time going down this path, I thought I would
> take a step back and seek some advice whether this is actually the
> correct path or not?
>
> Cheers,
> TRS-80
>
> [0] https://lists.sr.ht/~sircmpwn/sr.ht-discuss/%3Cfe7aa296-9c90-463d-b4e6-50eeb7e57428%40localhost%3E
> [1] https://man.sr.ht/markdown/#post-processing
Re: Exporting .org to .md for Sourcehut (sr.ht); ox-md not following Markdown spec?
TRS-80 writes:

> Hallo,
>
> I became quite interested in what Drew Devault was doing with his
> Sourcehut project, so I decided to join. I was really enjoying
> everything except for the fact that .org files are not supported insofar
> as automatic rendering into nice looking HTML in the same way that
> Markdown files are for the README at the root of the project. And the
> official word is that only Markdown is to be supported.[0]
>
> So I started digging into this; my first try was to use the
> org-md-export-to-markdown function to generate the supported Markdown.
> However, doing it that way broke all in-page links (to headings,
> footnotes, etc.).
>
> Some further digging revealed that the ox-md exporter (which itself is
> derived from the HTML exporter(?)) makes extensive use of the id
> attribute in links. And Sourcehut's HTML sanitizer only allows href and
> title attributes (not id).[1]
>
> For example, here are the sort of links that the ox-md exporter creates:
>
> ToC:
>
> ```
> 1. [rofi-in-elisp](#orgdbf2274)
> ```
>
> Body:
>
> ```
> <a id="orgdbf2274"></a>
>
> # rofi-in-elisp
> ```
>
> Above was copied straight from Eli Schwartz's reply to me in my post to
> the Sourcehut mailing list about this[0] (although I had already noticed
> the same thing as well).
>
> I tend to agree with him that this is not following the Markdown spec,
> where links should instead become simply:
>
> ToC:
>
> ```
> 1. [rofi-in-elisp](#rofi-in-elisp)
> ```
>
> And if so, then the Right Thing to do would be to fix that in the ox-md
> exporter?
>
> However OTOH, I can't help but venture a guess that there must have been
> some reason to do it that way in the first place.
>
> So before I invest any more time going down this path, I thought I would
> take a step back and seek some advice whether this is actually the
> correct path or not?
>
> Cheers,
> TRS-80
>
> [0] https://lists.sr.ht/~sircmpwn/sr.ht-discuss/%3Cfe7aa296-9c90-463d-b4e6-50eeb7e57428%40localhost%3E
> [1] https://man.sr.ht/markdown/#post-processing

I could be completely wrong here, but I suspect this is a combination of the evolving markdown spec (or more specifically, no one standard spec) and the age of the org mode markdown exporter.

This probably highlights the advantages of a standardised spec. One of the challenges with markdown is that there has never been one universally accepted spec. While the situation has consolidated somewhat since markdown first became popular, there is still some variation in implementations, and some of the decisions made when the org mode exporter was first implemented may not be as correct/accepted now.

It might be worth looking in the archive. I seem to recall other discussions along these lines some months back. My flawed memory seems to recall that it was probably time for org's markdown exporter to be updated to fit with the more 'common' markdown standard, but I don't recall which that was or whether anyone decided to take that responsibility on.

Org already has two markdown flavors - 'generic' markdown and github flavoured markdown. Org's current markdown is based on http://daringfireball.net/projects/markdown, which probably varies enough from the one used by sourcehut to cause the problems you are seeing.

Unfortunately, this fails to provide a clear path to fix your problem. I guess the 'sane' thing to do would be to look at how the two specs differ and then decide whether that difference can be managed by providing additional customisation options to the existing markdown exporter, or whether the differences are sufficient to warrant another completely different markdown exporter along similar lines to the github flavoured markdown one (probably also worth checking the differences between sourcehut and github, in case github's flavour is closer to what sourcehut expects).

--
Tim Cross
Re: Exporting .org to .md for Sourcehut (sr.ht); ox-md not following Markdown spec?
Try using pandoc Org to Markdown as that could help until Org exporting starts working how it should.
Exporting .org to .md for Sourcehut (sr.ht); ox-md not following Markdown spec?
Hallo,

I became quite interested in what Drew Devault was doing with his Sourcehut project, so I decided to join. I was really enjoying everything except for the fact that .org files are not supported insofar as automatic rendering into nice looking HTML in the same way that Markdown files are for the README at the root of the project. And the official word is that only Markdown is to be supported.[0]

So I started digging into this; my first try was to use the org-md-export-to-markdown function to generate the supported Markdown. However, doing it that way broke all in-page links (to headings, footnotes, etc.).

Some further digging revealed that the ox-md exporter (which itself is derived from the HTML exporter(?)) makes extensive use of the id attribute in links. And Sourcehut's HTML sanitizer only allows href and title attributes (not id).[1]

For example, here are the sort of links that the ox-md exporter creates:

ToC:

```
1. [rofi-in-elisp](#orgdbf2274)
```

Body:

```
<a id="orgdbf2274"></a>

# rofi-in-elisp
```

Above was copied straight from Eli Schwartz's reply to me in my post to the Sourcehut mailing list about this[0] (although I had already noticed the same thing as well).

I tend to agree with him that this is not following the Markdown spec, where links should instead become simply:

ToC:

```
1. [rofi-in-elisp](#rofi-in-elisp)
```

And if so, then the Right Thing to do would be to fix that in the ox-md exporter?

However OTOH, I can't help but venture a guess that there must have been some reason to do it that way in the first place.

So before I invest any more time going down this path, I thought I would take a step back and seek some advice whether this is actually the correct path or not?

Cheers,
TRS-80

[0] https://lists.sr.ht/~sircmpwn/sr.ht-discuss/%3Cfe7aa296-9c90-463d-b4e6-50eeb7e57428%40localhost%3E
[1] https://man.sr.ht/markdown/#post-processing
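
As a rough illustration of the rewrite proposed above, the following Python sketch post-processes ox-md output so that links point at heading slugs instead of generated ids. It is a stopgap idea, not a fix to ox-md itself. It assumes the exporter places an `<a id="..."></a>` line before each referenced heading, and that the renderer derives heading anchors with the slug rule shown (lowercase, runs of other characters collapsed to `-`). Both the rule and the function names are assumptions made for this example.

```python
import re

ANCHOR = re.compile(r'^<a id="([^"]+)"></a>\s*$')
HEADING = re.compile(r"^#+\s+(.*\S)\s*$")


def slugify(title):
    """Assumed slug rule: lowercase, non-alphanumeric runs become '-'."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def rewrite_anchors(markdown):
    """Drop <a id> anchor lines and retarget links at heading slugs."""
    id_to_slug = {}
    pending = None  # id from the most recent anchor line, if any
    kept = []
    for line in markdown.splitlines():
        anchor = ANCHOR.match(line)
        if anchor:
            pending = anchor.group(1)
            continue  # the anchor line itself is removed
        heading = HEADING.match(line)
        if heading and pending is not None:
            # The first heading after an anchor is taken as its target.
            id_to_slug[pending] = slugify(heading.group(1))
            pending = None
        kept.append(line)
    text = "\n".join(kept)
    # Retarget (#id) link destinations; unknown ids are left untouched.
    return re.sub(
        r"\(#([^)\s]+)\)",
        lambda m: "(#" + id_to_slug.get(m.group(1), m.group(1)) + ")",
        text,
    )
```

Run on the ToC and body example above, this turns `[rofi-in-elisp](#orgdbf2274)` into `[rofi-in-elisp](#rofi-in-elisp)` and removes the anchor line. It only helps if the renderer actually assigns ids to headings in the assumed way.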