Re: Bug: inconsistent escaping of coderef regexp
Hello, Tom Gillespie writes: > We can decouple them, so not an issue. The attached patch implements > the header arg equivalents of -r and -l without making any changes to the > existing switch behavior. This is already conflating the two. I'd like to solve the issue at hand without having header args interfere at all. This can happen later, after a discussion on the ML. >> What do you mean by "it is impossible for the user to specify their own >> coderef regexp that can be used in both cases"? In particular, what is >> a coderef regexp in this context? I know about coderef format, but >> I don't think users are supposed to provide a regexp here. > > I did a first pass implementation and realized that allowing users to > specify coderef-regexp is a bad idea. The attached patch fixes the > divergent behavior of org-bable-tangle-single-block and provides a > standard way to specify a :coderef-prefix regexp so that empty > comments can be stripped. Would you mind answering my questions first? I still don't follow you about the coderef prefix/regexp. Regards, -- Nicolas Goaziou
Re: Bug: inconsistent escaping of coderef regexp
Hi Nicolas, I've included the simplest patch I could come up with for the divergence in behavior between org-babel-tangle-single-file and org-link-search. I think there are two new threads that I need to create. One is related to how to make it possible to specify what should be removed along with the coderef (i.e. coderef prefix), the other is the addition of header arguments that provide the same functionality as switches. Best, Tom > This is already conflating the two. I'd like to solve the issue at hand > without having header args interfere at all. > > This can happen later, after a discussion on the ML. Ok. I've included the simplest version of the fix, which is to use org-src-coderef-regexp in org-babel-tangle-single-file. > Would you mind answering my questions first? I still don't follow you > about the coderef prefix/regexp. https://code.orgmode.org/bzg/org-mode/src/2d78ea57cfad1ddc3e993c949daf117b76315170/lisp/org-src.el#L882 That line defines a hardcoded regular expression for matching coderefs. The codref prefix is the first =[ \t]*= and the coderef regexp is the equivalent to the fully formatted version of that format string. Neither of those can currently be specified by the user. The user should not be able to specify the coderef regexp due to the fact that it is too easy to specify a regexp that will not work correctly and because the format string is needed to make org-link-search work for named coderefs (otherwise you wind up trying to replace .+ in the coderef regexp which is a nightmare). The coderef prefix is something that should probably be configurable by the user so that empty comments are not left in the file. I also looked into detecting the comment character for the language in question, but that is significantly more difficult even using (with-temp-buffer (funcall lang-mode) comment-start) because not all languages have sane comment start values and comment-start is not complete, so we would need a way to manually specify what to exclude anyway. From c30913da6b1c8d6be3670a59ae867df019505af3 Mon Sep 17 00:00:00 2001 From: Tom Gillespie Date: Wed, 7 Apr 2021 12:29:01 -0700 Subject: [PATCH] lisp/ob-tangle.el: Fix coderef removal during tangling * lisp/ob-tangle.el (orb-babel-tangle-single-block): Regularize behavior when removing coderefs during tangling. This fixes an issue where trailing whitespace would be retained when coderefs were removed for tangling. --- lisp/ob-tangle.el | 8 +++- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/lisp/ob-tangle.el b/lisp/ob-tangle.el index aa0373ab8..4c0c3132d 100644 --- a/lisp/ob-tangle.el +++ b/lisp/ob-tangle.el @@ -414,9 +414,8 @@ non-nil, return the full association list to be used by (src-lang (nth 0 info)) (params (nth 2 info)) (extra (nth 3 info)) - (cref-fmt (or (and (string-match "-l \"\\(.+\\)\"" extra) - (match-string 1 extra)) - org-coderef-label-format)) + (coderef (nth 6 info)) + (cref-regexp (org-src-coderef-regexp coderef)) (link (let ((l (org-no-properties (org-store-link nil (and (string-match org-link-bracket-re l) (match-string 1 l @@ -445,8 +444,7 @@ non-nil, return the full association list to be used by (funcall assignments-cmd params)) (when (string-match "-r" extra) (goto-char (point-min)) - (while (re-search-forward - (replace-regexp-in-string "%s" ".+" cref-fmt) nil t) + (while (re-search-forward cref-regexp nil t) (replace-match ""))) (run-hooks 'org-babel-tangle-body-hook) (buffer-string -- 2.26.3
Re: Bug: inconsistent escaping of coderef regexp
Hello, Tom Gillespie writes: > I've included the simplest patch I could come up with for the > divergence in behavior between org-babel-tangle-single-file and > org-link-search. Thank you. I applied it. > https://code.orgmode.org/bzg/org-mode/src/2d78ea57cfad1ddc3e993c949daf117b76315170/lisp/org-src.el#L882 > > That line defines a hardcoded regular expression for matching > coderefs. It is a reasonable one, however, since it just means "the coderef is the last thing on the line". > The codref prefix is the first =[ \t]*= and the coderef > regexp is the equivalent to the fully formatted version of that format > string. Neither of those can currently be specified by the user. [...] > The coderef prefix is something that should probably be configurable > by the user so that empty comments are not left in the file. I'm not convinced this should be configurable, as I fail to see a use-case for such a customization. Note that in the case you mention, comment syntax should probably be part of the coderef label itself. Regards, -- Nicolas Goaziou
Re: Bug: inconsistent escaping of coderef regexp
Hello, Tom Gillespie writes: > It is not possible to strip coderefs when tangling and also search for > those coderefs using org-link-search. This is because org-link-search > uses org-src-coderef-regexp which calls regexp-quote on the regexp > string while org-babel-tangle-single-block does not and uses the > regexp string directly without quoting it. I'm not sure about the best > way to fix this. It seems to me that the call to regexp-quote should > be removed but I'm not entirely sure of the consequences of doing > that. Thoughts? This would be wrong. A format string is not a regexp, and therefore needs to be quoted. E.g., one might want to use "/* %s */". `org-babel-tangle-single-block' should make use of `org-src-coderef-regexp' and, possibly, `org-src-coderef-format'. > PS While on the topic of coderefs, let me drop a note that is a > preview of some of the issues I have encountered while working on a > full formal grammar for org. Having the -l switch control this is an > awful design that induces more complexity into the org-mode grammar > than nearly any other feature. Source block switches are completely > inconsistent with the rest of org and completely undiscoverable. I had > no idea they even existed until I was trying to figure out which > header argument could be used to set the coderef regexp. The -l option > and switches in general need to have their behavior implemented as > part of the standard header arguments like everything else so that > users can migrate away from switches with an eye toward removing them > entirely. Switches predate Babel and its header arguments. Also, they are orthogonal to Babel. I don't know what a sound design would be here, tho. Regards, -- Nicolas Goaziou
Re: Bug: inconsistent escaping of coderef regexp
Nicolas Goaziou writes: > Switches predate Babel and its header arguments. Also, they are > orthogonal to Babel. I don't know what a sound design would be here, > tho. Breaking backwards comparability is a tough call, but somehow I feel that it would be appropriate here. A number of reasons why they should go come to mind: - Rarely used. Searching github I found 146k results for "#+begin_src" but as soon as I added a switch (I tried -n and +n) that dropped to 0 results - Easily replaced by a more consistent syntax, which has multiple benefits IMO + Simplify parsing a bit + Less forms for a new user to learn + Better consistency Perhaps a less sudden approach would to be mark them as depreciated, implement equivalents with the standard syntax, update manual to use the new/standard syntax, then remove at some point in the future? In that case I think it would also be acceptable for new things built for Org (e.g. a grammar) to ignore them. Thoughts? -- Timothy
Re: Bug: inconsistent escaping of coderef regexp
Hi Nicolas, After a bit of investigation I understand the issue better now. There are two problems here. One is an easy single line change, the other is a deeper issue, which is that it is impossible for the user to specify their own coderef regexp that can be used in both cases. No matter what change we make we are likely to break existing org files if users relied on one behavior and not the other. Given this, I would say that it is worse to break tangling behavior than it is to break coderef search because it is obvious to the user when coderef search breaks, whereas a change in tanging behavior is a silent change that users will not be aware of. If we want a temporary fix, a patch is attached, but I would suggest against changing the behavior right now and instead work toward a new, more consistent system using header args. I think that moving to use header args to control these is an opportunity to resolve both issues, and to make a start toward eventually deprecating the switches. The only question that I have right now regarding that implementation is whether we provide header args for just the coderef regexp or also for the coderef format, with the coderef regexp taking precedence. The deeper issue is that the format string that appears in the org-src snippit below is hard coded, and if we allow users to set the coderef format, then it may make sense to let them set that format string. However, this would duplicate the simpler functionality of simply allowing the user to provide their own coderef regexp. At the moment I have two proposed header args which are :coderef-regexp with default matching the current output of the org-src snippit below, and :coderef-tangle which defaults to yes matching the behavior of the existing =-r= switch. There is an option for a 3rd header arg that would directly replaced the =-l= switch :coderef-format, however as mentioned above it adds significant complexity and requires a fourth argument :coderef-surround or something like that which is the hard coded format string in the org-src snippit. I'm working on a basic implementation and will respond in this thread again when I have something worth looking at. Best! Tom For the record there are at least 3 different inconsistent regex that are used to detect coderefs. org-element: (string-match "-l +\"\\([^\"\n]+\\)\"" switches) is duplicated between org-element-src-block-parser and org-element-example-block-parser org-src: (format "\\([ \t]*\\(%s\\)[ \t]*\\)$" (replace-regexp-in-string "%s" (if label (regexp-quote label) "\\([-a-zA-Z0-9_][-a-zA-Z0-9_ ]*\\)") (regexp-quote fmt) nil t)) ob-tangle: (re-search-forward (replace-regexp-in-string "%s" ".+" cref-fmt) nil t) diff --git a/lisp/ob-tangle.el b/lisp/ob-tangle.el index aa0373ab8..677d9d8ba 100644 --- a/lisp/ob-tangle.el +++ b/lisp/ob-tangle.el @@ -446,7 +446,7 @@ non-nil, return the full association list to be used by (when (string-match "-r" extra) (goto-char (point-min)) (while (re-search-forward - (replace-regexp-in-string "%s" ".+" cref-fmt) nil t) + (replace-regexp-in-string "%s" ".+" (org-src-coderef-regexp cref-fmt)) nil t) (replace-match ""))) (run-hooks 'org-babel-tangle-body-hook) (buffer-string
Re: Bug: inconsistent escaping of coderef regexp
Hello, Tom Gillespie writes: > After a bit of investigation I understand the issue better now. > There are two problems here. One is an easy single line change, > the other is a deeper issue, which is that it is impossible for the > user to specify their own coderef regexp that can be used in both > cases. No matter what change we make we are likely to break > existing org files if users relied on one behavior and not the other. [...] > If we want a temporary fix, a patch is attached, but I would suggest > against changing the behavior right now and instead work toward > a new, more consistent system using header args. If possible, I'd like not to conflate current issue with switches deprecation, which needs to be discussed separately. What do you mean by "it is impossible for the user to specify their own coderef regexp that can be used in both cases"? In particular, what is a coderef regexp in this context? I know about coderef format, but I don't think users are supposed to provide a regexp here. Regards, -- Nicolas Goaziou
Re: Bug: inconsistent escaping of coderef regexp
Hi Nicolas, I've attached a patch with a first pass implementation that I think resolves most of the issues. It probably needs a few tests to go along with it, but I think it is the simplest way forward. I tried to make the changes without disrupting the org-babel info structure, but it comes with the cost of having to pull out :coderef-prefix in a number of separate contexts. Best, Tom > If possible, I'd like not to conflate current issue with switches > deprecation, which needs to be discussed separately. We can decouple them, so not an issue. The attached patch implements the header arg equivalents of -r and -l without making any changes to the existing switch behavior. > What do you mean by "it is impossible for the user to specify their own > coderef regexp that can be used in both cases"? In particular, what is > a coderef regexp in this context? I know about coderef format, but > I don't think users are supposed to provide a regexp here. I did a first pass implementation and realized that allowing users to specify coderef-regexp is a bad idea. The attached patch fixes the divergent behavior of org-bable-tangle-single-block and provides a standard way to specify a :coderef-prefix regexp so that empty comments can be stripped. From e017fe3f4fb36da2c8560ae526b8bdfd42dc Mon Sep 17 00:00:00 2001 From: Tom Gillespie Date: Sun, 4 Apr 2021 21:40:32 -0700 Subject: [PATCH] improve org-src-coderef-regexp and regularize usage * lisp/ob-core.el org-babel-common-header-args-w-values: new :coderef- header args org-babel-safe-header-args: include the new :coderef- header args (org-babel-get-src-block-info): calulate params before info in let* so that they can be used to set the coderef-format field (nth 6 info) (org-babel--expand-body): use coderef-prefix to correctly strip coderefs when expanding * lisp/ob-tangle.el (orb-babel-tangle-single-block): Regularize behavior when removing coderefs during tangling. This fixes an issue where trailing whitespace would be retained when coderefs were removed for tangling. Make the header argument :coderef-tangle no work the same way that the -r switch currently works * lisp/ol.el (org-link-search): use org babel info to match the coderef format for each block * lisp/org-src.el (org-src-coderef-regexp): now takes an additional argument rx-prefix that can be used to customize the text preceeding the coderef that should be removed during tangling, this is most useful for removing comments and trailing whitespace. * lisp/ox.el (org-export-resolve-coderef) and (org-export-unravel-code): use org babel info to correctly match the coderef format for each block. This commit adds support for three new src block header arguments, :coderef-format :coderef-prefix and :coderef-tangle. :coderef-format has the same behavior has the org src switch -l and :coderef-tangle has the same behavior as org src switch -r. :coderef-prefix provides new functionality and makes it possible to set the regexp for text leading up to the coderef. In particular this can be used to strip comments, which are required if authoring an org file that works with older versions of org. --- lisp/ob-core.el | 43 +-- lisp/ob-tangle.el | 18 +++--- lisp/ol.el| 17 +++-- lisp/org-src.el | 5 +++-- lisp/ox.el| 15 +++ 5 files changed, 61 insertions(+), 37 deletions(-) diff --git a/lisp/ob-core.el b/lisp/ob-core.el index 2e78ac3e6..feb6f2235 100644 --- a/lisp/ob-core.el +++ b/lisp/ob-core.el @@ -76,7 +76,7 @@ (declare-function org-previous-block "org" (arg &optional block-regexp)) (declare-function org-show-context "org" (&optional key)) (declare-function org-src-coderef-format "org-src" (&optional element)) -(declare-function org-src-coderef-regexp "org-src" (fmt &optional label)) +(declare-function org-src-coderef-regexp "org-src" (fmt &optional label rx-prefix)) (declare-function org-src-get-lang-mode "org-src" (lang)) (declare-function org-table-align "org-table" ()) (declare-function org-table-convert-region "org-table" (beg0 end0 &optional separator)) @@ -392,6 +392,9 @@ then run `org-babel-switch-to-session'." (defconst org-babel-common-header-args-w-values '((cache . ((no yes))) (cmdline . :any) +(coderef-format . :any) +(coderef-prefix . :any) +(coderef-tangle . ((nil yes no))) (colnames . ((nil no yes))) (comments . ((no link yes org both noweb))) (dir . :any) @@ -434,7 +437,8 @@ Note that individual languages may define their own language specific header arguments as well.") (defconst org-babel-safe-header-args - '(:cache :colnames :comments :exports :epilogue :hlines :noeval + '(:cache :coderef-format :coderef-prefix :coderef-tangle + :colnames :comments :exports :epilogue :hlines :noeval :noweb :noweb-ref :noweb-sep :padline :prologue :rownames :sep :session :tangle :wrap (:eval . ("never" "query")) @@ -607,29 +611,31 @@ a l
Re: Bug: inconsistent escaping of coderef regexp
Missed removing a debug message. Here is the correct patch. Best, Tom On Sun, Apr 4, 2021 at 10:22 PM Tom Gillespie wrote: > > Hi Nicolas, >I've attached a patch with a first pass implementation that I think > resolves most of the issues. It probably needs a few tests to go along > with it, but I think it is the simplest way forward. I tried to make the > changes without disrupting the org-babel info structure, but it comes > with the cost of having to pull out :coderef-prefix in a number of separate > contexts. Best, > Tom > > > If possible, I'd like not to conflate current issue with switches > > deprecation, which needs to be discussed separately. > > We can decouple them, so not an issue. The attached patch implements > the header arg equivalents of -r and -l without making any changes to the > existing switch behavior. > > > What do you mean by "it is impossible for the user to specify their own > > coderef regexp that can be used in both cases"? In particular, what is > > a coderef regexp in this context? I know about coderef format, but > > I don't think users are supposed to provide a regexp here. > > I did a first pass implementation and realized that allowing users to > specify coderef-regexp is a bad idea. The attached patch fixes the > divergent behavior of org-bable-tangle-single-block and provides a > standard way to specify a :coderef-prefix regexp so that empty > comments can be stripped. From 91aa10a5a14737b770e58b1a7f9f0e0b563dae62 Mon Sep 17 00:00:00 2001 From: Tom Gillespie Date: Sun, 4 Apr 2021 21:40:32 -0700 Subject: [PATCH] improve org-src-coderef-regexp and regularize usage * lisp/ob-core.el org-babel-common-header-args-w-values: new :coderef- header args org-babel-safe-header-args: include the new :coderef- header args (org-babel-get-src-block-info): calulate params before info in let* so that they can be used to set the coderef-format field (nth 6 info) (org-babel--expand-body): use coderef-prefix to correctly strip coderefs when expanding * lisp/ob-tangle.el (orb-babel-tangle-single-block): Regularize behavior when removing coderefs during tangling. This fixes an issue where trailing whitespace would be retained when coderefs were removed for tangling. Make the header argument :coderef-tangle no work the same way that the -r switch currently works * lisp/ol.el (org-link-search): use org babel info to match the coderef format for each block * lisp/org-src.el (org-src-coderef-regexp): now takes an additional argument rx-prefix that can be used to customize the text preceeding the coderef that should be removed during tangling, this is most useful for removing comments and trailing whitespace. * lisp/ox.el (org-export-resolve-coderef) and (org-export-unravel-code): use org babel info to correctly match the coderef format for each block. This commit adds support for three new src block header arguments, :coderef-format :coderef-prefix and :coderef-tangle. :coderef-format has the same behavior has the org src switch -l and :coderef-tangle has the same behavior as org src switch -r. :coderef-prefix provides new functionality and makes it possible to set the regexp for text leading up to the coderef. In particular this can be used to strip comments, which are required if authoring an org file that works with older versions of org. --- lisp/ob-core.el | 43 +-- lisp/ob-tangle.el | 17 ++--- lisp/ol.el| 17 +++-- lisp/org-src.el | 5 +++-- lisp/ox.el| 15 +++ 5 files changed, 60 insertions(+), 37 deletions(-) diff --git a/lisp/ob-core.el b/lisp/ob-core.el index 2e78ac3e6..feb6f2235 100644 --- a/lisp/ob-core.el +++ b/lisp/ob-core.el @@ -76,7 +76,7 @@ (declare-function org-previous-block "org" (arg &optional block-regexp)) (declare-function org-show-context "org" (&optional key)) (declare-function org-src-coderef-format "org-src" (&optional element)) -(declare-function org-src-coderef-regexp "org-src" (fmt &optional label)) +(declare-function org-src-coderef-regexp "org-src" (fmt &optional label rx-prefix)) (declare-function org-src-get-lang-mode "org-src" (lang)) (declare-function org-table-align "org-table" ()) (declare-function org-table-convert-region "org-table" (beg0 end0 &optional separator)) @@ -392,6 +392,9 @@ then run `org-babel-switch-to-session'." (defconst org-babel-common-header-args-w-values '((cache . ((no yes))) (cmdline . :any) +(coderef-format . :any) +(coderef-prefix . :any) +(coderef-tangle . ((nil yes no))) (colnames . ((nil no yes))) (comments . ((no link yes org both noweb))) (dir . :any) @@ -434,7 +437,8 @@ Note that individual languages may define their own language specific header arguments as well.") (defconst org-babel-safe-header-args - '(:cache :colnames :comments :exports :epilogue :hlines :noeval + '(:cache :coderef-format :coderef-prefix :coderef-tangle + :colnames :comments :exports :epilo