Re: ENB: about external file format 5-thin

Edward K. Ream Sat, 06 Jun 2020 04:25:19 -0700

On Fri, Jun 5, 2020 at 10:10 AM vitalije <vitali...@gmail.com> wrote:


> For the past few days I've been working on the reusable functions for both
> parsing the content of external files and writing external files. In the
> attached Leo document there are two new scripts. One is for generating the
> test data, and the other is for testing these two new functions. All tests
> are passing and the round trip (*text -> outline -> text*) confirms that these
> functions have almost the same effect as Leo's FastAtFile reading and
> atFile writing methods.
>

Good to know.

> Thinking about the format of external files and looking at them, I've come
> to the conclusion that this format contains some redundant information.
> This is not a big problem, but since I am currently working on this part of
> Leo's code base, I wish to propose some improvements to this format.
> Having redundant information means that different files may produce the
> same outline. This can cause problems when testing round trip
> transformations.
>

Some general reactions:

1. Changing Leo's file format would be a big deal. It would be inconvenient
for Leo's users, Leo's devs, and future maintainers. A new file format
would, at minimum, create migration problems. It would require new
documentation and probably migration scripts similar to the script I
recently wrote.

2. Leo's existing file format explicitly represents all of Leo's syntactic
constructs. I never considered using a minimal set of sentinels. I only
considered the *clearest*, most explicit, set of sentinels. The second
principle of the zen of python <https://www.python.org/dev/peps/pep-0020/>
is "Explicit is better than implicit." I want to retain the explicit
correspondences between sentinels, nodes, @others and section references.

True, the first zen-of-python principle is "Beautiful is better than
ugly." Imo, this principle does not apply here. Eliding sentinels makes
external files harder for users to understand. Again imo, there is nothing
very beautiful about embedding subtle inferences in crucial read logic.

3. Error correction is not possible without redundancy. Removing various
"non-essential" sentinels would make it harder to write scripts that act on
external files. Such scripts would have to recreate the clever inferences
that make eliding sentinels possible in the first place.

4. @clean allows users to eliminate *all* sentinels. Those who dislike
sentinels are already using @clean. Those who don't care much about
sentinels will not appreciate yet another unnecessary change to Leo.

5. Changing Leo's file format might affect the @clean logic. This logic
does a diff between the external file and a recreation of that file (with
sentinels) generated from the outline itself. *Maybe* that diff will work
with a new file format, but that is not guaranteed. For sure, removing
redundancy in the file format will make the @clean logic more fragile, in
hard-to-predict ways.

6. Changing Leo's file format to make your new code easier to test would be
letting the tail wag the dog. I am confident that you can find a robust
testing strategy that does not depend on a new file format.
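As one illustration of point 6, a round-trip test can compare *outlines* rather than raw input text, so that two files encoding the same outline do not cause spurious failures. The sketch below is a toy under stated assumptions: `Node`, `write_toy`, `read_toy`, and `check_round_trip` are hypothetical names, and the single-sentinel format is a drastic simplification of Leo's. It is not vitalije's new functions or Leo's own read/write code.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Toy node sentinel modeled loosely on Leo's "#@+node:<gnx>: *<level>* <headline>".
# Real external files also contain @others, section, @first/@last sentinels, etc.
NODE_RE = re.compile(r"^#@\+node:([^:]+): \*(\d+)\* (.*)$")


@dataclass(frozen=True)
class Node:
    gnx: str
    level: int
    headline: str
    body: Tuple[str, ...]  # body lines without trailing newlines


def write_toy(nodes: List[Node]) -> str:
    """Serialize a flat list of nodes into toy external-file text."""
    lines = []
    for n in nodes:
        lines.append(f"#@+node:{n.gnx}: *{n.level}* {n.headline}")
        lines.extend(n.body)
    return "\n".join(lines) + "\n"


def read_toy(text: str) -> List[Node]:
    """Parse toy text back into nodes (lines before the first sentinel are ignored)."""
    nodes: List[Node] = []
    header: Optional[Tuple[str, int, str]] = None
    body: List[str] = []
    for line in text.splitlines():
        m = NODE_RE.match(line)
        if m:
            if header is not None:
                nodes.append(Node(*header, tuple(body)))
            header = (m.group(1), int(m.group(2)), m.group(3))
            body = []
        elif header is not None:
            body.append(line)
    if header is not None:
        nodes.append(Node(*header, tuple(body)))
    return nodes


def check_round_trip(nodes: List[Node]) -> None:
    """Raise AssertionError if the toy reader and writer disagree."""
    text = write_toy(nodes)
    # outline -> text -> outline must reproduce the outline exactly.
    assert read_toy(text) == nodes
    # Writer output must be a fixed point of text -> outline -> text.
    assert write_toy(read_toy(text)) == text
```

With this approach, a hand-made input such as `"#@+node:ekr.1: *1* root\nbody"` (no final newline) fails a naive text -> outline -> text comparison. It still reads to the expected outline, and once normalized by the writer it round-trips exactly. Only the outline and the writer's own output are compared, never arbitrary input text.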

Now to specific comments:

> Top-level node gnx and its headline are not necessary. Both headline and
> gnx are present in the xml. They don't provide any useful information. This
> can also cause problems when two different outlines contain the same
> external file. If the top-level node has a different path or a different gnx
> in those outlines, then they would produce different files even if they have
> the same content.
>

I agree with you and Bob that this can be a problem. Imo, the way forward
is to define clearly what happens when the xml and external file collide. I
welcome your thoughts on this. Imo, it should be considered as a separate
issue.

>
>    - *@+<<* sentinels are redundant too. When we encounter the node whose
>    headline is a section reference, we know that the section reference was
>    just before the opening node line.
>
Yes, but I don't care.

>
>    - *@-<<* sentinel and *@afterref* can be joined in one. The section
>    name is not necessary because opening and closing sections must be properly
>    nested. We know for sure that the closing section has the same headline
>    as the last open one. The closing *@-<<* sentinel can give a clue
>    whether the following line is *@afterref* or an ordinary line. For
>    example, *@-<<[* means the same as a closing section sentinel followed by
>    an *@afterref* line, while *@-<<]* means there is no *@afterref* line
>    after this closing sentinel.
>
The documentation for @afterref
<http://leoeditor.com/appendices.html#format-of-external-files> is: "Marks
non-whitespace text appearing after a section reference." I don't know
whether these words are still true. Perhaps @afterref can truly be
eliminated. If so, the way to do that is to change the *write* logic, not
the read logic. Leo should be able to read @afterref "forever".
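Point 3 in the list above (redundancy makes error detection possible) applies directly to the closing section name that the proposal would drop. Each closing sentinel repeats its section name, so a standalone script can *check* the nesting instead of trusting it. The sketch below is minimal and makes two assumptions: the simplified spellings `#@+<< name >>` / `#@-<< name >>`, and `#` as the comment delimiter. `check_section_nesting` is a hypothetical helper, not part of Leo.

```python
import re
from typing import List, Tuple

# Simplified section sentinels, e.g. "#@+<< imports >>" and "#@-<< imports >>".
# Leading whitespace is allowed because sentinels are indented with the code.
OPEN_RE = re.compile(r"^\s*#@\+<<(.+?)>>\s*$")
CLOSE_RE = re.compile(r"^\s*#@-<<(.+?)>>\s*$")


def check_section_nesting(lines: List[str]) -> List[str]:
    """Return error messages for mis-nested section sentinels (empty if none)."""
    errors: List[str] = []
    stack: List[Tuple[str, int]] = []  # (section name, line number) still open
    for i, line in enumerate(lines, start=1):
        m = OPEN_RE.match(line)
        if m:
            stack.append((m.group(1).strip(), i))
            continue
        m = CLOSE_RE.match(line)
        if m:
            name = m.group(1).strip()
            if not stack:
                errors.append(f"line {i}: close of <<{name}>> with no open section")
                continue
            expected, opened_at = stack.pop()
            # The name on the closing sentinel is the "redundant" part: it lets
            # the script report a mismatch instead of silently accepting it.
            if name != expected:
                errors.append(
                    f"line {i}: closes <<{name}>> but <<{expected}>> "
                    f"(opened at line {opened_at}) is still open")
    for name, opened_at in stack:
        errors.append(f"line {opened_at}: <<{name}>> is never closed")
    return errors
```

If the closing name were instead inferred from the stack, a file whose two closing sentinels had been swapped (for example by a bad merge) would be accepted without complaint.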

>
>    - *@+others* is not necessary because when we hit the first open node
>    without a section reference in its headline, we know for sure that an
>    @others directive came just before this node. Also, when we encounter a
>    new open node with a different indentation, we can be sure that an
>    *@others* directive came just before it. When reading an external file,
>    this line is used only to push the current node data on the stack, but
>    this signal could be added to the opening node sentinel as a single character.
>
Again, I don't care.

>
>    - format of *@+node* sentinel can be changed so that headline comes
>    first and gnx and level at the end of the line for example:
>    #@ at.findFilesToRead        :ekr.20190108054317.1:6
>    instead of
>    #@+node:ekr.20190108054317.1: *6* at.findFilesToRead
>    It would be nicer to read source code using other editors.
>
I don't like this proposal, for several reasons:

1. I prefer the present format. I don't read external files often, but when
I do I am usually more interested in the gnx's than the headlines.

2. The regex required to recognize the new node sentinel would be slower
and less secure than the present regex. The present regex *ends* with
something like ".*$". A new regex would *begin* with something like "^.*?".
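The two layouts can be compared with rough Python patterns. Both patterns are approximations written only for this comparison. Neither is Leo's actual node regex, and `parse_present` / `parse_proposed` are hypothetical helpers. In the present layout the fixed fields (gnx, level) come first and the free-form headline is a trailing `(.*)$`. In the proposed layout the headline becomes a leading lazy `(.*?)`, which the regex engine must keep extending until the trailing gnx and level fields match.

```python
import re
from typing import Optional, Tuple

# Approximation of the present layout:  #@+node:<gnx>: *<level>* <headline>
# Fixed fields first; the unconstrained headline is a trailing "(.*)$".
PRESENT = re.compile(r"^(\s*)#@\+node:([^:]+): \*(\d+)\* (.*)$")

# Hypothetical pattern for the proposed layout:  #@ <headline>   :<gnx>:<level>
# The unconstrained headline comes first as a lazy "(.*?)" group.
PROPOSED = re.compile(r"^(\s*)#@ (.*?)\s+:([^:\s]+):(\d+)$")


def parse_present(line: str) -> Optional[Tuple[str, int, str]]:
    """Return (gnx, level, headline) for a present-style node sentinel."""
    m = PRESENT.match(line)
    return None if m is None else (m.group(2), int(m.group(3)), m.group(4))


def parse_proposed(line: str) -> Optional[Tuple[str, int, str]]:
    """Return (gnx, level, headline) for a proposed-style node sentinel."""
    m = PROPOSED.match(line)
    return None if m is None else (m.group(3), int(m.group(4)), m.group(2))
```

Both helpers recover the same `(gnx, level, headline)` triple from the two example lines in the proposal. The difference is where the unconstrained text sits and how much trial matching the engine does to find where it ends.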

We need to discuss this in more detail only if we *all* decide that a new
file format is a good idea.

>
>    - The closing *@-leo* line is not necessary, and there is no need for
>    *@last* directives either. Last lines are just the last lines of the top
>    level node.
>    - The *@first* directive can be present in the body, but it doesn't need
>    to be written in the external file, because we know that all lines coming
>    before the *@+leo* sentinel are first lines.
>
@first and @last have, in the past, caught malformed .leo files.
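Here is one way explicit @first directives could expose damage. The number of lines before the `#@+leo` sentinel should equal the number of `#@@first` directive lines written after it. The cross-check below is a hypothetical sketch, and both the function name and the `comment` parameter are assumptions. It is not Leo's read code, and it makes no claim about how Leo actually detected malformed files.

```python
from typing import List


def check_first_lines(lines: List[str], comment: str = "#") -> List[str]:
    """Hypothetical cross-check between leading lines and @first directives.

    Returns a list of error messages; an empty list means the file looks consistent.
    """
    leo_prefix = comment + "@+leo"
    first_directive = comment + "@@first"
    leo_index = next(
        (i for i, line in enumerate(lines) if line.startswith(leo_prefix)), None)
    if leo_index is None:
        return ["no @+leo sentinel found"]
    # Every line before the @+leo sentinel is a "first line"...
    n_first_lines = leo_index
    # ...and each one should be matched by an @first directive after the sentinel.
    n_directives = sum(
        1 for line in lines[leo_index + 1:] if line.strip() == first_directive)
    if n_first_lines != n_directives:
        return [f"{n_first_lines} line(s) precede the @+leo sentinel, "
                f"but {n_directives} @first directive(s) were found"]
    return []
```

If the leading lines were simply inferred, as the proposal suggests, a stray line inserted above `#@+leo` would silently become part of the outline rather than being reported.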

> Also, so-called "dangerous directives" (*@comment* and *@delims*) are never
> used in Leo's code base. Personally, I can't think of a use case for
> those directives.
>

As has already been pointed out, these directives exist for specific reasons.

*Summary*

I see many reasons to retain the format of external files, and no
compelling reason to change that format.

The problem with root @file nodes is real. Let's deal with it as a separate
issue.

If @afterref truly is never useful, the graceful way to eliminate the
sentinel would be by having Leo's write logic not write it.

Edward

-- 
You received this message because you are subscribed to the Google Groups 
"leo-editor" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to leo-editor+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/leo-editor/CAMF8tS09_iSiPi9DZXwRpdHg1dTwiL3bdkrhiPNi72pTFtAQPQ%40mail.gmail.com.
