Hi Jonas,

Thanks again for the detailed review.

I agree with the earlier points you raised, and I will rework those parts accordingly.

Regarding the TAB separator: I think |IFS=$'\t'| is bash-specific syntax. The current shebang is |/bin/sh|, which resolves to |dash| on Debian systems, so I do not think that form is portable there.

Would something like this make more sense instead?

    tab=$(printf '\t')
    extract_grammars | while IFS="$tab" read -r set name repo rev; do
        ...
    done
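
To make the idea concrete, here is a minimal sketch in plain POSIX sh. The sample records, the example.org URLs, and the |list_grammars| wrapper are made up for illustration, and the stand-in |extract_grammars| only mimics what I assume the real one prints (one TAB-separated record per line):

```shell
#!/bin/sh
# Hypothetical stand-in for extract_grammars: prints TAB-separated
# records (set, name, repo, rev). The real output may differ.
extract_grammars() {
    printf 'core\trust\thttps://example.org/tree-sitter-rust\tabc123\n'
    printf 'core\ttoml\thttps://example.org/tree-sitter-toml\tdef456\n'
}

# Command substitution strips trailing newlines only, so the TAB survives.
tab=$(printf '\t')

# Hypothetical wrapper so the loop output can be captured and checked.
list_grammars() {
    # IFS is overridden for the read command only; the loop body
    # still runs with the default IFS.
    extract_grammars | while IFS="$tab" read -r set name repo rev; do
        printf '%s: %s @ %s (%s)\n' "$set" "$name" "$rev" "$repo"
    done
}

list_grammars
```

One caveat with this form: TAB counts as IFS whitespace, so consecutive TABs collapse into a single separator and an empty field would shift the later ones. Also, since the loop sits on the right side of a pipe, it may run in a subshell (it does in dash), so variables assigned inside it are not visible afterwards.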

About the repository size: you are right, and honestly I did not expect it to grow that large either. After checking more closely, most of the size comes from generated |parser.c| files.

I did previously experiment with splitting this into an additional source package, but that introduces a synchronization issue. Whenever hx itself updates, the highlight data ideally needs to update in lockstep, yet in practice there would always be a delay, because the secondary package can only be updated after the hx upload has landed.

Keeping them in the same source package allows synchronized uploads, which feels more correct to me.

Additionally, upstream Helix developers explicitly describe these grammars as strongly coupled to the exact Helix revision. In particular, pascalkuthe wrote here:
https://github.com/helix-editor/helix/discussions/12433

   No we require that the tree sitter grammar matches the exact commit
   so it doesn't make sense to use anything but the exact commit
   specified in the config that the queries were created for (which are
   specific to helix).

   Tree sitter grammars are not stable and not reusable across
   different editors/programs.

Given that, I still tend to view this as a tightly coupled component rather than a reusable shared asset.

Also, splitting the package would not by itself reduce the total source size. That said, for users who prefer building from source, I think it could still make sense to split out the extra assets if needed.

What do you think about this approach?

Best,
Junyong Liang


On 2026/5/18 15:46, Jonas Smedegaard wrote:
Quoting Junyong Liang (2026-05-17 18:05:47)
> I’ve made another round of updates based on your suggestions for the
> maintenance tooling migration. In particular, I’ve moved the setup to
> use myrepos, replaced the update script with a shell-based version,
> and split the changes into clearer, easier-to-review commits.
>
> The updated version builds successfully on my machine. Could you
> please take another look at my fork when you have a chance?

This one feels much easier to read for me. Thanks for doing that
restructuring!

You still lump multiple independent changes together in each git
commit - as a concrete example, I noticed that in the commit updating
debian/copyright you also corrected some structural bugs in the
existing content. I have now cherry-picked those changes and applied
them to the main branch - crediting you :-)

It seems update-highlight-core uses tar to remove files. I would use
`find "$src" ... -delete` for that, but maybe I am missing some subtle
tricks there - if so, I recommend adding a comment hinting at the
reason for the choice of tooling there. Simplifying there would also
avoid piping into `-exec sh -c` - I am aware that some patterns are
safe, but even then I worry about accidentally making a clumsy edit
later that turns it unsafe.

If I understand correctly, you clone and then drop the .git database.
Assuming that's correctly understood, then a shallow clone should be
adequate - i.e. add `--depth=1` to the clone command.

The .mrconfig file can be more compact by adding a default function:

```
[DEFAULT]
lib=clone () { git clone --quiet --filter=blob:none --depth=1 "$1" "$2"; }

[.work/rust]
checkout = clone https://github.com/tree-sitter/tree-sitter-rust rust
```

You use TAB as field separator. I am a fan of TAB, so I am not gonna
try to dissuade you from doing that, but since many editors have a hard
time even visualizing TAB as a raw character, I think it is best to
limit that: in update-highlight-core you could instead use `IFS=$'\t'`.

The cloned code is more than 1GB, where the code of helix itself is
less than 200MB. I would prefer that this was maintained as a separate
source package, build-depending on hx or if needed on a new hx-devel
providing the subset of sources needed for these new routines to work.

What do you think?

  - Jonas
