In v1.3.1 we've added support for editing large files, but doing so has
exposed some other challenges related to search, replace, and data
profiling.  I outline the problems and possible solutions in a discussion
thread here: https://github.com/ctc-oss/daffodil-vscode/discussions/122

The bottom line up front is that for search and replace, I think we'll need
to adopt an interactive approach rather than an all-at-once approach.  For
example, search will find the next match from your current position; click
Next and it will find the next one, and so on, instead of finding all the
matches up front.  Similarly, with replace, we find the next match, then
you can either replace it or skip to the next match, and so on.  These are
departures from v1.3.0, but we need something that will scale.
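As a rough illustration, here's a minimal TypeScript sketch of what such an
interactive session could look like on the extension side.  The names
(SearchSession, findNext, replaceAndFindNext) are hypothetical and not the
extension's actual API:

    // Hypothetical sketch of an interactive search/replace session.
    // Matches are found one at a time from the current offset instead of
    // scanning the whole file up front.
    interface Match {
      offset: number;  // byte offset of the match in the file
      length: number;  // length of the matched pattern in bytes
    }

    interface SearchSession {
      // Find the next match at or after `fromOffset`, or undefined if none.
      findNext(fromOffset: number): Promise<Match | undefined>;
      // Replace the given match, then return the next match after it.
      replaceAndFindNext(match: Match,
                         replacement: Uint8Array): Promise<Match | undefined>;
    }

    // Example driver: the user decides, match by match, whether to replace.
    async function interactiveReplace(
        session: SearchSession,
        replacement: Uint8Array,
        shouldReplace: (m: Match) => Promise<boolean>): Promise<void> {
      let match = await session.findNext(0);
      while (match !== undefined) {
        if (await shouldReplace(match)) {
          match = await session.replaceAndFindNext(match, replacement);
        } else {
          match = await session.findNext(match.offset + match.length);
        }
      }
    }

The key point is that each call does a bounded amount of work from the
current offset, so the cost per interaction no longer depends on the total
number of matches in the file.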

Data profiling is a new feature in v1.3.1 that creates a byte-frequency
graph and some statistics on all or part of the edited file.  Right now
I've allowed it to profile from the beginning to the end of the file, even
if the file is multiple gigabytes in size.  Currently, though, that can
take longer than 5 seconds, especially if the file has many editing
changes, and after 5 seconds the request times out in the Scala gRPC
server.  I can bump up the timeout, but that's just a band-aid (what
happens if someone wants to profile a 1+ TB file, for example?).  I think a
reasonable fix is to allow the user to select any offset in the file and
profile up to X bytes from that offset, where X is perhaps something on the
order of 1M.  This ensures the UI stays responsive and can scale.
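For illustration, here's a minimal sketch of that kind of bounded-window
profiling.  In the extension the profiling is actually done by the Scala
gRPC server against the edit session, not the raw file on disk; this sketch
just reads the file directly, and the function name and 1 MB default are
assumptions for the example:

    import { promises as fs } from 'fs';

    // Hypothetical sketch: compute a 256-bin byte-frequency histogram over
    // at most `windowSize` bytes starting at `offset`.  The 1 MB default
    // mirrors the "on the order of 1M" cap suggested above, bounding the
    // work per request regardless of total file size.
    async function profileWindow(path: string,
                                 offset: number,
                                 windowSize = 1024 * 1024): Promise<number[]> {
      const freq = new Array<number>(256).fill(0);
      const handle = await fs.open(path, 'r');
      try {
        const buffer = Buffer.alloc(windowSize);
        const { bytesRead } = await handle.read(buffer, 0, windowSize, offset);
        for (let i = 0; i < bytesRead; i++) {
          freq[buffer[i]]++;
        }
      } finally {
        await handle.close();
      }
      return freq;
    }

Since the window is capped, the worst-case cost per request is constant,
and profiling a larger region becomes a series of small requests driven
from the UI rather than one long-running call.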

We expect to have a release candidate of v1.3.1 within the next two weeks,
and I'm hoping to address these scaling issues before then.  Feedback welcome!

Thank you.
