Re: [GSoC 2026] Design Discussion for BanyanDB Native Export/Import Utility

Hongtao Gao Tue, 03 Feb 2026 05:18:49 -0800

Thank you for your interest in this project.

The streaming export procedure should cover both the server and client
sides. The server side is the data node of the BanyanDB cluster, which
should provide chunked export to reduce memory and CPU overhead. The
client side is bydbctl, which consumes the export streaming service
provided by the data node.
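
As a rough illustration of chunked export, here is a minimal Go sketch.
It assumes nothing about BanyanDB's actual API: Chunk, exportChunks, and
collectChunks are hypothetical names, and the send callback only stands
in for the Send method of a gRPC server stream. The point is that the
producer holds at most one chunk in memory at a time.

```go
package main

import (
	"errors"
	"io"
	"strings"
)

// Chunk is a hypothetical export message; it is not a BanyanDB type.
type Chunk struct {
	Seq  int    // position of the chunk in the stream
	Data []byte // payload, valid only until send returns
}

// exportChunks plays the data-node role. It reads src in pieces of at most
// chunkSize bytes and hands each piece to send, much like a server-streaming
// handler calling Send in a loop. One buffer is reused for every chunk.
func exportChunks(src io.Reader, chunkSize int, send func(Chunk) error) error {
	if chunkSize <= 0 {
		return errors.New("chunkSize must be positive")
	}
	buf := make([]byte, chunkSize)
	for seq := 0; ; seq++ {
		n, err := io.ReadFull(src, buf)
		if n > 0 {
			if sendErr := send(Chunk{Seq: seq, Data: buf[:n]}); sendErr != nil {
				return sendErr
			}
		}
		// io.EOF means nothing was left; io.ErrUnexpectedEOF means the
		// final chunk was shorter than chunkSize. Both end the stream.
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}

// collectChunks plays the client role for demonstration purposes. It copies
// each chunk into a string as it arrives; a real client would write the
// bytes to a file instead.
func collectChunks(payload string, chunkSize int) ([]string, error) {
	var got []string
	err := exportChunks(strings.NewReader(payload), chunkSize, func(c Chunk) error {
		got = append(got, string(c.Data)) // string() copies, so buffer reuse is safe
		return nil
	})
	return got, err
}
```

For example, collectChunks("abcdefghij", 4) yields three chunks: "abcd",
"efgh", and "ij".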


For the format, the native format refers to BanyanDB's column-based
file format. You can refer to the backup/restore feature for details.
Additionally, a CSV-based plain format should be available. Both
native and plain formats should contain two parts: schema and data.
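
To make the two-part plain format concrete, here is a minimal Go sketch
using the standard encoding/csv package. The layout is only an
assumption for illustration: TagSpec, the "name,type" schema header, and
the type labels used in the example are hypothetical, not a format
defined by BanyanDB.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// TagSpec is a hypothetical schema entry: a tag name and a type label.
type TagSpec struct {
	Name string
	Type string
}

// writeSchemaCSV writes the schema part: one header row, then one row per tag.
func writeSchemaCSV(w io.Writer, tags []TagSpec) error {
	cw := csv.NewWriter(w)
	if err := cw.Write([]string{"name", "type"}); err != nil {
		return err
	}
	for _, t := range tags {
		if err := cw.Write([]string{t.Name, t.Type}); err != nil {
			return err
		}
	}
	cw.Flush()
	return cw.Error()
}

// writeDataCSV writes the data part: a header of tag names, then the rows.
// A row whose width differs from the schema is rejected.
func writeDataCSV(w io.Writer, tags []TagSpec, rows [][]string) error {
	cw := csv.NewWriter(w)
	header := make([]string, len(tags))
	for i, t := range tags {
		header[i] = t.Name
	}
	if err := cw.Write(header); err != nil {
		return err
	}
	for i, row := range rows {
		if len(row) != len(tags) {
			return fmt.Errorf("row %d has %d values, schema has %d tags", i, len(row), len(tags))
		}
		if err := cw.Write(row); err != nil {
			return err
		}
	}
	cw.Flush()
	return cw.Error()
}

// exportPlain renders both parts into strings so the result is easy to inspect.
func exportPlain(tags []TagSpec, rows [][]string) (schema, data string, err error) {
	var schemaBuf, dataBuf strings.Builder
	if err = writeSchemaCSV(&schemaBuf, tags); err != nil {
		return "", "", err
	}
	if err = writeDataCSV(&dataBuf, tags, rows); err != nil {
		return "", "", err
	}
	return schemaBuf.String(), dataBuf.String(), nil
}
```

Keeping the schema in its own part lets an importer validate the data
rows against it before writing anything to the server.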

Furthermore, please take on a simple task from the issue list[1].
While it's not mandatory for your proposal, completing it will help us
gauge your readiness for this project. I suggest this particular
task[2], as it is suitable for beginners.

1. https://github.com/apache/skywalking/issues?q=is%3Aissue%20state%3Aopen%20label%3Adatabase
2. https://github.com/apache/skywalking/issues/13408

Best regards

Hongtao

On Tue, Feb 3, 2026 at 1:32 PM Tanay Paul <[email protected]> wrote:
>
> Hi Hongtao and SkyWalking Community,
>
> I've been reviewing the BanyanDB codebase to prepare a proposal for the
> Native Data Export/Import Utility.
>
> I see that bydbctl is the natural home for this feature. My current
> thinking for the architecture is:
>
> Streaming over Buffering: Instead of loading full query results into
> memory, the export command (bydbctl data export) should implement a gRPC
> Stream receiver that writes to the file buffer in chunks. This ensures we
> can export GBs of logs with minimal RAM usage.
>
> Format Strategy:
>
> Parquet: Use schema reflection to map the BanyanDB TagFamilies directly to
> Parquet columns for efficient analysis.
>
> Binary: A raw dump of the KV pairs for faster restore operations (Disaster
> Recovery).
>
> I have prototyped a basic Parquet writer in Go to test the schema mapping.
> Before I draft the full proposal, do you have a preference on how we handle
> schema evolution? (e.g., if the imported data has extra tags that the
> current server schema doesn't match).
>
> Best regards,
>
> Tanay Paul
