Hi Hongtao and SkyWalking Community,

I've been reviewing the BanyanDB codebase to prepare a proposal for the
Native Data Export/Import Utility.

I see that bydbctl is the natural home for this feature. My current
thinking for the architecture is:

Streaming over Buffering: Instead of loading full query results into
memory, the export command (bydbctl data export) should implement a gRPC
streaming receiver that writes to a buffered file writer in chunks. That
keeps peak memory bounded by the buffer size rather than the result set,
so we can export gigabytes of logs with a small, fixed RAM footprint.
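To make that concrete, here is a minimal sketch of the receiving side.
The export RPC itself does not exist yet, so chunkSource below is a
hypothetical stand-in for the generated gRPC client stream; only the
standard library is assumed:

    package export

    import (
        "bufio"
        "io"
        "os"
    )

    // chunkSource stands in for the server-streaming side of the (not
    // yet existing) export RPC; the generated gRPC client stream would
    // satisfy it.
    type chunkSource interface {
        Recv() ([]byte, error) // returns io.EOF when the stream ends
    }

    // exportToFile drains the stream into a buffered file writer, so
    // peak memory is bounded by the buffer size, not the result set.
    func exportToFile(src chunkSource, path string) error {
        f, err := os.Create(path)
        if err != nil {
            return err
        }
        defer f.Close()

        w := bufio.NewWriterSize(f, 1<<20) // 1 MiB buffer; tunable
        for {
            chunk, err := src.Recv()
            if err == io.EOF {
                break
            }
            if err != nil {
                return err
            }
            if _, err := w.Write(chunk); err != nil {
                return err
            }
        }
        return w.Flush()
    }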

Format Strategy:

Parquet: Use schema reflection to map the BanyanDB TagFamilies directly to
Parquet columns, so exported data can be queried by standard analytics
tools without a conversion step.
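My prototype uses github.com/parquet-go/parquet-go. The struct below is a
hand-written stand-in for what the exporter would derive from the
tag-family metadata; the field names and the dict/zstd tag options are
illustrative, not final:

    package export

    import (
        "os"

        "github.com/parquet-go/parquet-go"
    )

    // logRow mirrors one exported element; the real exporter would
    // build this schema at runtime from the stream's tag families.
    type logRow struct {
        Timestamp int64  `parquet:"timestamp"`
        ServiceID string `parquet:"service_id,dict"`
        TraceID   string `parquet:"trace_id"`
        Content   string `parquet:"content,zstd"`
    }

    // writeParquet flushes a batch of rows to a Parquet file.
    func writeParquet(path string, rows []logRow) error {
        f, err := os.Create(path)
        if err != nil {
            return err
        }
        defer f.Close()

        w := parquet.NewGenericWriter[logRow](f)
        if _, err := w.Write(rows); err != nil {
            return err
        }
        return w.Close() // writes the footer; the file is invalid without it
    }

For fully dynamic schemas we would build a parquet.Group node tree at
runtime instead of a fixed struct, but the struct form was the quickest
way to validate the column mapping.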

Binary: A raw dump of the KV pairs for faster restore operations (Disaster
Recovery).
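For the binary format I am picturing the simplest possible framing:
length-prefixed key/value records that a restore can replay sequentially.
The framing below is my own assumption, not an existing BanyanDB on-disk
format:

    package export

    import (
        "encoding/binary"
        "io"
    )

    // writeKV appends one key/value pair as two length-prefixed fields.
    // Restore only reads two uint32 headers per record, so replay is a
    // straight sequential scan with no parsing.
    func writeKV(w io.Writer, key, val []byte) error {
        var hdr [8]byte
        binary.LittleEndian.PutUint32(hdr[0:4], uint32(len(key)))
        binary.LittleEndian.PutUint32(hdr[4:8], uint32(len(val)))
        if _, err := w.Write(hdr[:]); err != nil {
            return err
        }
        if _, err := w.Write(key); err != nil {
            return err
        }
        _, err := w.Write(val)
        return err
    }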

I have prototyped a basic Parquet writer in Go to test the schema mapping.
Before I draft the full proposal, do you have a preference on how we
handle schema evolution, e.g. when imported data carries extra tags that
the current server schema does not define? To make the question concrete,
I have sketched the options I can see below.
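A minimal sketch, assuming records are represented as tag-name -> value
maps during import (the policy names are mine, purely illustrative):

    package dataimport

    import "fmt"

    // importPolicy captures the open question: what happens when an
    // exported record carries tags the target schema doesn't define.
    type importPolicy int

    const (
        failFast     importPolicy = iota // abort on the first unknown tag
        dropUnknown                      // discard unknown tags, keep the rest
        autoRegister                     // extend the server schema on the fly
    )

    // filterTags applies the chosen policy to one record.
    func filterTags(record map[string][]byte, known map[string]struct{},
        p importPolicy) (map[string][]byte, error) {

        out := make(map[string][]byte, len(record))
        for name, v := range record {
            if _, ok := known[name]; ok {
                out[name] = v
                continue
            }
            switch p {
            case failFast:
                return nil, fmt.Errorf("unknown tag %q", name)
            case dropUnknown:
                // skip silently (probably worth a warning counter)
            case autoRegister:
                // would call the schema registry here before accepting
                out[name] = v
            }
        }
        return out, nil
    }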

Best regards,

Tanay Paul
