[ https://issues.apache.org/jira/browse/AVRO-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhang Jiawei updated AVRO-4172:
-------------------------------
Description:
We have identified two cross-language compatibility issues related to the ZSTD
codec in Avro:
# Different codec names
• In Java Avro (and the other language bindings that follow it), the codec is
written into the file metadata as {{"zstandard"}},
• while the C++ implementation writes {{"zstd"}}.
This makes a data file produced by one language unreadable by the other.
Java: [Code|#L40]]
C++: [Code|#L57]]
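One way a reader could tolerate both spellings is sketched below. This is only an illustration, not the actual Avro C++ code: the CodecKind enum and the functions parseCodecName and canonicalCodecName are hypothetical names, and the choice to write "zstandard" is an assumption that the C++ side would align with the name Java writes.

```cpp
#include <string>

// Hypothetical enum for illustration; not the real Avro C++ codec type.
enum class CodecKind { Null, Deflate, Snappy, Zstandard, Unknown };

// Map the codec name found in the file metadata to a codec kind.
// Both "zstandard" (the name Java writes) and "zstd" (the name the
// C++ writer currently uses) are accepted, so files from either
// writer are recognized.
inline CodecKind parseCodecName(const std::string& name) {
    if (name == "null") return CodecKind::Null;
    if (name == "deflate") return CodecKind::Deflate;
    if (name == "snappy") return CodecKind::Snappy;
    if (name == "zstandard" || name == "zstd") return CodecKind::Zstandard;
    return CodecKind::Unknown;
}

// Name to emit when writing a file. Assumption: the writer switches
// to "zstandard" so that other language bindings can read the file.
inline std::string canonicalCodecName(CodecKind kind) {
    switch (kind) {
        case CodecKind::Null: return "null";
        case CodecKind::Deflate: return "deflate";
        case CodecKind::Snappy: return "snappy";
        case CodecKind::Zstandard: return "zstandard";
        default: return "";
    }
}
```

Accepting the legacy name on read while writing only one name would keep files already produced by the C++ writer readable.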
# Streaming vs. single-shot encoding
Java Avro writes ZSTD data in streaming mode, whereas the C++ implementation
can only decode single-shot ZSTD frames.
As a result, a ZSTD-compressed file generated by Java Avro cannot be read by
the current C++ library.
Reference:
[https://github.com/apache/avro/blob/dc7bbd086283bb61dfabd8fcdf980d22f30c7a93/lang/c%2B%2B/impl/DataFile.cc#L494]
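The report does not say why the C++ reader fails on streamed data. One possible cause, offered here purely as an assumption, is that a streaming encoder may omit the decompressed size from the Zstandard frame header, while a single-shot decoder may rely on that field to size its output buffer. The sketch below checks only whether a frame header declares its content size, following the published Zstandard frame format. The function name zstdFrameDeclaresContentSize is hypothetical and the code does not use the zstd library.

```cpp
#include <cstddef>
#include <cstdint>

// Returns true if the buffer starts with a Zstandard frame whose header
// carries a Frame_Content_Size field. Returns false for short buffers,
// a wrong magic number, or a frame that omits the size.
inline bool zstdFrameDeclaresContentSize(const std::uint8_t* data,
                                         std::size_t len) {
    // Magic number 0xFD2FB528, stored little-endian.
    static const std::uint8_t kMagic[4] = {0x28, 0xB5, 0x2F, 0xFD};
    if (data == nullptr || len < 5) return false;
    for (std::size_t i = 0; i < 4; ++i) {
        if (data[i] != kMagic[i]) return false;
    }
    // Frame_Header_Descriptor: bits 7-6 are Frame_Content_Size_flag,
    // bit 5 is Single_Segment_flag.
    const std::uint8_t descriptor = data[4];
    const unsigned fcsFlag = descriptor >> 6;
    const bool singleSegment = (descriptor & 0x20) != 0;
    // The size field is absent only when both are zero.
    return fcsFlag != 0 || singleSegment;
}
```

If this assumption holds, a reader that sees a frame without a declared size would need to decompress incrementally into a growable buffer instead of allocating the output up front.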
> [C++] Fix ZSTD codec compatibility with Java Avro
> -------------------------------------------------
>
> Key: AVRO-4172
> URL: https://issues.apache.org/jira/browse/AVRO-4172
> Project: Apache Avro
> Issue Type: Bug
> Reporter: Zhang Jiawei
> Priority: Major
> Attachments: image-2025-08-10-18-27-06-588.png
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)