Potentially it would be helpful to flip the question around. As Andrew
articulates, a reference implementation is required to implement all
elements from the specification, and therefore the major consequence of
labeling parquet-mr thusly would be that any specification change would
have to be implemented within parquet-mr as part of the standardisation
process. It would be insufficient for it to be implemented in, for
example, two of the parquet implementations maintained by the arrow
project. I personally think that would be a shame and would likely exclude
many people who would otherwise be interested in evolving the parquet
specification, but I think that is at the core of this question.
Kind Regards,
Raphael
On 13/05/2024 20:55, Andrew Lamb wrote:
Question: Should we label parquet-mr or any other parquet implementations
"reference implementations"?
This came up as part of Vinoo's great PR to list different parquet
reference implementations[1][2].
The term "reference implementation" often has an official connotation. For
example, the Wikipedia definition is "a program that implements all
requirements from a corresponding specification. The reference
implementation ... should be considered the "correct" behavior of any other
implementation of it."[3]
Given the close association of parquet-mr to the parquet standard, it is a
natural candidate to label as "reference implementation." However, it is
not clear to me if there is consensus that it should be thusly labeled.
I have a strong opinion that a consensus on this question would be very
helpful. I don't actually have a strong opinion about the answer.
Andrew
[1]: https://github.com/apache/parquet-site/pull/53#discussion_r1582882267
[2]: https://github.com/apache/parquet-site/pull/53#discussion_r1598283465
[3]: https://en.wikipedia.org/wiki/Reference_implementation