There is one thing we could do that might move the burden onto the implementations rather than onto some central CI job (which is a substantial effort, I agree, having worked with the Arrow one).

Perhaps we could start simply with "reader compatibility" for the existing files in parquet-testing[1]:

1. Define a JSON file format for the expected results.
2. Document how readers should generate that expected JSON file.

Then, to demonstrate compatibility with each "feature", an implementation would show that it can read the corresponding file and produce the expected JSON file.

This misses plenty of potential nuance, but it would likely cover most of the basic "can this implementation read these files?" questions.

Andrew

[1] https://github.com/apache/parquet-testing

On Tue, May 28, 2024 at 8:01 AM Antoine Pitrou wrote:
>
> Hello,
>
> On Mon, 27 May 2024 22:46:45 -0700
> Micah Kornfield wrote:
> >
> > 2. Is anybody interested in looking more deeply into developing
> > integration tests between the different Parquet implementations and major
> > down-stream consumers of Parquet? I believe Apache arrow has a pretty good
> > model [3][4] in a lot of respects with cross-language integration tests,
> > and nightly (via crossbow) integration tests with other consumers, but
> > there are a wide variety of things that would improve the current state.
> > One other possible concern is the amount of CI resources this might
> > consume, and if we will need contributions to fund it.
>
> Caveat: Arrow has a lot less parameters to test for. The variability is
> mostly one-dimensional and falls under the data type rubric. As a
> matter of fact, other Arrow features such as compression or delta
> dictionaries are less well-tested.
>
> Testing Parquet interoperability could easily get into a combinatorial
> explosion of optional features, encodings, etc.
>
> I'm not saying that it shouldn't be done, but it may require a different
> approach than Arrow's approach of building and testing all
> implementations against each other in a single CI job.
>
> Regards
>
> Antoine.
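To make steps 1 and 2 of the proposal a bit more concrete, here is a minimal Python sketch of what an expected-results JSON file and the comparison against it might look like. Nothing here is defined anywhere yet: the JSON layout (the "file", "columns", "num_rows", and "rows" keys) and the helpers `to_json_value`, `render_expected`, and `check_compatibility` are hypothetical. The sketch assumes the implementation under test has already decoded the Parquet file into a list of row dicts of plain Python values (for example, what pyarrow's `Table.to_pylist()` returns).

```python
import base64
import json
from datetime import date
from decimal import Decimal
from typing import Any


def to_json_value(value: Any) -> Any:
    """Map a decoded value to a JSON-safe value (hypothetical convention)."""
    # bool is a subclass of int, so it is handled by this first branch too.
    if value is None or isinstance(value, (bool, int, str)):
        return value
    if isinstance(value, float):
        # JSON has no NaN/Infinity literals, so floats are written as repr() strings.
        return repr(value)
    if isinstance(value, bytes):
        # BYTE_ARRAY / FIXED_LEN_BYTE_ARRAY payloads are base64-encoded.
        return base64.b64encode(value).decode("ascii")
    if isinstance(value, Decimal):
        # Keep the exact decimal text, including trailing zeros.
        return str(value)
    if isinstance(value, date):
        # Covers both date and datetime (datetime subclasses date).
        return value.isoformat()
    if isinstance(value, (list, tuple)):
        return [to_json_value(v) for v in value]
    if isinstance(value, dict):
        return {str(k): to_json_value(v) for k, v in value.items()}
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def render_expected(file_name: str, columns: list[str], rows: list[dict]) -> str:
    """Render decoded rows as the (hypothetical) expected-results JSON document."""
    doc = {
        "file": file_name,
        "columns": columns,
        "num_rows": len(rows),
        "rows": [[to_json_value(row[c]) for c in columns] for row in rows],
    }
    # Sorted keys and compact separators make the output deterministic for a given input.
    return json.dumps(doc, sort_keys=True, separators=(",", ":"))


def check_compatibility(produced: str, expected: str) -> bool:
    """Compare parsed documents rather than raw text, so formatting differences are ignored."""
    return json.loads(produced) == json.loads(expected)
```

Usage: an implementation would run something like `render_expected("example.parquet", ["id", "name", "payload"], rows)` on each parquet-testing file and compare the result with the checked-in JSON via `check_compatibility`. Encoding floats as strings and bytes as base64 is one possible choice for keeping the JSON lossless; the real format would need to settle conventions like these for every logical type.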

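To put a rough number on the combinatorial explosion Antoine describes, here is a small Python sketch that multiplies out a few test axes. The axis values are illustrative only: the codec, page-type, and encoding names come from the Parquet format's `CompressionCodec`, `PageType`, and `Encoding` enums, but which encodings apply depends on the physical type, and the three boolean "optional feature" axes are a hypothetical selection. The helpers `matrix_size`, `configurations`, and `pairwise_runs` are made up for this example.

```python
from itertools import product
from math import prod

# Illustrative axes only; a real matrix would vary per physical type and feature.
AXES = {
    "codec": ["UNCOMPRESSED", "SNAPPY", "GZIP", "LZO", "BROTLI", "LZ4", "ZSTD", "LZ4_RAW"],
    "page_type": ["DATA_PAGE", "DATA_PAGE_V2"],
    # Hypothetical subset; not every encoding is valid for every physical type.
    "encoding": ["PLAIN", "RLE_DICTIONARY", "DELTA_BINARY_PACKED", "BYTE_STREAM_SPLIT"],
    # Hypothetical on/off optional features.
    "statistics": [False, True],
    "page_index": [False, True],
    "bloom_filter": [False, True],
}


def matrix_size(axes: dict[str, list]) -> int:
    """Number of distinct configurations: the product of each axis's size."""
    return prod(len(values) for values in axes.values())


def configurations(axes: dict[str, list]):
    """Yield every configuration as a dict mapping axis name to chosen value."""
    names = list(axes)
    for combo in product(*axes.values()):
        yield dict(zip(names, combo))


def pairwise_runs(num_implementations: int, axes: dict[str, list]) -> int:
    """Runs needed if every writer is tested against every reader, per configuration."""
    return num_implementations ** 2 * matrix_size(axes)
```

With these axes, `matrix_size(AXES)` is 8 × 2 × 4 × 2 × 2 × 2 = 512 configurations for a single physical type, and an all-pairs job across a hypothetical five implementations would need `pairwise_runs(5, AXES)` = 12,800 runs. That illustrates why building and testing all implementations against each other in a single CI job may not scale for Parquet the way it does for Arrow.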