For historical context, golden files were first introduced so we could verify backwards compatibility. I think the preferred method is still to do "live" testing (i.e., having one implementation consume JSON and output a binary file, reading that binary file with the second implementation to emit JSON, and then checking that the two JSON documents are equal).
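
To make the "live" flow concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the two functions are hypothetical stand-ins for two independent implementations, and the toy binary layout (a count followed by int64 values) is not the Arrow IPC format. Only the shape of the check (JSON -> binary -> JSON -> compare) reflects what is described above.

```python
import json
import struct
import tempfile
from pathlib import Path


def impl_a_json_to_binary(json_path, binary_path):
    """Hypothetical 'implementation A': read JSON, write a toy binary file."""
    values = json.loads(Path(json_path).read_text())["values"]
    with open(binary_path, "wb") as f:
        # Toy layout (NOT Arrow IPC): uint32 count, then little-endian int64s.
        f.write(struct.pack("<I", len(values)))
        f.write(struct.pack(f"<{len(values)}q", *values))


def impl_b_binary_to_json(binary_path, json_path):
    """Hypothetical 'implementation B': read the toy binary file, emit JSON."""
    data = Path(binary_path).read_bytes()
    (count,) = struct.unpack_from("<I", data, 0)
    values = list(struct.unpack_from(f"<{count}q", data, 4))
    Path(json_path).write_text(json.dumps({"values": values}))


def live_round_trip(producer, consumer, source_doc):
    """Return True if JSON survives producer -> binary -> consumer unchanged."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        source_json = tmp / "input.json"
        binary_file = tmp / "data.bin"
        output_json = tmp / "output.json"

        source_json.write_text(json.dumps(source_doc))
        producer(source_json, binary_file)   # implementation A writes binary
        consumer(binary_file, output_json)   # implementation B reads it back

        # Compare parsed documents, not raw text, so formatting differences
        # between the two JSON writers do not cause false failures.
        return json.loads(output_json.read_text()) == source_doc


if __name__ == "__main__":
    ok = live_round_trip(
        impl_a_json_to_binary, impl_b_binary_to_json, {"values": [1, -2, 3]}
    )
    print("round trip ok" if ok else "round trip MISMATCH")
```

Unlike golden files, nothing here is checked in: both sides run at test time, so the check always exercises the current code of each implementation.
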

On Fri, Mar 19, 2021 at 9:51 AM Jorge Cardoso Leitão <jorgecarlei...@gmail.com> wrote:

> Hi,
>
> Thanks a lot for bringing this up, Fernando. I had the same thought when I
> first looked at the tensor implementation in Rust. Now it is a bit more
> clear :)
>
> So, if I understood correctly, the direction would be to declare a
> "JSON-integration" equivalent for tensors, generate a set of "golden binary
> - json" files from C++, push them to the arrow-testing submodule, and then
> use them to compare implementations.
>
> Best,
> Jorge
>
> On Tue, Mar 16, 2021 at 5:02 PM Fernando Herrera <fernando.j.herr...@gmail.com> wrote:
>
> > Hi Wes,
> >
> > Thanks for the update. It would be interesting to add a centralized
> > plan for tensors in Arrow. It would allow sharing data between packages
> > like numpy, ndarray, pytorch, tensorflow really easy. Don't you think so?
> > Let me have a look at how the integration tests are created in Archery
> > so I can add some to start testing IPC in rust.
> >
> > Thanks
> >
> > On Tue, Mar 16, 2021 at 3:16 PM Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > > hi Fernando -- for clarity, there is no centralized planning in this
> > > project. If a volunteer wants to do something and there are no
> > > objections from other people, then they are free to go ahead and do
> > > it. If there aren't any Jira issues about adding integration tests, it
> > > would make sense to go ahead and open some and clarify the scope of
> > > what you would like to see get developed.
> > >
> > > On Tue, Mar 16, 2021 at 3:25 AM Antoine Pitrou <anto...@python.org> wrote:
> > >
> > > > Hi Fernando,
> > > >
> > > > Currently there are no explicit plans to do it, but that would be
> > > > certainly useful if other implementations start implementing tensor IPC
> > > > support.
> > > >
> > > > One should start by defining a reference format (probably JSON) such as
> > > > exists for other IPC types:
> > > > https://arrow.apache.org/docs/format/Integration.html
> > > >
> > > > Regards
> > > >
> > > > Antoine.
> > > >
> > > > On 16/03/2021 at 10:02, Fernando Herrera wrote:
> > > > > Are there any plans to include integration testing for tensors in the
> > > > > pipeline?
> > > > >
> > > > > Thanks,
> > > > > Fernando
> > > > >
> > > > > On Mon, Mar 15, 2021 at 8:16 PM Antoine Pitrou <anto...@python.org> wrote:
> > > > >
> > > > >> On Mon, 15 Mar 2021 19:48:22 +0000
> > > > >> Fernando Herrera <fernando.j.herr...@gmail.com> wrote:
> > > > >>> Hi Neal,
> > > > >>>
> > > > >>> Thanks for the update and the link.
> > > > >>>
> > > > >>> I found that the project has these files for tensor checking
> > > > >>>
> > > > >>> https://github.com/apache/arrow-testing/tree/e8ce32338f2dfeca3a5126f7677bdee159604000/data/arrow-ipc-tensor-stream
> > > > >>>
> > > > >>> So, if I understand correctly, for any application to be compatible
> > > > >>> with C++ tensors it should be able to read these files. Am I correct?
> > > > >>
> > > > >> No, these are invalid files found by fuzz testing, that used to crash
> > > > >> the C++ IPC reader. More information here:
> > > > >> https://arrow.apache.org/docs/developers/cpp/fuzzing.html
> > > > >>
> > > > >> We don't have any reference files for integration testing of tensors
> > > > >> and sparse tensors currently.
> > > > >>
> > > > >> Regards
> > > > >>
> > > > >> Antoine.