Golden files can also make it easier to implement the read side without
firing up the entire integration machinery.
Regards
Antoine.
On 19/03/2021 at 17:56, Micah Kornfield wrote:
For historical context, golden files were first introduced so we could
verify backwards compatibility. I think the preferred method is still to
do "live" testing (i.e. having one implementation consume JSON and output
a binary file, reading the binary file with the second implementation and
emitting JSON, and then checking that the two JSON documents are equal).
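A minimal sketch of that "live" flow, in Python. The two functions below
are toy stand-ins for two implementations, and the binary layout is
invented for illustration only; it is not the Arrow IPC tensor format, and
none of these names exist in Arrow or Archery.

```python
import json
import struct


def impl_a_json_to_binary(json_text: str) -> bytes:
    """Hypothetical implementation A: consume JSON, output a binary blob.

    Toy layout (an assumption, NOT Arrow IPC): little-endian int64 ndim,
    then the shape, then the flattened int64 values.
    """
    doc = json.loads(json_text)
    shape, data = doc["shape"], doc["data"]
    header = struct.pack("<q", len(shape)) + struct.pack(f"<{len(shape)}q", *shape)
    body = struct.pack(f"<{len(data)}q", *data)
    return header + body


def impl_b_binary_to_json(blob: bytes) -> str:
    """Hypothetical implementation B: read the binary blob, emit JSON."""
    (ndim,) = struct.unpack_from("<q", blob, 0)
    shape = list(struct.unpack_from(f"<{ndim}q", blob, 8))
    count = 1
    for dim in shape:
        count *= dim
    data = list(struct.unpack_from(f"<{count}q", blob, 8 + 8 * ndim))
    return json.dumps({"shape": shape, "data": data})


def live_round_trip_ok(json_text: str) -> bool:
    """JSON -> binary (A) -> JSON (B), then compare the parsed documents."""
    blob = impl_a_json_to_binary(json_text)
    emitted = impl_b_binary_to_json(blob)
    # Compare parsed values so key order and whitespace do not matter.
    return json.loads(emitted) == json.loads(json_text)
```

In a real setup each function would be a separate process from a
different implementation; the comparison step is the part that carries
over.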
On Fri, Mar 19, 2021 at 9:51 AM Jorge Cardoso Leitão <
jorgecarlei...@gmail.com> wrote:
Hi,
Thanks a lot for bringing this up, Fernando. I had the same thought when I
first looked at the tensor implementation in Rust. Now it is a bit more
clear :)
So, if I understood correctly, the direction would be to declare a
"JSON-integration" equivalent for tensors, generate a set of "golden
binary - json" files from C++, push them to the arrow-testing submodule,
and then use them to compare implementations.
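As a rough sketch of what checking one such golden pair could look like on
the read side (Python, standard library only). Everything here is
hypothetical: the file names, the `check_golden_pair` helper, and the toy
binary layout, which is not the Arrow IPC tensor format.

```python
import json
import struct
import tempfile
from pathlib import Path
from typing import Callable


def check_golden_pair(
    binary_path: Path, json_path: Path, reader: Callable[[bytes], dict]
) -> bool:
    """Read the golden binary with `reader` and compare to the golden JSON."""
    expected = json.loads(json_path.read_text())
    actual = reader(binary_path.read_bytes())
    return actual == expected


def toy_reader(blob: bytes) -> dict:
    """Stand-in for an implementation's reader.

    Toy layout (an assumption, NOT Arrow IPC): a flat run of
    little-endian int64 values.
    """
    count = len(blob) // 8
    return {"data": list(struct.unpack(f"<{count}q", blob))}


def demo() -> bool:
    """Write a made-up golden pair to a temp dir, then check it."""
    with tempfile.TemporaryDirectory() as tmp:
        binary_path = Path(tmp) / "tensor.bin"
        json_path = Path(tmp) / "tensor.json"
        binary_path.write_bytes(struct.pack("<3q", 1, 2, 3))
        json_path.write_text(json.dumps({"data": [1, 2, 3]}))
        return check_golden_pair(binary_path, json_path, toy_reader)
```

Only the reader under test needs to run here; no second implementation
has to be built or started.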
Best,
Jorge
On Tue, Mar 16, 2021 at 5:02 PM Fernando Herrera <
fernando.j.herr...@gmail.com> wrote:
Hi Wes,
Thanks for the update. It would be interesting to add a centralized
plan for tensors in Arrow. It would make sharing data between packages
like numpy, ndarray, pytorch, and tensorflow really easy. Don't you think
so?
Let me have a look at how the integration tests are created in Archery
so I can add some to start testing IPC in Rust.
Thanks
On Tue, Mar 16, 2021 at 3:16 PM Wes McKinney <wesmck...@gmail.com>
wrote:
hi Fernando — for clarity, there is no centralized planning in this
project. If a volunteer wants to do something and there are no
objections from other people, then they are free to go ahead and do
it. If there aren't any Jira issues about adding integration tests, it
would make sense to go ahead and open some and clarify the scope of
what you would like to see get developed.
On Tue, Mar 16, 2021 at 3:25 AM Antoine Pitrou <anto...@python.org>
wrote:
Hi Fernando,
Currently there are no explicit plans to do it, but that would
certainly be useful if other implementations start implementing tensor
IPC support.
One should start by defining a reference format (probably JSON) such as
exists for other IPC types:
https://arrow.apache.org/docs/format/Integration.html
Regards
Antoine.
On 16/03/2021 at 10:02, Fernando Herrera wrote:
Are there any plans to include integration testing for tensors in the
pipeline?
Thanks,
Fernando
On Mon, Mar 15, 2021 at 8:16 PM Antoine Pitrou <anto...@python.org>
wrote:
On Mon, 15 Mar 2021 19:48:22 +0000
Fernando Herrera <fernando.j.herr...@gmail.com> wrote:
Hi Neal,
Thanks for the update and the link.
I found that the project has these files for tensor checking:
https://github.com/apache/arrow-testing/tree/e8ce32338f2dfeca3a5126f7677bdee159604000/data/arrow-ipc-tensor-stream
So, if I understand correctly, for any application to be compatible
with C++ tensors it should be able to read these files. Am I correct?
No, these are invalid files found by fuzz testing that used to crash
the C++ IPC reader. More information here:
https://arrow.apache.org/docs/developers/cpp/fuzzing.html
We don't have any reference files for integration testing of tensors
and sparse tensors currently.
Regards
Antoine.