Golden files can also make it easier to implement the read side without firing up the entire integration machinery.

Regards

Antoine.


On 19/03/2021 at 17:56, Micah Kornfield wrote:
For historical context, golden files were first introduced so we could
verify backwards compatibility.  I think the preferred method is still to
do "live" testing (i.e. having one implementation consume JSON and output a
binary file, reading the binary file with the second implementation and
emitting JSON, and then checking equality of the JSON).
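The "live" round trip described above can be sketched roughly as follows. This is only an illustration: both functions are hypothetical stand-ins for two separate implementations, and the byte layout is invented for the example, not the Arrow IPC tensor format.

```python
import json
import struct


# Hypothetical stand-in for "implementation A": JSON description -> binary bytes.
# The layout (ndim, shape, values as little-endian integers) is made up for this sketch.
def json_to_binary(doc: str) -> bytes:
    tensor = json.loads(doc)
    shape = tensor["shape"]
    values = tensor["values"]
    header = struct.pack("<I", len(shape)) + struct.pack(f"<{len(shape)}q", *shape)
    body = struct.pack(f"<{len(values)}q", *values)
    return header + body


# Hypothetical stand-in for "implementation B": binary bytes -> JSON description.
def binary_to_json(blob: bytes) -> str:
    (ndim,) = struct.unpack_from("<I", blob, 0)
    shape = list(struct.unpack_from(f"<{ndim}q", blob, 4))
    count = 1
    for dim in shape:
        count *= dim
    values = list(struct.unpack_from(f"<{count}q", blob, 4 + 8 * ndim))
    return json.dumps({"shape": shape, "values": values})


def round_trip_matches(doc: str) -> bool:
    # Compare parsed JSON rather than raw strings, so key order and whitespace do not matter.
    return json.loads(binary_to_json(json_to_binary(doc))) == json.loads(doc)


source = json.dumps({"shape": [2, 3], "values": [1, 2, 3, 4, 5, 6]})
print(round_trip_matches(source))
```

In a real setup the two functions would be separate programs written against different Arrow implementations, and only the final JSON comparison would live in the test harness.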

On Fri, Mar 19, 2021 at 9:51 AM Jorge Cardoso Leitão <
jorgecarlei...@gmail.com> wrote:

Hi,

Thanks a lot for bringing this up, Fernando. I had the same thought when I
first looked at the tensor implementation in Rust. Now it is a bit more
clear :)

So, if I understood correctly, the direction would be to declare a
"JSON-integration" equivalent for tensors, generate a set of "golden
binary - json" files from C++, push them to the arrow-testing submodule,
and then use them to compare implementations.
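As a rough sketch of what such a golden-file check could look like: the file names, the one-byte-per-value layout and the reader function below are all hypothetical, and a real test would call the implementation's own tensor IPC reader on files from the shared repository.

```python
import json
import tempfile
from pathlib import Path


# Hypothetical reader under test: decodes a made-up layout (one byte per value).
# A real check would call the implementation's tensor IPC reader here instead.
def read_binary_as_json(path: Path) -> dict:
    data = path.read_bytes()
    return {"values": list(data)}


def check_golden_pair(binary_path: Path, json_path: Path) -> bool:
    # The JSON file is the reference; the binary file is what the reader must decode.
    expected = json.loads(json_path.read_text())
    return read_binary_as_json(binary_path) == expected


with tempfile.TemporaryDirectory() as tmp:
    # Stand-ins for a golden pair that would normally be checked in to a shared repository.
    binary_path = Path(tmp) / "example.bin"
    json_path = Path(tmp) / "example.json"
    binary_path.write_bytes(bytes([1, 2, 3]))
    json_path.write_text(json.dumps({"values": [1, 2, 3]}))
    golden_ok = check_golden_pair(binary_path, json_path)

print(golden_ok)
```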

Best,
Jorge



On Tue, Mar 16, 2021 at 5:02 PM Fernando Herrera <
fernando.j.herr...@gmail.com> wrote:

Hi Wes,

Thanks for the update. It would be interesting to add a centralized
plan for tensors in Arrow. It would make sharing data between packages
like numpy, ndarray, pytorch, and tensorflow really easy. Don't you think so?
Let me have a look at how the integration tests are created in Archery
so I can add some to start testing IPC in Rust.

Thanks

On Tue, Mar 16, 2021 at 3:16 PM Wes McKinney <wesmck...@gmail.com>
wrote:

hi Fernando — for clarity, there is no centralized planning in this
project. If a volunteer wants to do something and there are no
objections from other people, then they are free to go ahead and do
it. If there aren't any Jira issues about adding integration tests, it
would make sense to go ahead and open some and clarify the scope of
what you would like to see get developed.

On Tue, Mar 16, 2021 at 3:25 AM Antoine Pitrou <anto...@python.org>
wrote:


Hi Fernando,

Currently there are no explicit plans to do it, but that would
certainly be useful if other implementations start implementing tensor
IPC support.

One should start by defining a reference format (probably JSON) such as
exists for other IPC types:
https://arrow.apache.org/docs/format/Integration.html

Regards

Antoine.


On 16/03/2021 at 10:02, Fernando Herrera wrote:
Are there any plans to include integration testing for tensors in the
pipeline?

Thanks,
Fernando

On Mon, Mar 15, 2021 at 8:16 PM Antoine Pitrou <anto...@python.org>
wrote:

On Mon, 15 Mar 2021 19:48:22 +0000
Fernando Herrera <fernando.j.herr...@gmail.com> wrote:
Hi Neal,

Thanks for the update and the link.

I found that the project has these files for tensor checking:

https://github.com/apache/arrow-testing/tree/e8ce32338f2dfeca3a5126f7677bdee159604000/data/arrow-ipc-tensor-stream

So, if I understand correctly, for any application to be compatible
with C++ tensors it should be able to read these files. Am I correct?

No, these are invalid files found by fuzz testing that used to crash
the C++ IPC reader. More information here:
https://arrow.apache.org/docs/developers/cpp/fuzzing.html

We don't have any reference files for integration testing of tensors
and sparse tensors currently.

Regards

Antoine.







