[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16250730#comment-16250730 ]
ASF GitHub Bot commented on ARROW-1693:
---------------------------------------
wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++
dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-344130863
> I'm a bit torn here. On the one hand, I don't want to check in 21mb worth of tests to source control. On the other hand, I don't want to hand-write the 11k assertions that the snapshot tests represent (and would also presumably be many-MBs worth of tests anyway).
> I believe git compresses files across the network? And if space-on-disk is an issue, I could add a post-clone script to automatically compress the snapshot files after checkout (about 3mb gzipped). Jest doesn't work with compressed snapshot files out of the box, but I could add some steps to the test runner to decompress the snapshots before running.
I guess I'm not quite understanding what snapshot tests accomplish here that
normal array comparisons would not. In Java and C++ we have functions that
compare the contents of arrays. So when you say hand-writing the snapshot test
assertions, what's being tested and why is that the only way to test that
behavior? Is there a concern that a programmatic comparison like
https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/json-integration-test.cc#L180
might not be as strong of an assertion as a UI-based test (what the values
from the arrays would actually appear as in the DOM)?
Having the possibility of a single PR bloating the git history by whatever the snap files gzip down to doesn't seem like a good idea. Even having large diffs as the result of automatically generated files on commit isn't ideal.
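[A programmatic comparison along the lines of the linked C++ json-integration-test check could be sketched as follows. The minimal `{ length, get(i) }` vector interface here is an assumption for illustration, not the actual Arrow JS Vector API.]

```javascript
// Sketch: compare two column-like vectors element by element, analogous
// to the C++ ipc/json-integration-test equality check. The { length,
// get(i) } interface is assumed for illustration.
function vectorsEqual(a, b) {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    const x = a.get(i), y = b.get(i);
    // A null is only equal to another null.
    if ((x == null) !== (y == null)) return false;
    if (x != null && x !== y) return false;
  }
  return true;
}

// Usage with plain array-backed stubs:
const wrap = (values) => ({ length: values.length, get: (i) => values[i] });
console.log(vectorsEqual(wrap(['a', 'b', null]), wrap(['a', 'b', null]))); // true
console.log(vectorsEqual(wrap(['a', 'b']), wrap(['a', 'c'])));             // false
```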
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> [JS] Error reading dictionary-encoded integration test files
> ------------------------------------------------------------
>
> Key: ARROW-1693
> URL: https://issues.apache.org/jira/browse/ARROW-1693
> Project: Apache Arrow
> Issue Type: Bug
> Components: JavaScript
> Reporter: Brian Hulette
> Assignee: Brian Hulette
> Labels: pull-request-available
> Fix For: 0.8.0
>
> Attachments: dictionary-cpp.arrow, dictionary-java.arrow,
> dictionary.json
>
>
> The JS implementation crashes when reading the dictionary test case from the
> integration tests.
> To replicate, first generate the test files with java and cpp impls:
> {code}
> $ cd ${ARROW_HOME}/integration/
> $ python -c 'from integration_test import generate_dictionary_case; generate_dictionary_case().write("dictionary.json")'
> $ ../cpp/debug/debug/json-integration-test --integration --json=dictionary.json --arrow=dictionary-cpp.arrow --mode=JSON_TO_ARROW
> $ java -cp ../java/tools/target/arrow-tools-0.8.0-SNAPSHOT-jar-with-dependencies.jar org.apache.arrow.tools.Integration -c JSON_TO_ARROW -a dictionary-java.arrow -j dictionary.json
> {code}
> Attempt to read the files with the JS impl:
> {code}
> $ cd ${ARROW_HOME}/js/
> $ ./bin/arrow2csv.js -s dict1_0 -f ../integration/dictionary-{java,cpp}.arrow
> {code}
> Both files result in an error for me on
> [a8f51858|https://github.com/apache/arrow/commit/a8f518588fda471b2e3cc8e0f0064e7c4bb99899]:
> {{TypeError: Cannot read property 'buffer' of undefined}}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)