[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16250730#comment-16250730 ]
ASF GitHub Bot commented on ARROW-1693:
---------------------------------------
wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++
dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-344130863
> I'm a bit torn here. On the one hand, I don't want to check in 21mb worth of tests to source control. On the other hand, I don't want to hand-write the 11k assertions that the snapshot tests represent (and would also presumably be many-MBs worth of tests anyway).
> I believe git compresses files across the network? And if space-on-disk is an issue, I could add a post-clone script to automatically compress the snapshot files after checkout (about 3mb gzipped). Jest doesn't work with compressed snapshot files out of the box, but I could add some steps to the test runner to decompress the snapshots before running.
I guess I'm not quite understanding what snapshot tests accomplish here that
normal array comparisons would not. In Java and C++ we have functions that
compare the contents of arrays. So when you say hand-writing the snapshot test
assertions, what's being tested and why is that the only way to test that
behavior? Is there a concern that a programmatic comparison like
https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/json-integration-test.cc#L180
might not be as strong of an assertion as a UI-based test (what the values
from the arrays would actually appear as in the DOM)?
Having the possibility of a single PR bloating the git history by whatever the snap files gzip down to doesn't seem like a good idea. Even having large diffs as the result of automatically generated files on commit isn't ideal.
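[A programmatic comparison along the lines of the linked C++ json-integration-test check could be sketched as follows. The minimal `{ length, get(i) }` vector interface here is an assumption for illustration, not the actual Arrow JS Vector API.]

```javascript
// Sketch: compare two column-like vectors element by element, analogous
// to the C++ ipc/json-integration-test equality check. The { length,
// get(i) } interface is assumed for illustration.
function vectorsEqual(a, b) {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    const x = a.get(i), y = b.get(i);
    // A null is only equal to another null.
    if ((x == null) !== (y == null)) return false;
    if (x != null && x !== y) return false;
  }
  return true;
}

// Usage with plain array-backed stubs:
const wrap = (values) => ({ length: values.length, get: (i) => values[i] });
console.log(vectorsEqual(wrap(['a', 'b', null]), wrap(['a', 'b', null]))); // true
console.log(vectorsEqual(wrap(['a', 'b']), wrap(['a', 'c'])));             // false
```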
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> [JS] Error reading dictionary-encoded integration test files
> ------------------------------------------------------------
>
> Key: ARROW-1693
> URL: https://issues.apache.org/jira/browse/ARROW-1693
> Project: Apache Arrow
> Issue Type: Bug
> Components: JavaScript
> Reporter: Brian Hulette
> Assignee: Brian Hulette
> Labels: pull-request-available
> Fix For: 0.8.0
>
> Attachments: dictionary-cpp.arrow, dictionary-java.arrow,
> dictionary.json
>
>
> The JS implementation crashes when reading the dictionary test case from the
> integration tests.
> To replicate, first generate the test files with java and cpp impls:
> {code}
> $ cd ${ARROW_HOME}/integration/
> $ python -c 'from integration_test import generate_dictionary_case; generate_dictionary_case().write("dictionary.json")'
> $ ../cpp/debug/debug/json-integration-test --integration --json=dictionary.json --arrow=dictionary-cpp.arrow --mode=JSON_TO_ARROW
> $ java -cp ../java/tools/target/arrow-tools-0.8.0-SNAPSHOT-jar-with-dependencies.jar org.apache.arrow.tools.Integration -c JSON_TO_ARROW -a dictionary-java.arrow -j dictionary.json
> {code}
> Attempt to read the files with the JS impl:
> {code}
> $ cd ${ARROW_HOME}/js/
> $ ./bin/arrow2csv.js -s dict1_0 -f ../integration/dictionary-{java,cpp}.arrow
> {code}
> Both files result in an error for me on
> [a8f51858|https://github.com/apache/arrow/commit/a8f518588fda471b2e3cc8e0f0064e7c4bb99899]:
> {{TypeError: Cannot read property 'buffer' of undefined}}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)