[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16252658#comment-16252658 ]
ASF GitHub Bot commented on ARROW-1693:
---------------------------------------

trxcllnt commented on issue #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-344431051

> @wesm This "validate the test results once" part is where I'm getting lost. How do you know whether anything is correct if you don't write down what you expect to be true?

Ah right, in this case the JSON files are the initial source of truth. Here I compared the snapshots against the Arrow files read via pandas/pyarrow, and they looked correct. After that (assuming stable test data), the snapshots become the source of truth. If we decide to change the test data, then we have to re-validate that the snapshots are what we expect them to be.

But I want to stress that I'm not against doing it differently. I'm also bandwidth-constrained, and snapshots get high coverage with minimal effort. It sounds like the JSON reader should provide all the same benefits as snapshot testing. From that perspective, I see snapshots as a stop-gap until the JS JSON reader is done (unless there's a way we can validate columns with the C++ or Java JSON readers from the JS tests?).

With that in mind, I agree it's best not to commit the snapshots to the git history if we're just going to remove them once the JSON reader is ready. In the interim, I don't mind validating any new JS PRs against my local snapshots, as the volume of JS PRs isn't that high yet.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JS] Error reading dictionary-encoded integration test files
> -------------------------------------------------------------
>
>                 Key: ARROW-1693
>                 URL: https://issues.apache.org/jira/browse/ARROW-1693
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: JavaScript
>            Reporter: Brian Hulette
>            Assignee: Brian Hulette
>              Labels: pull-request-available
>             Fix For: 0.8.0
>
>         Attachments: dictionary-cpp.arrow, dictionary-java.arrow, dictionary.json
>
>
> The JS implementation crashes when reading the dictionary test case from the integration tests.
> To replicate, first generate the test files with the Java and C++ implementations:
> {code}
> $ cd ${ARROW_HOME}/integration/
> $ python -c 'from integration_test import generate_dictionary_case; generate_dictionary_case().write("dictionary.json")'
> $ ../cpp/debug/debug/json-integration-test --integration --json=dictionary.json --arrow=dictionary-cpp.arrow --mode=JSON_TO_ARROW
> $ java -cp ../java/tools/target/arrow-tools-0.8.0-SNAPSHOT-jar-with-dependencies.jar org.apache.arrow.tools.Integration -c JSON_TO_ARROW -a dictionary-java.arrow -j dictionary.json
> {code}
> Then attempt to read the files with the JS implementation:
> {code}
> $ cd ${ARROW_HOME}/js/
> $ ./bin/arrow2csv.js -s dict1_0 -f ../integration/dictionary-{java,cpp}.arrow
> {code}
> Both files result in an error for me on [a8f51858|https://github.com/apache/arrow/commit/a8f518588fda471b2e3cc8e0f0064e7c4bb99899]:
> {{TypeError: Cannot read property 'buffer' of undefined}}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
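
For concreteness, a minimal sketch of the snapshot-testing workflow discussed in the comment above. It assumes a Jest test runner and a JS reader API of the `Table.from(Uint8Array)` form (the exact reader call, file path, and test name are assumptions for illustration, not the committed test code):

{code}
// Hedged sketch: assumes Jest plus an apache-arrow JS build whose Table.from
// accepts the raw bytes of an Arrow IPC file; path and names are illustrative.
import * as fs from 'fs';
import { Table } from 'apache-arrow';

test('dictionary-cpp.arrow decodes to stable values', () => {
  const bytes = fs.readFileSync('../integration/dictionary-cpp.arrow');
  const table = Table.from(bytes);

  // Collect the schema field names and the stringified rows. The first run
  // records them as the snapshot; later runs fail on any drift, which is the
  // "validate once, then diff" workflow described above.
  const names = table.schema.fields.map((f) => f.name);
  const rows = [...table].map((row) => `${row}`);

  expect({ names, rows }).toMatchSnapshot();
});
{code}

Whether the generated `__snapshots__` files are committed or kept local is then exactly the trade-off raised in the comment: committing them gives reviewers the diff, keeping them local avoids churning the git history before the JSON reader replaces them.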