[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16252658#comment-16252658 ]

ASF GitHub Bot commented on ARROW-1693:
---------------------------------------

trxcllnt commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-344431051
 
 
   > @wesm This "validate the test results once" part is where I'm getting 
lost. How do you know whether anything is correct if you don't write down what 
you expect to be true?
   
   Ah right, in this case the JSON files are the initial source of truth. I 
compared the snapshots against the Arrow files read via pandas/pyarrow, and 
they looked correct. After that (assuming stable test data), the snapshots 
become the source of truth. If we decide to change the test data, then we have 
to re-validate that the snapshots are what we expect them to be.
   
   But I want to stress, I'm not against doing it differently. I'm also 
bandwidth constrained, and snapshots get high coverage with minimal effort. It 
sounds like the JSON reader should provide all the same benefits as snapshot 
testing. From that perspective, I see snapshots as a stop-gap until the JS JSON 
reader is done (unless there's a way we can validate columns with the C++ or 
Java JSON readers from the JS tests?)
   
   With that in mind, I agree it's best not to commit the snapshots to the git 
history if we're just going to remove them once the JSON reader is ready. In 
the interim, I don't mind validating any new JS PRs against my local 
snapshots, as the volume of JS PRs isn't that high yet.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JS] Error reading dictionary-encoded integration test files
> ------------------------------------------------------------
>
>                 Key: ARROW-1693
>                 URL: https://issues.apache.org/jira/browse/ARROW-1693
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: JavaScript
>            Reporter: Brian Hulette
>            Assignee: Brian Hulette
>              Labels: pull-request-available
>             Fix For: 0.8.0
>
>         Attachments: dictionary-cpp.arrow, dictionary-java.arrow, 
> dictionary.json
>
>
> The JS implementation crashes when reading the dictionary test case from the 
> integration tests.
> To replicate, first generate the test files with java and cpp impls:
> {code}
> $ cd ${ARROW_HOME}/integration/
> $ python -c 'from integration_test import generate_dictionary_case; 
> generate_dictionary_case().write("dictionary.json")'
> $ ../cpp/debug/debug/json-integration-test --integration 
> --json=dictionary.json --arrow=dictionary-cpp.arrow --mode=JSON_TO_ARROW
> $ java -cp 
> ../java/tools/target/arrow-tools-0.8.0-SNAPSHOT-jar-with-dependencies.jar 
> org.apache.arrow.tools.Integration -c JSON_TO_ARROW -a dictionary-java.arrow 
> -j dictionary.json
> {code}
> Attempt to read the files with the JS impl:
> {code}
> $ cd ${ARROW_HOME}/js/
> $ ./bin/arrow2csv.js -s dict1_0 -f ../integration/dictionary-{java,cpp}.arrow
> {code}
> Both files result in an error for me on 
> [a8f51858|https://github.com/apache/arrow/commit/a8f518588fda471b2e3cc8e0f0064e7c4bb99899]:
> {{TypeError: Cannot read property 'buffer' of undefined}}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
