[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16245180#comment-16245180 ]

ASF GitHub Bot commented on ARROW-1693:
---------------------------------------

trxcllnt opened a new pull request #1294: WIP ARROW-1693: Fix reading C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294
 
 
   This PR adds a workaround for reading the layout metadata of C++ dictionary-encoded vectors.
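   Some background on the encoding: a dictionary-encoded vector stores only integer indices in the record batch. Each index refers to an entry in a dictionary that arrives separately in a DictionaryBatch. The sketch below shows how indices resolve against a dictionary. It is a minimal sketch that uses plain arrays and a hypothetical `decodeDictionary` helper, not the Arrow JS API. A `null` index stands in for a slot cleared in the validity bitmap.

```typescript
// Hypothetical helper, not part of the Arrow JS API.
// Resolves dictionary indices against their dictionary values.
// A null index models a slot marked null in the validity bitmap.
function decodeDictionary<T>(
  indices: ReadonlyArray<number | null>,
  dictionary: ReadonlyArray<T>,
): Array<T | null> {
  return indices.map((index): T | null => {
    if (index === null) {
      return null; // null slot: no dictionary lookup needed
    }
    if (index < 0 || index >= dictionary.length) {
      throw new RangeError(`dictionary index ${index} out of range`);
    }
    return dictionary[index] as T;
  });
}

// Usage: three distinct strings shared by five slots.
const values = decodeDictionary([0, 1, 0, null, 2], ["foo", "bar", "baz"]);
// values is ["foo", "bar", "foo", null, "baz"]
```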
   
   I added tests that validate against the C++/Java integration suite. To make the new tests pass, I had to update the generated flatbuffers format code and add a few types the JS version didn't support yet (Bool, Date32, and Timestamp). It also uses the new `isDelta` flag on DictionaryBatches to decide whether a DictionaryBatch's vector should replace the existing dictionary or be appended to it.
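   The replace-or-append rule can be sketched as a registry keyed by dictionary id. This is a hedged illustration with several assumptions. `DictionaryBatchLike` and `DictionaryRegistry` are hypothetical names rather than Arrow JS types. Dictionaries are modeled as plain arrays. A delta that arrives before any base dictionary is treated as the initial dictionary.

```typescript
// Hypothetical shape of a decoded DictionaryBatch message:
// the dictionary id, its values, and the isDelta flag.
interface DictionaryBatchLike<T> {
  id: number;
  values: T[];
  isDelta: boolean;
}

// Hypothetical registry holding the current dictionary for each id.
class DictionaryRegistry<T> {
  private readonly dictionaries = new Map<number, T[]>();

  apply(batch: DictionaryBatchLike<T>): void {
    const existing = this.dictionaries.get(batch.id);
    if (batch.isDelta && existing !== undefined) {
      // Delta batch: append new entries so earlier indices stay valid.
      this.dictionaries.set(batch.id, existing.concat(batch.values));
    } else {
      // Non-delta batch (or first batch seen for this id): replace outright.
      this.dictionaries.set(batch.id, batch.values.slice());
    }
  }

  get(id: number): T[] | undefined {
    return this.dictionaries.get(id);
  }
}

// Usage: a delta extends dictionary 0, then a non-delta batch replaces it.
const registry = new DictionaryRegistry<string>();
registry.apply({ id: 0, values: ["a", "b"], isDelta: false });
registry.apply({ id: 0, values: ["c"], isDelta: true });
const afterDelta = registry.get(0); // ["a", "b", "c"]
registry.apply({ id: 0, values: ["x"], isDelta: false });
const afterReplace = registry.get(0); // ["x"]
```

Appending instead of rebuilding keeps indices that were already written against the earlier entries pointing at the same values.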
   
   I also added a script for generating test Arrow files from the C++ and Java implementations, so that updating the format in the future doesn't break the tests. I checked the generated Arrow files in with the tests because I didn't see a way to pipe the JSON test data through the C++/Java json-to-arrow commands without writing to a file. If I missed something and we can do it all in memory, I'd be happy to make that change!
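   The conversion such a script performs can be sketched as building one command per implementation for each JSON file. This hypothetical helper is not the script in this PR. Its flags are copied from the reproduction steps in ARROW-1693, and the binary and jar paths are assumptions tied to one particular build layout.

```typescript
// Hypothetical paths to the two converters; adjust to your build layout.
interface ConverterPaths {
  cppBinary: string;
  javaJar: string;
}

// Builds the argv for converting one JSON integration file to Arrow
// with the C++ and Java tools (flags taken from ARROW-1693).
function jsonToArrowCommands(json: string, outPrefix: string, paths: ConverterPaths): string[][] {
  return [
    [paths.cppBinary, "--integration", `--json=${json}`,
      `--arrow=${outPrefix}-cpp.arrow`, "--mode=JSON_TO_ARROW"],
    ["java", "-cp", paths.javaJar, "org.apache.arrow.tools.Integration",
      "-c", "JSON_TO_ARROW", "-a", `${outPrefix}-java.arrow`, "-j", json],
  ];
}

const commands = jsonToArrowCommands("dictionary.json", "dictionary", {
  cppBinary: "../cpp/debug/debug/json-integration-test",
  javaJar: "../java/tools/target/arrow-tools-0.8.0-SNAPSHOT-jar-with-dependencies.jar",
});
for (const argv of commands) {
  console.log(argv.join(" "));
}
```

Each argv could then be run with a process launcher such as Node's `child_process.execFileSync(argv[0], argv.slice(1))`. The output still lands in files, which matches the limitation described above.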
   
   This PR is marked WIP because I added an [integration test](https://github.com/apache/arrow/commit/6e98874d9f4bfae7758f8f731212ae7ceb3f1321#diff-18c6be12406c482092d4b1f7bd70a8e1R22) that checks whether the JS reader reads the C++ and Java files the same way, and unfortunately it currently fails. While debugging, I noticed a number of other differences in the buffer layout metadata written by the C++ and Java versions. If we follow @jacques-n's [comment in ARROW-1693](https://issues.apache.org/jira/browse/ARROW-1693?focusedCommentId=16244812&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16244812) and remove or ignore the metadata, this test should pass too.
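   If the layout metadata is removed or ignored, a reader can derive each field node's buffers from its type instead. The sketch below is simplified and rests on stated assumptions. `FieldLike`, `bufferLayout`, and `countBuffers` are hypothetical names. Only a handful of types are modeled, and unions among others are left out. A dictionary-encoded field is laid out like its integer index type, because the record batch carries indices, not dictionary values.

```typescript
// Simplified, hypothetical type model covering a few Arrow types.
type ArrowTypeLike =
  | { kind: "Int" | "FloatingPoint" | "Bool" | "Date" | "Timestamp" }
  | { kind: "Utf8" | "Binary" }
  | { kind: "List"; child: FieldLike }
  | { kind: "Struct"; children: FieldLike[] };

interface FieldLike {
  type: ArrowTypeLike;
  // Present when the field is dictionary-encoded.
  dictionary?: { indexType: ArrowTypeLike };
}

type BufferKind = "validity" | "offsets" | "data";

// Dictionary-encoded fields are laid out like their index type.
function physicalType(field: FieldLike): ArrowTypeLike {
  return field.dictionary ? field.dictionary.indexType : field.type;
}

// Buffers of one field node, derived from its type alone.
function bufferLayout(field: FieldLike): BufferKind[] {
  switch (physicalType(field).kind) {
    case "Utf8":
    case "Binary":
      return ["validity", "offsets", "data"];
    case "List":
      return ["validity", "offsets"];
    case "Struct":
      return ["validity"];
    default:
      // Fixed-width types: validity bitmap plus a data buffer.
      return ["validity", "data"];
  }
}

// Total buffers consumed by a field and its children, depth-first.
function countBuffers(field: FieldLike): number {
  const own = bufferLayout(field).length;
  const type = physicalType(field);
  if (type.kind === "List") return own + countBuffers(type.child);
  if (type.kind === "Struct") {
    return type.children.reduce((n, child) => n + countBuffers(child), own);
  }
  return own;
}

// Usage: a dictionary-encoded Utf8 field reads like its Int indices.
const dictField: FieldLike = { type: { kind: "Utf8" }, dictionary: { indexType: { kind: "Int" } } };
const plainField: FieldLike = { type: { kind: "Utf8" } };
console.log(bufferLayout(dictField), bufferLayout(plainField));
```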

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JS] Error reading dictionary-encoded integration test files
> ------------------------------------------------------------
>
>                 Key: ARROW-1693
>                 URL: https://issues.apache.org/jira/browse/ARROW-1693
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: JavaScript
>            Reporter: Brian Hulette
>            Assignee: Brian Hulette
>              Labels: pull-request-available
>             Fix For: 0.8.0
>
>         Attachments: dictionary-cpp.arrow, dictionary-java.arrow, dictionary.json
>
>
> The JS implementation crashes when reading the dictionary test case from the integration tests.
> To reproduce, first generate the test files with the Java and C++ implementations:
> {code}
> $ cd ${ARROW_HOME}/integration/
> $ python -c 'from integration_test import generate_dictionary_case; generate_dictionary_case().write("dictionary.json")'
> $ ../cpp/debug/debug/json-integration-test --integration --json=dictionary.json --arrow=dictionary-cpp.arrow --mode=JSON_TO_ARROW
> $ java -cp ../java/tools/target/arrow-tools-0.8.0-SNAPSHOT-jar-with-dependencies.jar org.apache.arrow.tools.Integration -c JSON_TO_ARROW -a dictionary-java.arrow -j dictionary.json
> {code}
> Then attempt to read the files with the JS implementation:
> {code}
> $ cd ${ARROW_HOME}/js/
> $ ./bin/arrow2csv.js -s dict1_0 -f ../integration/dictionary-{java,cpp}.arrow
> {code}
> Both files result in an error for me on [a8f51858|https://github.com/apache/arrow/commit/a8f518588fda471b2e3cc8e0f0064e7c4bb99899]:
> {{TypeError: Cannot read property 'buffer' of undefined}}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
