[ https://issues.apache.org/jira/browse/ARROW-18007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Denis Gursky updated ARROW-18007:
---------------------------------
    Labels: JS  (was: )

> [JS] Values returned as undefined when arrow file bigger than 2gb
> -----------------------------------------------------------------
>
>                 Key: ARROW-18007
>                 URL: https://issues.apache.org/jira/browse/ARROW-18007
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: JavaScript
>            Reporter: Denis Gursky
>            Priority: Major
>              Labels: JS
>
> Steps:
>
> 1. Generate an Arrow file bigger than 2 GB
> {code:python}
> import pyarrow as pa
>
> # Build two 140-million-element columns (one of ints, one of floats).
> nums1 = [42]
> nums2 = [42.42]
> mil = 1000000
> for n in range(1, 140 * mil):
>     nums1.append(n)
>     nums2.append(1 / n)
> arr1 = pa.array(nums1)
> arr2 = pa.array(nums2)
> schema = pa.schema([
>     pa.field('nums1', arr1.type),
>     pa.field('nums2', arr2.type),
> ])
> # Write both columns to an Arrow IPC file as a single record batch.
> with pa.OSFile('arraydata.arrow', 'wb') as sink:
>     with pa.ipc.new_file(sink, schema=schema) as writer:
>         batch = pa.record_batch([arr1, arr2], schema=schema)
>         writer.write(batch)
> {code}
> 2. Try to read it via the JS SDK
> {code:javascript}
> const fs = require("fs");
> const { tableFromIPC, RecordBatchReader } = require("apache-arrow");
>
> const filePath = "./arraydata.arrow";
> const stream = fs.createReadStream(filePath);
> const reader = RecordBatchReader.from(stream);
>
> (async function () {
>   const table = await tableFromIPC(reader);
>   console.log("numRows", table.numRows);
>   console.log("first row", table.get(0).toArray());
> })();
> {code}
> The code above prints:
> {code}
> numRows 140000000
> first row [ undefined, undefined ]
> {code}
> {{numRows}} is correct, but the values are coming out as {{undefined}}.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
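
For reference, the "bigger than 2 GB" figure can be checked with a rough back-of-the-envelope calculation. The sketch below assumes that `pa.array` infers `int64` for the Python ints and `float64` for the Python floats (8 bytes per value each) and ignores IPC metadata and padding. It only illustrates the data size; the report does not establish whether the signed 32-bit boundary is the cause of the `undefined` values.

```python
# Rough size estimate for the file generated in step 1.
# Assumption: both columns use 8-byte values (int64 and float64);
# IPC metadata and buffer padding are ignored.
ROWS = 140 * 1_000_000        # 1 seed value + the values from range(1, 140 * mil)
BYTES_PER_VALUE = 8
COLUMNS = 2
INT32_MAX = 2**31 - 1         # largest signed 32-bit integer

column_bytes = ROWS * BYTES_PER_VALUE
batch_bytes = column_bytes * COLUMNS

print("rows:", ROWS)
print("bytes per column:", column_bytes)
print("bytes in the single record batch:", batch_bytes)
print("single column exceeds signed 32-bit range:", column_bytes > INT32_MAX)
print("whole batch exceeds signed 32-bit range:", batch_bytes > INT32_MAX)
```

Under these assumptions each column stays below the signed 32-bit limit on its own, while the single record batch holding both columns exceeds it.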