[jira] [Created] (ARROW-16704) tableFromIPC should handle AsyncRecordBatchReader inputs
Paul Taylor created ARROW-16704: --- Summary: tableFromIPC should handle AsyncRecordBatchReader inputs Key: ARROW-16704 URL: https://issues.apache.org/jira/browse/ARROW-16704 Project: Apache Arrow Issue Type: Improvement Components: JavaScript Affects Versions: 8.0.0 Reporter: Paul Taylor To match the prior `Table.from()` method, `tableFromIPC()` should handle the case where the input is an async RecordBatchReader. -- This message was sent by Atlassian Jira (v8.20.7#820007)
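A rough sketch of the input normalization this would need, independent of apache-arrow itself (the `collectBatches` helper name is hypothetical, not the actual implementation): accept either a sync iterable or an async iterable of record batches and collect them before assembling the Table.

```javascript
// Hypothetical sketch of the normalization tableFromIPC would need:
// async RecordBatchReaders are async-iterable, sync readers are iterable,
// so detect which protocol the source supports and drain it accordingly.
async function collectBatches(source) {
  const batches = [];
  if (source != null && typeof source[Symbol.asyncIterator] === 'function') {
    // async reader case
    for await (const batch of source) { batches.push(batch); }
  } else {
    // sync reader / plain array case
    for (const batch of source) { batches.push(batch); }
  }
  return batches;
}
```

With something like this in place, `tableFromIPC` could `await` the collected batches and build the Table the same way regardless of whether the reader is sync or async.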
[jira] [Created] (ARROW-12570) [JS] Fix issues that blocked the v4.0.0 release
Paul Taylor created ARROW-12570: --- Summary: [JS] Fix issues that blocked the v4.0.0 release Key: ARROW-12570 URL: https://issues.apache.org/jira/browse/ARROW-12570 Project: Apache Arrow Issue Type: Improvement Components: JavaScript Reporter: Paul Taylor Assignee: Paul Taylor A few issues had to be fixed manually for the v4.0.0 release: * ts-jest throwing a type error running the tests on the TS source * lerna.json really does need those version numbers * npm has introduced rate limits since v3.0.0 * support npm 2FA one-time-passwords for publish -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-12305) [JS] Benchmark test data generate.py assumes python 2
Paul Taylor created ARROW-12305: --- Summary: [JS] Benchmark test data generate.py assumes python 2 Key: ARROW-12305 URL: https://issues.apache.org/jira/browse/ARROW-12305 Project: Apache Arrow Issue Type: Improvement Components: JavaScript Reporter: Paul Taylor Assignee: Paul Taylor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10255) [JS] Reorganize imports and exports to be more friendly to ESM tree-shaking
Paul Taylor created ARROW-10255: --- Summary: [JS] Reorganize imports and exports to be more friendly to ESM tree-shaking Key: ARROW-10255 URL: https://issues.apache.org/jira/browse/ARROW-10255 Project: Apache Arrow Issue Type: Improvement Components: JavaScript Affects Versions: 0.17.1 Reporter: Paul Taylor Assignee: Paul Taylor Presently most of our public classes can't be easily [tree-shaken|https://webpack.js.org/guides/tree-shaking/] by library consumers. This is a problem for libraries that only need to use parts of Arrow. For example, the vis.gl projects have an integration test that imports three of our simpler classes and tests the resulting bundle size: {code:javascript} import {Schema, Field, Float32} from 'apache-arrow'; // | Bundle Size | Compressed // | 202 KB (207112 bytes) | 45 KB (46618 bytes) {code} We can help solve this with the following changes: * Add "sideEffects": false to our ESM package.json * Reorganize our imports to only include what's needed * Eliminate or move some static/member methods to standalone exported functions * Wrap the utf8 util's node Buffer detection in eval so Webpack doesn't compile in its own Buffer shim * Remove flatbuffers namespaces from the generated TS, because these defeat Webpack's tree-shaking Candidate functions for removal/moving to standalone functions: * Schema.new, Schema.from, Schema.prototype.compareTo * Field.prototype.compareTo * Type.prototype.compareTo * Table.new, Table.from * Column.new * Vector.new, Vector.from * RecordBatchReader.from After applying a few of the above changes to the Schema and flatbuffers files, I was able to reduce vis.gl's import size by 90%: {code:javascript} // Bundle Size | Compressed // 24 KB (24942 bytes) | 6 KB (6154 bytes) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
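The static-method-to-standalone-function item can be illustrated without Arrow itself. This is a generic sketch with hypothetical `Table`/`tableFrom` names, not the actual refactor:

```javascript
// Why standalone functions tree-shake better than statics: a bundler that
// sees any use of the Table class must keep every static method (and its
// dependencies) alive, while an unused standalone export can be dropped
// entirely. Hypothetical names for illustration.
class Table {
  constructor(batches) { this.batches = batches; }
  // Before: the static factory rides along with the class in every bundle.
  static from(batches) { return new Table(batches); }
}

// After: the factory is a separate export, eligible for dead-code
// elimination (paired with "sideEffects": false in package.json so
// bundlers trust that importing the module has no side effects).
function tableFrom(batches) { return new Table(batches); }
```

The same pattern applies to each of the candidates listed above (`Schema.from`, `Vector.from`, `RecordBatchReader.from`, and so on).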
[jira] [Created] (ARROW-9659) [C++] RecordBatchStreamReader throws on CUDA device buffers
Paul Taylor created ARROW-9659: -- Summary: [C++] RecordBatchStreamReader throws on CUDA device buffers Key: ARROW-9659 URL: https://issues.apache.org/jira/browse/ARROW-9659 Project: Apache Arrow Issue Type: Bug Components: C++ Affects Versions: 1.0.0 Reporter: Paul Taylor Assignee: Paul Taylor Fix For: 1.0.1 Prior to 1.0.0, the RecordBatchStreamReader was capable of reading source CudaBuffers wrapped in a CudaBufferReader. In 1.0.0, the Array validation routines call into Buffer::data(), which throws an error if the source isn't in host memory. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-9041) [C++] overloaded virtual function "arrow::io::Writable::Write" is only partially overridden in class
[ https://issues.apache.org/jira/browse/ARROW-9041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126947#comment-17126947 ] Paul Taylor commented on ARROW-9041: These are resolved in [PR 5677|https://github.com/apache/arrow/pull/5677]. Now that the [new variant.hpp header|https://github.com/apache/arrow/pull/7053] is in 0.17.1, we should be able to upgrade. > [C++] overloaded virtual function "arrow::io::Writable::Write" is only > partially overridden in class > - > > Key: ARROW-9041 > URL: https://issues.apache.org/jira/browse/ARROW-9041 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.15.0 >Reporter: Karthikeyan Natarajan >Priority: Major > Labels: easyfix > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > Following warnings appear > cpp/build/arrow/install/include/arrow/io/file.h(189): warning: overloaded > virtual function "arrow::io::Writable::Write" is only partially overridden in > class "arrow::io::MemoryMappedFile" > cpp/build/arrow/install/include/arrow/io/memory.h(98): warning: overloaded > virtual function "arrow::io::Writable::Write" is only partially overridden in > class "arrow::io::MockOutputStream" > cpp/build/arrow/install/include/arrow/io/memory.h(116): warning: overloaded > virtual function "arrow::io::Writable::Write" is only partially overridden in > class "arrow::io::FixedSizeBufferWriter" > Suggestion solution is to use `using Writable::Write` in protected/private. > [https://isocpp.org/wiki/faq/strange-inheritance#hiding-rule] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8394) Typescript compiler errors for arrow d.ts files, when using es2015-esm package
[ https://issues.apache.org/jira/browse/ARROW-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17119867#comment-17119867 ] Paul Taylor commented on ARROW-8394: Thanks [~pprice], I'll look into this. I had to do a bunch of weird things to trick the 3.5 compiler into propagating the types, so I'm hoping I can back some of those out to get it working in 3.9 and simplify the typedefs along the way. > Typescript compiler errors for arrow d.ts files, when using es2015-esm package > -- > > Key: ARROW-8394 > URL: https://issues.apache.org/jira/browse/ARROW-8394 > Project: Apache Arrow > Issue Type: Bug > Components: JavaScript >Affects Versions: 0.16.0 >Reporter: Shyamal Shukla >Priority: Blocker > > Attempting to use apache-arrow within a web application, but typescript > compiler throws the following errors in some of arrow's .d.ts files > import \{ Table } from "../node_modules/@apache-arrow/es2015-esm/Arrow"; > export class SomeClass { > . > . > constructor() { > const t = Table.from(''); > } > *node_modules/@apache-arrow/es2015-esm/column.d.ts:14:22* - error TS2417: > Class static side 'typeof Column' incorrectly extends base class static side > 'typeof Chunked'. Types of property 'new' are incompatible. > *node_modules/@apache-arrow/es2015-esm/ipc/reader.d.ts:238:5* - error TS2717: > Subsequent property declarations must have the same type. Property 'schema' > must be of type 'Schema', but here has type 'Schema'. > 238 schema: Schema; > *node_modules/@apache-arrow/es2015-esm/recordbatch.d.ts:17:18* - error > TS2430: Interface 'RecordBatch' incorrectly extends interface 'StructVector'. > The types of 'slice(...).clone' are incompatible between these types. > the tsconfig.json file looks like > { > "compilerOptions": { > "target":"ES6", > "outDir": "dist", > "baseUrl": "src/" > }, > "exclude": ["dist"], > "include": ["src/*.ts"] > } -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8053) [JS] Improve performance of filtering
[ https://issues.apache.org/jira/browse/ARROW-8053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077728#comment-17077728 ] Paul Taylor commented on ARROW-8053: [~hulettbh] did the predicates stuff. It could certainly be more optimized if it were JIT'd into a flat JS function. An apples-to-apples comparison would be to filter the rows individually: {code:javascript} function filterStruct(struct, predicate) { let keys = [], i = -1, j = -1; for (let row of struct) if (predicate(row, ++i)) keys[++j] = i; return DictionaryVector.from(struct, new Int32(), keys); } function predicate(policy) { return policy.proto === 6 && ((policy.startPort > 0 && policy.endPort < 200) || policy.startPort === 49152) && policy.isActive === true; } const count = filterStruct(policiesTable, predicate).length; {code} I generally agree with [~lmeyerov] though: don't do inline scans and reductions if you care about performance. Use WASM/web workers to distribute across CPU cores, or (better yet) WebGL TransformFeedback on the GPU (both work in node and browsers, and neither requires non-JS dependencies). Arrow excels at both of these strategies. > [JS] Improve performance of filtering > - > > Key: ARROW-8053 > URL: https://issues.apache.org/jira/browse/ARROW-8053 > Project: Apache Arrow > Issue Type: Bug > Components: JavaScript >Reporter: Will Strimling >Priority: Major > > A series of observable notebooks have shown quite convincingly that arrow > doesn't compete with other libraries or JavaScript when it comes to filtering > performance. Has there been any discussion or roadmaps established for > improving it? > Most convincing Observables: > * > [https://observablehq.com/@duaneatat/apache-arrow-filtering-vs-array-filter] > * > [https://observablehq.com/@robertleeplummerjr/array-filtering-apache-arrow-vs-gpu-js-textures-vs-array-fil] -- This message was sent by Atlassian Jira (v8.3.4#803005)
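The selection-vector idea behind `DictionaryVector.from(struct, new Int32(), keys)` can be sketched in plain JS, independent of Arrow: record the indices of matching rows instead of copying the rows themselves.

```javascript
// Plain-JS sketch of the selection-vector approach: rather than materializing
// a filtered copy of the data, collect the indices of matching rows (the
// "keys"), then read rows through the index list lazily. Arrow-free
// illustration, not library code.
function selectIndices(length, getRow, predicate) {
  const keys = [];
  for (let i = 0; i < length; ++i) {
    if (predicate(getRow(i), i)) keys.push(i);
  }
  return keys;
}
```

Here `keys` plays the same role as the dictionary indices passed to `DictionaryVector.from` in the comment above: the underlying column data is never copied, only indexed.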
[jira] [Commented] (ARROW-7513) [JS] Arrow Tutorial: Common data types
[ https://issues.apache.org/jira/browse/ARROW-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013120#comment-17013120 ] Paul Taylor commented on ARROW-7513: [~lmeyerov] The Int64Vector.from and Uint64Vector.from methods either require that you pass JS BigInt values, or take a second "is64bit" boolean argument: https://github.com/apache/arrow/blob/master/js/src/vector/int.ts#L63-L64. All the IntVectors share the same `from` implementation, IIRC because of a limitation in the TypeScript compiler that may no longer exist. > [JS] Arrow Tutorial: Common data types > -- > > Key: ARROW-7513 > URL: https://issues.apache.org/jira/browse/ARROW-7513 > Project: Apache Arrow > Issue Type: Task > Components: JavaScript >Reporter: Leo Meyerovich >Assignee: Leo Meyerovich >Priority: Minor > > The JS client lacks basic introductory material around creating the common > basic data types such as turning JS arrays into ints, dicts, etc. There is no > equivalent of Python's [https://arrow.apache.org/docs/python/data.html] . > This has made use for myself difficult, and I bet for others. > > As with prev tutorials, I started sketching on > [https://observablehq.com/@lmeyerov/rich-data-types-in-apache-arrow-js-efficient-data-tables-wit] > . When we're happy can make sense to export as an html or something to the > repo, or just link from the main readme. > I believe the target topics worth covering are: > * Common user data types: Ints, Dicts, Struct, Time > * Common column types: Data, Vector, Column > * Going from individual & arrays & buffers of JS values to Arrow-wrapped > forms, and basic inspection of the result > Not worth going into here is Tables vs. RecordBatches, which is the other > tutorial. > > 1. Ideas of what to add/edit/remove? > 2. And anyone up for helping with discussion of Data vs. Vector, and ingest > of Time & Struct? > 3. ... Should we be encouraging Struct or Map? I saw some PRs changing stuff > here.
> > cc [~wesm] [~bhulette] [~paul.e.taylor] > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-6886) [C++] arrow::io header nvcc compiler warnings
[ https://issues.apache.org/jira/browse/ARROW-6886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Taylor updated ARROW-6886: --- Fix Version/s: 0.15.1 > [C++] arrow::io header nvcc compiler warnings > - > > Key: ARROW-6886 > URL: https://issues.apache.org/jira/browse/ARROW-6886 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.15.0 >Reporter: Paul Taylor >Priority: Trivial > Labels: pull-request-available > Fix For: 0.15.1 > > Time Spent: 50m > Remaining Estimate: 0h > > Seeing the following compiler warnings statically linking the arrow::io > headers with nvcc: > {noformat} > arrow/install/include/arrow/io/file.h(189): warning: overloaded virtual > function "arrow::io::Writable::Write" is only partially overridden in class > "arrow::io::MemoryMappedFile" > arrow/install/include/arrow/io/memory.h(98): warning: overloaded virtual > function "arrow::io::Writable::Write" is only partially overridden in class > "arrow::io::MockOutputStream" > arrow/install/include/arrow/io/memory.h(116): warning: overloaded virtual > function "arrow::io::Writable::Write" is only partially overridden in class > "arrow::io::FixedSizeBufferWriter" > arrow/install/include/arrow/io/file.h(189): warning: overloaded virtual > function "arrow::io::Writable::Write" is only partially overridden in class > "arrow::io::MemoryMappedFile" > arrow/install/include/arrow/io/memory.h(98): warning: overloaded virtual > function "arrow::io::Writable::Write" is only partially overridden in class > "arrow::io::MockOutputStream" > arrow/install/include/arrow/io/memory.h(116): warning: overloaded virtual > function "arrow::io::Writable::Write" is only partially overridden in class > "arrow::io::FixedSizeBufferWriter" > arrow/install/include/arrow/io/file.h(189): warning: overloaded virtual > function "arrow::io::Writable::Write" is only partially overridden in class > "arrow::io::MemoryMappedFile" > arrow/install/include/arrow/io/memory.h(98): warning: 
overloaded virtual > function "arrow::io::Writable::Write" is only partially overridden in class > "arrow::io::MockOutputStream" > arrow/install/include/arrow/io/memory.h(116): warning: overloaded virtual > function "arrow::io::Writable::Write" is only partially overridden in class > "arrow::io::FixedSizeBufferWriter" > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6886) [C++] arrow::io header nvcc compiler warnings
[ https://issues.apache.org/jira/browse/ARROW-6886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953183#comment-16953183 ] Paul Taylor commented on ARROW-6886: [~apitrou] Yeah this warning is benign, but the team is moving towards a zero-tolerance policy for compilation warnings. I'm looking into a fix now, hopefully can have a PR ready in time for 0.15.1. These headers are included by some of our cuda files, so here's the full command as generated by cmake: {noformat} /usr/local/cuda-10.0/bin/nvcc -DARROW_METADATA_V4 -Dcudf_EXPORTS -Igoogletest/install/include -Iinclude -I../include -I../src -I../thirdparty/cub -I../thirdparty/jitify -I../thirdparty/libcudacxx/include -Iarrow/install/include -Iarrow/build/flatbuffers_ep-prefix/src/flatbuffers_ep-install/include -I/home/ptaylor/dev/rapids/compose/etc/conda/envs/rapids/include -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 --expt-extended-lambda --expt-relaxed-constexpr -Werror cross-execution-space-call -Xcompiler -Wall,-Werror --define-macro HT_LEGACY_ALLOCATOR -O3 -DNDEBUG -Xcompiler=-fPIC -DJITIFY_USE_CACHE -DCUDF_VERSION=0.11.0 -std=c++14 -x cu -x cuda -c /home/ptaylor/dev/rapids/cudf/cpp/src/io/avro/avro_reader_impl.cu -o CMakeFiles/cudf.dir/src/io/avro/avro_reader_impl.cu.o && /usr/local/cuda-10.0/bin/nvcc -DARROW_METADATA_V4 -Dcudf_EXPORTS -Igoogletest/install/include -Iinclude -I../include -I../src -I../thirdparty/cub -I../thirdparty/jitify -I../thirdparty/libcudacxx/include -Iarrow/install/include -Iarrow/build/flatbuffers_ep-prefix/src/flatbuffers_ep-install/include -I/home/ptaylor/dev/rapids/compose/etc/conda/envs/rapids/include -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 --expt-extended-lambda --expt-relaxed-constexpr -Werror cross-execution-space-call -Xcompiler -Wall,-Werror --define-macro HT_LEGACY_ALLOCATOR -O3 -DNDEBUG -Xcompiler=-fPIC -DJITIFY_USE_CACHE -DCUDF_VERSION=0.11.0 -std=c++14 -x cu 
-x cuda -M /home/ptaylor/dev/rapids/cudf/cpp/src/io/avro/avro_reader_impl.cu -MT CMakeFiles/cudf.dir/src/io/avro/avro_reader_impl.cu.o -o $DEP_FILE {noformat} > [C++] arrow::io header nvcc compiler warnings > - > > Key: ARROW-6886 > URL: https://issues.apache.org/jira/browse/ARROW-6886 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.15.0 >Reporter: Paul Taylor >Priority: Trivial > > Seeing the following compiler warnings statically linking the arrow::io > headers with nvcc: > {noformat} > arrow/install/include/arrow/io/file.h(189): warning: overloaded virtual > function "arrow::io::Writable::Write" is only partially overridden in class > "arrow::io::MemoryMappedFile" > arrow/install/include/arrow/io/memory.h(98): warning: overloaded virtual > function "arrow::io::Writable::Write" is only partially overridden in class > "arrow::io::MockOutputStream" > arrow/install/include/arrow/io/memory.h(116): warning: overloaded virtual > function "arrow::io::Writable::Write" is only partially overridden in class > "arrow::io::FixedSizeBufferWriter" > arrow/install/include/arrow/io/file.h(189): warning: overloaded virtual > function "arrow::io::Writable::Write" is only partially overridden in class > "arrow::io::MemoryMappedFile" > arrow/install/include/arrow/io/memory.h(98): warning: overloaded virtual > function "arrow::io::Writable::Write" is only partially overridden in class > "arrow::io::MockOutputStream" > arrow/install/include/arrow/io/memory.h(116): warning: overloaded virtual > function "arrow::io::Writable::Write" is only partially overridden in class > "arrow::io::FixedSizeBufferWriter" > arrow/install/include/arrow/io/file.h(189): warning: overloaded virtual > function "arrow::io::Writable::Write" is only partially overridden in class > "arrow::io::MemoryMappedFile" > arrow/install/include/arrow/io/memory.h(98): warning: overloaded virtual > function "arrow::io::Writable::Write" is only partially overridden in class > 
"arrow::io::MockOutputStream" > arrow/install/include/arrow/io/memory.h(116): warning: overloaded virtual > function "arrow::io::Writable::Write" is only partially overridden in class > "arrow::io::FixedSizeBufferWriter" > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6886) [C++] arrow::io header nvcc compiler warnings
Paul Taylor created ARROW-6886: -- Summary: [C++] arrow::io header nvcc compiler warnings Key: ARROW-6886 URL: https://issues.apache.org/jira/browse/ARROW-6886 Project: Apache Arrow Issue Type: New Feature Components: C++ Affects Versions: 0.15.0 Reporter: Paul Taylor Seeing the following compiler warnings statically linking the arrow::io headers with nvcc: {noformat} arrow/install/include/arrow/io/file.h(189): warning: overloaded virtual function "arrow::io::Writable::Write" is only partially overridden in class "arrow::io::MemoryMappedFile" arrow/install/include/arrow/io/memory.h(98): warning: overloaded virtual function "arrow::io::Writable::Write" is only partially overridden in class "arrow::io::MockOutputStream" arrow/install/include/arrow/io/memory.h(116): warning: overloaded virtual function "arrow::io::Writable::Write" is only partially overridden in class "arrow::io::FixedSizeBufferWriter" arrow/install/include/arrow/io/file.h(189): warning: overloaded virtual function "arrow::io::Writable::Write" is only partially overridden in class "arrow::io::MemoryMappedFile" arrow/install/include/arrow/io/memory.h(98): warning: overloaded virtual function "arrow::io::Writable::Write" is only partially overridden in class "arrow::io::MockOutputStream" arrow/install/include/arrow/io/memory.h(116): warning: overloaded virtual function "arrow::io::Writable::Write" is only partially overridden in class "arrow::io::FixedSizeBufferWriter" arrow/install/include/arrow/io/file.h(189): warning: overloaded virtual function "arrow::io::Writable::Write" is only partially overridden in class "arrow::io::MemoryMappedFile" arrow/install/include/arrow/io/memory.h(98): warning: overloaded virtual function "arrow::io::Writable::Write" is only partially overridden in class "arrow::io::MockOutputStream" arrow/install/include/arrow/io/memory.h(116): warning: overloaded virtual function "arrow::io::Writable::Write" is only partially overridden in class "arrow::io::FixedSizeBufferWriter" 
{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6759) [JS] Run less comprehensive every-commit build, relegate multi-target builds perhaps to nightlies
[ https://issues.apache.org/jira/browse/ARROW-6759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943014#comment-16943014 ] Paul Taylor commented on ARROW-6759: Yeah no sweat, we can change the `ci/travis_script_js.sh` build and test commands to only test the UMD builds. Historically these have the most issues since they're minified, so if they pass everything should pass: {code:bash} npm run build -- -m umd -t es5 -t es2015 -t esnext npm test -- -m umd -t es5 -t es2015 -t esnext {code} > [JS] Run less comprehensive every-commit build, relegate multi-target builds > perhaps to nightlies > - > > Key: ARROW-6759 > URL: https://issues.apache.org/jira/browse/ARROW-6759 > Project: Apache Arrow > Issue Type: Improvement > Components: JavaScript >Reporter: Wes McKinney >Priority: Major > Fix For: 1.0.0 > > > The JavaScript CI build is taking 25-30 minutes nowadays. This could be > abbreviated by testing fewer deployment targets. We obviously still need to > test all the deployment targets but we could do that nightly instead of on > every commit -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-6575) [JS] decimal toString does not support negative values
[ https://issues.apache.org/jira/browse/ARROW-6575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16934909#comment-16934909 ] Paul Taylor edited comment on ARROW-6575 at 9/25/19 2:24 AM: - [~zad] Yeah I couldn't figure out how to propagate the sign bit through the decimal conversion. I'd be happy to review a PR if you know the right way to do it. was (Author: paul.e.taylor): Yeah, I couldn't figure out how to propagate the sign bit through the decimal conversion. I'd be happy to review a PR if you know the right way to do it. > [JS] decimal toString does not support negative values > -- > > Key: ARROW-6575 > URL: https://issues.apache.org/jira/browse/ARROW-6575 > Project: Apache Arrow > Issue Type: Bug > Components: JavaScript >Affects Versions: 0.14.1 >Reporter: Andong Zhan >Priority: Critical > > The main description is here: [https://github.com/apache/arrow/issues/5397] > Also, I have a simple test case (slightly changed generate-test-data.js and > generated-data-validators): > {code:java} > export const decimal = (length = 2, nullCount = length * 0.2 | 0, scale = 0, > precision = 38) => vectorGenerator.visit(new Decimal(scale, precision), > length, nullCount); > function fillDecimal(length: number) { > // const BPE = Uint32Array.BYTES_PER_ELEMENT; // 4 > const array = new Uint32Array(length); > // const max = (2 ** (8 * BPE)) - 1; > // for (let i = -1; ++i < length; array[i] = rand() * max * (rand() > 0.5 > ? -1 : 1)); > array[0] = 0; > array[1] = 1286889712; > array[2] = 2218195178; > array[3] = 4282345521; > array[4] = 0; > array[5] = 16004768; > array[6] = 3587851993; > array[7] = 126217744; > return array; > } > {code} > and the expected value should be > {code:java} > expect(vector.get(0).toString()).toBe('-1'); > expect(vector.get(1).toString()).toBe('1'); > {code} > However, the actual first value is 339282366920938463463374607431768211456 > which is wrong! The second value is correct by the way. 
> I believe the bug is in the function called > function decimalToString>(a: T) because it cannot > return a negative value at all. > [arrow/js/src/util/bn.ts|https://github.com/apache/arrow/blob/d54425de19b7dbb2764a40355d76d1c785cf64ec/js/src/util/bn.ts#L99] > Line 99 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6641) Remove Deprecated WriteableFile warning
[ https://issues.apache.org/jira/browse/ARROW-6641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16934910#comment-16934910 ] Paul Taylor commented on ARROW-6641: I think this was addressed by [e41ad0d2|https://github.com/apache/arrow/commit/e41ad0d2ccaf96812d902b161d8a0b2b372f1b72] which should make it into the 0.15 release. > Remove Deprecated WriteableFile warning > --- > > Key: ARROW-6641 > URL: https://issues.apache.org/jira/browse/ARROW-6641 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.14.0, 0.14.1 >Reporter: Karthikeyan Natarajan >Priority: Major > Labels: newbie > > Current version is 0.14.1. As per comment, deprecated `WriteableFile` should > be removed. > > {code:java} > // TODO(kszucs): remove this after 0.13 > #ifndef _MSC_VER > using WriteableFile ARROW_DEPRECATED("Use WritableFile") = WritableFile; > using ReadableFileInterface ARROW_DEPRECATED("Use RandomAccessFile") = > RandomAccessFile; > #else > // MSVC does not like using ARROW_DEPRECATED with using declarations > using WriteableFile = WritableFile; > using ReadableFileInterface = RandomAccessFile; > #endif > {code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (ARROW-6370) [JS] Table.from adds 0 on int columns
[ https://issues.apache.org/jira/browse/ARROW-6370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Taylor closed ARROW-6370. -- Resolution: Not A Bug > [JS] Table.from adds 0 on int columns > - > > Key: ARROW-6370 > URL: https://issues.apache.org/jira/browse/ARROW-6370 > Project: Apache Arrow > Issue Type: Bug > Components: JavaScript >Affects Versions: 0.14.1 >Reporter: Sascha Hofmann >Priority: Major > > I am generating an arrow table in pyarrow and send it via gRPC like this: > {code:java} > sink = pa.BufferOutputStream() > writer = pa.RecordBatchStreamWriter(sink, batch.schema) > writer.write_batch(batch) > writer.close() > yield ds.Response( > status=200, > loading=False, > response=[sink.getvalue().to_pybytes()] > ) > {code} > On the javascript end, I parse it like that: > {code:java} > Table.from(response.getResponseList()[0]) > {code} > That works but when I look at the actual table, int columns have a 0 for > every other row. String columns seem to be parsed just fine. > The Python byte array created from to_pybytes() has the same length as > received in javascript. I am also able to recreate the original table for the > byte array in Python. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6575) [JS] decimal toString does not support negative values
[ https://issues.apache.org/jira/browse/ARROW-6575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16934909#comment-16934909 ] Paul Taylor commented on ARROW-6575: Yeah, I couldn't figure out how to propagate the sign bit through the decimal conversion. I'd be happy to review a PR if you know the right way to do it. > [JS] decimal toString does not support negative values > -- > > Key: ARROW-6575 > URL: https://issues.apache.org/jira/browse/ARROW-6575 > Project: Apache Arrow > Issue Type: Bug > Components: JavaScript >Affects Versions: 0.14.1 >Reporter: Andong Zhan >Priority: Critical > > The main description is here: [https://github.com/apache/arrow/issues/5397] > Also, I have a simple test case (slightly changed generate-test-data.js and > generated-data-validators): > {code:java} > export const decimal = (length = 2, nullCount = length * 0.2 | 0, scale = 0, > precision = 38) => vectorGenerator.visit(new Decimal(scale, precision), > length, nullCount); > function fillDecimal(length: number) { > // const BPE = Uint32Array.BYTES_PER_ELEMENT; // 4 > const array = new Uint32Array(length); > // const max = (2 ** (8 * BPE)) - 1; > // for (let i = -1; ++i < length; array[i] = rand() * max * (rand() > 0.5 > ? -1 : 1)); > array[0] = 0; > array[1] = 1286889712; > array[2] = 2218195178; > array[3] = 4282345521; > array[4] = 0; > array[5] = 16004768; > array[6] = 3587851993; > array[7] = 126217744; > return array; > } > {code} > and the expected value should be > {code:java} > expect(vector.get(0).toString()).toBe('-1'); > expect(vector.get(1).toString()).toBe('1'); > {code} > However, the actual first value is 339282366920938463463374607431768211456 > which is wrong! The second value is correct by the way. > I believe the bug is in the function called > function decimalToString>(a: T) because it cannot > return a negative value at all. 
> [arrow/js/src/util/bn.ts|https://github.com/apache/arrow/blob/d54425de19b7dbb2764a40355d76d1c785cf64ec/js/src/util/bn.ts#L99] > Line 99 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6574) [JS] TypeError with utf8 and JSONVectorLoader.readData
[ https://issues.apache.org/jira/browse/ARROW-6574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16934903#comment-16934903 ] Paul Taylor commented on ARROW-6574: [~akre54] This is the JSON IPC format, which is only intended for integration tests between the different Arrow implementations. You can use the Vector [Builders|https://github.com/apache/arrow/blob/b2785d38a110c8fd8a3d7c957cd78d8911607a5e/js/src/builder.ts#L54] to encode arbitrary JS objects into Arrow Vectors and Tables. The raw Builder APIs allow you to control every aspect of the chunking and flushing behavior, but as a consequence are relatively low-level. There are higher-level APIs for transforming values from iterables, async iterables, node streams, or DOM streams. You can see examples of usage [in the tests here|https://github.com/apache/arrow/blob/b2785d38a110c8fd8a3d7c957cd78d8911607a5e/js/test/unit/builders/builder-tests.ts#L261], or see [this example|https://github.com/trxcllnt/csv-to-arrow-js] converting a CSV row stream to Arrow.
Lastly if your values are already in memory, you can call `Vector.from()` with an Arrow type and an iterable (or async-iterable) of JS values, and it'll use the Builders to return a Vector of the specified type: {code:javascript} // create from a list of numbers or a Float32Array (zero-copy) -- all values will be valid const f32 = Float32Vector.from([1.1, 2.5, 3.7]); // or a different style, handy if inferring the types at runtime // values in the `nullValues` array will be treated as NULL, and written in the validity bitmap const f32 = Vector.from({ nullValues: [-1, NaN], type: new Arrow.Float32(), values: [1.1, -1, 2.5, 3.7, NaN], }); // ^ result: [1.1, null, 2.5, 3.7, null] // or with values from an AsyncIterator const f32 = await Vector.from({ type: new Arrow.Float32(), values: (async function*() { yield* [1.1, 2.5, 3.7]; }()) }); {code} > [JS] TypeError with utf8 and JSONVectorLoader.readData > -- > > Key: ARROW-6574 > URL: https://issues.apache.org/jira/browse/ARROW-6574 > Project: Apache Arrow > Issue Type: Bug > Components: JavaScript >Affects Versions: 0.14.1 > Environment: node v10.16.0, OSX 10.14.5 >Reporter: Adam M Krebs >Priority: Major > > Minimal repro: > > {code:javascript} > const fields = [ > { > name: 'first_name', > type: {name: 'utf8'}, > nullable: false, > children: [], > }, > ]; > Table.from({ > schema: {fields}, > batches: [{ > count: 1, > columns: [{ > name: 'first_name', > count: 1, > VALIDITY: [], > DATA: ['Fred'] > }] > }] > });{code} > > Output: > {code:java} > /[snip]/node_modules/apache-arrow/visitor/vectorloader.js:92 > readData(type, { offset } = this.nextBufferRange()) { > ^TypeError: Cannot destructure property `offset` of > 'undefined' or 'null'. 
> at JSONVectorLoader.readData > (/[snip]/node_modules/apache-arrow/visitor/vectorloader.js:92:38) > at JSONVectorLoader.visitUtf8 > (/[snip]/node_modules/apache-arrow/visitor/vectorloader.js:46:188) > at JSONVectorLoader.visit > (/[snip]/node_modules/apache-arrow/visitor.js:28:48) > at JSONVectorLoader.visit > (/[snip]/node_modules/apache-arrow/visitor/vectorloader.js:40:22) > at nodes.map (/[snip]/node_modules/apache-arrow/visitor.js:25:44) > at Array.map () > at JSONVectorLoader.visitMany > (/[snip]/node_modules/apache-arrow/visitor.js:25:22) > at RecordBatchJSONReaderImpl._loadVectors > (/[snip]/node_modules/apache-arrow/ipc/reader.js:523:107) > at RecordBatchJSONReaderImpl._loadRecordBatch > (/[snip]/node_modules/apache-arrow/ipc/reader.js:209:79) > at RecordBatchJSONReaderImpl.next > (/[snip]/node_modules/apache-arrow/ipc/reader.js:280:42){code} > > > Looks like the `nextBufferRange` call is returning `undefined`, due to an > out-of-bounds `buffersIndex`. > > Happy to provide more info if needed. Seems to only affect utf8 types and > nothing else. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-6370) [JS] Table.from adds 0 on int columns
[ https://issues.apache.org/jira/browse/ARROW-6370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928851#comment-16928851 ] Paul Taylor edited comment on ARROW-6370 at 9/12/19 7:53 PM: - [~saschahofmann] bq. From my understanding of Arrow it should be a platform-independent format, meaning that if I am sending an arrow table from Python to JS it should turn out the same, right? Yes, and that's what's happening here. But you're sending 8-byte integers to a platform which has historically only supported 4-byte integers, which is why you see each 8-byte integer as a pair of 4-byte integers. I recommend reading [this post|https://v8.dev/features/bigint] on BigInts in the v8 blog. BigInts (and their related typed arrays) are relatively new additions to JS, and aren't supported in all engines yet. We have done our best to support getting and setting BigInt values when running in a VM that does support them, but for now we still have to support platforms without BigInt.
That's why the values Array for Int64Vector is a stride-2 Int32Array instead of a BigInt64Array. > [JS] Table.from adds 0 on int columns > - > > Key: ARROW-6370 > URL: https://issues.apache.org/jira/browse/ARROW-6370 > Project: Apache Arrow > Issue Type: Bug > Components: JavaScript >Affects Versions: 0.14.1 >Reporter: Sascha Hofmann >Priority: Major > > I am generating an arrow table in pyarrow and send it via gRPC like this: > {code:java} > sink = pa.BufferOutputStream() > writer = pa.RecordBatchStreamWriter(sink, batch.schema) > writer.write_batch(batch) > writer.close() > yield ds.Response( > status=200, > loading=False, > response=[sink.getvalue().to_pybytes()] > ) > {code} > On the javascript end, I parse it like that: > {code:java} > Table.from(response.getResponseList()[0]) > {code} > That works but when I look at the actual table, int columns have a 0 for > every other row. String columns seem to be parsed just fine. > The Python byte array created from to_pybytes() has the same length as > received in javascript. I am also able to recreate the original table for the > byte array in Python. -- This message was sent by Atlassian Jira (v8.3.2#803003)
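On platforms that do have BigInt, the stride-2 layout described above can be turned back into 64-bit values in plain JS (a sketch of the layout, not the library's accessor):

```javascript
// An Int64 values buffer is a stride-2 Int32Array of little-endian
// (lo, hi) word pairs. Here: the values 7n and -1n.
const values = new Int32Array([7, 0, -1, -1]);

// Reconstruct the i-th 64-bit value from its two 32-bit words.
function int64At(arr, i) {
  const lo = BigInt(arr[i * 2] >>> 0); // low word, read as unsigned
  const hi = BigInt(arr[i * 2 + 1]);   // high word, keeps the sign
  return (hi << 32n) | lo;             // BigInt ops use two's complement
}

int64At(values, 0); // -> 7n
int64At(values, 1); // -> -1n
```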
[jira] [Commented] (ARROW-6370) [JS] Table.from adds 0 on int columns
[ https://issues.apache.org/jira/browse/ARROW-6370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928851#comment-16928851 ] Paul Taylor commented on ARROW-6370: [~saschahofmann] bq. From my understanding of Arrow it should be a platform-independent format, meaning that if I am sending an arrow table from Python to JS it should turn out the same, right? Yes, and that's what's happening here. But you're sending 8-byte integers to a platform which has historically only supported 4-byte integers, which is why you see each 8-byte integer as a pair of 4-byte integers. I recommend reading [this post|https://v8.dev/features/bigint] on BigInts in the v8 blog. BigInts (and their related typed arrays) are relatively new additions to JS, and aren't supported in all engines yet. We have done our best to support getting and setting BigInt values when running in a VM that supports them, but for now we still have to support platforms without BigInt. That's why the values Array for Int64Vector is a stride-2 Int32Array instead of a BigInt64Array. > [JS] Table.from adds 0 on int columns > - > > Key: ARROW-6370 > URL: https://issues.apache.org/jira/browse/ARROW-6370 > Project: Apache Arrow > Issue Type: Bug > Components: JavaScript >Affects Versions: 0.14.1 >Reporter: Sascha Hofmann >Priority: Major > > I am generating an arrow table in pyarrow and send it via gRPC like this: > {code:java} > sink = pa.BufferOutputStream() > writer = pa.RecordBatchStreamWriter(sink, batch.schema) > writer.write_batch(batch) > writer.close() > yield ds.Response( > status=200, > loading=False, > response=[sink.getvalue().to_pybytes()] > ) > {code} > On the javascript end, I parse it like that: > {code:java} > Table.from(response.getResponseList()[0]) > {code} > That works but when I look at the actual table, int columns have a 0 for > every other row. String columns seem to be parsed just fine.
> The Python byte array created from to_pybytes() has the same length as > received in javascript. I am also able to recreate the original table for the > byte array in Python. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (ARROW-6370) [JS] Table.from adds 0 on int columns
[ https://issues.apache.org/jira/browse/ARROW-6370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925928#comment-16925928 ] Paul Taylor commented on ARROW-6370: [~saschahofmann] I closed this because this is working as intended. 64-bit little-endian numbers are represented as pairs of lo, hi two's-complement 32-bit integers. If your values are less than 32 bits, the high bits will be zero. We're not inserting zeros, the zeros are part of the data Python is sending to JavaScript. The Int64Vector and Uint64Vector support implicitly casting either to a normal JS 64-bit float (with 53 bits of precision) if you can afford to lose precision, or to JS's new [BigInt|https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/BigInt] type if you need the full 64 bits of precision and are on a platform that supports BigInt (v8 and the newest versions of FF). {code:javascript} const { Int64, Vector } = require('apache-arrow'); let i64s = Vector.from({ type: new Int64(), values: [123n, 456n, 789n] }); for (let x of i64s) { console.log(x); // will be an Int32Array of two numbers: lo, hi console.log(0 + x); // casts to a 53-bit integer, i.e. regular JS float64 console.log(0n + x); // casts to a BigInt, i.e. JS's new 64-bit integer } {code}
> [JS] Table.from adds 0 on int columns > - > > Key: ARROW-6370 > URL: https://issues.apache.org/jira/browse/ARROW-6370 > Project: Apache Arrow > Issue Type: Bug > Components: JavaScript >Affects Versions: 0.14.1 >Reporter: Sascha Hofmann >Priority: Major > > I am generating an arrow table in pyarrow and send it via gRPC like this: > {code:java} > sink = pa.BufferOutputStream() > writer = pa.RecordBatchStreamWriter(sink, batch.schema) > writer.write_batch(batch) > writer.close() > yield ds.Response( > status=200, > loading=False, > response=[sink.getvalue().to_pybytes()] > ) > {code} > On the javascript end, I parse it like that: > {code:java} > Table.from(response.getResponseList()[0]) > {code} > That works but when I look at the actual table, int columns have a 0 for > every other row. String columns seem to be parsed just fine. > The Python byte array created from to_pybytes() has the same length as > received in javascript. I am also able to recreate the original table for the > byte array in Python. -- This message was sent by Atlassian Jira (v8.3.2#803003)
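The tradeoff between the two casts comes down to Number's 53-bit integer range versus BigInt's exactness, which plain JS demonstrates directly:

```javascript
// Number (a 64-bit float) is exact only up to 2^53; BigInt is exact always.
const exact = 2n ** 53n + 1n;  // 9007199254740993n
const asFloat = Number(exact); // rounds to the nearest representable float

// asFloat -> 9007199254740992: the +1 is silently lost,
// while the BigInt `exact` still holds 9007199254740993n.
```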
[jira] [Commented] (ARROW-2786) [JS] Read Parquet files in JavaScript
[ https://issues.apache.org/jira/browse/ARROW-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922744#comment-16922744 ] Paul Taylor commented on ARROW-2786: There are a few JS Parquet implementations, with [parquetjs|https://www.npmjs.com/package/parquetjs] being the most mature from what I recall. A while back I put together [this demo|https://github.com/trxcllnt/arrow-to-parquet-js] converting Arrow -> Parquet in pure JS. The major drawback is that the ParquetJS writer is row-oriented, so performance will be an issue. I opened [this issue|https://github.com/ironSource/parquetjs/issues/84] to get some clarification, but haven't heard back yet. > [JS] Read Parquet files in JavaScript > - > > Key: ARROW-2786 > URL: https://issues.apache.org/jira/browse/ARROW-2786 > Project: Apache Arrow > Issue Type: New Feature > Components: JavaScript >Reporter: Wes McKinney >Priority: Major > Labels: parquet > > See question in https://github.com/apache/arrow/issues/2209 -- This message was sent by Atlassian Jira (v8.3.2#803003)
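The row-orientation cost is the transposition every columnar source must pay before handing rows to such a writer. A plain-JS sketch with hypothetical column data (not the parquetjs or Arrow APIs):

```javascript
// Columnar data must be transposed into one object per row before a
// row-oriented writer can consume it -- O(rows * cols) object churn.
const columns = { id: [1, 2, 3], word: ['a', 'b', 'c'] };

const names = Object.keys(columns);
const numRows = columns[names[0]].length;
const rows = Array.from({ length: numRows }, (_, i) =>
  Object.fromEntries(names.map((n) => [n, columns[n][i]])));

// rows -> [{ id: 1, word: 'a' }, { id: 2, word: 'b' }, { id: 3, word: 'c' }]
```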
[jira] [Closed] (ARROW-6370) [JS] Table.from adds 0 on int columns
[ https://issues.apache.org/jira/browse/ARROW-6370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Taylor closed ARROW-6370. -- Resolution: Not A Bug > [JS] Table.from adds 0 on int columns > - > > Key: ARROW-6370 > URL: https://issues.apache.org/jira/browse/ARROW-6370 > Project: Apache Arrow > Issue Type: Bug > Components: JavaScript >Affects Versions: 0.14.1 >Reporter: Sascha Hofmann >Priority: Major > > I am generating an arrow table in pyarrow and send it via gRPC like this: > {code:java} > sink = pa.BufferOutputStream() > writer = pa.RecordBatchStreamWriter(sink, batch.schema) > writer.write_batch(batch) > writer.close() > yield ds.Response( > status=200, > loading=False, > response=[sink.getvalue().to_pybytes()] > ) > {code} > On the javascript end, I parse it like that: > {code:java} > Table.from(response.getResponseList()[0]) > {code} > That works but when I look at the actual table, int columns have a 0 for > every other row. String columns seem to be parsed just fine. > The Python byte array created from to_pybytes() has the same length as > received in javascript. I am also able to recreate the original table for the > byte array in Python. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Resolved] (ARROW-5741) [JS] Make numeric vector from functions consistent with TypedArray.from
[ https://issues.apache.org/jira/browse/ARROW-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Taylor resolved ARROW-5741. Resolution: Fixed Fix Version/s: (was: 1.0.0) 0.15.0 Issue resolved by pull request 4746 [https://github.com/apache/arrow/pull/4746] > [JS] Make numeric vector from functions consistent with TypedArray.from > --- > > Key: ARROW-5741 > URL: https://issues.apache.org/jira/browse/ARROW-5741 > Project: Apache Arrow > Issue Type: Improvement > Components: JavaScript >Reporter: Brian Hulette >Priority: Major > Labels: pull-request-available > Fix For: 0.15.0 > > Time Spent: 6h 10m > Remaining Estimate: 0h > > Described in > https://lists.apache.org/thread.html/b648a781cba7f10d5a6072ff2e7dab6c03e2d1f12e359d9261891486@%3Cdev.arrow.apache.org%3E -- This message was sent by Atlassian JIRA (v7.6.14#76016)
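For context, `TypedArray.from` takes a source plus an optional map function; this is the shape the numeric vector from-functions were made consistent with:

```javascript
// TypedArray.from(source, mapFn?, thisArg?) -- standard JS behavior
const doubled = Float32Array.from([1, 2, 3], (x) => x * 2);
// doubled -> Float32Array [2, 4, 6]

// It also accepts any iterable, not just arrays:
const fromIter = Int32Array.from(new Set([5, 6]));
// fromIter -> Int32Array [5, 6]
```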
[jira] [Created] (ARROW-6053) [Python] RecordBatchStreamReader::Open2 cdef type signature doesn't match C++
Paul Taylor created ARROW-6053: -- Summary: [Python] RecordBatchStreamReader::Open2 cdef type signature doesn't match C++ Key: ARROW-6053 URL: https://issues.apache.org/jira/browse/ARROW-6053 Project: Apache Arrow Issue Type: New Feature Components: Python Affects Versions: 0.14.1 Reporter: Paul Taylor Assignee: Paul Taylor The Cython method signature for RecordBatchStreamReader::Open2 doesn't match the C++ type signature and causes a compiler type error trying to call Open2 from Cython. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Assigned] (ARROW-5762) [Integration][JS] Integration Tests for Map Type
[ https://issues.apache.org/jira/browse/ARROW-5762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Taylor reassigned ARROW-5762: -- Assignee: Paul Taylor > [Integration][JS] Integration Tests for Map Type > > > Key: ARROW-5762 > URL: https://issues.apache.org/jira/browse/ARROW-5762 > Project: Apache Arrow > Issue Type: Improvement > Components: Integration, JavaScript >Reporter: Bryan Cutler >Assignee: Paul Taylor >Priority: Major > Fix For: 1.0.0 > > > ARROW-1279 enabled integration tests for MapType between Java and C++, but > JavaScript had to be disabled for the map case due to an error. Once this is > fixed, {{generate_map_case}} could be moved under {{generate_nested_case}} > with the other nested types. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (ARROW-5762) [Integration][JS] Integration Tests for Map Type
[ https://issues.apache.org/jira/browse/ARROW-5762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Taylor updated ARROW-5762: --- Fix Version/s: 1.0.0 > [Integration][JS] Integration Tests for Map Type > > > Key: ARROW-5762 > URL: https://issues.apache.org/jira/browse/ARROW-5762 > Project: Apache Arrow > Issue Type: Improvement > Components: Integration, JavaScript >Reporter: Bryan Cutler >Priority: Major > Fix For: 1.0.0 > > > ARROW-1279 enabled integration tests for MapType between Java and C++, but > JavaScript had to be disabled for the map case due to an error. Once this is > fixed, {{generate_map_case}} could be moved under {{generate_nested_case}} > with the other nested types. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (ARROW-5762) [Integration][JS] Integration Tests for Map Type
[ https://issues.apache.org/jira/browse/ARROW-5762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16887544#comment-16887544 ] Paul Taylor commented on ARROW-5762: After reviewing the C++, the JS version of the Map type is not the same (it's essentially a Struct, except child fields are accessed by name instead of by field index). We should absolutely update the JS Map implementation before the 1.0 release. > [Integration][JS] Integration Tests for Map Type > > > Key: ARROW-5762 > URL: https://issues.apache.org/jira/browse/ARROW-5762 > Project: Apache Arrow > Issue Type: Improvement > Components: Integration, JavaScript >Reporter: Bryan Cutler >Priority: Major > > ARROW-1279 enabled integration tests for MapType between Java and C++, but > JavaScript had to be disabled for the map case due to an error. Once this is > fixed, {{generate_map_case}} could be moved under {{generate_nested_case}} > with the other nested types. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (ARROW-5763) [JS] enable integration tests for MapVector
[ https://issues.apache.org/jira/browse/ARROW-5763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16887479#comment-16887479 ] Paul Taylor commented on ARROW-5763: After reviewing the C++, the JS version of the Map type is not the same (it's essentially a Struct, except child fields are accessed by name instead of by field index). We should absolutely update the JS Map implementation before the 1.0 release. > [JS] enable integration tests for MapVector > --- > > Key: ARROW-5763 > URL: https://issues.apache.org/jira/browse/ARROW-5763 > Project: Apache Arrow > Issue Type: New Feature > Components: JavaScript >Reporter: Benjamin Kietzman >Priority: Minor > > As of 0.14, C++ and Java support Map arrays those implementations pass > integration tests. JS has a MapVector and some unit tests for it, but it > should be tested against other implementations as well -- This message was sent by Atlassian JIRA (v7.6.14#76016)
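The mismatch can be illustrated with plain JS data (hypothetical values, not the Arrow APIs): a struct-like shape fixes one set of named children for every row, while a canonical Map row is a variable-length list of key/value entries, so keys can differ per row.

```javascript
// Struct-like (what the JS MapVector was at the time):
// every row has the same named children, just accessed by name.
const structRows = [
  { a: 1, b: 2 },
  { a: 3, b: 4 },
];

// Canonical Arrow Map: each row is a list of { key, value } entries,
// and rows may carry different numbers of (possibly different) keys.
const mapRows = [
  [{ key: 'a', value: 1 }],
  [{ key: 'a', value: 2 }, { key: 'c', value: 3 }],
];
```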
[jira] [Resolved] (ARROW-5532) [JS] Field Metadata Not Read
[ https://issues.apache.org/jira/browse/ARROW-5532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Taylor resolved ARROW-5532. Resolution: Fixed Issue resolved by pull request 4476 https://github.com/apache/arrow/pull/4476 > [JS] Field Metadata Not Read > > > Key: ARROW-5532 > URL: https://issues.apache.org/jira/browse/ARROW-5532 > Project: Apache Arrow > Issue Type: Bug > Components: JavaScript >Affects Versions: 0.13.0 > Environment: Mac OSX 10.14, Chrome 74 >Reporter: Trey Hakanson >Assignee: Paul Taylor >Priority: Major > Labels: Javas > Fix For: 0.14.0 > > > Field metadata is not read when using {{@apache-arrow/ts@0.13.0}}. Example > below also uses {{pyarrow==0.13.0}} > Steps to reproduce: > Adding metadata: > {code:title=toarrow.py|borderStyle=solid} > import pyarrow as pa > import pandas as pd > source = "sample.csv" > output = "sample.arrow" > df = pd.read_csv(source) > table = pa.Table.from_pandas(df) > schema = pa.schema([ > column.field.add_metadata({"foo": "bar"})) > for column > in table.columns > ]) > writer = pa.RecordBatchFileWriter(output, schema) > writer.write(table) > writer.close() > {code} > Reading field metadata using {{pyarrow}}: > {code:title=readarrow.py|borderStyle=solid} > source = "sample.arrow" > field = "foo" > reader = pa.RecordBatchFileReader(source) > reader.schema.field_by_name(field).metadata # Correctly shows `{"foo": "bar"}` > {code} > Reading field metadata using {{@apache-arrow/ts}}: > {code:title=toarrow.ts|borderStyle=solid} > import { Table, Field, Type } from "@apache-arrow/ts"; > const url = "https://example.com/sample.arrow";; > const buf = await fetch(url).then(res => res.arrayBuffer()); > const table = Table.from([new Uint8Array(buf)]); > for (let field of table.schema.fields) { > field.metadata; // Incorrectly shows an empty map > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5537) [JS] Support delta dictionaries in RecordBatchWriter and DictionaryBuilder
Paul Taylor created ARROW-5537: -- Summary: [JS] Support delta dictionaries in RecordBatchWriter and DictionaryBuilder Key: ARROW-5537 URL: https://issues.apache.org/jira/browse/ARROW-5537 Project: Apache Arrow Issue Type: New Feature Affects Versions: 0.13.0 Reporter: Paul Taylor Assignee: Paul Taylor Fix For: 0.14.0 The new JS DictionaryBuilder and RecordBatchWriter should support building and writing delta dictionary batches to enable creating DictionaryVectors while streaming. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-5239) Add support for interval types in javascript
[ https://issues.apache.org/jira/browse/ARROW-5239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16846108#comment-16846108 ] Paul Taylor commented on ARROW-5239: We have the Interval year_month and day_time types in JS, but I'm not sure if this issue is about a new kind of Interval DataType. [~emkornfi...@gmail.com], any thoughts? > Add support for interval types in javascript > > > Key: ARROW-5239 > URL: https://issues.apache.org/jira/browse/ARROW-5239 > Project: Apache Arrow > Issue Type: New Feature > Components: JavaScript >Reporter: Micah Kornfield >Priority: Major > > Update integration_test.py to include interval tests for JSTest once this is > done. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5396) [JS] Ensure reader and writer support files and streams with no RecordBatches
Paul Taylor created ARROW-5396: -- Summary: [JS] Ensure reader and writer support files and streams with no RecordBatches Key: ARROW-5396 URL: https://issues.apache.org/jira/browse/ARROW-5396 Project: Apache Arrow Issue Type: New Feature Components: JavaScript Affects Versions: 0.13.0 Reporter: Paul Taylor Assignee: Paul Taylor Fix For: 0.14.0 Re: https://issues.apache.org/jira/browse/ARROW-2119 and [https://github.com/apache/arrow/pull/3871], the JS reader and writer should support files and streams with a Schema but no RecordBatches. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-5100) [JS] Writer swaps byte order if buffers share the same underlying ArrayBuffer
[ https://issues.apache.org/jira/browse/ARROW-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Taylor resolved ARROW-5100. Resolution: Fixed Issue resolved by pull request 4102 [https://github.com/apache/arrow/pull/4102] > [JS] Writer swaps byte order if buffers share the same underlying ArrayBuffer > - > > Key: ARROW-5100 > URL: https://issues.apache.org/jira/browse/ARROW-5100 > Project: Apache Arrow > Issue Type: Bug > Components: JavaScript >Affects Versions: 0.13.0 >Reporter: Paul Taylor >Assignee: Paul Taylor >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 20m > Remaining Estimate: 0h > > We collapse contiguous Uint8Arrays that share the same underlying ArrayBuffer > and have overlapping byte ranges. This was done to maintain true zero-copy > behavior when using certain node core streams that use a buffer pool > internally, and could write chunks of the same logical Arrow Message at > out-of-order byte offsets in the pool. > Unfortunately this can also lead to a bug where, in rare cases, buffers are > swapped while writing Arrow Messages too. We could have a flag to indicate > whether we think collapsing out-of-order same-buffer chunks is safe, but I'm > not sure if we can always know that, so I'd prefer to take it out and incur > the copy cost. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5115) [JS] Implement the Vector Builders
Paul Taylor created ARROW-5115: -- Summary: [JS] Implement the Vector Builders Key: ARROW-5115 URL: https://issues.apache.org/jira/browse/ARROW-5115 Project: Apache Arrow Issue Type: New Feature Components: JavaScript Affects Versions: 0.13.0 Reporter: Paul Taylor Assignee: Paul Taylor We should implement the streaming Vector Builders in JS. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5100) [JS] Writer swaps byte order if buffers share the same underlying ArrayBuffer
Paul Taylor created ARROW-5100: -- Summary: [JS] Writer swaps byte order if buffers share the same underlying ArrayBuffer Key: ARROW-5100 URL: https://issues.apache.org/jira/browse/ARROW-5100 Project: Apache Arrow Issue Type: Bug Components: JavaScript Affects Versions: 0.13.0 Reporter: Paul Taylor Assignee: Paul Taylor Fix For: 0.14.0 We collapse contiguous Uint8Arrays that share the same underlying ArrayBuffer and have overlapping byte ranges. This was done to maintain true zero-copy behavior when using certain node core streams that use a buffer pool internally, and could write chunks of the same logical Arrow Message at out-of-order byte offsets in the pool. Unfortunately this can also lead to a bug where, in rare cases, buffers are swapped while writing Arrow Messages too. We could have a flag to indicate whether we think collapsing out-of-order same-buffer chunks is safe, but I'm not sure if we can always know that, so I'd prefer to take it out and incur the copy cost. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
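The collapse condition described above, two Uint8Array views over the same ArrayBuffer with overlapping byte ranges, can be sketched as follows (a hypothetical helper, not the writer's actual code):

```javascript
// True when two typed-array views share one underlying ArrayBuffer
// and their byte ranges overlap (the case the writer tried to collapse).
function sameBufferOverlap(a, b) {
  if (a.buffer !== b.buffer) return false; // different allocations
  const [lo, hi] = a.byteOffset <= b.byteOffset ? [a, b] : [b, a];
  return hi.byteOffset < lo.byteOffset + lo.byteLength;
}

// Node's stream buffer pools produce exactly this shape: many views
// into one shared pool, at possibly out-of-order, overlapping offsets.
const pool = new ArrayBuffer(16);
const x = new Uint8Array(pool, 0, 8);
const y = new Uint8Array(pool, 4, 8); // overlaps x by 4 bytes
// sameBufferOverlap(x, y) -> true
```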
[jira] [Created] (ARROW-4976) [JS] RecordBatchReader should reset its Node/DOM streams
Paul Taylor created ARROW-4976: -- Summary: [JS] RecordBatchReader should reset its Node/DOM streams Key: ARROW-4976 URL: https://issues.apache.org/jira/browse/ARROW-4976 Project: Apache Arrow Issue Type: Bug Components: JavaScript Affects Versions: JS-0.4.0 Reporter: Paul Taylor Assignee: Paul Taylor Fix For: JS-0.4.1 RecordBatchReaders should reset their internal Node/DOM streams when the reader is reset, so the reader can then be piped to a different output stream. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4780) [JS] Package sourcemap files, update default package JS version
[ https://issues.apache.org/jira/browse/ARROW-4780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Taylor updated ARROW-4780: --- Affects Version/s: JS-0.4.0 > [JS] Package sourcemap files, update default package JS version > --- > > Key: ARROW-4780 > URL: https://issues.apache.org/jira/browse/ARROW-4780 > Project: Apache Arrow > Issue Type: Improvement >Affects Versions: JS-0.4.0 >Reporter: Paul Taylor >Assignee: Paul Taylor >Priority: Minor > > The build should split the sourcemaps out to speed up client builds, and > include a "module" entry in the package.json for @pika/web, and the main > package should ship the latest ESNext JS versions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4780) [JS] Package sourcemap files, update default package JS version
Paul Taylor created ARROW-4780: -- Summary: [JS] Package sourcemap files, update default package JS version Key: ARROW-4780 URL: https://issues.apache.org/jira/browse/ARROW-4780 Project: Apache Arrow Issue Type: Improvement Reporter: Paul Taylor Assignee: Paul Taylor The build should split the sourcemaps out to speed up client builds, and include a "module" entry in the package.json for @pika/web, and the main package should ship the latest ESNext JS versions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4781) [JS] Ensure empty data initializes empty typed arrays
Paul Taylor created ARROW-4781: -- Summary: [JS] Ensure empty data initializes empty typed arrays Key: ARROW-4781 URL: https://issues.apache.org/jira/browse/ARROW-4781 Project: Apache Arrow Issue Type: Bug Components: JavaScript Affects Versions: JS-0.4.0 Reporter: Paul Taylor Assignee: Paul Taylor Fix For: JS-0.4.1 Empty ArrayData instances should initialize with the appropriate 0-length buffers. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4780) [JS] Package sourcemap files, update default package JS version
[ https://issues.apache.org/jira/browse/ARROW-4780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Taylor updated ARROW-4780: --- Component/s: JavaScript > [JS] Package sourcemap files, update default package JS version > --- > > Key: ARROW-4780 > URL: https://issues.apache.org/jira/browse/ARROW-4780 > Project: Apache Arrow > Issue Type: Improvement > Components: JavaScript >Affects Versions: JS-0.4.0 >Reporter: Paul Taylor >Assignee: Paul Taylor >Priority: Minor > Fix For: JS-0.4.1 > > > The build should split the sourcemaps out to speed up client builds, and > include a "module" entry in the package.json for @pika/web, and the main > package should ship the latest ESNext JS versions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4780) [JS] Package sourcemap files, update default package JS version
[ https://issues.apache.org/jira/browse/ARROW-4780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Taylor updated ARROW-4780: --- Fix Version/s: JS-0.4.1 > [JS] Package sourcemap files, update default package JS version > --- > > Key: ARROW-4780 > URL: https://issues.apache.org/jira/browse/ARROW-4780 > Project: Apache Arrow > Issue Type: Improvement >Affects Versions: JS-0.4.0 >Reporter: Paul Taylor >Assignee: Paul Taylor >Priority: Minor > Fix For: JS-0.4.1 > > > The build should split the sourcemaps out to speed up client builds, and > include a "module" entry in the package.json for @pika/web, and the main > package should ship the latest ESNext JS versions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-3667) [JS] Incorrectly reads record batches with an all null column
[ https://issues.apache.org/jira/browse/ARROW-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Taylor resolved ARROW-3667.
Resolution: Fixed
Assignee: Paul Taylor
Fix Version/s: JS-0.4.1

Issue resolved by pull request 3787 [https://github.com/apache/arrow/pull/3787]

> [JS] Incorrectly reads record batches with an all null column
> -------------------------------------------------------------
>
> Key: ARROW-3667
> URL: https://issues.apache.org/jira/browse/ARROW-3667
> Project: Apache Arrow
> Issue Type: Bug
> Affects Versions: JS-0.3.1
> Reporter: Brian Hulette
> Assignee: Paul Taylor
> Priority: Major
> Fix For: JS-0.5.0, JS-0.4.1
>
> The JS library seems to incorrectly read any columns that come after an all-null column in IPC buffers produced by pyarrow.
> Here's a python script that generates two arrow buffers: one with an all-null column followed by a utf-8 column, and a second with those two reversed.
> {code:python}
> import pyarrow as pa
> import pandas as pd
>
> def serialize_to_arrow(df, fd, compress=True):
>     batch = pa.RecordBatch.from_pandas(df)
>     writer = pa.RecordBatchFileWriter(fd, batch.schema)
>     writer.write_batch(batch)
>     writer.close()
>
> if __name__ == "__main__":
>     df = pd.DataFrame(data={'nulls': [None, None, None], 'not nulls': ['abc', 'def', 'ghi']}, columns=['nulls', 'not nulls'])
>     with open('bad.arrow', 'wb') as fd:
>         serialize_to_arrow(df, fd)
>     df = pd.DataFrame(df, columns=['not nulls', 'nulls'])
>     with open('good.arrow', 'wb') as fd:
>         serialize_to_arrow(df, fd)
> {code}
> JS incorrectly interprets the [null, not null] case:
> {code:javascript}
> > var arrow = require('apache-arrow')
> undefined
> > var fs = require('fs')
> undefined
> > arrow.Table.from(fs.readFileSync('good.arrow')).getColumn('not nulls').get(0)
> 'abc'
> > arrow.Table.from(fs.readFileSync('bad.arrow')).getColumn('not nulls').get(0)
> '\u\u\u\u\u0003\u\u\u\u0006\u\u\u\t\u\u\u'
> {code}
> Presumably this is because pyarrow is omitting some (or all) of the buffers associated with the all-null column, but the JS IPC reader is still looking for them, causing the buffer count to get out of sync.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-4728) [JS] Failing test Table#assign with a zero-length Null column round-trips through serialization
[ https://issues.apache.org/jira/browse/ARROW-4728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Taylor resolved ARROW-4728. Resolution: Fixed Fix Version/s: JS-0.4.1 Issue resolved by pull request 3789 [https://github.com/apache/arrow/pull/3789] > [JS] Failing test Table#assign with a zero-length Null column round-trips > through serialization > --- > > Key: ARROW-4728 > URL: https://issues.apache.org/jira/browse/ARROW-4728 > Project: Apache Arrow > Issue Type: Bug > Components: JavaScript >Affects Versions: 0.12.1 >Reporter: Francois Saint-Jacques >Assignee: Paul Taylor >Priority: Major > Labels: ci-failure, pull-request-available, travis-ci > Fix For: 0.13.0, JS-0.4.1 > > Time Spent: 10m > Remaining Estimate: 0h > > See https://travis-ci.org/apache/arrow/jobs/500383002#L1022 > {code:javascript} > ● Table#serialize() › Table#assign with an empty table round-trips through > serialization > expect(received).toBe(expected) // Object.is equality > Expected: 86 > Received: 41 > 91 | const source = table1.assign(Table.empty()); > 92 | expect(source.numCols).toBe(table1.numCols); > > 93 | expect(source.length).toBe(table1.length); > | ^ > 94 | const result = Table.from(source.serialize()); > 95 | expect(result).toEqualTable(source); > 96 | > expect(result.schema.metadata.get('foo')).toEqual('bar'); > at Object.test (test/unit/table/serialize-tests.ts:93:35) > ● Table#serialize() › Table#assign with a zero-length Null column > round-trips through serialization > expect(received).toBe(expected) // Object.is equality > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Issue Comment Deleted] (ARROW-3667) [JS] Incorrectly reads record batches with an all null column
[ https://issues.apache.org/jira/browse/ARROW-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Taylor updated ARROW-3667: --- Comment: was deleted (was: Issue resolved by pull request 3787 [https://github.com/apache/arrow/pull/3787]) > [JS] Incorrectly reads record batches with an all null column > Key: ARROW-3667 > URL: https://issues.apache.org/jira/browse/ARROW-3667 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3667) [JS] Incorrectly reads record batches with an all null column
[ https://issues.apache.org/jira/browse/ARROW-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16782516#comment-16782516 ] Paul Taylor commented on ARROW-3667: Issue resolved by pull request 3787 [https://github.com/apache/arrow/pull/3787] > [JS] Incorrectly reads record batches with an all null column > Key: ARROW-3667 > URL: https://issues.apache.org/jira/browse/ARROW-3667 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-4738) [JS] NullVector should include a null data buffer
[ https://issues.apache.org/jira/browse/ARROW-4738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Taylor resolved ARROW-4738. Resolution: Fixed Issue resolved by pull request 3787 [https://github.com/apache/arrow/pull/3787] > [JS] NullVector should include a null data buffer > - > > Key: ARROW-4738 > URL: https://issues.apache.org/jira/browse/ARROW-4738 > Project: Apache Arrow > Issue Type: Bug > Components: JavaScript >Affects Versions: JS-0.4.0 >Reporter: Paul Taylor >Assignee: Paul Taylor >Priority: Major > Labels: pull-request-available > Fix For: JS-0.4.1 > > Time Spent: 10m > Remaining Estimate: 0h > > Arrow C++ and pyarrow expect NullVectors to include a null data buffer, so > ArrowJS should write one into the buffer layout. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-4738) [JS] NullVector should include a null data buffer
[ https://issues.apache.org/jira/browse/ARROW-4738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16782221#comment-16782221 ] Paul Taylor commented on ARROW-4738: [~bhulette] a PR is up here: https://github.com/apache/arrow/pull/3787 > [JS] NullVector should include a null data buffer > - > > Key: ARROW-4738 > URL: https://issues.apache.org/jira/browse/ARROW-4738 > Project: Apache Arrow > Issue Type: Bug > Components: JavaScript >Affects Versions: JS-0.4.0 >Reporter: Paul Taylor >Assignee: Paul Taylor >Priority: Major > Fix For: JS-0.4.1 > > > Arrow C++ and pyarrow expect NullVectors to include a null data buffer, so > ArrowJS should write one into the buffer layout. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-4738) [JS] NullVector should include a null data buffer
[ https://issues.apache.org/jira/browse/ARROW-4738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16782218#comment-16782218 ] Paul Taylor commented on ARROW-4738: [~bhulette] yeah, I think so > [JS] NullVector should include a null data buffer > - > > Key: ARROW-4738 > URL: https://issues.apache.org/jira/browse/ARROW-4738 > Project: Apache Arrow > Issue Type: Bug > Components: JavaScript >Affects Versions: JS-0.4.0 >Reporter: Paul Taylor >Assignee: Paul Taylor >Priority: Major > Fix For: JS-0.4.1 > > > Arrow C++ and pyarrow expect NullVectors to include a null data buffer, so > ArrowJS should write one into the buffer layout. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-4728) [JS] Failing test Table#assign with a zero-length Null column round-trips through serialization
[ https://issues.apache.org/jira/browse/ARROW-4728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16782203#comment-16782203 ] Paul Taylor commented on ARROW-4728: Thanks [~fsaintjacques], I submitted https://github.com/apache/arrow/pull/3789 with a fix > [JS] Failing test Table#assign with a zero-length Null column round-trips through serialization > Key: ARROW-4728 > URL: https://issues.apache.org/jira/browse/ARROW-4728 > See https://travis-ci.org/apache/arrow/jobs/500414242#L1002 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4738) [JS] NullVector should include a null data buffer
Paul Taylor created ARROW-4738: -- Summary: [JS] NullVector should include a null data buffer Key: ARROW-4738 URL: https://issues.apache.org/jira/browse/ARROW-4738 Project: Apache Arrow Issue Type: Bug Components: JavaScript Affects Versions: JS-0.4.0 Reporter: Paul Taylor Assignee: Paul Taylor Fix For: JS-0.4.1 Arrow C++ and pyarrow expect NullVectors to include a null data buffer, so ArrowJS should write one into the buffer layout. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-4728) [JS] Failing test Table#assign with a zero-length Null column round-trips through serialization
[ https://issues.apache.org/jira/browse/ARROW-4728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Taylor reassigned ARROW-4728: -- Assignee: Paul Taylor > [JS] Failing test Table#assign with a zero-length Null column round-trips through serialization > Key: ARROW-4728 > URL: https://issues.apache.org/jira/browse/ARROW-4728 > See https://travis-ci.org/apache/arrow/jobs/500414242#L1002 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4682) [JS] Writer should be able to write empty tables
Paul Taylor created ARROW-4682: -- Summary: [JS] Writer should be able to write empty tables Key: ARROW-4682 URL: https://issues.apache.org/jira/browse/ARROW-4682 Project: Apache Arrow Issue Type: Bug Components: JavaScript Affects Versions: JS-0.4.0 Reporter: Paul Taylor Assignee: Paul Taylor Fix For: JS-0.4.1 The writer should be able to write empty tables. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4674) [JS] Update arrow2csv to new Row API
Paul Taylor created ARROW-4674: -- Summary: [JS] Update arrow2csv to new Row API Key: ARROW-4674 URL: https://issues.apache.org/jira/browse/ARROW-4674 Project: Apache Arrow Issue Type: Bug Components: JavaScript Affects Versions: JS-0.4.0 Reporter: Paul Taylor Assignee: Paul Taylor Fix For: JS-0.4.1 The {{arrow2csv}} utility uses {{row.length}} to measure cells, but now that we've made Rows use Symbols for their internal properties, it should enumerate the values with the iterator. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
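The Symbol-keyed Row change can be sketched outside of Arrow with a toy class (hypothetical shape, not the actual ArrowJS Row): internal storage lives under a Symbol key, so a numeric `length` property is no longer visible and consumers enumerate cells through the iterator.

```typescript
// Toy sketch of a Row whose internal values are keyed by a Symbol, so
// consumers must enumerate cells via the iterator rather than `row.length`.
// Illustrative only; not the real ArrowJS Row implementation.
const kValues = Symbol('values');

class Row {
  [kValues]: unknown[];
  constructor(values: unknown[]) {
    this[kValues] = values;
  }
  // Iterating the Row yields each cell value in column order.
  *[Symbol.iterator](): IterableIterator<unknown> {
    yield* this[kValues];
  }
}

const row = new Row(['abc', 42, null]);
const cells = [...row];     // enumerate via the iterator
const width = cells.length; // measure the materialized cells, not the Row
```

This is the kind of enumeration `arrow2csv` would need to switch to once `row.length` stops being a public property.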
[jira] [Resolved] (ARROW-4524) [JS] Improve Row proxy generation performance
[ https://issues.apache.org/jira/browse/ARROW-4524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Taylor resolved ARROW-4524. Resolution: Fixed Fix Version/s: JS-0.4.1 > [JS] Improve Row proxy generation performance > - > > Key: ARROW-4524 > URL: https://issues.apache.org/jira/browse/ARROW-4524 > Project: Apache Arrow > Issue Type: Improvement > Components: JavaScript >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Fix For: JS-0.4.1 > > > See > https://github.com/vega/vega-loader-arrow/commit/19c88e130aaeeae9d0166360db467121e5724352#r32253784 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4652) [JS] RecordBatchReader throughNode should respect autoDestroy
Paul Taylor created ARROW-4652: -- Summary: [JS] RecordBatchReader throughNode should respect autoDestroy Key: ARROW-4652 URL: https://issues.apache.org/jira/browse/ARROW-4652 Project: Apache Arrow Issue Type: Bug Components: JavaScript Affects Versions: JS-0.4.0 Reporter: Paul Taylor Assignee: Paul Taylor Fix For: JS-0.4.1 The Reader transform stream closes after reading one set of tables even when autoDestroy is false. Instead it should reset/reopen the reader, like {{RecordBatchReader.readAll()}} does. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
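The intended contract can be pictured with a toy reader (hypothetical shape, not the real ArrowJS class): with autoDestroy on, the first set of batches closes the reader; with it off, the reader resets and continues, which is what {{RecordBatchReader.readAll()}} already does and the transform stream should match.

```typescript
// Toy illustration of the autoDestroy contract; the real RecordBatchReader
// reads Arrow IPC messages, this just reads string "sets".
class ToyReader {
  public closed = false;
  constructor(private sets: string[][], public autoDestroy: boolean) {}

  *readAll(): IterableIterator<string[]> {
    for (const set of this.sets) {
      yield set;
      if (this.autoDestroy) {
        this.closed = true; // close after the first set of tables
        return;
      }
      // autoDestroy === false: reset and keep reading the next set
    }
  }
}

const closing = new ToyReader([['a'], ['b']], true);
const firstOnly = [...closing.readAll()];  // only the first set; reader closed

const resetting = new ToyReader([['a'], ['b']], false);
const allSets = [...resetting.readAll()];  // both sets; reader stays open
```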
[jira] [Created] (ARROW-4580) [JS] Accept Iterables in IntVector/FloatVector from() signatures
Paul Taylor created ARROW-4580: -- Summary: [JS] Accept Iterables in IntVector/FloatVector from() signatures Key: ARROW-4580 URL: https://issues.apache.org/jira/browse/ARROW-4580 Project: Apache Arrow Issue Type: Improvement Components: JavaScript Affects Versions: JS-0.4.0 Reporter: Paul Taylor Assignee: Paul Taylor Fix For: JS-0.4.1 Right now {{IntVector.from()}} and {{FloatVector.from()}} expect the data to already be in typed-array form. But if we know the desired Vector type beforehand (e.g. if {{Int32Vector.from()}} is called), we can accept any JS iterable of the values. In order to do this, we should ensure {{Float16Vector.from()}} properly clamps incoming f32/f64 values to u16s, in case the source is a vanilla 64-bit JS float. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
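The Float16 concern can be illustrated with a standalone conversion: a sketch (not the actual ArrowJS routine) of packing a vanilla JS number into the 16 bits a Float16 vector stores, which is the clamping the ticket asks {{Float16Vector.from()}} to perform.

```typescript
// Pack a JS number (f64) into IEEE 754 half-precision bits (u16).
// Illustrative sketch of the clamping described above, not the actual
// ArrowJS conversion; subnormal halves are flushed to signed zero.
function float32ToFloat16Bits(value: number): number {
  const f32 = new Float32Array(1);
  const u32 = new Uint32Array(f32.buffer);
  f32[0] = value; // first narrow f64 -> f32
  const x = u32[0];
  const sign = (x >>> 16) & 0x8000;
  const exp = (x >>> 23) & 0xff;
  const frac = x & 0x007fffff;
  if (exp === 0xff) return sign | 0x7c00 | (frac ? 0x0200 : 0); // Inf / NaN
  const halfExp = exp - 127 + 15;                // re-bias the exponent
  if (halfExp >= 0x1f) return sign | 0x7c00;     // too large -> Infinity
  if (halfExp <= 0) return sign;                 // too small -> signed zero
  return sign | (halfExp << 10) | (frac >>> 13); // truncate the mantissa
}

// With a helper like this, half-float storage can accept any JS iterable:
const halves = Uint16Array.from([1, -2, 0.5], float32ToFloat16Bits);
```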
[jira] [Created] (ARROW-4579) [JS] Add more interop with BigInt/BigInt64Array/BigUint64Array
Paul Taylor created ARROW-4579: -- Summary: [JS] Add more interop with BigInt/BigInt64Array/BigUint64Array Key: ARROW-4579 URL: https://issues.apache.org/jira/browse/ARROW-4579 Project: Apache Arrow Issue Type: New Feature Components: JavaScript Affects Versions: JS-0.4.0 Reporter: Paul Taylor Assignee: Paul Taylor Fix For: JS-0.4.1 We should use or return the new native [BigInt types|https://developers.google.com/web/updates/2018/05/bigint] whenever they're available. * Use the native {{BigInt}} to convert/stringify i64s/u64s * Support the {{BigInt}} type in the element comparator and {{indexOf()}} * Add zero-copy {{toBigInt64Array()}} and {{toBigUint64Array()}} methods to {{Int64Vector}} and {{Uint64Vector}}, respectively -- This message was sent by Atlassian JIRA (v7.6.3#76005)
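A sketch of what that interop looks like, assuming a runtime with native BigInt support (Node >= 10.4). The reassembly helper below is illustrative, not the actual ArrowJS code: it shows how a 64-bit value split into low/high 32-bit words can be rebuilt with BigInt, and how a BigInt64Array gives a zero-copy 64-bit view over an existing buffer.

```typescript
// Reassemble a 64-bit signed value from its low/high 32-bit halves using
// native BigInt (illustrative; not the actual ArrowJS helper).
function int64ToBigInt(lo: number, hi: number): bigint {
  // `lo >>> 0` reinterprets the low word as unsigned before widening.
  return (BigInt(hi) << 32n) | BigInt(lo >>> 0);
}

// Zero-copy: a BigInt64Array is just another view over the same ArrayBuffer,
// which is what a toBigInt64Array() method on an Int64Vector could return.
const buffer = new ArrayBuffer(16);
const words = new Int32Array(buffer);
words[0] = 0; // low half of element 0 (little-endian platforms)
words[1] = 1; // high half of element 0 -> value is 2^32
const big = new BigInt64Array(buffer); // no bytes copied
```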
[jira] [Created] (ARROW-4578) [JS] Float16Vector toArray should be zero-copy
Paul Taylor created ARROW-4578: -- Summary: [JS] Float16Vector toArray should be zero-copy Key: ARROW-4578 URL: https://issues.apache.org/jira/browse/ARROW-4578 Project: Apache Arrow Issue Type: Bug Components: JavaScript Affects Versions: JS-0.4.0 Reporter: Paul Taylor Assignee: Paul Taylor Fix For: JS-0.4.1 The {{Float16Vector#toArray()}} implementation currently transforms each half float into a single float, and returns a Float32Array. All the other {{toArray()}} implementations are zero-copy, and this deviation would break anyone expecting to give two-byte half floats to native APIs like WebGL. We should instead include {{Float16Vector#toFloat32Array()}} and {{Float16Vector#toFloat64Array()}} convenience methods that do rely on copying. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
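The distinction can be sketched with a standalone decoder (illustrative, not the actual ArrowJS implementation): zero-copy access hands back the underlying u16 storage untouched, while an explicit `toFloat32Array()`-style convenience method copies and converts each half float.

```typescript
// Decode IEEE 754 half-precision bits into a JS number. Illustrative
// sketch only; not the real ArrowJS conversion routine.
function float16BitsToNumber(h: number): number {
  const sign = (h & 0x8000) ? -1 : 1;
  const exp = (h >>> 10) & 0x1f;
  const frac = h & 0x03ff;
  if (exp === 0x1f) return frac ? NaN : sign * Infinity;
  if (exp === 0) return sign * frac * 2 ** -24; // subnormal
  return sign * (1 + frac / 1024) * 2 ** (exp - 15);
}

const halfBits = Uint16Array.of(0x3c00, 0xc000); // [1, -2] as f16 bits
// Zero-copy: a subarray shares the underlying two-byte storage, which is
// what native APIs like WebGL expect to receive...
const zeroCopy = halfBits.subarray(0);
// ...while a convenience conversion allocates a new Float32Array:
const copied = Float32Array.from(halfBits, float16BitsToNumber);
```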
[jira] [Updated] (ARROW-4555) [JS] Add high-level Table and Column creation methods
[ https://issues.apache.org/jira/browse/ARROW-4555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Taylor updated ARROW-4555: --- Affects Version/s: (was: 0.4.0) JS-0.4.0 > [JS] Add high-level Table and Column creation methods > - > > Key: ARROW-4555 > URL: https://issues.apache.org/jira/browse/ARROW-4555 > Project: Apache Arrow > Issue Type: New Feature > Components: JavaScript >Affects Versions: JS-0.4.0 >Reporter: Paul Taylor >Assignee: Paul Taylor >Priority: Major > Fix For: 0.4.1 > > > It'd be great to have a few high-level functions that implicitly create the > Schema, RecordBatches, etc. from a Table and a list of Columns. For example: > {code:actionscript} > const table = Table.new( > Column.new('foo', ...), > Column.new('bar', ...) > ); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4554) [JS] Implement logic for combining Vectors with different lengths/chunksizes
[ https://issues.apache.org/jira/browse/ARROW-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Taylor updated ARROW-4554: --- Fix Version/s: (was: 0.4.1) JS-0.5.0 > [JS] Implement logic for combining Vectors with different lengths/chunksizes > > > Key: ARROW-4554 > URL: https://issues.apache.org/jira/browse/ARROW-4554 > Project: Apache Arrow > Issue Type: New Feature > Components: JavaScript >Affects Versions: JS-0.4.0 >Reporter: Paul Taylor >Assignee: Paul Taylor >Priority: Major > Fix For: JS-0.5.0 > > > We should add logic to combine and possibly slice/re-chunk and uniformly > partition chunks into separate RecordBatches. This will make it easier to > create Tables or RecordBatches from Vectors of different lengths. This is > also necessary for {{Table#assign()}}. PR incoming. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4553) [JS] Implement Schema/Field/DataType comparators
[ https://issues.apache.org/jira/browse/ARROW-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Taylor updated ARROW-4553: --- Affects Version/s: (was: 0.4.0) JS-0.4.0 > [JS] Implement Schema/Field/DataType comparators > > > Key: ARROW-4553 > URL: https://issues.apache.org/jira/browse/ARROW-4553 > Project: Apache Arrow > Issue Type: New Feature > Components: JavaScript >Affects Versions: JS-0.4.0 >Reporter: Paul Taylor >Assignee: Paul Taylor >Priority: Major > Fix For: 0.4.1 > > > Some basic type comparison logic is necessary for {{Table#assign()}}. PR > incoming. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4555) [JS] Add high-level Table and Column creation methods
[ https://issues.apache.org/jira/browse/ARROW-4555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Taylor updated ARROW-4555: --- Fix Version/s: (was: 0.4.1) JS-0.5.0 > [JS] Add high-level Table and Column creation methods > - > > Key: ARROW-4555 > URL: https://issues.apache.org/jira/browse/ARROW-4555 > Project: Apache Arrow > Issue Type: New Feature > Components: JavaScript >Affects Versions: JS-0.4.0 >Reporter: Paul Taylor >Assignee: Paul Taylor >Priority: Major > Fix For: JS-0.5.0 > > > It'd be great to have a few high-level functions that implicitly create the > Schema, RecordBatches, etc. from a Table and a list of Columns. For example: > {code:actionscript} > const table = Table.new( > Column.new('foo', ...), > Column.new('bar', ...) > ); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4554) [JS] Implement logic for combining Vectors with different lengths/chunksizes
[ https://issues.apache.org/jira/browse/ARROW-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Taylor updated ARROW-4554: --- Affects Version/s: (was: 0.4.0) JS-0.4.0 > [JS] Implement logic for combining Vectors with different lengths/chunksizes > > > Key: ARROW-4554 > URL: https://issues.apache.org/jira/browse/ARROW-4554 > Project: Apache Arrow > Issue Type: New Feature > Components: JavaScript >Affects Versions: JS-0.4.0 >Reporter: Paul Taylor >Assignee: Paul Taylor >Priority: Major > Fix For: 0.4.1 > > > We should add logic to combine and possibly slice/re-chunk and uniformly > partition chunks into separate RecordBatches. This will make it easier to > create Tables or RecordBatches from Vectors of different lengths. This is > also necessary for {{Table#assign()}}. PR incoming. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4553) [JS] Implement Schema/Field/DataType comparators
[ https://issues.apache.org/jira/browse/ARROW-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Taylor updated ARROW-4553: --- Fix Version/s: (was: 0.4.1) JS-0.5.0 > [JS] Implement Schema/Field/DataType comparators > > > Key: ARROW-4553 > URL: https://issues.apache.org/jira/browse/ARROW-4553 > Project: Apache Arrow > Issue Type: New Feature > Components: JavaScript >Affects Versions: JS-0.4.0 >Reporter: Paul Taylor >Assignee: Paul Taylor >Priority: Major > Fix For: JS-0.5.0 > > > Some basic type comparison logic is necessary for {{Table#assign()}}. PR > incoming. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4552) [JS] Table and Schema assign implementations
[ https://issues.apache.org/jira/browse/ARROW-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Taylor updated ARROW-4552: --- Fix Version/s: JS-0.5.0 > [JS] Table and Schema assign implementations > > > Key: ARROW-4552 > URL: https://issues.apache.org/jira/browse/ARROW-4552 > Project: Apache Arrow > Issue Type: New Feature > Components: JavaScript >Affects Versions: JS-0.4.0 >Reporter: Paul Taylor >Assignee: Paul Taylor >Priority: Major > Fix For: JS-0.5.0 > > > It'd be really handy to have a basic {{assign}} methods on the Table and > Schema. I've extracted and cleaned up some internal helper methods I have > that does this. PR incoming. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4557) [JS] Add Table/Schema/RecordBatch `selectAt(...indices)` method
Paul Taylor created ARROW-4557: -- Summary: [JS] Add Table/Schema/RecordBatch `selectAt(...indices)` method Key: ARROW-4557 URL: https://issues.apache.org/jira/browse/ARROW-4557 Project: Apache Arrow Issue Type: New Feature Components: JavaScript Affects Versions: JS-0.4.0 Reporter: Paul Taylor Assignee: Paul Taylor Fix For: JS-0.5.0 Presently Table, Schema, and RecordBatch have basic {{select(...colNames)}} implementations. Having an easy {{selectAt(...colIndices)}} impl would be a nice complement, especially when there are duplicate column names. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
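The proposed behavior can be sketched over a plain list of named columns (the column shape here is hypothetical): selecting by position stays unambiguous even when names collide, unlike {{select(...colNames)}}.

```typescript
// Hypothetical column shape for illustration; not the ArrowJS Column class.
interface Column { name: string; values: unknown[]; }

// Pick columns by position, preserving the order of the requested indices
// and skipping indices that fall outside the table.
function selectAt(columns: Column[], ...indices: number[]): Column[] {
  return indices.map((i) => columns[i]).filter((col) => col !== undefined);
}

const cols: Column[] = [
  { name: 'id',    values: [1, 2] },
  { name: 'value', values: ['a', 'b'] },
  { name: 'value', values: ['c', 'd'] }, // duplicate name
];

// The *second* 'value' column, then 'id' -- unreachable by name alone:
const picked = selectAt(cols, 2, 0);
```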
[jira] [Updated] (ARROW-4552) [JS] Table and Schema assign implementations
[ https://issues.apache.org/jira/browse/ARROW-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Taylor updated ARROW-4552: --- Affects Version/s: (was: 0.4.0) JS-0.4.0 > [JS] Table and Schema assign implementations > > > Key: ARROW-4552 > URL: https://issues.apache.org/jira/browse/ARROW-4552 > Project: Apache Arrow > Issue Type: New Feature > Components: JavaScript >Affects Versions: JS-0.4.0 >Reporter: Paul Taylor >Assignee: Paul Taylor >Priority: Major > > It'd be really handy to have a basic {{assign}} methods on the Table and > Schema. I've extracted and cleaned up some internal helper methods I have > that does this. PR incoming. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4555) [JS] Add high-level Table and Column creation methods
Paul Taylor created ARROW-4555: -- Summary: [JS] Add high-level Table and Column creation methods Key: ARROW-4555 URL: https://issues.apache.org/jira/browse/ARROW-4555 Project: Apache Arrow Issue Type: New Feature Components: JavaScript Affects Versions: 0.4.0 Reporter: Paul Taylor Assignee: Paul Taylor Fix For: 0.4.1 It'd be great to have a few high-level functions that implicitly create the Schema, RecordBatches, etc. from a Table and a list of Columns. For example: {code:actionscript} const table = Table.new( Column.new('foo', ...), Column.new('bar', ...) ); {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4554) [JS] Implement logic for combining Vectors with different lengths/chunksizes
Paul Taylor created ARROW-4554: -- Summary: [JS] Implement logic for combining Vectors with different lengths/chunksizes Key: ARROW-4554 URL: https://issues.apache.org/jira/browse/ARROW-4554 Project: Apache Arrow Issue Type: New Feature Components: JavaScript Affects Versions: 0.4.0 Reporter: Paul Taylor Assignee: Paul Taylor Fix For: 0.4.1 We should add logic to combine and possibly slice/re-chunk and uniformly partition chunks into separate RecordBatches. This will make it easier to create Tables or RecordBatches from Vectors of different lengths. This is also necessary for {{Table#assign()}}. PR incoming. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
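One way to picture the re-chunking described above (a sketch over plain arrays, not the actual chunked-vector logic): slice every column into uniform row windows, so that each window can back one RecordBatch.

```typescript
// Partition same-height columns (plain arrays here) into uniform row
// windows; each window could then back one RecordBatch. Illustrative only.
function partitionIntoBatches<T>(columns: T[][], batchLength: number): T[][][] {
  const numRows = Math.min(...columns.map((col) => col.length));
  const batches: T[][][] = [];
  for (let offset = 0; offset < numRows; offset += batchLength) {
    // Take the same row window from every column.
    batches.push(columns.map((col) => col.slice(offset, offset + batchLength)));
  }
  return batches;
}

// Two columns of length 3, re-chunked into batches of at most 2 rows:
const batches = partitionIntoBatches([[1, 2, 3], [4, 5, 6]], 2);
```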
[jira] [Created] (ARROW-4553) [JS] Implement Schema/Field/DataType comparators
Paul Taylor created ARROW-4553: -- Summary: [JS] Implement Schema/Field/DataType comparators Key: ARROW-4553 URL: https://issues.apache.org/jira/browse/ARROW-4553 Project: Apache Arrow Issue Type: New Feature Components: JavaScript Affects Versions: 0.4.0 Reporter: Paul Taylor Assignee: Paul Taylor Fix For: 0.4.1 Some basic type comparison logic is necessary for {{Table#assign()}}. PR incoming. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4552) [JS] Table and Schema assign implementations
Paul Taylor created ARROW-4552: -- Summary: [JS] Table and Schema assign implementations Key: ARROW-4552 URL: https://issues.apache.org/jira/browse/ARROW-4552 Project: Apache Arrow Issue Type: New Feature Components: JavaScript Affects Versions: 0.4.0 Reporter: Paul Taylor Assignee: Paul Taylor It'd be really handy to have basic {{assign}} methods on the Table and Schema. I've extracted and cleaned up some internal helper methods I have that do this. PR incoming. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-2116) [JS] Implement IPC writer
[ https://issues.apache.org/jira/browse/ARROW-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Taylor reassigned ARROW-2116: -- Assignee: Paul Taylor > [JS] Implement IPC writer > - > > Key: ARROW-2116 > URL: https://issues.apache.org/jira/browse/ARROW-2116 > Project: Apache Arrow > Issue Type: Improvement > Components: JavaScript >Reporter: Brian Hulette >Assignee: Paul Taylor >Priority: Major > Labels: pull-request-available > Fix For: JS-0.4.0 > > Time Spent: 6h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4477) [JS] Bn shouldn't override constructor of the resulting typed array
Paul Taylor created ARROW-4477: -- Summary: [JS] Bn shouldn't override constructor of the resulting typed array Key: ARROW-4477 URL: https://issues.apache.org/jira/browse/ARROW-4477 Project: Apache Arrow Issue Type: Bug Components: JavaScript Affects Versions: 0.4.0 Reporter: Paul Taylor Assignee: Paul Taylor Fix For: 0.4.0 There's an undefined constructor property definition in the {{Object.assign()}} call for the BigNum mixins that's overriding the constructor of the returned TypedArrays. I think this was left over from the first iteration where I used {{Object.create()}}. These should be removed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4442) [JS] Overly broad type annotation for Chunked typeId leading to type mismatches in generated typing
Paul Taylor created ARROW-4442: -- Summary: [JS] Overly broad type annotation for Chunked typeId leading to type mismatches in generated typing Key: ARROW-4442 URL: https://issues.apache.org/jira/browse/ARROW-4442 Project: Apache Arrow Issue Type: Improvement Components: JavaScript Affects Versions: 0.4.0 Reporter: Paul Taylor Assignee: Paul Taylor TypeScript generates an overly broad type for the `typeId` property of the ChunkedVector class, leading to a type mismatch and a failure to infer that a Column is a Vector: {code:actionscript} let col: Vector; col = new Chunked(new Utf8()); ^ /* Argument of type 'Chunked' is not assignable to parameter of type 'Vector'. Type 'Chunked' is not assignable to type 'Vector'. Types of property 'typeId' are incompatible. Type 'Type' is not assignable to type 'Type.Utf8'. */ {code} The fix is to add an explicit return annotation to the Chunked typeId getter. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
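The fix can be illustrated with a stripped-down sketch (hypothetical class shapes; the real code lives in ArrowJS's Chunked vector): without an explicit return annotation, TypeScript widens the getter to the whole enum, and `typeId` no longer proves which vector flavor the instance is.

```typescript
// Stripped-down illustration of the fix described above; not the actual
// ArrowJS type hierarchy.
enum Type { Int = 2, Utf8 = 5 }

class DataType<T extends Type = Type> {
  constructor(public readonly typeId: T) {}
}

class Chunked<T extends Type> {
  constructor(private type: DataType<T>) {}
  // The explicit `: T` annotation keeps the narrow generic member type;
  // without it, TypeScript may widen the getter's type to plain `Type`.
  get typeId(): T { return this.type.typeId; }
}

const chunked = new Chunked(new DataType(Type.Utf8));
// Because typeId stays narrowed to Type.Utf8, this assignment type-checks:
const id: Type.Utf8 = chunked.typeId;
```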
[jira] [Created] (ARROW-4396) Update Typedoc to support TypeScript 3.2
Paul Taylor created ARROW-4396: -- Summary: Update Typedoc to support TypeScript 3.2 Key: ARROW-4396 URL: https://issues.apache.org/jira/browse/ARROW-4396 Project: Apache Arrow Issue Type: Improvement Components: JavaScript Reporter: Paul Taylor Assignee: Paul Taylor Update TypeDoc now that it supports TypeScript 3.2 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4395) ts-node throws type error running `bin/arrow2csv.js`
Paul Taylor created ARROW-4395: -- Summary: ts-node throws type error running `bin/arrow2csv.js` Key: ARROW-4395 URL: https://issues.apache.org/jira/browse/ARROW-4395 Project: Apache Arrow Issue Type: Bug Components: JavaScript Affects Versions: 0.4.0 Reporter: Paul Taylor Assignee: Paul Taylor Fix For: 0.4.0 ts-node is being too strict, throws this (inaccurate) error JIT'ing the TS source: {code:none} $ cat test/data/cpp/stream/simple.arrow | ./bin/arrow2csv.js /home/ptaylor/dev/arrow/js/node_modules/ts-node/src/index.ts:228 return new TSError(diagnosticText, diagnosticCodes) ^ TSError: ⨯ Unable to compile TypeScript: src/vector/map.ts(25,57): error TS2345: Argument of type 'Field[]' is not assignable to parameter of type 'Field[]'. Type 'Field' is not assignable to type 'Field'. Type 'T[string] | T[number] | T[symbol]' is not assignable to type 'T[keyof T]'. Type 'T[symbol]' is not assignable to type 'T[keyof T]'. Type 'DataType' is not assignable to type 'T[keyof T]'. Type 'symbol' is not assignable to type 'keyof T'. Type 'symbol' is not assignable to type 'string | number'. {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Issue Comment Deleted] (ARROW-1496) [JS] Upload coverage data to codecov.io
[ https://issues.apache.org/jira/browse/ARROW-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Taylor updated ARROW-1496: --- Comment: was deleted (was: Issue resolved by pull request 3290 [https://github.com/apache/arrow/pull/3290|https://github.com/apache/arrow/pull/3290] ) > [JS] Upload coverage data to codecov.io > --- > > Key: ARROW-1496 > URL: https://issues.apache.org/jira/browse/ARROW-1496 > Project: Apache Arrow > Issue Type: Improvement > Components: JavaScript >Reporter: Wes McKinney >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-1496) [JS] Upload coverage data to codecov.io
[ https://issues.apache.org/jira/browse/ARROW-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16747576#comment-16747576 ] Paul Taylor commented on ARROW-1496: Issue resolved by pull request 3290 [https://github.com/apache/arrow/pull/3290|https://github.com/apache/arrow/pull/3290] > [JS] Upload coverage data to codecov.io > --- > > Key: ARROW-1496 > URL: https://issues.apache.org/jira/browse/ARROW-1496 > Project: Apache Arrow > Issue Type: Improvement > Components: JavaScript >Reporter: Wes McKinney >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-1496) [JS] Upload coverage data to codecov.io
[ https://issues.apache.org/jira/browse/ARROW-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Taylor resolved ARROW-1496. Resolution: Fixed Issue resolved by pull request 3290 [https://github.com/apache/arrow/pull/3290|https://github.com/apache/arrow/pull/3290] > [JS] Upload coverage data to codecov.io > --- > > Key: ARROW-1496 > URL: https://issues.apache.org/jira/browse/ARROW-1496 > Project: Apache Arrow > Issue Type: Improvement > Components: JavaScript >Reporter: Wes McKinney >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-4283) Should RecordBatchStreamReader/Writer be AsyncIterable?
[ https://issues.apache.org/jira/browse/ARROW-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16747321#comment-16747321 ] Paul Taylor commented on ARROW-4283: [~pitrou] Thanks for the feedback. I want to clarify: my Python skills aren't sharp, I'm not familiar with the pyarrow API or Python's asyncio/async-iterable primitives, so filter my comments through the lens of a beginner. The little experience I do have is using the RecordBatchStreamReader to read from stdin (via {{sys.stdin.buffer}}) and named file descriptors (via {{os.fdopen()}}). Since Python's so friendly (and I have no idea how the Python IO primitives work), I thought maybe I could pass aiohttp's {{Request.stream}} to the RecordBatchStreamReader constructor, and quickly learned that no, I can't ;). In the JS implementation we have two main entry points for reading RecordBatch streams: # a static [{{RecordBatchReader.from(source)}}|https://github.com/apache/arrow/blob/cc1ce6194b905768b1a6d9f0e209270f62dc558a/js/src/ipc/reader.ts#L142], which accepts heterogeneous source types and returns a RecordBatchReader for the underlying Arrow type (file, stream, or JSON) and conforms to sync/async semantics of the source input type # methods that create [through/transform streams|https://github.com/apache/arrow/blob/cc1ce6194b905768b1a6d9f0e209270f62dc558a/js/bin/file-to-stream.js#L33] from the RecordBatchReader and RecordBatchWriter, for use with node's native stream primitives Each link in the streaming pipeline is a sort of transform stream, and a significant amount of effort went into supporting all the different node/browser IO primitives, so I understand if that's too much to ask at this point. As an alternative, would it be possible to add a method that accepts a Python byte stream, and returns a zero-copy AsyncIterable of RecordBatches? 
Or maybe add an example in the [python/ipc|https://arrow.apache.org/docs/python/ipc.html#writing-and-reading-streams] docs page of how to do that? > Should RecordBatchStreamReader/Writer be AsyncIterable? > --- > > Key: ARROW-4283 > URL: https://issues.apache.org/jira/browse/ARROW-4283 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Paul Taylor >Priority: Minor > Fix For: 0.13.0 > > > Filing this issue after a discussion today with [~xhochy] about how to > implement streaming pyarrow http services. I had attempted to use both Flask > and [aiohttp|https://aiohttp.readthedocs.io/en/stable/streams.html]'s > streaming interfaces because they seemed familiar, but no dice. I have no > idea how hard this would be to add -- supporting all the asynciterable > primitives in JS was non-trivial. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
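The pipeline pattern described in the comment above can be sketched in a self-contained way (this is not the Arrow API; the names and shapes here are illustrative): each link is an async-iterable transform, so stages compose like streams and accept both sync and async byte sources.

```typescript
// Self-contained sketch (not the Arrow API) of the pattern described above:
// each link in the pipeline is an async-iterable transform.
async function* toBatches(
  source: Iterable<Uint8Array> | AsyncIterable<Uint8Array>
): AsyncIterableIterator<number> {
  // Stand-in for parsing RecordBatches from bytes: yield one "batch" per
  // chunk, represented here by its byte length.
  for await (const chunk of source) {
    yield chunk.byteLength;
  }
}

async function readAll(
  source: Iterable<Uint8Array> | AsyncIterable<Uint8Array>
): Promise<number[]> {
  const batches: number[] = [];
  for await (const batch of toBatches(source)) batches.push(batch);
  return batches;
}

// A plain array works as a synchronous source; an async generator or a
// node Readable would flow through the same code path.
const sizes = readAll([new Uint8Array(8), new Uint8Array(16)]);
```

The point of the sketch is the composition: because `for await` accepts both sync and async iterables, a single transform conforms to the sync/async semantics of whatever source feeds it, which is what `RecordBatchReader.from(source)` does for its heterogeneous inputs.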
[jira] [Updated] (ARROW-4283) Should RecordBatchStreamReader/Writer be AsyncIterable?
[ https://issues.apache.org/jira/browse/ARROW-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Taylor updated ARROW-4283: --- Summary: Should RecordBatchStreamReader/Writer be AsyncIterable? (was: Should RecordBatchStreamReader/Writer be AsyncIteraable?) > Should RecordBatchStreamReader/Writer be AsyncIterable? > --- > > Key: ARROW-4283 > URL: https://issues.apache.org/jira/browse/ARROW-4283 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Paul Taylor >Priority: Minor > Fix For: 0.13.0 > > > Filing this issue after a discussion today with [~xhochy] about how to > implement streaming pyarrow http services. I had attempted to use both Flask > and [aiohttp|https://aiohttp.readthedocs.io/en/stable/streams.html]'s > streaming interfaces because they seemed familiar, but no dice. I have no > idea how hard this would be to add -- supporting all the asynciterable > primitives in JS was non-trivial. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4283) Should RecordBatchStreamReader/Writer be AsyncIteraable?
Paul Taylor created ARROW-4283: -- Summary: Should RecordBatchStreamReader/Writer be AsyncIteraable? Key: ARROW-4283 URL: https://issues.apache.org/jira/browse/ARROW-4283 Project: Apache Arrow Issue Type: Improvement Components: Python Reporter: Paul Taylor Filing this issue after a discussion today with [~xhochy] about how to implement streaming pyarrow http services. I had attempted to use both Flask and [aiohttp|https://aiohttp.readthedocs.io/en/stable/streams.html]'s streaming interfaces because they seemed familiar, but no dice. I have no idea how hard this would be to add -- supporting all the asynciterable primitives in JS was non-trivial. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-3689) [JS] Upgrade to TS 3.1
[ https://issues.apache.org/jira/browse/ARROW-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Taylor resolved ARROW-3689. Resolution: Fixed Assignee: Paul Taylor Fix Version/s: (was: JS-0.5.0) JS-0.4.0 Issue resolved by pull request 3290 [https://github.com/apache/arrow/pull/3290|https://github.com/apache/arrow/pull/3290] > [JS] Upgrade to TS 3.1 > -- > > Key: ARROW-3689 > URL: https://issues.apache.org/jira/browse/ARROW-3689 > Project: Apache Arrow > Issue Type: Task > Components: JavaScript >Reporter: Brian Hulette >Assignee: Paul Taylor >Priority: Major > Fix For: JS-0.4.0 > > > Attempted > [here|https://github.com/apache/arrow/pull/2611#issuecomment-431318129], but > ran into issues. > Should upgrade typedoc to 0.13 at the same time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2839) [JS] Support whatwg/streams in IPC reader/writer
[ https://issues.apache.org/jira/browse/ARROW-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16741633#comment-16741633 ] Paul Taylor commented on ARROW-2839: Issue resolved by pull request 3290 [https://github.com/apache/arrow/pull/3290|https://github.com/apache/arrow/pull/3290] > [JS] Support whatwg/streams in IPC reader/writer > > > Key: ARROW-2839 > URL: https://issues.apache.org/jira/browse/ARROW-2839 > Project: Apache Arrow > Issue Type: Improvement > Components: JavaScript >Affects Versions: JS-0.3.1 >Reporter: Paul Taylor >Assignee: Paul Taylor >Priority: Major > Fix For: JS-0.5.0 > > > We should make it easy to stream Arrow in the browser via > [whatwg/streams|https://github.com/whatwg/streams]. I already have this > working at Graphistry, but I had to use some of the IPC internal methods. > Creating this issue to track back-porting that work and the few minor > refactors to the IPC internals that we'll need to do. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-2235) [JS] Add tests for IPC messages split across multiple buffers
[ https://issues.apache.org/jira/browse/ARROW-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Taylor resolved ARROW-2235. Resolution: Fixed Fix Version/s: (was: JS-0.5.0) JS-0.4.0 Issue resolved by pull request 3290 [https://github.com/apache/arrow/pull/3290|https://github.com/apache/arrow/pull/3290] > [JS] Add tests for IPC messages split across multiple buffers > - > > Key: ARROW-2235 > URL: https://issues.apache.org/jira/browse/ARROW-2235 > Project: Apache Arrow > Issue Type: Task > Components: JavaScript >Reporter: Brian Hulette >Priority: Major > Fix For: JS-0.4.0 > > > See https://github.com/apache/arrow/pull/1670 > This is probably easiest to do after the JS IPC writer is finished > (ARROW-2116) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-2766) [JS] Add ability to construct a Table from a list of Arrays/TypedArrays
[ https://issues.apache.org/jira/browse/ARROW-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Taylor resolved ARROW-2766. Resolution: Fixed Assignee: Paul Taylor Fix Version/s: (was: JS-0.5.0) JS-0.4.0 Issue resolved by pull request 3290 [https://github.com/apache/arrow/pull/3290|https://github.com/apache/arrow/pull/3290] > [JS] Add ability to construct a Table from a list of Arrays/TypedArrays > --- > > Key: ARROW-2766 > URL: https://issues.apache.org/jira/browse/ARROW-2766 > Project: Apache Arrow > Issue Type: New Feature > Components: JavaScript >Reporter: Brian Hulette >Assignee: Paul Taylor >Priority: Major > Fix For: JS-0.4.0 > > > Something like > {code:javascript} > Table.from({'col1': [...], 'col2': [...], 'col3': [...]}) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-3561) [JS] Update ts-jest
[ https://issues.apache.org/jira/browse/ARROW-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Taylor resolved ARROW-3561. Resolution: Fixed Assignee: Paul Taylor Fix Version/s: JS-0.4.0 Issue resolved by pull request 3290 [https://github.com/apache/arrow/pull/3290|https://github.com/apache/arrow/pull/3290] > [JS] Update ts-jest > --- > > Key: ARROW-3561 > URL: https://issues.apache.org/jira/browse/ARROW-3561 > Project: Apache Arrow > Issue Type: Task > Components: JavaScript >Reporter: Dominik Moritz >Assignee: Paul Taylor >Priority: Major > Fix For: JS-0.4.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-3337) [JS] IPC writer doesn't serialize the dictionary of nested Vectors
[ https://issues.apache.org/jira/browse/ARROW-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Taylor resolved ARROW-3337. Resolution: Fixed Fix Version/s: (was: JS-0.5.0) JS-0.4.0 Issue resolved by pull request 3290 [https://github.com/apache/arrow/pull/3290|https://github.com/apache/arrow/pull/3290] > [JS] IPC writer doesn't serialize the dictionary of nested Vectors > -- > > Key: ARROW-3337 > URL: https://issues.apache.org/jira/browse/ARROW-3337 > Project: Apache Arrow > Issue Type: Improvement > Components: JavaScript >Affects Versions: JS-0.3.1 >Reporter: Paul Taylor >Assignee: Paul Taylor >Priority: Major > Fix For: JS-0.4.0 > > > The JS writer only serializes dictionaries for [top-level > children|https://github.com/apache/arrow/blob/ee9b1ba426e2f1f117cde8d8f4ba6fbe3be5674c/js/src/ipc/writer/binary.ts#L40] > of a Table. This is wrong, and an oversight on my part. The fix here is to > put the actual Dictionary vectors in the `schema.dictionaries` map instead of > the dictionary fields, like I understand the C++ does. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-2778) [JS] Add Utf8Vector.from
[ https://issues.apache.org/jira/browse/ARROW-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Taylor resolved ARROW-2778. Resolution: Fixed Fix Version/s: JS-0.4.0 Issue resolved by pull request 3290 [https://github.com/apache/arrow/pull/3290|https://github.com/apache/arrow/pull/3290] > [JS] Add Utf8Vector.from > > > Key: ARROW-2778 > URL: https://issues.apache.org/jira/browse/ARROW-2778 > Project: Apache Arrow > Issue Type: Improvement > Components: JavaScript >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Labels: pull-request-available > Fix For: JS-0.4.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-3560) [JS] Remove @std/esm
[ https://issues.apache.org/jira/browse/ARROW-3560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Taylor resolved ARROW-3560. Resolution: Fixed Assignee: Paul Taylor Fix Version/s: JS-0.4.0 Issue resolved by pull request 3290 [https://github.com/apache/arrow/pull/3290|https://github.com/apache/arrow/pull/3290] > [JS] Remove @std/esm > > > Key: ARROW-3560 > URL: https://issues.apache.org/jira/browse/ARROW-3560 > Project: Apache Arrow > Issue Type: Task > Components: JavaScript >Reporter: Dominik Moritz >Assignee: Paul Taylor >Priority: Minor > Fix For: JS-0.4.0 > > Original Estimate: 1h > Remaining Estimate: 1h > > When I run npm install, I get this warning: > @std/esm@0.26.0: This package is discontinued. Use https://npmjs.com/esm -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-2839) [JS] Support whatwg/streams in IPC reader/writer
[ https://issues.apache.org/jira/browse/ARROW-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Taylor resolved ARROW-2839. Resolution: Fixed Fix Version/s: (was: JS-0.5.0) JS-0.4.0 Issue resolved by pull request 3290 [https://github.com/apache/arrow/pull/3290|https://github.com/apache/arrow/pull/3290] > [JS] Support whatwg/streams in IPC reader/writer > > > Key: ARROW-2839 > URL: https://issues.apache.org/jira/browse/ARROW-2839 > Project: Apache Arrow > Issue Type: Improvement > Components: JavaScript >Affects Versions: JS-0.3.1 >Reporter: Paul Taylor >Assignee: Paul Taylor >Priority: Major > Fix For: JS-0.4.0 > > > We should make it easy to stream Arrow in the browser via > [whatwg/streams|https://github.com/whatwg/streams]. I already have this > working at Graphistry, but I had to use some of the IPC internal methods. > Creating this issue to track back-porting that work and the few minor > refactors to the IPC internals that we'll need to do. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Issue Comment Deleted] (ARROW-2839) [JS] Support whatwg/streams in IPC reader/writer
[ https://issues.apache.org/jira/browse/ARROW-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Taylor updated ARROW-2839: --- Comment: was deleted (was: Issue resolved by pull request 3290 [https://github.com/apache/arrow/pull/3290|https://github.com/apache/arrow/pull/3290] ) > [JS] Support whatwg/streams in IPC reader/writer > > > Key: ARROW-2839 > URL: https://issues.apache.org/jira/browse/ARROW-2839 > Project: Apache Arrow > Issue Type: Improvement > Components: JavaScript >Affects Versions: JS-0.3.1 >Reporter: Paul Taylor >Assignee: Paul Taylor >Priority: Major > Fix For: JS-0.5.0 > > > We should make it easy to stream Arrow in the browser via > [whatwg/streams|https://github.com/whatwg/streams]. I already have this > working at Graphistry, but I had to use some of the IPC internal methods. > Creating this issue to track back-porting that work and the few minor > refactors to the IPC internals that we'll need to do. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-3336) JS writer doesn't serialize sliced Vectors correctly
[ https://issues.apache.org/jira/browse/ARROW-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Taylor resolved ARROW-3336. Resolution: Fixed Issue resolved by pull request 2638 [https://github.com/apache/arrow/pull/2638] > JS writer doesn't serialize sliced Vectors correctly > > > Key: ARROW-3336 > URL: https://issues.apache.org/jira/browse/ARROW-3336 > Project: Apache Arrow > Issue Type: Improvement > Components: JavaScript >Affects Versions: JS-0.3.1 >Reporter: Paul Taylor >Assignee: Paul Taylor >Priority: Major > Labels: pull-request-available > Fix For: JS-0.4.0 > > Time Spent: 3h > Remaining Estimate: 0h > > The JS IPC writer is slicing the data and valueOffset buffers by starting > from the data's current logical offset. This is incorrect, since the slice > function already does this for the data, type, and valueOffset TypedArrays > internally. PR incoming. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-3337) JS writer doesn't serialize the dictionary of nested Vectors
Paul Taylor created ARROW-3337: -- Summary: JS writer doesn't serialize the dictionary of nested Vectors Key: ARROW-3337 URL: https://issues.apache.org/jira/browse/ARROW-3337 Project: Apache Arrow Issue Type: Improvement Components: JavaScript Affects Versions: JS-0.3.1 Reporter: Paul Taylor Assignee: Paul Taylor Fix For: JS-0.4.0 The JS writer only serializes dictionaries for [top-level children|https://github.com/apache/arrow/blob/ee9b1ba426e2f1f117cde8d8f4ba6fbe3be5674c/js/src/ipc/writer/binary.ts#L40] of a Table. This is wrong, and an oversight on my part. The fix here is to put the actual Dictionary vectors in the `schema.dictionaries` map instead of the dictionary fields, like I understand the C++ does. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
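The fix described above can be sketched with hypothetical, simplified field shapes (not the real Arrow schema classes): walk the schema's fields recursively so dictionary-encoded fields nested inside structs or lists land in a schema-level dictionaries map, instead of stopping at the top-level children.

```typescript
// Hypothetical, simplified shapes -- not the real Arrow schema classes.
interface Field {
  name: string;
  dictionaryId?: number; // set when the field is dictionary-encoded
  children?: Field[];    // nested fields (struct, list, map, ...)
}

// Recursively collect every dictionary-encoded field, keyed by dictionary id,
// so nested dictionaries get serialized too. The bug was walking only the
// top-level children.
function collectDictionaries(
  fields: Field[],
  out: Map<number, Field> = new Map()
): Map<number, Field> {
  for (const field of fields) {
    if (field.dictionaryId !== undefined) out.set(field.dictionaryId, field);
    if (field.children) collectDictionaries(field.children, out);
  }
  return out;
}

const schema: Field[] = [
  { name: 'tags', dictionaryId: 0 },
  { name: 'nested', children: [{ name: 'city', dictionaryId: 1 }] },
];
const dictionaries = collectDictionaries(schema);
```

With the recursive walk, `dictionaries` contains both entries; the shallow version would miss `city`.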
[jira] [Created] (ARROW-3336) JS writer doesn't serialize sliced Vectors correctly
Paul Taylor created ARROW-3336: -- Summary: JS writer doesn't serialize sliced Vectors correctly Key: ARROW-3336 URL: https://issues.apache.org/jira/browse/ARROW-3336 Project: Apache Arrow Issue Type: Improvement Components: JavaScript Affects Versions: JS-0.3.1 Reporter: Paul Taylor Assignee: Paul Taylor Fix For: JS-0.4.0 The JS IPC writer is slicing the data and valueOffset buffers by starting from the data's current logical offset. This is incorrect, since the slice function already does this for the data, type, and valueOffset TypedArrays internally. PR incoming. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
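The double-offset bug described above can be illustrated with a self-contained TypedArray sketch (illustrative values, not the writer's actual code): a view that already accounts for the logical offset must not be sliced by that offset again.

```typescript
// Illustrative sketch of the double-offset bug (not the writer's code).
const backing = new Float64Array([0, 1, 2, 3, 4, 5]);
const logicalOffset = 2;

// subarray() already bakes the offset into the view...
const view = backing.subarray(logicalOffset);      // [2, 3, 4, 5]

// ...so applying the logical offset a second time skips valid values:
const doubleOffset = view.subarray(logicalOffset); // [4, 5] -- the bug
const correct = view.subarray(0);                  // [2, 3, 4, 5]
```

This is the same shape as the writer bug: the slice helper already handled the data's logical offset internally, so offsetting again dropped leading values from the serialized buffers.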
[jira] [Commented] (ARROW-3256) [JS] File footer and message metadata is inconsistent
[ https://issues.apache.org/jira/browse/ARROW-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628085#comment-16628085 ] Paul Taylor commented on ARROW-3256: Sorry for the confusion Wes, I got distracted while rewriting that sentence and forgot to remove the last half when I came back to it. Does this change look like a fix? https://github.com/apache/arrow/pull/2616/commits/2095e4ebffeb9f51f04d1b9500c958dbbca9bedd#diff-64a9bfd33e2b9cdeaf61082d9fde8a0dR77 > [JS] File footer and message metadata is inconsistent > - > > Key: ARROW-3256 > URL: https://issues.apache.org/jira/browse/ARROW-3256 > Project: Apache Arrow > Issue Type: Bug > Components: JavaScript >Reporter: Wes McKinney >Priority: Major > Fix For: JS-0.4.0 > > > I added some assertions to the C++ library and found that the body length in > the file footer and the IPC message were different > {code} > ## > JS producing, C++ consuming > ## > == > Testing file > /home/travis/build/apache/arrow/integration/data/struct_example.json > == > -- Creating binary inputs > node --no-warnings /home/travis/build/apache/arrow/js/bin/json-to-arrow.js -a > /tmp/tmplbm3vbwz/3d2269c960f148b6b94e5f881c0bf9ca_struct_example.json_to_arrow > -j /home/travis/build/apache/arrow/integration/data/struct_example.json > -- Validating file > /home/travis/build/apache/arrow/cpp-build/debug/json-integration-test > --integration > --arrow=/tmp/tmplbm3vbwz/3d2269c960f148b6b94e5f881c0bf9ca_struct_example.json_to_arrow > --json=/home/travis/build/apache/arrow/integration/data/struct_example.json > --mode=VALIDATE > Command failed: > /home/travis/build/apache/arrow/cpp-build/debug/json-integration-test > --integration > --arrow=/tmp/tmplbm3vbwz/3d2269c960f148b6b94e5f881c0bf9ca_struct_example.json_to_arrow > --json=/home/travis/build/apache/arrow/integration/data/struct_example.json > --mode=VALIDATE > With output: > -- > /home/travis/build/apache/arrow/cpp/src/arrow/ipc/reader.cc:581 Check failed: > 
(message->body_length()) == (block.body_length) > {code} > I'm not sure what's wrong. I'll remove the assertions for now -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3256) [JS] File footer and message metadata is inconsistent
[ https://issues.apache.org/jira/browse/ARROW-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627755#comment-16627755 ] Paul Taylor commented on ARROW-3256: Yeah, looking at it now it makes sense why they're different. The JS is setting the FileBlock's body_length to the size of the entire serialized IPC message, not just the size of the data buffers. The body_length in the RecordBatch header is currently the total of the aligned sizes of the buffers in the batch, which I copied from here: https://github.com/apache/arrow/blob/516750216bfd48489b20988ad181e61823ecbb2f/cpp/src/arrow/ipc/writer.cc#L179 Also looking at where the body_length from a FileBlock is used, I see this: https://github.com/apache/arrow/blob/516750216bfd48489b20988ad181e61823ecbb2f/cpp/src/arrow/ipc/writer.cc#L866 That looks like the body_length field in the message header is the sum size of all the buffers. The size of the IPC message is then metadata_length + body_length + padding, and is written to the first 4 bytes of the IPC message. Have I misunderstood how the C++ writer is computing the body_length?
> [JS] File footer and message metadata is inconsistent > - > > Key: ARROW-3256 > URL: https://issues.apache.org/jira/browse/ARROW-3256 > Project: Apache Arrow > Issue Type: Bug > Components: JavaScript >Reporter: Wes McKinney >Priority: Major > Fix For: JS-0.4.0 > > > I added some assertions to the C++ library and found that the body length in > the file footer and the IPC message were different > {code} > ## > JS producing, C++ consuming > ## > == > Testing file > /home/travis/build/apache/arrow/integration/data/struct_example.json > == > -- Creating binary inputs > node --no-warnings /home/travis/build/apache/arrow/js/bin/json-to-arrow.js -a > /tmp/tmplbm3vbwz/3d2269c960f148b6b94e5f881c0bf9ca_struct_example.json_to_arrow > -j /home/travis/build/apache/arrow/integration/data/struct_example.json > -- Validating file > /home/travis/build/apache/arrow/cpp-build/debug/json-integration-test > --integration > --arrow=/tmp/tmplbm3vbwz/3d2269c960f148b6b94e5f881c0bf9ca_struct_example.json_to_arrow > --json=/home/travis/build/apache/arrow/integration/data/struct_example.json > --mode=VALIDATE > Command failed: > /home/travis/build/apache/arrow/cpp-build/debug/json-integration-test > --integration > --arrow=/tmp/tmplbm3vbwz/3d2269c960f148b6b94e5f881c0bf9ca_struct_example.json_to_arrow > --json=/home/travis/build/apache/arrow/integration/data/struct_example.json > --mode=VALIDATE > With output: > -- > /home/travis/build/apache/arrow/cpp/src/arrow/ipc/reader.cc:581 Check failed: > (message->body_length()) == (block.body_length) > {code} > I'm not sure what's wrong. I'll remove the assertions for now -- This message was sent by Atlassian JIRA (v7.6.3#76005)
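The framing arithmetic described in the comment above can be checked with a small sketch. All sizes here are hypothetical; only the relationships (8-byte alignment, body_length as the sum of aligned buffer sizes, total = metadata_length + body_length) follow the comment.

```typescript
// Illustrative framing arithmetic (hypothetical sizes). Per the comment:
// body_length is the sum of the aligned buffer sizes -- not the size of the
// whole serialized message -- and the total message size is
// metadata_length + body_length.
const align8 = (n: number): number => n + ((8 - (n % 8)) % 8);

const bufferSizes = [3, 10, 5];        // hypothetical buffer byte lengths
const bodyLength = bufferSizes
  .map(align8)                         // 8 + 16 + 8
  .reduce((sum, n) => sum + n, 0);     // = 32

const metadataLength = align8(4 + 18); // 4-byte length prefix + metadata, padded
const totalMessageSize = metadataLength + bodyLength;
```

The bug the comment identifies is using `totalMessageSize` where the FileBlock's `body_length` should have been `bodyLength` alone.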
[jira] [Comment Edited] (ARROW-3256) [JS] File footer and message metadata is inconsistent
[ https://issues.apache.org/jira/browse/ARROW-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16626629#comment-16626629 ] Paul Taylor edited comment on ARROW-3256 at 9/25/18 12:49 AM: -- [~wesmckinn] -The current behavior is metadata length + body length, aligned to the next-highest multiple of 8. This includes the 4 bytes used to store the metadata length. Do you recall the difference between the expected total size and the total size JS is creating? If so I can work backwards from that to figure out what to add or subtract.- Edit: I misunderstood the original bug -- I now understand you mean the body_length of the Message that the JS writer creates is different from the body_length of the FileBlock it lives in. I thought you meant there was a difference between JS and CPP. I can take a look soon. was (Author: paul.e.taylor): [~wesmckinn] ~The current behavior is metadata length + body length, aligned to the next-highest multiple of 8. This includes the 4 bytes used to store the metadata length. Do you recall the difference between the expected total size and the total size JS is creating? If so I can work backwards from that to figure out what to add or subtract.~ Edit: I misunderstood the original bug -- I now understand you mean the body_length of the Message that the JS writer creates is different from the body_length of the FileBlock it lives in. I thought you meant there was a difference between JS and CPP. I can take a look soon. 
> [JS] File footer and message metadata is inconsistent > - > > Key: ARROW-3256 > URL: https://issues.apache.org/jira/browse/ARROW-3256 > Project: Apache Arrow > Issue Type: Bug > Components: JavaScript >Reporter: Wes McKinney >Priority: Major > Fix For: JS-0.4.0 > > > I added some assertions to the C++ library and found that the body length in > the file footer and the IPC message were different > {code} > ## > JS producing, C++ consuming > ## > == > Testing file > /home/travis/build/apache/arrow/integration/data/struct_example.json > == > -- Creating binary inputs > node --no-warnings /home/travis/build/apache/arrow/js/bin/json-to-arrow.js -a > /tmp/tmplbm3vbwz/3d2269c960f148b6b94e5f881c0bf9ca_struct_example.json_to_arrow > -j /home/travis/build/apache/arrow/integration/data/struct_example.json > -- Validating file > /home/travis/build/apache/arrow/cpp-build/debug/json-integration-test > --integration > --arrow=/tmp/tmplbm3vbwz/3d2269c960f148b6b94e5f881c0bf9ca_struct_example.json_to_arrow > --json=/home/travis/build/apache/arrow/integration/data/struct_example.json > --mode=VALIDATE > Command failed: > /home/travis/build/apache/arrow/cpp-build/debug/json-integration-test > --integration > --arrow=/tmp/tmplbm3vbwz/3d2269c960f148b6b94e5f881c0bf9ca_struct_example.json_to_arrow > --json=/home/travis/build/apache/arrow/integration/data/struct_example.json > --mode=VALIDATE > With output: > -- > /home/travis/build/apache/arrow/cpp/src/arrow/ipc/reader.cc:581 Check failed: > (message->body_length()) == (block.body_length) > {code} > I'm not sure what's wrong. I'll remove the assertions for now -- This message was sent by Atlassian JIRA (v7.6.3#76005)