[jira] [Resolved] (ARROW-18247) [JS] RangeError in Vector.toArray()

2022-12-11 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-18247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz resolved ARROW-18247.

Fix Version/s: 11.0.0
   Resolution: Fixed

Issue resolved by pull request 14587
https://github.com/apache/arrow/pull/14587

> [JS] RangeError in Vector.toArray()
> ---
>
> Key: ARROW-18247
> URL: https://issues.apache.org/jira/browse/ARROW-18247
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Affects Versions: 9.0.0, 10.0.0
>Reporter: thomas sarlandie
>Assignee: thomas sarlandie
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 11.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> `vector.toArray()` throws a `RangeError: offset is out of bounds` exception 
> when it is called on a vector created with two or more `arrow.Data` objects 
> and at least one has been padded for memory alignment.
>  
> See reproduction here: [https://observablehq.com/d/14488c116b338560]
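
A minimal sketch of how this kind of failure can arise, assuming each chunk exposes a logical `length` and a backing typed array that may be longer because of alignment padding. The `chunks` objects and both helper functions are hypothetical stand-ins; this is not the Arrow JS implementation, nor the change made in pull request 14587.

```javascript
// Hypothetical stand-ins for arrow.Data chunks: `length` is the logical
// element count, `values` is the backing buffer (possibly padded).
const chunks = [
  { length: 2, values: Int32Array.of(1, 2) },
  { length: 3, values: Int32Array.of(3, 4, 5, 0) }, // one padding slot
];

// Copies whole backing buffers, padding included.
function naiveConcat(chunks) {
  const total = chunks.reduce((n, c) => n + c.length, 0);
  const out = new Int32Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(c.values, offset); // throws RangeError if padding overflows `out`
    offset += c.length;
  }
  return out;
}

// Trims each buffer to its logical length before copying.
function safeConcat(chunks) {
  const total = chunks.reduce((n, c) => n + c.length, 0);
  const out = new Int32Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(c.values.subarray(0, c.length), offset);
    offset += c.length;
  }
  return out;
}
```

In this sketch the padded buffer of the last chunk does not fit into the remaining space of the output, so `naiveConcat` throws, while `safeConcat` copies only the logical elements.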



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17608) [JS] Implement C Data Interface

2022-10-24 Thread Dominik Moritz (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17623263#comment-17623263
 ] 

Dominik Moritz commented on ARROW-17608:


That's fantastic. I very much look forward to continued discussions around 
this. Paul and I hang out in the Apache Slack if you want to discuss ideas 
synchronously. 

> [JS] Implement C Data Interface
> ---
>
> Key: ARROW-17608
> URL: https://issues.apache.org/jira/browse/ARROW-17608
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Reporter: Kyle Barron
>Priority: Major
>
> I've recently been working on an implementation of the C Data Interface for 
> Arrow JS, the idea being that Arrow JS can read memory from WebAssembly this 
> way without a copy ([blog 
> post|https://observablehq.com/@kylebarron/zero-copy-apache-arrow-with-webassembly],
>  [repo|https://github.com/kylebarron/arrow-js-ffi/pull/11]). Dominik 
> [suggested|https://twitter.com/domoritz/status/1562670919469842432?s=20=Ts8HQe_fzgRmecUP1Qrhrw]
>  starting a discussion about potentially adding this into Arrow JS.
> My implementation is still WIP, but I figure it's not too early to start a 
> discussion. A couple of notes:
> - I'm focused only on reading FFI memory, so I only have parsing code. I 
> figure writing doesn't really make sense in JS since Wasm can't access 
> arbitrary JS memory.
> - In order to generate FFI memory in the tests, I'm using a small Rust module 
> to convert from an IPC table. If we didn't want to add a Rust build step in 
> the tests, that module could be published to NPM.
> Thoughts?
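
A minimal sketch of the zero-copy idea, assuming the data is a run of 32-bit integers at a known byte offset in Wasm linear memory. The `viewInt32` helper and the offset are hypothetical; this does not parse the C Data Interface structs and is not the code from the linked repository.

```javascript
// One 64 KiB page of linear memory, standing in for a Wasm module's memory.
const memory = new WebAssembly.Memory({ initial: 1 });

// Hypothetical helper: view `length` Int32 values at byte offset `ptr`.
// No bytes are copied; the typed array shares the memory's buffer.
function viewInt32(memory, ptr, length) {
  return new Int32Array(memory.buffer, ptr, length);
}

// Pretend the Wasm side wrote four values at byte offset 16.
const ptr = 16;
new Int32Array(memory.buffer, ptr, 4).set([10, 20, 30, 40]);

const view = viewInt32(memory, ptr, 4);

// Because the view shares the buffer, a later write shows through it.
new Int32Array(memory.buffer)[ptr / 4] = 99;
```

One consequence of sharing the buffer is that a view like this becomes detached if the memory is later grown, so a reader would need to recreate its views after any `memory.grow()` call.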





[jira] [Created] (ARROW-17979) [JS]: Update status docs

2022-10-10 Thread Dominik Moritz (Jira)
Dominik Moritz created ARROW-17979:
--

 Summary: [JS]: Update status docs
 Key: ARROW-17979
 URL: https://issues.apache.org/jira/browse/ARROW-17979
 Project: Apache Arrow
  Issue Type: Task
  Components: Documentation, JavaScript
Reporter: Dominik Moritz
Assignee: Dominik Moritz


Arrow JS supports nulls, sparse unions, and dense unions.





[jira] [Created] (ARROW-17903) [JS] Update dependencies

2022-09-30 Thread Dominik Moritz (Jira)
Dominik Moritz created ARROW-17903:
--

 Summary: [JS] Update dependencies
 Key: ARROW-17903
 URL: https://issues.apache.org/jira/browse/ARROW-17903
 Project: Apache Arrow
  Issue Type: Task
  Components: JavaScript
Affects Versions: 10.0.0
Reporter: Dominik Moritz
Assignee: Dominik Moritz
 Fix For: 10.0.0








[jira] [Resolved] (ARROW-17321) [JS] Update dependencies

2022-08-05 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz resolved ARROW-17321.

Resolution: Fixed

Issue resolved by pull request 13758
[https://github.com/apache/arrow/pull/13758]

> [JS] Update dependencies
> 
>
> Key: ARROW-17321
> URL: https://issues.apache.org/jira/browse/ARROW-17321
> Project: Apache Arrow
>  Issue Type: Task
>  Components: JavaScript
>Affects Versions: 9.0.0
>Reporter: Dominik Moritz
>Assignee: Dominik Moritz
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 10.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-17321) [JS] Update dependencies

2022-08-05 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz updated ARROW-17321:
---
Summary: [JS] Update dependencies  (was: Update dependencies)

> [JS] Update dependencies
> 
>
> Key: ARROW-17321
> URL: https://issues.apache.org/jira/browse/ARROW-17321
> Project: Apache Arrow
>  Issue Type: Task
>  Components: JavaScript
>Affects Versions: 9.0.0
>Reporter: Dominik Moritz
>Assignee: Dominik Moritz
>Priority: Minor
> Fix For: 10.0.0
>
>






[jira] [Created] (ARROW-17321) Update dependencies

2022-08-05 Thread Dominik Moritz (Jira)
Dominik Moritz created ARROW-17321:
--

 Summary: Update dependencies
 Key: ARROW-17321
 URL: https://issues.apache.org/jira/browse/ARROW-17321
 Project: Apache Arrow
  Issue Type: Task
  Components: JavaScript
Affects Versions: 9.0.0
Reporter: Dominik Moritz
Assignee: Dominik Moritz
 Fix For: 10.0.0








[jira] [Resolved] (ARROW-16744) [JavaScript] yarn perf fails with ReferenceError: exports is not defined in ES module scope

2022-06-03 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz resolved ARROW-16744.

Fix Version/s: 9.0.0
   Resolution: Fixed

Issue resolved by pull request 13305
[https://github.com/apache/arrow/pull/13305]

> [JavaScript] yarn perf fails with ReferenceError: exports is not defined in 
> ES module scope
> ---
>
> Key: ARROW-16744
> URL: https://issues.apache.org/jira/browse/ARROW-16744
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Affects Versions: 9.0.0
>Reporter: Elena Henderson
>Assignee: Elena Henderson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> `yarn perf` fails with error below starting with 
> [https://github.com/apache/arrow/commit/50fbfa14aa612d638d54925ce9b5e7a9dd4b]
>  
> {code:java}
> cd arrow
> git checkout -f 50fbfa14aa612d638d54925ce9b5e7a9dd4b
> cd js && yarn && yarn perf  {code}
> {code:java}
> (arrow-js) elena@e ~ % cd arrow
> git checkout -f 50fbfa14aa612d638d54925ce9b5e7a9dd4b
> cd js && yarn && yarn perf
> HEAD is now at 50fbfa14aa ARROW-16693: [JS] Upgrade to TS 4.7
> yarn install v1.22.17
> warning package-lock.json found. Your project contains lock files generated by tools other than Yarn. It is advised not to mix package managers in order to avoid resolution inconsistencies caused by unsynchronized lock files. To clear this warning, remove package-lock.json.
> [1/5]   Validating package.json...
> [2/5]   Resolving packages...
> success Already up-to-date.
> ✨  Done in 0.46s.
> yarn run v1.22.17
> $ node --loader ts-node/esm/transpile-only ./perf/index.ts
> (node:27404) ExperimentalWarning: --experimental-loader is an experimental feature. This feature could change at any time
> (Use `node --trace-warnings ...` to show where the warning was created)
> ReferenceError: exports is not defined in ES module scope
>     at file:///Users/elena/arrow/js/perf/index.ts:18:23
>     at ModuleJob.run (internal/modules/esm/module_job.js:183:25)
>     at async Loader.import (internal/modules/esm/loader.js:178:24)
>     at async Object.loadESM (internal/process/esm_loader.js:68:5)
>     at async handleMainPromise (internal/modules/run_main.js:59:12)
> error Command failed with exit code 1.
> info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
>  
> {code}
> cc [~domoritz] [~jonkeane] 





[jira] [Resolved] (ARROW-16704) tableFromIPC should handle AsyncRecordBatchReader inputs

2022-06-01 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz resolved ARROW-16704.

Fix Version/s: 9.0.0
   Resolution: Fixed

Issue resolved by pull request 13278
[https://github.com/apache/arrow/pull/13278]

> tableFromIPC should handle AsyncRecordBatchReader inputs
> 
>
> Key: ARROW-16704
> URL: https://issues.apache.org/jira/browse/ARROW-16704
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Affects Versions: 8.0.0
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> To match the prior `Table.from()` method, the `tableFromIPC()` method should 
> handle the case where the input is an async RecordBatchReader.
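
A minimal sketch of the pattern described here, assuming the reader is either a sync iterable or an async iterable of record batches. The `collectBatches` helper and the two toy readers are hypothetical and are not the Arrow JS `tableFromIPC()` implementation.

```javascript
// Hypothetical helper: gather record batches from a sync or async source.
// Returns an array for sync input and a promise of an array for async input.
function collectBatches(source) {
  if (typeof source[Symbol.asyncIterator] === 'function') {
    return (async () => {
      const batches = [];
      for await (const batch of source) batches.push(batch);
      return batches;
    })();
  }
  return Array.from(source);
}

// Stand-ins for a sync reader and an async reader.
function* syncReader() { yield 'batch0'; yield 'batch1'; }
async function* asyncReader() { yield 'batch0'; yield 'batch1'; }

const fromSync = collectBatches(syncReader());
const fromAsync = collectBatches(asyncReader());
```

The caller can tell the two cases apart by checking whether the result is a promise, which keeps the sync path free of any async overhead.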





[jira] [Resolved] (ARROW-16371) [JS] Empty table should provide an empty iterator

2022-06-01 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz resolved ARROW-16371.

Fix Version/s: 9.0.0
   Resolution: Fixed

Issue resolved by pull request 13287
[https://github.com/apache/arrow/pull/13287]

> [JS] Empty table should provide an empty iterator
> -
>
> Key: ARROW-16371
> URL: https://issues.apache.org/jira/browse/ARROW-16371
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Teodor Kostov
>Assignee: Paul Taylor
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When a table is created without any data and an iterator is requested, I would 
> expect to get an empty iterator that just returns that it's done.
> Expected result:
> {code:json}
> {"value": null, "done": true}
> {code}
> However, the code fails in {{strideForType()}} with {{Uncaught TypeError: 
> type2 is undefined}}.
> {code:javascript}
> schema = new arrow.Schema(dataType.children)
> data = new arrow.Table(this.schema)
> const iter = data[Symbol.iterator]()
> {code}
> It seems that the [table just creates a new vector with its 
> data|https://github.com/apache/arrow/blob/e9481532e93e4f29a1c2c322e00f268d6cd9f534/js/src/table.ts#L227]
>  and then the [{{strideForType}} method 
> fails|https://github.com/apache/arrow/blob/e9481532e93e4f29a1c2c322e00f268d6cd9f534/js/src/type.ts#L652].
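
A minimal sketch of the expected behavior, assuming rows are produced by iterating over a list of batches. The `iterateRows` generator is a hypothetical stand-in for the table iterator, with plain arrays in place of record batches.

```javascript
// Hypothetical row iterator over a list of batches (arrays of rows here).
function* iterateRows(batches) {
  for (const batch of batches) {
    yield* batch;
  }
}

// With no batches, the first call to next() reports completion.
const emptyIter = iterateRows([]);
const first = emptyIter.next();
```

Note that a JavaScript generator reports `value: undefined` rather than `null` when it is done, so the result here differs slightly from the JSON shown in the report while still signalling `done: true`.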





[jira] [Closed] (ARROW-14775) [JS] Embrace ESM in main arrow package

2022-06-01 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz closed ARROW-14775.
--

> [JS] Embrace ESM in main arrow package
> --
>
> Key: ARROW-14775
> URL: https://issues.apache.org/jira/browse/ARROW-14775
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Dominik Moritz
>Assignee: Dominik Moritz
>Priority: Major
> Fix For: 8.0.0
>
>
> Instead of shipping both esm and commonjs, we could embrace esm as many other 
> js packages do now and thereby clean up our bundles as well. 





[jira] [Resolved] (ARROW-14775) [JS] Embrace ESM in main arrow package

2022-06-01 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz resolved ARROW-14775.

Fix Version/s: 8.0.0
   (was: 9.0.0)
   Resolution: Fixed

> [JS] Embrace ESM in main arrow package
> --
>
> Key: ARROW-14775
> URL: https://issues.apache.org/jira/browse/ARROW-14775
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Dominik Moritz
>Assignee: Dominik Moritz
>Priority: Major
> Fix For: 8.0.0
>
>
> Instead of shipping both esm and commonjs, we could embrace esm as many other 
> js packages do now and thereby clean up our bundles as well. 





[jira] [Resolved] (ARROW-16693) [JavaScript] Upgrade to TS 4.7

2022-05-31 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz resolved ARROW-16693.

Fix Version/s: 9.0.0
   Resolution: Fixed

Issue resolved by pull request 13273
[https://github.com/apache/arrow/pull/13273]

> [JavaScript] Upgrade to TS 4.7
> --
>
> Key: ARROW-16693
> URL: https://issues.apache.org/jira/browse/ARROW-16693
> Project: Apache Arrow
>  Issue Type: Task
>  Components: JavaScript
>Reporter: Dominik Moritz
>Assignee: Dominik Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Created] (ARROW-16693) [JavaScript] Upgrade to TS 4.7

2022-05-31 Thread Dominik Moritz (Jira)
Dominik Moritz created ARROW-16693:
--

 Summary: [JavaScript] Upgrade to TS 4.7
 Key: ARROW-16693
 URL: https://issues.apache.org/jira/browse/ARROW-16693
 Project: Apache Arrow
  Issue Type: Task
  Components: JavaScript
Reporter: Dominik Moritz
Assignee: Dominik Moritz








[jira] [Created] (ARROW-16256) Document what spec version is supported.

2022-04-20 Thread Dominik Moritz (Jira)
Dominik Moritz created ARROW-16256:
--

 Summary: Document what spec version is supported. 
 Key: ARROW-16256
 URL: https://issues.apache.org/jira/browse/ARROW-16256
 Project: Apache Arrow
  Issue Type: Task
  Components: Documentation
Reporter: Dominik Moritz
Assignee: Dominik Moritz
 Fix For: 8.0.0








[jira] [Commented] (ARROW-8674) [JS] Implement IPC RecordBatch body buffer compression from ARROW-300

2022-04-18 Thread Dominik Moritz (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523988#comment-17523988
 ] 

Dominik Moritz commented on ARROW-8674:
---

https://github.com/manzt/numcodecs.js looks interesting as well. It uses an 
inlined wasm build of lz4. 

> [JS] Implement IPC RecordBatch body buffer compression from ARROW-300
> -
>
> Key: ARROW-8674
> URL: https://issues.apache.org/jira/browse/ARROW-8674
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: JavaScript
>Reporter: Wes McKinney
>Priority: Major
>
> This may not be a hard requirement for JS because this would require pulling 
> in implementations of LZ4 and ZSTD which not all users may want





[jira] [Commented] (ARROW-8674) [JS] Implement IPC RecordBatch body buffer compression from ARROW-300

2022-04-18 Thread Dominik Moritz (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523985#comment-17523985
 ] 

Dominik Moritz commented on ARROW-8674:
---

> For one, gzip compression is much slower than LZ4 or ZSTD compression.

Maybe. Let's make sure to compare native gzip compression that a web server 
uses with js lz4/zstd compression.

> I think it would be possible to force the `compress` and `decompress` 
> functions in the plugin system to be synchronous. That would just force the 
> user to finish any async initialization before trying to read/write a file, 
> since wasm bundles can't be instantiated synchronously I think.

It would unfortunately also preclude people from putting decompression into a 
worker. Maybe we can make the relevant IPC methods return promises when 
the compression/decompression method is async (returns a promise).

> None of the ZSTD libraries I came across were pure JS. The only LZ4 one that 
> was pure JS was lz4js.

We could consider inlining the wasm code with base64 if it's tiny, but I suspect 
it will not be. Worth considering, though. 

Anyway, I think it makes sense to work on this and send a pull request. We 
should definitely have a way to pass in/register compression algorithms. Then 
let's look into whether we want to bundle any algorithms. Let's start with lz4 
and try a few libraries (e.g. https://github.com/gorhill/lz4-wasm, 
https://github.com/Benzinga/lz4js, https://github.com/pierrec/node-lz4). If 
they are small enough, I would consider including a default lz4 implementation. 
Sounds good?
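
A minimal sketch of the registration idea together with the "return a promise only when the codec is async" behavior suggested above. The names `registerCodec` and `decompressThen` and the identity codecs are hypothetical; nothing here is an existing Arrow JS API.

```javascript
// Hypothetical registry keyed by codec name.
const codecs = new Map();

function registerCodec(name, codec) {
  codecs.set(name, codec);
}

// Runs `next` on the decompressed bytes. Returns a plain value when the
// codec is synchronous and a promise when the codec returns a promise.
function decompressThen(name, bytes, next) {
  const codec = codecs.get(name);
  if (!codec) throw new Error(`no codec registered for "${name}"`);
  const result = codec.decompress(bytes);
  return result instanceof Promise ? result.then(next) : next(result);
}

// Identity "codecs" used only for demonstration.
registerCodec('sync-identity', { decompress: (b) => b });
registerCodec('async-identity', { decompress: async (b) => b });

const body = Uint8Array.of(1, 2, 3);
const syncLength = decompressThen('sync-identity', body, (b) => b.length);
const asyncLength = decompressThen('async-identity', body, (b) => b.length);
```

With this shape, a codec that runs in a worker can simply return a promise, and callers that only register synchronous codecs never have to await anything.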

> [JS] Implement IPC RecordBatch body buffer compression from ARROW-300
> -
>
> Key: ARROW-8674
> URL: https://issues.apache.org/jira/browse/ARROW-8674
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: JavaScript
>Reporter: Wes McKinney
>Priority: Major
>
> This may not be a hard requirement for JS because this would require pulling 
> in implementations of LZ4 and ZSTD which not all users may want





[jira] [Closed] (ARROW-16210) [JS] Implement tableFromJSON

2022-04-18 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz closed ARROW-16210.
--

> [JS] Implement tableFromJSON
> 
>
> Key: ARROW-16210
> URL: https://issues.apache.org/jira/browse/ARROW-16210
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Dominik Moritz
>Assignee: Dominik Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 8.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-3523) [JS] Assign dictionary IDs in IPC writer rather than on creation

2022-04-18 Thread Dominik Moritz (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-3523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523975#comment-17523975
 ] 

Dominik Moritz commented on ARROW-3523:
---

Is this still an issue?

> [JS] Assign dictionary IDs in IPC writer rather than on creation
> 
>
> Key: ARROW-3523
> URL: https://issues.apache.org/jira/browse/ARROW-3523
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Brian Hulette
>Priority: Major
>
>  Currently the JS implementation relies on the user assigning IDs for 
> dictionaries that they create. We should do something like the C++ 
> implementation, which uses a dictionary id memo to assign and retrieve 
> dictionary ids in the IPC writer 
> (https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/metadata-internal.cc#L495).
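
A minimal sketch of what a dictionary id memo could look like on the JS side, assuming ids are assigned the first time the writer encounters a dictionary and looked up by object identity afterwards. The `DictionaryIdMemo` class is hypothetical and is not a port of the C++ code linked above.

```javascript
// Hypothetical memo: assigns ids to dictionaries the first time the writer
// sees them, keyed by object identity.
class DictionaryIdMemo {
  constructor() {
    this.ids = new Map();
    this.nextId = 0;
  }

  // Returns the existing id for `dictionary`, or assigns the next free one.
  getOrAssignId(dictionary) {
    let id = this.ids.get(dictionary);
    if (id === undefined) {
      id = this.nextId++;
      this.ids.set(dictionary, id);
    }
    return id;
  }
}

const memo = new DictionaryIdMemo();
const colors = ['red', 'green'];
const sizes = ['S', 'M', 'L'];
const assigned = [
  memo.getOrAssignId(colors),
  memo.getOrAssignId(sizes),
  memo.getOrAssignId(colors), // same dictionary, same id
];
```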





[jira] [Closed] (ARROW-4519) [JS] Publish JS API Docs for v0.4.0

2022-04-18 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz closed ARROW-4519.
-
Resolution: Fixed

Closing since we are preparing version 8

> [JS] Publish JS API Docs for v0.4.0
> ---
>
> Key: ARROW-4519
> URL: https://issues.apache.org/jira/browse/ARROW-4519
> Project: Apache Arrow
>  Issue Type: Task
>  Components: JavaScript
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>






[jira] [Commented] (ARROW-4551) [JS] Investigate using Symbols to access Row columns by index

2022-04-18 Thread Dominik Moritz (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523973#comment-17523973
 ] 

Dominik Moritz commented on ARROW-4551:
---

Is this still an issue and can you explain what the benefits would be? 

> [JS] Investigate using Symbols to access Row columns by index
> -
>
> Key: ARROW-4551
> URL: https://issues.apache.org/jira/browse/ARROW-4551
> Project: Apache Arrow
>  Issue Type: Task
>  Components: JavaScript
>Reporter: Brian Hulette
>Priority: Minor
>
> Can we use row[Symbol.for(0)] instead of row[0] in order to avoid collisions? 
> What would the performance impact be?
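
A minimal sketch of the collision the question refers to, using a plain object in place of a Row. It only illustrates the key semantics and does not measure the performance impact asked about above.

```javascript
// A plain object standing in for a Row.
const row = {};

// A column whose name happens to be "0" occupies the same key as index 0.
row['0'] = 'value of a column named "0"';

// Symbol.for() coerces its key to a string, so Symbol.for(0) and
// Symbol.for('0') are the same registered symbol.
const first = Symbol.for(0);
row[first] = 'value of the first column';
```

Here `row[0]` and `row[first]` are separate properties, so positional access through the symbol no longer clashes with a column named "0".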





[jira] [Commented] (ARROW-11706) [JS] Better BigInt compatibility check

2022-04-18 Thread Dominik Moritz (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-11706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523972#comment-17523972
 ] 

Dominik Moritz commented on ARROW-11706:


I think this can be closed, no? 

> [JS] Better BigInt compatibility check
> --
>
> Key: ARROW-11706
> URL: https://issues.apache.org/jira/browse/ARROW-11706
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Diana Clarke
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> See: https://github.com/apache/arrow/pull/9110
> Check for whether {{BigInt64ArrayAvailable}} and {{BigUint64ArrayAvailable}} 
> are available, rather than just {{BigIntAvailable}}. Recent versions of 
> JavaScriptCore/WebKit in Safari support {{BigInt}} but do not support 
> {{BigInt64Array}}, and so anything that relies on {{BigInt64Array}} will fail 
> despite {{BigIntAvailable}} being true.
> The manifestation of this issue can be seen when trying to run the following 
> within Safari on a table that contains bigints:
> {code:java}
> RecordBatchJSONWriter.writeAll(table).toString(true)
> message: "BigUint64Array is not available in this environment"
>   BigUint64ArrayUnavailableError
>   BigUint64ArrayUnavailable
>   bignumToString
>   bigNumsToStrings
>   generatorResume@[native code]
>   performIteration@[native code]
>   visitInt
>   visit
>   map@[native code]
>   recordBatchToJSON
>   close
>   finish
>   global code
> {code}
> See also: https://bugs.webkit.org/show_bug.cgi?id=190800
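
A minimal sketch of the finer-grained check described above, assuming plain `typeof` feature detection. The flag names follow the issue text; the `detectBigIntSupport` helper and its `env` parameter are hypothetical and exist only so that a runtime without the 64-bit typed arrays can be simulated.

```javascript
// Hypothetical helper: `env` defaults to the real global object, but a
// plain object can be passed to simulate another runtime.
function detectBigIntSupport(env = globalThis) {
  return {
    BigIntAvailable: typeof env.BigInt === 'function',
    BigInt64ArrayAvailable: typeof env.BigInt64Array === 'function',
    BigUint64ArrayAvailable: typeof env.BigUint64Array === 'function',
  };
}

// Simulates a runtime that has BigInt but lacks the 64-bit typed arrays.
const partial = detectBigIntSupport({ BigInt });
```

Code that needs `BigInt64Array` or `BigUint64Array` would then branch on the matching flag instead of on `BigIntAvailable` alone.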





[jira] [Commented] (ARROW-15651) [JavaScript] Structs incorrectly initialise null values

2022-04-18 Thread Dominik Moritz (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523971#comment-17523971
 ] 

Dominik Moritz commented on ARROW-15651:


Since https://issues.apache.org/jira/browse/ARROW-15705 is fixed, can we close 
this? 

> [JavaScript] Structs incorrectly initialise null values
> ---
>
> Key: ARROW-15651
> URL: https://issues.apache.org/jira/browse/ARROW-15651
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Affects Versions: 7.0.0
>Reporter: Alfred Mountfield
>Priority: Major
>
> Nullable StructArrays created with FixedSizeList fields seem to incorrectly 
> initialise.
> I've tried to create them using both the `Builder` and `makeVector` methods 
> and I believe it breaks the specification.
> I believe that the underlying arrays for the fields of a struct should be the 
> length of the struct array.
> However the `nullCount`s, `numChildren`s, and other methods all return 
> different numbers. (And this is causing a problem when we try to read the 
> same memory within Rust as the lengths and offsets differ)
> Specifically this:
> {code:javascript}
> let list_field = new Field('n1', new FixedSizeList(2, Float64), true);
> let struct_field = new Field('foo', new Struct([list_field]), true);
> let builder = new Builder({
>   type: struct_field.type,
>   nullValues: [null, undefined],
> });
> builder.append(null);
> console.log('Builder:' + JSON.stringify(builder));
> console.log('numChildren: ' + builder.numChildren);
> console.log('nullCount: ' + builder.nullCount);
> console.log('length: ' + builder.length);
> let vec1 = builder.toVector();
> console.log('Vector from Builder:' + vec1);
> console.log('numChildren: ' + vec1.numChildren);
> console.log('nullCount: ' + vec1.nullCount);
> console.log('length: ' + vec1.length);
> let vec2 = makeVector({
>   data: [null],
>   type: struct_field.type,
>   nullable: true,
> });
> console.log('Vector from makeVector:' + JSON.stringify(vec2));
> console.log('numChildren: ' + vec2.numChildren);
> console.log('nullCount: ' + vec2.nullCount);
> console.log('length: ' + vec2.length);
> {code}
> Results in (I've removed some fields for brevity)
> {code:javascript}
> Builder: 
> {"length":1,"finished":false,"type":{"children":[{"name":"n1","type":{"listSize":2,"children":[null]},"nullable":true,"metadata":{}}]},"children":[],"nullValues":[null,null],"stride":1,"_nulls":{"buffer":
>  ... ,"stride":0.125,"BYTES_PER_ELEMENT":1,"length":1,"numValid":0}}
> numChildren: 0
> nullCount: 1
> length: 1
> Vector from Builder:[]
> numChildren: 1
> nullCount: 1
> length: 1
> Vector from makeVector:[]
> numChildren: 1
> nullCount: 0
> length: 0{code}
> I tried to test this within stackblitz, the Project source code [should be 
> available here|https://stackblitz.com/edit/node-kpgovc?file=index.js]





[jira] [Created] (ARROW-16222) [JS] Allow appending null on sparse union and map builders

2022-04-18 Thread Dominik Moritz (Jira)
Dominik Moritz created ARROW-16222:
--

 Summary: [JS] Allow appending null on sparse union and map builders
 Key: ARROW-16222
 URL: https://issues.apache.org/jira/browse/ARROW-16222
 Project: Apache Arrow
  Issue Type: Bug
  Components: JavaScript
Reporter: Dominik Moritz


See https://github.com/apache/arrow/pull/12451 and in particular 
https://github.com/apache/arrow/pull/12451#pullrequestreview-887789954. 





[jira] [Resolved] (ARROW-16210) [JS] Implement tableFromJSON

2022-04-18 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz resolved ARROW-16210.

Fix Version/s: 8.0.0
   Resolution: Fixed

Issue resolved by pull request 12908
[https://github.com/apache/arrow/pull/12908]

> [JS] Implement tableFromJSON
> 
>
> Key: ARROW-16210
> URL: https://issues.apache.org/jira/browse/ARROW-16210
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Dominik Moritz
>Assignee: Dominik Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 8.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (ARROW-16209) [JS] Support setting values on Tables

2022-04-18 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz resolved ARROW-16209.

Resolution: Fixed

Issue resolved by pull request 12907
[https://github.com/apache/arrow/pull/12907]

> [JS] Support setting values on Tables
> -
>
> Key: ARROW-16209
> URL: https://issues.apache.org/jira/browse/ARROW-16209
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Affects Versions: 7.0.0
>Reporter: Dominik Moritz
>Assignee: Dominik Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 8.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> See https://github.com/vega/vega-lite/issues/8105





[jira] [Resolved] (ARROW-16208) [JS] Upgrade deps

2022-04-18 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz resolved ARROW-16208.

Resolution: Fixed

Issue resolved by pull request 12905
[https://github.com/apache/arrow/pull/12905]

> [JS] Upgrade deps
> -
>
> Key: ARROW-16208
> URL: https://issues.apache.org/jira/browse/ARROW-16208
> Project: Apache Arrow
>  Issue Type: Task
>  Components: JavaScript
>Reporter: Dominik Moritz
>Assignee: Dominik Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 8.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (ARROW-15705) [JavaScript] Structs don't append nulls properly

2022-04-18 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-15705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz resolved ARROW-15705.

Fix Version/s: 8.0.0
   Resolution: Fixed

Issue resolved by pull request 12451
[https://github.com/apache/arrow/pull/12451]

> [JavaScript] Structs don't append nulls properly
> 
>
> Key: ARROW-15705
> URL: https://issues.apache.org/jira/browse/ARROW-15705
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Affects Versions: 7.0.0
>Reporter: Alfred Mountfield
>Priority: Major
>  Labels: pull-request-available
> Fix For: 8.0.0
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> If you have a StructBuilder, then the `set` method (which is inherited from 
> `Builder`) on it will modify the null-bitmap and then return directly due to 
> this snippet:
> {code:javascript}
> public set(index: number, value: T['TValue'] | TNull) {
>     if (this.setValid(index, this.isValid(value))) {
>         this.setValue(index, value);
>     }
>     return this;
> }
> {code}
>  
> I believe this breaks the spec, as it results in the children arrays not 
> having their lengths and null-counts increased. (At least the Rust 
> implementation expects child arrays to be the same length as their parent 
> struct array, and the spec seems to imply that's a requirement)
> I think there's an easy fix which would be to call `this.setValue` for 
> `StructBuilder`s regardless of `this.isValid(value)`
> Related to https://issues.apache.org/jira/browse/ARROW-15651
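
A toy model of the behavior described above, assuming a struct builder that tracks validity itself and forwards values to one child builder per field. Both classes are hypothetical simplifications, not the Arrow JS `Builder` or `StructBuilder`, and the `alwaysSetValue` switch only mimics the proposed change.

```javascript
// Toy child builder: stores values in a plain array.
class ToyChildBuilder {
  constructor() { this.values = []; }
  set(index, value) { this.values[index] = value; }
  get length() { return this.values.length; }
}

// Toy struct builder. With `alwaysSetValue` false it mirrors the reported
// behavior; with true it mirrors the proposed change.
class ToyStructBuilder {
  constructor(fieldNames, alwaysSetValue) {
    this.validity = [];
    this.children = new Map(fieldNames.map((n) => [n, new ToyChildBuilder()]));
    this.alwaysSetValue = alwaysSetValue;
  }
  set(index, value) {
    const valid = value !== null && value !== undefined;
    this.validity[index] = valid;
    // Children are only touched for valid values unless alwaysSetValue is on.
    if (valid || this.alwaysSetValue) {
      for (const [name, child] of this.children) {
        child.set(index, valid ? value[name] : null);
      }
    }
    return this;
  }
  get length() { return this.validity.length; }
}

const reported = new ToyStructBuilder(['x'], false).set(0, { x: 1 }).set(1, null);
const proposed = new ToyStructBuilder(['x'], true).set(0, { x: 1 }).set(1, null);
```

In the `reported` case the struct has two slots while its child has one, which is the length mismatch the issue describes; in the `proposed` case both have two.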





[jira] [Closed] (ARROW-16153) [JS] Consider implementing a tableFromArray

2022-04-17 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz closed ARROW-16153.
--
Resolution: Fixed

> [JS] Consider implementing a tableFromArray
> ---
>
> Key: ARROW-16153
> URL: https://issues.apache.org/jira/browse/ARROW-16153
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Dominik Moritz
>Assignee: Dominik Moritz
>Priority: Major
>
> The idea here is to implement a function that creates a table from an array 
> of objects using the struct builder. 
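
A minimal sketch of the row-to-column pivot that such a function implies, using plain arrays in place of Arrow vectors. The `columnsFromObjects` helper is hypothetical; it does not use the struct builder mentioned above and is not the function that was eventually implemented.

```javascript
// Hypothetical helper: pivot an array of objects into named columns.
// Missing keys become null so every column has one entry per row.
function columnsFromObjects(rows) {
  const names = new Set();
  for (const row of rows) {
    for (const key of Object.keys(row)) names.add(key);
  }
  const columns = {};
  for (const name of names) {
    columns[name] = rows.map((row) => (name in row ? row[name] : null));
  }
  return columns;
}

const columns = columnsFromObjects([{ a: 1, b: 'x' }, { a: 2 }]);
```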





[jira] [Updated] (ARROW-16208) [JS] Upgrade deps

2022-04-17 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz updated ARROW-16208:
---
Summary: [JS] Upgrade deps  (was: [JS] Upgrade reps)

> [JS] Upgrade deps
> -
>
> Key: ARROW-16208
> URL: https://issues.apache.org/jira/browse/ARROW-16208
> Project: Apache Arrow
>  Issue Type: Task
>  Components: JavaScript
>Reporter: Dominik Moritz
>Assignee: Dominik Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 8.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-8674) [JS] Implement IPC RecordBatch body buffer compression from ARROW-300

2022-04-17 Thread Dominik Moritz (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523434#comment-17523434
 ] 

Dominik Moritz commented on ARROW-8674:
---

Looking at lz4js, it's so small 
(https://cdn.jsdelivr.net/npm/lz4js@0.2.0/lz4.min.js) that it's probably okay 
to pull it in as a dependency by default. I agree that having some system to 
register a different decompress function could be nice. lz4js is a bit old, so 
we would want to look carefully at the available libraries. It would be nice to 
have some out-of-the-box support. To avoid increasing bundle sizes, we can 
decide which functions actually use the decompression library.

Could you look at the available JS libraries and see what their sizes are? 
Also, is either lz4 or zstd much more common than the other? 

We should also look into how much benefit we actually get from compression: 
most servers already support transparent gzip compression, so compressing an 
already compressed file will just incur overhead. 

If the libraries are too heavy, we can think about a plugin system. We could 
make our registry synchronous. 

I definitely don't want to pull wasm into the library, as it will break 
people's workflows. 
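
The plugin idea above could be sketched as a small synchronous registry that 
maps a codec name to a decompress function. Everything here (registerCodec, 
decompressBody, the codec names) is hypothetical and not part of the Arrow JS 
API; the "identity" codec is only a stand-in for a real LZ4 or ZSTD library.

```javascript
// Hypothetical synchronous codec registry; not an existing Arrow JS API.
const codecs = new Map();

// Register a decompress function (Uint8Array -> Uint8Array) under a codec name.
function registerCodec(name, decompress) {
  if (typeof decompress !== "function") {
    throw new TypeError(`decompress for "${name}" must be a function`);
  }
  codecs.set(name, decompress);
}

// Look up the codec and decompress synchronously; fail loudly if it is missing.
function decompressBody(name, bytes) {
  const decompress = codecs.get(name);
  if (!decompress) {
    throw new Error(`No decompressor registered for codec "${name}"`);
  }
  return decompress(bytes);
}

// Stand-in codec for demonstration; a real plugin would wrap an LZ4/ZSTD library.
registerCodec("identity", (bytes) => bytes.slice());
```

With a registry like this, the core bundle would carry only the lookup, and 
users who need compression would register a codec themselves.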

> [JS] Implement IPC RecordBatch body buffer compression from ARROW-300
> -
>
> Key: ARROW-8674
> URL: https://issues.apache.org/jira/browse/ARROW-8674
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: JavaScript
>Reporter: Wes McKinney
>Priority: Major
>
> This may not be a hard requirement for JS because this would require pulling 
> in implementations of LZ4 and ZSTD which not all users may want





[jira] [Created] (ARROW-16210) [JS] Implement tableFromJSON

2022-04-17 Thread Dominik Moritz (Jira)
Dominik Moritz created ARROW-16210:
--

 Summary: [JS] Implement tableFromJSON
 Key: ARROW-16210
 URL: https://issues.apache.org/jira/browse/ARROW-16210
 Project: Apache Arrow
  Issue Type: Improvement
  Components: JavaScript
Reporter: Dominik Moritz
Assignee: Dominik Moritz








[jira] [Created] (ARROW-16209) [JS] Support setting values on Tables

2022-04-17 Thread Dominik Moritz (Jira)
Dominik Moritz created ARROW-16209:
--

 Summary: [JS] Support setting values on Tables
 Key: ARROW-16209
 URL: https://issues.apache.org/jira/browse/ARROW-16209
 Project: Apache Arrow
  Issue Type: Bug
  Components: JavaScript
Affects Versions: 7.0.0
Reporter: Dominik Moritz
Assignee: Dominik Moritz
 Fix For: 8.0.0


See https://github.com/vega/vega-lite/issues/8105





[jira] [Created] (ARROW-16208) [JS] Upgrade reps

2022-04-16 Thread Dominik Moritz (Jira)
Dominik Moritz created ARROW-16208:
--

 Summary: [JS] Upgrade reps
 Key: ARROW-16208
 URL: https://issues.apache.org/jira/browse/ARROW-16208
 Project: Apache Arrow
  Issue Type: Task
  Components: JavaScript
Reporter: Dominik Moritz
Assignee: Dominik Moritz
 Fix For: 8.0.0








[jira] [Resolved] (ARROW-16167) [JS] Check for opportunities to optimize offsets

2022-04-13 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz resolved ARROW-16167.

Fix Version/s: 8.0.0
   Resolution: Fixed

Issue resolved by pull request 12858
[https://github.com/apache/arrow/pull/12858]

> [JS] Check for opportunities to optimize offsets
> 
>
> Key: ARROW-16167
> URL: https://issues.apache.org/jira/browse/ARROW-16167
> Project: Apache Arrow
>  Issue Type: Task
>  Components: JavaScript
>Reporter: Dominik Moritz
>Assignee: Dominik Moritz
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 8.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Check for opportunities for https://github.com/apache/arrow/pull/12793





[jira] [Updated] (ARROW-16167) [JS] Check for opportunities to optimize offsets

2022-04-13 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz updated ARROW-16167:
---
Priority: Minor  (was: Major)

> [JS] Check for opportunities to optimize offsets
> 
>
> Key: ARROW-16167
> URL: https://issues.apache.org/jira/browse/ARROW-16167
> Project: Apache Arrow
>  Issue Type: Task
>  Components: JavaScript
>Reporter: Dominik Moritz
>Assignee: Dominik Moritz
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Check for opportunities for https://github.com/apache/arrow/pull/12793





[jira] [Created] (ARROW-16167) [JS] Check for opportunities to optimize offsets

2022-04-11 Thread Dominik Moritz (Jira)
Dominik Moritz created ARROW-16167:
--

 Summary: [JS] Check for opportunities to optimize offsets
 Key: ARROW-16167
 URL: https://issues.apache.org/jira/browse/ARROW-16167
 Project: Apache Arrow
  Issue Type: Task
  Components: JavaScript
Reporter: Dominik Moritz
Assignee: Dominik Moritz


Check for opportunities for https://github.com/apache/arrow/pull/12793





[jira] [Created] (ARROW-16153) [JS] Consider implementing a tableFromArray

2022-04-08 Thread Dominik Moritz (Jira)
Dominik Moritz created ARROW-16153:
--

 Summary: [JS] Consider implementing a tableFromArray
 Key: ARROW-16153
 URL: https://issues.apache.org/jira/browse/ARROW-16153
 Project: Apache Arrow
  Issue Type: Bug
  Components: JavaScript
Reporter: Dominik Moritz
Assignee: Dominik Moritz


The idea here is to implement a function that creates a table from an array of 
objects using the struct builder. 





[jira] [Commented] (ARROW-16039) [JS] Documentation is quite obscure and not useful

2022-04-08 Thread Dominik Moritz (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-16039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17519559#comment-17519559
 ] 

Dominik Moritz commented on ARROW-16039:


Absolutely. Please understand that we are a small team of volunteers doing what 
we can. I appreciate that you took the time to write up the pain points here, 
as it will help guide our efforts. 

> [JS] Documentation is quite obscure and not useful
> --
>
> Key: ARROW-16039
> URL: https://issues.apache.org/jira/browse/ARROW-16039
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Affects Versions: 7.0.0
> Environment: Linux, Deno, DOM
>Reporter: Teodor Kostov
>Priority: Major
>
> I've been looking forward to using Apache Arrow as a data storage component 
> in a frontend application that heavily relies on time series data. However, 
> the syntax seems to have changed quite a lot with version *7.0.0*. Most of 
> the examples on https://observablehq.com seem to be outdated because of that.
> The two main resources https://arrow.apache.org/docs/js/index.html and 
> https://arrow.apache.org/docs/js/modules/Arrow_dom.html are quite 
> insufficient to understand how to use the project. There are a bunch of 
> examples on how to create a table and a vector. However, it seems that the 
> most important use case in JS is not captured - how to create a table from an 
> array of records
> {code:javascript}
> [{a:1, b:2}, {a:3, b:4}]
> {code}
> or how to create a table from an observable that provides one record at a 
> time.
> {code:javascript}
> callback(record => ???)
> {code}
> No information on how to append data to a table (except the _concat()_ 
> method).
> No information on how to manipulate the data in a table or an example on how 
> to consume it beyond the fact that a table is an iterable and has a _get()_ 
> method.
> It would also be quite helpful to add some examples of how to work with 
> time series data.
> The current state of the documentation does not make it possible for the 
> project to be adopted by anyone except the core developers.
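
The "array of records" case raised above can be sketched in plain JavaScript 
by transposing the records into one array per column. The helper name 
recordsToColumns is hypothetical. The resulting object has the same 
column-oriented shape that tableFromArrays is given in the ARROW-15642 
reproduction in this thread, so it could presumably be passed to that 
function; that step is left out here to keep the sketch self-contained.

```javascript
// Own-property check that ignores keys inherited from Object.prototype.
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Transpose an array of row objects into one array per column.
// Missing keys are filled with null so all columns have the same length.
function recordsToColumns(records) {
  const columns = {};
  records.forEach((record, rowIndex) => {
    for (const key of Object.keys(record)) {
      if (!has(columns, key)) {
        // Back-fill the rows seen before this key first appeared.
        columns[key] = new Array(rowIndex).fill(null);
      }
    }
    for (const key of Object.keys(columns)) {
      columns[key].push(has(record, key) ? record[key] : null);
    }
  });
  return columns;
}

const columns = recordsToColumns([{ a: 1, b: 2 }, { a: 3, b: 4 }]);
// columns is { a: [1, 3], b: [2, 4] }
```

For the one-record-at-a-time case, the same idea applies: push each incoming 
record's values onto the per-column arrays and build a table from them when a 
batch is complete.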





[jira] [Resolved] (ARROW-14647) bignumToNumber returns incorrect results for negative numbers

2022-04-06 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz resolved ARROW-14647.

Fix Version/s: 8.0.0
   Resolution: Fixed

Issue resolved by pull request 11655
[https://github.com/apache/arrow/pull/11655]

> bignumToNumber returns incorrect results for negative numbers
> -
>
> Key: ARROW-14647
> URL: https://issues.apache.org/jira/browse/ARROW-14647
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Affects Versions: 6.0.0
>Reporter: Bob Matcuk
>Priority: Major
>  Labels: pull-request-available
> Fix For: 8.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> I have the following in my data:
>  
> {code:java}
> v = new Uint32Array([4294786450, 4294967295, 4294967295, 4294967295]) {code}
> bignumToNumber converts this to -18446744069414765000 but it should be 
> -180846.
> I actually suspect that bignumToNumber fails on anything greater than 64 
> bits, but I don't have any positive test cases.
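
The expected value in the report can be checked with BigInt. This is a 
minimal sketch, assuming the Uint32Array holds the little-endian 32-bit words 
of a two's-complement integer (which matches the -180846 quoted above); the 
function name wordsToSignedBigInt is hypothetical and this is not the fix 
from the pull request.

```javascript
// Interpret little-endian 32-bit words as one two's-complement integer.
function wordsToSignedBigInt(words) {
  let value = 0n;
  // Start from the most significant word (the last one) and shift it in.
  for (let i = words.length - 1; i >= 0; i--) {
    value = (value << 32n) | BigInt(words[i]);
  }
  // Reinterpret the unsigned result as a signed integer of the same width.
  return BigInt.asIntN(words.length * 32, value);
}

const v = new Uint32Array([4294786450, 4294967295, 4294967295, 4294967295]);
const result = wordsToSignedBigInt(v); // -180846n
```

The lowest word is 2^32 - 180846 and every higher word is all ones, so the 
128-bit pattern is 2^128 - 180846, which is -180846 when read as signed.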
>  





[jira] [Resolved] (ARROW-7737) [JS] Implement sorting

2022-04-06 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz resolved ARROW-7737.
---
Resolution: Won't Do

I don't think sorting is something we should add to the library itself; 
leaving it out lets us focus on the rest of the functionality. 

> [JS] Implement sorting
> --
>
> Key: ARROW-7737
> URL: https://issues.apache.org/jira/browse/ARROW-7737
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Affects Versions: 0.15.1
>Reporter: Anders Rune Jensen
>Priority: Minor
>
> I started using Apache Arrow for a project and it appears to be working 
> really well. One of the things that I'm missing, though, in order to fully 
> use it is sorting. Let's say I just want the latest 100 items based on the 
> timestamp or something. I don't think there is a way to do this yet?
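
Since sorting is left to user code, the "latest N items" case can be sketched 
as an index sort over the timestamp column. This assumes the column's values 
are already available as an array or typed array of comparable values; the 
name latestIndices is hypothetical.

```javascript
// Return the indices of the `n` largest values in `timestamps`, newest first.
function latestIndices(timestamps, n) {
  const indices = Array.from({ length: timestamps.length }, (_, i) => i);
  // Compare by the referenced timestamp, descending; works for numbers and bigints.
  indices.sort((a, b) => {
    if (timestamps[a] === timestamps[b]) return 0;
    return timestamps[a] > timestamps[b] ? -1 : 1;
  });
  return indices.slice(0, n);
}

const newest = latestIndices([10, 50, 30, 40], 2); // [1, 3]
```

The returned indices can then be used to look up the corresponding rows, so 
the underlying columns never have to be reordered.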





[jira] [Resolved] (ARROW-7851) [JS] JS Documentation generation fails

2022-04-06 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz resolved ARROW-7851.
---
Resolution: Cannot Reproduce

yarn doc works for me. 

> [JS] JS Documentation generation fails
> --
>
> Key: ARROW-7851
> URL: https://issues.apache.org/jira/browse/ARROW-7851
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Krisztian Szucs
>Priority: Major
>
> Just surfaced on GHA 
> https://github.com/apache/arrow/runs/443762627#step:5:11647
> cc [~paultaylor]





[jira] [Closed] (ARROW-8053) [JS] Improve performance of filtering

2022-04-06 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz closed ARROW-8053.
-
Resolution: Abandoned

We don't filter anymore. 

> [JS] Improve performance of filtering
> -
>
> Key: ARROW-8053
> URL: https://issues.apache.org/jira/browse/ARROW-8053
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Will Strimling
>Priority: Major
>
> A series of observable notebooks have shown quite convincingly that arrow 
> doesn't compete with other libraries or JavaScript when it comes to filtering 
> performance. Has there been any discussion or roadmaps established for 
> improving it?
> Most convincing Observables:
>  * 
> [https://observablehq.com/@duaneatat/apache-arrow-filtering-vs-array-filter]
>  * 
> [https://observablehq.com/@robertleeplummerjr/array-filtering-apache-arrow-vs-gpu-js-textures-vs-array-fil]





[jira] [Closed] (ARROW-12531) [JS] Make the docs more user friendly

2022-04-06 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz closed ARROW-12531.
--
Resolution: Fixed

> [JS] Make the docs more user friendly
> -
>
> Key: ARROW-12531
> URL: https://issues.apache.org/jira/browse/ARROW-12531
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Dominik Moritz
>Priority: Major
>
> Right now, the docs are very much just an API doc but they don't explain how 
> to use most of the library. We should cover
>  * add function/class comments
>  ** What does Table.isValid do? Can tables be invalid somehow???
>  ** Document {{Table.get}} and {{Table.getChildAt}}.
>  ** What is the difference between {{getColumnAt}} and {{getChildAt}} (for 
> Tables and Columns)?
> ** scan vs forEach?
>  ** Why is {{BindFunc}} optional in {{Table.scan}}?
>  * remove internal functions from the docs
>  * Document Vectors 
> ([https://arrow.apache.org/docs/js/modules/_vector_.vector.html] is just a 
> namespace)
>  * Make sure {{IntVector}} and {{FloatVector}} appear in docs
>  * What is the difference between {{Table}} and {{DataFrame}}?
>  * Add more examples
> As a good inspiration, we can look at [https://uwdata.github.io/arquero/api/].





[jira] [Commented] (ARROW-12531) [JS] Make the docs more user friendly

2022-04-06 Thread Dominik Moritz (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17518157#comment-17518157
 ] 

Dominik Moritz commented on ARROW-12531:


I think we can close this since we made some changes and more ideas are in 
ARROW-16039.

> [JS] Make the docs more user friendly
> -
>
> Key: ARROW-12531
> URL: https://issues.apache.org/jira/browse/ARROW-12531
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Dominik Moritz
>Priority: Major
>
> Right now, the docs are very much just an API doc but they don't explain how 
> to use most of the library. We should cover
>  * add function/class comments
>  ** What does Table.isValid do? Can tables be invalid somehow???
>  ** Document {{Table.get}} and {{Table.getChildAt}}.
>  ** What is the difference between {{getColumnAt}} and {{getChildAt}} (for 
> Tables and Columns)?
> ** scan vs forEach?
>  ** Why is {{BindFunc}} optional in {{Table.scan}}?
>  * remove internal functions from the docs
>  * Document Vectors 
> ([https://arrow.apache.org/docs/js/modules/_vector_.vector.html] is just a 
> namespace)
>  * Make sure {{IntVector}} and {{FloatVector}} appear in docs
>  * What is the difference between {{Table}} and {{DataFrame}}?
>  * Add more examples
> As a good inspiration, we can look at [https://uwdata.github.io/arquero/api/].





[jira] [Commented] (ARROW-15642) [Python] [JavaScript] Arrow IPC file output by apache-arrow tableToIPC method cannot be read by pyarrow

2022-04-06 Thread Dominik Moritz (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17518150#comment-17518150
 ] 

Dominik Moritz commented on ARROW-15642:


Interesting discussion. I wonder whether we should write the IPC file format 
by default instead of the stream format, since file is the more common use 
case. What's the default in other libraries? 

> [Python] [JavaScript] Arrow IPC file output by apache-arrow tableToIPC method 
> cannot be read by pyarrow
> ---
>
> Key: ARROW-15642
> URL: https://issues.apache.org/jira/browse/ARROW-15642
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript, Python
>Affects Versions: 7.0.0
>Reporter: Dan Coates
>Assignee: Weston Pace
>Priority: Major
>
> IPC files created by the node library `apache-arrow` don't seem to be able to 
> be read by pyarrow. There is an example of this issue here: 
> [https://github.com/dancoates/pyarrow-jsarrow-test]
>  
> writing the arrow file from js
> {code:javascript}
> import {tableToIPC, tableFromArrays} from 'apache-arrow';
> import fs from 'fs';
> const LENGTH = 2000;
> const rainAmounts = Float32Array.from(
>     { length: LENGTH },
>     () => Number((Math.random() * 20).toFixed(1)));
> const rainDates = Array.from(
>     { length: LENGTH },
>     (_, i) => new Date(Date.now() - 1000 * 60 * 60 * 24 * i));
> const rainfall = tableFromArrays({
>     precipitation: rainAmounts,
>     date: rainDates
> });
> const outputTable = tableToIPC(rainfall);
> fs.writeFileSync('jsarrow.arrow', outputTable); {code}
>  
> reading in python
> {code:python}
> import pyarrow as pa
> with open('jsarrow.arrow', 'rb') as f:
>     with pa.ipc.open_file(f) as reader:
>         df = reader.read_pandas()
>         print(df.head())
>  {code}
>  
> produces the error:
> {code:java}
> pyarrow.lib.ArrowInvalid: Not an Arrow file {code}
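
The error is consistent with the bytes being in the IPC stream format rather 
than the file format, which is what the comment above is about. The file 
format starts with the ASCII magic string "ARROW1", while the stream format 
does not, so the two can be told apart with a small check. The helper name 
looksLikeArrowFile is hypothetical. On the Python side, pa.ipc.open_stream 
reads the stream format; to my understanding tableToIPC also accepts 'file' 
as an optional second argument, but treat that as an assumption to verify 
against the installed version.

```javascript
// The Arrow IPC file format begins with the ASCII magic string "ARROW1";
// the stream format has no such prefix.
const ARROW_MAGIC = [0x41, 0x52, 0x52, 0x4f, 0x57, 0x31]; // "ARROW1"

// Return true if `bytes` starts with the Arrow file magic.
function looksLikeArrowFile(bytes) {
  if (bytes.length < ARROW_MAGIC.length) return false;
  return ARROW_MAGIC.every((code, i) => bytes[i] === code);
}
```

Running this on the output of the JavaScript snippet above before writing it 
to disk would show which of the two formats was produced.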
>  
>  





[jira] [Resolved] (ARROW-16117) [JS] Improve UTF8 decoding performance

2022-04-06 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz resolved ARROW-16117.

Fix Version/s: 8.0.0
   Resolution: Fixed

Issue resolved by pull request 12793
[https://github.com/apache/arrow/pull/12793]

> [JS] Improve UTF8 decoding performance
> --
>
> Key: ARROW-16117
> URL: https://issues.apache.org/jira/browse/ARROW-16117
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
> Environment: MacOS, Chrome, Safari
>Reporter: Howard Zuo
>Assignee: Dominik Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 8.0.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> While profiling the performance of decoding TPC-H Customer and Part 
> in-browser, datasets where there are a lot of UTF8s, it turned out that much 
> of the time was being spent in {{getVariableWidthBytes}} rather than in 
> {{TextDecoder}} itself. Ideally all the time should be spent in 
> {{TextDecoder}}.
> On Chrome {{getVariableWidthBytes}} took up to ~15% of the e2e decoding 
> latency, and on Safari it was close to ~40% (Safari's TextDecoder is much 
> faster than Chrome's, so this took up relatively more time).
> This is likely because the code in this PR is more amenable to V8/JSC's JIT, 
> since {{x}} and {{y}} now are guaranteed to be SMIs ("small integers") 
> instead of Object, allowing the JIT to emit efficient machine instructions 
> that only deal in 32-bit integers. Once V8 discovers that {{x}} and {{y}} 
> can potentially be null (upon iterating past the bounds), it "poisons" the 
> codepath forever, since it has to deal with the null case.
> See this V8 post for a more in-depth explanation (in particular see the 
> examples underneath "Performance tips"):
> [https://v8.dev/blog/elements-kinds]
> Doing the bounds check explicitly instead of implicitly basically eliminates 
> this function from showing up in the profiling. Empirically, on my machine 
> decoding TPC-H Part dropped from 1.9s to 1.7s on Chrome, and Customer dropped 
> from 1.4s to 1.2s.
> [https://github.com/apache/arrow/pull/12793]
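
The explicit bounds check described above can be sketched as follows. This is 
an illustration of the technique, not the code from the pull request; the 
function name getValueBytes and the sample data are hypothetical.

```javascript
// Slice one variable-width value out of a flat byte buffer.
// `offsets[i]` and `offsets[i + 1]` delimit value `i` inside `values`.
function getValueBytes(values, offsets, index) {
  // Explicit bounds check: never read past the end of `offsets`, so `start`
  // and `end` are always integers and never `undefined`.
  if (index < 0 || index + 1 >= offsets.length) {
    return null;
  }
  const start = offsets[index];
  const end = offsets[index + 1];
  return values.subarray(start, end);
}

const decoder = new TextDecoder("utf-8");
const values = new TextEncoder().encode("foobarbaz");
const offsets = Int32Array.of(0, 3, 6, 9);
const second = decoder.decode(getValueBytes(values, offsets, 1)); // "bar"
```

Because the out-of-range case returns early, the reads from the offsets array 
only ever produce integers, which is the property the description credits for 
the speedup.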
>  





[jira] [Assigned] (ARROW-16117) [JS] Improve UTF8 decoding performance

2022-04-04 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz reassigned ARROW-16117:
--

Assignee: Dominik Moritz

> [JS] Improve UTF8 decoding performance
> --
>
> Key: ARROW-16117
> URL: https://issues.apache.org/jira/browse/ARROW-16117
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
> Environment: MacOS, Chrome, Safari
>Reporter: Howard Zuo
>Assignee: Dominik Moritz
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> While profiling the performance of decoding TPC-H Customer and Part 
> in-browser, datasets where there are a lot of UTF8s, it turned out that much 
> of the time was being spent in {{getVariableWidthBytes}} rather than in 
> {{TextDecoder}} itself. Ideally all the time should be spent in 
> {{TextDecoder}}.
> On Chrome {{getVariableWidthBytes}} took up to ~15% of the e2e decoding 
> latency, and on Safari it was close to ~40% (Safari's TextDecoder is much 
> faster than Chrome's, so this took up relatively more time).
> This is likely because the code in this PR is more amenable to V8/JSC's JIT, 
> since {{x}} and {{y}} now are guaranteed to be SMIs ("small integers") 
> instead of Object, allowing the JIT to emit efficient machine instructions 
> that only deal in 32-bit integers. Once V8 discovers that {{x}} and {{y}} 
> can potentially be null (upon iterating past the bounds), it "poisons" the 
> codepath forever, since it has to deal with the null case.
> See this V8 post for a more in-depth explanation (in particular see the 
> examples underneath "Performance tips"):
> [https://v8.dev/blog/elements-kinds]
> Doing the bounds check explicitly instead of implicitly basically eliminates 
> this function from showing up in the profiling. Empirically, on my machine 
> decoding TPC-H Part dropped from 1.9s to 1.7s on Chrome, and Customer dropped 
> from 1.4s to 1.2s.
> [https://github.com/apache/arrow/pull/12793]
>  





[jira] [Resolved] (ARROW-16099) Warn about not supporting compression

2022-04-02 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz resolved ARROW-16099.

Resolution: Fixed

Issue resolved by pull request 12718
[https://github.com/apache/arrow/pull/12718]

> Warn about not supporting compression
> -
>
> Key: ARROW-16099
> URL: https://issues.apache.org/jira/browse/ARROW-16099
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Dominik Moritz
>Assignee: Dominik Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 8.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (ARROW-16098) Support `Iterable>` for Table

2022-04-02 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz resolved ARROW-16098.

Resolution: Fixed

Issue resolved by pull request 12773
[https://github.com/apache/arrow/pull/12773]

> Support `Iterable>` for Table
> -
>
> Key: ARROW-16098
> URL: https://issues.apache.org/jira/browse/ARROW-16098
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Affects Versions: 7.0.0
>Reporter: Dominik Moritz
>Assignee: Dominik Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 8.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Assigned] (ARROW-16099) Warn about not supporting compression

2022-04-01 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz reassigned ARROW-16099:
--

Assignee: Dominik Moritz

> Warn about not supporting compression
> -
>
> Key: ARROW-16099
> URL: https://issues.apache.org/jira/browse/ARROW-16099
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Dominik Moritz
>Assignee: Dominik Moritz
>Priority: Major
> Fix For: 8.0.0
>
>






[jira] [Created] (ARROW-16099) Warn about not supporting compression

2022-04-01 Thread Dominik Moritz (Jira)
Dominik Moritz created ARROW-16099:
--

 Summary: Warn about not supporting compression
 Key: ARROW-16099
 URL: https://issues.apache.org/jira/browse/ARROW-16099
 Project: Apache Arrow
  Issue Type: Bug
  Components: JavaScript
Reporter: Dominik Moritz
 Fix For: 8.0.0








[jira] [Resolved] (ARROW-15852) [JS] Table getByteLength and indexOf don't work

2022-04-01 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-15852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz resolved ARROW-15852.

Fix Version/s: 8.0.0
   Resolution: Fixed

Issue resolved by pull request 12771
[https://github.com/apache/arrow/pull/12771]

> [JS] Table getByteLength and indexOf don't work
> ---
>
> Key: ARROW-15852
> URL: https://issues.apache.org/jira/browse/ARROW-15852
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Affects Versions: 7.0.0
>Reporter: Timothy Higinbottom
>Assignee: Paul Taylor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 8.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The functions table.getByteLength() and table.indexOf() don't return the 
> correct values.
> They are bound dynamically to the Table class, in a way I don't fully 
> understand, with the following code:
> [https://github.com/apache/arrow/blob/1b796ec3f9caeb5e86e3348ba940bef8d95915c5/js/src/table.ts#L378-L390]
> The other functions like that, get(), set(), and isValid() all seem to work.  
> However, getByteLength() and indexOf() return the placeholder/sentinel values 
> of 0 and -1 respectively that are defined in the no-op code here: 
> [https://github.com/apache/arrow/blob/1b796ec3f9caeb5e86e3348ba940bef8d95915c5/js/src/table.ts#L207-L221],
> which I assume is to generate the right type definitions, and thus 
> documentation.
> It's fairly simple for a user to implement the right logic themselves (at 
> least for getByteLength) and it's a quick patch to define the functions 
> normally instead of on the prototype, e.g.:
>  
> {code:java}
>     /**
>      * Get the size in bytes of an element by index.
>      * @param index The index at which to get the byteLength.
>      */
>     // @ts-ignore
>     public getByteLength(index: number): number { return this.data[index].byteLength; }
>     /**
>      * Get the size in bytes of a table.
>      */
>     // @ts-ignore
>     public getByteLength(): number {
>         return this.data.map((batch) => batch.byteLength).reduce((sum, newLength) => sum + newLength);
>     }
> {code}
> I'd be happy to send this as a PR if that's an OK alternative to the way it's 
> currently implemented. 
> Here's a Github repo of a minimal reproduction of the issue in NodeJS:
> [https://github.com/alexkreidler/apache-arrow-js-small-bug]
>  
> And an Observable notebook for the browser (although I couldn't get ESM 
> working): [https://observablehq.com/@08027ecfa2b2f7bb/arrow-7-canary]
>  
> Thanks to all for your work on Arrow!
>  





[jira] [Created] (ARROW-16098) Support `Iterable>` for Table

2022-04-01 Thread Dominik Moritz (Jira)
Dominik Moritz created ARROW-16098:
--

 Summary: Support `Iterable>` for Table
 Key: ARROW-16098
 URL: https://issues.apache.org/jira/browse/ARROW-16098
 Project: Apache Arrow
  Issue Type: Improvement
  Components: JavaScript
Affects Versions: 7.0.0
Reporter: Dominik Moritz
Assignee: Dominik Moritz
 Fix For: 8.0.0








[jira] [Resolved] (ARROW-15821) Sourcemap paths don't work for files in directories

2022-03-07 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-15821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz resolved ARROW-15821.

Resolution: Fixed

Issue resolved by pull request 12539
[https://github.com/apache/arrow/pull/12539]

> Sourcemap paths don't work for files in directories
> ---
>
> Key: ARROW-15821
> URL: https://issues.apache.org/jira/browse/ARROW-15821
> Project: Apache Arrow
>  Issue Type: Task
>  Components: JavaScript
>Affects Versions: 7.0.0
>Reporter: Dominik Moritz
>Assignee: Dominik Moritz
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 8.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Created] (ARROW-15821) Sourcemap paths don't work for files in directories

2022-03-01 Thread Dominik Moritz (Jira)
Dominik Moritz created ARROW-15821:
--

 Summary: Sourcemap paths don't work for files in directories
 Key: ARROW-15821
 URL: https://issues.apache.org/jira/browse/ARROW-15821
 Project: Apache Arrow
  Issue Type: Task
  Components: JavaScript
Affects Versions: 7.0.0
Reporter: Dominik Moritz
Assignee: Dominik Moritz
 Fix For: 8.0.0








[jira] [Created] (ARROW-15379) Use a flywheel for struct row

2022-01-19 Thread Dominik Moritz (Jira)
Dominik Moritz created ARROW-15379:
--

 Summary: Use a flywheel for struct row
 Key: ARROW-15379
 URL: https://issues.apache.org/jira/browse/ARROW-15379
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Dominik Moritz


When we access a row from a table or a struct, we create a new proxy for the 
struct each time. We could improve the performance of these accesses by 
creating a single instance of the proxy, storing it on the vector or the data 
type, and then reusing that instance. 

See 
https://github.com/apache/arrow/blob/7029f90ea3b39e97f1a671227ca932cbcdbcee05/js/src/visitor/get.ts#L219
 and 
https://github.com/apache/arrow/blob/7029f90ea3b39e97f1a671227ca932cbcdbcee05/js/src/vector/struct.ts#L27.
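
The reuse idea can be sketched as a single row view that is re-pointed at a 
different index instead of being allocated per access. The RowView class and 
its at method are hypothetical and only illustrate the pattern; they are not 
how the Arrow JS struct row proxy is implemented. The sketch also assumes no 
column is named "columns", "index" or "at".

```javascript
// Hypothetical reusable row view: one object per column set, re-pointed at a
// different row index instead of allocating a new proxy for every access.
class RowView {
  constructor(columns) {
    this.columns = columns; // e.g. { a: [1, 3], b: [2, 4] }
    this.index = 0;
    for (const name of Object.keys(columns)) {
      // Each column becomes a getter that reads the current row's value.
      Object.defineProperty(this, name, {
        enumerable: true,
        get: () => this.columns[name][this.index],
      });
    }
  }

  // Re-point the shared view at another row and return the same object.
  at(index) {
    this.index = index;
    return this;
  }
}

const row = new RowView({ a: [1, 3], b: [2, 4] });
const first = row.at(0).a;  // 1
const second = row.at(1).b; // 4
```

The trade-off is that callers must not hold on to a row across accesses, 
since the same object is mutated to represent the next row.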
 





[jira] [Commented] (ARROW-15377) [JS][Release] JavaScript verification fails

2022-01-19 Thread Dominik Moritz (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17478999#comment-17478999
 ] 

Dominik Moritz commented on ARROW-15377:


That looks like a random error. Can you rerun the script and see whether it 
still fails?

> [JS][Release] JavaScript verification fails
> ---
>
> Key: ARROW-15377
> URL: https://issues.apache.org/jira/browse/ARROW-15377
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Krisztian Szucs
>Priority: Major
> Fix For: 7.0.0
>
>
> See build log 
> https://github.com/ursacomputing/crossbow/runs/4871354453?check_suite_focus=true#step:5:8164





[jira] [Resolved] (ARROW-3365) [JS] Working Dockerfile for docker-compose setup

2022-01-16 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz resolved ARROW-3365.
---
Resolution: Abandoned

Since this issue has been open since 2018 and there was no activity, I am going 
to close it. Please reopen and add some updated details if it's still an issue.

> [JS] Working Dockerfile for docker-compose setup
> 
>
> Key: ARROW-3365
> URL: https://issues.apache.org/jira/browse/ARROW-3365
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools, JavaScript
>Reporter: Krisztian Szucs
>Priority: Major
>
> Dockerfile 
> https://github.com/apache/arrow/pull/2572/files#diff-a6b24713199de032fd4141b04a163594
>  fails to build





[jira] [Resolved] (ARROW-11261) [JS] read and write compressed (LZ4/LZTD) files

2022-01-16 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz resolved ARROW-11261.

Resolution: Duplicate

> [JS] read and write compressed (LZ4/LZTD) files
> ---
>
> Key: ARROW-11261
> URL: https://issues.apache.org/jira/browse/ARROW-11261
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Affects Versions: 2.0.0
>Reporter: eric mauviere
>Priority: Major
>
> Compressed Arrow files, such as those produced with R's arrow::write_feather, 
> are much smaller than those created with write_ipc_stream.
> On the web, that makes a big difference.
> But I can't properly read such 'feather' files with Arrow JS 
> (arrow.Table.from).
> For instance, dictionary values are not properly decoded.





[jira] [Commented] (ARROW-9311) [JS] Use feature enum in javascript

2022-01-16 Thread Dominik Moritz (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17476934#comment-17476934
 ] 

Dominik Moritz commented on ARROW-9311:
---

Can you add a description for this issue? 

> [JS] Use feature enum in javascript
> ---
>
> Key: ARROW-9311
> URL: https://issues.apache.org/jira/browse/ARROW-9311
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: JavaScript
>Reporter: Micah Kornfield
>Priority: Major
>






[jira] [Commented] (ARROW-9246) [JS] Add forward compatibility checks for Decimal::bitWidth

2022-01-16 Thread Dominik Moritz (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17476935#comment-17476935
 ] 

Dominik Moritz commented on ARROW-9246:
---

Can you add a description for this issue?

> [JS] Add forward compatibility checks for Decimal::bitWidth
> ---
>
> Key: ARROW-9246
> URL: https://issues.apache.org/jira/browse/ARROW-9246
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: JavaScript
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 8.0.0
>
>






[jira] [Closed] (ARROW-9496) [JS] toArray() called on filtered Table returns all rows

2022-01-16 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz closed ARROW-9496.
-

> [JS] toArray() called on filtered Table returns all rows
> 
>
> Key: ARROW-9496
> URL: https://issues.apache.org/jira/browse/ARROW-9496
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
> Environment: OSX 10.15.2
> Behavior seen in browser and node
>Reporter: Peter Murphy
>Priority: Major
>
> Trying to experiment with building a library on top of Apache Arrow's 
> Javascript implementation, but ran into this:
> Example:
>  [https://runkit.com/pjm17971/pond-arrow]
> {code:java}
> const filtered = table.filter(predicate.col("pressure").lt(28.5))
> filtered.count() // 2 (correct)
> {code}
>  However:
> {code:java}
> const result = filtered.toArray().map(row => row.toJSON()) // 4 rows 
> (??){code}
> Is this expected behavior?





[jira] [Resolved] (ARROW-9496) [JS] toArray() called on filtered Table returns all rows

2022-01-16 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz resolved ARROW-9496.
---
Resolution: Won't Fix

No longer a problem, since Arrow no longer has a filtered data frame. 

> [JS] toArray() called on filtered Table returns all rows
> 
>
> Key: ARROW-9496
> URL: https://issues.apache.org/jira/browse/ARROW-9496
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
> Environment: OSX 10.15.2
> Behavior seen in browser and node
>Reporter: Peter Murphy
>Priority: Major
>
> Trying to experiment with building a library on top of Apache Arrow's 
> Javascript implementation, but ran into this:
> Example:
>  [https://runkit.com/pjm17971/pond-arrow]
> {code:java}
> const filtered = table.filter(predicate.col("pressure").lt(28.5))
> filtered.count() // 2 (correct)
> {code}
>  However:
> {code:java}
> const result = filtered.toArray().map(row => row.toJSON()) // 4 rows 
> (??){code}
> Is this expected behavior?





[jira] [Commented] (ARROW-10221) [JS] toArray() method ignores nulls on some types.

2022-01-16 Thread Dominik Moritz (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17476930#comment-17476930
 ] 

Dominik Moritz commented on ARROW-10221:


I think it's nice to have the fast {{toArray}}. We do have {{toJSON}}, which 
always takes the slow path. However, I agree that it could be confusing to 
accidentally miss nulls. And always using {{toJSON}} isn't good since it would 
always take the slow path even when we have no nulls. 

Btw, I found that {{NaN}} is a valid value in typed arrays even though null is 
not. 

I think the two best solutions are
* Leave things as they are and ask people to use {{toJSON}} if they want to 
guarantee to have nulls. We could add a way for users to get the null mask 
(there already is {{isValid(index)}} but you need to ask for each value). Or we 
could have a way for people to define null values (e.g. as -1 or NaN)
* Take the slow path when the data type for the vector is nullable. 
Unfortunately, a user would have no easy option to take the fast path anymore. 
Maybe we could have an option to force the fast path?
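
A rough sketch of the fast-path/slow-path trade-off discussed above, under 
the assumption that the vector exposes a values buffer, a validity bitmap, and 
a null count. The function names here are hypothetical and are not the Arrow 
JS API.

```javascript
// Hypothetical sketch of the trade-off; names are illustrative, not Arrow JS API.
// `validity` is a bitmap in which bit i is 1 when values[i] is non-null.
function isValid(validity, index) {
  return (validity[index >> 3] & (1 << (index % 8))) !== 0;
}

function toArray(values, validity, nullCount) {
  if (nullCount === 0) {
    return values; // fast path: hand back the typed array without copying
  }
  // slow path: materialize a plain array so nulls can be represented
  return Array.from(values, (v, i) => (isValid(validity, i) ? v : null));
}

const values = Int32Array.of(1, 2, 3, 4, 5, 0, 6);
const validity = Uint8Array.of(0b01011111); // bit 5 is cleared, so index 5 is null
const withNulls = toArray(values, validity, 1);
const noNulls = toArray(values, Uint8Array.of(0b01111111), 0);
```

In this sketch the caller pays for the copy only when nulls are present, which 
is a third variation on the two options listed above.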

I'm not happy with either but 

> [JS] toArray() method ignores nulls on some types.
> --
>
> Key: ARROW-10221
> URL: https://issues.apache.org/jira/browse/ARROW-10221
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Affects Versions: 0.17.1
>Reporter: Ben Schmidt
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The .toArray() javascript method of vectors includes a shortcut to return the 
> underlying typed array; but this doesn't respect null values, and so can 
> return the wrong number.
>  
> ```
> v = arrow.Vector.from({values: [1, 2, 3, 4, 5, null, 6], type: new arrow.Int32()})
> v.toArray()[5] // Incorrectly returns '0'
> v.get(5) // Correctly returns null
> ```
>  
> Solution: Eliminate the fast method, always return Javascript arrays. It 
> might be better to keep the old method in cases where there are guaranteed no 
> nulls.





[jira] [Commented] (ARROW-10450) [JS] Table.fromStruct() silently truncates vectors to the first chunk

2022-01-16 Thread Dominik Moritz (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17476895#comment-17476895
 ] 

Dominik Moritz commented on ARROW-10450:


Since Vectors now are always chunked, I suspect this issue has gone away. Can 
you confirm?

> [JS] Table.fromStruct() silently truncates vectors to the first chunk
> -
>
> Key: ARROW-10450
> URL: https://issues.apache.org/jira/browse/ARROW-10450
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Affects Versions: 2.0.0
>Reporter: David Saslawsky
>Priority: Minor
>
> Table.fromStruct() only uses the first chunk from the input vector.
> {code:javascript}
> import { Bool, Field, Int32, Struct, Table, Vector } from "apache-arrow";
> const myStruct = new Struct([
>   Field.new({ name: "over", type: new Int32() }),
>   Field.new({ name: "out", type: new Bool() })
> ]);
> const data = [];
> for(let i=0;i<1500;i++) {
>   data.push({ over:i, out:i%2 === 0 });
> }
> // create a vector with two chunks
> const victor = Vector.from({
>   type: myStruct,
>   /*highWaterMark: Infinity,*/
>   values: data
> });
> console.log(victor.length);  // 1500 
> const table = Table.fromStruct(victor);
> console.log(table.length);   // 1000
> {code}
>  The workaround is to set highWaterMark to Infinity
>  
> Table.new() works as expected
> {code:javascript}
> const int32Array = new Int32Array(1500);
> for(let i=0;i<1500;i++) int32Array[i] = i;
> const intVector = Vector.from({ type: new Int32(), values: int32Array });
> console.log(intVector.length);  // 1500
> const intTable = Table.new({ intColumn:intVector });
> console.log(intTable.length);   // 1500
> {code}
>  
> The origin seems to be in Chunked.data() but I don't understand the code 
> enough to propose a fix.





[jira] [Closed] (ARROW-10794) [JS] Typescript Arrowjs Class 'RecordBatch' incorrectly extends base class 'StructVector

2022-01-16 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz closed ARROW-10794.
--

> [JS] Typescript Arrowjs Class 'RecordBatch' incorrectly extends base class 
> 'StructVector
> --
>
> Key: ARROW-10794
> URL: https://issues.apache.org/jira/browse/ARROW-10794
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Affects Versions: 2.0.0
>Reporter: vikash
>Priority: Blocker
> Fix For: 7.0.0
>
> Attachments: Screenshot_1.png
>
>
> I am trying to use apache-arrow JS in Angular with TypeScript version 
> 4.0.2, and TypeScript fails to compile.
> Steps to reproduce:
> 1) Install the Angular CLI: npm install -g @angular/cli
> 2) Create a new project: ng new my-app
> 3) Install Apache Arrow: npm install apache-arrow
> 4) Add the code below to app.component.ts:
> ```
> import { Component } from '@angular/core';
> import { Table } from 'apache-arrow';
> import { readFileSync } from 'fs';
> @Component({
>   selector: 'app-root',
>   templateUrl: './app.component.html',
>   styleUrls: ['./app.component.css']
> })
> export class AppComponent {
>   title = 'arrow-typescript';
>    arrow = readFileSync('simple.arrow');
>  table = Table.from([this.arrow]);
> }
> ```
>  
> However, npm run build fails with the error below:
> Error: node_modules/apache-arrow/recordbatch.d.ts:17:18 - error TS2430: 
> Interface 'RecordBatch' incorrectly extends interface 'StructVector'.
>  The types of 'slice(...).clone' are incompatible between these types.
>  Type '(data: Data>, children?: AbstractVector[] | undefined) 
> => RecordBatch' is not assignable to type ' 
> = Struct>(data: Data, children?: AbstractVector[] | undefined) => 
> VectorType'.
>  Types of parameters 'data' and 'data' are incompatible.
>  Type 'Data' is not assignable to type 'Data>'.
>  Type 'R' is not assignable to type 'Struct'.
>  Property 'dataTypes' is missing in type 'DataType' but required 
> in type 'Struct'.
> 17 export interface RecordBatch  ~~~
> node_modules/apache-arrow/type.d.ts:458:5
>  458 dataTypes: T;
>  ~
>  'dataTypes' is declared here.
> node_modules/apache-arrow/recordbatch.d.ts:24:22 - error TS2415: Class 
> 'RecordBatch' incorrectly extends base class 'StructVector'.
> 24 export declare class RecordBatch  ~~~
> node_modules/apache-arrow/ipc/reader.d.ts:236:5 - error TS2717: Subsequent 
> property declarations must have the same type. Property 'schema' must be of 
> type 'Schema', but here has type 'Schema'.
> 236 schema: Schema;
>  ~~
> node_modules/apache-arrow/ipc/reader.d.ts:189:5
>  189 schema: Schema;
>  ~~
>  'schema' was also declared here.
>  
>  





[jira] [Commented] (ARROW-11326) [JS] utf8 vector buffers don't work if allocated within Web Assembly memory of Node.js

2022-01-16 Thread Dominik Moritz (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-11326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17476894#comment-17476894
 ] 

Dominik Moritz commented on ARROW-11326:


Is this still an issue with the latest Arrow in master? Can you provide a small 
example that reproduces the issue? 

I can make a new vector of strings with this


{code:js}
const v = Arrow.vectorFromArray('A', new Arrow.Utf8);
console.log(v.get(0));
{code}


> [JS] utf8 vector buffers don't work if allocated within Web Assembly memory 
> of Node.js
> --
>
> Key: ARROW-11326
> URL: https://issues.apache.org/jira/browse/ARROW-11326
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
> Environment: node.js in Mac book pro
>Reporter: Dmitri Bronnikov
>Priority: Major
>
> After making int32array of offsets = [0, 1] and uint8array of values = 
> [ascii_code('A')], create a vector of strings:
> const vec = arrow.Vector.new(arrow.Data.new(new Utf8(), 0, 1, 0, [offsets, 
> values, null, null]))
> then access the first and only element:
> console.log(vec.get(0))
> Works within browsers. Works in node.js with fixed size types, e.g. float or 
> integer.
> Fails in Node.js (v14.11.0.) with this callstack 
> at ../../node_modules/@apache-arrow/es2015-umd/buffer/index.js:311:1
>     at __proto__ 
> (../../node_modules/@apache-arrow/es2015-umd/buffer/index.js:167:1)
>     at Function._Buffer [as from] 
> (../../node_modules/@apache-arrow/es2015-umd/buffer/index.js:154:1)
>     at prototype 
> (../../node_modules/@apache-arrow/es2015-umd/util/utf8.ts:43:31)
>     at partial2 
> (../../node_modules/@apache-arrow/es2015-umd/visitor/get.ts:293:12)
>     at go.isArray [as get] 
> (../../node_modules/@apache-arrow/es2015-umd/vector/index.ts:175:43)
>     at Sr.get (../../node_modules/@apache-arrow/es2015-umd/util/args.ts:27:7)
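
For reference, the variable-length layout described in the report can be 
sketched without Arrow. This is only an illustration of how offsets and values 
relate, with both buffers placed in WebAssembly memory; getUtf8 is a 
hypothetical helper, and the snippet does not claim to reproduce the failure.

```javascript
// Hypothetical sketch of the Utf8 layout: value i is the byte range
// values[offsets[i] .. offsets[i + 1]) decoded as UTF-8.
function getUtf8(offsets, values, index) {
  const bytes = values.subarray(offsets[index], offsets[index + 1]);
  return new TextDecoder("utf-8").decode(bytes);
}

// Both buffers are views over WebAssembly memory, as in the report.
const memory = new WebAssembly.Memory({ initial: 1 }); // one 64 KiB page
const offsets = new Int32Array(memory.buffer, 0, 2);   // bytes 0..7
const values = new Uint8Array(memory.buffer, 8, 1);    // byte 8
offsets.set([0, 1]);
values[0] = "A".charCodeAt(0);

const decoded = getUtf8(offsets, values, 0);
```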





[jira] [Resolved] (ARROW-10794) [JS] Typescript Arrowjs Class 'RecordBatch' incorrectly extends base class 'StructVector

2022-01-16 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz resolved ARROW-10794.

Fix Version/s: 7.0.0
   Resolution: Fixed

> [JS] Typescript Arrowjs Class 'RecordBatch' incorrectly extends base class 
> 'StructVector
> --
>
> Key: ARROW-10794
> URL: https://issues.apache.org/jira/browse/ARROW-10794
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Affects Versions: 2.0.0
>Reporter: vikash
>Priority: Blocker
> Fix For: 7.0.0
>
> Attachments: Screenshot_1.png
>
>
> I am trying to use apache-arrow JS in Angular with TypeScript version 
> 4.0.2, and TypeScript fails to compile.
> Steps to reproduce:
> 1) Install the Angular CLI: npm install -g @angular/cli
> 2) Create a new project: ng new my-app
> 3) Install Apache Arrow: npm install apache-arrow
> 4) Add the code below to app.component.ts:
> ```
> import { Component } from '@angular/core';
> import { Table } from 'apache-arrow';
> import { readFileSync } from 'fs';
> @Component({
>   selector: 'app-root',
>   templateUrl: './app.component.html',
>   styleUrls: ['./app.component.css']
> })
> export class AppComponent {
>   title = 'arrow-typescript';
>    arrow = readFileSync('simple.arrow');
>  table = Table.from([this.arrow]);
> }
> ```
>  
> However, npm run build fails with the error below:
> Error: node_modules/apache-arrow/recordbatch.d.ts:17:18 - error TS2430: 
> Interface 'RecordBatch' incorrectly extends interface 'StructVector'.
>  The types of 'slice(...).clone' are incompatible between these types.
>  Type '(data: Data>, children?: AbstractVector[] | undefined) 
> => RecordBatch' is not assignable to type ' 
> = Struct>(data: Data, children?: AbstractVector[] | undefined) => 
> VectorType'.
>  Types of parameters 'data' and 'data' are incompatible.
>  Type 'Data' is not assignable to type 'Data>'.
>  Type 'R' is not assignable to type 'Struct'.
>  Property 'dataTypes' is missing in type 'DataType' but required 
> in type 'Struct'.
> 17 export interface RecordBatch  ~~~
> node_modules/apache-arrow/type.d.ts:458:5
>  458 dataTypes: T;
>  ~
>  'dataTypes' is declared here.
> node_modules/apache-arrow/recordbatch.d.ts:24:22 - error TS2415: Class 
> 'RecordBatch' incorrectly extends base class 'StructVector'.
> 24 export declare class RecordBatch  ~~~
> node_modules/apache-arrow/ipc/reader.d.ts:236:5 - error TS2717: Subsequent 
> property declarations must have the same type. Property 'schema' must be of 
> type 'Schema', but here has type 'Schema'.
> 236 schema: Schema;
>  ~~
> node_modules/apache-arrow/ipc/reader.d.ts:189:5
>  189 schema: Schema;
>  ~~
>  'schema' was also declared here.
>  
>  





[jira] [Resolved] (ARROW-12302) [JS] Arrow does not compile with Typescript 4.2

2022-01-16 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz resolved ARROW-12302.

Resolution: Fixed

Done in https://github.com/apache/arrow/pull/10371

> [JS] Arrow does not compile with Typescript 4.2
> ---
>
> Key: ARROW-12302
> URL: https://issues.apache.org/jira/browse/ARROW-12302
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Dominik Moritz
>Assignee: Paul Taylor
>Priority: Major
>
> {code:java}
> yarn tsc
> yarn run v1.22.10
> $ /Users/dominik/Code/arrow/js/node_modules/.bin/tsc
> src/bin/arrow2csv.ts:104:24 - error TS7057: 'yield' expression implicitly 
> results in an 'any' type because its containing generator lacks a return-type 
> annotation.
> 104 reader && (yield reader);
>
> src/bin/arrow2csv.ts:115:28 - error TS7057: 'yield' expression implicitly 
> results in an 'any' type because its containing generator lacks a return-type 
> annotation.
> 115 reader && (yield reader);
>
> src/interfaces.ts:255:5 - error TS2502: '[Type.List ]' is referenced directly 
> or indirectly in its own type annotation.
> 255 [Type.List ]: T extends type.List ? vecs.ListVector : never ;
>
> src/interfaces.ts:258:5 - error TS2502: '[Type.FixedSizeList]' is referenced 
> directly or indirectly in its own type annotation.
> 258 [Type.FixedSizeList]: T extends type.FixedSizeList? vecs.FixedSizeListVector : never ;
>
> src/io/adapters.ts:207:25 - error TS2304: Cannot find name 
> 'ReadableStreamBYOBReader'.
> 207 private byobReader: ReadableStreamBYOBReader | null = null;
>
> src/io/adapters.ts:209:21 - error TS2304: Cannot find name 
> 'ReadableStreamBYOBReader'.
> 209 private reader: ReadableStreamBYOBReader | ReadableStreamDefaultReader | null;
>
> src/io/adapters.ts:264:56 - error TS2554: Expected 0 arguments, but got 1.
> 264 this.byobReader = this.source['getReader']({ mode: 'byob' });
>
> src/io/adapters.ts:283:33 - error TS2304: Cannot find name 
> 'ReadableStreamBYOBReader'.
> 283 async function readInto(reader: ReadableStreamBYOBReader, buffer: ArrayBufferLike, offset: number, size: number): Promise> {
>
> src/io/adapters.ts:303:17 - error TS2322: Type 
> '(value: [T, any] | PromiseLike<[T, any]>) => void' is not assignable to type 
> '(value?: [T, any] | PromiseLike<[T, any]> | undefined) => void'.
> 303 (r) => (resolve = r) && stream['once'](event, handler)
>
> src/io/adapters.ts:303:17 - error TS2322: Type 
> '(value: [T, any] | PromiseLike<[T, any]>) => void' is not assignable to type 
> '(value?: [T, any] | PromiseLike<[T, any]> | undefined) => void'.
>   Types of parameters 'value' and 'value' are incompatible.
> Type '[T, any] | PromiseLike<[T, any]> | undefined' is not assignable to 
> type '[T, any] | PromiseLike<[T, any]>'.
>   Type 'undefined' is not assignable to type '[T, any] | PromiseLike<[T, 
> any]>'.
> 303 (r) => (resolve = r) && stream['once'](event, handler)
>
> src/io/adapters.ts:394:45 - error TS2794: Expected 1 arguments, but got 0. 
> Did you forget to include 'void' in your type argument to 'Promise'?
> 394 err != null ? reject(err) : resolve();
> node_modules/typescript/lib/lib.es2015.promise.d.ts:33:34
> 33 new (executor: (resolve: (value: T | PromiseLike) => void, reject: (reason?: any) => void) => void): Promise;
> An argument for 'value' was not provided.
>
> src/io/interfaces.ts:79:58 - error TS2304: Cannot find name 'PipeOptions'.
> 79 public pipeTo(writable: WritableStream, options?: PipeOptions) { return this._getDOMStream().pipeTo(writable, options); }
>
> src/io/interfaces.ts:80:119 - error TS2304: Cannot find name 'PipeOptions'.
> 80 public pipeThrough ReadableStream>(duplex: { writable: WritableStream, readable: R }, options?: PipeOptions) {
>
> src/io/interfaces.ts:169:39 - error TS2322: 

[jira] [Resolved] (ARROW-12863) [JS] Field nullable value is overwritten

2022-01-16 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz resolved ARROW-12863.

Resolution: Cannot Reproduce

> [JS] Field nullable value is overwritten
> 
>
> Key: ARROW-12863
> URL: https://issues.apache.org/jira/browse/ARROW-12863
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Affects Versions: 4.0.0
>Reporter: Nick Rabinowitz
>Priority: Minor
>
> I cannot find a way to manually create a table with non-nullable fields in 
> JS. When I create the fields and pass them in via {{Table.new}} the value of 
> {{nullable}} is overwritten.
> Example:
> {code:javascript}
> const type = new Utf8();
> const field = new Field('test', type, false);
> const column = Column.new(field, []);
> console.log(column.nullable); // false
> const table = Table.new(column);
> console.log(table.schema.fields[0].nullable); // true
> {code}
> The issue seems to be the hardcoded value here: 
> https://github.com/apache/arrow/blob/master/js/src/util/args.ts#L184





[jira] [Commented] (ARROW-12863) [JS] Field nullable value is overwritten

2022-01-16 Thread Dominik Moritz (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17476891#comment-17476891
 ] 

Dominik Moritz commented on ARROW-12863:


Seems to work fine in master now


{code:js}
const table = Arrow.makeTable({
a: Float32Array.from({ length: 10 }, () => Math.random()),
c: Arrow.vectorFromArray([1, 2, 3, null])
});

console.log(table.schema.fields[0].nullable);
console.log(table.schema.fields[1].nullable);
{code}

prints `false, true` as expected.


> [JS] Field nullable value is overwritten
> 
>
> Key: ARROW-12863
> URL: https://issues.apache.org/jira/browse/ARROW-12863
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Affects Versions: 4.0.0
>Reporter: Nick Rabinowitz
>Priority: Minor
>
> I cannot find a way to manually create a table with non-nullable fields in 
> JS. When I create the fields and pass them in via {{Table.new}} the value of 
> {{nullable}} is overwritten.
> Example:
> {code:javascript}
> const type = new Utf8();
> const field = new Field('test', type, false);
> const column = Column.new(field, []);
> console.log(column.nullable); // false
> const table = Table.new(column);
> console.log(table.schema.fields[0].nullable); // true
> {code}
> The issue seems to be the hardcoded value here: 
> https://github.com/apache/arrow/blob/master/js/src/util/args.ts#L184





[jira] [Comment Edited] (ARROW-13861) [JS] Create Field with List type will throw error

2022-01-16 Thread Dominik Moritz (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-13861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17471274#comment-17471274
 ] 

Dominik Moritz edited comment on ARROW-13861 at 1/16/22, 5:41 PM:
--

In https://github.com/apache/arrow/pull/10371 you can do

{code:js}
const vector = vectorFromArray([[1, 2], [2, 3]], new List(Field.new({ name: 
'field', type: new Uint8 })));
console.log(vector.toArray());
{code}


was (Author: domoritz):
In https://github.com/apache/arrow/pull/10371 you can do

{code:js}
const vector = vectorFromArray([[1, 2], [2, 3]], new List(Field.new({ name: 
'field', type: new Uint8 })));
console.log(vector.toArray());
{code}

```

> [JS] Create Field with List type will throw error
> -
>
> Key: ARROW-13861
> URL: https://issues.apache.org/jira/browse/ARROW-13861
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Affects Versions: 5.0.0
>Reporter: zizhao.chen
>Priority: Major
>  Labels: arrow
>
>  
> const field_vector=new ApacheArrow.Field(
>   "field_vector",
>   new ApacheArrow.List(new ApacheArrow.Float32(), 4),
>   false
> );
>  
> Will throw error: TypeError: Cannot read property 'children' of undefined
> Source code: 
> [https://github.com/apache/arrow/blob/master/js/src/schema.ts#L137]
>  
> If it's a bug, i'd like to fix this.
>  





[jira] [Resolved] (ARROW-13861) [JS] Create Field with List type will throw error

2022-01-16 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz resolved ARROW-13861.

Fix Version/s: 7.0.0
   Resolution: Cannot Reproduce

> [JS] Create Field with List type will throw error
> -
>
> Key: ARROW-13861
> URL: https://issues.apache.org/jira/browse/ARROW-13861
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Affects Versions: 5.0.0
>Reporter: zizhao.chen
>Priority: Major
>  Labels: arrow
> Fix For: 7.0.0
>
>
>  
> const field_vector=new ApacheArrow.Field(
>   "field_vector",
>   new ApacheArrow.List(new ApacheArrow.Float32(), 4),
>   false
> );
>  
> Will throw error: TypeError: Cannot read property 'children' of undefined
> Source code: 
> [https://github.com/apache/arrow/blob/master/js/src/schema.ts#L137]
>  
> If it's a bug, i'd like to fix this.
>  





[jira] [Closed] (ARROW-14650) [JS] toArray equivalent to values/values64

2022-01-16 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz closed ARROW-14650.
--

> [JS] toArray equivalent to values/values64
> --
>
> Key: ARROW-14650
> URL: https://issues.apache.org/jira/browse/ARROW-14650
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Nicholas Roberts
>Priority: Minor
> Fix For: 7.0.0
>
>
> As discussed in ARROW-10901, 64 bit integer vectors have values64 getters 
> available for systems with support for BigInt typed arrays. Column-oriented 
> dataframe libraries (such as UW's 
> [arquero|https://github.com/uwdata/arquero]) generally use the 
> Chunked::toArray convenience method in favour of directly dealing with chunks 
> or vectors, and therefore always receive the int32/uint32 data.
> I think there are a few alternatives for improving high level access to a 64 
> bit column's values:
>  * An optional bit width (or is64Bit, like the ::from variants) parameter 
> in Chunked::toArray, IntVector::toArray.
>  * A new Chunked::toArray64 method, and the same on IntVector (or at least, 
> the 64 bit variants).
>  * Use values64 directly in the consuming library (loop over the chunks, copy 
> into a destination typed array).
> The toArray64 option would probably be a bit of a mess (requiring a fallback 
> to toArray on BaseVector), an optional parameter might be the cleanest 
> approach.
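
The third alternative above (loop over the chunks and copy into a destination 
typed array) could look roughly like the sketch below. It assumes each chunk 
can be viewed as a BigInt64Array; concatValues64 is a hypothetical helper and 
not part of Arrow JS.

```javascript
// Hypothetical sketch of the third option: copy each chunk's 64-bit values
// into one destination typed array. `chunks` stands in for per-chunk
// BigInt64Array views (what the description calls values64).
function concatValues64(chunks) {
  const total = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
  const out = new BigInt64Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset); // bulk copy of one chunk
    offset += chunk.length;
  }
  return out;
}

const combined = concatValues64([
  BigInt64Array.of(1n, 2n),
  BigInt64Array.of(9007199254740993n), // not exactly representable as a JS number
]);
```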





[jira] [Resolved] (ARROW-14650) [JS] toArray equivalent to values/values64

2022-01-16 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz resolved ARROW-14650.

Fix Version/s: 7.0.0
   Resolution: Implemented

> [JS] toArray equivalent to values/values64
> --
>
> Key: ARROW-14650
> URL: https://issues.apache.org/jira/browse/ARROW-14650
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Nicholas Roberts
>Priority: Minor
> Fix For: 7.0.0
>
>
> As discussed in ARROW-10901, 64 bit integer vectors have values64 getters 
> available for systems with support for BigInt typed arrays. Column-oriented 
> dataframe libraries (such as UW's 
> [arquero|https://github.com/uwdata/arquero]) generally use the 
> Chunked::toArray convenience method in favour of directly dealing with chunks 
> or vectors, and therefore always receive the int32/uint32 data.
> I think there are a few alternatives for improving high level access to a 64 
> bit column's values:
>  * An optional bit width (or is64Bit, like the ::from variants) parameter 
> in Chunked::toArray, IntVector::toArray.
>  * A new Chunked::toArray64 method, and the same on IntVector (or at least, 
> the 64 bit variants).
>  * Use values64 directly in the consuming library (loop over the chunks, copy 
> into a destination typed array).
> The toArray64 option would probably be a bit of a mess (requiring a fallback 
> to toArray on BaseVector), an optional parameter might be the cleanest 
> approach.





[jira] [Closed] (ARROW-12536) [JS] Construct tables from JavaScript types

2022-01-16 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz closed ARROW-12536.
--

> [JS] Construct tables from JavaScript types
> ---
>
> Key: ARROW-12536
> URL: https://issues.apache.org/jira/browse/ARROW-12536
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Dominik Moritz
>Assignee: Dominik Moritz
>Priority: Major
> Fix For: 7.0.0
>
>
> Right now, Arrow has no automatic type inference for JavaScript types, so I 
> think we would need to add that. 
> {code:javascript}
> // Convert from JS types automatically
> const t = Arrow.Table.from({
>   Country: ["USA", "Canada", "Mexico"],
>   GDP: [123, 234, 345],
> })
> // I'd also like Arrow to support other common JS table formats:
> const t = Arrow.Table.from([
>   {Country: "USA", GDP: 123},
>   {Country: "Canada", GDP: 234},
>   {Country: "Mexico", GDP: 345},
> ])
> const t = Arrow.Table.from([
>   ["Country", "GDP"],
>   ["USA", 123],
>   ["Canada", 234],
>   ["Mexico", 345],
> ])
> {code}
> Thanks to Thiago for the suggestions!
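
A small sketch of what type inference over row objects might involve, 
assuming a very coarse set of types. The functions and type names below are 
placeholders for illustration and do not reflect how Arrow JS implements this.

```javascript
// Hypothetical sketch: turn an array of row objects into columns and infer a
// coarse type name per column. The type names are placeholders, not Arrow types.
function inferType(values) {
  const present = values.filter((v) => v !== null && v !== undefined);
  if (present.length === 0) return "null";
  if (present.every((v) => typeof v === "string")) return "utf8";
  if (present.every((v) => typeof v === "number")) return "float64";
  if (present.every((v) => typeof v === "boolean")) return "bool";
  throw new TypeError("mixed or unsupported column type");
}

function columnsFromRows(rows) {
  // Collect every key first so rows with missing keys stay aligned.
  const names = [...new Set(rows.flatMap((row) => Object.keys(row)))];
  const columns = {};
  for (const name of names) {
    const values = rows.map((row) => row[name] ?? null);
    columns[name] = { type: inferType(values), values };
  }
  return columns;
}

const columns = columnsFromRows([
  { Country: "USA", GDP: 123 },
  { Country: "Canada", GDP: 234 },
  { Country: "Mexico", GDP: 345 },
]);
```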





[jira] [Closed] (ARROW-12549) [JS] Table and RecordBatch should not extend Vector

2022-01-16 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz closed ARROW-12549.
--

Done in https://github.com/apache/arrow/pull/10371

> [JS] Table and RecordBatch should not extend Vector
> ---
>
> Key: ARROW-12549
> URL: https://issues.apache.org/jira/browse/ARROW-12549
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Dominik Moritz
>Assignee: Paul Taylor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 7.0.0
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Right now, Tables are chunked vectors and RecordBatches are struct vectors, 
> which means the classes are deeply linked. Instead of extending Vector, we 
> should copy the binary search to Table and copy BaseVector.prototype.get etc. 
> to RecordBatch. 
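
One way to picture the proposal in the description above: a table keeps its record batches plus an array of cumulative row offsets, and finds the batch for a row index with its own binary search instead of inheriting one from a chunked vector. The sketch below is illustrative only. `SketchBatch`, `SketchTable`, and their fields are hypothetical names chosen for this example and are not the Arrow JS implementation.

```javascript
// Hypothetical sketch: names and structure are assumptions, not Arrow JS internals.
class SketchBatch {
  constructor(rows) {
    this.rows = rows;            // plain array of row objects
    this.numRows = rows.length;
  }
  get(index) {
    return this.rows[index];
  }
}

class SketchTable {
  constructor(batches) {
    this.batches = batches;
    // offsets[i] is the global index of the first row in batches[i];
    // the final entry is the total row count.
    this.offsets = [0];
    for (const batch of batches) {
      this.offsets.push(this.offsets[this.offsets.length - 1] + batch.numRows);
    }
    this.numRows = this.offsets[this.offsets.length - 1];
  }

  // Binary search for the last batch whose starting offset is <= index.
  findBatch(index) {
    let lo = 0;
    let hi = this.batches.length - 1;
    while (lo < hi) {
      const mid = (lo + hi + 1) >>> 1;
      if (this.offsets[mid] <= index) lo = mid;
      else hi = mid - 1;
    }
    return lo;
  }

  // Look up a row by global index without going through a Vector base class.
  get(index) {
    if (index < 0 || index >= this.numRows) return null;
    const b = this.findBatch(index);
    return this.batches[b].get(index - this.offsets[b]);
  }
}
```

Composition keeps the table and batch classes independent of the vector hierarchy, at the cost of duplicating a small amount of lookup logic.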





[jira] [Resolved] (ARROW-12536) [JS] Construct tables from JavaScript types

2022-01-16 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz resolved ARROW-12536.

Fix Version/s: 7.0.0
 Assignee: Dominik Moritz
   Resolution: Implemented

Done in https://github.com/apache/arrow/pull/10371

> [JS] Construct tables from JavaScript types
> ---
>
> Key: ARROW-12536
> URL: https://issues.apache.org/jira/browse/ARROW-12536
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Dominik Moritz
>Assignee: Dominik Moritz
>Priority: Major
> Fix For: 7.0.0
>
>
> Right now, Arrow has no automatic type inference for JavaScript types, so I 
> think we would need to add that. 
> {code:javascript}
> // Convert from JS types automatically (columnar input)
> const fromColumns = Arrow.Table.from({
>   Country: ["USA", "Canada", "Mexico"],
>   GDP: [123, 234, 345],
> })
> // I'd also like Arrow to support other common JS table formats:
> // an array of row objects
> const fromObjects = Arrow.Table.from([
>   {Country: "USA", GDP: 123},
>   {Country: "Canada", GDP: 234},
>   {Country: "Mexico", GDP: 345},
> ])
> // an array of row arrays with a header row
> const fromRows = Arrow.Table.from([
>   ["Country", "GDP"],
>   ["USA", 123],
>   ["Canada", 234],
>   ["Mexico", 345],
> ])
> {code}
> Thanks to Thiago for the suggestions!





[jira] [Comment Edited] (ARROW-12536) [JS] Construct tables from JavaScript types

2022-01-16 Thread Dominik Moritz (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17476847#comment-17476847
 ] 

Dominik Moritz edited comment on ARROW-12536 at 1/16/22, 5:28 PM:
--

Done in https://github.com/apache/arrow/pull/10371. For now, we only support 
the columnar format. 


was (Author: domoritz):
Done in https://github.com/apache/arrow/pull/10371

> [JS] Construct tables from JavaScript types
> ---
>
> Key: ARROW-12536
> URL: https://issues.apache.org/jira/browse/ARROW-12536
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Dominik Moritz
>Assignee: Dominik Moritz
>Priority: Major
> Fix For: 7.0.0
>
>
> Right now, Arrow has no automatic type inference for JavaScript types, so I 
> think we would need to add that. 
> {code:javascript}
> // Convert from JS types automatically (columnar input)
> const fromColumns = Arrow.Table.from({
>   Country: ["USA", "Canada", "Mexico"],
>   GDP: [123, 234, 345],
> })
> // I'd also like Arrow to support other common JS table formats:
> // an array of row objects
> const fromObjects = Arrow.Table.from([
>   {Country: "USA", GDP: 123},
>   {Country: "Canada", GDP: 234},
>   {Country: "Mexico", GDP: 345},
> ])
> // an array of row arrays with a header row
> const fromRows = Arrow.Table.from([
>   ["Country", "GDP"],
>   ["USA", 123],
>   ["Canada", 234],
>   ["Mexico", 345],
> ])
> {code}
> Thanks to Thiago for the suggestions!
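
Since the comment above says only the columnar format is supported for now, the two row-oriented shapes from the description can be normalized to columns before being handed to Arrow. This is a minimal sketch under that assumption. `columnsFromObjects` and `columnsFromRowsWithHeader` are hypothetical helper names, not part of Arrow JS, and missing values are filled with `null` as an illustrative choice.

```javascript
// Hypothetical helpers (not part of Arrow JS): normalize row-oriented inputs
// to the columnar shape { columnName: [values...] }.

const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// [{Country: "USA", GDP: 123}, ...] -> {Country: [...], GDP: [...]}
function columnsFromObjects(rows) {
  const columns = {};
  rows.forEach((row, i) => {
    for (const key of Object.keys(row)) {
      // Backfill nulls if a key first appears in a later row.
      if (!hasOwn(columns, key)) columns[key] = new Array(i).fill(null);
      columns[key].push(row[key]);
    }
    // Pad columns that this row did not mention.
    for (const key of Object.keys(columns)) {
      if (columns[key].length < i + 1) columns[key].push(null);
    }
  });
  return columns;
}

// [["Country", "GDP"], ["USA", 123], ...] -> {Country: [...], GDP: [...]}
function columnsFromRowsWithHeader([header = [], ...rows]) {
  const columns = {};
  header.forEach((name, j) => {
    columns[name] = rows.map((row) => row[j]);
  });
  return columns;
}
```

Both helpers return the same `{Country: [...], GDP: [...]}` object as the first example in the description, so their output can be passed to whatever columnar constructor the library provides.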





[jira] [Resolved] (ARROW-12538) [JS] Show Vectors in the docs

2022-01-16 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz resolved ARROW-12538.

Fix Version/s: 7.0.0
   Resolution: Fixed

Done in https://github.com/apache/arrow/pull/10371

> [JS] Show Vectors in the docs
> -
>
> Key: ARROW-12538
> URL: https://issues.apache.org/jira/browse/ARROW-12538
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: JavaScript
>Reporter: Dominik Moritz
>Assignee: Dominik Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 7.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (ARROW-14933) [JS] apache-arrow does not compile with typescript when types are checked

2022-01-16 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz resolved ARROW-14933.

Fix Version/s: 7.0.0
 Assignee: (was: Dominik Moritz)
   Resolution: Fixed

Done in https://github.com/apache/arrow/pull/10371

> [JS] apache-arrow does not compile with typescript when types are checked
> -
>
> Key: ARROW-14933
> URL: https://issues.apache.org/jira/browse/ARROW-14933
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Affects Versions: 6.0.1
> Environment: Node v16.6.0
> TypeScript v4.5.2
> Apache Arrow v6.0.1
>Reporter: Fritz Lekschas
>Priority: Major
> Fix For: 7.0.0
>
>
> Similar to 
> [https://stackoverflow.com/questions/66956030/apache-arrow-does-not-compile-with-typescript]
> Apache Arrow v6.0 does not compile with TypeScript v4.5 when Apache Arrow's 
> types are checked (e.g., `"skipLibCheck": false`).
> When I add `import { Table } from '@apache-arrow/ts';`, I get 
> several errors like the following:
>  * node_modules/apache-arrow/util/buffer.d.ts:10:328 - error TS2304: Cannot 
> find name 'ReadableStreamReadResult'.
>  * node_modules/apache-arrow/Arrow.d.ts:86:540 - error TS2304: Cannot find 
> name 'ReadableStreamReadDoneResult'
>  * node_modules/apache-arrow/recordbatch.d.ts:24:22 - error TS2415: Class 
> 'RecordBatch' incorrectly extends base class 'StructVector'.
>  * node_modules/apache-arrow/io/interfaces.d.ts:61:18 - error TS2304: Cannot 
> find name 'PipeOptions'.
>  * ...
> I've created a repo that reproduces the errors: 
> [https://github.com/flekschas/apache-arrow-typescript]
> Are those errors expected/known, or does apache-arrow require a special 
> TypeScript config?





[jira] [Closed] (ARROW-12538) [JS] Show Vectors in the docs

2022-01-16 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz closed ARROW-12538.
--

> [JS] Show Vectors in the docs
> -
>
> Key: ARROW-12538
> URL: https://issues.apache.org/jira/browse/ARROW-12538
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: JavaScript
>Reporter: Dominik Moritz
>Assignee: Dominik Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 7.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Closed] (ARROW-14933) [JS] apache-arrow does not compile with typescript when types are checked

2022-01-16 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz closed ARROW-14933.
--

> [JS] apache-arrow does not compile with typescript when types are checked
> -
>
> Key: ARROW-14933
> URL: https://issues.apache.org/jira/browse/ARROW-14933
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Affects Versions: 6.0.1
> Environment: Node v16.6.0
> TypeScript v4.5.2
> Apache Arrow v6.0.1
>Reporter: Fritz Lekschas
>Priority: Major
> Fix For: 7.0.0
>
>
> Similar to 
> [https://stackoverflow.com/questions/66956030/apache-arrow-does-not-compile-with-typescript]
> Apache Arrow v6.0 does not compile with TypeScript v4.5 when Apache Arrow's 
> types are checked (e.g., `"skipLibCheck": false`).
> When I add `import { Table } from '@apache-arrow/ts';`, I get 
> several errors like the following:
>  * node_modules/apache-arrow/util/buffer.d.ts:10:328 - error TS2304: Cannot 
> find name 'ReadableStreamReadResult'.
>  * node_modules/apache-arrow/Arrow.d.ts:86:540 - error TS2304: Cannot find 
> name 'ReadableStreamReadDoneResult'
>  * node_modules/apache-arrow/recordbatch.d.ts:24:22 - error TS2415: Class 
> 'RecordBatch' incorrectly extends base class 'StructVector'.
>  * node_modules/apache-arrow/io/interfaces.d.ts:61:18 - error TS2304: Cannot 
> find name 'PipeOptions'.
>  * ...
> I've created a repo that reproduces the errors: 
> [https://github.com/flekschas/apache-arrow-typescript]
> Are those errors expected/known, or does apache-arrow require a special 
> TypeScript config?





[jira] [Resolved] (ARROW-13514) [JS] Update flatbuffers

2022-01-16 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz resolved ARROW-13514.

Resolution: Fixed

Done in https://github.com/apache/arrow/pull/10371

> [JS] Update flatbuffers
> ---
>
> Key: ARROW-13514
> URL: https://issues.apache.org/jira/browse/ARROW-13514
> Project: Apache Arrow
>  Issue Type: Task
>  Components: JavaScript
>Reporter: Dominik Moritz
>Assignee: Dominik Moritz
>Priority: Major
> Fix For: 7.0.0
>
>
> * Update the flatbuffers npm package to version 2
> * Remove @types/flatbuffers since flatbuffers comes with its own typings
> * Update the generated flatbuffers





[jira] [Closed] (ARROW-13514) [JS] Update flatbuffers

2022-01-16 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz closed ARROW-13514.
--

> [JS] Update flatbuffers
> ---
>
> Key: ARROW-13514
> URL: https://issues.apache.org/jira/browse/ARROW-13514
> Project: Apache Arrow
>  Issue Type: Task
>  Components: JavaScript
>Reporter: Dominik Moritz
>Assignee: Dominik Moritz
>Priority: Major
> Fix For: 7.0.0
>
>
> * Update the flatbuffers npm package to version 2
> * Remove @types/flatbuffers since flatbuffers comes with its own typings
> * Update the generated flatbuffers





[jira] [Closed] (ARROW-10220) [JS] Cache javascript utf-8 dictionary keys?

2022-01-16 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz closed ARROW-10220.
--

> [JS] Cache javascript utf-8 dictionary keys?
> 
>
> Key: ARROW-10220
> URL: https://issues.apache.org/jira/browse/ARROW-10220
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Affects Versions: 1.0.1
>Reporter: Ben Schmidt
>Assignee: Dominik Moritz
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 7.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> String decoding from Arrow tables is a major bottleneck when using Arrow in 
> JavaScript: it can take a second to decode a million rows. For utf-8 types, 
> I'm not sure what could be done, but some memoization would help utf-8 
> dictionary types.
> Currently, the JavaScript implementation decodes a utf-8 string every time 
> you request an item from a dictionary with utf-8 data. If Arrow cached the 
> decoded strings in a native JS Map, routine operations like looping over all 
> the entries in a text column might be on the order of 10x faster. Here's an 
> observable notebook [benchmarking that and a couple other 
> strategies|https://observablehq.com/@bmschmidt/faster-arrow-dictionary-unpacking].
> I would file a pull request, but 1) I would have to learn some typescript to 
> do so, and 2) this idea may be undesirable because it creates new objects 
> that will increase the memory footprint of a table, rather than just using 
> the typed arrays.
> Some discussion of how the real-world issues here affect the arquero project 
> is [here|https://github.com/uwdata/arquero/issues/1].
>  
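
The memoization idea in the description can be sketched as follows: decode each dictionary entry at most once and serve repeated lookups from a `Map`. This is an illustration under stated assumptions, not the code merged into Arrow JS. `CachedUtf8Dictionary`, its buffer layout, and the `decodeCount` counter are hypothetical and exist only to make the effect visible.

```javascript
// Hypothetical sketch of memoized dictionary decoding; not the Arrow JS code.
const decoder = new TextDecoder("utf-8");

class CachedUtf8Dictionary {
  // values: Uint8Array of concatenated UTF-8 bytes
  // offsets: Int32Array where entry k spans values[offsets[k] .. offsets[k + 1])
  constructor(values, offsets) {
    this.values = values;
    this.offsets = offsets;
    this.cache = new Map();   // dictionary key -> decoded string
    this.decodeCount = 0;     // instrumentation for this example only
  }

  get(key) {
    let text = this.cache.get(key);
    if (text === undefined) {
      const bytes = this.values.subarray(this.offsets[key], this.offsets[key + 1]);
      text = decoder.decode(bytes);
      this.decodeCount++;
      this.cache.set(key, text);
    }
    return text;
  }
}

// Build a tiny dictionary and a column of indices that point into it.
const encoder = new TextEncoder();
const words = ["USA", "Canada", "México"];
const encoded = words.map((w) => encoder.encode(w));
const offsets = new Int32Array(words.length + 1);
encoded.forEach((bytes, k) => { offsets[k + 1] = offsets[k] + bytes.length; });
const values = new Uint8Array(offsets[words.length]);
encoded.forEach((bytes, k) => values.set(bytes, offsets[k]));

const dictionary = new CachedUtf8Dictionary(values, offsets);
const indices = [0, 1, 1, 2, 0, 2, 2];
const column = indices.map((k) => dictionary.get(k));
```

The trade-off is the one the reporter raises: the cache holds one JavaScript string per distinct dictionary entry, so memory use grows with dictionary size rather than staying confined to the typed arrays.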





[jira] [Closed] (ARROW-12548) [JS] Get rid of columns

2022-01-16 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz closed ARROW-12548.
--

> [JS] Get rid of columns
> ---
>
> Key: ARROW-12548
> URL: https://issues.apache.org/jira/browse/ARROW-12548
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Dominik Moritz
>Assignee: Dominik Moritz
>Priority: Major
> Fix For: 7.0.0
>
>
> Just use the name Child (as we have for Vectors). 





[jira] [Resolved] (ARROW-10220) [JS] Cache javascript utf-8 dictionary keys?

2022-01-16 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz resolved ARROW-10220.

Fix Version/s: 7.0.0
   Resolution: Implemented

Done in https://github.com/apache/arrow/pull/10371

> [JS] Cache javascript utf-8 dictionary keys?
> 
>
> Key: ARROW-10220
> URL: https://issues.apache.org/jira/browse/ARROW-10220
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Affects Versions: 1.0.1
>Reporter: Ben Schmidt
>Assignee: Dominik Moritz
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 7.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> String decoding from Arrow tables is a major bottleneck when using Arrow in 
> JavaScript: it can take a second to decode a million rows. For utf-8 types, 
> I'm not sure what could be done, but some memoization would help utf-8 
> dictionary types.
> Currently, the JavaScript implementation decodes a utf-8 string every time 
> you request an item from a dictionary with utf-8 data. If Arrow cached the 
> decoded strings in a native JS Map, routine operations like looping over all 
> the entries in a text column might be on the order of 10x faster. Here's an 
> observable notebook [benchmarking that and a couple other 
> strategies|https://observablehq.com/@bmschmidt/faster-arrow-dictionary-unpacking].
> I would file a pull request, but 1) I would have to learn some typescript to 
> do so, and 2) this idea may be undesirable because it creates new objects 
> that will increase the memory footprint of a table, rather than just using 
> the typed arrays.
> Some discussion of how the real-world issues here affect the arquero project 
> is [here|https://github.com/uwdata/arquero/issues/1].
>  





[jira] [Reopened] (ARROW-11347) [JS] Consider Objects instead of Maps

2022-01-16 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz reopened ARROW-11347:


> [JS] Consider Objects instead of Maps
> -
>
> Key: ARROW-11347
> URL: https://issues.apache.org/jira/browse/ARROW-11347
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Dominik Moritz
>Priority: Major
>  Labels: performance
> Fix For: 7.0.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> A quick experiment 
> (https://observablehq.com/@domoritz/performance-of-maps-vs-objects) seems to 
> show that object accesses are a lot faster than map accesses. Would it make 
> sense to switch to objects in the row API to improve performance? 
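
The comparison described above can be reproduced locally with a small micro-benchmark along these lines. It is a rough sketch, not the linked notebook: the row contents, field names, and iteration count are made up for illustration, and absolute timings depend heavily on the JavaScript engine and on how the loop is optimized.

```javascript
// Illustrative micro-benchmark; field names and sizes are assumptions.
const rowObject = { Country: "USA", GDP: 123, Population: 331, Area: 9834 };
const rowMap = new Map(Object.entries(rowObject));

// Sum three numeric fields via property access.
function sumViaObject(row, iterations) {
  let total = 0;
  for (let i = 0; i < iterations; i++) {
    total += row.GDP + row.Population + row.Area;
  }
  return total;
}

// Sum the same fields via Map lookups.
function sumViaMap(row, iterations) {
  let total = 0;
  for (let i = 0; i < iterations; i++) {
    total += row.get("GDP") + row.get("Population") + row.get("Area");
  }
  return total;
}

// Run fn once, log the elapsed wall-clock time, and return its result.
function time(label, fn) {
  const start = Date.now();
  const result = fn();
  console.log(`${label}: ${Date.now() - start} ms`);
  return result;
}

const iterations = 1000000;
const objectTotal = time("object access", () => sumViaObject(rowObject, iterations));
const mapTotal = time("map access", () => sumViaMap(rowMap, iterations));
```

Both loops compute the same total, so any difference in the logged times comes from the access path alone. A single run like this is noisy, so repeat it several times before drawing conclusions.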





[jira] [Resolved] (ARROW-12548) [JS] Get rid of columns

2022-01-16 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz resolved ARROW-12548.

Fix Version/s: 7.0.0
   Resolution: Fixed

Done in https://github.com/apache/arrow/pull/10371

> [JS] Get rid of columns
> ---
>
> Key: ARROW-12548
> URL: https://issues.apache.org/jira/browse/ARROW-12548
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Dominik Moritz
>Assignee: Dominik Moritz
>Priority: Major
> Fix For: 7.0.0
>
>
> Just use the name Child (as we have for Vectors). 





[jira] [Resolved] (ARROW-11347) [JS] Consider Objects instead of Maps

2022-01-16 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz resolved ARROW-11347.

Resolution: Invalid

> [JS] Consider Objects instead of Maps
> -
>
> Key: ARROW-11347
> URL: https://issues.apache.org/jira/browse/ARROW-11347
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Dominik Moritz
>Priority: Major
>  Labels: performance
> Fix For: 7.0.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> A quick experiment 
> (https://observablehq.com/@domoritz/performance-of-maps-vs-objects) seems to 
> show that object accesses are a lot faster than map accesses. Would it make 
> sense to switch to objects in the row API to improve performance? 





[jira] [Closed] (ARROW-11347) [JS] Consider Objects instead of Maps

2022-01-16 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz closed ARROW-11347.
--
Assignee: Dominik Moritz

> [JS] Consider Objects instead of Maps
> -
>
> Key: ARROW-11347
> URL: https://issues.apache.org/jira/browse/ARROW-11347
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Dominik Moritz
>Assignee: Dominik Moritz
>Priority: Major
>  Labels: performance
> Fix For: 7.0.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> A quick experiment 
> (https://observablehq.com/@domoritz/performance-of-maps-vs-objects) seems to 
> show that object accesses are a lot faster than map accesses. Would it make 
> sense to switch to objects in the row API to improve performance? 





[jira] [Resolved] (ARROW-11347) [JS] Consider Objects instead of Maps

2022-01-16 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz resolved ARROW-11347.

Fix Version/s: 7.0.0
   Resolution: Fixed

Done in https://github.com/apache/arrow/pull/10371

> [JS] Consider Objects instead of Maps
> -
>
> Key: ARROW-11347
> URL: https://issues.apache.org/jira/browse/ARROW-11347
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Dominik Moritz
>Priority: Major
>  Labels: performance
> Fix For: 7.0.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> A quick experiment 
> (https://observablehq.com/@domoritz/performance-of-maps-vs-objects) seems to 
> show that object accesses are a lot faster than map accesses. Would it make 
> sense to switch to objects in the row API to improve performance? 





[jira] [Resolved] (ARROW-12549) [JS] Table and RecordBatch should not extend Vector

2022-01-16 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz resolved ARROW-12549.

Resolution: Fixed

Issue resolved by pull request 10371
[https://github.com/apache/arrow/pull/10371]

> [JS] Table and RecordBatch should not extend Vector
> ---
>
> Key: ARROW-12549
> URL: https://issues.apache.org/jira/browse/ARROW-12549
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Dominik Moritz
>Assignee: Paul Taylor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 7.0.0
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Right now, Tables are chunked vectors and RecordBatches are struct vectors, 
> which means the classes are deeply linked. Instead of extending Vector, we 
> should copy the binary search to Table and copy BaseVector.prototype.get etc. 
> to RecordBatch. 





[jira] [Assigned] (ARROW-12549) [JS] Table and RecordBatch should not extend Vector

2022-01-16 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz reassigned ARROW-12549:
--

Assignee: Paul Taylor  (was: Dominik Moritz)

> [JS] Table and RecordBatch should not extend Vector
> ---
>
> Key: ARROW-12549
> URL: https://issues.apache.org/jira/browse/ARROW-12549
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Dominik Moritz
>Assignee: Paul Taylor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 7.0.0
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Right now, Tables are chunked vectors and RecordBatches are struct vectors, 
> which means the classes are deeply linked. Instead of extending Vector, we 
> should copy the binary search to Table and copy BaseVector.prototype.get etc. 
> to RecordBatch. 





[jira] [Assigned] (ARROW-12549) [JS] Table and RecordBatch should not extend Vector

2022-01-16 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz reassigned ARROW-12549:
--

Assignee: Paul Taylor

> [JS] Table and RecordBatch should not extend Vector
> ---
>
> Key: ARROW-12549
> URL: https://issues.apache.org/jira/browse/ARROW-12549
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Dominik Moritz
>Assignee: Paul Taylor
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Right now, Tables are chunked vectors and RecordBatches are struct vectors, 
> which means the classes are deeply linked. Instead of extending Vector, we 
> should copy the binary search to Table and copy BaseVector.prototype.get etc. 
> to RecordBatch. 





[jira] [Updated] (ARROW-12549) [JS] Table and RecordBatch should not extend Vector

2022-01-16 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz updated ARROW-12549:
---
Fix Version/s: 7.0.0

> [JS] Table and RecordBatch should not extend Vector
> ---
>
> Key: ARROW-12549
> URL: https://issues.apache.org/jira/browse/ARROW-12549
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Dominik Moritz
>Assignee: Dominik Moritz
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 7.0.0
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Right now, Tables are chunked vectors and RecordBatches are struct vectors, 
> which means the classes are deeply linked. Instead of extending Vector, we 
> should copy the binary search to Table and copy BaseVector.prototype.get etc. 
> to RecordBatch. 





[jira] [Assigned] (ARROW-12549) [JS] Table and RecordBatch should not extend Vector

2022-01-16 Thread Dominik Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz reassigned ARROW-12549:
--

Assignee: Dominik Moritz  (was: Paul Taylor)

> [JS] Table and RecordBatch should not extend Vector
> ---
>
> Key: ARROW-12549
> URL: https://issues.apache.org/jira/browse/ARROW-12549
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Dominik Moritz
>Assignee: Dominik Moritz
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Right now, Tables are chunked vectors and RecordBatches are struct vectors, 
> which means the classes are deeply linked. Instead of extending Vector, we 
> should copy the binary search to Table and copy BaseVector.prototype.get etc. 
> to RecordBatch. 





[jira] [Commented] (ARROW-13514) [JS] Update flatbuffers

2022-01-14 Thread Dominik Moritz (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-13514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17476190#comment-17476190
 ] 

Dominik Moritz commented on ARROW-13514:


Already done in https://github.com/apache/arrow/pull/10371. We're just wrapping 
up some benchmarking and fixing performance regressions. 

> [JS] Update flatbuffers
> ---
>
> Key: ARROW-13514
> URL: https://issues.apache.org/jira/browse/ARROW-13514
> Project: Apache Arrow
>  Issue Type: Task
>  Components: JavaScript
>Reporter: Dominik Moritz
>Assignee: Dominik Moritz
>Priority: Major
> Fix For: 7.0.0
>
>
> * Update the flatbuffers npm package to version 2
> * Remove @types/flatbuffers since flatbuffers comes with its own typings
> * Update the generated flatbuffers




