[ 
https://issues.apache.org/jira/browse/ARROW-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-5274:
------------------------------
    External issue URL: https://github.com/apache/arrow/issues/21744

> [JavaScript] Wrong array type for countBy
> -----------------------------------------
>
>                 Key: ARROW-5274
>                 URL: https://issues.apache.org/jira/browse/ARROW-5274
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: JavaScript
>            Reporter: Yngve Kristiansen
>            Assignee: Yngve Kristiansen
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 0.14.0
>
>   Original Estimate: 5m
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> The {{countBy}} function returns incorrect histograms because it appears to 
> select the wrong typed-array type for the counts.
> The following line in {{countBy}} seems to cause the problem:
> {{const countByteLength = Math.ceil(Math.log(vector.dictionary.length) / 
> Math.log(256));}}
> For example, if the dictionary length is 3 but the indices length is 1 
> million, this expression evaluates to 1, which selects a Uint8Array. Any 
> count above 255 then overflows.
> Codepen example
>  [https://codepen.io/Yngve92/pen/mYdWrr]
> If I switch the expression to {{const countByteLength = 
> Math.ceil(Math.log(vector.length) / Math.log(256));}}, it seems to work 
> correctly, but I am not sure whether this is the right fix.
> The expression appears on lines 63 and 189 of src/compute/dataframe.ts.
>  
> PR submitted: [https://github.com/apache/arrow/pull/4265] 
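
The sizing problem described in the issue can be reproduced in isolation. The sketch below is not the Arrow implementation: the helpers countByteLength, countsArrayFor, and histogram are hypothetical names, and the mapping from byte length to typed array is an assumption made for illustration. It only shows why sizing the counts array from the dictionary length wraps around, while sizing it from the number of rows does not.

```typescript
// Hypothetical helper mirroring the expression quoted in the issue:
// the number of bytes needed to represent values up to roughly n.
function countByteLength(n: number): number {
  return Math.ceil(Math.log(n) / Math.log(256));
}

// Assumed mapping from byte length to a typed array (illustration only).
function countsArrayFor(
  byteLength: number,
  size: number
): Uint8Array | Uint16Array | Uint32Array {
  if (byteLength >= 3) return new Uint32Array(size);
  if (byteLength === 2) return new Uint16Array(size);
  return new Uint8Array(size);
}

// Count how often each dictionary key occurs. `sizeBy` is the value
// used to choose the width of the counters.
function histogram(
  indices: Uint8Array,
  dictionaryLength: number,
  sizeBy: number
): Uint8Array | Uint16Array | Uint32Array {
  const counts = countsArrayFor(countByteLength(sizeBy), dictionaryLength);
  for (let i = 0; i < indices.length; i++) {
    counts[indices[i]]++;
  }
  return counts;
}

const dictionaryLength = 3;
// One million rows, all zero, so every row refers to dictionary key 0.
const indices = new Uint8Array(1000000);

// Sized by dictionary length: 1-byte counters, which wrap modulo 256.
const sizedByDictionary = histogram(indices, dictionaryLength, dictionaryLength);
// Sized by row count: 3 bytes are needed, so 32-bit counters are used.
const sizedByRowCount = histogram(indices, dictionaryLength, indices.length);

console.log(sizedByDictionary[0]); // 64, i.e. 1000000 mod 256
console.log(sizedByRowCount[0]);   // 1000000
```

The counter width has to accommodate the largest possible count, which is bounded by the number of rows rather than by the number of distinct keys, and that is the reasoning behind the change proposed in the issue.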



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
