[ https://issues.apache.org/jira/browse/ARROW-15828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502126#comment-17502126 ]

Nicola Crane commented on ARROW-15828:
--------------------------------------

Thanks for reporting this! My best guess is that a ChunkedArray is typically 
created by reading in data in chunks (i.e. smaller Arrays) that are not stored 
in contiguous sections of memory, and the chunking reflects this. When the 
values are cast to a new data type, a new data structure is created in a 
contiguous section of memory, i.e. a single Array. I think this is likely 
intentional behaviour. Is this change from multiple chunks to a single Array 
causing you any particular issues?

> [Python][R] ChunkedArray's cast() method combine multiple arrays into one
> -------------------------------------------------------------------------
>
>                 Key: ARROW-15828
>                 URL: https://issues.apache.org/jira/browse/ARROW-15828
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python, R
>    Affects Versions: 7.0.0
>            Reporter: SHIMA Tatsuya
>            Priority: Major
>
> It appears that casting to an integer or float type combines all the chunks into a single array, whereas casting to string or date32 keeps the original chunks.
> {code:r}
> library(arrow, warn.conflicts = FALSE)
> #> See arrow_info() for available features
> chunked_array(1:2, 3:4, 5:6)$cast(string())
> #> ChunkedArray
> #> [
> #>   [
> #>     "1",
> #>     "2"
> #>   ],
> #>   [
> #>     "3",
> #>     "4"
> #>   ],
> #>   [
> #>     "5",
> #>     "6"
> #>   ]
> #> ]
> chunked_array(1:2, 3:4, 5:6)$cast(float64())
> #> ChunkedArray
> #> [
> #>   [
> #>     1,
> #>     2,
> #>     3,
> #>     4,
> #>     5,
> #>     6
> #>   ]
> #> ]
> chunked_array(1:2, 3:4, 5:6)$cast(int64())
> #> ChunkedArray
> #> [
> #>   [
> #>     1,
> #>     2,
> #>     3,
> #>     4,
> #>     5,
> #>     6
> #>   ]
> #> ]
> chunked_array(1:2, 3:4, 5:6)$cast(date32())
> #> ChunkedArray
> #> [
> #>   [
> #>     1970-01-02,
> #>     1970-01-03
> #>   ],
> #>   [
> #>     1970-01-04,
> #>     1970-01-05
> #>   ],
> #>   [
> #>     1970-01-06,
> #>     1970-01-07
> #>   ]
> #> ]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
