drin commented on PR #39836:
URL: https://github.com/apache/arrow/pull/39836#issuecomment-2210215451

   > I'm not entirely sure I've understood your distinction between the two 
cases. Am I correct in understanding that in the first of the two options that 
you describe a row containing [[1], [1,2,3,4]] would have a different hash value 
to [[1,1], [2,3,4]] but in the second, 'value-based' option these two lists 
would have the same hash?
   
   Oh yeah, I guess I didn't make that part clear. But yes, you understood 
correctly. I was in favor of `[[1], [1, 2, 3, 4]]` having the **same** hash as 
`[[1, 1], [2, 3, 4]]` (what I called the "value-based" option).
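   
   To make the "value-based" option concrete, here is a rough sketch in plain 
Python. It is not the Arrow hashing code: `flatten` and `value_based_hash` are 
hypothetical helpers, and SHA-256 is only a stand-in for whatever hash function 
the kernel actually uses. The point is just that once list boundaries are 
discarded, both rows feed the same values to the hash.

```python
import hashlib


def flatten(row):
    """Yield leaf values depth-first, discarding all list boundaries."""
    for item in row:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item


def value_based_hash(row):
    """Hash only the flattened leaf values of a nested-list row.

    Assumption: leaves are integers that fit in 64 bits; SHA-256 is a
    placeholder, not the hash Arrow uses.
    """
    h = hashlib.sha256()
    for v in flatten(row):
        h.update(int(v).to_bytes(8, "little", signed=True))
    return h.hexdigest()


a = [[1], [1, 2, 3, 4]]
b = [[1, 1], [2, 3, 4]]

# Both rows flatten to 1, 1, 2, 3, 4, so the hashes collide by design.
assert value_based_hash(a) == value_based_hash(b)
```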
   
   If you want those to have different hashes, then it may be possible to 
transform the list array into a struct array, where one struct column holds all 
of the offsets (coalesced) and a second struct column holds all of the values 
(referenced). Doing it any other way would require extending the existing 
hashing functions in non-trivial ways or adding new hashing functions entirely.
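   
   Roughly, the offsets-plus-values idea could look like the sketch below. 
Again this is plain Python rather than Arrow code: `offsets_and_values` and 
`structure_aware_hash` are hypothetical names, SHA-256 is a stand-in hash, and 
only one level of nesting is handled. Because the offsets are hashed alongside 
the values, rows with the same values but different list boundaries no longer 
collide.

```python
import hashlib


def offsets_and_values(row):
    """Split a row of lists into Arrow-style offsets plus a flat values list."""
    offsets, values = [0], []
    for child in row:
        values.extend(child)
        offsets.append(len(values))
    return offsets, values


def structure_aware_hash(row):
    """Hash the offsets "column" and the values "column" together.

    Assumption: values are integers that fit in 64 bits; SHA-256 is a
    placeholder, not the hash Arrow uses.
    """
    offsets, values = offsets_and_values(row)
    h = hashlib.sha256()
    for column in (offsets, values):
        # Length prefix keeps the two columns from running into each other.
        h.update(len(column).to_bytes(8, "little"))
        for v in column:
            h.update(int(v).to_bytes(8, "little", signed=True))
    return h.hexdigest()


a = [[1], [1, 2, 3, 4]]
b = [[1, 1], [2, 3, 4]]

# Same values, different offsets ([0, 1, 5] vs [0, 2, 5]), so different hashes.
assert structure_aware_hash(a) != structure_aware_hash(b)
```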
   
   So, maybe for this PR I can simply do shallow nested arrays and someone can 
work on the deep nested arrays in a follow-up PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
