drin commented on PR #39836: URL: https://github.com/apache/arrow/pull/39836#issuecomment-2210215451
> I'm not entirely sure I've understood your distinction between the two cases. Am I correct in understanding that in the first of the two options that you describe a row containing [[1], [1,2,3,4]] would have a different hash value to [[1,1], [2,3,4]] but in the second, 'value-based' option these two lists would have the same hash?

Oh yeah, I guess I didn't make that part clear. But yes, you understood correctly. I was in favor of `[[1], [1, 2, 3, 4]]` having the **same** hash as `[[1, 1], [2, 3, 4]]` (what I called the "value-based" option).

If you want those to have different hashes, then it may be possible to transform the list array into a struct array in which one struct column holds all of the offsets (coalesced) and a second struct column holds all of the values (referenced). Doing it any other way would require extending the existing hashing functions in non-trivial ways or adding entirely new hashing functions.

So, maybe for this PR I can simply handle shallow nested arrays, and someone can work on deeply nested arrays in a follow-up PR.
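
To make the two options concrete, here is a minimal pure-Python sketch. It is not Arrow's hashing code: `sha256` stands in for the real hash function, and the helper names (`to_offsets_and_values`, `value_based_hash`, `offset_aware_hash`) are hypothetical. It only assumes the usual list layout of an offsets buffer plus a flat values buffer.

```python
from hashlib import sha256


def to_offsets_and_values(rows):
    """Flatten a list of lists into list-array-style offsets and flat values."""
    offsets, values = [0], []
    for row in rows:
        values.extend(row)
        offsets.append(len(values))
    return offsets, values


def _digest(*columns):
    # Stand-in hash (assumption): feed each column's repr into sha256.
    h = sha256()
    for col in columns:
        h.update(repr(col).encode())
        h.update(b"|")  # separator between columns
    return h.hexdigest()


def value_based_hash(rows):
    """Hash only the flattened values; the list boundaries are ignored."""
    _, values = to_offsets_and_values(rows)
    return _digest(values)


def offset_aware_hash(rows):
    """Hash offsets and values together, like a two-column struct."""
    offsets, values = to_offsets_and_values(rows)
    return _digest(offsets, values)


a = [[1], [1, 2, 3, 4]]
b = [[1, 1], [2, 3, 4]]

# Both rows flatten to the same values, but their offsets differ.
print(to_offsets_and_values(a))
print(to_offsets_and_values(b))
print(value_based_hash(a) == value_based_hash(b))
print(offset_aware_hash(a) == offset_aware_hash(b))
```

Under these assumptions the value-based hashes match because both inputs flatten to the same values, while the offset-aware hashes differ because the list boundaries fall in different places.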
