rishvin commented on issue #1941: URL: https://github.com/apache/datafusion-comet/issues/1941#issuecomment-3111745019
> Some updates:
> I have a simple test to start with, which produces `_groupingmapsort`.
>
> ```
> val data = Seq(
>   Map("a" -> 1, "b" -> 2),
>   Map("a" -> 3, "b" -> 4),
>   Map("b" -> 2, "a" -> 1)
> )
>
> val df = data.toDF("map")
> df.groupBy("map").count().show(false)
> ```
>
> Based on my understanding so far, this will require some plumbing in the Parquet reader utils in Scala, because we currently support only primitive types and `MapType` is a complex type. I hacked around some of the type checking, but `Native.initColumnReader()` expects a `primitiveTypeId`, and I still have to understand how `MapType` would translate to one. The closest match at the moment seems to be Parquet's `BINARY` physical type, so I am thinking of passing `BINARY` and annotating it with `MapKeyValueTypeAnnotation`. However, I'm still trying to understand this piece of code, so I might be wrong. I will do some more experiments to get more clarity on this.

I managed to scan the map type by setting `CometConf.COMET_NATIVE_SCAN_IMPL.key -> native_datafusion`, and I added a `map_sort` UDF whose return type is `Map`. However, it looks like a group-by on the map type is not possible at the moment. The native code throws an error like `Not yet implemented: not yet implemented: Map(...)`, which seems to come from Arrow. So, for now, we cannot use `MapType` for grouping, and we might have to translate the map into a simpler type. I will explore this further.

-- 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org