[ 
https://issues.apache.org/jira/browse/SPARK-18134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15663950#comment-15663950
 ] 

Christian Zorneck commented on SPARK-18134:
-------------------------------------------

We have many use cases. In general, we want to GROUP BY map columns or apply 
DISTINCT to data that contains them.

One of our use cases: our input event format is line-based JSON. The first 
level has a defined set of attributes. One of these attributes is the generic 
part of the event, which is a map/object/dictionary (Map<String, String>). 
During unstaging we merge the new data with the existing data and apply 
DISTINCT to avoid duplicates. Deduplication is needed even without the merge, 
because data/event collector software does not guarantee that it sends no 
duplicates. So we have to apply DISTINCT to the data.
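
To make the deduplication requirement concrete, here is a minimal sketch in 
plain Python (not Spark). It only illustrates the semantics we need: two 
events count as duplicates when every attribute matches, including the nested 
map, regardless of key order. The field names "type" and "attrs" are 
hypothetical and not our real schema.

```python
import json


def distinct_events(lines):
    """Parse line-based JSON events and drop exact duplicates.

    Illustrative only: field names used by callers are assumptions.
    """
    seen = set()
    unique = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        # sort_keys=True yields a canonical string, so the order of keys
        # (top level and inside the nested map) does not affect equality.
        key = json.dumps(event, sort_keys=True)
        if key not in seen:
            seen.add(key)
            unique.append(event)
    return unique


if __name__ == "__main__":
    sample = [
        '{"type": "click", "attrs": {"a": "1", "b": "2"}}',
        '{"attrs": {"b": "2", "a": "1"}, "type": "click"}',  # same event
        '{"type": "view", "attrs": {"a": "1"}}',
    ]
    for event in distinct_events(sample):
        print(event)
```

The canonical string stands in for the map comparison that DISTINCT would 
have to perform on a map column.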

Another use case is that we often have GROUP BY clauses in our aggregations, 
and these involve map columns. In some cases, I do not know at that point 
which attributes the map contains.
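
As a rough illustration of grouping by a map whose attributes are not known 
in advance, the following plain Python sketch (again not Spark) turns each map 
into a sorted tuple of key/value pairs and counts events per distinct map. The 
helper names and the "attrs" field are assumptions made for this example.

```python
from collections import Counter


def map_key(attrs):
    """Canonical, hashable form of a string-to-string map."""
    return tuple(sorted(attrs.items()))


def count_by_map(events, map_field="attrs"):
    """Count events per distinct map value, without knowing the map's keys."""
    counts = Counter()
    for event in events:
        # A missing or null map is treated as an empty map (an assumption).
        counts[map_key(event.get(map_field) or {})] += 1
    return counts


if __name__ == "__main__":
    events = [
        {"type": "click", "attrs": {"a": "1", "b": "2"}},
        {"type": "view", "attrs": {"b": "2", "a": "1"}},
        {"type": "view", "attrs": {"c": "3"}},
    ]
    for key, n in count_by_map(events).items():
        print(dict(key), n)
```

Because the grouping key is built from whatever entries the map holds, no 
attribute names have to be listed up front.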

> SQL: MapType in Group BY and Joins not working
> ----------------------------------------------
>
>                 Key: SPARK-18134
>                 URL: https://issues.apache.org/jira/browse/SPARK-18134
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0, 1.5.1, 1.5.2, 1.6.0, 1.6.1, 1.6.2, 2.0.0, 2.0.1
>            Reporter: Christian Zorneck
>
> Since version 1.5 and issue SPARK-9415, MapType columns can no longer be 
> used in GROUP BY and join clauses. This removed a Hive feature from Spark 
> and makes Spark incompatible with various HiveQL statements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

