[ https://issues.apache.org/jira/browse/SPARK-25225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592397#comment-16592397 ]
Takeshi Yamamuro commented on SPARK-25225:
------------------------------------------

I don't understand your scenario exactly, but is `UserDefinedType` not enough for you?

> Add support for "List"-Type columns
> -----------------------------------
>
>                 Key: SPARK-25225
>                 URL: https://issues.apache.org/jira/browse/SPARK-25225
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, Spark Core
>    Affects Versions: 2.3.1
>            Reporter: Yuriy Davygora
>            Priority: Minor
>
> At the moment, Spark DataFrame ArrayType columns only support elements that all share the same data type.
> At our company, we are currently rewriting old MapReduce code in Spark. One of the frequent use cases is aggregating data into a timeseries:
> Example input:
> {noformat}
> ID  date        data
> 1   2017-01-01  data_1_1
> 1   2018-02-02  data_1_2
> 2   2017-03-03  data_2_1
> 2   2018-04-04  data_2_2
> ...
> {noformat}
> Expected output:
> {noformat}
> ID  timeseries
> 1   [[2017-01-01, data_1_1], [2018-02-02, data_1_2]]
> 2   [[2017-03-03, data_2_1], [2018-04-04, data_2_2]]
> ...
> {noformat}
> Here, the values in the data column of the input are, in most cases, not primitive but are, for example, lists, dicts, nested lists, etc. Spark, however, does not support creating an array column out of a string column and a non-string column.
> We would like to kindly ask you to implement one of the following:
> 1. Extend ArrayType to support elements of different data types
> 2. Introduce a new container type (ListType?) that would support elements of different types

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)