Hi! I’m trying to query a dataset that is read from CSV files and exposed through SQL. The problem is that I have a hierarchy of objects that I need to represent as a table, so that users can query it with SQL and run aggregations. I have multi-value attributes (which in the CSV file look like column_1, column_2, …, column_n), and I have entities that are split across several columns, such as an Address (city, street, etc.). Each row (let’s say it represents a Person) might have several Addresses.
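To make the shape of the data concrete, here is a rough sketch in plain Python (no Spark involved) of the nested structure I have in mind for each row. The column names (phone_1, address_1_city, and so on) and the sample values are made up purely for illustration; my real files use different names.

```python
import csv
import io
import re
from collections import defaultdict

# Hypothetical flattened CSV: phone_N is a multi-value attribute,
# address_N_<field> is a multi-column entity repeated per person.
RAW = """name,phone_1,phone_2,address_1_city,address_1_street,address_2_city,address_2_street
Alice,111,222,Paris,Rue A,Lyon,Rue B
Bob,333,,Berlin,Strasse C,,
"""

MULTI = re.compile(r"^(?P<attr>[a-z]+)_(?P<idx>\d+)$")
ENTITY = re.compile(r"^(?P<entity>[a-z]+)_(?P<idx>\d+)_(?P<field>[a-z]+)$")


def nest(row):
    """Fold numbered columns of one flat row into lists and lists of dicts."""
    person = {}
    lists = defaultdict(dict)                          # attr -> {idx: value}
    entities = defaultdict(lambda: defaultdict(dict))  # entity -> {idx: {field: value}}
    for col, val in row.items():
        m_entity = ENTITY.match(col)
        m_multi = MULTI.match(col)
        if m_entity:
            if val:  # skip empty cells so unused slots do not become entries
                idx = int(m_entity["idx"])
                entities[m_entity["entity"]][idx][m_entity["field"]] = val
        elif m_multi:
            if val:
                lists[m_multi["attr"]][int(m_multi["idx"])] = val
        else:
            person[col] = val  # plain scalar column
    for attr, by_idx in lists.items():
        person[attr] = [by_idx[i] for i in sorted(by_idx)]
    for entity, by_idx in entities.items():
        person[entity] = [by_idx[i] for i in sorted(by_idx)]
    return person


people = [nest(r) for r in csv.DictReader(io.StringIO(RAW))]
```

So each Person would end up with one "phone" column holding an array of values and one "address" column holding an array of (city, street) records, instead of a long run of numbered columns.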
It’s pretty clear that simply flattening everything into one long list of columns is not a good option, since queries against such a table could produce odd results. So my questions are the following:

1. Does SchemaRDD support something like multi-value attributes? It might look like an array of values that lives in just one column, although it’s not clear to me how I’d aggregate over it. Maybe there is some custom type API I can utilise?

2. Does the newly introduced DataFrame provide something in this regard? My understanding is that columns in a DataFrame need to be actual columns (as in a relation), but they may have different types (like arrays or objects). Maybe the DataFrame implementation itself provides some sort of custom types or something pluggable that I might consider.

Any clue would be really appreciated.

Thanks
--
Eugene Morozov
fathers...@list.ru