OK, I will work on a design doc.

> On Dec 18, 2015, at 06:45, Mike Carey <[email protected]> wrote:
>
> Agreed. We probably need a mini design doc here. The short term urgency
> seems to be a need to represent lists that can include nulls, as this is
> blocking JPL and is also something easily produced by queries (AQL or SQL++).
> Longer term one can imagine that this would be something that might vary
> (at the lowest level of detail) by list, e.g., you might represent dense and
> sparse lists quite differently, you might use compression for certain kinds
> of lists, etc.
>
> On 12/18/15 1:57 AM, Till Westmann wrote:
>> Hi Ildar,
>>
>> it seems that we have 2 separate points here:
>> 1) There are bugs in the way we decide which list representation to use and
>> 2) we could add support for (and an optimized representation for) a list of
>> a fixed but nullable type.
>> It seems that - by fixing 1) - we could get rid of the issues you’ve listed.
>>
>> But I also think that it would be nice to support lists of a nullable type
>> (feels like an omission that we don’t support that in the language) - and
>> potentially provide an efficient representation for them.
>> However, it is not clear to me how we would do this.
>> A few thoughts:
>> - Would we maintain the current representation for homogeneous lists of
>> non-nullable types?
>> - Would we introduce a new type tag for “nullable lists”?
>> - Would we redefine the current representation to mean something else?
>> Do you have thoughts on those?
>>
>> Cheers,
>> Till
>>
>> On 16 Dec 2015, at 8:12, Ildar Absalyamov wrote:
>>
>>> Hi devs,
>>>
>>> Recently I have been playing around with lists and with functions that
>>> receive/return list parameters/values. I have noticed one particular
>>> behavior that seems to be incorrect.
>>> As you might know, internally we support 2 types of lists: homogeneous,
>>> where all the items are untagged and the item type is stored in the type
>>> definition, and heterogeneous, where the items are, on the contrary,
>>> tagged and the list item type is effectively ANY.
>>> The decision about which of the two types is used is usually made by the
>>> parser, or is altered by IntroduceEnforcedListTypeRule, which effectively
>>> turns a heterogeneous list into a homogeneous one if all the items have
>>> the same type.
>>> Right now we only allow homogeneous lists to be defined as a field in some
>>> type, and we also restrict the item type to be a non-nullable type:
>>> create type listType {
>>> “id”:int64,
>>> “list”:[int64] // [int64?] is not possible
>>> }
>>>
>>> This constraint spans both the language level and serialization.
>>> Under that restriction, the only way to load a list that contains null
>>> values would be to make the appropriate field open (open lists are
>>> heterogeneous by definition).
>>>
>>> 1) Seems like we’re missing an optimization opportunity when we are dealing
>>> with large sparse lists. Serialization in this case might use a bit mask to
>>> specify which items in the list are not null, and later encode only those
>>> items.
>>> 2) I believe that if we alter IntroduceEnforcedListTypeRule to enforce a
>>> homogeneous list with a nullable item type, we might resolve issues
>>> https://issues.apache.org/jira/browse/ASTERIXDB-905,
>>> https://issues.apache.org/jira/browse/ASTERIXDB-867,
>>> https://issues.apache.org/jira/browse/ASTERIXDB-1131 all at once.
>>>
>>> Thoughts?
>>>
>>> Best regards,
>>> Ildar
>
Best regards,
Ildar
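
For reference, the bit-mask idea from point 1) of the original message could look roughly like the sketch below. It is written in Python for brevity. The byte layout (a 4-byte big-endian item count, then a presence mask, then 8-byte values for the non-null items only) and both function names are assumptions made for illustration; this is not AsterixDB's actual serialization format.

```python
import struct
from typing import List, Optional


def encode_nullable_int64_list(items: List[Optional[int]]) -> bytes:
    """Hypothetical layout: count (4 bytes) + presence bit mask + non-null values."""
    count = len(items)
    mask = bytearray((count + 7) // 8)  # one bit per item, rounded up to whole bytes
    payload = bytearray()
    for i, item in enumerate(items):
        if item is not None:
            mask[i // 8] |= 1 << (i % 8)        # bit i set means item i is present
            payload += struct.pack(">q", item)  # only non-null items are encoded
    return struct.pack(">I", count) + bytes(mask) + bytes(payload)


def decode_nullable_int64_list(data: bytes) -> List[Optional[int]]:
    """Inverse of encode_nullable_int64_list under the same assumed layout."""
    (count,) = struct.unpack_from(">I", data, 0)
    mask_len = (count + 7) // 8
    mask = data[4:4 + mask_len]
    offset = 4 + mask_len
    items: List[Optional[int]] = []
    for i in range(count):
        if mask[i // 8] & (1 << (i % 8)):
            (value,) = struct.unpack_from(">q", data, offset)
            offset += 8
            items.append(value)
        else:
            items.append(None)  # cleared bit means the item is null
    return items


sparse = [None] * 6 + [42] + [None] * 9
encoded = encode_nullable_int64_list(sparse)
print(len(encoded))
print(decode_nullable_int64_list(encoded) == sparse)
```

Under this assumed layout, a list with 16 items of which only one is non-null takes 4 + 2 + 8 = 14 bytes, since null items cost one bit each. A dense list pays only the extra mask bytes, which is one reason a design doc might still want separate dense and sparse representations.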
