Re: Homogeneous lists with nullable items

Ildar Absalyamov Fri, 18 Dec 2015 10:01:26 -0800

OK, I will work on a design doc.

> On Dec 18, 2015, at 06:45, Mike Carey <[email protected]> wrote:
> 
> Agreed.  We probably need a mini design doc here. The short term urgency 
> seems to be a need to represent lists that can include nulls, as this is 
> blocking JPL and is also something easily produced by queries (AQL or SQL++). 
>  Longer term one can imagine where this would be something that might vary 
> (at the lowest level of detail) by list, e.g., you might represent dense and 
> sparse lists quite differently, you might use compression for certain kinds 
> of lists, etc.
> 
> On 12/18/15 1:57 AM, Till Westmann wrote:
>> Hi Ildar,
>> 
>> it seems that we have 2 separate points here:
>> 1) There are bugs in the way we decide which list representation to use and
>> 2) we could add support for (and an optimized representation for) a list of 
>> a fixed but nullable type.
>> It seems that - by fixing 1) - we could get rid of the issues you’ve listed.
>> 
>> But I also think that it would be nice to support lists of a nullable type 
>> (feels like an omission that we don’t support that in the language) - and 
>> potentially provide an efficient representation for them.
>> However, it is not clear to me how we would do this.
>> A few thoughts:
>> - Would we maintain the current representation for homogeneous lists of 
>> non-nullable types?
>> - Would we introduce a new type tag for “nullable lists”?
>> - Would we redefine the current representation to mean something else?
>> Do you have thoughts on those?
>> 
>> Cheers,
>> Till
>> 
>> On 16 Dec 2015, at 8:12, Ildar Absalyamov wrote:
>> 
>>> Hi devs,
>>> 
>>> Recently I have been playing around with lists and functions, which 
>>> receive/return list parameters/values. I have noticed one particular issue, 
>>> which seems to be incorrect.
>>> As you might know, internally we support 2 types of lists: homogeneous, 
>>> where all the items are untagged and the item type is stored in the type 
>>> definition, and heterogeneous, where the items are instead tagged and the 
>>> list item type is effectively ANY.
>>> The decision on which of the two types is used is usually made by the parser 
>>> or is altered by IntroduceEnforcedListTypeRule, which effectively turns a 
>>> heterogeneous list into a homogeneous one if all the items have the same type.
>>> Right now we allow only homogeneous lists to be defined as a field in some 
>>> type, and we also restrict the item type to be a non-nullable type:
>>> create type listType {
>>> "id":int64,
>>> "list":[int64]   // [int64?] is not possible
>>> }
>>> 
>>> This constraint spans both the language level and serialization. 
>>> Under that restriction the only way to load a list that contains null 
>>> values would be to make the appropriate field open (open lists are 
>>> heterogeneous by definition).
>>> 
>>> 1) Seems like we’re missing an optimization opportunity when we are dealing 
>>> with large sparse lists. Serialization in this case might use a bit mask to 
>>> specify which items in the lists are not null, and later encode only those 
>>> items.
>>> 2) I believe that if we alter IntroduceEnforcedListTypeRule to turn a list 
>>> into a homogeneous list with a nullable item type, we might resolve issues 
>>> https://issues.apache.org/jira/browse/ASTERIXDB-905, 
>>> https://issues.apache.org/jira/browse/ASTERIXDB-867, and 
>>> https://issues.apache.org/jira/browse/ASTERIXDB-1131 all at once.
>>> 
>>> Thoughts?
>>> 
>>> Best regards,
>>> Ildar
>


Best regards,
Ildar
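
The bit-mask serialization suggested in point 1) of the original message could look roughly like the sketch below. It is only an illustration in Python, not AsterixDB code: the function names and the byte layout (a 4-byte item count, then the mask, then the packed non-null int64 values) are assumptions made for this example and do not reflect the actual AsterixDB list format.

```python
import struct
from typing import List, Optional


def encode_nullable_int64_list(items: List[Optional[int]]) -> bytes:
    """Hypothetical layout: [count: int32][bit mask][non-null int64 values]."""
    count = len(items)
    mask = bytearray((count + 7) // 8)  # one bit per item, rounded up to bytes
    payload = bytearray()
    for i, item in enumerate(items):
        if item is not None:
            mask[i // 8] |= 1 << (i % 8)         # mark slot i as non-null
            payload += struct.pack(">q", item)   # only non-null items are stored
    return struct.pack(">i", count) + bytes(mask) + bytes(payload)


def decode_nullable_int64_list(buf: bytes) -> List[Optional[int]]:
    """Inverse of encode_nullable_int64_list for the same assumed layout."""
    (count,) = struct.unpack_from(">i", buf, 0)
    mask_len = (count + 7) // 8
    mask = buf[4:4 + mask_len]
    offset = 4 + mask_len
    items: List[Optional[int]] = []
    for i in range(count):
        if mask[i // 8] & (1 << (i % 8)):
            (value,) = struct.unpack_from(">q", buf, offset)
            items.append(value)
            offset += 8
        else:
            items.append(None)  # null slots take no space in the payload
    return items


sparse = [None] * 14 + [7, None]
encoded = encode_nullable_int64_list(sparse)
assert decode_nullable_int64_list(encoded) == sparse
```

Under this assumed layout, the 16-item list above with a single non-null value takes 4 + 2 + 8 = 14 bytes, whereas storing a full 8-byte slot for every item would take 4 + 16 * 8 = 132 bytes. A dense list pays only the extra mask bytes, which is in line with the point made in the thread that dense and sparse lists might eventually be represented differently.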
