It did not really occur to me during today's meeting, but Preston pointed out that the secondary index delete fix I proposed spans both the Hyracks & Asterix codebases. Thus we will either have to release Hyracks once again, or bite the bullet, sign the RC without fixing this issue, and create bug-fix releases for both Hyracks & Asterix right after.
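For anyone who has not looked at ASTERIXDB-1109 recently, the scenario the fix targets is roughly the one below. This is only an illustrative sketch with made-up names (Orders, priceIdx), not the exact DDL from the scan-delete-btree-secondary-index-open test, and the AQL is written from memory, so the exact syntax may differ:

  drop dataverse test if exists;
  create dataverse test;
  use dataverse test;

  /* "price" is not part of the closed type; it only ever appears as an open field */
  create type OrderType as open {
    oid: int32
  };
  create dataset Orders(OrderType) primary key oid;

  /* enforced open index on the open field "price" */
  create index priceIdx on Orders(price: int32) enforced;

  insert into dataset Orders({"oid": 1, "price": 100});

  /* with the current code the <SK, PK> entry <100, 1> can be left behind
     in priceIdx after this delete, which is exactly what the fix addresses */
  delete $o from dataset Orders where $o.oid = 1;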
> On Sep 22, 2015, at 22:27, Mike Carey <[email protected]> wrote:
>
> Ah - that makes sense now. Thx. (And welcome back. :-))
>
> On 9/22/15 10:02 PM, Ildar Absalyamov wrote:
>> Sorry for the confusion, my initial answer was not correct enough; I probably
>> should have waited some time after driving 1500 miles from Seattle :)
>> The casting in the insert pipeline, which Abdullah mentioned, is needed only
>> for the secondary index insert. The reasoning behind this casting is to ensure
>> that the record is type-equivalent, so that it is safe to create an open index.
>> It is true that we can get <Pk, Sk> pairs out of the original record using
>> get-field-by-name\index, but the cast operator is introduced merely to kill
>> the pipeline if the dataset input is not correct.
>> Thus the records in the primary index are never touched or modified, no matter
>> what indexes were created.
>> I am not sure, however, what the second cast in Abdullah's plan is, and where
>> it comes from.
>>
>> @Taewoo, so the scan-delete-btree-secondary-index-open test does not actually
>> delete data from the secondary index? I have checked the plan and it has the
>> delete operator. Maybe it is initialized with wrong parameters; I'll have a
>> closer look.
>>
>>> On Sep 22, 2015, at 18:33, Mike Carey <[email protected]> wrote:
>>>
>>> Sounds kinda bad! Also, I wonder what happens when the compiler encounters
>>> records in the dataset whose type in the catalog doesn't claim to have a
>>> given (but now indexed) open field - e.g., during a data scan or an access
>>> via some other path? Can Bad Things Happen due to the compiler not
>>> properly anticipating the casted form of the records? (Maybe I am
>>> misunderstanding something, but we should probably take a careful look at
>>> the test cases - and make sure we do things like add a bunch of records,
>>> then add such an index, then add some more records, then stress-test
>>> type-related things that come at the dataset (i) thru the index, (ii) thru
>>> a primary dataset scan, and (iii) thru some other index.)
>>>
>>> On 9/22/15 4:06 PM, Taewoo Kim wrote:
>>>> I think this issue: https://issues.apache.org/jira/browse/ASTERIXDB-1109
>>>> is related. Currently, index entries (SK, PK) are not deleted from an
>>>> open-type secondary index during a deletion. This issue had not surfaced
>>>> because every search through a secondary index had to go through a
>>>> primary index lookup afterwards.
>>>>
>>>> Best,
>>>> Taewoo
>>>>
>>>> On Tue, Sep 22, 2015 at 12:04 AM, Ildar Absalyamov <
>>>> [email protected]> wrote:
>>>>
>>>>> Abdullah,
>>>>>
>>>>> If I remember correctly, whenever a secondary open index is created, all
>>>>> existing records are casted to the proper type to ensure that the index
>>>>> creation is valid.
>>>>> As for the overall correctness of the casting operation: semantically,
>>>>> creating an open index is the same thing as altering the dataset type.
>>>>> The current implementation allows only one open index of a particular
>>>>> type to be created on a single field. If we had “alter datatype”
>>>>> functionality, open indexing would not be required at all.
>>>>>
>>>>>> On Sep 21, 2015, at 23:25, abdullah alamoudi <[email protected]> wrote:
>>>>>>
>>>>>> More thoughts:
>>>>>> I assume the intention of the cast was just to make sure that if the
>>>>>> open field exists, it is of the specified type. Moreover, the un-casted
>>>>>> record should be inserted into the index.
>>>>>> If my assumptions are not correct, please let me know ASAP.
>>>>>>
>>>>>> I have two thoughts on this:
>>>>>> 1. Actually, the insert plans show that the record being inserted into
>>>>>> the primary index is in fact the casted record, which creates the issue
>>>>>> described above.
>>>>>>
>>>>>> 2. I don't believe this is the right way to ensure that the open field,
>>>>>> if it exists, is of the right type. Why not extract the field using a
>>>>>> field-access-by-name function and then verify the type using the field
>>>>>> tag?
>>>>>>
>>>>>> On Tue, Sep 22, 2015 at 9:11 AM, abdullah alamoudi <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Dev, @Ildar,
>>>>>>>
>>>>>>> In the insert pipeline for datasets with open indexes, we introduce a
>>>>>>> cast function before the insert, and so one would expect the records to
>>>>>>> look like the casted record type, which I assume has {{the closed
>>>>>>> fields + a nullable field}}.
>>>>>>>
>>>>>>> The question is, what happens to the previously existing records, since
>>>>>>> the index now has both records of the original type and records of the
>>>>>>> casted type?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Abdullah.
>>>>>>>
>>>>> Best regards,
>>>>> Ildar
>>>>>
>>>>>
>> Best regards,
>> Ildar
>>
>>
>
Best regards,
Ildar
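P.S. To make the casting discussion above concrete, my understanding of the intended enforced-index semantics, continuing the toy Orders/priceIdx sketch from the top of this mail (again written from memory, so treat it as a sketch rather than a spec), is:

  use dataverse test;

  /* a record that simply lacks the indexed open field is legal;
     it just contributes no entry to priceIdx */
  insert into dataset Orders({"oid": 2});

  /* a record whose open field contradicts the enforced type should be
     rejected; the cast in the insert pipeline is what "kills the pipeline" */
  insert into dataset Orders({"oid": 3, "price": "not an int32"});

Whether the casted or the original record ends up in the primary index (Abdullah's point 1 above) is the part where we seem to read the plans differently, so that is worth double-checking as well.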
