Re: Metadata changes

Mike Carey Mon, 14 Dec 2015 17:52:41 -0800

Ah...  Indeed, got it.
It sure would be nice to have such indexes...  :-)
(In general they would be very useful.)
Hmm.


On 12/14/15 5:29 PM, Steven Jacobs wrote:

There are two cases where the code is attempting to use indexes:
1) When deleting a type, find and delete the anonymous subtypes.
2) When deleting a type, confirm that it is not used as a nested type of
another type.

Ignoring the "indexes" that we have in Metadata, Datatype records have a
field called "Fields" which contains a list of the fields within the type.
Each value in this list has a "fieldname" and "fieldtype."

For 1, we can simply iterate through this list and call delete recursively
whenever "fieldtype" is a non-primitive, anonymous type.
For 2, we need some way to find parent types given a type. The only way to
do this quickly would be to create an index on Fields.fieldtype, which is a
field within a record within a list.
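
The two cases can be sketched roughly as follows. This is a minimal Python
sketch, not the Metadata code: the in-memory `datatypes` dict, the
`anonymous` flag, the `PRIMITIVES` set, and the helper names are all
assumptions made for illustration. Only "Fields", "fieldname", and
"fieldtype" come from the description above.

```python
# Hypothetical in-memory stand-in for Datatype records (not the real Metadata API).
PRIMITIVES = {"int32", "string"}

datatypes = {
    "Address": {"anonymous": False, "Fields": [
        {"fieldname": "street", "fieldtype": "string"}]},
    "User_anon_1": {"anonymous": True, "Fields": [
        {"fieldname": "zip", "fieldtype": "int32"}]},
    "User": {"anonymous": False, "Fields": [
        {"fieldname": "home", "fieldtype": "Address"},
        {"fieldname": "extra", "fieldtype": "User_anon_1"}]},
}


def find_parents(type_name):
    """Case 2 without an index: scan every record's Fields list."""
    return sorted(name for name, rec in datatypes.items()
                  if any(f["fieldtype"] == type_name for f in rec["Fields"]))


def delete_type(type_name):
    """Refuse the drop if the type is nested elsewhere (case 2),
    then delete its anonymous subtypes recursively (case 1)."""
    parents = find_parents(type_name)
    if parents:
        raise ValueError(f"{type_name} is used by {parents}")
    record = datatypes.pop(type_name)
    for field in record["Fields"]:
        sub = field["fieldtype"]
        if sub not in PRIMITIVES and datatypes.get(sub, {}).get("anonymous"):
            delete_type(sub)
```

In this sketch `find_parents` touches every datatype record on each call,
which is the cost an index on Fields.fieldtype would avoid.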

Steven

On Monday, December 14, 2015, Mike Carey <[email protected]> wrote:

Can you briefly explain why option 3 is so heavy? (Remind us how the use
info is modeled?)

On 12/14/15 3:43 PM, Steven Jacobs wrote:

We just had a UCR discussion on this topic. The issue is really with the
third "index" here. The code now is using one "index" to go in two
directions:
1) To find datatypes that use datatype A
2) To find datatypes that are used by datatype A.
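
One way to picture the two directions is as two separate lookups built from
the same nesting edges. This is a minimal Python sketch: the edge list and
the names `uses` and `used_by` are made up for illustration and do not
reflect how the Metadata "index" is actually stored.

```python
from collections import defaultdict

# Hypothetical nesting edges: (outer datatype, nested datatype).
edges = [("A", "B"), ("A", "C"), ("D", "A")]

uses = defaultdict(set)     # direction 2: datatypes used by a given datatype
used_by = defaultdict(set)  # direction 1: datatypes that use a given datatype

for outer, nested in edges:
    uses[outer].add(nested)
    used_by[nested].add(outer)
```

Each direction needs its own key: the outer type for `uses`, the nested type
for `used_by`.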

The way that it works now is hacked together, but designed for
performance.
So we have three choices here:

1) Stick to the status quo, and leave the "indexes" as they are
2) Remove the Metadata secondary indexes, which will eliminate the hack but
cost some performance on Metadata
3) Implement the Metadata secondary indexes correctly as Asterix indexes.
For this solution to work with our dataset designs, we will need to have
the ability to index homogeneous lists. In addition, we will have backward
compatibility issues unless we plan things out for the transition.

What are the thoughts?


Orthogonally, it seems that the consensus for storing the datatype
dataverse in the dataset Metadata is to just add it as an open field at
least for now. Is that correct?

Steven


On Mon, Dec 14, 2015 at 1:23 PM, Mike Carey <[email protected]> wrote:

Thoughts inlined:

On 12/14/15 11:12 AM, Steven Jacobs wrote:

Here are the conclusions that Ildar and I have drawn from looking at the
secondary indexes:

First of all it seems that datasets are local to node groups, but
dataverses can span node groups, which seems a little odd to me.

Node groups are an undocumented but to-be-exploited-someday feature that
allows datasets to be stored on less than all nodes in a given cluster. As
we face bigger clusters, we'll want to open up that possibility.  We will
hopefully use them inside w/o having to make users manage them manually
like parallel DB2 did/does.  Dataverses are really just a namespace thing,
not a storage thing at all, so they are orthogonal to (and unrelated to)
node groups.

There are three Metadata secondary indexes: GROUPNAME_ON_DATASET_INDEX,
DATATYPENAME_ON_DATASET_INDEX, and DATATYPENAME_ON_DATATYPE_INDEX.

The first is used in only one case:
When dropping a node group, check if there are any datasets using this node
group. If so, don't allow the drop.
BUT, this index has a field called "dataverse" which is not used at all.

This one seems like a waste of space since we do this almost never. (Not
much space, but unnecessary.)  If we keep it, it should become a proper
index.

The second is used when dropping a datatype. If there is a dataset using
this datatype, don't allow the drop.
Similarly, this index has a "dataverse" field which is never used.
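
The drop check described here could be sketched like this. It is a minimal
Python sketch under assumptions: the index is modeled as a dict keyed by
(dataverse, datatype name), and `datasets_by_type` and `can_drop_datatype`
are hypothetical names, not the actual Metadata index layout.

```python
# Hypothetical model of a datatype-on-dataset index:
# (type dataverse, type name) -> names of datasets declared with that type.
datasets_by_type = {
    ("dv1", "UserType"): {"dv1.Users", "dv2.Archive"},
}


def can_drop_datatype(dataverse, type_name):
    """A datatype may be dropped only if no dataset uses it."""
    return not datasets_by_type.get((dataverse, type_name))
```

Including the dataverse in the key presumably starts to matter once a
dataset can use a type from another dataverse, because the type name alone
may then be ambiguous.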

You're about to use the dataverse part, right?  :-)  This index seems like
it will be useful but should be a proper index.

The third index is used in two cases, using two different ideas of "keys."
It seems like this should actually be two different indexes.

I don't think I understood this comment....


This is my understanding so far. It would be good to discuss what the
"correct" version should be.
Steven




On Mon, Dec 14, 2015 at 10:12 AM, Steven Jacobs <[email protected]> wrote:

Hi all,

I'm implementing a change so that datasets can use datatypes from
alternate dataverses (previously the type and set had to be from the same
dataverse). Unfortunately this means another change for Dataset Metadata
(which will now store the dataverse for its type).

As such, I had a couple of questions:

1) Should this change be thrown into the release branch, as it is another
Metadata change?

2) In implementing this change, I've been looking at the Metadata
secondary indexes. I had a discussion with Ildar, and it seems the thread
on Metadata secondary indexes being "hacked" has been lost. Is this also
something that should get into the release? Is there anyone currently
looking at it?

Steven
