Re: Registry schema cleanup

2014-11-20 Thread Lawrence Velázquez

> On Nov 20, 2014, at 7:46 AM, Rainer Müller  wrote:
> 
> On a side note, using something more generic like 'checksum' as field
> name with a value of 'md5:abc012...' or 'sha256:def345...' would be more
> flexible as a the schema. Either CHAR or VARCHAR are all just TEXT
> ignoring the length for SQLite, right?

Yup.

http://www.sqlite.org/datatype3.html#affinity

vq
___
macports-dev mailing list
macports-dev@lists.macosforge.org
https://lists.macosforge.org/mailman/listinfo/macports-dev


Re: Registry schema cleanup

2014-11-20 Thread Rainer Müller
On 2014-11-18 20:48, Clemens Lang wrote:
> From the files table, I'd remove `mtime', `md5sum' and `editable', which
> are all unused and set to dummy values at the moment. I know those were
> intended to be used in configuration management, but this hasn't
> happened so far and we can always re-add the fields later, if we ever
> finish implementing this.

These fields have been used in the gsoc10-configfiles branch. There is
certainly enough to cleanup on that branch by now. I guess introducing
these fields again with another registry upgrade would probably be the
smaller effort.

On a side note, using something more generic like 'checksum' as field
name with a value of 'md5:abc012...' or 'sha256:def345...' would be more
flexible as a the schema. Either CHAR or VARCHAR are all just TEXT
ignoring the length for SQLite, right? However, that part only needs to
be discussed by whoever wants to pick up the work on that branch.

> In the dependencies table, we should add a `dep_port` index on the `id'
> column to improve join performance.
> 
> We currently do not have any indices for the portgroups table. We should
> add an index on the `id' column for joins and one over (id, name,
> version, size, sha256) to speed up opening portgroups and executing
> Portfiles from registry.
> 
> Of course, these changes imply the corresponding changes in registry2.0
> and cregistry.
> 
> If nobody objects to the changes, I have a patch ready to commit.

Sounds fine, please go ahead.

Just to be clear: once this lands on trunk, anyone installing from trunk
should be aware that their installation will not be compatible with the
released version anymore.

Rainer
___
macports-dev mailing list
macports-dev@lists.macosforge.org
https://lists.macosforge.org/mailman/listinfo/macports-dev


Re: Registry schema cleanup

2014-11-18 Thread Brandon Allbery
On Tue, Nov 18, 2014 at 11:31 PM, Ian Wadham  wrote:
>
> For this kind of reason, fellow relational database designers/programmers
> and I have
> always eschewed the use of nulls (wherever I have worked, since about the
> 80s).


Funny, I never had a problem with the behavior of NULL. It's not
capricious; it's just *not a value*.

-- 
brandon s allbery kf8nh   sine nomine associates
allber...@gmail.com  ballb...@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonadhttp://sinenomine.net
___
macports-dev mailing list
macports-dev@lists.macosforge.org
https://lists.macosforge.org/mailman/listinfo/macports-dev


Re: Registry schema cleanup

2014-11-18 Thread Ian Wadham
Hi guys,

On 19/11/2014, at 9:34 AM, Joshua Root wrote:
> On 2014-11-19 06:48 , Clemens Lang wrote:
>> I'd like to clean up the database schema of the registry. We have a couple
>> of fields in there that are currently unused, as well as a few possible
>> indices that could improve performance and at least one index that is
>> actively harmful.
>> 
>> In detail:
>> 
>> I'd like to drop the `url' column from the ports table. All ports I have
>> installed have this set to NULL and I couldn't find a use-case where it
>> was being used. I tried installing a port from a URL, but that doesn't
>> set the field either.
>> Even worse, the `url' column is part of a unique index `UNIQUE(url,
>> epoch, version, revision, variants)', which is bad, because if `url' is
>> always NULL, that means you can never install two ports with the same
>> tuple of (epoch, version, revision, variants). I'm surprised we haven't
>> hit this yet.
> 
> SQLite considers NULLs to be distinct in this case.
> 

FWIW I think the operative phrase on that page is:
"-- I have about decided that SQL's treatment of NULLs is capricious and cannot 
be
deduced by logic.  It must be discovered by experiment."

For this kind of reason, fellow relational database designers/programmers and I 
have
always eschewed the use of nulls (wherever I have worked, since about the 80s).

You cannot know for sure what they are going to do nor how they are going to 
perform
resource-wise. An index with its leading term being null smells distinctly 
fishy to me - a
change in internal implementation could change the performance, perhaps 
radically.

It has always been the case, in my experience and AFAICR, that you can use a
prescribed default value, such as "", 0 or -1, in place of a null 
(missing/unknown value)
and thus get completely predictable actions and performance. And it is not often
(with apologies to the purists) that you really need to distinguish between 
null and
empty values. Even then there is usually a simple, practical way around the 
problem.

My 2c.

Cheers, Ian W.

___
macports-dev mailing list
macports-dev@lists.macosforge.org
https://lists.macosforge.org/mailman/listinfo/macports-dev


Re: Registry schema cleanup

2014-11-18 Thread Brandon Allbery
On Tue, Nov 18, 2014 at 10:55 PM, Clemens Lang  wrote:

> Thank you, that's valuable information. Weird information, but still
> valuable.
>

Weird it may be, but it's the ANSI standard behavior for NULLs and also the
behavior implemented by the original (IBM's SEQUEL II, Oracle).

-- 
brandon s allbery kf8nh   sine nomine associates
allber...@gmail.com  ballb...@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonadhttp://sinenomine.net
___
macports-dev mailing list
macports-dev@lists.macosforge.org
https://lists.macosforge.org/mailman/listinfo/macports-dev


Re: Registry schema cleanup

2014-11-18 Thread Clemens Lang
Hi,

- On 18 Nov, 2014, at 23:34, Joshua Root j...@macports.org wrote:


> SQLite considers NULLs to be distinct in this case.
> 

Thank you, that's valuable information. Weird information, but still
valuable. That also explains why this hasn't caused any problems yet.
Anyway, in that case the `url' column is simply unused, and for that
reason I'd propose to remove it.

-- 
Clemens Lang
___
macports-dev mailing list
macports-dev@lists.macosforge.org
https://lists.macosforge.org/mailman/listinfo/macports-dev


Re: Registry schema cleanup

2014-11-18 Thread Joshua Root
On 2014-11-19 06:48 , Clemens Lang wrote:
> Hi all,
> 
> I'd like to clean up the database schema of the registry. We have a couple
> of fields in there that are currently unused, as well as a few possible
> indices that could improve performance and at least one index that is
> actively harmful.
> 
> In detail:
> 
> I'd like to drop the `url' column from the ports table. All ports I have
> installed have this set to NULL and I couldn't find a use-case where it
> was being used. I tried installing a port from a URL, but that doesn't
> set the field either.
> Even worse, the `url' column is part of a unique index `UNIQUE(url,
> epoch, version, revision, variants)', which is bad, because if `url' is
> always NULL, that means you can never install two ports with the same
> tuple of (epoch, version, revision, variants). I'm surprised we haven't
> hit this yet.

SQLite considers NULLs to be distinct in this case.


> Consequently, this implies removal of the port_url index.
> 
>>From the files table, I'd remove `mtime', `md5sum' and `editable', which
> are all unused and set to dummy values at the moment. I know those were
> intended to be used in configuration management, but this hasn't
> happened so far and we can always re-add the fields later, if we ever
> finish implementing this.
> 
> In the dependencies table, we should add a `dep_port` index on the `id'
> column to improve join performance.
> 
> We currently do not have any indices for the portgroups table. We should
> add an index on the `id' column for joins and one over (id, name,
> version, size, sha256) to speed up opening portgroups and executing
> Portfiles from registry.
> 
> Of course, these changes imply the corresponding changes in registry2.0
> and cregistry.
> 
> If nobody objects to the changes, I have a patch ready to commit.

All sounds fine to me.

- Josh
___
macports-dev mailing list
macports-dev@lists.macosforge.org
https://lists.macosforge.org/mailman/listinfo/macports-dev


Registry schema cleanup

2014-11-18 Thread Clemens Lang
Hi all,

I'd like to clean up the database schema of the registry. We have a couple
of fields in there that are currently unused, as well as a few possible
indices that could improve performance and at least one index that is
actively harmful.

In detail:

I'd like to drop the `url' column from the ports table. All ports I have
installed have this set to NULL and I couldn't find a use-case where it
was being used. I tried installing a port from a URL, but that doesn't
set the field either.
Even worse, the `url' column is part of a unique index `UNIQUE(url,
epoch, version, revision, variants)', which is bad, because if `url' is
always NULL, that means you can never install two ports with the same
tuple of (epoch, version, revision, variants). I'm surprised we haven't
hit this yet.
Consequently, this implies removal of the port_url index.

>From the files table, I'd remove `mtime', `md5sum' and `editable', which
are all unused and set to dummy values at the moment. I know those were
intended to be used in configuration management, but this hasn't
happened so far and we can always re-add the fields later, if we ever
finish implementing this.

In the dependencies table, we should add a `dep_port` index on the `id'
column to improve join performance.

We currently do not have any indices for the portgroups table. We should
add an index on the `id' column for joins and one over (id, name,
version, size, sha256) to speed up opening portgroups and executing
Portfiles from registry.

Of course, these changes imply the corresponding changes in registry2.0
and cregistry.

If nobody objects to the changes, I have a patch ready to commit.

-- 
Clemens Lang
___
macports-dev mailing list
macports-dev@lists.macosforge.org
https://lists.macosforge.org/mailman/listinfo/macports-dev