php-general Digest 20 Jan 2011 16:20:42 -0000 Issue 7143

php-general-digest-help Thu, 20 Jan 2011 08:23:15 -0800

php-general Digest 20 Jan 2011 16:20:42 -0000 Issue 7143

Topics (messages 310890 through 310901):


email address syntax checker
        310890 by: Donovan Brooke
        310891 by: David Hutto
        310893 by: Peter Lind
        310896 by: Nilesh Govindarajan

Re: Class and interface location
        310892 by: Tommy Pham
        310894 by: Richard Quadling
        310895 by: Steve Staples
        310897 by: Tommy Pham

Organisational question: surely someone has implemented many Boolean values 
(tags) and a solution exist
        310898 by: Dotan Cohen
        310899 by: Richard Quadling
        310900 by: Jerry Schwartz
        310901 by: Dotan Cohen

Administrivia:

To subscribe to the digest, e-mail:
        [email protected]

To unsubscribe from the digest, e-mail:
        [email protected]

To post to the list, e-mail:
        [email protected]


----------------------------------------------------------------------

--- Begin Message ---
Hi Guys,
I'm waddling my way through database interaction and thought someone onthe list may already have a simple email checker that they'd like toshare...
you know, looking for the @ char and dots etc..
I did a quick search of the archives and found a couple elaboratethings.. but
I'm looking for something simple. This job will have trusted users and
the checker is more to help them catch mistakes when registering.

Thanks!,
Donovan


--
D Brooke
--- End Message ---

--- Begin Message ---

On Wed, Jan 19, 2011 at 11:14 PM, Donovan Brooke <[email protected]> wrote:
> Hi Guys,
>
> I'm waddling my way through database interaction and thought someone on the
> list may already have a simple email checker that they'd like to share...
>
> you know, looking for the @ char and dots etc..
>
> I did a quick search of the archives and found a couple elaborate things..
> but
> I'm looking for something simple.

Simple is an irrelevant term, you'd have to elaborate on your current
experience.

 This job will have trusted users and
> the checker is more to help them catch mistakes when registering.
>
> Thanks!,
> Donovan
>
>
> --
> D Brooke
>
> --
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
>
>

-- 
The lawyer in me says argue...even if you're wrong. The scientist in
me... says shut up, listen, and then argue. But the lawyer won on
appeal, so now I have to argue due to a court order.

Furthermore, if you could be a scientific celebrity, would you want
einstein sitting around with you on saturday morning, while you're
sitting in your undies, watching Underdog?...Or better yet, would
Einstein want you to violate his Underdog time?

Can you imagine Einstein sitting around in his underware? Thinking
about the relativity between his cotton nardsac, and his Fruit of the
Looms?

But then again, J. Edgar Hoover would want his pantyhose intertwined
within the equation.

However, I digress, momentarily.

--- End Message ---

--- Begin Message ---

On 20 January 2011 05:14, Donovan Brooke <[email protected]> wrote:
> Hi Guys,
>
> I'm waddling my way through database interaction and thought someone on the
> list may already have a simple email checker that they'd like to share...
>
> you know, looking for the @ char and dots etc..
>
> I did a quick search of the archives and found a couple elaborate things..
> but
> I'm looking for something simple. This job will have trusted users and
> the checker is more to help them catch mistakes when registering.

if (!filter_var($email, FILTER_VALIDATE_EMAIL)) {
    echo "Bad user! Bad user!";
}

Regards
Peter

-- 
<hype>
WWW: plphp.dk / plind.dk
LinkedIn: plind
BeWelcome/Couchsurfing: Fake51
Twitter: kafe15
</hype>

--- End Message ---

--- Begin Message ---

On 01/20/2011 09:44 AM, Donovan Brooke wrote:

Hi Guys,

I'm waddling my way through database interaction and thought someone on
the list may already have a simple email checker that they'd like to
share...

you know, looking for the @ char and dots etc..

I did a quick search of the archives and found a couple elaborate
things.. but
I'm looking for something simple. This job will have trusted users and
the checker is more to help them catch mistakes when registering.

Thanks!,
Donovan

Well, I had created an email validator long ago, after a neat researchon Google, reading RFCs, etc.I don't guarantee that it's without bugs, but it has been correct for mein all valid & invalid email addresses I used for test.


Code:

<?php

function checkMail($mail) {

  if(strlen($mail) <= 0) {
    return false;
  }

  $split = explode('@', $mail);

  if(count($split) > 2) {
    return false;
  }

  list($username, $domain) = $split;

/*


   * Don't allow
   * Two dots, Two @
   * !, #, $, ^, &, *, (, ), [, ], {, }, ?, /, \, ~, `, <, >, ', "
   */

$userNameRegex1 ='/\.{2,}|@{2,}|[\!#\$\^&\*\[\]{}\?\/\\\|~`<>\'"]+/';


  /*
   * Username should consist of only
   * A-Z, a-z, 0-9, -, ., _, +, %
   */

  $userNameRegex2 = '/[a-z0-9_.+%-]+/i';

  /*
   * Domain cannot contain two successive dots
   */

  $domainRegex1 = '/\.{2,}/';

  /*
   * Domain can contain only
   * A-Z, a-z, 0-9, ., -,
   */

  $domainRegex2 = '/[a-z0-9.-]+/i';

  if(preg_match($userNameRegex1, $username) or
    !preg_match($userNameRegex2, $username) or
     preg_match($domainRegex1, $domain) or
    !preg_match($domainRegex2, $domain) or
    !checkdnsrr($domain, 'MX')) {
    return false;
  } else {
    return true;
  }

}


--
Regards,
Nilesh Govindarajan
Facebook: http://www.facebook.com/nilesh.gr
Twitter: http://twitter.com/nileshgr
Website: http://www.itech7.com

--- End Message ---

--- Begin Message ---

> -----Original Message-----
> From: Larry Garfield [mailto:[email protected]]
> Sent: Wednesday, January 19, 2011 7:47 PM
> To: [email protected]
> Subject: Re: [PHP] Class and interface location
> 
> On Wednesday, January 19, 2011 8:56:50 pm Tommy Pham wrote:
> 
> > > And actually, thinking about it, I wonder if requiring the explicit
> > declaration
> > > is a good thing anyway because then it's immediately obvious and
> > > greppable what the class does. :-)
> > >
> > > --Larry Garfield
> >
> > You mean requiring explicit declaration of
> >
> > > class Bob implements Foo {
> > >
> > > }
> >
> > It's so you can guarantee your app from future breakage because the
> > interface guarantees minimum functionality and still maintain great
> > flexibility such that:
> 
> Well, let me offer a more precise example of what I want to do:
> 
> interface Stuff {
>   function doStuff($a);
> }
> 
> interface Things extends Stuff {
>   function doThings($a);
> }
> 
> class Bob implements Stuff {...}
> 
> class Alice implements Stuff {...}
> 
> class George {}
> 
> class Matt implements Things {...}
> 
> 
> function make_stuff_happen($o) {
>   foreach (class_that_implements_stuff() as $class) {
>     $c = new $class();
>     $c->doStuff($o);
>   }
> }
> 
> The above code should iterate over Bob, Alice, and Matt, but not George.
> The trick is how to implement class_that_implements_stuff() (or rather,
> class_that_implements_something('Stuff')) in a sane and performant
> fashion that supports auto-loading.
> 
> If all of the above is together in a single file, it's dead simple with
> get_declared_classes() and class_implements().  However, in practice there
> could be some 200 interfaces -- with each installation having a different
set
> of them -- and as many as 500 classes implementing some combination of
> those interfaces, possibly multiple on the same class -- also with each
> installation having a different set of them.  I do not want to be forced
to
> include all of those classes and interfaces on every page load when in
> practice only perhaps two dozen will be needed on a particular request.
> That makes
> class_that_implements_stuff() considerably tricker.
> 
> That let naturally to having a cached index (in a database, in a file with
a big
> lookup array, whatever) that pre-computed which class implements what
> interface and what file it lives in, so that we can easily lookup what
classes
> we need and then let the autoloader find them.  (Previous benchmarks
> have shown that an index-based autoloader is actually pretty darned fast.)
> That just makes building that index the challenge, hence this email
thread.
> 
> Thinking it through, however, I am now wondering if, in practice, indirect
> implementation (class Matt above) will even be that common.  It may not
> be common enough to be an issue in practice, so requiring Matt to be
> declared as:
> 
> class Matt implements Stuff, Things {...}
> 
> isn't really that big of a deal and makes the indexer considerably
simpler.
> We actually have one already that indexes class locations; it just doesn't
> track interfaces.
> 
> I also, on further review, don't think that class-based inheritance will
ever
> be an issue.  The following:
> 
> class Mary extends Alice {...}
> 
> Would not make much sense at all because then both Mary and Alice would
> run, so Alice's code would run twice.  So I may be over-complicating the
> situation.
> 
> (For those following along at home, yes, this is essentially observer
pattern.
> However, it's on a very large scale and the thing being observed may not
> always be an object, so the usual active registration process of
instantiating
> both objects and telling them about each other on the off chance that they
> *may* be needed is excessively wasteful.)
> 
> Does that make it clearer what I'm trying to do?
> 
> --Larry Garfield
> 

Ah...  My current project is a bit similar to your intention but invoking a
class depending on functionality needed and not by looking up the interface.
Just about everything within the project is either interface or abstract
except the 'brain' of the app.  It's a MVC design intending to be 100%
modular.  A module may have several urls mapped to it. Any adequate third
party plugins can be dropped within any module.  So the problem you'll run
into is this:

interface IBase
{
  function baseFunc();
}

class BaseClass implements IBase {  }

class SubClass extends BaseClass { }

class SubSubClass extends SubClass { }

How would you know which class (BaseClass, SubClass, SubSubClass, or however
many subs down the tree branch you may have eventually) to invoke if you're
basing on just the interface ?  IMHO, I think you're overcomplicating your
project.  I think you should design it base on the interface (ie, passing it
within the parameter(s) ) but invoke the 'right' class for the functionality
intended.  That's why I gave the example earlier.  As the project may start
out with just Database.  Eventually needs arise, I could implement
MySQLDatabase.  Then later still build another subclass and still won't
break or do a major rewrite of what I started with.
 
And I agree about loading the class(es) that probably will never used.  It's
very wasteful.  Aso, if you haven't looked into PureMVC, I suggest you look
into it.  It's core program flow is pure abstract (via mainly interfaces).
It uses 'observer' and notify the appropriate listener(s) class(es).  I
think what you're looking for is maybe its multicore with sync/async calls
and/or (unix like) pipes plumbing.  FYI, it does tends to implement
Reflection :(  So I'm rolling my own :)

Regards,
Tommy

--- End Message ---

--- Begin Message ---

On 20 January 2011 03:46, Larry Garfield <[email protected]> wrote:
> Does that make it clearer what I'm trying to do?

Some ideas. I've no idea if any of them are any good.

1 - All plugins implement a self-registration mechanism.

A analogy to this in the Windows world would be an ActiveX component.
All ActiveX components implement a function which is callable by a
component registration program (regsvr32.exe). During installation of
the app, the component registration process detects the registration
function and calls it to register the component. The function provides
all the data required for registration.

So, something like ...

/**
 * Register plugin
 *
 * @return pluginRegistration
 */
final static public function abstractBasePlugin::registerPlugin(){
 // Create and populate a pluginRegistration object.
 // This would contain the interfaces implemented, the classes from
which the get_called_class() extends along with the hierarchy of the
interfaces and classes.
}

for example, would allow any class that extends abstractBasePlugin, to
be able to provide all the details about its class and interface
hierarchy.

Rather than just dropping files onto the server and hoping things all
work, all plugins must be "installed". This installation procedure
would then build the index at installation time for each plugin. Add
to this an unregister mechanism for those dead plugins (or for plugins
being upgraded) ...

If plugins are in constant development, the index needs to be updated
when the plugin is ready. Forcing an installation procedure would
certainly help here I think as only the current plugin needs to be
analysed and only once.

I'd probably also add details about the author of the plugin, so in
the event that the plugin is deleted (see auto tuning below), the
alerting system could inform the developer or whatever you think is
appropriate.


2 - Code protection.

Rather than leaving the source code as is, the installation process
could PHAR archive the files before placing it in the appropriate
place. That way, the include file wouldn't be editable in the live
environment ... maybe not good for the developers, but for users,
installed plugins can't be edited (easily) and therefore the index
should be consistent.


3 - Auto tuning.

I would probably add a small amount of realtime tuning to the indexer
in the event that a plugin is missing (someone deleted the file). The
lack of the file sends an appropriate email alert detailing the
missing file and indicates any orphaned subclasses and then tags the
index to exclude the missing class from that point onwards. All
dependent subclasses would also have to be tagged to exclude them from
the index. If the pluginRegistration content includes contact details,
you can include these in the alert.


4 - Plugin Inheritence.

During installation, if a plugin extends another plugin, you could
either throw an alert of some sort or have a mechanism that says the
parent class is non-callable. If nothing else, you could at least
report it and tell the developers that they need to sort this out.
Again, the developer may be able to provide details of this in the
pluginRegistration content (bool $parentCallable = true). Obviously,
multiple extensions could report different values for parentCallable,
so there would need to be some sort of guidelines and the installer
should enforce them.



The main thrust of this is that doing a static analysis is going to be
expensive and really you only need to analyse the file once and an
"installation" procedure would provide that time. It would be a step
that is familiar with many users (at least desktop user). And I think
it isn't too onerous on the developers. They only have to populate a
single object and extend from a single abstract class (or class that
has already extended from the abstract plugin class).


I hope these ideas help in some way.

Richard.


-- 
Richard Quadling
Twitter : EE : Zend
@RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY

--- End Message ---

--- Begin Message ---

On Wed, 2011-01-19 at 21:46 -0600, Larry Garfield wrote:
> On Wednesday, January 19, 2011 8:56:50 pm Tommy Pham wrote:
> 
> > > And actually, thinking about it, I wonder if requiring the explicit
> > declaration
> > > is a good thing anyway because then it's immediately obvious and
> > > greppable what the class does. :-)
> > > 
> > > --Larry Garfield
> > 
> > You mean requiring explicit declaration of
> > 
> > > class Bob implements Foo {
> > > 
> > > }
> > 
> > It's so you can guarantee your app from future breakage because the
> > interface guarantees minimum functionality and still maintain great
> > flexibility such that:
> 
> Well, let me offer a more precise example of what I want to do:
> 
> interface Stuff {
>   function doStuff($a);
> }
> 
> interface Things extends Stuff {
>   function doThings($a);
> }
> 
> class Bob implements Stuff {...}
> 
> class Alice implements Stuff {...}
> 
> class George {}
> 
> class Matt implements Things {...}
> 
> 
> function make_stuff_happen($o) {
>   foreach (class_that_implements_stuff() as $class) {
>     $c = new $class();
>     $c->doStuff($o);
>   }
> }
> 
> The above code should iterate over Bob, Alice, and Matt, but not George.  The 
> trick is how to implement class_that_implements_stuff() (or rather, 
> class_that_implements_something('Stuff')) in a sane and performant fashion 
> that supports auto-loading.
> 
> If all of the above is together in a single file, it's dead simple with 
> get_declared_classes() and class_implements().  However, in practice there 
> could be some 200 interfaces -- with each installation having a different set 
> of them -- and as many as 500 classes implementing some combination of those 
> interfaces, possibly multiple on the same class -- also with each 
> installation 
> having a different set of them.  I do not want to be forced to include all of 
> those classes and interfaces on every page load when in practice only perhaps 
> two dozen will be needed on a particular request.  That makes 
> class_that_implements_stuff() considerably tricker.
> 
> That let naturally to having a cached index (in a database, in a file with a 
> big lookup array, whatever) that pre-computed which class implements what 
> interface and what file it lives in, so that we can easily lookup what 
> classes 
> we need and then let the autoloader find them.  (Previous benchmarks have 
> shown that an index-based autoloader is actually pretty darned fast.)  That 
> just makes building that index the challenge, hence this email thread.
> 
> Thinking it through, however, I am now wondering if, in practice, indirect 
> implementation (class Matt above) will even be that common.  It may not be 
> common enough to be an issue in practice, so requiring Matt to be declared as:
> 
> class Matt implements Stuff, Things {...}
> 
> isn't really that big of a deal and makes the indexer considerably simpler.  
> We actually have one already that indexes class locations; it just doesn't 
> track interfaces.  
> 
> I also, on further review, don't think that class-based inheritance will ever 
> be an issue.  The following:
> 
> class Mary extends Alice {...}
> 
> Would not make much sense at all because then both Mary and Alice would run, 
> so Alice's code would run twice.  So I may be over-complicating the situation.
> 
> (For those following along at home, yes, this is essentially observer 
> pattern.  
> However, it's on a very large scale and the thing being observed may not 
> always be an object, so the usual active registration process of 
> instantiating 
> both objects and telling them about each other on the off chance that they 
> *may* be needed is excessively wasteful.)
> 
> Does that make it clearer what I'm trying to do?
> 
> --Larry Garfield
> 

Maybe it's the noob in me, or the lack of smrts... but couldn't you just
add a var in the class, and set it like:
$GLOBALS['classes_that_do_things']['name'] = 'class_name';
And for those that do 'things', and then do the foreach on that?  
foreach($GLOBALS['classes_that_do_things']['name'] as $class_name) {...}

not sure if that would work or not, it was just something that was in
the cobwebs of my brain this morning.

Steve

--- End Message ---

--- Begin Message ---

> -----Original Message-----
> From: Steve Staples [mailto:[email protected]]
> Sent: Thursday, January 20, 2011 5:16 AM
> To: Larry Garfield
> Cc: [email protected]
> Subject: Re: [PHP] Class and interface location
> 
> On Wed, 2011-01-19 at 21:46 -0600, Larry Garfield wrote:
> > On Wednesday, January 19, 2011 8:56:50 pm Tommy Pham wrote:
> >
> > > > And actually, thinking about it, I wonder if requiring the
> > > > explicit
> > > declaration
> > > > is a good thing anyway because then it's immediately obvious and
> > > > greppable what the class does. :-)
> > > >
> > > > --Larry Garfield
> > >
> > > You mean requiring explicit declaration of
> > >
> > > > class Bob implements Foo {
> > > >
> > > > }
> > >
> > > It's so you can guarantee your app from future breakage because the
> > > interface guarantees minimum functionality and still maintain great
> > > flexibility such that:
> >
> > Well, let me offer a more precise example of what I want to do:
> >
> > interface Stuff {
> >   function doStuff($a);
> > }
> >
> > interface Things extends Stuff {
> >   function doThings($a);
> > }
> >
> > class Bob implements Stuff {...}
> >
> > class Alice implements Stuff {...}
> >
> > class George {}
> >
> > class Matt implements Things {...}
> >
> >
> > function make_stuff_happen($o) {
> >   foreach (class_that_implements_stuff() as $class) {
> >     $c = new $class();
> >     $c->doStuff($o);
> >   }
> > }
> >
> > The above code should iterate over Bob, Alice, and Matt, but not
> > George.  The trick is how to implement class_that_implements_stuff()
> > (or rather,
> > class_that_implements_something('Stuff')) in a sane and performant
> > fashion that supports auto-loading.
> >
> > If all of the above is together in a single file, it's dead simple
> > with
> > get_declared_classes() and class_implements().  However, in practice
> > there could be some 200 interfaces -- with each installation having a
> > different set of them -- and as many as 500 classes implementing some
> > combination of those interfaces, possibly multiple on the same class
> > -- also with each installation having a different set of them.  I do
> > not want to be forced to include all of those classes and interfaces
> > on every page load when in practice only perhaps two dozen will be
> > needed on a particular request.  That makes
> > class_that_implements_stuff() considerably tricker.
> >
> > That let naturally to having a cached index (in a database, in a file
> > with a big lookup array, whatever) that pre-computed which class
> > implements what interface and what file it lives in, so that we can
> > easily lookup what classes we need and then let the autoloader find
> > them.  (Previous benchmarks have shown that an index-based autoloader
> > is actually pretty darned fast.)  That just makes building that index the
> challenge, hence this email thread.
> >
> > Thinking it through, however, I am now wondering if, in practice,
> > indirect implementation (class Matt above) will even be that common.
> > It may not be common enough to be an issue in practice, so requiring Matt
> to be declared as:
> >
> > class Matt implements Stuff, Things {...}
> >
> > isn't really that big of a deal and makes the indexer considerably simpler.
> > We actually have one already that indexes class locations; it just
> > doesn't track interfaces.
> >
> > I also, on further review, don't think that class-based inheritance
> > will ever be an issue.  The following:
> >
> > class Mary extends Alice {...}
> >
> > Would not make much sense at all because then both Mary and Alice
> > would run, so Alice's code would run twice.  So I may be over-complicating
> the situation.
> >
> > (For those following along at home, yes, this is essentially observer
> pattern.
> > However, it's on a very large scale and the thing being observed may
> > not always be an object, so the usual active registration process of
> > instantiating both objects and telling them about each other on the
> > off chance that they
> > *may* be needed is excessively wasteful.)
> >
> > Does that make it clearer what I'm trying to do?
> >
> > --Larry Garfield
> >
> 
> Maybe it's the noob in me, or the lack of smrts... but couldn't you just add a
> var in the class, and set it like:
> $GLOBALS['classes_that_do_things']['name'] = 'class_name'; And for those
> that do 'things', and then do the foreach on that?
> foreach($GLOBALS['classes_that_do_things']['name'] as $class_name) {...}
> 
> not sure if that would work or not, it was just something that was in the
> cobwebs of my brain this morning.
> 
> Steve
> 

@Steve,

That would mean that class have to be either loaded/invoked.  Larry doesn't 
want that.  He wants a way to include the right class and instantiate it. 

" the usual active registration process of instantiating both objects and 
telling them about each other on the off chance that they *may* be needed is 
excessively wasteful"

You wouldn't see the performance difference when your project is very small.  
But when it reaches several hundreds files+, including all of them will be a 
performance killer.

@Larry,

After a good night's rest, I just realize that what you're asking is similar to 
another thread discusion.  This sounds like a 'cache' management problem 
doesn't it?  Consider that part of the page (DB model objects, PHP generated 
html, etc,) is cached, what's not cached means that the cache management would 
have to include those class(es) and instantiate it/them.  So in the no cache 
environment, a file may include about 50 files (interfaces, abstracts, 
superclasses) while the cache environment would only need to include about 5-10 
or so.  

Regards,
Tommy

--- End Message ---

--- Begin Message ---

I am designing an application that make heavy usage of one-to-many
tags for items. That is, each item can have multiple tags, and there
are tens of tags (likely to grow to hundreds). Most operation on the
database are expected to be searches for the items that have a
particular tag. That is, users will search per tags, not per items.

These are the ways that I've thought about storing the tags, some bad
and some worse. If there is a better way I'd love to know.

1) Each item will get a row in a "tags" table, with a column for each tag.
mysql> CREATE TABLE tags (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    item VARCHAR(100),
    tag1 bool,
    tag2 bool,
    ....
    tagN bool
);

With this approach I would be adding a new column every time a new
category is added. This looks to me a good way given that users will
be searching per tag and a simple "SELECT item FROM tags WHERE
tag1=true;" is an easy, inexpensive query. This table will get very
large, there will likely be literally thousands of items (there will
exist more items than tags).



2) Store the applicable tags one per line in a text field in the items table.
mysql> CREATE TABLE items (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    item VARCHAR(100),
    tags text,
);

This looks like a bad idea, searching by tag will be a mess.



3) Store the tags in a table and add items to a text field. For instance:
mysql> CREATE TABLE tags (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    tagName VARCHAR(100),
    items text,
);

This looks to be the best way from a MySQL data retrieval perspective,
but I do not know how expensive it will be to then split the items in
PHP. Furthermore, adding items to tags could get real expensive.



Caveat: at some point in the future there may be added the ability to
have a tag hierarchy. For instance, there could exist a tag
"restaurant" that will get the subtags "italian" and "french". I could
fake this with any approach by having a table of existing tags with a
"parentTag" field, so if I plan on having this table anyway would
method 3 above be preferable?

Note: this message is cross-posted to the MySQL and the PHP lists as I
am really not sure where is the best place to do the logic. My
apologies to those who receive the message twice.

Thanks!

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com

--- End Message ---

--- Begin Message ---

On 20 January 2011 14:32, Dotan Cohen <[email protected]> wrote:
> I am designing an application that make heavy usage of one-to-many
> tags for items. That is, each item can have multiple tags, and there
> are tens of tags (likely to grow to hundreds). Most operation on the
> database are expected to be searches for the items that have a
> particular tag. That is, users will search per tags, not per items.
>
> These are the ways that I've thought about storing the tags, some bad
> and some worse. If there is a better way I'd love to know.
>
> 1) Each item will get a row in a "tags" table, with a column for each tag.
> mysql> CREATE TABLE tags (
>    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
>    item VARCHAR(100),
>    tag1 bool,
>    tag2 bool,
>    ....
>    tagN bool
> );
>
> With this approach I would be adding a new column every time a new
> category is added. This looks to me a good way given that users will
> be searching per tag and a simple "SELECT item FROM tags WHERE
> tag1=true;" is an easy, inexpensive query. This table will get very
> large, there will likely be literally thousands of items (there will
> exist more items than tags).
>
>
>
> 2) Store the applicable tags one per line in a text field in the items table.
> mysql> CREATE TABLE items (
>    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
>    item VARCHAR(100),
>    tags text,
> );
>
> This looks like a bad idea, searching by tag will be a mess.
>
>
>
> 3) Store the tags in a table and add items to a text field. For instance:
> mysql> CREATE TABLE tags (
>    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
>    tagName VARCHAR(100),
>    items text,
> );
>
> This looks to be the best way from a MySQL data retrieval perspective,
> but I do not know how expensive it will be to then split the items in
> PHP. Furthermore, adding items to tags could get real expensive.
>
>
>
> Caveat: at some point in the future there may be added the ability to
> have a tag hierarchy. For instance, there could exist a tag
> "restaurant" that will get the subtags "italian" and "french". I could
> fake this with any approach by having a table of existing tags with a
> "parentTag" field, so if I plan on having this table anyway would
> method 3 above be preferable?
>
> Note: this message is cross-posted to the MySQL and the PHP lists as I
> am really not sure where is the best place to do the logic. My
> apologies to those who receive the message twice.
>
> Thanks!
>
> --
> Dotan Cohen
>
> http://gibberish.co.il
> http://what-is-what.com
>
> --
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
>
>

I'd have my items table, my tags table and a join table for the two.
My join table is really simple. UniqueID, ItemID, TagID.

I'd recommend using a nested set approach for the tags
(http://dev.mysql.com/tech-resources/articles/hierarchical-data.html
gives a good explanation on the issues and methodology of nested
sets).



-- 
Richard Quadling
Twitter : EE : Zend
@RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY

--- End Message ---

--- Begin Message ---

I think the canonical way would be to have one table for your items, one table 
for your tags, and one table for your tag assignments.

CREATE TABLE items (
item_id INT(11) AUTO-INCREMENT PRIMARY KEY,
item_name VARCHAR(100) NOT NULL KEY,
...
);

CREATE TABLE tags (
tag_id INT(11) AUTO-INCREMENT PRIMARY KEY,
tag_name VARCHAR(100) NOT NULL KEY,
...
);

CREATE TABLE item_tags (
item_id INT(11) NOT NULL KEY,
tag_id INT(11) NOT NULL KEY
);

This way you could do

SELECT item_id, item_name FROM
tags JOIN item_tags ON tags.tag_id = item_tags.tag_id
JOIN items ON item_tags.item_id = items.item_id
WHERE ...
;

to get all of the items with a particular tag, or

SELECT tag_id, tag_name FROM
items JOIN item_tags ON items.item_id = item_tags.item_id
JOIN tags ON item_tags.tag_id = tags.tag_id
WHERE ...
;

with equal ease and efficiency.

Using an ever-lengthening bitmap for the tag assignments is a trap for the 
unwary. The path to perdition is lined with the bodies of those who believed 
"We'll never need more than x..."

As for setting up a hierarchy, that's trickier. One way to handle that is to 
work like libraries do: 10 is "fiction", 10.05 is "crime novels", 10.05.07 is 
"British authors", and so forth. Your `tags` table then looks like

CREATE TABLE tags (
tag_id INT(11) AUTO-INCREMENT PRIMARY KEY,
tag_name VARCHAR(100) NOT NULL KEY,
tag_number VARCHAR(100) NOT NULL KEY,
...
);

Then you can search for tags by

tag_number LIKE ('10.%') or
tag_number LIKE ('10.05%')

and so forth. This scheme is infinitely extendable. To get the entire 
hierarchy, you simply

SELECT tag_number, tag_name FROM tags ORDER BY tag_number;

Regards,

Jerry Schwartz
Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341
E-mail: [email protected]
Web site: www.the-infoshop.com


>-----Original Message-----
>From: Dotan Cohen [mailto:[email protected]]
>Sent: Thursday, January 20, 2011 9:32 AM
>To: mysql.; php-general.
>Subject: Organisational question: surely someone has implemented many Boolean
>values (tags) and a solution exist
>
>I am designing an application that make heavy usage of one-to-many
>tags for items. That is, each item can have multiple tags, and there
>are tens of tags (likely to grow to hundreds). Most operation on the
>database are expected to be searches for the items that have a
>particular tag. That is, users will search per tags, not per items.
>
>These are the ways that I've thought about storing the tags, some bad
>and some worse. If there is a better way I'd love to know.
>
>1) Each item will get a row in a "tags" table, with a column for each tag.
>mysql> CREATE TABLE tags (
>    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
>    item VARCHAR(100),
>    tag1 bool,
>    tag2 bool,
>    ....
>    tagN bool
>);
>
>With this approach I would be adding a new column every time a new
>category is added. This looks to me a good way given that users will
>be searching per tag and a simple "SELECT item FROM tags WHERE
>tag1=true;" is an easy, inexpensive query. This table will get very
>large, there will likely be literally thousands of items (there will
>exist more items than tags).
>
>
>
>2) Store the applicable tags one per line in a text field in the items table.
>mysql> CREATE TABLE items (
>    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
>    item VARCHAR(100),
>    tags text,
>);
>
>This looks like a bad idea, searching by tag will be a mess.
>
>
>
>3) Store the tags in a table and add items to a text field. For instance:
>mysql> CREATE TABLE tags (
>    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
>    tagName VARCHAR(100),
>    items text,
>);
>
>This looks to be the best way from a MySQL data retrieval perspective,
>but I do not know how expensive it will be to then split the items in
>PHP. Furthermore, adding items to tags could get real expensive.
>
>
>
>Caveat: at some point in the future there may be added the ability to
>have a tag hierarchy. For instance, there could exist a tag
>"restaurant" that will get the subtags "italian" and "french". I could
>fake this with any approach by having a table of existing tags with a
>"parentTag" field, so if I plan on having this table anyway would
>method 3 above be preferable?
>
>Note: this message is cross-posted to the MySQL and the PHP lists as I
>am really not sure where is the best place to do the logic. My
>apologies to those who receive the message twice.
>
>Thanks!
>
>--
>Dotan Cohen
>
>http://gibberish.co.il
>http://what-is-what.com
>
>--
>MySQL General Mailing List
>For list archives: http://lists.mysql.com/mysql
>To unsubscribe:    http://lists.mysql.com/[email protected]

--- End Message ---

--- Begin Message ---

On Thu, Jan 20, 2011 at 17:00, Richard Quadling <[email protected]> wrote:
> I'd have my items table, my tags table and a join table for the two.
> My join table is really simple. UniqueID, ItemID, TagID.
>

Yes, that is the first approach that I mentioned. It looks to be a
good compromise.


> I'd recommend using a nested set approach for the tags
> (http://dev.mysql.com/tech-resources/articles/hierarchical-data.html
> gives a good explanation on the issues and methodology of nested
> sets).
>

That is terrific, at least the first half. The second half, with the
Venn diagrams, is awkward!

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com

--- End Message ---

php-general Digest 20 Jan 2011 16:20:42 -0000 Issue 7143

Reply via email to