Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-04 Thread Tommy Pham
- Original Message 
> From: Paul M Foster 
> To: php-general@lists.php.net
> Sent: Friday, September 4, 2009 9:15:08 PM
> Subject: Re: [PHP] Searching on AlphaNumeric Content Only
> 
> On Fri, Sep 04, 2009 at 08:15:41PM -0400, Robert Cummings wrote:
> 
> > Paul M Foster wrote:
> >> On Fri, Sep 04, 2009 at 04:22:18PM -0400, Eddie Drapkin wrote:
> >>
> >>> On Fri, Sep 4, 2009 at 12:02 PM, Lupus
> >>> Michaeliswrote:
> >>
> >>> if you're on shared hosting (but then again I am of the opinion
> >>> that those who choose to run with shared hosting dig their own graves
> >>> in more ways than one).
> >>
> >> Any time you or someone else would like to backstop me in setting up a
> >> dedicated server on rackspace or somewhere else, for free or really
> >> cheap, you let me know! Otherwise, those of us with less than complete
> >> expertise in server setup are stuck with shared hosting. ;-}
> >
> > For a few hundred bucks a year you can get a VPS (Virtual Private
> > Server), which gives you root access. once you get a couple of clients
> > on it, it will pay for itself year after year. If you have clients that
> > don't use much resources, you can put a few on and even make a profit on
> > the hosting alone.
> 
> Oh sure. Well aware of it. The problem is not finding a VPS or working
> with root access or anything like that. The problem is expertise. POP3,
> SMTP, SSH, HTTP, DNS, firewall security while still allowing access to
> outward facing servers, etc. It's more expertise than most people have,
> including me. The servers I run are internal and don't have to deal with
> the rigors of the internet, and only serve a couple of people. Setting
> up multiple domains under an Apache server is black magic to me, for
> instance. And then there's backups, and
> what-do-I-do-if-the-server-fails, etc. Sheesh.

I stand corrected.  You paid for peace of mind :D


> 
> Paul
> 
> -- 
> Paul M. Foster
> 
> -- 
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php


-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-04 Thread Paul M Foster
On Fri, Sep 04, 2009 at 08:15:41PM -0400, Robert Cummings wrote:

> Paul M Foster wrote:
>> On Fri, Sep 04, 2009 at 04:22:18PM -0400, Eddie Drapkin wrote:
>>
>>> On Fri, Sep 4, 2009 at 12:02 PM, Lupus
>>> Michaelis wrote:
>>
>>> if you're on shared hosting (but then again I am of the opinion
>>> that those who choose to run with shared hosting dig their own graves
>>> in more ways than one).
>>
>> Any time you or someone else would like to backstop me in setting up a
>> dedicated server on rackspace or somewhere else, for free or really
>> cheap, you let me know! Otherwise, those of us with less than complete
>> expertise in server setup are stuck with shared hosting. ;-}
>
> For a few hundred bucks a year you can get a VPS (Virtual Private
> Server), which gives you root access. once you get a couple of clients
> on it, it will pay for itself year after year. If you have clients that
> don't use much resources, you can put a few on and even make a profit on
> the hosting alone.

Oh sure. Well aware of it. The problem is not finding a VPS or working
with root access or anything like that. The problem is expertise. POP3,
SMTP, SSH, HTTP, DNS, firewall security while still allowing access to
outward facing servers, etc. It's more expertise than most people have,
including me. The servers I run are internal and don't have to deal with
the rigors of the internet, and only serve a couple of people. Setting
up multiple domains under an Apache server is black magic to me, for
instance. And then there's backups, and
what-do-I-do-if-the-server-fails, etc. Sheesh.

Paul

-- 
Paul M. Foster

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-04 Thread Robert Cummings

Paul M Foster wrote:

On Fri, Sep 04, 2009 at 04:22:18PM -0400, Eddie Drapkin wrote:


On Fri, Sep 4, 2009 at 12:02 PM, Lupus
Michaelis wrote:



if you're on shared hosting (but then again I am of the opinion
that those who choose to run with shared hosting dig their own graves
in more ways than one).


Any time you or someone else would like to backstop me in setting up a
dedicated server on rackspace or somewhere else, for free or really
cheap, you let me know! Otherwise, those of us with less than complete
expertise in server setup are stuck with shared hosting. ;-}


For a few hundred bucks a year you can get a VPS (Virtual Private 
Server), which gives you root access. once you get a couple of clients 
on it, it will pay for itself year after year. If you have clients that 
don't use much resources, you can put a few on and even make a profit on 
the hosting alone.


Cheers,
Rob.
--
http://www.interjinn.com
Application and Templating Framework for PHP

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-04 Thread Paul M Foster
On Fri, Sep 04, 2009 at 04:22:18PM -0400, Eddie Drapkin wrote:

> On Fri, Sep 4, 2009 at 12:02 PM, Lupus
> Michaelis wrote:

> if you're on shared hosting (but then again I am of the opinion
> that those who choose to run with shared hosting dig their own graves
> in more ways than one).

Any time you or someone else would like to backstop me in setting up a
dedicated server on rackspace or somewhere else, for free or really
cheap, you let me know! Otherwise, those of us with less than complete
expertise in server setup are stuck with shared hosting. ;-}

Paul

-- 
Paul M. Foster

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-04 Thread Tommy Pham
- Original Message 
> From: Robert Cummings 
> To: Eddie Drapkin 
> Cc: Lupus Michaelis ; php-general@lists.php.net
> Sent: Friday, September 4, 2009 1:36:08 PM
> Subject: Re: [PHP] Searching on AlphaNumeric Content Only
> 
> Eddie Drapkin wrote:
> > On Fri, Sep 4, 2009 at 12:02 PM, Lupus
> > Michaeliswrote:
> >> Ashley Sheridan a écrit :
> >>
> >>> You'll have far greater performance issues if you retrieve all those
> >>> records and attempt to do the same thing inside of PHP...
> >>  It's why I speak about « avoiding » and not « bannishing ». Like can be
> >> usefull, I used to use it. But it is not the a good answer to all problems.
> >> The problem with like operator is it can't use the index (or in a very
> >> limited way). So I try to warn about it.
> >>
> >>  So said, I never submit an all-retrieving method. I know it isn't the
> >> solution too.
> >>
> >> --
> >> Mickaël Wolff aka Lupus Michaelis
> >> http://lupusmic.org
> >>
> >> --
> >> PHP General Mailing List (http://www.php.net/)
> >> To unsubscribe, visit: http://www.php.net/unsub.php
> >>
> >>
> > 
> > So far, in this thread, there've been a few solutions:
> > 1) LIKE in SQL.
> > 2) REGEXP in SQL.
> > 3) PCRE in PHP
> > 4) Other fetch all methods in PHP.
> > 
> > The one thing that I'm seeing as a consistent agreement is that the
> > performance hit of whichever of the aforementioned measures is going
> > to be enough to be considering something else.   I briefly mentioned -
> > I apologize for the brevity of that email because I was in a hurry -
> > that a legitimate full text search engine is the right solution to
> > this problem.  The only problem with deploying a full text search
> > engine is going to be the difficulty in the deployment and perhaps
> > issues if you're on shared hosting (but then again I am of the opinion
> > that those who choose to run with shared hosting dig their own graves
> > in more ways than one).
> > 
> > What a full text search engine gives you is flexibility in your
> > searches, such that the initial question, when I read it, I thought
> > "Oh, someone will tell him to use Sphinx or Solr as both have special
> > filters for word seperation and would handle this without any special
> > instruction."  Instead, this is never even brought up!
> > 
> > Why was using a full text search engine to do this sort of thing - not
> > to mention the other benefits that it would bring (responsiveness and
> > flexibility in searching, speed, decreased use of MySQL, etc. etc.) -
> > rejected so offhandedly?  I can't actually think of a better way to do
> > this without requiring a whole heap of overhead, either processing or
> > programming.
> 
> I've just sort of stopped in on this thread... but why isn't MySQL's 
> FULLTEXT engine being considered?

If I remember correctly, it's only on available on MyISAM table type.  I don't
think the OP mentioned that he's using MyISAM or MySQL for that matter.

> 
> Cheers,
> Rob.
> -- 
> http://www.interjinn.com
> Application and Templating Framework for PHP
> 
> -- 
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php


--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-04 Thread Robert Cummings

Eddie Drapkin wrote:

On Fri, Sep 4, 2009 at 12:02 PM, Lupus
Michaelis wrote:

Ashley Sheridan a écrit :


You'll have far greater performance issues if you retrieve all those
records and attempt to do the same thing inside of PHP...

 It's why I speak about « avoiding » and not « bannishing ». Like can be
usefull, I used to use it. But it is not the a good answer to all problems.
The problem with like operator is it can't use the index (or in a very
limited way). So I try to warn about it.

 So said, I never submit an all-retrieving method. I know it isn't the
solution too.

--
Mickaël Wolff aka Lupus Michaelis
http://lupusmic.org

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php




So far, in this thread, there've been a few solutions:
1) LIKE in SQL.
2) REGEXP in SQL.
3) PCRE in PHP
4) Other fetch all methods in PHP.

The one thing that I'm seeing as a consistent agreement is that the
performance hit of whichever of the aforementioned measures is going
to be enough to be considering something else.   I briefly mentioned -
I apologize for the brevity of that email because I was in a hurry -
that a legitimate full text search engine is the right solution to
this problem.  The only problem with deploying a full text search
engine is going to be the difficulty in the deployment and perhaps
issues if you're on shared hosting (but then again I am of the opinion
that those who choose to run with shared hosting dig their own graves
in more ways than one).

What a full text search engine gives you is flexibility in your
searches, such that the initial question, when I read it, I thought
"Oh, someone will tell him to use Sphinx or Solr as both have special
filters for word seperation and would handle this without any special
instruction."  Instead, this is never even brought up!

Why was using a full text search engine to do this sort of thing - not
to mention the other benefits that it would bring (responsiveness and
flexibility in searching, speed, decreased use of MySQL, etc. etc.) -
rejected so offhandedly?  I can't actually think of a better way to do
this without requiring a whole heap of overhead, either processing or
programming.


I've just sort of stopped in on this thread... but why isn't MySQL's 
FULLTEXT engine being considered?


Cheers,
Rob.
--
http://www.interjinn.com
Application and Templating Framework for PHP

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-04 Thread Eddie Drapkin
On Fri, Sep 4, 2009 at 12:02 PM, Lupus
Michaelis wrote:
> Ashley Sheridan a écrit :
>
>> You'll have far greater performance issues if you retrieve all those
>> records and attempt to do the same thing inside of PHP...
>
>  It's why I speak about « avoiding » and not « bannishing ». Like can be
> usefull, I used to use it. But it is not the a good answer to all problems.
> The problem with like operator is it can't use the index (or in a very
> limited way). So I try to warn about it.
>
>  So said, I never submit an all-retrieving method. I know it isn't the
> solution too.
>
> --
> Mickaël Wolff aka Lupus Michaelis
> http://lupusmic.org
>
> --
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
>
>

So far, in this thread, there've been a few solutions:
1) LIKE in SQL.
2) REGEXP in SQL.
3) PCRE in PHP
4) Other fetch all methods in PHP.

The one thing that I'm seeing as a consistent agreement is that the
performance hit of whichever of the aforementioned measures is going
to be enough to be considering something else.   I briefly mentioned -
I apologize for the brevity of that email because I was in a hurry -
that a legitimate full text search engine is the right solution to
this problem.  The only problem with deploying a full text search
engine is going to be the difficulty in the deployment and perhaps
issues if you're on shared hosting (but then again I am of the opinion
that those who choose to run with shared hosting dig their own graves
in more ways than one).

What a full text search engine gives you is flexibility in your
searches, such that the initial question, when I read it, I thought
"Oh, someone will tell him to use Sphinx or Solr as both have special
filters for word seperation and would handle this without any special
instruction."  Instead, this is never even brought up!

Why was using a full text search engine to do this sort of thing - not
to mention the other benefits that it would bring (responsiveness and
flexibility in searching, speed, decreased use of MySQL, etc. etc.) -
rejected so offhandedly?  I can't actually think of a better way to do
this without requiring a whole heap of overhead, either processing or
programming.

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-04 Thread Lupus Michaelis

Ashley Sheridan a écrit :


You'll have far greater performance issues if you retrieve all those
records and attempt to do the same thing inside of PHP...


  It's why I speak about « avoiding » and not « bannishing ». Like can 
be usefull, I used to use it. But it is not the a good answer to all 
problems. The problem with like operator is it can't use the index (or 
in a very limited way). So I try to warn about it.


  So said, I never submit an all-retrieving method. I know it isn't the 
solution too.


--
Mickaël Wolff aka Lupus Michaelis
http://lupusmic.org

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-04 Thread Ashley Sheridan
On Fri, 2009-09-04 at 17:00 +0200, Lupus Michaelis wrote:
> Ashley Sheridan a écrit :
> > What's wrong with using the wildcards that are built into most SQL
> > variants?
> 
>Performance issues. Like is an operator to avoid when possible.
> 
> -- 
> Mickaël Wolff aka Lupus Michaelis
> http://lupusmic.org
> 
You'll have far greater performance issues if you retrieve all those
records and attempt to do the same thing inside of PHP...

Thanks,
Ash
http://www.ashleysheridan.co.uk




--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-04 Thread Lupus Michaelis

Ashley Sheridan a écrit :

What's wrong with using the wildcards that are built into most SQL
variants?


  Performance issues. Like is an operator to avoid when possible.

--
Mickaël Wolff aka Lupus Michaelis
http://lupusmic.org

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-03 Thread Ben Dunlap
>
> stripping, stemming, spelling corrections ?
>  ... uhm, that's probably why they invented regular expressions, isn't it?
>
> As I said, at the end of the day, this will be a manual slow, potentially 
> wrong implementation of what we already have and use on daily basis.

If you've got a regular-expression-based method in mind that simply
nails the OP's problem, please share. I'm still not seeing how
"regular expressions" is a sufficient answer to the OP's problem,
which is basically fuzzy search.

My sense is that regular expressions are for situations where you
basically know just what you're searching for, but don't really know
where it falls in your search space.

The OP, on the other hand, is building a system where he won't know
just what he's searching for -- all he'll know is that his search key
is "sort of like" the thing he actually needs to find.

You might be able to squeeze this problem, or at least some part of
it, into a regex-based solution, but I don't think it's a natural fit.

Ben

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



RE: [PHP] Searching on AlphaNumeric Content Only

2009-09-03 Thread Andrea Giammarchi

stripping, stemming, spelling corrections ?
 ... uhm, that's probably why they invented regular expressions, isn't it?

As I said, at the end of the day, this will be a manual slow, potentially wrong 
implementation of what we already have and use on daily basis.

But obviously, everybody is free to create his own problems, no doubts about 
that.

Regards

> Has anyone considered deploying an actual search engine (Solr, Sphinx,
> etc.), as they will take care of the stripping, stemming, spelling
> corrections, etc?


_
With Windows Live, you can organize, edit, and share your photos.
http://www.microsoft.com/middleeast/windows/windowslive/products/photo-gallery-edit.aspx

Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-03 Thread Paul M Foster
On Thu, Sep 03, 2009 at 12:12:40PM -0700, sono...@fannullone.us wrote:

>   Thanks to everyone who has responded.  After reading everyone's
> response, I think I have a very simple way to solve my "problem".
>
>   Using my original example, if someone wants to find item #
> 4D-2448-7PS, no matter what they type in, I'll take the input, strip
> out all non-alphanumeric characters to make it 4D24487PS, add the
> wildcard character between each of the remaining characters like so,
> 4*D*2*4*4*8*7*P*S, and then do the search.

Your expression, if used to directly search in your SQL table, won't
work. The '*' character isn't a valid wildcard for SQL. In PostgreSQL,
the wildcard for any number of characters is '%', and for a single
character is '_'. I don't know that MySQL understands this same
convention. And who knows about Oracle.

As others have mentioned, it would be ideal (though not very
"normalized") to create a new table column which contains the
alphanumerics without the punctuation characters ('-'). In nearly any
SQL dialect, you could do a simple SELECT using LIKE to find your item,
if you're searching on this extra field.

If you want do the searching in PHP, then it becomes more complicated.
You'll have to strip out the dashes from the user input, and then query
all the keys from your table, and test them using a regular expression.
As mentioned before, this is time-consuming for a large table.

Here's something else to consider: Could there ever be two items which
only differ by the placement of their dashes? Like 4D-2448-7PS versus
4D2-44-87PS? If not, then you should store the item number without
punctuation, and use that as the primary key on your table. Have an
"extra" field which shows the item number with dashes. You can use this
extra field in printing inventory labels or whatever (I don't recall the
context of your original post).

Paul

-- 
Paul M. Foster

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-03 Thread Tommy Pham
- Original Message 
> From: "sono...@fannullone.us" 
> To: PHP General List 
> Sent: Thursday, September 3, 2009 12:12:40 PM
> Subject: Re: [PHP] Searching on AlphaNumeric Content Only
> 
> Thanks to everyone who has responded.  After reading everyone's response, 
> I 
> think I have a very simple way to solve my "problem".
> 
> Using my original example, if someone wants to find item # 4D-2448-7PS, 
> no 
> matter what they type in, I'll take the input, strip out all non-alphanumeric 
> characters to make it 4D24487PS, add the wildcard character between each of 
> the 
> remaining characters like so, 4*D*2*4*4*8*7*P*S, and then do the search.

The correct wildcard syntax to work in any DB (Oracle, MySQL, MSSQL, etc), is % 
and not * if I remember correctly.  Searching like this is ok but won't be 
efficient when you have a lot of rows.  As for external file processing txt, 
csv, etc... I recommend you create a separate mechanism for it since each 
storage medium is meant for different purposes.  txt (both delimited and fix 
formatted) and csv are usually meant for importing/exporting between various 
RDBMS types and different companies.  They're not mean for fast searching of 
data.  I suggest you think about the amount of the data you have to deal with 
1st and how often will the search be done on that data.  It's probably easier 
and faster just to import the ascii into db and do you search on db if you have 
to work with any ascii.

As for adding another field to the db, perhaps your project just started?  If 
so, wouldn't it be better to do it with the future in mind so later you won't 
have to go back and redesign the db and modify the codes because now you have 
over 100k rows to search and the search occurs just about every other hits?  
That time you now have could be used for code optimizing for better 
performance, add more features/functionalities to the site, etc... :)  Trust 
me, searching the db table with over 200k rows and return the results with 
multi-table joins based 1 criteria isn't fun.  Keep in mind that you shouldn't 
keep the users waiting more than 5 seconds.  Only exception to that rule is 
data mining where you'll have millions of rows to work with ;)  Then it's no 
longer your problem.  It's the DBA :D

Regards,
Tommy

> 
> Still being new at this, it seems to be the simplest approach, or is my 
> thinking flawed?  This also keeps me from having to add another field in the 
> db 
> to search on.
> 
> BTW, this solution needs to work with any db, even ASCII files, so it has 
> to 
> happen in PHP.
> 
> Thanks again,
> Frank
> 
> --PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php


-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-03 Thread Eddie Drapkin
On Thu, Sep 3, 2009 at 3:17 PM, Ashley Sheridan 
wrote:
> On Thu, 2009-09-03 at 12:12 -0700, sono...@fannullone.us wrote:
>>       Thanks to everyone who has responded.  After reading everyone's
>> response, I think I have a very simple way to solve my "problem".
>>
>>       Using my original example, if someone wants to find item #
>> 4D-2448-7PS, no matter what they type in, I'll take the input, strip
>> out all non-alphanumeric characters to make it 4D24487PS, add the
>> wildcard character between each of the remaining characters like so,
>> 4*D*2*4*4*8*7*P*S, and then do the search.
>>
>>       Still being new at this, it seems to be the simplest approach, or is
>> my thinking flawed?  This also keeps me from having to add another
>> field in the db to search on.
>>
>>       BTW, this solution needs to work with any db, even ASCII files, so it
>> has to happen in PHP.
>>
>> Thanks again,
>> Frank
>>
> For speed you might want to consider an extra field in the DB in the
> future. If the database gets larger, or your query needs to join several
> tables together, then things will take a noticeable speed hit. I had a
> similar issue myself where I had to search for names based on
> mis-spellings of them. In the end I searched with metaphone tags on an
> extra field in the DB set up for that purpose, but it was the only way
> to do it that didn't affect the speed of the site.
>
> Thanks,
> Ash
> http://www.ashleysheridan.co.uk
>
>
>
>
> --
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
>
>

Has anyone considered deploying an actual search engine (Solr, Sphinx,
etc.), as they will take care of the stripping, stemming, spelling
corrections, etc?

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-03 Thread Ashley Sheridan
On Thu, 2009-09-03 at 12:12 -0700, sono...@fannullone.us wrote:
>   Thanks to everyone who has responded.  After reading everyone's  
> response, I think I have a very simple way to solve my "problem".
> 
>   Using my original example, if someone wants to find item #  
> 4D-2448-7PS, no matter what they type in, I'll take the input, strip  
> out all non-alphanumeric characters to make it 4D24487PS, add the  
> wildcard character between each of the remaining characters like so,  
> 4*D*2*4*4*8*7*P*S, and then do the search.
> 
>   Still being new at this, it seems to be the simplest approach, or is  
> my thinking flawed?  This also keeps me from having to add another  
> field in the db to search on.
> 
>   BTW, this solution needs to work with any db, even ASCII files, so it  
> has to happen in PHP.
> 
> Thanks again,
> Frank
> 
For speed you might want to consider an extra field in the DB in the
future. If the database gets larger, or your query needs to join several
tables together, then things will take a noticeable speed hit. I had a
similar issue myself where I had to search for names based on
mis-spellings of them. In the end I searched with metaphone tags on an
extra field in the DB set up for that purpose, but it was the only way
to do it that didn't affect the speed of the site.

Thanks,
Ash
http://www.ashleysheridan.co.uk




-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-03 Thread sono-io
	Thanks to everyone who has responded.  After reading everyone's  
response, I think I have a very simple way to solve my "problem".


	Using my original example, if someone wants to find item #  
4D-2448-7PS, no matter what they type in, I'll take the input, strip  
out all non-alphanumeric characters to make it 4D24487PS, add the  
wildcard character between each of the remaining characters like so,  
4*D*2*4*4*8*7*P*S, and then do the search.


	Still being new at this, it seems to be the simplest approach, or is  
my thinking flawed?  This also keeps me from having to add another  
field in the db to search on.


	BTW, this solution needs to work with any db, even ASCII files, so it  
has to happen in PHP.


Thanks again,
Frank

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



RE: [PHP] Searching on AlphaNumeric Content Only

2009-09-03 Thread Andrea Giammarchi


> Even if the REGEXP has to change with every query?

Ben, it does not matter, this is not a PHP problem but a DB 
structure/select/insert/update problem.
Whatever REGEXP you use, a REGEXP is what you need to solve this problem, 
certainly not a PHP loop over each row with operations for each rows.

These things are OK if you do not know REGEXP or REGEXP MySQL syntax, but in 
this case you should ask for the correct REGEXP rather than talk about 
performances, obviously slower outside MySQL and via a runtime interpreted 
language as PHP is, or other solutions which aim is to end up with something 
that just emulate a select with REGEXP.

Did you get my point? Finally, when I say "you" I mean generally speaking :)

Regards

_
With Windows Live, you can organize, edit, and share your photos.
http://www.microsoft.com/middleeast/windows/windowslive/products/photo-gallery-edit.aspx

Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-03 Thread Ben Dunlap
> What's wrong with using the wildcards that are built into most SQL
> variants?
>
> SELECT * FROM table WHERE item_id LIKE '%#abcdef'
>
> Will select all records where the item_id field ends in '#abcdef'

That works if you know the user is always going to enter the last 7
characters of the product id, but that's not how the OP characterized
the problem. The OP talked about search strings where multiple
characters had been omitted from different parts of the product id.

Ben

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-03 Thread Ben Dunlap
> Excuse me? Somebody suggested a PHP loop to solve a query problem and you are 
> saying that REGEXP should not be used?
> MySQL caches queries and 100 SELECT with a REGEXP will cost zero after the 
> first one if nothing changed inside the table.

Even if the REGEXP has to change with every query?

Performance aside, I think REGEXP() could be used here, but not in the
way you've suggested. As the OP has described his table, your regex
("^[a-zA-Z0-9]+$") won't match any rows, because all of his product
IDs have non-alphanumeric characters in them.

Suppose this table:

pk  | prod_id
1   | 07-ABCD-98
2   | 98-ZCXQ-21

And now suppose the OP's scenario, where a user tries to search on
product id, but enters "07ABCD98".

If the aim is to use REGEXP() to return row 1, I suppose you could
intersperse the search string with ".?" sequences and end up with this
query:

SELECT * FROM table WHERE prod_id REGEXP '^0.?7.?A.?B.?C.?D.?9.?8$'

I think just stripping the alphanumeric characters would end up being
more flexible, though.

-Ben

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



RE: [PHP] Searching on AlphaNumeric Content Only

2009-09-03 Thread Andrea Giammarchi


> Indeed you could do it via a regexp, but that uses up quite some memory. 
> Every time you do a SELECT. You can simply add a table column with the 
> stripped value and let the table update itself (with an ON UPDATE ON 
> INSERT trigger, which takes the input value for the itemID and strips it 
> once).
> 
> When doing this on inputting the value into the database, you save 
> yourself the pain (and performance) of doing it on every SELECT-query.

Excuse me? Somebody suggested a PHP loop to solve a query problem and you are 
saying that REGEXP should not be used?
MySQL caches queries and 100 SELECT with a REGEXP will cost zero after the 
first one if nothing changed inside the table.

At the same time an internal REGEXP is faster than everything else has to move 
out and be parsed after via, probably, the same REGEXP engine. Try some bench.

This problem, imho, is a non-problem, at least not a PHP problem.

How MySQL optimizes internally REGEXPs is not PHP problem as well.

It's like to create a loop to read byte after byte because file_get_contents 
could be memory greedy (if you do that with 1 Gb of file you are you doing 
wrong in any case, logs need to be split as example) or avoid MATCH AGAINST in 
query if we have too many rows because of performances problem (table could be 
slipt as well to optimize performances) ... and these practices to avoid native 
solutions are a bit hilarious, imho.

Regards

_
Drag n’ drop—Get easy photo sharing with Windows Live™ Photos.

http://www.microsoft.com/windows/windowslive/products/photos.aspx

RE: [PHP] Searching on AlphaNumeric Content Only

2009-09-03 Thread Andrea Giammarchi

Which DB?
If it is MySQL, as example, you can simply use REGEXP syntax "^[a-zA-Z0-9]+$" 
via SELECT

Regards

> From: sono...@fannullone.us
> To: php-general@lists.php.net
> Date: Wed, 2 Sep 2009 20:47:15 -0700
> Subject: [PHP] Searching on AlphaNumeric Content Only
> 
>   Is there is a way to search only for the alphanumeric content of  
> field in a db?  I have an itemID field that contains item #'s that  
> include dashes, forward slashes, etc, and I want people to be able to  
> search for an item # even if they don't enter the punctuation exactly.
> 
>   Here's an example:  let's say there is an itemID of 4D-2448-7PS but  
> someone omits the dashes and searches on 4D24487PS.  Is it possible in  
> PHP to have the find be successful, even if the search criteria  
> doesn't exactly match what's stored in the field?
> 
>   If this is possible, I'd appreciate it if someone could just point me  
> in the right direction so I can read up on it.
> 
> Thanks,
> Frank
> 
> -- 
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
> 

_
Drag n’ drop—Get easy photo sharing with Windows Live™ Photos.

http://www.microsoft.com/windows/windowslive/products/photos.aspx

Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-03 Thread Ashley Sheridan
On Wed, 2009-09-02 at 21:30 -0700, Ben Dunlap wrote:
> >Is there is a way to search only for the alphanumeric content of
> > field in a db?  I have an itemID field that contains item #'s that include
> > dashes, forward slashes, etc, and I want people to be able to search for an
> > item # even if they don't enter the punctuation exactly.
> 
> Not sure if there's anything specifically PHP-ish that will help you
> here, but I would be inclined to start by storing a stripped-down
> version of the item ID (alphanumeric characters only) in a separate
> column in the database table.
> 
> Then, when a user enters some search data, I would remove
> non-alphanumeric characters, if any, from the user's input, and then
> search the stripped column with this normalized version of the input.
> 
> If you want even fuzzier matching (inadvertent transpositions or an
> omitted character or two OK, for example), you might read about
> Levenshtein distance:
> 
> http://en.wikipedia.org/wiki/Levenshtein_distance
> 
> PHP has a levenshtein function but you'll have to figure out a way to
> use it efficiently with your data set. Or, if Levenshtein isn't quite
> right for your needs, the article above might at least point you in a
> useful direction.
> 
> Ben
> 

What's wrong with using the wildcards that are built into most SQL
variants?

SELECT * FROM table WHERE item_id LIKE '%#abcdef'

Will select all records where the item_id field ends in '#abcdef'

Thanks,
Ash
http://www.ashleysheridan.co.uk




-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-02 Thread Ben Dunlap
>        Is there is a way to search only for the alphanumeric content of
> field in a db?  I have an itemID field that contains item #'s that include
> dashes, forward slashes, etc, and I want people to be able to search for an
> item # even if they don't enter the punctuation exactly.

Not sure if there's anything specifically PHP-ish that will help you
here, but I would be inclined to start by storing a stripped-down
version of the item ID (alphanumeric characters only) in a separate
column in the database table.

Then, when a user enters some search data, I would remove
non-alphanumeric characters, if any, from the user's input, and then
search the stripped column with this normalized version of the input.

If you want even fuzzier matching (inadvertent transpositions or an
omitted character or two OK, for example), you might read about
Levenshtein distance:

http://en.wikipedia.org/wiki/Levenshtein_distance

PHP has a levenshtein function but you'll have to figure out a way to
use it efficiently with your data set. Or, if Levenshtein isn't quite
right for your needs, the article above might at least point you in a
useful direction.

Ben

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-02 Thread German Geek
Hi,

It's definitely possible to do when you do it in PHP, but not sure about on
the database side. You could read all records into memory and then iterate
over it with something like:

$toSearch = "4D24487PS"
$charsToIgnore = array('-','+',...);

foreach ($items as $k=>$item) {
  $itemVal = str_replace($charsToIgnore, '', $item);
  if (strcmp(str_replace($charsToIgnore, '', $toSearch), $itemVal) == 0) {
$return = $item;
break;
  }
}

This however might use a lot of memory, but if your DB is a manageable size
it should be ok. You can probably optimise it by iterating over a db result
set instead of reading everything into an array.

Cheers,
Tim
++Tim Hinnerk Heuer++

http://www.ihostnz.com

2009/9/3 

>Is there is a way to search only for the alphanumeric content of
> field in a db?  I have an itemID field that contains item #'s that include
> dashes, forward slashes, etc, and I want people to be able to search for an
> item # even if they don't enter the punctuation exactly.
>
>Here's an example:  let's say there is an itemID of 4D-2448-7PS but
> someone omits the dashes and searches on 4D24487PS.  Is it possible in PHP
> to have the find be successful, even if the search criteria doesn't exactly
> match what's stored in the field?
>
>If this is possible, I'd appreciate it if someone could just point
> me in the right direction so I can read up on it.
>
> Thanks,
> Frank
>
> --
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
>
>


[PHP] Searching on AlphaNumeric Content Only

2009-09-02 Thread sono-io
	Is there is a way to search only for the alphanumeric content of  
field in a db?  I have an itemID field that contains item #'s that  
include dashes, forward slashes, etc, and I want people to be able to  
search for an item # even if they don't enter the punctuation exactly.


	Here's an example:  let's say there is an itemID of 4D-2448-7PS but  
someone omits the dashes and searches on 4D24487PS.  Is it possible in  
PHP to have the find be successful, even if the search criteria  
doesn't exactly match what's stored in the field?


	If this is possible, I'd appreciate it if someone could just point me  
in the right direction so I can read up on it.


Thanks,
Frank

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php