Re: Postal code searching

Hal�sz S�ndor Wed, 25 Apr 2012 12:14:06 -0700

>>>> 2012/04/25 10:14 +0100, Mark Goodge >>>>
On 24/04/2012 17:24, Tompkins Neil wrote:
>How about if I want to only return postal codes that are like W1U 8JE
>not W13 0SU.
>
>Because in this example I have W1 as the postal code and W13 is the other
>postal code


No, you don't. In this example you have W1U as one outbound code and W13 as the 
other.

W1U postcodes are not a subset of W1 postcodes, any more than IP27 postcodes 
are a subset of IP2 postcodes. The fact that in W1U the district segment is in 
the form of NA rather than NN doesn't change the fact that it's an indivisible 
two-character code.

So I think the first question has to be, why do you want to get W1 as a 
particular substring from the postcode W1U 8JE?

British postcodes have a structure which is easy for humans to understand, 
although (unfortunately) rather hard to parse automatically. Essentially, every 
full postcode contains four elements:

Area code: one or two alpha characters, either A or AA
District code: one or two alphanumeric characters the first of which is always 
numeric, either N, NN or NA
Sector code: single numeric character, always N
Walk code: two alpha characters, always AA

It's customary, but not part of the formal specification, to insert whitespace 
between the District and Sector codes.

So, given the postcode WC1H 8EJ, we have:

Area: WC
District: 1H
Sector: 8
Walk: EJ

Taken together, the first two sections form the "outbound" part of the 
postcode, and the second two form the "inbound". (That is, the first two 
identify the destination sorting depot that the originating depot will send the 
post to, and the second two are used by the destination depot to make the 
actual delivery).

The reason for mentioning this is that postcodes, having a wide range of 
possible formats, are not easy to handle with simple substring searches if 
you're trying to extract outbound codes from a full postcode. It can be done 
with regular expressions, but you have to be wary of assuming that the space 
between District and Sector will always be present as, particularly if you're 
getting data from user input, it might not be.

In my own experience (which is quite extensive, as I've done a lot of work with 
systems, such as online retail, which use postcodes as a key part of the data), 
I've always found it simpler to pre-process the postcodes prior to inserting 
them into the database in order to ensure they have a consistent format (eg, 
inserting a space if none exists). That then makes it easy to select an 
outbound code, as you can use the space as a boundary. But if you want to be 
able to go further up the tree and select area codes (eg, distinguishing 
between EC, WC and W) then it's harder, as you have to account for the fact 
that some are two characters and some are only one. You can do it with a 
regular expression, taking everything prior to the first digit, but it's a lot 
easier in this case to extract the area code prior to inserting the data into 
the database and store the area code in a separate column.
<<<<<<<<
It seems to me that sector & walk taken together always make up three 
characters; therefore, blanks aside, the outbound part from a good postcode is
LEFT(pc, CHAR_LENGTH(pc)-3)
, and with REPLACE it is trivial to drop all blanks. If Neil Tompkins wanted 
only to get the outbound part, that is enough. As for the area, if it is only 
one or twain characters long, to get that this is enough:
LEFT(pc, IF(SUBSTR(pc, 2, 1) > '9', 2, 1))
. Extremely crude coding, but if the postcode is right, .... This much one can 
do within an SQL function with no regular-expression handling --and MySQL s 
regular-expression handling yields only 'yes' or 'no'-- , but, of course, if 
one wishes to verify that it is right, that is another matter. Are there any 
rules for that, or is the best recourse to get a file of good outbound codes 
from the post office?

As for the string-matching question, matching 'W1' and 'W13' against 'W13 0SU', 
one rule to consider is that the longest match is the right one. This problem 
or like is often given in SQL classes:

SELECT *
FROM pc JOIN shortpc ON LEFT(pc.c, CHAR_LENGTH(shortpc.c)) = shortpc.c
WHERE (SELECT MAX(CHAR_LENGTH(shortpc.c))
        FROM pc AS a JOIN shortpc ON LEFT(a.c, CHAR_LENGTH(shortpc.c)) = 
shortpc.c
        WHERE a.c = pc.c) = LENGTH(shortpc.c)

I hope this is not a class problem.

And after this I ask, is any outbound code a leading part of any other outbound 
code? If not, this twist is not needed. I also consider it good design if not.

I was glad to see this lesson in British postcodes, something that I never 
pursued because I had no need of it. 


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql

Re: Postal code searching

Reply via email to