Thanks.

On 6/1/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> 
> 
> Scott Klarenbach <[EMAIL PROTECTED]> wrote on 05/27/2005 06:57:19
> PM:
> 
> 
> > -- How is 'xFxIxExLxDx1x' easier to search than 'FIELD1'?
> > 
> > It's not easier.  It is; however, accurate for the purpose at hand. 
> > FIELD1 isn't completely inclusive...it would miss FIELD-1...
> > 
> > -- However, It seems to me that this kind of data manipulation
> > (cleanup) needs to happen BEFORE the data enters the databse.
> > 
> > Sometimes FIELD-1 is the ACTUAL data, with no erroneus
> > characters...and sometimes it is FIELD1 with an erroneous (or
> > unwanted) - character before the 1)...so cleaning the data would
> > actually be corrupting some of it.
> > 
> > I need to keep the data in it's original form, but also allow for
> > querying without worrying about the special characters inside the
> > column.
> > 
> > FYI, these are part numbers off of electronic components, many of them
> > coming from China...so, a Cisco part may have an MPN of RX321, or
> > RX321-TR...either is valid.  Now, the corresponding Chinese part
> > number for the first one, may come back as RX32-1...which is out of my
> > control.
> > 
> > Cleaning the data would be the wrong approach, because it would
> > actually invalidate the second Cisco part number, which MEANT to
> > include the special characters.
> > 
> > For this reason, the user wants to be able to search for 'RX321' and
> > 'RX321TR' respectively, and not worry about whether the data is
> > erroneous or valid; just to basically ignore all the characters and
> > let a human decide what they want.
> > 
> > If you have a more elegant solution, I'm all ears :-).
> > 
> > On 5/27/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> > > 
> > > 
> > > Scott Klarenbach <[EMAIL PROTECTED]> wrote on 05/27/2005
> 04:38:40
> > > PM:
> > > 
> > > 
> > > > Ya, it is a little too specific...here's why I need it.
> > > > 
> > > > I have a client that wants to search for part numbers in his DB.  The
> > > > problem is, they come into his DB from external sources, with all
> > > > sorts of special characters in them...
> > > > 
> > > > So, he has fields like 
> > > > field_one!,
> > > > fi--eld   2,
> > > > @fi#eld__3xxx
> > > > 
> > > >  etc
> > > > 
> > > > but, he wants to do a search for 'fieldone' and return the first one,
> > > > 'field2' returns the second, etc...basically disregard all non-alphas
> > > > padding every character in the search string.
> > > > 
> > > > On 5/27/05, Eric Bergen <[EMAIL PROTECTED]> wrote:
> > > > > I'm working on a set of UDFs for preg functions.
> > > > > 
> > > > > 
> > > > > [EMAIL PROTECTED] wrote:
> > > > > 
> > > > > >I have a hard time figuring out when you would use such a function.
> I
> > > do
> > > > > >not believe you will be able to duplicate this behavior without
> > > > > >constructing your own UDF or by writing a stored procedure. BTW,
> why
> > > *do*
> > > > > >you want this function?
> > > > > >
> > > > > >Shawn Green
> > > > > >Database Administrator
> > > > > >Unimin Corporation - Spruce Pine
> > > > > >
> > > > > >Scott Klarenbach <[EMAIL PROTECTED]> wrote on 05/27/2005
> > > 01:30:35
> > > > > >PM:
> > > > > >
> > > > > >
> > > > > >
> > > > > >>I'm trying to replicate this PHP behavior in a MySQL stored
> procedure.
> > > > > >> The purpose is to pad every character of the string with a pad
> > > > > >>character.  For example, if the pad character is 'x' and thestring
> is
> > > > > >>'STRING', the result is 'xSxTxRxIxNxGx'.
> > > > > >>
> > > > > >>Here is the PHP code if it helps.  I'd like to use a regular
> > > > > >>expression to replace, but I guess I could loop through the string
> > > > > >>char by char and build a new one, it's just less elegant.  Thanks
> in
> > > > > >>advance.
> > > > > >>
> > > > > >>PHP:
> > > > > >>-------------
> > > > > >>$regPattern =  implode('x', preg_split('//', "STRING", -1,
> > > > > >>PREG_SPLIT_NO_EMPTY));
> > > > > >>
> > > 
> > > Thank you very much. I find this whole "padding" process very
> > > counterintuitive. I have a few minor questions, if you don't mind. How
> is
> > > 'xFxIxExLxDx1x' easier to search than 'FIELD1'?  Would you, could you
> please
> > > explain the theory behind why and when this kind of padding should be
> done?
> > > What problem does it solve and how is it a solution to that problem? 
> This
> > > is completely baffling to me and I thought I had seen a lot of weird
> data
> > > before :-) 
> > > 
> > > However, It seems to me that this kind of data manipulation (cleanup)
> needs
> > > to happen BEFORE the data enters the databse. What data import
> tool/process
> > > is your client using? Can you not change the import process to scrub the
> > > data and does it not have a better facility to interleave padding into a
> > > string than a MySQL stored procedure or UDF? 
> > > 
> > > Thanks for you patience! 
> > > 
> > > 
> > > Shawn Green
> > > Database Administrator
> > > Unimin Corporation - Spruce Pine 
> > > 
> > >
> 
> I would suggest the following technique. Keep your original data (trimmed of
> leading and trailing whitespace characters) in one field and a "cleansed"
> version of the same information in a second field. By cleansing you should
> eliminate all non-alphanumerics from your search string. 
> 
> I have some direct insight to searching similar but different product codes
> (same product/different codes & different products/same codes). My wife
> works for a company that consolidates medical inventories for major hospital
> and health care organizations and there is usually more than one vendor to
> the same "product". This leads to the exact same inventory code confusion
> you describe. In the majority of their data, the vendor will either prepend
> or append a manufacturer's id code or their own id code to the "normal" part
> number. Part of my wife's job is to filter through the confusion and match
> identical products to each other (based on vendor's ID, manufacturer's id,
> product codes, and descriptions) so that her clients no longer need to
> purchase the exact same thing from multiple sources (at a higher cost per
> unit) when they could be buying in bulk from a single source saving millions
> of dollars. 
> 
> That's part of how they do it. I figure it may work for you too. 
> 
> 
> Shawn Green
> Database Administrator
> Unimin Corporation - Spruce Pine 
> 
>

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to