RE: natural sort via substrings

2008-06-13 Thread Boyd, Todd M.
 -Original Message-
 From: Emily Heureux [mailto:[EMAIL PROTECTED]
 Sent: Thursday, June 12, 2008 8:25 PM
 To: mysql@lists.mysql.com
 Subject: natural sort via substrings
 
 Hi, I am attempting to do a natural sort from within mysql, if
 possible.
 So, for example, jane2 would come before jane10, and normal strings
 would
 still sort as expected.  I found some solutions, like using length for
 the
 numerical part, but that only works if the strings are the same
length.
 Ideally, I would like to use substring_index, but stick a regexp in as
 the
 delimiter.  So far, it seems you cannot do this.  Does anyone know how
 to
 put a regexp as the delimiter in substring_index?  For example, I want
 to do
 something like this:
 
 .order by substring_index(name, 'regexp [0-9]+', 1);
 
 Is this possible?

Forgive me if I am incorrect, but wouldn't jane2 already be listed
before jane10 if you just ORDER BY fieldname ASC? I suppose jane2
and jane20 would wind up next to each other if this were the case, but
can you not zero-fill your values (i.e., jane02) to prevent this from
happening?

Sorry if my suggestion falls short of the mark, but the conditions for
your test case were vague at best. :) Can you not zero-fill? Are you
sorting by the entire field's value, or just a portion of it? Wouldn't
substring_index() sort all jane## entries arbitrarily, since your
(theoretical) example returns everything to the left of the first match
(but not including the match)?


Todd Boyd
Web Programmer




--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



RE: natural sort via substrings

2008-06-13 Thread Boyd, Todd M.
 -Original Message-
 From: Boyd, Todd M. [mailto:[EMAIL PROTECTED]
 Sent: Friday, June 13, 2008 10:33 AM
 To: Emily Heureux; mysql@lists.mysql.com
 Subject: RE: natural sort via substrings

---8--- snip

 Forgive me if I am incorrect, but wouldn't jane2 already be listed
 before jane10 if you just ORDER BY fieldname ASC? I suppose
jane2

Man, what a morning. After reading what I wrote, it's obvious to me that
this should not happen (jane2 coming before jane10, I mean.. 1  2).
Maybe sort by length first, then sub-sort by alpha? As far as regex
goes, I know of no way to use it inside the MySQL function
substring_index().


Todd Boyd
Web Programmer




--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



RE: natural sort via substrings

2008-06-13 Thread Emily Heureux
Hi, I am sorry for being so vague.  The values are not as simple as jane20,
jane10, or jane2.  There are names like, jane-2, alex 3, alex4, and just
'jane', etc.  ORDER BY sorts by either numerical OR string, not both, as far
as I can tell.  I'll have to think about zero-fill, but I doubt that will
work given the lack of standards for the names I am getting (protein names).
I need a sort that works like the way a human would sort.

As for substring_index(), I was using that because the names I am dealing
with often have a distinguishing number at the end, so I would like to just
grab all but the number(s), and then grab just the number(s) (So, actually,
I am using substring_index() twice).  If I can separate out the numbers at
the end from the rest of the string with a regexp delimiter, the problem is
solved.  I just don't know if that can be done from within ORDER BY.

Thanks,
Emily
 -Original Message-
 From: Boyd, Todd M. [mailto:[EMAIL PROTECTED]
 Sent: Friday, June 13, 2008 8:33 AM
 To: Emily Heureux; mysql@lists.mysql.com
 Subject: RE: natural sort via substrings
 
  -Original Message-
  From: Emily Heureux [mailto:[EMAIL PROTECTED]
  Sent: Thursday, June 12, 2008 8:25 PM
  To: mysql@lists.mysql.com
  Subject: natural sort via substrings
 
  Hi, I am attempting to do a natural sort from within mysql, if
  possible.
  So, for example, jane2 would come before jane10, and normal strings
  would
  still sort as expected.  I found some solutions, like using length for
  the
  numerical part, but that only works if the strings are the same
 length.
  Ideally, I would like to use substring_index, but stick a regexp in as
  the
  delimiter.  So far, it seems you cannot do this.  Does anyone know how
  to
  put a regexp as the delimiter in substring_index?  For example, I want
  to do
  something like this:
 
  .order by substring_index(name, 'regexp [0-9]+', 1);
 
  Is this possible?
 
 Forgive me if I am incorrect, but wouldn't jane2 already be listed
 before jane10 if you just ORDER BY fieldname ASC? I suppose jane2
 and jane20 would wind up next to each other if this were the case, but
 can you not zero-fill your values (i.e., jane02) to prevent this from
 happening?
 
 Sorry if my suggestion falls short of the mark, but the conditions for
 your test case were vague at best. :) Can you not zero-fill? Are you
 sorting by the entire field's value, or just a portion of it? Wouldn't
 substring_index() sort all jane## entries arbitrarily, since your
 (theoretical) example returns everything to the left of the first match
 (but not including the match)?
 
 
 Todd Boyd
 Web Programmer
 
 




-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



RE: natural sort via substrings

2008-06-13 Thread Boyd, Todd M.
 -Original Message-
 From: Emily Heureux [mailto:[EMAIL PROTECTED]
 Sent: Friday, June 13, 2008 11:12 AM
 To: Boyd, Todd M.; mysql@lists.mysql.com
 Subject: RE: natural sort via substrings
 
 Hi, I am sorry for being so vague.  The values are not as simple as
 jane20,
 jane10, or jane2.  There are names like, jane-2, alex 3, alex4, and
 just
 'jane', etc.  ORDER BY sorts by either numerical OR string, not both,
 as far
 as I can tell.  I'll have to think about zero-fill, but I doubt that
 will
 work given the lack of standards for the names I am getting (protein
 names).
 I need a sort that works like the way a human would sort.
 
 As for substring_index(), I was using that because the names I am
 dealing
 with often have a distinguishing number at the end, so I would like to
 just
 grab all but the number(s), and then grab just the number(s) (So,
 actually,
 I am using substring_index() twice).  If I can separate out the
numbers
 at
 the end from the rest of the string with a regexp delimiter, the
 problem is
 solved.  I just don't know if that can be done from within ORDER BY.

---8--- snip

Emily,

From an exhaustive search of the web (including MySQL's page and
others), it appears that Regular Expression support in MySQL is limited
at best. All you can do with it thus far is determine whether or not
something matched the given expression--1 or 0 are the only results
possible, from what I can gather.

You may be forced to either nest a bunch of queries using
substring_index() and other server-side string manipulation functions,
or sort the data after queried-extraction. :(

By all means, though, do not take my information as gospel. There may
very well be a way to pull off what you're trying to do... but ORDER BY
REGEXP is not it.


Todd Boyd
Web Programmer




--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



RE: natural sort via substrings

2008-06-13 Thread John Trammell
 -Original Message-
 From: Emily Heureux [mailto:[EMAIL PROTECTED]
 Sent: Thursday, June 12, 2008 8:25 PM
 To: mysql@lists.mysql.com
 Subject: natural sort via substrings

 Hi, I am attempting to do a natural sort from within mysql, if
 possible.  So, for example, jane2 would come before jane10, and
 normal strings would still sort as expected.

I'd try adding a column (e.g. sort_as) to your table, and then populate that 
with strings suitable for sorting (e.g. jane2 would become jane0002, 
jane10 = jane0010, and so on).

One nice thing about this is that you can use an external language like Java, 
Perl, or Python to calculate your sort string.  SQL is a painful choice for 
that task in my opinion.

HTH, HAND.
J

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



RE: natural sort via substrings

2008-06-13 Thread Boyd, Todd M.
From: Jim Lyons [mailto:[EMAIL PROTECTED] 
Sent: Friday, June 13, 2008 12:10 PM
To: Boyd, Todd M.
Subject: Re: natural sort via substrings

What I would do is form 2 additional fields from the first, an alphanumeric 
field and a numeric field so:

jane-2   jane- 2
alex 3   alex  3  (maybe put a blank after the 'x' but be sure to 
handle it properly, trailing spaces are tricky)
alex4    alex  4
jane  jane  (maybe null, or 0 for the numeric field, depending 
on the application)

Then sort on the 2 fields.  Depending on your application, you might want to 
keep the entire name field as well as the 2 derivative fields, just accept the 
de-normalization.  You can use the entire field for display purposes.

If your table is of any size whatsoever this would be vastly preferable to 
doing all sorts of substring-ing in your where clauses.  You won't have a 
prayer of having the optimizer use an index.

Jim

On Fri, Jun 13, 2008 at 11:58 AM, Boyd, Todd M. [EMAIL PROTECTED] wrote:
 -Original Message-
 From: Emily Heureux [mailto:[EMAIL PROTECTED]
 Sent: Friday, June 13, 2008 11:12 AM
 To: Boyd, Todd M.; mysql@lists.mysql.com
 Subject: RE: natural sort via substrings

 Hi, I am sorry for being so vague.  The values are not as simple as
 jane20,
 jane10, or jane2.  There are names like, jane-2, alex 3, alex4, and
 just
 'jane', etc.  ORDER BY sorts by either numerical OR string, not both,
 as far
 as I can tell.  I'll have to think about zero-fill, but I doubt that
 will
 work given the lack of standards for the names I am getting (protein
 names).
 I need a sort that works like the way a human would sort.

 As for substring_index(), I was using that because the names I am
 dealing
 with often have a distinguishing number at the end, so I would like to
 just
 grab all but the number(s), and then grab just the number(s) (So,
 actually,
 I am using substring_index() twice).  If I can separate out the
numbers
 at
 the end from the rest of the string with a regexp delimiter, the
 problem is
 solved.  I just don't know if that can be done from within ORDER BY.

---8--- snip

Emily,

From an exhaustive search of the web (including MySQL's page and
others), it appears that Regular Expression support in MySQL is limited
at best. All you can do with it thus far is determine whether or not
something matched the given expression--1 or 0 are the only results
possible, from what I can gather.

You may be forced to either nest a bunch of queries using
substring_index() and other server-side string manipulation functions,
or sort the data after queried-extraction. :(

By all means, though, do not take my information as gospel. There may
very well be a way to pull off what you're trying to do... but ORDER BY
REGEXP is not it.



Jim,

Thank you for formulating my idea into a more easily-understandable format. :) 
It's been a long week. What I meant to say when I mentioned sort the data 
after queried-extraction was that you would need to split the data extracted 
with a query into its sort-able parts as you described. Very well put on your 
part, though.

Emily,

You might try and find a conditional string function that will find the first 
digit character in a string (or a nifty manipulation of a different string 
function that accomplishes the same objective). I understand that's sort of 
along the line of what you were trying to do in the first place, but perhaps 
this new funneling of the problem will give you some different insight into a 
possible solution. I'll scour the web and documentation with the free time I 
may or may not get today and see what I can come up with, as well.

Don't give up hope! :) This sort of data transformation is not an uncommon 
occurrence, and someone somewhere must have tried to tackle a similar 
problem--with positive results. *crosses fingers* If not, maybe someone else's 
idea for a solution can plant the seed.


Todd Boyd
Web Programmer




--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



RE: natural sort via substrings

2008-06-13 Thread Emily Heureux
Ok, thanks for all the information.  I have tried solving the problem via
perl, but I am using DBIx::Class and Catalyst, which are both relatively new
to me.  The way DBIC deals with grabbing data from the database combined
with my limited skills had me looking for a mysql solution.

You're saving me time telling me regexp in the order by clause won't work.
I won't waste any more time down that road.

Thank you again,
-Emily

 -Original Message-
 From: Boyd, Todd M. [mailto:[EMAIL PROTECTED]
 Sent: Friday, June 13, 2008 11:29 AM
 To: Jim Lyons; mysql@lists.mysql.com; Emily Heureux
 Subject: RE: natural sort via substrings
 
 From: Jim Lyons [mailto:[EMAIL PROTECTED]
 Sent: Friday, June 13, 2008 12:10 PM
 To: Boyd, Todd M.
 Subject: Re: natural sort via substrings
 
 What I would do is form 2 additional fields from the first, an
 alphanumeric field and a numeric field so:
 
 jane-2   jane- 2
 alex 3   alex  3  (maybe put a blank after the 'x' but be sure to
 handle it properly, trailing spaces are tricky)
 alex4    alex  4
 jane  jane  (maybe null, or 0 for the numeric field,
 depending on the application)
 
 Then sort on the 2 fields.  Depending on your application, you might want
 to keep the entire name field as well as the 2 derivative fields, just
 accept the de-normalization.  You can use the entire field for display
 purposes.
 
 If your table is of any size whatsoever this would be vastly preferable to
 doing all sorts of substring-ing in your where clauses.  You won't have a
 prayer of having the optimizer use an index.
 
 Jim
 
 On Fri, Jun 13, 2008 at 11:58 AM, Boyd, Todd M. [EMAIL PROTECTED] wrote:
  -Original Message-
  From: Emily Heureux [mailto:[EMAIL PROTECTED]
  Sent: Friday, June 13, 2008 11:12 AM
  To: Boyd, Todd M.; mysql@lists.mysql.com
  Subject: RE: natural sort via substrings
 
  Hi, I am sorry for being so vague.  The values are not as simple as
  jane20,
  jane10, or jane2.  There are names like, jane-2, alex 3, alex4, and
  just
  'jane', etc.  ORDER BY sorts by either numerical OR string, not both,
  as far
  as I can tell.  I'll have to think about zero-fill, but I doubt that
  will
  work given the lack of standards for the names I am getting (protein
  names).
  I need a sort that works like the way a human would sort.
 
  As for substring_index(), I was using that because the names I am
  dealing
  with often have a distinguishing number at the end, so I would like to
  just
  grab all but the number(s), and then grab just the number(s) (So,
  actually,
  I am using substring_index() twice).  If I can separate out the
 numbers
  at
  the end from the rest of the string with a regexp delimiter, the
  problem is
  solved.  I just don't know if that can be done from within ORDER BY.
 
 ---8--- snip
 
 Emily,
 
 From an exhaustive search of the web (including MySQL's page and
 others), it appears that Regular Expression support in MySQL is limited
 at best. All you can do with it thus far is determine whether or not
 something matched the given expression--1 or 0 are the only results
 possible, from what I can gather.
 
 You may be forced to either nest a bunch of queries using
 substring_index() and other server-side string manipulation functions,
 or sort the data after queried-extraction. :(
 
 By all means, though, do not take my information as gospel. There may
 very well be a way to pull off what you're trying to do... but ORDER BY
 REGEXP is not it.
 
 
 
 Jim,
 
 Thank you for formulating my idea into a more easily-understandable
 format. :) It's been a long week. What I meant to say when I mentioned
 sort the data after queried-extraction was that you would need to split
 the data extracted with a query into its sort-able parts as you described.
 Very well put on your part, though.
 
 Emily,
 
 You might try and find a conditional string function that will find the
 first digit character in a string (or a nifty manipulation of a different
 string function that accomplishes the same objective). I understand that's
 sort of along the line of what you were trying to do in the first place,
 but perhaps this new funneling of the problem will give you some different
 insight into a possible solution. I'll scour the web and documentation
 with the free time I may or may not get today and see what I can come up
 with, as well.
 
 Don't give up hope! :) This sort of data transformation is not an uncommon
 occurrence, and someone somewhere must have tried to tackle a similar
 problem--with positive results. *crosses fingers* If not, maybe someone
 else's idea for a solution can plant the seed.
 
 
 Todd Boyd
 Web Programmer
 
 




--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]