Re: Duplicated records

Scott Hamm Tue, 05 Apr 2005 10:03:41 -0700

Well, to be more specific, I would like to list *ALL* duplicated Batch
rather than to count them.


SELECT ID, Batch, KeyDate, OperatorID, QAID, NTID
FROM QA
WHERE Batch in (duplicates);

On Apr 5, 2005 12:43 PM, Rhino <[EMAIL PROTECTED]> wrote:
> Oh, that's an entirely different problem than the one I understood from your
> original question.
> 
> Your new explanation is clearer but I'm still not sure I follow all of it.
> The formatting/wrapping of the example makes things a bit fuzzy too. Am I
> right in assuming that on the first row of the result, the OperatorID is
> 2661, the QAID is 5334 and the NTID is JulieAnt? If so, is JulieAnt the
> reviewer? Also are the QAID and the NTID different ways of saying the same
> thing, such as the fact that JulieAnt's employee number is 5334, or is it
> just a coincidence that they are the same on both rows?
> 
> I'm assuming that a Batch has two different operators if one shift starts
> the batch but the batch isn't finished by the time the shift ends so a
> second operator finishes the Batch. Will you ever have cases where a Batch
> takes so long that it takes more than two shifts to finish it? For instance,
> would you ever see a case where the company gets an order for a billion
> grapple grommets and it takes 9 shifts to finish it? I'm trying to figure
> out if you also need to worry about batches that are split over more than
> two shifts.
> 
> Does the Batch value uniquely identify a particular Batch or is it the
> combination of Batch and KeyDate?
> 
> Ok, assuming for the moment that Batch alone uniquely identifies a
> particular Batch, and that you care about Batches that are split across two
> *or more* shifts, you need a query like this to find those Batches:
> 
> select Batch, count(*) as count
> from QA
> group by Batch
> having count >= 2;
> 
> That will return the Batch numbers alone.
> 
> Then, you simply do a second Select to get the other properties of the Batch
> that you care about. That query would look very much like the one in your
> example except that you would do it like so:
> 
> SELECT ID, Batch, KeyDate, OperatorID, QAID, NTID
>  FROM QA
>  WHERE Batch in ('439584414', '123456', '999444');
> 
> In other words, you'd put a list consisting of all of the values returned
> from the one that got the Batch numbers into the IN clause.
> 
> Now, that could be a little tedious if there were a lot of 'split' batches
> and you may also run into an issue if the number of of values in the IN
> clause exceeded MySQL's limit, whatever that is. (Or maybe MySQL has a limit
> on the maximum *length* of the IN clause; if it does, that could bite you
> too.) To get around that, you could just repeat the select with the IN
> clause for subsets of the results from the first query or even just do your
> original query once for each of the values returned by the counting query.
> 
> A much better alternative would be to use a subquery but that is only an
> option if you are running MySQL 4.1 or later. That would let you combine
> both queries together into a single big query. Are you running V4.1 or
> later? If you are, I could take a stab at the combined query....
> 
> Rhino
> 
> 
> ----- Original Message -----
> From: "Scott Hamm" <[EMAIL PROTECTED]>
> To: "Rhino" <[EMAIL PROTECTED]>
> Cc: "Mysql" <mysql@lists.mysql.com>
> Sent: Tuesday, April 05, 2005 11:14 AM
> Subject: Re: Duplicated records
> 
> > Sorry for the confusion.  I don't mean a duplicated records, but to
> > find duplicated Batch where two different operators worked on a single
> > batch (one started off, then another one to finish the batch) and a
> > single reviewer to review a batch. I need a list that lists duplicated
> > Batches with different operators that worked on that batch.
> >
> > For example: (for clarification)
> > SELECT ID, Batch, KeyDate, OperatorID, QAID, NTID
> > FROM QA
> > WHERE Batch='439584414';
> >
> > ID         Batch              KeyDate                      OperatorID
> > QAID      NTID
> > 90577    439584414 2004-10-03 00:00:00 2661       5334 JulieAnt
> > 90575    439584414 2004-10-03 00:00:00 5657       5334 JulieAnt
> >
> >
> > On Apr 5, 2005 10:54 AM, Rhino <[EMAIL PROTECTED]> wrote:
> > >
> > > ----- Original Message -----
> > > From: "Scott Hamm" <[EMAIL PROTECTED]>
> > > To: "Mysql" <mysql@lists.mysql.com>
> > > Sent: Tuesday, April 05, 2005 10:05 AM
> > > Subject: Duplicated records
> > >
> > > > Here is my novice question:
> > > > I need to find duplicates in "Batch" category when I issued this
> query:
> > > >
> > > > SELECT * FROM QA WHERE Batch=439584414;
> > > >
> > > > Result:
> > > > 90577
> > > > 1 2661 5334 JulieAnt 25 5 5 439584414 2004-10-03 00:00:00 2004-10-03
> > > > 00:00:00 0 90575
> > > > 1 5657 5334 JulieAnt 25 5 5 439584414 2004-10-03 00:00:00 2004-10-03
> > > 00:00:00 0
> > > >
> > > > How do I issue a query that finds duplicated Batch number?
> > > >
> > > First of all, I think you made a mistake when putting this data into
> your
> > > note, a mistake which is compounded by the wrapping that the email
> program
> > > did. I *think* you meant to display two rows with the '90577' value at
> the
> > > end of the second row, not before the first row. This makes the example
> a
> > > bit confusing and hard to follow.
> > >
> > > The two rows you provide in your example are not duplicates of each
> other.
> > > Even assuming that the '90577' actually belongs at the end of the second
> row
> > > rather than the start of the first row, the two rows are different: the
> > > first row ends in '90575', not '90577' and the second value in each row
> is
> > > also different: '2661' and '5657'. Therefore, you're not really trying
> to
> > > find duplicate records because a duplicate row would be one that is
> > > identical in every column, not just identical in several columns. I hope
> > > that doesn't sound like I am splitting hairs; maybe I am. ;-)
> > >
> > > Anyway, the standard method for finding duplicates of rows goes like
> this
> > > and can be adapted for your situation. Let's say that you have a table
> > > called Foo with columns foo1, foo2, and foo3 and you want to find all of
> the
> > > duplicate rows in the table, i.e. any row whose foo1, foo2, and foo3
> value
> > > is identical to the foo1, foo2, and foo3 value of another table. You
> would
> > > write this query:
> > >
> > >     select foo1, foo2, foo3, count(*) as num
> > >     from Foo1
> > >     group by foo1, foo2, foo3
> > >     having num > 1
> > >     order by num;
> > >
> > > Your result might look like this:
> > >
> > >     foo1    foo2    foo3    num
> > >     Daffy    D        Duck    1
> > >     Bugs    B        Bunny    1
> > >     Elmer    A        Fudd    3
> > >
> > > This result means that Daffy D Duck and Bugs B Bunny each occur once in
> the
> > > table but Elmer A Fudd occurs three times.
> > >
> > > In the case you mentioned in your email, where the rows aren't complete
> > > duplicates of one another, just write the query so that you omit the
> columns
> > > which are allowed to be different. In your case, that would appear to be
> all
> > > but the second and last columns.
> > >
> > > However, I think if you get to the point of having duplicate records in
> a
> > > table when you don't want them, finding them is like shutting the barn
> door
> > > after the horse is already gone. I think you need to reconsider your
> design,
> > > specifically your primary key, and change that primary key to *PREVENT*
> the
> > > duplicate keys in the first place.
> > >
> > > For instance, in the case of the Foo1 table, I can prevent duplicate
> rows by
> > > choosing a good primary key; in this case I would choose the combination
> of
> > > all three columns in the table; that would ensure that I never stored
> more
> > > than 1 person named Elmer A Fudd. [Actually, that's not a great example!
> It
> > > might be perfectly valid to have two people named Elmer A Fudd - or John
> A
> > > Smith - so I probably need to add additional columns to my table to
> ensure
> > > that the rows are unique; something like Social Security Number which is
> > > (supposed to be) unique would be ideal for this purpose.]
> > >
> > > Rhino
> > >
> > > --
> > > No virus found in this outgoing message.
> > > Checked by AVG Anti-Virus.
> > > Version: 7.0.308 / Virus Database: 266.9.2 - Release Date: 05/04/2005
> > >
> > >
> >
> >
> > --
> > Power to people, Linux is here.
> >
> > --
> > MySQL General Mailing List
> > For list archives: http://lists.mysql.com/mysql
> > To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]
> >
> >
> > --
> > No virus found in this incoming message.
> > Checked by AVG Anti-Virus.
> > Version: 7.0.308 / Virus Database: 266.9.2 - Release Date: 05/04/2005
> >
> >
> 
> --
> No virus found in this outgoing message.
> Checked by AVG Anti-Virus.
> Version: 7.0.308 / Virus Database: 266.9.2 - Release Date: 05/04/2005
> 
> 


-- 
Power to people, Linux is here.

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Re: Duplicated records

Reply via email to