Monday, June 20, 2005, 5:48:30 PM, Erik Hatcher wrote:

> Now you've just said the same conflicting thing a different way. You
> want to cluster but only return one. :)
I think I misunderstood the term "cluster" here. So yes, I just want one image returned.

> If you only want one image returned, then it seems that only indexing
> the same image once is the way to go. When you find a duplicate MD5,
> don't index that as a second document. You will, instead, update the
> document by adding additional ALT text and perhaps the additional URL.

This sounds pretty OK!

> Is there a reason why indexing each unique image (by MD5) is not a
> good way to go in your case?

>> In SQL this would be:
>> select distinct md5, url, alt from table group by md5 order by
>> score asc;

> This would give you multiple records for the same MD5. You said
> above you only want one per MD5.

Here I'm afraid you are not correct, because I have a GROUP BY md5 clause, which returns no duplicates (tested it on MySQL).

For the query above:

    170 rows in set (0.13 sec)

And for the distinct MD5s alone:

    select distinct md5 from image;
    ...
    | e127d0e91af5d8b2522138fb46c2e1bc |
    | 7a18b029925d8357599878a85fd6b02f |
    +----------------------------------+
    170 rows in set (0.00 sec)

Same number of rows :D

-- 
Catalin Constantin
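Erik's suggestion above (index each unique image only once, keyed by its MD5, and on a duplicate merge the extra ALT text and URL into the existing document) can be sketched as below. This is a minimal Python sketch under stated assumptions: a plain dict stands in for the Lucene index, and `ImageDoc` and `add_image` are hypothetical names, not Lucene API.

```python
from dataclasses import dataclass, field


@dataclass
class ImageDoc:
    """Hypothetical stand-in for one indexed document per unique image."""
    md5: str
    urls: list[str] = field(default_factory=list)
    alts: list[str] = field(default_factory=list)


def add_image(index: dict[str, ImageDoc], md5: str, url: str, alt: str) -> None:
    """Keep one document per MD5; on a duplicate, merge its URL and ALT text.

    `index` is a dict used here in place of a real Lucene index (assumption).
    """
    doc = index.get(md5)
    if doc is None:
        # First time this MD5 is seen: create a new document for it.
        doc = ImageDoc(md5)
        index[md5] = doc
    # Record the URL unless this document already has it.
    if url not in doc.urls:
        doc.urls.append(url)
    # Record non-empty ALT text unless it is already present.
    if alt and alt not in doc.alts:
        doc.alts.append(alt)


if __name__ == "__main__":
    index: dict[str, ImageDoc] = {}
    add_image(index, "e127d0e91af5d8b2522138fb46c2e1bc", "http://a.example/logo.png", "company logo")
    add_image(index, "e127d0e91af5d8b2522138fb46c2e1bc", "http://b.example/img/logo.png", "logo")
    add_image(index, "7a18b029925d8357599878a85fd6b02f", "http://a.example/banner.png", "banner")
    print(len(index))  # one document per distinct MD5
```

Because each image exists only once in the index, a search returns at most one hit per MD5, so no query-time equivalent of GROUP BY is needed. In Lucene itself a stored document is not edited in place: the "update" means deleting the old document for that MD5 and adding the merged one (later Lucene versions wrap this as `IndexWriter.updateDocument`).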