On Thu, Feb 17, 2011 at 9:09 AM, Ovid <publiustemp-perl...@yahoo.com> wrote:
> For a *very* contrived use case, imagine that you're being introduced to your 
> daughter's boyfriend for the first time and you know his name is "Alexander". 
> He might introduce himself as "Alexander", "Alex", "Al", or even "Xander" and 
> you might not bat an eyelash. If he introduces himself as "Sally" or "Bob", 
> it's time to start asking questions.
>
> In my case, I have code which returns a list of items, but I'm pulling real 
> data (and it's very hard not to pull real data for this use case) and that 
> data will *usually* be in the order I expect, but subtle variations are 
> allowed and cannot be easily prevented. Unfortunately, I can't tell you more 
> than this.

Got it.

My impression is that if the real question is whether it's "Alex" and
not "Sally", then I would try to avoid the ordering issue entirely if
possible and apply some domain-relevant sort to the data and then
iterate through, counting how many things are totally missing.

In your original example,

 [
   [   1, 'North Beach',       'au', 'city'  ],
   [   2, 'North Beach',       'us', 'city'  ],
   [   3, 'North Beach',       'us', 'city'  ],
   [   4, 'North Beach Hotel', 'us', 'hotel' ],
   [   5, 'North Beach',       'us', 'city'  ],
   [   6, 'North Beach',       'us', 'city'  ],
 ]

records 2, 3, 5 and 6 are identical apart from their IDs.  So when you
say that 654321 is not OK, is that because 1 ('au') should come before
2 ('us')?  You said that 2 and 4 could be reversed, so the ordering of
'hotel' and 'city' seems irrelevant.

So I think you've got to nail down what specifically about the order
is required, then sort in a way that preserves that important
dimension (country), but standardizes the rest (e.g. always putting
'hotel' after 'city').  Then you're just walking through it looking
for things that are mismatched or missing.
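
To make that concrete, here is a rough sketch in Python (the same idea
maps directly onto a Perl sort with a multi-key comparator).  It rests
on several guesses of mine, not on anything you've confirmed: that
country order is the part that matters, that IDs can be ignored, and
that 'city' should sort before 'hotel'.  The names normalize, check and
KIND_RANK are made up for illustration.

```python
from collections import Counter

# Rows shaped like the example: (id, name, country, kind).
expected = [
    (1, 'North Beach',       'au', 'city'),
    (2, 'North Beach',       'us', 'city'),
    (3, 'North Beach',       'us', 'city'),
    (4, 'North Beach Hotel', 'us', 'hotel'),
    (5, 'North Beach',       'us', 'city'),
    (6, 'North Beach',       'us', 'city'),
]

# Assumption: within a country, 'city' is always placed before 'hotel'.
KIND_RANK = {'city': 0, 'hotel': 1}


def normalize(rows):
    """Keep each list's own country order, standardize everything else.

    Countries are ranked by first appearance in `rows`, so a change in
    country order survives the sort.  IDs are dropped from the result.
    """
    country_rank = {c: i for i, c in enumerate(dict.fromkeys(r[2] for r in rows))}
    ordered = sorted(
        rows,
        key=lambda r: (country_rank[r[2]], KIND_RANK.get(r[3], len(KIND_RANK)), r[1]),
    )
    return [r[1:] for r in ordered]


def check(got, expected):
    """Return (matches, missing): whether the normalized lists agree,
    and which expected records are absent from `got` altogether."""
    got_n, exp_n = normalize(got), normalize(expected)
    missing = sorted((Counter(exp_n) - Counter(got_n)).elements())
    return got_n == exp_n, missing


# Swapping records 2 and 4 is tolerated; reversing the whole list is not.
swapped = [expected[0], expected[3], expected[2], expected[1], expected[4], expected[5]]
print(check(swapped, expected))
print(check(expected[::-1], expected))
```

With these assumptions, the swapped list compares equal after
normalization, while the fully reversed list fails because 'us' now
precedes 'au', even though nothing is missing.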

I realize your original example is highly simplified, but I hope my
suggestion about preserving the important ordering and sorting the
rest for comparison is clear enough and maybe even constructive.  :-)

Regards,
David
