Greetings, List!

Say I have an old-fashioned dbf style table, with a single name field of
40 characters:

  names = dbf.Table(':memory:', 'name C(40)')

Then I add a bunch of names from who-knows-where:

  for name in some_iterable():
    names.append((name,))

Now I want to know how many start with a 'J'...
I have two options, 1) brute force:

  matches = [rec for rec in names if rec.name[0] == 'J']

or, 2) binary search after ordering:

  names.order('name[:1]')
  matches = names.search('J', startswith=True)
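For anyone following along without dbf installed, the difference between the
two approaches can be sketched in plain Python, with a sorted list and the
stdlib bisect module standing in for the ordered table (the names here are
made up, of course):

```python
import bisect

# hypothetical stand-in data; with dbf this would come from the table
names = ['Alice', 'Jack', 'Bob', 'Jill', 'Joan', 'Mary']

# 1) brute force: scan every record
brute = [n for n in names if n[:1] == 'J']

# 2) binary search after ordering: sort once, then bisect on the prefix
names.sort()
lo = bisect.bisect_left(names, 'J')
hi = bisect.bisect_left(names, 'K')   # first letter after 'J'
ordered = names[lo:hi]

print(ordered)   # ['Jack', 'Jill', 'Joan']
```

The brute-force scan is O(n) per lookup; the sort-then-bisect version pays
O(n log n) once and O(log n) per lookup after that, which is the same
trade-off .order()/.search() make.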

So far so good.  Now it gets a little more complicated.  In my use case
I am trying to match records from one database to records from another
database;  therefore, I do _not_ know what text I will be searching for,
only the fields I will be using.

If I only had one criterion, I'd still be okay:

  different_table.order('zipcode[:5], last_name')
  for record in original_table:
    matches = different_table.search([record.zipcode[:5], record.last_name])

However, I have three different sets of matches:
  'first_name[:1], last_name, city, dlvryaddrs[:4]'
  'first_name[:1], last_name[:5], dlvryaddrs[:8]'
  'first_name, last_name, city, state'

This is not a problem for the ordering, as I can just do
  for criteria in choices:
    different_table.order(criteria)

The problem comes at the matching stage:  the .search method is
expecting a list of the pieces it is supposed to find, so what I need is
a way to apply, for example, 'first_name[:1], last_name[:5],
dlvryaddrs[:8]', to the current record to yield the text to search for.
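One way to do that without touching the record class at all is a small
helper (this is my own sketch, not anything in dbf) that parses the
criteria string into a key function, which can then be applied to any
record-like object whose fields are attributes:

```python
import re
from collections import namedtuple

# matches pieces like 'last_name' or 'dlvryaddrs[:8]'
SPEC = re.compile(r'^\s*(\w+)\s*(?:\[\s*:\s*(\d+)\s*\])?\s*$')

def make_key(criteria):
    """Turn 'first_name[:1], last_name[:5]' into a function of a record."""
    specs = []
    for piece in criteria.split(','):
        m = SPEC.match(piece)
        if m is None:
            raise ValueError('bad criteria piece: %r' % piece)
        field, stop = m.group(1), m.group(2)
        specs.append((field, int(stop) if stop else None))
    def key(record):
        return [getattr(record, f)[:s] if s else getattr(record, f)
                for f, s in specs]
    return key

# usage with a stand-in record; a real dbf record would work the same
# way, assuming its fields are reachable as attributes
Person = namedtuple('Person', 'first_name last_name dlvryaddrs')
rec = Person('Ethan', 'Furman', '123 Main Street')
key = make_key('first_name[:1], last_name[:5], dlvryaddrs[:8]')
print(key(rec))   # ['E', 'Furma', '123 Main']
```

Then `table.search(key(record))` works for all three criteria sets, and
the same string drives both the ordering and the matching.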

Current code follows, more comments towards the end.

<code>
import dbf
import shutil
from collections import defaultdict
from cookbook.utils import index

source_tables = [ '/temp/kaz15514',
                  '/temp/kks15515',
                  '/temp/kmn15585',
                  '/temp/knv15516',
                  '/temp/ktx15722',
                  '/temp/kwa15584',
                  '/temp/mco15902',
                  '/temp/msq15994' ]

counts = defaultdict(int)

for i in index(source_tables):
    source_tables[i] = dbf.Table(source_tables[i])

shutil.copy('z:/orders/25105/mbk16508_02', '.')
match_back = dbf.Table('mbk16508_02')
match_back.add_fields('f1ltcta4 C(100), f1l5a8 C(100), ftltctst C(100)')

for field, criteria in \
       (('f1ltcta4', 'first_name[:1], last_name, city, dlvryaddrs[:4]'),
        ('f1l5a8', 'first_name[:1], last_name[:5], dlvryaddrs[:8]'),
        ('ftltctst', 'first_name, last_name, city, state')):
    match_back.order(criteria)
    for table in source_tables:
        counts = defaultdict(int)
        for record in match_back:
            matches = table.search(?????)
            .
            .
            .
</code>

The only idea I have at the moment is to parse the string (much like I
do in the order method), and after the string is parsed pluck out the
needed pieces.  If that is the best and/or most practical way to do
it, I was thinking of adding __call__ to the record class.  Then,
besides being able to do:

  matches = table.search([record.zip4[:5], record.dlvryaddrs])

I could also do:

  matches = table.search(record('zip4[:5], dlvryaddrs'))

or, equivalently,
  criteria = 'this, that[:7], the_other'
  matches = table.search(record(criteria))
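A minimal sketch of what that __call__ could look like, on a toy record
class (the real dbf record class would need the equivalent method, and
this assumes fields are plain attributes):

```python
import re

class Record(object):
    """Toy record class demonstrating a possible __call__."""
    # matches pieces like 'that[:7]' or bare 'this'
    _spec = re.compile(r'^\s*(\w+)\s*(?:\[\s*:\s*(\d+)\s*\])?\s*$')

    def __init__(self, **fields):
        self.__dict__.update(fields)

    def __call__(self, criteria):
        values = []
        for piece in criteria.split(','):
            m = self._spec.match(piece)
            if m is None:
                raise ValueError('bad criteria piece: %r' % piece)
            value = getattr(self, m.group(1))
            if m.group(2):
                value = value[:int(m.group(2))]
            values.append(value)
        return values

rec = Record(this='abc', that='longvalue', the_other='xyz')
print(rec('this, that[:7], the_other'))   # ['abc', 'longval', 'xyz']
```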

Any better ideas?  Am I missing anything already in the stdlib?

Any and all tips appreciated!

~Ethan~

P.S.
<shameless plug>  Python d-Base currently lives at
http://groups.google.com/group/python-dbase, and its main purpose in
life is to ease the transition between old dbf files and newer sql
tables.  It can, however, be used for read/write access to dBase III and
VFP 6 tables, including memo fields.

Success stories and bug reports both equally welcome!  :D
</shameless plug>

--
http://mail.python.org/mailman/listinfo/python-list
