On 05.12.2012 18:04, Nick Mellor wrote:
> Sample data

Well, let's see what
    def split_product(p):
        p = p.strip()
        w = p.split(" ")
        try:
            # index of the first word that is not entirely uppercase
            j = next(i for i, v in enumerate(w) if v.upper() != v)
        except StopIteration:
            # every word is uppercase: the whole line is the product name
            return p, ''
        return " ".join(w[:j]), " ".join(w[j:])

(which I still find a very elegant solution) has to say about that sample data:

    >>> for line in open('test.dat', 'r'):
    ...     print(split_product(line))
    ('BEANS', 'hand picked')
    ('BEETROOT', 'certified organic')
    ('BOK CHOY', '(bunch)')
    ('BROCCOLI', 'Mornington Peninsula')
    ('BRUSSEL SPROUTS', '')
    ('CABBAGE', 'green')
    ('CABBAGE', 'Red')
    ('CAPSICUM RED', '')
    ('CARROTS', '')
    ('CARROTS', 'loose')
    ('CARROTS', 'juicing, certified organic')
    ('CARROTS', 'Trentham, large seconds, certified organic')
    ('CARROTS', 'Trentham, firsts, certified organic')
    ('CAULIFLOWER', '')
    ('CELERY', 'Mornington Peninsula IPM grower')
    ('CELERY', 'Mornington Peninsula IPM grower')
    ('CUCUMBER', '')
    ('EGGPLANT', '')
    ('FENNEL', '')
    ('GARLIC', '(from Argentina)')
    ('GINGER', 'fresh uncured')
    ('KALE', '(bunch)')
    ('KOHL RABI', 'certified organic')
    ('LEEKS', '')
    ('LETTUCE', 'iceberg')
    ('MUSHROOM', 'cup or flat')
    ('MUSHROOM', 'Swiss brown')
    ('ONION', 'brown')
    ('ONION', 'red')
    ('ONION', 'spring (bunch)')
    ('PARSNIP,', 'certified organic')
    ('POTATOES', 'certified organic')
    ('POTATOES', 'Sebago')
    ('POTATOES', 'Desiree')
    ('POTATOES', 'Bullarto chemical free')
    ('POTATOES', 'Dutch Cream')
    ('POTATOES', 'Nicola')
    ('POTATOES', 'Pontiac')
    ('POTATOES', 'Otway Red')
    ('POTATOES', 'teardrop')
    ('PUMPKIN', 'certified organic')
    ('SCHALLOTS', 'brown')
    ('SNOW PEAS', '')
    ('SPINACH', "I'll try to get certified organic (bunch)")
    ('SWEET POTATO', 'gold certified organic')
    ('SWEET POTATO', 'red small')
    ('SWEDE', 'certified organic')
    ('TOMATOES ', 'Qld')
    ('TURMERIC', 'fresh certified organic')
    ('ZUCCHINI', '')
    ('APPLES', 'Harcourt Pink Lady, Fuji, Granny Smith')
    ('APPLES', 'Harcourt 2 kg bags, Pink Lady or Fuji (bag)')
    ('AVOCADOS', '')
    ('AVOCADOS', 'certified organic, seconds')
    ('BANANAS', 'Qld, organic')
    ('GRAPEFRUIT', '')
    ('GRAPES', 'crimson seedless')
    ('KIWI FRUIT', 'Qld certified organic')
    ('LEMONS', '')
    ('LIMES', '')
    ('MANDARINS', '')
    ('ORANGES', 'Navel')
    ('PEARS', 'Beurre Bosc Harcourt new season')
    ('PEARS', 'Packham, Harcourt new season')
    ('SULTANAS', '350g pre-packed bags')
    ('EGGS', "Melita free range, Barker's Creek")
    ('BASIL', '(bunch)')
    ('CORIANDER', '(bunch)')
    ('DILL', '(bunch)')
    ('MINT', '(bunch)')
    ('PARSLEY', '(bunch)')
    ('', 'Spring ONION from QLD')

I think the only case left to think about is ('PARSNIP,', 'certified organic'). What about that extra comma? Perhaps it could even be considered an "error" in the original data. I don't see a good general way to deal with such cases that does not have to handle trailing punctuation on the product name explicitly as a special case.

Greetings
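Handling the trailing punctuation explicitly, as described above, could look like a small wrapper around split_product that cleans up the product name after splitting. This is only a minimal sketch, assuming that a comma, semicolon or colon (or whitespace) at the end of a product name is never meaningful; the wrapper name split_product_clean and the TRAILING_PUNCT set are hypothetical, chosen just for this sample data.

```python
import string

# Assumption: these characters are never a meaningful end of a product name.
# Hypothetical set, picked to handle cases like "PARSNIP," in the sample data.
TRAILING_PUNCT = ",;:"


def split_product(p):
    """Split a line into (UPPERCASE product name, remaining description)."""
    p = p.strip()
    w = p.split(" ")
    try:
        # index of the first word that is not entirely uppercase
        j = next(i for i, v in enumerate(w) if v.upper() != v)
    except StopIteration:
        # every word is uppercase: the whole line is the product name
        return p, ''
    return " ".join(w[:j]), " ".join(w[j:])


def split_product_clean(p):
    """Hypothetical wrapper: like split_product, but strips trailing
    punctuation and whitespace from the product name."""
    name, desc = split_product(p)
    return name.rstrip(TRAILING_PUNCT + string.whitespace), desc


print(split_product_clean("PARSNIP, certified organic"))
```

Since the wrapper also strips trailing whitespace, it would tidy ('TOMATOES ', 'Qld') into ('TOMATOES', 'Qld') too, presumably caused by a double space in that input line. It is still a special case, of course, which is exactly the limitation mentioned above.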