Carl Banks wrote:
> import collections
> import itertools
>
> def createInitialCluster(fileName):
>     fixedPoints = []
>     # quantization is a dict that assigns sequentially increasing numbers
>     # to values when reading keys that don't yet exist
>     quantization = collections.defaultdict(itertools.count().next)
>     with open(fileName, 'r') as f:
>         for line in f:
>             dimensions = []
>             for s in line.rstrip('\n').split(","):
>                 if isNumeric(s):
>                     dimensions.append(float(s))
>                 else:
>                     dimensions.append(float(quantization[s]))
>             fixedPoints.append(Point(dimensions))
>     return Cluster(fixedPoints)

nice reply (i didn't know defaultdict worked like that - very neat).

two small things i noticed:

1 - do you need a separate quantization for each column?  the code above
shares one counter across all columns, so a particular column might get,
for example, non-contiguous ranges of integers if a string occurs ("by
accident" perhaps) in more than one column.
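
a rough sketch of what i mean by per-column quantization, in case it helps. the names make_column_quantizer, new_quantizations and quantize_row are made up for illustration (they are not from the code above), and i use next(counter) rather than the .next method so it runs on python 3 as well:

```python
import collections
import itertools

def make_column_quantizer():
    # hypothetical helper: each column gets its own counter,
    # so the codes within a column run 0, 1, 2, ... with no gaps
    counter = itertools.count()
    return collections.defaultdict(lambda: next(counter))

def new_quantizations():
    # maps column index -> (string -> integer code)
    return collections.defaultdict(make_column_quantizer)

def quantize_row(quantizations, fields):
    # look each field up in the quantizer belonging to its column
    return [quantizations[i][s] for i, s in enumerate(fields)]

quantizations = new_quantizations()
first = quantize_row(quantizations, ["red", "x"])
second = quantize_row(quantizations, ["x", "red"])
third = quantize_row(quantizations, ["red", "red"])
```

here "red" is coded independently in each column, so a string that turns up in two columns does not leave a hole in either column's range.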

2 - don't bother with isNumeric.  just return the cast value or catch the
exception:

  [...]
  try:
    dimensions.append(float(s))
  except ValueError:
    dimensions.append(float(quantization[s]))
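
pulled out into a function, that might look something like the sketch below. to_number is a name i invented, and the single shared quantization dict is just the one from the quoted code rebuilt with next(counter) so it works on python 3 too:

```python
import collections
import itertools

def to_number(s, quantization):
    # hypothetical helper: try the numeric conversion first and only
    # fall back to the quantization table when float() rejects the string
    try:
        return float(s)
    except ValueError:
        return float(quantization[s])

counter = itertools.count()
quantization = collections.defaultdict(lambda: next(counter))

parsed = to_number("2.5", quantization)      # numeric string
padded = to_number(" 1e3 ", quantization)    # float() tolerates whitespace
coded = to_number("abc", quantization)       # first non-numeric string seen
again = to_number("abc", quantization)       # same string, same code
other = to_number("xyz", quantization)       # next new string
```

one thing to watch: float() also accepts strings like "nan" and "inf", so if those can appear as category labels in the data they will be read as numbers rather than quantized.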

(not sure float() is needed there either if you're using a recent version
of python - only reason i can think of is to avoid integer division in
older versions).
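
to show what i mean about division, here is a tiny example, assuming python 3 (where / is always true division). the variable names are only for illustration:

```python
code = 3       # an integer, like the value a quantization lookup returns
value = 1.5    # a float parsed from a numeric field

mixed = (code + value) / 2   # int and float mix without trouble
true_div = code / 2          # true division on python 3
floor_div = code // 2        # floor division, what python 2 gave for int / int
explicit = float(code) / 2   # the float() cast gives true division on any version
```

so on python 3 the cast only changes the type stored in the list, not the result of later arithmetic. on python 2 without "from __future__ import division", code / 2 would behave like the floor_div line instead.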

andrew


--
http://mail.python.org/mailman/listinfo/python-list