On 09/29/2009 11:37 AM, Christopher Barker wrote:
> Pierre GM wrote:
>
>> I was thinking about something this week-end: we could create a second
>> list when looping on the rows, where we would store the length of each
>> splitted row. After the loop, we can find if these values don't match
>> the expected number of columns `nbcols` and where. Then, we can decide
>> to strip the `rows` list of its invalid values (that corresponds to
>> skipping) or raise an exception, but in both cases we know where the
>> problem is.
>> My only concern is that we'd be creating yet another list of integers,
>> which would increase memory usage. Would it be a problem ?
>>
> I doubt it would be that big deal, however...
>

Probably more of a concern than memory is the execution time involved in printing these problem rows.
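For comparison, here is a rough, self-contained sketch of the "second list of lengths" idea quoted above. It is not the io.py code: `split_rows`, its `delimiter` and `invalid_raise` parameters, and the plain `str.split` standing in for `split_line` are all hypothetical. The field counts go into an `array.array`, which stores them as raw C integers rather than as references to Python int objects, so the extra memory is typically smaller than with a plain list.

```python
from array import array


def split_rows(lines, nbcols, delimiter=",", invalid_raise=True):
    """Hypothetical sketch: record each row's length, check after the loop."""
    rows = []
    lengths = array("l")       # field count of each stored row (raw C longs)
    line_numbers = array("l")  # 1-based line number of each stored row
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:  # skip empty lines, as the loop in io.py does
            continue
        values = line.split(delimiter)  # hypothetical stand-in for split_line
        rows.append(tuple(values))
        lengths.append(len(values))
        line_numbers.append(lineno)

    # After the loop: find which rows don't match nbcols, and where they are.
    bad = [i for i, n in enumerate(lengths) if n != nbcols]
    if not bad:
        return rows
    if invalid_raise:
        details = "; ".join(
            f"line {line_numbers[i]}: got {lengths[i]} columns" for i in bad
        )
        raise ValueError(f"expected {nbcols} columns ({details})")
    # Otherwise strip the invalid rows, which amounts to skipping them.
    bad_set = set(bad)
    return [row for i, row in enumerate(rows) if i not in bad_set]
```

With `["1,2,3", "4,5", "", "6,7,8"]` and `nbcols=3`, the default call raises a `ValueError` that names line 2, while `invalid_raise=False` returns only the two valid rows.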
There are already two loops over the data in which you could measure the number of elements in each row, but the first is probably more appropriate. So a simple solution is, in the first loop, to append the 'bad' rows to one list and the 'good' rows to the existing rows list, or just to store the numbers of the bad rows. Untested code for the corresponding part of io.py:

    row_bad = []          # store bad rows
    bad_row_numbers = []  # store just the row numbers
    # Simple row counter from enumerate(), so skipped empty lines are still
    # counted; it probably should count from the first data row, not the
    # first line of the file.
    for row_number, line in enumerate(itertools.chain([first_line,], fhd)):
        values = split_line(line)
        # Skip an empty line
        if len(values) == 0:
            continue
        # Select only the columns we need
        if usecols:
            values = [values[_] for _ in usecols]
        if len(values) != nbcols:
            row_bad.append(line)  # store bad row so the user can search for that line
            bad_row_numbers.append(row_number)  # store just the bad row number so the user can go to the appropriate line(s) in the file
            continue
        # Check whether we need to update the converter
        # (only for good rows, so bad rows don't affect type detection)
        if dtype is None:
            for (converter, item) in zip(converters, values):
                converter.upgrade(item)
        append_to_rows(tuple(values))

Note: I assume that nbcols is the expected number of columns, but my counting seems to be off by one.

Then, if len(row_bad) is greater than zero, you could raise an exception, or print a warning along with the bad rows and then either raise an exception or continue. The problem with continuing is that the user may not notice the warning.

Bruce

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
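A minimal sketch of the reporting step described in the message, assuming the loop has filled `row_bad` and `bad_row_numbers` as above. The helper `report_bad_rows` and its `invalid_raise` flag are hypothetical, not existing numpy API; the sketch only shows the "raise, or warn with the bad rows and continue" choice.

```python
import warnings


def report_bad_rows(row_bad, bad_row_numbers, nbcols, invalid_raise=True):
    """Hypothetical helper: act on the bad rows collected in the loop.

    row_bad holds the offending lines, bad_row_numbers their row numbers,
    and nbcols is the expected number of columns.
    """
    if not row_bad:
        return  # nothing to report
    # Only the bad rows are formatted, so the cost grows with their count.
    details = "\n".join(
        f"  row {number}: {line.rstrip()!r}"
        for number, line in zip(bad_row_numbers, row_bad)
    )
    msg = f"{len(row_bad)} row(s) do not have {nbcols} columns:\n{details}"
    if invalid_raise:
        raise ValueError(msg)
    # Continuing is riskier: the user may never see this warning.
    warnings.warn(msg + "\nThese rows were skipped.", stacklevel=2)
```

Raising by default reflects the concern that a user may miss a warning. Formatting only the bad rows, rather than echoing the whole input, keeps the printing cost small when few rows are bad.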