Tim Chase wrote:

> On 2016-02-19 02:47, wrong.addres...@gmail.com wrote:
>> 2 12.657823 0.1823467E-04 114 0
>> 3 4 5 9 11
>> "Lower"
>> 278.15
>> 
>> Is it straightforward to read this, or does one have to read one
>> character at a time and then figure out what the numbers are? --
> 
> It's easy to read.  What you do with that mess of data is the complex
> part.  They come in as byte-strings, but you'd have to convert them
> to the corresponding formats:
> 
>   from shlex import shlex
>   USE_LEX = True # False
>   with open('data.txt') as f:
>     for i, line in enumerate(f, 1):
>       if USE_LEX:
>         bits = shlex(line)
>       else:
>         bits = line.split()
>       for j, bit in enumerate(bits, 1):
>         if bit.isdigit():
>           result = int(bit)
>           t = "an int"
>         elif '"' in bit:
>           result = bit
>           t = "a string"
>         else:
>           result = float(bit)
>           t = "a float"
>         print("On line %i I think that item %i %r is %s: %r" % (
>           i,
>           j,
>           bit,
>           t,
>           result,
>           ))
> 
> The USE_LEX controls whether the example code uses string-splitting
> on white-space, or uses the built-in "shlex" module to parse for
> quoted strings that might contain a space.  The naive way of
> string-splitting will be faster, but choke on string-data containing
> spaces.
> 
> You'd have to make up your own heuristics for determining what type
> each data "bit" is, parsing it out (with int(), float() or whatever),
> but the above gives you some rough ideas with at least one known
> bug/edge-case.  

Or just tell the parser what to expect:

$ cat read_data_shlex2.py
import shlex

CONVERTERS = {
    "i": int,
    "f": float,
    "s": str
}


def parse_line(types, line=None, file=None):
    if line is None:
        line = file.readline()
    values = shlex.split(line)
    if len(values) != len(types):
        raise ValueError("Too few/many values %r <-- %r" % (types, values))
    return tuple(CONVERTERS[t](v) for t, v in zip(types, values))


with open("data.txt") as f:
    print(parse_line("iffii", file=f))
    print(parse_line("iiiii", file=f))
    print(parse_line("s", file=f))
    print(parse_line("fsi", file=f))
    print(parse_line("ff", file=f))
$ cat data.txt
2 12.657823 0.1823467E-04 114 0
3 4 5 9 11
"Lower"
1.2 "foo \" bar \\ baz" 42
278.15
$ python3 read_data_shlex2.py 
(2, 12.657823, 1.823467e-05, 114, 0)
(3, 4, 5, 9, 11)
('Lower',)
(1.2, 'foo " bar \\ baz', 42)
Traceback (most recent call last):
  File "read_data_shlex2.py", line 24, in <module>
    print(parse_line("ff", file=f))
  File "read_data_shlex2.py", line 15, in parse_line
    raise ValueError("Too few/many values %r <-- %r" % (types, values))
ValueError: Too few/many values 'ff' <-- ['278.15']
$ 


But we can't do *all* the work for you ;-)

If this thread goes long enough eventually we will ;)

-- 
https://mail.python.org/mailman/listinfo/python-list

Reply via email to