Robert Bradshaw created BEAM-13454:
--------------------------------------

             Summary: Dataframe read_fwf fails reading incrementally.
                 Key: BEAM-13454
                 URL: https://issues.apache.org/jira/browse/BEAM-13454
             Project: Beam
          Issue Type: Bug
          Components: sdk-py-core
            Reporter: Robert Bradshaw
            Assignee: Robert Bradshaw


When trying to use beam.dataframe.io.read_fwf, one gets the following error:


{code:python}
  File "/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/common.py", line 1206, in process_with_sized_restriction
    return self.do_fn_invoker.invoke_process(
  File "/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/common.py", line 698, in invoke_process
    residual = self._invoke_process_per_window(
  File "/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/common.py", line 836, in _invoke_process_per_window
    self.output_processor.process_outputs(
  File "/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/common.py", line 1334, in process_outputs
    for result in results:
  File "/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/dataframe/io.py", line 545, in process
    frames = reader(handle, *self.args, **self.kwargs)
  File "/Users/robertwb/Work/beam/venv-3.8/lib/python3.8/site-packages/pandas/io/parsers.py", line 848, in read_fwf
    return _read(filepath_or_buffer, kwds)
  File "/Users/robertwb/Work/beam/venv-3.8/lib/python3.8/site-packages/pandas/io/parsers.py", line 454, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/Users/robertwb/Work/beam/venv-3.8/lib/python3.8/site-packages/pandas/io/parsers.py", line 942, in __init__
    self.engine = self._check_file_or_buffer(f, engine)
  File "/Users/robertwb/Work/beam/venv-3.8/lib/python3.8/site-packages/pandas/io/parsers.py", line 1003, in _check_file_or_buffer
    raise ValueError(msg)
ValueError: The 'python' engine cannot iterate through this file buffer.
{code}


It looks like pandas expects the file handle to be (line) iterable as well as 
supporting read().
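
As a rough illustration of that diagnosis, the sketch below wraps a handle that only supports read() so that it can also be iterated line by line. It uses only the standard library. ReadOnlyHandle and LineIterableHandle are hypothetical names made up for this example; they are not Beam or pandas classes, and this is not necessarily how the issue was or should be fixed. It also assumes that line iteration is the only thing pandas needs beyond read(), which the error message suggests but this report does not confirm.

```python
import io


class ReadOnlyHandle:
    """Hypothetical stand-in for a handle that supports read() but not iteration."""

    def __init__(self, data):
        self._buf = io.StringIO(data)

    def read(self, size=-1):
        return self._buf.read(size)


class LineIterableHandle:
    """Hypothetical wrapper that adds line iteration on top of read()."""

    def __init__(self, handle, chunk_size=8192):
        self._handle = handle
        self._chunk_size = chunk_size
        self._pending = ""  # text read from the handle but not yet returned
        self._eof = False

    def _fill(self):
        # Pull one more chunk from the underlying handle into the buffer.
        chunk = self._handle.read(self._chunk_size)
        if not chunk:
            self._eof = True
        self._pending += chunk

    def read(self, size=-1):
        # Serve buffered text first so read() and iteration stay consistent.
        if size is None or size < 0:
            data = self._pending + self._handle.read()
            self._pending = ""
            return data
        while len(self._pending) < size and not self._eof:
            self._fill()
        data, self._pending = self._pending[:size], self._pending[size:]
        return data

    def readline(self):
        while "\n" not in self._pending and not self._eof:
            self._fill()
        newline = self._pending.find("\n")
        end = newline + 1 if newline >= 0 else len(self._pending)
        line, self._pending = self._pending[:end], self._pending[end:]
        return line

    def __iter__(self):
        return self

    def __next__(self):
        line = self.readline()
        if not line:
            raise StopIteration
        return line


raw = ReadOnlyHandle("aa  1\nbb  2\n")
wrapped = LineIterableHandle(raw, chunk_size=4)
lines = list(wrapped)
```

The small chunk_size here is only to exercise lines that span several read() calls; a real handle would presumably use a much larger value.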



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
