Cloned to https://issues.apache.org/jira/browse/BEAM-12056
On Thu, Mar 25, 2021 at 4:46 PM Brian Hulette <[email protected]> wrote:

> Yes, this looks like https://issues.apache.org/jira/browse/BEAM-11929. I
> removed it from the release blockers since there is a workaround (use a
> NamedTuple type), but it's probably worth cherrypicking the fix.
>
> On Thu, Mar 25, 2021 at 4:44 PM Robert Bradshaw <[email protected]>
> wrote:
>
>> This could be https://issues.apache.org/jira/browse/BEAM-11929
>>
>> On Thu, Mar 25, 2021 at 4:26 PM Robert Bradshaw <[email protected]>
>> wrote:
>>
>>> This is definitely wrong. Looking into what's going on here, but this
>>> seems severe enough to be a blocker for the next release.
>>>
>>> On Thu, Mar 25, 2021 at 3:39 PM Xinyu Liu <[email protected]> wrote:
>>>
>>>> Hi, folks,
>>>>
>>>> I am playing around with the Python Dataframe API, and seemingly hit
>>>> a schema issue when converting a PCollection to a dataframe. I wrote
>>>> the following code for a simple test:
>>>>
>>>> import apache_beam as beam
>>>> from apache_beam.dataframe.convert import to_dataframe
>>>> from apache_beam.dataframe.convert import to_pcollection
>>>>
>>>> p = beam.Pipeline()
>>>> data = p | beam.Create([('a', '1111'), ('b', '2222')]) | beam.Map(
>>>>     lambda x : beam.Row(word=x[0], val=x[1]))
>>>> _ = data | beam.Map(print)
>>>> p.run()
>>>>
>>>> This shows the following:
>>>> Row(val='1111', word='a')
>>>> Row(val='2222', word='b')
>>>>
>>>> But if I use to_dataframe() to convert it into a df, the schema seems
>>>> to be reversed:
>>>>
>>>> df = to_dataframe(data)
>>>> dataCopy = to_pcollection(df)
>>>> _ = dataCopy | beam.Map(print)
>>>> p.run()
>>>>
>>>> I got:
>>>> BeamSchema_4100b64e_16e9_467d_932e_5fc2e4acaca7(word='1111', val='a')
>>>> BeamSchema_4100b64e_16e9_467d_932e_5fc2e4acaca7(word='2222', val='b')
>>>>
>>>> It seems the columns 'word' and 'val' are now swapped. The problem
>>>> seems to happen during to_dataframe(). If I print out df['word'], I
>>>> get '1111' and '2222'.
>>>>
>>>> I am not sure whether I am doing something wrong or there is an
>>>> issue in the schema conversion. Could someone help me take a look?
>>>>
>>>> Thanks,
>>>> Xinyu
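
For anyone landing on this thread: a minimal sketch of the NamedTuple workaround mentioned above, not a confirmed fix from the thread. The names WordVal, to_word_val and run are made up for illustration, and the sketch assumes that declaring the element type with with_output_types() is enough for to_dataframe() to pick up the schema in your Beam version.

```python
import typing


class WordVal(typing.NamedTuple):
    """Hypothetical explicit schema; field order follows declaration order."""
    word: str
    val: str


def to_word_val(pair):
    # Build the record by keyword so each value is tied to its field name.
    return WordVal(word=pair[0], val=pair[1])


def run():
    # Imported lazily so the NamedTuple part above can be tried without Beam.
    import apache_beam as beam
    from apache_beam.dataframe.convert import to_dataframe
    from apache_beam.dataframe.convert import to_pcollection

    with beam.Pipeline() as p:
        data = (
            p
            | beam.Create([('a', '1111'), ('b', '2222')])
            # Declare the element type instead of relying on beam.Row inference.
            | beam.Map(to_word_val).with_output_types(WordVal))
        df = to_dataframe(data)
        # Round-trip back to a PCollection and print, as in the original test.
        _ = to_pcollection(df) | beam.Map(print)
```

Calling run() executes the same round trip as the original test, so the printed rows can be compared against the input to check that 'word' and 'val' keep their values.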
