Cloned to https://issues.apache.org/jira/browse/BEAM-12056

On Thu, Mar 25, 2021 at 4:46 PM Brian Hulette wrote:

> Yes, this looks like https://issues.apache.org/jira/browse/BEAM-11929. I
> removed it from the release blockers since there is a workaround (use a
> NamedTuple type), but it's probably worth cherry-picking the fix.
>
> On Thu, Mar 25, 2021 at 4:44 PM Robert Bradshaw wrote:
>
>> This could be https://issues.apache.org/jira/browse/BEAM-11929
>>
>> On Thu, Mar 25, 2021 at 4:26 PM Robert Bradshaw wrote:
>>
>>> This is definitely wrong. Looking into what's going on here, but this
>>> seems severe enough to be a blocker for the next release.
>>>
>>> On Thu, Mar 25, 2021 at 3:39 PM Xinyu Liu wrote:
>>>
>>>> Hi, folks,
>>>>
>>>> I am playing around with the Python DataFrame API and seem to have hit a
>>>> schema issue when converting a PCollection to a DataFrame. I wrote the
>>>> following code for a simple test:
>>>>
>>>> import apache_beam as beam
>>>> from apache_beam.dataframe.convert import to_dataframe
>>>> from apache_beam.dataframe.convert import to_pcollection
>>>>
>>>> p = beam.Pipeline()
>>>> data = p | beam.Create([('a', '1111'), ('b', '2222')]) | beam.Map(
>>>>     lambda x: beam.Row(word=x[0], val=x[1]))
>>>> _ = data | beam.Map(print)
>>>> p.run()
>>>>
>>>> This shows the following:
>>>> Row(val='1111', word='a')
>>>> Row(val='2222', word='b')
>>>>
>>>> But if I use to_dataframe() to convert it into a DataFrame, the schema
>>>> seems to be reversed:
>>>>
>>>> df = to_dataframe(data)
>>>> dataCopy = to_pcollection(df)
>>>> _ = dataCopy | beam.Map(print)
>>>> p.run()
>>>>
>>>> I got:
>>>> BeamSchema_4100b64e_16e9_467d_932e_5fc2e4acaca7(word='1111', val='a')
>>>> BeamSchema_4100b64e_16e9_467d_932e_5fc2e4acaca7(word='2222', val='b')
>>>>
>>>> It seems the columns 'word' and 'val' are now swapped. The problem seems
>>>> to happen during to_dataframe(): if I print out df['word'], I get '1111'
>>>> and '2222'. I am not sure whether I am doing something wrong or there is
>>>> an issue in the schema conversion. Could someone help me take a look?
>>>>
>>>> Thanks, Xinyu
>>>>
>>>
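
The swapped output in Xinyu's report is consistent with the field names being taken in construction order (word, val) while the values are read in alphabetical order of the names (val, word), which is also the order the plain beam.Row repr prints. The plain-Python sketch below reproduces that kind of mismatch. It is an assumption about the symptom, not Beam's actual conversion code, and the namedtuple `Schema` is a hypothetical stand-in for the generated BeamSchema_... type.

```python
from collections import namedtuple

# Field names in the order they were passed to beam.Row(word=..., val=...).
field_names = ['word', 'val']
element = {'word': 'a', 'val': '1111'}

# Hypothetical mismatch: values are read back in alphabetical order of the
# field names ('val', 'word') instead of in declaration order.
values_in_sorted_order = [element[name] for name in sorted(element)]

# Pairing declaration-order names with sorted-order values swaps the columns.
Schema = namedtuple('Schema', field_names)
mismatched = Schema(*values_in_sorted_order)
print(mismatched)  # Schema(word='1111', val='a')
```

This matches the shape of the reported BeamSchema_...(word='1111', val='a') output; the real cause is tracked in BEAM-11929.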
