[ https://issues.apache.org/jira/browse/BEAM-12701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jérémie Bigras-Dunberry updated BEAM-12701:
-------------------------------------------
    Summary: Converting two deferred dataframes to csv in the same pipeline causes a PCollection label collision  (was: Converting two deferred dataframes to csv in the same pipeline causes PCollection label collision)

> Converting two deferred dataframes to csv in the same pipeline causes a PCollection label collision
> ----------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-12701
>                 URL: https://issues.apache.org/jira/browse/BEAM-12701
>             Project: Beam
>          Issue Type: Bug
>          Components: io-py-common
>    Affects Versions: 2.31.0
>            Reporter: Jérémie Bigras-Dunberry
>            Priority: P2
>
>
> If you call to_csv on a DeferredDataFrame twice in a single pipeline,
> like this:
> {code:python}
> import apache_beam as beam
> import pandas as pd
> from apache_beam.dataframe.convert import to_dataframe, to_pcollection
>
> df1 = pd.DataFrame.from_records({"a":"b"}, index=[0])
> df2 = pd.DataFrame.from_records({"a":"b"}, index=[0])
> with beam.Pipeline() as p:
>     df1 = to_dataframe(to_pcollection(df1, pipeline=p), label="df1")
>     df2 = to_dataframe(to_pcollection(df2, pipeline=p), label="df2")
>     df1.to_csv("test.csv")
>     df2.to_csv("test2.csv")  # the second call raises the error below
> {code}
> you get this error on the second to_csv call:
>
> {code}
> RuntimeError: A transform with label "ToPCollection(df)" already exists in
> the pipeline. To apply a transform with a specified label write pvalue |
> "label" >> transform
> {code}
> I think this happens because to_csv calls to_pcollection without a label,
> so the same label is inferred for both to_csv calls.
>
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
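
The cause suggested in the report (an inner to_pcollection call that passes no label, so both to_csv calls fall back to the same default) can be illustrated with a small toy model. This is only a sketch and not Beam's implementation: `ToyPipeline`, `apply`, `to_csv_unlabeled` and `to_csv_labeled` are hypothetical names, and deriving the label from the output path is just one assumed way a fix could make the labels distinct.

```python
class ToyPipeline:
    """Hypothetical stand-in for a pipeline that requires unique labels."""

    def __init__(self):
        self.applied_labels = set()

    def apply(self, default_label, label=None):
        # Fall back to the transform's default label when none is given.
        full_label = label or default_label
        if full_label in self.applied_labels:
            raise RuntimeError(
                'A transform with label "%s" already exists in the pipeline.'
                % full_label)
        self.applied_labels.add(full_label)
        return full_label


def to_csv_unlabeled(pipeline):
    # Models the suspected behaviour: the inner conversion passes no label,
    # so every call ends up with the same default label.
    return pipeline.apply("ToPCollection(df)")


def to_csv_labeled(pipeline, path):
    # Models one assumed fix: derive a distinct label from the output path.
    return pipeline.apply("ToPCollection(df)", label="ToPCollection(%s)" % path)
```

In this model, calling `to_csv_unlabeled` twice on the same `ToyPipeline` raises `RuntimeError` on the second call, while `to_csv_labeled` with two different paths succeeds because each call registers its own label.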