[jira] [Updated] (BEAM-12701) Converting two deferred dataframes to csv in the same pipeline causes a PCollection label collision

Jira Sat, 31 Jul 2021 13:09:15 -0700


     [ 
https://issues.apache.org/jira/browse/BEAM-12701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Jérémie Bigras-Dunberry updated BEAM-12701:
-------------------------------------------
    Summary: Converting two deferred dataframes  to csv in the same pipeline 
causes a PCollection label collision  (was: Converting two deferred dataframes  
to csv in the same pipeline causes PCollection label collision)

> Converting two deferred dataframes  to csv in the same pipeline causes a 
> PCollection label collision
> ----------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-12701
>                 URL: https://issues.apache.org/jira/browse/BEAM-12701
>             Project: Beam
>          Issue Type: Bug
>          Components: io-py-common
>    Affects Versions: 2.31.0
>            Reporter: Jérémie Bigras-Dunberry
>            Priority: P2
>
>  
> If you call to_csv on a DeferredDataFrame twice in a single pipeline, 
> like this:
> {code:python}
> import apache_beam as beam
> import pandas as pd
> from apache_beam.dataframe.convert import to_dataframe, to_pcollection
>
> df1 = pd.DataFrame.from_records({"a":"b"}, index=[0])
> df2 = pd.DataFrame.from_records({"a":"b"}, index=[0])
> with beam.Pipeline() as p:
>     df1 = to_dataframe(to_pcollection(df1, pipeline=p), label="df1")
>     df2 = to_dataframe(to_pcollection(df2, pipeline=p), label="df2")
>     df1.to_csv("test.csv")
>     df2.to_csv("test2.csv"){code}
> You get this error on the second to_csv call:
>  
> {code:java}
> RuntimeError: A transform with label "ToPCollection(df)" already exists in 
> the pipeline. To apply a transform with a specified label write pvalue | 
> "label" >> transform
> {code}
> I think it comes from the fact that to_csv calls to_pcollection without 
> any label, so the same default label is inferred for both to_csv calls. 
>  
>  
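
The suspected cause can be illustrated with a small toy model. This is a sketch under stated assumptions, not Beam's actual implementation: ToyPipeline, toy_to_pcollection, toy_to_csv and the label parameter on toy_to_csv are all hypothetical names made up for illustration. It only shows why a default label that is identical on every call collides on the second call, and why distinct labels would not.

```python
# Toy model only: NOT Beam's implementation. All names here are hypothetical.

class ToyPipeline:
    """Tracks applied transform labels and rejects duplicates."""

    def __init__(self):
        self.applied_labels = set()

    def apply(self, label):
        # A label may be used only once per pipeline.
        if label in self.applied_labels:
            raise RuntimeError(
                'A transform with label "%s" already exists in the pipeline.'
                % label)
        self.applied_labels.add(label)
        return label


def toy_to_pcollection(pipeline, label=None):
    # Assumed default: with no label given, every call infers the same one.
    return pipeline.apply(label or "ToPCollection(df)")


def toy_to_csv(pipeline, path, label=None):
    # Mirrors the suspected cause: the conversion is applied without a label
    # unless the caller supplies one. The path is unused in this toy model.
    return toy_to_pcollection(pipeline, label=label)


def demo():
    # Two unlabeled calls on one pipeline: the second one collides.
    p = ToyPipeline()
    toy_to_csv(p, "test.csv")
    try:
        toy_to_csv(p, "test2.csv")
        collided = False
    except RuntimeError:
        collided = True

    # Distinct labels on a fresh pipeline: no collision.
    q = ToyPipeline()
    toy_to_csv(q, "test.csv", label="ToPCollection(df1)")
    toy_to_csv(q, "test2.csv", label="ToPCollection(df2)")
    return collided, sorted(q.applied_labels)
```

In this model, calling demo() exercises both cases. Whether the real to_csv can be given a label, or should derive a unique one itself, is left open by the report.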



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
