[ https://issues.apache.org/jira/browse/BEAM-10056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kenneth Knowles updated BEAM-10056:
-----------------------------------
    Status: Open  (was: Triage Needed)

> Side Input Validation too tight, doesn't allow CoGBK
> ----------------------------------------------------
>
>                 Key: BEAM-10056
>                 URL: https://issues.apache.org/jira/browse/BEAM-10056
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-go
>            Reporter: Robert Burke
>            Assignee: Robert Burke
>            Priority: P1
>
> The following doesn't pass validation, though it should, since it is a valid
> signature for a ParDo accepting a PCollection<CoGBK<string, *clientHistory,
> *clientHistory>>:
>
>     func (fn *writer) StartBundle(ctx context.Context) error
>     func (fn *writer) ProcessElement(
>             ctx context.Context,
>             key string,
>             iter1, iter2 func(**clientHistory) bool)
>     func (fn *writer) FinishBundle(ctx context.Context)
>
> It returns an error:
>
>     Missing side inputs in the StartBundle method of a DoFn. If side inputs are present in ProcessElement those side inputs must also be present in StartBundle.
>     Full error:
>             inserting ParDo in scope root:
>             graph.AsDoFn: for Fn named <...pii...>/userpackage.writer:
>     side inputs expected in method StartBundle [recovered]
>             panic: Missing side inputs in the StartBundle method of a DoFn. If side inputs are present in ProcessElement those side inputs must also be present in StartBundle.
>     Full error:
>             inserting ParDo in scope root:
>             graph.AsDoFn: for Fn named <...pii...>/userpackage.writer:
>     side inputs expected in method StartBundle
>
> This happens in the input-unaware validation (the check that sees only the
> DoFn, not its input PCollection), so that check needs to be loosened and the
> side inputs validated elsewhere:
> https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/graph/fn.go#L527
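To illustrate why this check is too tight, here is a minimal Go sketch under a simplified model in which each DoFn method is reduced to its count of iterator parameters (func(*T) bool). The function inputUnawareCheck is hypothetical and is not Beam's code in fn.go; it only mimics the behaviour described in this issue, where every ProcessElement iterator is assumed to be a side input that StartBundle must also declare.

```go
package main

import "fmt"

// inputUnawareCheck is a hypothetical model of the too-tight check described
// in this issue (not Beam's actual code). Without the input PCollection it
// cannot tell CoGBK value iterators from side-input iterators, so it treats
// every iterator parameter of ProcessElement as a side input that
// StartBundle must also declare.
func inputUnawareCheck(processIters, startBundleSides int) error {
	if startBundleSides < processIters {
		return fmt.Errorf("side inputs expected in method StartBundle: ProcessElement has %d iterator params, StartBundle has %d",
			processIters, startBundleSides)
	}
	return nil
}

func main() {
	// The reported DoFn: ProcessElement(ctx, key, iter1, iter2) with StartBundle(ctx).
	// Both iterators belong to the CoGBK input, yet this model rejects it.
	fmt.Println(inputUnawareCheck(2, 0))
}
```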
> There are also "sibling" cases with the same ProcessElement shape:
>
>     func (fn *writer) StartBundle(ctx context.Context, side func(**clientHistory) bool) error
>     func (fn *writer) ProcessElement(
>             ctx context.Context,
>             key string,
>             iter, side func(**clientHistory) bool)
>     func (fn *writer) FinishBundle(ctx context.Context, side func(**clientHistory) bool)
>
> and
>
>     func (fn *writer) StartBundle(ctx context.Context, side1, side2 func(**clientHistory) bool) error
>     func (fn *writer) ProcessElement(
>             ctx context.Context,
>             key string,
>             side1, side2 func(**clientHistory) bool)
>     func (fn *writer) FinishBundle(ctx context.Context, side1, side2 func(**clientHistory) bool)
>
> These would be valid for a PCollection<CoGBK<string, *clientHistory>> with one
> *clientHistory side input, and for a PCollection<string> with two
> *clientHistory side inputs, respectively.
> Which interpretation applies can only be fully determined once the input is
> known, so this should be validated, with a clear error, when the PCollection
> is bound.
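A minimal sketch of the input-aware check proposed above, under the same simplified model (methods reduced to counts of iterator parameters). validateAtBinding is a hypothetical name, not a Beam API. Once the input is known, the first coGBKStreams iterators of ProcessElement are attributed to the main input, and only the remainder must appear as side inputs in StartBundle, following the rule stated in the error message.

```go
package main

import "fmt"

// validateAtBinding is a hypothetical input-aware check (not a Beam API).
//
//	processIters: iterator params (func(*T) bool) in ProcessElement
//	bundleSides:  side-input params declared in StartBundle
//	coGBKStreams: value streams in the main input (0 for an ungrouped input)
func validateAtBinding(processIters, bundleSides, coGBKStreams int) error {
	if processIters < coGBKStreams {
		return fmt.Errorf("ProcessElement has %d iterator params, but the input has %d CoGBK value streams",
			processIters, coGBKStreams)
	}
	// Iterators beyond the CoGBK value streams are side inputs.
	sides := processIters - coGBKStreams
	// Following the error message: side inputs present in ProcessElement
	// must also be present in StartBundle.
	if bundleSides != sides {
		return fmt.Errorf("StartBundle declares %d side inputs, but ProcessElement has %d",
			bundleSides, sides)
	}
	return nil
}

func main() {
	// Same ProcessElement shape (key plus two iterators) for all three inputs.
	fmt.Println(validateAtBinding(2, 0, 2)) // CoGBK<string, *clientHistory, *clientHistory>, no side inputs
	fmt.Println(validateAtBinding(2, 1, 1)) // CoGBK<string, *clientHistory> plus one side input
	fmt.Println(validateAtBinding(2, 2, 0)) // string plus two side inputs
	fmt.Println(validateAtBinding(2, 0, 1)) // StartBundle is missing the side input
}
```

All three signatures in this issue share one ProcessElement shape yet need zero, one, or two side inputs in StartBundle, which is why, in this model, only a check that sees the input can decide.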



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
