shunping commented on PR #35300:
URL: https://github.com/apache/beam/pull/35300#issuecomment-2997958630

   > I just came across this code and I have a couple of concerns:
   > 
   > 1. Why are we creating a new transform instead of extending 
periodicImpulse? It seems like the same goals could be pretty easily 
accomplished by attaching data as an option to ImpulseSeqGenDoFn. There is not 
really anything ml specific to this transform, it could probably be broadly 
useful for testing. We actually already see this in 
https://github.com/apache/beam/blob/4c1c2daa17d508afd2f94bffe83ed61ec9ccec8d/sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L1267
   
   Extending PeriodicImpulse was actually my first thought. However, I notice 
that it is used extensively in IOs so I hesitate to modify the code to change 
its behavior or affecting its performance. Also, as its name suggests, it may 
be cleaner if we keep PeriodicImpulse as it is to give impulse (timestamp) to 
the whole pipeline. These are the reason I started a new PTransform, and at the 
same time reused the SDF detail from `ImpulseSeqGenDoFn` as much as possible.
   
   >     - it is really odd to have fn_api_runner tests depending on an ML 
package
   
   +1 on this.
   
   > 2. Using time like this in testing is quite fragile - 
https://github.com/apache/beam/blob/4c1c2daa17d508afd2f94bffe83ed61ec9ccec8d/sdks/python/apache_beam/ml/ts/util_test.py#L44
   >     - if a runner is slower or there is a lot of contention (running tests 
in parallel), this will fail. It already fails on prism. We should instead be 
inspecting element timestamps for 
   correctness.
   
   That's a good point. Will update the tests.
   
   > 3. There are actual some regressions in functionality from PeriodicImpulse 
which seem intentional (e.g. not exposing a start time - 
https://github.com/apache/beam/blob/4c1c2daa17d508afd2f94bffe83ed61ec9ccec8d/sdks/python/apache_beam/ml/ts/util.py#L181
   >    ). That further diverges us and doesn't seem useful.
   > 
   
   @damccorm: Could you explain a bit more on this? Yes, the start time is not 
exposed in the new PTransform, because the current catch-up behavior on 
PeriodicImpulse is a bit unintuitive for testing.
   
   > @shunping I think we should back this out and move the functionality to 
PeriodicImpulse
   Yes, let me revert it and address the previous items.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to