Re: [PR] Add PeriodicStream in the new time series folder. [beam]

via GitHub Mon, 23 Jun 2025 13:56:26 -0700


damccorm commented on PR #35300:
URL: https://github.com/apache/beam/pull/35300#issuecomment-2997517699

I just came across this code and I have a couple of concerns:

1) Why are we creating a new transform instead of extending periodicImpulse?
It seems like the same goals could be pretty easily accomplished by attaching
data as an option to ImpulseSeqGenDoFn. There is not really anything ml
specific to this transform, it could probably be broadly useful for testing. We
actually already see this in
https://github.com/apache/beam/blob/4c1c2daa17d508afd2f94bffe83ed61ec9ccec8d/sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L1267
- it is really odd to have fn_api_runner tests depending on an ML package
2) Using time like this in testing is quite fragile -
https://github.com/apache/beam/blob/4c1c2daa17d508afd2f94bffe83ed61ec9ccec8d/sdks/python/apache_beam/ml/ts/util_test.py#L44
- if a runner is slower or there is a lot of contention (running tests in
parallel), this will fail. It already fails on prism. We should instead be
inspecting element timestamps for correctness.
3) There are actual some regressions in functionality from PeriodicImpulse
which seem intentional (e.g. not exposing a start time -
https://github.com/apache/beam/blob/4c1c2daa17d508afd2f94bffe83ed61ec9ccec8d/sdks/python/apache_beam/ml/ts/util.py#L181).
That further diverges us and doesn't seem useful.

@shunping I think we should back this out and move the functionality to
PeriodicImpulse

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Re: [PR] Add PeriodicStream in the new time series folder. [beam]

Reply via email to