shunping commented on issue #35867: URL: https://github.com/apache/beam/issues/35867#issuecomment-3348591262
Good news! I am able to reproduce the "DEADLINE_EXCEEDED" error when I set the number of rows to 100k. After some investigation, I found it is caused by the timeout of a gRPC connection:

https://github.com/apache/beam/blob/50e14ace7f6bfb9a28bff59962c2166729adb778/sdks/python/apache_beam/runners/portability/portable_runner.py#L226-L228

By default, we create the gRPC connection with a timeout of 60 seconds:

https://github.com/apache/beam/blob/c84f28f84aa4f38cb7209809fd079835c698f0d4/sdks/python/apache_beam/options/pipeline_options.py#L1747-L1757

If the connection is idle for more than one minute, it is cut off and the "DEADLINE_EXCEEDED" error is raised. This is a protection mechanism we implemented to avoid hanging jobs.

You can use the pipeline option `job_server_timeout` to override the default deadline. I verified that with the timeout set to 10 minutes, the previously failing job (with 100k rows) runs successfully on my end. Could you try that and let me know if it works for you too?
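For example, here is a minimal sketch of setting the option in Python (the 600-second value and the `beam.Create` step are just placeholders; substitute your actual workload and runner):

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Raise the job service request deadline from the default 60 s to 10 minutes.
options = PipelineOptions(['--job_server_timeout=600'])

with beam.Pipeline(options=options) as pipeline:
    # Placeholder for the failing 100k-row workload.
    _ = pipeline | beam.Create(range(100_000))
```

The same flag can also be passed on the command line (e.g. `--job_server_timeout=600`) when launching the pipeline through a portable runner.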
