jiteshsoni opened a new pull request, #55227:
URL: https://github.com/apache/spark/pull/55227

   ### What changes were proposed in this pull request?
   
   This PR fixes two documentation bugs in PySpark's streaming data source API:
   
   1. **Docstring attribute error** (`datasource.py`): The `DataSourceStreamReader.latestOffset()` docstring example referenced `limit.maxRows`; the correct attribute is `limit.max_rows`, because the `ReadMaxRows` dataclass follows Python's snake_case convention.
   
   2. **Outdated method signature** (`python_data_source.rst`): The tutorial's `FakeStreamReader` example showed the deprecated parameterless signature `def latestOffset(self)` instead of the recommended signature with admission control support: `def latestOffset(self, start: dict, limit)`.
   
   ### Why are the changes needed?
   
   - Users copying the docstring example would encounter an `AttributeError` at 
runtime due to the incorrect attribute name.
   - Tutorial users wouldn't learn about the `start` offset parameter or 
admission control capabilities introduced in SPARK-55304.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. This is a documentation-only fix.
   
   ### How was this patch tested?
   
   Documentation changes only - no tests required.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Yes, GitHub Copilot and Claude Code were used to assist with this patch.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

