Hi Jobin,

Thanks for the follow-up and for sharing the branch. I'm happy to give you some pointers so you can shape this into a reviewable PR.

A few observations on the current state of https://github.com/JOBIN-SABU/opennlp-sandbox/tree/grpc:

1. Placement / duplication

`opennlp-grpc/examples/python-client/` already exists on apache:main with a working `main.py` (POS tagging) and the generated `opennlp_pb2.py` / `opennlp_pb2_grpc.py`. Your new `opennlp-grpc/opennlp-python-client/` directory duplicates that. Please rebase on the latest `main` and add your sentence-detection example *inside* the existing `opennlp-grpc/examples/python-client/` directory (e.g. as `sentdetect_example.py`, alongside the existing `main.py`). One README per directory is enough: just extend the existing one rather than adding a second.

2. Suitability for an initial PR

Not yet, but close. Before opening one, please:

- Drop the accidentally committed files: `[Help`, the two `META-INF/MANIFEST.MF` files under `caseditor-*`, the stray `opennlp-sandbox` entry, and `opennlp-grpc/config.properties` (it contains a hard-coded absolute path `/home/joy/...`). If a sample config is useful, add a sanitized version using relative paths.
- Keep the diff focused on a single concern: one Python example + minor README updates.
- Add the Apache 2.0 license header to any new `.py` files you author (see `examples/python-client/main.py` for the canonical header).

3. Generated gRPC stubs

The existing convention in this module is to commit them under `examples/python-client/` for ease of running the example; the README documents the regeneration command via `grpc_tools.protoc`. That's fine for examples. If/when this grows into a proper Python SDK (your "package as a Python SDK (pip)" idea), the right approach is to *not* commit generated stubs into the SDK package and generate them at build time from the `.proto` in `opennlp-grpc-api/`, mirroring how the Java stubs are generated by the Maven build. For now, since you're adding an example, please follow the existing examples convention and keep the stubs committed.

4. Please add proper `uv` support

Apache projects increasingly favor `uv` for Python tooling because it gives reproducible, lockfile-backed environments and is dramatically faster than pip/venv. For the example, that means:

- Add a `pyproject.toml` under `opennlp-grpc/examples/python-client/` declaring `grpcio` and `grpcio-tools` as dependencies (pin reasonable minimum versions).
- Commit a `uv.lock` for reproducibility.
- Update the README to show the canonical workflow:

      cd opennlp-grpc/examples/python-client
      uv sync
      uv run python main.py                  # existing POS example
      uv run python sentdetect_example.py    # your new example

  and a regen step for the stubs:

      uv run python -m grpc_tools.protoc \
        -I../../opennlp-grpc-api \
        --python_out=. --grpc_python_out=. \
        ../../opennlp-grpc-api/opennlp.proto

- Keep the plain `pip install grpcio grpcio-tools` instructions as a short fallback for users without `uv`.

If you later split out a real Python SDK module, `uv` + `pyproject.toml` should be the build/packaging baseline there too: generated stubs produced by `uv run` during build rather than committed.

Suggested next step: a single small, focused PR that

- adds `sentdetect_example.py` under `opennlp-grpc/examples/python-client/`,
- introduces `pyproject.toml` + `uv.lock` for that directory,
- updates the README with the `uv` workflow.

POS / NER coverage and SDK packaging are best as separate follow-up PRs once placement and the Python tooling baseline are settled.

Thanks again for sticking with this; happy to review once the PR is up.

Regards,
Richard

> On 24.04.2026 at 18:15, Jobin Sabu <[email protected]> wrote:
>
> Hi Richard, Jeff, and OpenNLP Developers,
> I hope you’re doing well.
> I wanted to follow up on my previous message regarding the gRPC-based Python integration work. I completely understand that mentoring bandwidth is limited at the moment, and I appreciate the clarity you shared earlier.
> That said, I remain genuinely interested in contributing to OpenNLP and plan to continue working on this integration independently outside of GSoC.
> At this stage, I have a clean and working foundation:
> - Python client communicating with the OpenNLP gRPC server
> - End-to-end sentence detection working with proper model loading
> - A structured branch with minimal examples and setup instructions
> Branch for reference:
> https://github.com/JOBIN-SABU/opennlp-sandbox/tree/grpc
> Before I expand further (e.g., POS tagging, NER, and SDK improvements), I would really value any brief guidance on:
> - Whether the current structure is suitable for an initial PR
> - Preferred placement for Python client/examples within the project
> - Whether generated gRPC files should be committed or user-generated
> Even a small pointer or confirmation would help me align better with project expectations.
> I’ll continue refining and extending the work in the meantime and will aim to contribute in a way that is useful to the community.
> Thank you for your time, and I appreciate any feedback whenever convenient.
> Best regards,
> Jobin Sabu
> [email protected]
> https://github.com/JOBIN-SABU
>
> On Wed, 1 Apr, 2026, 12:07 pm Jobin Sabu, <[email protected]> wrote:
>
>> Hi Richard,
>> I hope you're doing well.
>> I’ve cleaned up my work and pushed the current state of the gRPC-based Python integration to a separate branch for review:
>> https://github.com/JOBIN-SABU/opennlp-sandbox/tree/grpc
>> This currently includes:
>> - A working Python client using the existing proto definitions
>> - End-to-end sentence detection via gRPC
>> - A minimal example for testing the setup
>> - A cleaned project structure (excluding target/, models/, and generated artifacts)
>> - Updated README with step-by-step instructions
>> At this stage, my focus has been to establish a clean and working foundation that demonstrates Python ↔ OpenNLP integration in a simple and reproducible way.
>> My broader goal (as discussed earlier) is to extend this further by:
>> - Adding additional services such as POS tagging and NER
>> - Improving model handling and configuration
>> - Developing a more complete Python SDK and documentation
>> However, before expanding the scope, I wanted to first confirm that the current structure and direction align with project expectations.
>> I would really appreciate your feedback on:
>> - Whether this structure is suitable for an initial PR
>> - If the placement of the Python example is appropriate
>> - Whether generated Python gRPC files should be included or generated by users
>> Based on your guidance, I will continue refining and expanding the implementation.
>> Thank you again for your time and support.
>> Best regards,
>> Jobin Sabu
>>
>> On Sun, 29 Mar, 2026, 7:22 pm Jobin Sabu, <[email protected]> wrote:
>>
>>> Dear Richard, Jeff, and OpenNLP Developers,
>>>
>>> I hope you’re doing well.
>>>
>>> I wanted to share a quick update regarding the gRPC-based Python client integration. I’m happy to say that the setup is now working end-to-end: the server is running successfully, models are loading correctly, and I’m able to make RPC calls from the Python client with expected outputs. I’ve also captured screenshots of the working setup for reference.
>>>
>>> This took longer than expected due to my academic commitments, but I’ve been consistently working on this for nearly a year now and have gained a solid understanding of the system.
>>>
>>> With GSoC 2026 approaching, I would love to continue this work and take it further, including improving the Python SDK, adding more services like NER and chunking, and refining documentation for broader adoption.
>>>
>>> I wanted to ask if anyone from the OpenNLP community might be available to mentor this effort for GSoC 2026.
>>> I’ll have significantly more availability this year and am fully committed to pushing this forward as a meaningful contribution to the project.
>>>
>>> Thank you again for your guidance and support throughout this journey. I’d really appreciate any feedback or direction.
>>>
>>> Best regards,
>>> Jobin Sabu
>>> [email protected]
>>>
>>> On Tue, 10 Mar 2026 at 00:58, Richard Zowalla <[email protected]> wrote:
>>>
>>>> Hi Jobin,
>>>>
>>>> Thanks for the detailed update, and apologies for the slow reply; as with most volunteer-driven projects, the day job occasionally takes priority!
>>>>
>>>> Regarding the model loading error: the server doesn't load raw .bin files directly from the filesystem. Instead, it expects a model JAR dropped into the location specified in the config. You can find pre-built model JARs for OpenNLP on Maven Central via the opennlp-models repository:
>>>> https://github.com/apache/opennlp-models
>>>>
>>>> If you need to deploy a custom model, it needs to follow the packaging pattern shown in that same repo, so simply pointing to a .bin file won't work. The best reference for how to set this up correctly is the integration test in the sandbox repo, which shows the expected directory structure and configuration in a working example.
>>>>
>>>> Regarding GSoC 2026: your proposal sounds well thought-out, and it's great to see the direction you have in mind (NER, Chunking, a PyPI SDK, and docs). However, I am currently unable to mentor due to time constraints in my day job.
>>>>
>>>> Best
>>>> Richard
>>>>
>>>>> On 06.03.2026 at 05:41, Jobin Sabu <[email protected]> wrote:
>>>>>
>>>>> *Hi Richard and Jeff,*
>>>>>
>>>>> I hope you're both doing well.
>>>>>
>>>>> I would like to provide an update on the gRPC-based Python client integration.
>>>>> I have reached the stage where the client connects and RPC calls are being made, but I am consistently receiving a server-side error when attempting sentence detection.
>>>>>
>>>>> To assist with debugging, I have pushed the *entire raw state* of my environment to my repository:
>>>>> https://github.com/JOBIN-SABU/opennlp-sandbox-experiments
>>>>>
>>>>> *Technical Context:*
>>>>>
>>>>> 1. *Working Directory:* tmp-opennlp-sandbox1/opennlp-sandbox/opennlp-grpc/target/
>>>>> 2. *Server Command:* java -cp "opennlp-grpc-server-2.5.8-SNAPSHOT.jar:models/:" -Dopennlp.model.dir=. org.apache.opennlp.grpc.OpenNLPService
>>>>> 3. *Model Location:* ./models/opennlp/tools/sentdetect/en-sent.bin
>>>>> 4. *The Error:* grpc._channel._InactiveRpcError: status = StatusCode.INTERNAL, details = "Could not find the given model."
>>>>>
>>>>> I suspect the issue lies in how the gRPC wrapper handles resource loading, specifically whether it expects models on the *ClassPath* or supports *Relative/Absolute File System paths* via the config file. Since I’ve experimented with both flat and nested directory hierarchies without success, I would appreciate any insight into the "expected" pathing for the Sandbox server.
>>>>>
>>>>> *Regarding GSoC 2026:* As I have been contributing to this for nearly a year, my goal remains to establish a robust bridge between OpenNLP and the Python community. I would love to formally propose this as a project for the *GSoC 2026 cycle* to move these features from the sandbox into a production-ready state.
>>>>>
>>>>> Beyond fixing the current integration, my proposal includes:
>>>>>
>>>>> - *Expanding Services:* Implementing NER (Named Entity Recognition) and Chunking as gRPC services.
>>>>> - *Pythonic SDK:* Developing a client library for distribution via PyPI/pip.
>>>>> - *Documentation:* Creating comprehensive benchmarks and "Getting Started" guides.
>>>>>
>>>>> Given my deep involvement in the current implementation, *would either of you be interested in mentoring me for this project during the upcoming GSoC cycle?* I am eager to see this through to completion for the Apache OpenNLP community.
>>>>>
>>>>> Best regards,
>>>>>
>>>>> *Jobin Sabu*
>>>>>
>>>>>
>>>>> On Wed, 25 Feb 2026 at 10:45, Jobin Sabu <[email protected]> wrote:
>>>>>
>>>>>> Hi Richard and Jeff,
>>>>>>
>>>>>> I hope you're both doing well.
>>>>>>
>>>>>> I would like to provide a brief report on the gRPC-based Python client integration. I have got to a stage where the client connects and RPC calls are being made but I am always receiving a server-side error when trying to detect the sentence.
>>>>>>
>>>>>> I have deployed the latest version of my work, such as the models/ folder and the config.properties, to my repository:
>>>>>> https://github.com/JOBIN-SABU/opennlp-sandbox-experiments
>>>>>>
>>>>>> *Technical Details:*
>>>>>>
>>>>>> *Server Command:*
>>>>>>
>>>>>> java -jar opennlp-grpc-server-2.5.8-SNAPSHOT.jar -c config.properties -p 7071
>>>>>>
>>>>>> *config.properties:*
>>>>>>
>>>>>> sentenceModel=en-sent.bin
>>>>>>
>>>>>> *The Error:*
>>>>>>
>>>>>> grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with: status = StatusCode.INTERNAL, details = "Could not find the given model.">
>>>>>>
>>>>>> As OpenNLP is a library, I believe the problem lies in how the gRPC wrapper loads resources, i.e., whether it expects the model to be on the ClassPath or can interpret File System paths from the configuration file. Should I be using an absolute path, or does the server expect external models to follow a particular directory hierarchy?
>>>>>>
>>>>>> *Regarding GSoC 2026:*
>>>>>>
>>>>>> Since I have been working on this for almost a year now, my goal remains to establish a bridge between OpenNLP and the Python community. While I am committed to this regardless of GSoC, I would love to formally propose this as a project for the 2026 cycle to move it from the sandbox into a production-ready feature.
>>>>>>
>>>>>> Beyond repairing the existing integration, I plan to:
>>>>>>
>>>>>> - Implement *NER (Named Entity Recognition)* and *Chunking* as gRPC services.
>>>>>> - Write a Pythonic client SDK that will be distributed through *PyPI/pip*.
>>>>>> - Develop detailed documentation and performance benchmarks.
>>>>>>
>>>>>> Since I am already deep into the implementation, would either of you consider mentoring me for this project during the upcoming GSoC cycle? I’m eager to see this through to completion for the community.
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> *Jobin Sabu*
>>>>>>
>>>>>> https://www.linkedin.com/in/jobin-sabu-0b18bb2b8/
>>>>>>
>>>>>> https://github.com/JOBIN-SABU
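
Addendum to point 2 of the reply at the top of this thread: the cleanup items listed there (a hard-coded path such as `/home/joy/...` in a config file, missing license headers on new `.py` files) can be checked mechanically before opening the PR. The following is a minimal sketch, not project tooling. The function names, the regular expression, and the choice of marker phrase are assumptions made for illustration; the marker is the opening phrase of the standard ASF source header.

```python
"""Pre-PR hygiene sketch: flag machine-specific paths and missing license headers."""
import re
from pathlib import Path

# Assumption: anything under /home/<user>/ or /Users/<user>/ is machine-specific.
ABSOLUTE_HOME_PATH = re.compile(r"(?:/home|/Users)/[^\s/]+/\S*")

# Assumption: this opening phrase of the ASF source header serves as the marker.
LICENSE_MARKER = "Licensed to the Apache Software Foundation (ASF)"


def find_absolute_paths(text: str) -> list[str]:
    """Return every machine-specific absolute path found in text."""
    return ABSOLUTE_HOME_PATH.findall(text)


def has_license_header(source: str, max_lines: int = 30) -> bool:
    """Check whether the license marker appears within the first max_lines lines."""
    head = "\n".join(source.splitlines()[:max_lines])
    return LICENSE_MARKER in head


def check_tree(root: Path) -> list[str]:
    """Collect findings for .py and .properties files below root."""
    findings = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in {".py", ".properties"}:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for hit in find_absolute_paths(text):
            findings.append(f"{path}: hard-coded path {hit}")
        if path.suffix == ".py" and not has_license_header(text):
            findings.append(f"{path}: missing Apache license header")
    return findings


if __name__ == "__main__":
    # Run from the directory to be checked, e.g. the python-client example folder.
    for finding in check_tree(Path(".")):
        print(finding)
```

Generated files such as `opennlp_pb2.py` will probably be reported as lacking a header; whether that matters is a reviewer call, so treat the output as a prompt for a manual look rather than a gate.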

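
Addendum to point 4 of the reply at the top of this thread: the stub regeneration step can also be driven from Python, which may be handy if it later moves into a build step. This is a sketch under assumptions: the helper names are hypothetical, the paths are the ones from the regen command in the reply, and it relies on `grpc_tools.protoc.main`, which takes an argv-style list and returns the compiler's exit code.

```python
"""Sketch: regenerate the Python gRPC stubs from a script instead of the shell."""
import sys
from pathlib import Path


def build_protoc_args(proto_dir: Path, out_dir: Path,
                      proto_name: str = "opennlp.proto") -> list[str]:
    """Assemble an argument list mirroring `python -m grpc_tools.protoc ...`."""
    return [
        "grpc_tools.protoc",  # argv[0] placeholder; not interpreted as an option
        f"-I{proto_dir.as_posix()}",
        f"--python_out={out_dir.as_posix()}",
        f"--grpc_python_out={out_dir.as_posix()}",
        (proto_dir / proto_name).as_posix(),
    ]


def regenerate(proto_dir: Path, out_dir: Path) -> int:
    """Run the protoc bundled with grpcio-tools and return its exit code."""
    # Imported lazily so build_protoc_args works without grpcio-tools installed.
    from grpc_tools import protoc

    return protoc.main(build_protoc_args(proto_dir, out_dir))


if __name__ == "__main__":
    # Paths as in the regen command from point 4, relative to the example directory.
    sys.exit(regenerate(Path("../../opennlp-grpc-api"), Path(".")))
```

One caveat: `python -m grpc_tools.protoc` appends an include path for the protobuf well-known types automatically, while a direct call to `protoc.main` does not. If the `.proto` imports any `google/protobuf/*.proto` file, an extra `-I` entry would be needed.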