rich7420 opened a new pull request, #1010: URL: https://github.com/apache/mahout/pull/1010
### Purpose of PR
<!-- Describe what this PR does. -->

Adds file-backed and streaming Parquet data sources to the Quantum Data Loader. Users can call `.source_file(path)` or `.source_file(path, streaming=True)` and iterate in batches. PyO3 loader bindings are restored so `QuantumDataLoader` works with synthetic, file, and streaming sources.

- **qdp-core:** `DataSource::InMemory` and `DataSource::Streaming`; `new_from_file` (full read, supports .parquet/.arrow/.npy/.pt/.pb etc.) and `new_from_file_streaming` (Parquet, chunked read). Shared `path_extension_lower`, `take_batch_from_source`, named constants, first-chunk buffer reuse. Basis encoding fix: state vector allocated as Float64 to match kernel, then converted to engine precision.
- **qdp-python (Rust):** `PyQuantumLoader`, `create_synthetic_loader`, `create_file_loader`, `create_streaming_file_loader` (Linux only). `batch_limit=None` → `usize::MAX`; path from str or Path; file/streaming build in `py.detach()`.
- **loader.py:** `source_file(path, streaming=False)`, `_create_iterator()`. Streaming requires `.parquet`.
- **Tests:** New loader tests (mutual exclusion, batch count, extension, streaming). DLPack and bindings tests updated for current error messages.

### Related Issues or PRs
<!-- Add links to related issues or PRs. -->
<!-- - Closes #123 -->
<!-- - Related to #123 -->

Related to #969

### Changes Made
<!-- Please mark one with an "x" -->

- [ ] Bug fix
- [x] New feature
- [ ] Refactoring
- [ ] Documentation
- [ ] Test
- [ ] CI/CD pipeline
- [ ] Other

### Breaking Changes
<!-- Does this PR introduce a breaking change? -->

- [ ] Yes
- [x] No

### Checklist
<!-- Please mark each item with an "x" when complete -->
<!-- If not all items are complete, please open this as a **Draft PR**. Once all requirements are met, mark as ready for review. -->

- [x] Added or updated unit tests for all changes
- [ ] Added or updated documentation for all changes
- [x] Successfully built and ran all unit tests or manual tests locally
- [ ] PR title follows "MAHOUT-XXX: Brief Description" format (if related to an issue)
- [ ] Code follows ASF guidelines

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
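
For readers who want a concrete picture of the loader rules listed in the PR description (one source per loader, streaming limited to `.parquet`, `str` or `Path` input, `batch_limit=None` treated as unbounded), here is a minimal pure-Python sketch. It is not the PR's code: `LoaderSketch`, `source_synthetic`, `normalize_batch_limit`, and the error messages are hypothetical stand-ins, and only the names `source_file` and `path_extension_lower` are taken from the description above. `sys.maxsize` is used here as a rough analogue of Rust's `usize::MAX`.

```python
import sys
from pathlib import Path
from typing import Optional, Tuple, Union


def path_extension_lower(path: Union[str, Path]) -> str:
    """Lower-cased file extension without the dot ('' when there is none)."""
    return Path(path).suffix.lower().lstrip(".")


def normalize_batch_limit(batch_limit: Optional[int]) -> int:
    """Hypothetical helper: None means 'no limit'.

    The PR maps None to usize::MAX on the Rust side; sys.maxsize is only
    a stand-in for 'effectively unbounded' in this sketch.
    """
    return sys.maxsize if batch_limit is None else batch_limit


class LoaderSketch:
    """Hypothetical stand-in for the builder; NOT the real QuantumDataLoader."""

    def __init__(self) -> None:
        self._source: Optional[Tuple[str, Optional[str]]] = None

    def _set_source(self, kind: str, path: Optional[str] = None) -> "LoaderSketch":
        # Mutual exclusion: a loader accepts exactly one data source.
        if self._source is not None:
            raise ValueError("only one data source may be configured")
        self._source = (kind, path)
        return self

    def source_synthetic(self) -> "LoaderSketch":
        # Assumed method name, used only to demonstrate mutual exclusion.
        return self._set_source("synthetic")

    def source_file(self, path: Union[str, Path], streaming: bool = False) -> "LoaderSketch":
        # Streaming is restricted to Parquet, as stated in the PR description.
        if streaming and path_extension_lower(path) != "parquet":
            raise ValueError("streaming requires a .parquet file")
        return self._set_source("streaming" if streaming else "file", str(path))
```

In this sketch, `LoaderSketch().source_file("data.npy", streaming=True)` raises `ValueError`, as does calling `source_file` after `source_synthetic` on the same instance.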
