rich7420 opened a new pull request, #1010:
URL: https://github.com/apache/mahout/pull/1010

   ### Purpose of PR
   Adds file-backed and streaming Parquet data sources to the Quantum Data 
Loader. Users can call `.source_file(path)` or `.source_file(path, 
streaming=True)` and iterate in batches. PyO3 loader bindings are restored so 
`QuantumDataLoader` works with synthetic, file, and streaming sources.
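
To illustrate the difference between a full read and a chunked (streaming) read, here is a minimal pure-Python sketch. It does not use the PR's code: `batches_in_memory` and `batches_streaming` are hypothetical names, the lists stand in for file contents and Parquet chunks, and yielding a final partial batch is an assumption rather than confirmed loader behavior.

```python
from typing import Iterable, Iterator, List


def batches_in_memory(rows: List[int], batch_size: int) -> Iterator[List[int]]:
    """Full-read model: all rows are already loaded, so just slice them."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def batches_streaming(chunks: Iterable[List[int]], batch_size: int) -> Iterator[List[int]]:
    """Chunked-read model: rows arrive chunk by chunk and are re-cut into batches."""
    buffer: List[int] = []
    for chunk in chunks:
        buffer.extend(chunk)
        # Emit as many full batches as the buffered rows allow.
        while len(buffer) >= batch_size:
            yield buffer[:batch_size]
            buffer = buffer[batch_size:]
    # Assumption: leftover rows are emitted as a final, smaller batch.
    if buffer:
        yield buffer


rows = list(range(10))
chunks = [[0, 1, 2], [3, 4, 5, 6], [7, 8, 9]]  # stand-in for file chunks
full = list(batches_in_memory(rows, 4))
streamed = list(batches_streaming(chunks, 4))
```

In this toy model both paths produce the same batches; the streaming path only ever holds the current chunk plus any leftover rows.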
   
   - **qdp-core:** Adds `DataSource::InMemory` and `DataSource::Streaming`, 
plus two constructors: `new_from_file` (full read; supports `.parquet`, 
`.arrow`, `.npy`, `.pt`, `.pb`, etc.) and `new_from_file_streaming` (Parquet 
only, chunked read). Also adds the shared helpers `path_extension_lower` and 
`take_batch_from_source`, named constants, and first-chunk buffer reuse. Basis 
encoding fix: the state vector is allocated as Float64 to match the kernel, 
then converted to engine precision.
   - **qdp-python (Rust):** Adds `PyQuantumLoader`, `create_synthetic_loader`, 
`create_file_loader`, and `create_streaming_file_loader` (Linux only). 
`batch_limit=None` maps to `usize::MAX`; the path may be a `str` or a `Path`; 
file and streaming loaders are built inside `py.detach()`.
   - **loader.py:** Adds `source_file(path, streaming=False)` and 
`_create_iterator()`. Streaming requires a `.parquet` file.
   - **Tests:** New loader tests cover mutual exclusion, batch count, 
extension, and streaming. DLPack and bindings tests are updated to match the 
current error messages.
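
The validation rules listed above can be sketched in plain Python. This is only an illustration under stated assumptions: the function bodies are guesses at the described behavior, not the PR's Rust code; `validate_source` and `normalize_batch_limit` are hypothetical names; and `sys.maxsize` stands in for `usize::MAX`.

```python
import sys
from pathlib import Path
from typing import Optional, Union


def path_extension_lower(path: Union[str, Path]) -> Optional[str]:
    """Return the lower-cased extension without the dot, or None if absent."""
    suffix = Path(path).suffix
    return suffix[1:].lower() if suffix else None


def validate_source(path: Union[str, Path], streaming: bool = False) -> Optional[str]:
    """Accept str or Path; reject streaming for anything but .parquet."""
    ext = path_extension_lower(path)
    if streaming and ext != "parquet":
        # Hypothetical error message; the real wording may differ.
        raise ValueError(f"streaming requires a .parquet file, got: {path}")
    return ext


def normalize_batch_limit(batch_limit: Optional[int]) -> int:
    """Map None to an 'unlimited' sentinel (sys.maxsize as a usize::MAX stand-in)."""
    return sys.maxsize if batch_limit is None else batch_limit
```

For example, `validate_source("data/Train.PARQUET", streaming=True)` passes because the extension is compared in lower case, while `validate_source("data.npy", streaming=True)` raises `ValueError`.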
   
   ### Related Issues or PRs
   Related to #969 
   
   ### Changes Made
   - [ ] Bug fix
   - [x] New feature
   - [ ] Refactoring
   - [ ] Documentation
   - [ ] Test
   - [ ] CI/CD pipeline
   - [ ] Other
   
   ### Breaking Changes
   - [ ] Yes
   - [x] No
   
   ### Checklist
   
   - [x] Added or updated unit tests for all changes
   - [ ] Added or updated documentation for all changes
   - [x] Successfully built and ran all unit tests or manual tests locally
   - [ ] PR title follows "MAHOUT-XXX: Brief Description" format (if related to 
an issue)
   - [ ] Code follows ASF guidelines
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
