Dev-iL opened a new pull request, #1509:
URL: https://github.com/apache/hamilton/pull/1509
## Motivation
Hamilton's data quality validators (`DataValidator`, `BaseDefaultValidator`)
are synchronous-only. When using `AsyncDriver`, the validation wrapper
functions created by `@check_output` / `@check_output_custom` are always plain
`def` — even if the underlying validator needs to perform async work. The
`AsyncGraphAdapter` checks `asyncio.iscoroutinefunction(fn)` to decide whether
to `await` a node callable, so a sync wrapper around an async `validate()`
silently returns an unawaited coroutine instead of a `ValidationResult`,
corrupting downstream results.
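
The failure mode can be reproduced without Hamilton. The sketch below is a simplified illustration, not Hamilton's actual code: `validate`, `sync_wrapper`, and `run_node` are hypothetical stand-ins for the validator, the generated wrapper, and the adapter's dispatch logic. It uses `inspect.iscoroutinefunction`, which behaves the same as the `asyncio` variant for plain `def` / `async def` functions.

```python
import asyncio
import inspect


async def validate(value: int) -> bool:
    """Hypothetical async validator body."""
    return value > 0


def sync_wrapper(value: int):
    """Plain `def` wrapper around an async validator."""
    # Calling a coroutine function without `await` only creates a coroutine object.
    return validate(value)


async def run_node(fn, *args):
    """Simplified stand-in for a dispatcher that awaits only coroutine functions."""
    result = fn(*args)
    if inspect.iscoroutinefunction(fn):
        result = await result
    return result


async def demo():
    leaked = await run_node(sync_wrapper, 5)
    leaked_is_coroutine = inspect.iscoroutine(leaked)
    leaked.close()  # suppress the "coroutine was never awaited" warning
    awaited = await run_node(validate, 5)
    return leaked_is_coroutine, awaited
```

Because `sync_wrapper` is not a coroutine function, the dispatcher hands its return value on untouched, and downstream code receives a coroutine object where it expected a result.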
### Minimal example
```python
# my_module.py
from hamilton.data_quality.base import AsyncDataValidator, ValidationResult
from hamilton.function_modifiers import check_output_custom


class AsyncPositiveValidator(AsyncDataValidator):
    def __init__(self):
        super().__init__(importance="fail")

    def applies_to(self, datatype):
        return datatype == int

    def description(self):
        return "Value must be positive"

    @classmethod
    def name(cls):
        return "positive_validator"

    async def validate(self, dataset: int) -> ValidationResult:
        # async validation logic (e.g. await db_check(dataset))
        return ValidationResult(
            passes=dataset > 0,
            message=f"{dataset} is {'positive' if dataset > 0 else 'not positive'}",
        )


@check_output_custom(AsyncPositiveValidator())
async def doubled(input_value: int) -> int:
    return input_value * 2


# main.py (run inside an async context, e.g. a notebook or an `async def main()`)
import my_module
from hamilton import async_driver, base

dr = async_driver.AsyncDriver({}, my_module, result_builder=base.DictResult())
result = await dr.execute(final_vars=["doubled"], inputs={"input_value": 5})
# result == {"doubled": 10}
```
## Changes
**New base classes** (`hamilton/data_quality/base.py`):
- `AsyncDataValidator`: async variant of `DataValidator` with `async def validate()`
- `AsyncBaseDefaultValidator`: async variant of `BaseDefaultValidator` for use with `@check_output`
- `is_async_validator()`: helper using `inspect.iscoroutinefunction` for robust detection
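
For reviewers who want a mental model of the detection helper, here is a minimal sketch of what such a check could look like. The body is an assumption based on the description above, not a copy of the code in this PR, and `SyncValidator` / `DuckTypedAsyncValidator` are hypothetical classes used only for illustration.

```python
import inspect


def is_async_validator(validator) -> bool:
    """Assumed shape of the helper: True if `validate` is defined with `async def`."""
    return inspect.iscoroutinefunction(getattr(validator, "validate", None))


class SyncValidator:
    def validate(self, dataset):
        return dataset > 0


class DuckTypedAsyncValidator:
    # Does not inherit from any async base class; detected through its method alone.
    async def validate(self, dataset):
        return dataset > 0
```

Checking the method rather than the class means a validator is treated as async even when it does not derive from `AsyncDataValidator`.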
**Async-aware wrapper generation**
(`hamilton/function_modifiers/validation.py`):
- `transform_node()` now detects async validators and creates `async def`
wrappers that `await` the validator's `validate()` call
- Sync validators get a runtime guard that raises a clear `TypeError` if a
coroutine is accidentally returned
- Follows the established Hamilton pattern used in `expanders.py`,
`macros.py`, and `recursive.py`
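
The two wrapper behaviours described above can be sketched as follows. This is an illustrative approximation under stated assumptions, not the code in `transform_node()`: `make_validation_wrapper`, `AsyncCheck`, and `SneakyCheck` are hypothetical names, and the error message is invented for the example.

```python
import asyncio
import inspect


def make_validation_wrapper(validator):
    """Hypothetical factory: pick an async or sync wrapper based on the validator."""
    if inspect.iscoroutinefunction(validator.validate):

        async def async_wrapper(data):
            # Async validators are awaited, so callers receive the real result.
            return await validator.validate(data)

        return async_wrapper

    def sync_wrapper(data):
        result = validator.validate(data)
        # Runtime guard: a sync `validate` must not hand back a coroutine.
        if inspect.iscoroutine(result):
            result.close()
            raise TypeError(
                f"{type(validator).__name__}.validate() returned a coroutine; "
                "define it with `async def` so it can be awaited."
            )
        return result

    return sync_wrapper


class AsyncCheck:
    async def validate(self, data):
        return data > 0


class SneakyCheck:
    async def _check(self, data):
        return data > 0

    def validate(self, data):
        return self._check(data)  # bug: forgot to await / mark as async
```

With this shape, `make_validation_wrapper(AsyncCheck())` yields a coroutine function that an async adapter will await, while `make_validation_wrapper(SneakyCheck())` fails loudly at call time instead of leaking a coroutine.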
**Documentation updates**:
- `writeups/data_quality.md` — new "Async Validators" section with full
examples
- `docs/concepts/function-modifiers.rst` — mentions async validator base
classes
- `docs/reference/decorators/check_output.rst` — API reference entries for
async classes
- `docs/how-tos/run-data-quality-checks.rst` — async validators subsection
- `examples/async/README.md` — data quality with async section
## How I tested this
- Unit tests verifying async wrapper creation, mixed sync/async validators,
and the misuse guard
- End-to-end tests with `AsyncDriver` for async, sync, and mixed validator
scenarios
## Notes
- `AsyncDataValidator` inherits from `DataValidator`, so all existing
`isinstance` checks and type hints work unchanged.
- Detection uses `inspect.iscoroutinefunction(validator.validate)` rather
than `isinstance`, catching any validator with an async `validate` regardless
of class hierarchy.
- `final_node_callable` (which aggregates validation results) remains sync —
the `AsyncGraphAdapter` awaits all kwargs before calling it, so results are
already resolved.
- All changes are additive with no breaking API modifications. Existing sync
validators continue to work identically.
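
To illustrate why the aggregating callable can stay synchronous, here is a simplified sketch. It assumes only what the note above states (inputs are awaited before the call); `resolve_kwargs`, `all_passed`, `check_a`, and `check_b` are hypothetical names, not Hamilton APIs.

```python
import asyncio
import inspect


async def resolve_kwargs(kwargs):
    """Await any awaitable values so the callee only sees plain results."""
    return {k: (await v if inspect.isawaitable(v) else v) for k, v in kwargs.items()}


def all_passed(**results):
    """Sync aggregator: hypothetical stand-in for `final_node_callable`."""
    return all(results.values())


async def check_a():
    return True


async def check_b():
    return False


async def demo():
    # Upstream validator outputs may be coroutines; they are resolved first.
    kwargs = await resolve_kwargs({"a": check_a(), "b": check_b(), "c": True})
    return all_passed(**kwargs)
```

Since every input is resolved before `all_passed` runs, the aggregator never has to deal with coroutines itself.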
## Checklist
- [x] PR has an informative and human-readable title (this will be pulled
into the release notes)
- [x] Changes are limited to a single goal (no scope creep)
- [x] Code passed the pre-commit check & code is left cleaner/nicer than
when first encountered.
- [x] Any _change_ in functionality is tested
- [x] New functions are documented (with a description, list of inputs, and
expected output)
- [ ] Placeholder code is flagged / future TODOs are captured in comments
- [x] Project documentation has been updated if adding/changing
functionality.