Hi Adnan,

> the "/lineage" API is defined in the OpenLineage spec.

Could you provide a pointer to where this is defined on the OL side?

Thanks,
Dmitri.

On Fri, Jun 12, 2026 at 4:33 PM Adnan Hemani via dev wrote:

> Hi EJ,
>
> Unfortunately, the "/lineage" API is defined in the OpenLineage spec.
> Changing it for Polaris would require client-side changes, which leads
> us back to the same situation you confirmed in your investigation.
>
> I agree with tackling the implementation similarly to what you've outlined.
> However, splitting this design into those separate topics may do more harm
> than good, because all of these pieces must work hand-in-hand design-wise,
> and no proposals for non-OpenLineage data lineage are expected in the near
> future. I'd ask everyone to please review the initial PR that sets up the
> Ingest API in Polaris: https://github.com/apache/polaris/pull/4667.
>
> Best,
> Adnan Hemani
>
>
>
> On Fri, Jun 12, 2026 at 12:02 PM EJ Wang wrote:
>
> > Hi Adnan,
> >
> > I think your point about adoption is right, and I'd revise part of my
> > earlier framing after looking more closely at how existing OpenLineage
> > integrations work.
> >
> > I was previously thinking too much about whether clients could emit a more
> > Polaris-native or framework-agnostic payload. But that is probably not the
> > right adoption model for the first slice. Existing OL producers generally
> > already emit OpenLineage events, and the common low-friction knob is the
> > transport target (URL/endpoint), not a uniform way to wrap or reshape the
> > event body.
> >
> > So I agree that the first slice should optimize for endpoint retargeting
> > and raw OL event ingestion. Clients should not need to know that the
> > backend is Polaris or learn a Polaris-specific payload shape.
> >
> > *The design question I'd still like us to make explicit is where the
> > OpenLineage specificity lives*. My preference would be to make it
> > explicit at the ingress/API layer, for example with an OpenLineage-specific
> > route under the lineage namespace, such as `/.../lineage/openlineage`.
> >
> > That still preserves endpoint retargeting for existing OL producers, while
> > avoiding ambiguity about whether the generic `/lineage` namespace is an
> > OpenLineage contract or a broader Polaris lineage namespace. It also leaves
> > room for future `/lineage/<format>` ingress adapters if Polaris later
> > supports other lineage formats or frameworks.
> >
> > Behind that ingress route, I'd like to keep the platform boundary
> > Polaris-owned. I would separate:
> >
> > 1. *OpenLineage REST ingress/API*: an OL-aware endpoint that accepts raw
> > OL events.
> > 2. *Polaris lineage capability boundary*: a Polaris-owned contract behind
> > ingress.
> > 3. *Default/OOTB implementation*: a small bundled implementation that
> > proves end-to-end that the SPI capability works (i.e., that it
> > encapsulates correctly and exposes enough for extension implementations).
> > 4. *Extension implementations*: richer provider/proxy/forwarder/custom
> > behavior for deployments that need it.
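A minimal sketch of the four-layer split above, assuming (hypothetically) that the ingress layer maps each raw OpenLineage run event into a small Polaris-owned record before handing it across the capability boundary. `LineageRecord`, `LineageCapability`, `InMemoryLineageCapability`, and `OpenLineageIngress` are illustrative names, not Polaris APIs. Only the standard OL fields `eventTime`, `producer`, `inputs`, and `outputs` are read from the event.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Protocol


@dataclass(frozen=True)
class LineageRecord:
    """Hypothetical Polaris-owned shape passed across the capability boundary."""
    source_format: str        # which ingress produced it, e.g. "openlineage"
    event_time: str           # copied verbatim from the source event
    producer: str             # engine/integration that emitted the event
    inputs: tuple[str, ...]   # upstream dataset refs as "namespace/name"
    outputs: tuple[str, ...]  # downstream dataset refs as "namespace/name"


class LineageCapability(Protocol):
    """Layer 2: the Polaris-owned contract every implementation satisfies."""

    def accept(self, record: LineageRecord) -> None: ...


@dataclass
class InMemoryLineageCapability:
    """Layer 3: a deliberately tiny default that only retains records."""
    records: list[LineageRecord] = field(default_factory=list)

    def accept(self, record: LineageRecord) -> None:
        self.records.append(record)


def _refs(datasets: list[dict[str, Any]]) -> tuple[str, ...]:
    # OpenLineage identifies datasets by (namespace, name).
    return tuple(f"{d['namespace']}/{d['name']}" for d in datasets)


class OpenLineageIngress:
    """Layer 1: the only component that knows the OpenLineage payload shape."""

    def __init__(self, capability: LineageCapability) -> None:
        self._capability = capability

    def handle(self, event: dict[str, Any]) -> LineageRecord:
        record = LineageRecord(
            source_format="openlineage",
            event_time=event["eventTime"],
            producer=event["producer"],
            inputs=_refs(event.get("inputs", [])),
            outputs=_refs(event.get("outputs", [])),
        )
        self._capability.accept(record)
        return record
```

An extension (layer 4), such as a forwarder to an external OL backend, would be another class with an `accept` method, swapped in without touching the ingress. A pure passthrough provider would probably also need the raw event body, which this sketch drops.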
> >
> > This is not meant to reduce OpenLineage support. Quite the opposite:
> > OpenLineage can be the first explicitly supported ingress format. The
> > point is to make the specificity explicit where it belongs, so Polaris can
> > support OpenLineage well now while preserving room for future
> > contributions in the right layer.
> >
> > *With that framing, I'd suggest*:
> > - Initial PR: OpenLineage-specific ingress + Polaris lineage capability
> > boundary + minimal default/OOTB path.
> > - Follow-up PRs: proxy/forwarder/custom provider implementations and
> > richer behavior.
> > - Query/persistence semantics: separate unless this proposal is
> > explicitly adding a read/query API.
> >
> > I think that would support the adoption goal you described, while keeping
> > Polaris extensible in an organized way.
> >
> > -ej
> >
> > On Thu, Jun 11, 2026 at 8:19 PM Adnan Hemani wrote:
> >
> >> Hi EJ,
> >>
> >> Thanks for looking at the proposal. I've responded to most of your
> >> comments on the document itself, but I'll summarize the stances here to
> >> close the loop.
> >>
> >> I am consciously making an effort to let the OpenLineage standard drive
> >> the requirements here; this is a feature, not a bug. IMO, OpenLineage is
> >> by far the most widely used standard for data lineage; I don't even know
> >> of any other significant competitors. Big Data engines like Spark and
> >> Trino, which represent a significant use case for Polaris, have
> >> OpenLineage integrations and nothing else. Going the extra mile for
> >> further flexibility to decouple our lineage implementations from
> >> OpenLineage will likely not produce any return on the work involved, IMO.
> >> Happy to hear any other thoughts on this topic.
> >>
> >> I also don't agree that Polaris should morph into a full-fledged
> >> OpenLineage server. I don't think the Polaris community is attempting to
> >> make a "Swiss-Army Knife" tool out of Polaris. For major lineage use
> >> cases, users absolutely should be redirected to other servers like
> >> Marquez, where they can get full graph history, multi-hop traversal,
> >> jobs/runs info, etc. I disagree with the "extensions" piece of your email
> >> based on this reasoning.
> >>
> >> Regarding the "out-of-the-box" experience, I'm confident of this: Polaris
> >> cannot have lineage information out of the box. An admin must take a
> >> small step to configure how they want to enable lineage data
> >> persistence: either Polaris-local persistence or one of the
> >> passthrough/proxy/AuthZ layer modes. I think you've missed some of the
> >> points in the mailing thread replies above; the Query API is really only
> >> helpful when using the Polaris-local persistence mode. The current plan
> >> is to build toward "passthrough" mode first, with plans to support the
> >> Polaris-local implementation soon afterward. A Query API won't be
> >> introduced until the Polaris-local implementation work begins. This
> >> means there is no scenario in which a Query API exists but returns no
> >> data to the user. You can see this in my first PR, where only the Ingest
> >> API is implemented: https://github.com/apache/polaris/pull/4667.
> >>
> >> One last note/suggestion for you: the term "default battery" on its own
> >> generally doesn't make much sense. I was only able to piece together your
> >> comments because you used the phrase "batteries included" in this
> >> morning's community sync. I would usually use "out-of-the-box (OOTB)" or
> >> "default implementation". Using similar terms in the future would improve
> >> readability in general.
> >>
> >> Best,
> >> Adnan Hemani
> >>
> >> On Thu, Jun 11, 2026 at 4:12 PM EJ Wang wrote:
> >>
> >>> Hi all,
> >>>
> >>> I read through the proposal and the comments. One framing that may help
> >>> us converge is to split the proposal into a few separate decisions
> >>> instead of reviewing it as one bundled “OpenLineage support in Polaris”
> >>> feature.
> >>>
> >>> This relates to what I understand as a broader direction for Polaris as
> >>> a platform: it should be flexible enough to support different deployment
> >>> and integration use cases, but still batteries-included enough to be
> >>> useful out of the box. For lineage, I think that means we should
> >>> explicitly separate: what Polaris promises as native lineage semantics,
> >>> what the default battery implementation does, and what should remain
> >>> pluggable for richer or deployment-specific implementations.
> >>>
> >>> I have been using a similar exercise in a recent SPI proposal draft:
> >>> first separate external contracts, the default/battery implementation,
> >>> extension implementations, and provider-facing replacement points; then
> >>> decide on the implementation. I think that exercise applies well here
> >>> because this proposal touches several different boundary types at once:
> >>> ingest protocol, Polaris-native lineage model, persistence, query API,
> >>> downstream forwarding, auth, and dataset resolution.
> >>>
> >>> The questions I think we should separate are:
> >>>
> >>>    1. *OpenLineage compatibility:* Do we require existing OpenLineage
> >>>    clients to emit to Polaris by changing only the endpoint/config?
> >>>       - If yes, then a server-side OpenLineage-compatible adapter
> >>>       endpoint makes sense.
> >>>       - If no, another option is a Polaris-provided OpenLineage
> >>>       transport/client shim that reshapes OpenLineage events into a
> >>>       Polaris-native lineage API.
> >>>       - Those are different adoption tradeoffs, and I think we should
> >>>       choose intentionally rather than letting OpenLineage compatibility
> >>>       implicitly define the Polaris-native API.
> >>>    2. *Polaris-native lineage model:* Should the long-term Polaris
> >>>    lineage model/query API be OpenLineage-specific, or
> >>>    framework-agnostic with OpenLineage as one adapter?
> >>>       - My preference is the latter. OpenLineage compatibility is
> >>>       useful, but I would avoid making the OpenLineage payload shape the
> >>>       Polaris-native lineage model by accident.
> >>>    3. *Default battery behavior:* What should work out of the box?
> >>>       - If query is part of the initial release, I think the battery
> >>>       needs enough local state to answer a minimal query. A narrow
> >>>       default could be: the latest observed direct table-level upstreams
> >>>       for a Polaris-managed target table, with the observed timestamp,
> >>>       producer/engine identifier, and upstream dataset refs.
> >>>    4. *Extension implementations:* What should be pluggable or future
> >>>    work?
> >>>       - I would put raw OpenLineage forwarding/proxying, external
> >>>       backend query, full graph history, multi-hop traversal,
> >>>       column-level query, job/run graph, pruning/staleness, and richer
> >>>       governance-aware behavior into extension/future implementation
> >>>       areas rather than the default battery.
> >>>
> >>> *One subtle point*: I do not think the default battery and the REST/API
> >>> envelope need to have exactly the same scope.
> >>>
> >>> The default battery can be intentionally small: for example, only the
> >>> latest direct table-level lineage summary for Polaris-managed target
> >>> tables. *But the REST/API envelope can still be designed so that richer
> >>> implementations are possible later or through extensions*. For example,
> >>> the API can carry metadata such as *granularity (table/column/job etc.),
> >>> format/source protocol (OpenLineage or another lineage framework)*, or
> >>> requested mode to help Polaris route handling to the configured
> >>> provider, without requiring every default implementation to support
> >>> every mode.
> >>>
> >>> Said differently, I would separate:
> >>>
> >>>    - what the API envelope can represent;
> >>>    - what the default battery actually guarantees;
> >>>    - what extension implementations can support.
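One way to read this envelope/default/extension split, sketched with hypothetical field and class names: the envelope carries `source_format` and `granularity`, each configured provider declares which combinations it supports, and an unsupported combination is rejected explicitly rather than silently accepted. A requested-mode field could be added to the routing key the same way; none of this reflects an existing Polaris API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Protocol


@dataclass(frozen=True)
class LineageEnvelope:
    source_format: str        # e.g. "openlineage"
    granularity: str          # e.g. "table", "column", "job"
    payload: dict[str, Any]   # format-specific body, opaque to the router


class UnsupportedLineageRequest(Exception):
    """The envelope is representable, but no configured provider handles it."""


class LineageProvider(Protocol):
    supported: set[tuple[str, str]]  # (source_format, granularity) pairs

    def handle(self, envelope: LineageEnvelope) -> str: ...


class DefaultTableLevelProvider:
    """What the default guarantees: OpenLineage input, table-level only."""
    supported = {("openlineage", "table")}

    def handle(self, envelope: LineageEnvelope) -> str:
        return f"stored table-level lineage from {envelope.source_format}"


class LineageRouter:
    """Dispatches each envelope to the first provider that claims it."""

    def __init__(self, providers: list[LineageProvider]) -> None:
        self._providers = providers

    def route(self, envelope: LineageEnvelope) -> str:
        key = (envelope.source_format, envelope.granularity)
        for provider in self._providers:
            if key in provider.supported:
                return provider.handle(envelope)
        raise UnsupportedLineageRequest(f"no provider configured for {key}")
```

An extension adding column-level lineage would register another provider whose `supported` set includes `("openlineage", "column")`, without changing the envelope or the default.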
> >>>
> >>> *My concrete recommendation would be*:
> >>>
> >>> If Polaris exposes a lineage Query API in the initial release, the
> >>> default battery should provide a minimal latest table-level summary
> >>> implementation so the query works out of the box. If we do not want any
> >>> local persistence in the initial release, then I think the Query API
> >>> should be out of scope for the initial release or clearly
> >>> extension-provided. I would avoid exposing a core query API whose
> >>> default implementation cannot answer anything.
> >>>
> >>> *My preferred shape would be*:
> >>>
> >>>    - Polaris-native lineage semantics stay *framework-agnostic*.
> >>>    - OpenLineage is supported as an adapter/adoption path, *not as the
> >>>    only Polaris lineage model*.
> >>>    - The default battery, if query is in scope, is the latest direct
> >>>    table-level lineage summary only.
> >>>    - *The API envelope leaves room for richer provider implementations*.
> >>>    - Full OpenLineage backend behavior, downstream forwarding/proxying,
> >>>    historical graph, column lineage, job/run lineage, multi-hop query,
> >>>    pruning/staleness, and external backend query *are extension or
> >>>    future work*.
> >>>
> >>> This would still give Polaris a useful out-of-the-box lineage
> >>> experience, while avoiding turning Polaris into a full lineage backend
> >>> in the first step.
> >>>
> >>> -ej
> >>>
> >>> On Mon, Jun 8, 2026 at 2:31 PM Adnan Hemani via dev wrote:
> >>>
> >>>> Hi Robert,
> >>>>
> >>>> > Is my understanding correct that option 1 is out of scope from your
> >>>> > perspective, and option 2 is not sufficient for the M0 you have in
> >>>> > mind? In other words, you are proposing option 3 as the baseline, with
> >>>> > active planning toward option 4?
> >>>>
> >>>> Yes, that's correct. Happy to hear others' opinions, but Option 4 has
> >>>> been detailed in the proposal document since the very start. I'm happy
> >>>> to wait a few more days for others' opinions, but as of now I don't see
> >>>> any active opposition to the plans as-is, and the suggested "lazy
> >>>> consensus" deadline passed over two weeks ago. I-Ting and I will start
> >>>> implementation in the meantime.
> >>>>
> >>>> Best,
> >>>> Adnan Hemani
> >>>>
> >>>> On Mon, Jun 8, 2026 at 3:19 AM Robert Stupp wrote:
> >>>>
> >>>> > Hi all,
> >>>> >
> >>>> > Thanks Adnan, that helps clarify the shape.
> >>>> >
> >>>> > I think this is the point where broader community input would be
> >>>> > useful, because options 3/4 are a materially different commitment
> >>>> > from options 1/2.
> >>>> >
> >>>> > Is my understanding correct that option 1 is out of scope from your
> >>>> > perspective, and option 2 is not sufficient for the M0 you have in
> >>>> > mind? In other words, you are proposing option 3 as the baseline, with
> >>>> > active planning toward option 4?
> >>>> >
> >>>> > Option 3 does not just put a proxy endpoint in Polaris.
> >>>> > It makes Polaris responsible for the OL ingest path: dataset-name
> >>>> > resolution, per-entity authZ over OL assertions, policy for
> >>>> > non-Polaris datasets, trusted-service credentials to downstream
> >>>> > systems, request-size and payload limits, forwarding failure
> >>>> > semantics, audit behavior, and tenant isolation.
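To make the option 3 surface more concrete, here is a sketch of a few of the responsibilities listed above (size limit, policy for non-Polaris datasets, per-entity authZ, forwarding failure) applied to one raw OL event. The byte limit, the `can_write_lineage` / `is_polaris_dataset` / `forward` callbacks, the `allow_external_datasets` flag, and the HTTP-style status codes are all illustrative assumptions, not anything defined by Polaris or the OpenLineage spec; audit, trusted-service credentials, and tenant isolation are left out.

```python
import json
from typing import Any, Callable

MAX_BODY_BYTES = 1_000_000  # hypothetical per-request limit


def ingest(body: bytes,
           can_write_lineage: Callable[[str], bool],
           is_polaris_dataset: Callable[[str], bool],
           forward: Callable[[dict[str, Any]], None],
           allow_external_datasets: bool = False) -> int:
    """Return an illustrative HTTP-style status for one raw OL event."""
    if len(body) > MAX_BODY_BYTES:
        return 413  # request-size limit
    try:
        event = json.loads(body)
    except ValueError:  # covers JSONDecodeError and bad UTF-8
        return 400
    if not isinstance(event, dict):
        return 400
    for ds in event.get("inputs", []) + event.get("outputs", []):
        ref = f"{ds.get('namespace')}/{ds.get('name')}"
        if not is_polaris_dataset(ref):
            if not allow_external_datasets:
                return 422  # policy for non-Polaris datasets
            continue
        if not can_write_lineage(ref):
            return 403  # per-entity authZ over what the event asserts
    try:
        forward(event)  # downstream call, e.g. to an OL backend
    except Exception:
        return 502  # forwarding failure surfaced to the producer
    return 201
```

Each early return is a policy decision Polaris would have to own and keep stable, which is the maintenance surface being weighed here.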
> >>>> >
> >>>> > Option 4 then adds a Polaris-local lineage storage/query subsystem.
> >>>> > Even if the first version stores only a reduced projection, Polaris
> >>>> > would take on many responsibilities of an OL backend: persistence
> >>>> > semantics, query semantics, staleness/pruning, auth-filtered reads,
> >>>> > backend compatibility, migrations, limits, and long-term compatibility
> >>>> > with OL event shapes.
> >>>> > At that point, even if intentionally limited, Polaris effectively
> >>>> > operates as an OL backend for the supported subset.
> >>>> >
> >>>> > So before we treat option 3 plus active planning toward option 4 as
> >>>> > the M0 baseline, I think it would be good to hear whether others agree
> >>>> > that Polaris should take on that implementation and maintenance
> >>>> > surface for the first milestone.
> >>>> >
> >>>> > Or whether we should start with a smaller integration point first.
> >>>> >
> >>>> > Robert
> >>>> >
> >>>>
> >>>
>


-- 
Dmitri Bourlatchkov
Senior Staff Software Engineer, Dremio
Dremio.com