Hi EJ,

Unfortunately, the "/lineage" API is defined in the OpenLineage spec. Changing it for Polaris would require client-side changes, which leads us back to the same situation that you confirmed after your investigation.

I agree with tackling the implementation similarly to what you've outlined. However, breaking this design into those topics may do more harm than good, because all these topics must work hand-in-hand design-wise and no other non-OpenLineage proposals for Data Lineage are expected in the near future. I'd ask everyone to please review the initial PR that sets up the Ingest API in Polaris: https://github.com/apache/polaris/pull/4667.

Best,
Adnan Hemani

On Fri, Jun 12, 2026 at 12:02 PM EJ Wang <[email protected]> wrote:

> Hi Adnan,
>
> I think your point about adoption is right, and I'd revise part of my
> earlier framing after looking more closely at how existing OpenLineage
> integrations work.
>
> I was previously thinking too much about whether clients could emit a more
> Polaris-native or framework-agnostic payload. But that is probably not the
> right first-slice adoption model. Existing OL producers generally already
> emit OpenLineage events, and the common low-friction knob is the transport
> target, URL/endpoint, not a uniform way to wrap or reshape the event body.
>
> So I agree that the first slice should optimize for endpoint retargeting
> and raw OL event ingestion. Clients should not need to know that the
> backend is Polaris or learn a Polaris-specific payload shape.
>
> *The design question I'd still like us to make explicit is where the
> OpenLineage specificity lives*. My preference would be to make it
> explicit at the ingress/API layer, for example with an OpenLineage-specific
> route under the lineage namespace such as: /.../lineage/openlineage
>
> That still preserves endpoint-retargeting for existing OL producers, while
> avoiding ambiguity about whether the generic `/lineage` namespace is an
> OpenLineage contract or a broader Polaris lineage namespace. It also leaves
> room for future `/lineage/<format>` ingress adapters if Polaris later
> supports other lineage formats or frameworks.
>
> Behind that ingress route, I'd like to keep the platform boundary
> Polaris-owned. I would separate:
>
> 1. *OpenLineage REST ingress/API*: an OL-aware endpoint that accepts raw
>    OL events.
> 2. *Polaris lineage capability boundary*: a Polaris-owned contract behind
>    ingress.
> 3. *Default/OOTB implementation*: a small bundled implementation that
>    proves the SPI capability (encapsulate correctly and expose sufficiently
>    for extension impls) works end-to-end.
> 4. *Extension implementations*: richer provider/proxy/forwarder/custom
>    behavior for deployments that need it.
>
> This is not meant to reduce OpenLineage support. Quite the opposite:
> OpenLineage can be the first explicit supported ingress format. The point
> is to make the specificity explicit where it belongs, so Polaris can
> support OpenLineage well now while preserving room for future contributions
> in the right layer.
>
> *With that framing, I'd suggest*:
> - Initial PR: OpenLineage-specific ingress + Polaris lineage capability
>   boundary + minimal default/OOTB path.
> - Follow-up PRs: proxy/forwarder/custom provider implementations and
>   richer behavior.
> - Query/persistence semantics: separate unless this proposal is explicitly
>   adding a read/query API.
>
> I think that would support the adoption goal you described, while keeping
> Polaris extensible in an organized way.
>
> -ej
>
> On Thu, Jun 11, 2026 at 8:19 PM Adnan Hemani <[email protected]> wrote:
>
>> Hi EJ,
>>
>> Thanks for looking at the proposal. I've responded to most of your
>> comments on the document itself, but I'll summarize the stances here to
>> close the loop.
>>
>> I am consciously making an effort to let the OpenLineage standard drive
>> the requirements here; this is a feature, not a bug. IMO, OpenLineage is
>> by far the most well-used standard for data lineage; I don't even know of
>> any other significant competitors.
>> Big Data engines like Spark and Trino,
>> which represent a significant use case for Polaris, have OpenLineage
>> integrations and nothing else. Going the extra mile for further flexibility
>> to de-couple our lineage implementations from OpenLineage will likely not
>> produce any ROI in terms of work IMO. Happy to hear any other thoughts on
>> this topic.
>>
>> I also don't agree that Polaris should morph into a full-fledged
>> OpenLineage server. I don't think the Polaris community is attempting to
>> make a "Swiss-Army Knife" tool out of Polaris. For major lineage use cases,
>> users absolutely should be redirected to other servers like Marquez where
>> they can get full graph history, multi-hop traversal, jobs/runs info, etc.
>> I disagree with the "extensions" piece of your email based on this
>> reasoning.
>>
>> Regarding the "out-of-the-box" experience, I have no doubt: Polaris
>> cannot have lineage information. An admin must take a small step to
>> configure how they want to enable Lineage data persistence: either for
>> Polaris-local persistence or for the passthrough/proxy/AuthZ layer modes. I
>> think you've missed some of the points in the mailing thread replies above;
>> the Query API is really only helpful when using the Polaris local
>> persistence mode. The current plan is to build toward "passthrough" mode
>> first, with plans to support the Polaris local implementation soon
>> afterward. A Query API won't be introduced until the Polaris local
>> implementation work begins. This means there's no implication that a Query
>> API will exist without returning data to the user. You can see this in my
>> first PR, where only the Ingest API is implemented:
>> https://github.com/apache/polaris/pull/4667.
>>
>> One last note/suggestion for you: the term "default battery" on its own
>> generally doesn't make much sense. I'm only able to piece together your
>> comments because you used the phrase "batteries included" in this morning's
>> community sync.
>> I would usually use "out-of-the-box (OOTB)" or "default
>> implementation". Using similar terms in the future would improve
>> readability in general.
>>
>> Best,
>> Adnan Hemani
>>
>> On Thu, Jun 11, 2026 at 4:12 PM EJ Wang <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> I read through the proposal and the comments. One framing that may help
>>> us converge is to split the proposal into a few separate decisions instead
>>> of reviewing it as one bundled “OpenLineage support in Polaris” feature.
>>>
>>> This seems related to a broader direction I understand for Polaris as a
>>> platform: it should be flexible enough to support different deployment and
>>> integration use cases, but still battery-included enough to be useful out
>>> of the box. For lineage, I think that means we should explicitly separate:
>>> what Polaris promises as native lineage semantics, what the default battery
>>> implementation does, and what should remain pluggable for richer or
>>> deployment-specific implementations.
>>>
>>> I have been using a similar exercise in a recent SPI proposal draft:
>>> first separate external contracts, default/battery implementation,
>>> extension implementations, and provider-facing replacement points; then
>>> decide implementation. I think that exercise applies well here because this
>>> proposal touches several different boundary types at once: ingest protocol,
>>> Polaris-native lineage model, persistence, query API, downstream
>>> forwarding, auth, and dataset resolution.
>>>
>>> The questions I think we should separate are:
>>>
>>> 1. *OpenLineage compatibility:* Do we require existing OpenLineage
>>>    clients to emit to Polaris by changing only the endpoint/config?
>>>    - If yes, then a server-side OpenLineage-compatible adapter
>>>      endpoint makes sense.
>>>    - If not, another option is a Polaris-provided OpenLineage
>>>      transport/client shim that reshapes OpenLineage events into a
>>>      Polaris-native lineage API.
>>>    - Those are different adoption tradeoffs, and I think we should
>>>      choose intentionally rather than letting OpenLineage compatibility
>>>      implicitly define the Polaris-native API.
>>> 2. *Polaris-native lineage model:* Should the long-term Polaris
>>>    lineage model/query API be OpenLineage-specific, or framework-agnostic
>>>    with OpenLineage as one adapter?
>>>    - My preference is the latter. OpenLineage compatibility is
>>>      useful, but I would avoid making the OpenLineage payload shape the
>>>      Polaris-native lineage model by accident.
>>> 3. *Default battery behavior:* What should work out of the box?
>>>    - If query is part of the initial release, I think the battery
>>>      needs enough local state to answer a minimal query. A narrow default
>>>      could be: latest observed direct table-level upstreams for a
>>>      Polaris-managed target table, with observed timestamp,
>>>      producer/engine identifier, and upstream dataset refs.
>>> 4. *Extension implementations:* What should be pluggable or future
>>>    work?
>>>    - I would put raw OpenLineage forwarding/proxying, external
>>>      backend query, full graph history, multi-hop traversal, column-level
>>>      query, job/run graph, pruning/staleness, and richer governance-aware
>>>      behavior into extension/future implementation areas rather than the
>>>      default battery.
>>>
>>> *One subtle point*: I do not think the default battery and the REST/API
>>> envelope need to have exactly the same scope.
>>>
>>> The default battery can be intentionally small. For example, latest
>>> direct table-level lineage summary for Polaris-managed target tables. *But
>>> the REST/API envelope can still be designed so that richer implementations
>>> are possible later or through extensions*.
>>> For example, the API can
>>> carry metadata such as *granularity (table/col/job etc.), format/source
>>> protocol (OpenLineage or other lineage framework)*, or requested mode
>>> to help Polaris route handling to the configured provider, without
>>> requiring every default implementation to support every mode.
>>>
>>> Said differently, I would separate:
>>>
>>> - what the API envelope can represent;
>>> - what the default battery actually guarantees;
>>> - what extension implementations can support.
>>>
>>> *My concrete recommendation would be*:
>>>
>>> If Polaris exposes a lineage Query API in the initial release, the
>>> default battery should provide a minimal latest table-level summary
>>> implementation so the query works out of the box. If we do not want any
>>> local persistence in the initial release, then I think the Query API should
>>> be out of scope for the initial release or clearly extension-provided. I
>>> would avoid exposing a core query API whose default implementation cannot
>>> answer anything.
>>>
>>> *My preferred shape would be*:
>>>
>>> - Polaris-native lineage semantics stay *framework-agnostic*.
>>> - OpenLineage is supported as an adapter/adoption path, *not as the
>>>   only Polaris lineage model*.
>>> - The default battery, if query is in scope, is latest direct
>>>   table-level lineage summary only.
>>> - *The API envelope leaves room for richer provider implementations*.
>>> - Full OpenLineage backend behavior, downstream forwarding/proxying,
>>>   historical graph, column lineage, job/run lineage, multi-hop query,
>>>   pruning/staleness, and external backend query *are extension or
>>>   future work*.
>>>
>>> This would still give Polaris a useful out-of-the-box lineage
>>> experience, while avoiding turning Polaris into a full lineage backend in
>>> the first step.
>>>
>>> -ej
>>>
>>> On Mon, Jun 8, 2026 at 2:31 PM Adnan Hemani via dev <[email protected]> wrote:
>>>
>>>> Hi Robert,
>>>>
>>>> > Is my understanding correct that option 1 is out of scope from your
>>>> > perspective, and option 2 is not sufficient for the M0 you have in
>>>> > mind? In other words, you are proposing option 3 as the baseline, with
>>>> > active planning toward option 4?
>>>>
>>>> Yes, that's correct. Happy to hear others' opinions, but Option 4 has been
>>>> detailed in the proposal document since the very start. I'm happy to wait a
>>>> few more days for others' opinions, but as of now I don't see any active
>>>> opposition to the plans as-is and the "lazy consensus" suggested deadline
>>>> was over 2 weeks ago. I-Ting and I will start implementation in the
>>>> meantime.
>>>>
>>>> Best,
>>>> Adnan Hemani
>>>>
>>>> On Mon, Jun 8, 2026 at 3:19 AM Robert Stupp <[email protected]> wrote:
>>>>
>>>> > Hi all,
>>>> >
>>>> > Thanks Adnan, that helps clarify the shape.
>>>> >
>>>> > I think this is the point where broader community input would be useful,
>>>> > because options 3/4 are a materially different commitment from options 1/2.
>>>> >
>>>> > Is my understanding correct that option 1 is out of scope from your
>>>> > perspective, and option 2 is not sufficient for the M0 you have in
>>>> > mind? In other words, you are proposing option 3 as the baseline, with
>>>> > active planning toward option 4?
>>>> >
>>>> > Option 3 does not just put a proxy endpoint in Polaris.
>>>> > It makes Polaris responsible for the OL ingest path: dataset-name
>>>> > resolution, per-entity authZ over OL assertions, policy for non-Polaris
>>>> > datasets, trusted-service credentials to downstream systems, request-size
>>>> > and payload limits, forwarding failure semantics, audit behavior, and
>>>> > tenant isolation.
>>>> >
>>>> > Option 4 then adds a Polaris-local lineage storage/query subsystem.
>>>> > Even if the first version stores only a reduced projection, Polaris would
>>>> > take on many responsibilities of an OL backend: persistence semantics,
>>>> > query semantics, staleness/pruning, auth-filtered reads, backend
>>>> > compatibility, migrations, limits, and long-term compatibility with OL
>>>> > event shapes.
>>>> > At that point, even if intentionally limited, Polaris effectively operates
>>>> > as an OL backend for the supported subset.
>>>> >
>>>> > So before we treat option 3 plus active planning toward option 4 as the M0
>>>> > baseline, I think it would be good to hear whether others agree that
>>>> > Polaris should take on that implementation and maintenance surface for the
>>>> > first milestone.
>>>> >
>>>> > Or whether we should start with a smaller integration point first.
>>>> >
>>>> > Robert
>>>> >
>>>>
>>>
