Hi Anand,

Thanks for the review! You're right that making metric views the single
source of truth improves the user experience and also enables better
governance and LLM integration. However, I think handling that integration
is out of scope for Spark and should be taken care of by the vendors who
provide this infrastructure.

Thanks,
Wenchen

On Thu, Nov 6, 2025 at 9:57 AM Anand Chinnakannan <[email protected]>
wrote:

> Hi Team,
>
> I’ve reviewed and updated our Metric View proposal (Q1–Q8) to include two
> key enhancements:
>
> 🔐 Governance Integration: Metric Views are now treated as governed,
> first-class catalog objects with access control, lineage tracking, and
> versioning — ensuring metrics remain secure, auditable, and consistently
> defined.
>
> 🤖 LLM Agent Integration: Added guidance on how LLMs and AI agents can
> discover and query metric views through catalog metadata for consistent,
> governed responses to natural-language queries.
>
>
> These updates align with our goal of making Metric Views the single source
> of truth for analytical and AI-driven use cases.
>
> I’d love your input on these sections — especially around:
>
> 1. Any additional governance scenarios we should consider.
>
> 2. LLM integration edge cases or optimization ideas.
>
> 3. Suggestions for examples, syntax, or long-term roadmap points.
>
> Please feel free to add comments, edits, or examples directly in the
> document, or share your thoughts in reply.
> Your contributions will help us finalize a stronger, more complete
> proposal for review.
>
> Thank you for your time and collaboration — looking forward to your
> insights!
>
> Best regards,
> Anand Chinnakannan
> Staff Data Scientist | Walmart
> Executive MBA Candidate, Quantic School of Business & Technology
> 📧 [email protected]
>
> On Thu, Nov 6, 2025, 10:27 AM Wenchen Fan <[email protected]> wrote:
>
>> Thanks for the proposal! I believe this is a very useful feature, as the
>> existing alternatives do not work well: people need to either define many
>> similar views with different grouping columns and aggregate functions, or
>> manually maintain a doc page describing the semantics of these metrics,
>> which everyone must then follow when writing queries to compute them.
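>>
>> To make the pain point concrete, here is a minimal PySpark sketch of the
>> status quo (the table, columns, and view names are hypothetical, purely
>> for illustration): the same "revenue" metric gets re-implemented in every
>> view that differs only by its grouping columns.
>>
>>     from pyspark.sql import SparkSession
>>
>>     spark = SparkSession.builder.getOrCreate()
>>
>>     # Hypothetical fact table, only to make the example self-contained.
>>     sales = spark.createDataFrame(
>>         [("EMEA", "2025-10", 10.0, 3), ("AMER", "2025-11", 20.0, 1)],
>>         ["region", "month", "price", "quantity"],
>>     )
>>     sales.createOrReplaceTempView("sales")
>>
>>     # Status quo: one near-identical view per breakdown, each repeating
>>     # the metric definition SUM(price * quantity).
>>     spark.sql("""
>>         CREATE OR REPLACE TEMP VIEW revenue_by_region AS
>>         SELECT region, SUM(price * quantity) AS revenue
>>         FROM sales GROUP BY region
>>     """)
>>     spark.sql("""
>>         CREATE OR REPLACE TEMP VIEW revenue_by_month AS
>>         SELECT month, SUM(price * quantity) AS revenue
>>         FROM sales GROUP BY month
>>     """)
>>     # ...and so on for every new grouping, with no guarantee the metric
>>     # formula stays consistent across views.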
>>
>> Shall we start the vote next week if there are no objections?
>>
>> On Fri, Oct 31, 2025 at 2:30 PM Linhong Liu
>> <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> I would like to propose introducing metrics and semantic modeling in
>>> Spark.
>>>
>>> This feature enables defining business metrics once and reusing them
>>> across any breakdown, ensuring consistent outcomes. It also bridges the
>>> semantic gap between business logic and data schemas, helping LLMs
>>> generate more precise results.
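>>>
>>> For intuition, here is a minimal DataFrame-level sketch of the "define
>>> once, reuse across any breakdown" idea (this is only an analogy with
>>> today's PySpark API, not the proposed metric view syntax, and the table
>>> and column names are made up; please see the SPIP doc for the actual
>>> design):
>>>
>>>     from pyspark.sql import SparkSession
>>>     from pyspark.sql import functions as F
>>>
>>>     spark = SparkSession.builder.getOrCreate()
>>>
>>>     # Hypothetical fact table, only to keep the example self-contained.
>>>     sales = spark.createDataFrame(
>>>         [("EMEA", "2025-10", 10.0, 3), ("AMER", "2025-11", 20.0, 1)],
>>>         ["region", "month", "price", "quantity"],
>>>     )
>>>
>>>     # The metric is defined exactly once as a reusable expression...
>>>     revenue = F.sum(F.col("price") * F.col("quantity")).alias("revenue")
>>>
>>>     # ...and reused across arbitrary breakdowns, so every consumer
>>>     # computes it with the same formula.
>>>     sales.groupBy("region").agg(revenue).show()
>>>     sales.groupBy("month").agg(revenue).show()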
>>>
>>> Looking forward to your feedback!
>>>
>>> JIRA: SPARK-54119 <https://issues.apache.org/jira/browse/SPARK-54119>
>>> SPIP docs:
>>> https://docs.google.com/document/d/1xVTLijvDTJ90lZ_ujwzf9HvBJgWg0mY6cYM44Fcghl0/edit?tab=t.0#heading=h.4iogryr5qznc
>>>
>>> Thanks,
>>> Linhong
>>>
>>
