Re: [DISCUSS] SPIP: The metrics & semantic modeling in Spark

Anand Chinnakannan Thu, 06 Nov 2025 09:57:40 -0800

Hi Team,

I’ve reviewed and updated our Metric View proposal (Q1–Q8) to include two
key enhancements:

🔐 Governance Integration: Metric Views are now treated as governed,
first-class catalog objects with access control, lineage tracking, and
versioning — ensuring metrics remain secure, auditable, and consistently
defined.

🤖 LLM Agent Integration: Added guidance on how LLMs and AI agents can
discover and query metric views through catalog metadata for consistent,
governed responses to natural-language queries.

These updates align with our goal of making Metric Views the single source
of truth for analytical and AI-driven use cases.

I’d love your input on these sections — especially around:

1. Any additional governance scenarios we should consider.

2. LLM integration edge cases or optimization ideas.

3. Suggestions for examples, syntax, or long-term roadmap points.

Please feel free to add comments, edits, or examples directly in the
document, or share your thoughts in reply.
Your contributions will help us finalize a stronger, more complete proposal
for review.

Thank you for your time and collaboration — looking forward to your
insights!

Best regards,
Anand Chinnakannan
Staff Data Scientist | Walmart
Executive MBA Candidate, Quantic School of Business & Technology
📧 [email protected]

On Thu, Nov 6, 2025, 10:27 AM Wenchen Fan <[email protected]> wrote:

> Thanks for the proposal! I believe this is a very useful feature, as the
> alternatives do not work well: people need to either define many similar
> views with different grouping columns and aggregate functions, or manually
> maintain a doc page describing the semantics of these metrics, which
> people must follow when writing queries to calculate them.
>
> Shall we start the vote next week if there are no objections?
>
> On Fri, Oct 31, 2025 at 2:30 PM Linhong Liu
> <[email protected]> wrote:
>
>> Hi all,
>>
>> I would like to propose introducing "The metrics & semantic modeling in
>> Spark".
>>
>> This feature enables defining business metrics once and reusing them
>> across any breakdown, ensuring consistent outcomes and bridging the
>> semantic gap between business logic and data schemas to help LLMs generate
>> more precise results.
>>
>> Looking forward to your feedback!
>>
>> JIRA: SPARK-54119 <https://issues.apache.org/jira/browse/SPARK-54119>
>> SPIP docs:
>> https://docs.google.com/document/d/1xVTLijvDTJ90lZ_ujwzf9HvBJgWg0mY6cYM44Fcghl0/edit?tab=t.0#heading=h.4iogryr5qznc
>>
>> Thanks,
>> Linhong
>>
>
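
To make the "define once, reuse across any breakdown" idea from the quoted
messages concrete, here is a minimal sketch in plain Python. It does not use
Spark and does not reflect the syntax proposed in the SPIP; the names
Metric, evaluate, and avg_order_value, and the sample rows, are all
hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]


@dataclass(frozen=True)
class Metric:
    """Hypothetical metric definition: a name plus an aggregation over raw rows."""
    name: str
    aggregate: Callable[[List[Row]], float]


def evaluate(metric: Metric, rows: List[Row], group_by: List[str]) -> Dict[Tuple, float]:
    """Apply one metric definition to any breakdown (list of grouping columns)."""
    groups: Dict[Tuple, List[Row]] = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in group_by)
        groups[key].append(row)
    # The aggregation always runs on the raw rows of each group, so a ratio
    # metric stays correct regardless of which columns are used for grouping.
    return {key: metric.aggregate(members) for key, members in groups.items()}


# Defined once. A ratio like this cannot be rebuilt by averaging the output
# of a view that was already grouped by some other column.
avg_order_value = Metric(
    name="avg_order_value",
    aggregate=lambda rs: sum(r["revenue"] for r in rs) / len(rs),
)

# Illustrative sample data, not taken from the proposal.
orders = [
    {"region": "US", "channel": "web", "revenue": 100.0},
    {"region": "US", "channel": "store", "revenue": 50.0},
    {"region": "EU", "channel": "web", "revenue": 30.0},
]

by_region = evaluate(avg_order_value, orders, ["region"])
by_channel = evaluate(avg_order_value, orders, ["channel"])
overall = evaluate(avg_order_value, orders, [])
```

Without a shared definition, each of the three breakdowns above would need
its own view or hand-written query, which is the duplication described in
the quoted reply.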
