Re: [GSoC 2026] GSOC-304 design doc — feedback requested

Zh D Sat, 23 May 2026 05:21:58 -0700

Hi all (and thanks again, Xuan),

Following up on Xuan Wang's mentor-side reply from 2026-05-20 and per the
cadence he proposed, I've published v1.1 of the GSOC-304 design doc.


Same Drive link (file replaced in place; the URL is stable):

https://drive.google.com/file/d/1jXMCwF_HVvCR5lHDIT_1pv1DiIZt8j5O/view?usp=sharing

== Changes in v1.1 (vs. the 2026-05-18 v1 posted to this list) ==

Driven by Xuan's reply. Per his proposed timeline, the Q1/Q2/Q3 last-call
objection window ran until 2026-05-22 EOD UTC; no objections surfaced, so
the v1.1 positions become the working baseline going into the 2026-05-25
coding start. Of course, follow-up input on this thread is still welcome.

1. §4.9 Q1 (Attribute schema) — recommendation Option B unchanged, but the
evidence is reordered: alignment with ThingsBoard's own `attribute_kv`
schema (where `attribute_type` and `attribute_key` are already separate
primary-key dimensions) is now the primary, load-bearing argument.
Cardinality math is demoted to a secondary tiebreaker.

2. §4.10 Q4 (TAG-column ordering) — the baseline `(tenant_id, entity_type,
entity_id, attribute_scope, key)` ordering is documented with explicit
"most-selective-first" rationale (tenant predicate hits every read path;
entity_id is the next most selective dimension). The question is explicitly
flagged for IoTDB Table Mode committers on this list — if any of you can
confirm whether the 2.x storage engine's predicate-pushdown layer prefers a
different ordering, that would save Wk 6 EXPLAIN ANALYZE cycles. Otherwise
Wk 6 benchmarks remain the empirical safety net.

3. §6.0 Q6 (AttributesDao activation) — mentor preference order is now
captured as (1) > (3) > (2). Path (2) (Spring profile composition without
an upstream change) is explicitly rejected for the reasons Xuan laid out
(externalizes configuration complexity onto every deployer to avoid a small
upstream patch — not worth the long-term cost). Path (3) is the canonical
fallback if Path (1) is not resolved with ThingsBoard maintainers by end of
W4 (2026-06-21).

4. §5.4 (entity_labels contingent design) — retained per Xuan's guidance as
a v2 sketch in case ThingsBoard upstream ever evolves labels into a
multi-tag feature. Phase 1 still does not mirror labels into IoTDB (Q2
unchanged).
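
As a minimal sketch of what items 1 and 2 describe (not code from the
design doc or the PoC), the attribute identity can be modelled with scope
and key as separate dimensions, declared in the baseline TAG order. The
class and method names (AttributeScope, AttributeIdentity, tag_order) are
hypothetical; only the column names and the three scope constants come from
this thread.

```python
from dataclasses import dataclass, fields
from enum import Enum


class AttributeScope(str, Enum):
    """Scope constants spelled as the literal ThingsBoard enum values."""
    CLIENT_SCOPE = "CLIENT_SCOPE"
    SERVER_SCOPE = "SERVER_SCOPE"
    SHARED_SCOPE = "SHARED_SCOPE"


@dataclass(frozen=True)
class AttributeIdentity:
    """One logical attribute; field order mirrors the baseline TAG order."""
    tenant_id: str
    entity_type: str
    entity_id: str
    attribute_scope: AttributeScope
    key: str

    @classmethod
    def tag_order(cls):
        # Field declaration order doubles as the TAG-column order.
        return tuple(f.name for f in fields(cls))


# Illustrative values only; none of these identifiers come from the doc.
server = AttributeIdentity("tenant-1", "DEVICE", "device-42",
                           AttributeScope.SERVER_SCOPE, "firmware")
shared = AttributeIdentity("tenant-1", "DEVICE", "device-42",
                           AttributeScope.SHARED_SCOPE, "firmware")
```

Because scope is its own dimension, the same key under two scopes yields two
distinct identities, and the stored scope string round-trips to the enum
without a translation layer.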

Q3 (TTL='INF' default) and Q5 (5-backend benchmark scope, with TimescaleDB
as the W11-12 cut candidate rather than upfront drop) are unchanged in
v1.1; Xuan concurred on both.
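
For the Q4 ordering question in item 2, the "most-selective-first" intuition
can be illustrated with a toy model of a sorted composite key. This is only
a sketch under a strong assumption: it treats the TAG columns as a
prefix-sorted index, which may not match how the IoTDB 2.x storage engine
prunes at all (that is exactly the open question). The function names and
sample values are hypothetical.

```python
from bisect import bisect_left
from itertools import product

BASELINE = ("tenant_id", "entity_type", "entity_id", "attribute_scope", "key")
SCOPE_FIRST = ("attribute_scope", "key", "tenant_id", "entity_type", "entity_id")


def build_index(rows, order):
    """Sort the rows as tuples laid out in the given column order."""
    return sorted(tuple(row[col] for col in order) for row in rows)


def keys_scanned(index, order, predicate):
    """Count keys a prefix range scan touches for an equality predicate.

    Only the leading columns of `order` that appear in `predicate` narrow
    the range; the first missing column ends the usable prefix.
    """
    prefix = []
    for col in order:
        if col not in predicate:
            break
        prefix.append(predicate[col])
    prefix = tuple(prefix)
    n = len(prefix)
    lo = bisect_left(index, prefix)
    hi = lo
    while hi < len(index) and index[hi][:n] == prefix:
        hi += 1
    return hi - lo


# 2 tenants x 1 entity type x 3 entities x 3 scopes x 2 keys = 36 identities.
rows = [dict(zip(BASELINE, values)) for values in product(
    ("tenant-1", "tenant-2"), ("DEVICE",), ("d1", "d2", "d3"),
    ("CLIENT_SCOPE", "SERVER_SCOPE", "SHARED_SCOPE"), ("firmware", "model"))]

# A typical per-entity read: tenant, entity type and entity id are known.
predicate = {"tenant_id": "tenant-1", "entity_type": "DEVICE", "entity_id": "d2"}
baseline_cost = keys_scanned(build_index(rows, BASELINE), BASELINE, predicate)
scope_first_cost = keys_scanned(build_index(rows, SCOPE_FIRST), SCOPE_FIRST, predicate)
```

In this toy, the baseline order touches 6 of the 36 keys for a per-entity
read, while the scope-first order touches all 36 because the predicate does
not cover its leading columns. The Wk 6 EXPLAIN ANALYZE runs would replace
this model with measured numbers.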

== Open asks (last call) ==

Per Xuan's proposed cadence:

- 2026-05-22 (Fri) EOD UTC: the community objection window for Q1, Q2, and
Q3 closed with no objections, so the v1.1 positions are now the working
baseline.
- 2026-05-23 (Sat): v1.1 is posted; this email is that posting.
- 2026-05-25 (Mon): coding period begins on the basis of v1.1.
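
The week labels used in this thread (W4, Wk 6, W11-12) can be mapped to
calendar dates with a short sketch. It assumes weeks are numbered from 1 at
the 2026-05-25 coding start and run Monday to Sunday, which is consistent
with "end of W4 (2026-06-21)" above. The helper names week_span and q6_path
are hypothetical.

```python
from datetime import date, timedelta

CODING_START = date(2026, 5, 25)  # Monday; coding start date from this thread


def week_span(n, start=CODING_START):
    """Return (first_day, last_day) of coding week n, counting from 1."""
    first = start + timedelta(weeks=n - 1)
    return first, first + timedelta(days=6)


def q6_path(path1_resolved_on, deadline=week_span(4)[1]):
    """Pick the Q6 path: (1) if resolved by the deadline, else fallback (3)."""
    if path1_resolved_on is not None and path1_resolved_on <= deadline:
        return 1
    return 3


w4_end = week_span(4)[1]          # deadline for the Q6 upstream conversation
w6 = week_span(6)                 # EXPLAIN ANALYZE benchmark week for Q4
w11_12 = (week_span(11)[0], week_span(12)[1])  # TimescaleDB cut window
```

Under these assumptions, an unresolved Path (1) (passed as None) or a
resolution after the W4 deadline both select Path (3).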

Q4 — open call to IoTDB Table Mode committers: any guidance on optimal
TAG-column ordering for 2.x predicate pushdown beyond
"most-selective-first"? If none surfaces, I will fall back to the Wk 6
EXPLAIN ANALYZE benchmarks.

Q6 — open ask to anyone with working relationships with ThingsBoard DAO
reviewers: please reach out off-list. I plan to draft a focused cross-post
to ThingsBoard Discussion #15296 once this dev@ thread settles, but warm
introductions would meaningfully shorten the upstream-patch timeline.

Thanks again — and Xuan, thanks for the very thorough mentor-side pass.

Zihan Dai
GitHub: https://github.com/PDGGK

On Wed, May 20, 2026 02:08 PM, Wang Critas <[email protected]> wrote:

> Hi Zihan,
>
>
> Thanks for getting this out before community bonding wraps — the 16-page
> doc, the cardinality math, and the per-DAO Spring activation matrix in §6.0
> are exactly the level of detail I was hoping for at this stage.
>
> Below is my mentor-side position on the six questions. Please treat these
> as one input alongside whatever the broader community contributes;
> community feedback wins where it disagrees with me.
>
> Q1 — Attribute table schema: Option B (scope-as-TAG)
>
> Concur with Option B. The load-bearing argument for me is alignment with
> ThingsBoard's own attribute_kv schema, where attribute_type (scope) and
> attribute_key are already separate dimensions. Mirroring that on the IoTDB
> side keeps the mental model identical for any ThingsBoard contributor and
> shortens onboarding time for future maintainers. The cardinality math is a
> fine secondary tiebreaker, but the schema-convention argument is what
> should carry it.
>
> On the §4.10 sub-questions:
>
>   *   scope-constant naming: prefer the literal ThingsBoard enum values
> (CLIENT_SCOPE, SERVER_SCOPE, SHARED_SCOPE) rather than shortened forms.
> Round-trip equality with ThingsBoard's existing identifiers removes a
> translation layer and makes log/SQL inspection less ambiguous.
>
>   *   TTL: see Q3.
>
>   *   TAG ordering: see Q4.
>
> Q2 — Label handling: keep out of IoTDB in Phase 1
>
> Concur. HasLabel in current ThingsBoard is a single private String label
> field on Device/Asset — there is no Phase-1 user-facing requirement that
> needs a per-label time-series or a multi-valued tag set on the IoTDB side.
> Leaving it where it already lives (entity DB) is correct and keeps the
> IoTDB side focused on the actual time-series + attributes story.
>
> Keep §5.4 (entity_labels with current-state contract, no tombstones) in
> the doc as a v2 sketch. If ThingsBoard upstream ever evolves labels into a
> multi-tag feature, we don't want to redesign from scratch.
>
> Q3 — TTL='INF' default for entity_attributes
>
> Strongly recommend TTL='INF'. Attributes are latest-state configuration,
> not telemetry. A finite default TTL would silently drop tenant
> configuration after the window expires — that is a correctness bug from the
> operator's perspective, not a tunable trade-off. If a specific deployment
> ever needs finite retention on an attribute namespace, that should be an
> explicit per-deployment override rather than the default.
>
> Q4 — TAG column ordering
>
> I'd start with (tenant_id, entity_type, entity_id, attribute_scope, key)
> as the baseline: tenant_id first matches the multi-tenant predicate that
> hits every read path, and entity_id is the next most selective dimension
> under any realistic load.
>
> That said, I don't want to overcommit on Table Mode internals without
> input from the IoTDB core side. Flagging this one explicitly for IoTDB
> Table Mode committers on this list — does the 2.x storage engine's
> predicate-pushdown layer prefer a different ordering, or is the "most
> selective first" heuristic still the right one here? EXPLAIN ANALYZE
> benchmarks in W6 are a sensible safety net regardless, but starting from an
> informed baseline beats burning W6 cycles on combinatorial ordering
> experiments.
>
> Q5 — Benchmark scope
>
> Keep the full 5 (IoTDB Table / IoTDB Tree / Cassandra / PostgreSQL /
> TimescaleDB):
>
>   *   IoTDB Tree is the migration baseline — required.
>
>   *   Cassandra and PostgreSQL are ThingsBoard's actual production
> backends — required.
>
>   *   TimescaleDB is the time-series PostgreSQL that any external write-up
> will be compared against — also required, but if W11–12 timeline pressure
> forces a cut, this is the first to move into the appendix as "deferred"
> rather than dropped from scope upfront.
>
> Don't trim early. Cut at the end if W11–12 is tight.
>
> Q6 — AttributesDao activation (for ThingsBoard maintainers)
>
> Mentor preference order: (1) > (3) > (2).
>
>   *   Option (1) (new database.attributes.type property mirroring ts.type)
> is the cleanest separation and the one ThingsBoard maintainers are most
> likely to accept because it reuses an existing, well-understood pattern.
> The upstream patch is small and self-contained.
>
>   *   Option (3) (defer AttributesDao to stretch, attributes stay in
> entity DB Phase 1) is a perfectly acceptable fallback if maintainer
> feedback on (1) is slow or skeptical. It keeps Phase 1 focused on the
> value-dense time-series path; AttributesDao can ship as Phase 2 once the
> upstream patch for (1) has landed.
>
>   *   Option (2) (Spring profile composition without upstream change) is
> the one I'd push back on. It works in theory but creates a configuration
> surface no ThingsBoard operator has seen before; it externalises complexity
> onto deployers in exchange for avoiding a 30-line upstream patch. Not worth
> the long-term cost.
>
> On cross-community engagement: agree with your plan to cross-post a
> focused summary to ThingsBoard Discussion #15296 once this dev@ thread
> settles, and to reach the ThingsBoard dev channel on Q6 specifically. I do
> not personally have warm contacts on the ThingsBoard DAO reviewer team.
> Opening that to this list as well: if anyone here has working relationships
> with ThingsBoard DAO reviewers, please reach out off-list.
>
> Decision timeline
>
> Even if community input is light, we should not block coding on perfect
> consensus. Proposed cadence:
>
>   *   By 2026-05-22 (Fri) EOD UTC — any objector on Q1, Q2, Q3 raises it
> on this thread.
>
>   *   2026-05-23 (Sat) — Zihan posts an updated v1.1 design doc with
> resolved decisions.
>
>   *   2026-05-25 (Mon) — coding starts on the basis of v1.1.
>
> Q4 can be revisited via EXPLAIN ANALYZE benchmarks in W6 if the IoTDB
> Table Mode committers do not surface a definitive answer on this thread. Q6
> follows the ThingsBoard-side conversation on its own clock; if it does not
> resolve by end of W4, we default to Option (3) and AttributesDao moves to
> stretch — as already flagged in your Wk 5 scope note.
>
> Thanks again, Zihan — really solid foundation for the coding period.
>
> Best regards,
>
> Xuan Wang
> Apache IoTDB
>
> From: Zh D <[email protected]>
> Date: Monday, May 18, 2026 13:46
> To: [email protected] <[email protected]>
> Subject: [GSoC 2026] GSOC-304 design doc — feedback requested
>
> Hi all,
>
> Following up on my May 8 self-introduction, I've completed the refined
> design document for GSOC-304 (Enhancing ThingsBoard Integration with IoTDB
> 2.x Table Mode). Posting it now, before community-bonding wraps on May 24,
> so feedback can land before coding starts on May 25.
>
> Full document (16 pages, 12 sections + 3 appendices):
>
>
> https://drive.google.com/file/d/1jXMCwF_HVvCR5lHDIT_1pv1DiIZt8j5O/view?usp=sharing
>
> == TL;DR ==
>
> The doc proposes concrete recommendations for the two open design questions
> called out by my mentor (Xuan Wang), with evidence from ThingsBoard
> schema/API/Rule-Engine and IoTDB Table Mode primitives:
>
> 1. Attribute table schema → recommend Option B (single `entity_attributes`
> table with `attribute_scope` as TAG). Backs the PoC's existing choice with
> cardinality math (Options A/B/C all represent ~150K logical attribute
> identities under typical multi-tenant deployment) + ThingsBoard's own SQL
> schema convention (scope and key are separate dimensions in `attribute_kv`)
> + native scope filtering.
>
> 2. Label handling → recommend Phase 1 does not mirror labels into IoTDB.
> Labels in current ThingsBoard are a singular optional entity field on
> Device/Asset (`HasLabel`, `private String label`), not a many-tag feature;
> they fit the existing entity DB. A contingent design (separate
> `entity_labels` table with current-state contract, no tombstones) is
> sketched in §5.4 in case the community wants a label index on the IoTDB
> side.
>
> The doc also includes a per-DAO Spring activation matrix in §6.0, the
> 12-week implementation plan aligned to mentor's phasing, a 5-test-case
> benchmark plan vs. IoTDB Tree / Cassandra / PostgreSQL / TimescaleDB,
> migration approach, and risks.
>
> Note on scope: the Wk 5 plan assumes the AttributesDao activation question
> (Q6 below) resolves to option (1) or (2). If maintainers prefer (3),
> AttributesDao moves to stretch and Wk 5 shifts to telemetry/latest polish.
>
> == Asks for community feedback (full context in §12 of the doc) ==
>
> 1. Do you concur with Option B (scope-as-TAG) for the attribute table? See
> §4.10 for sub-questions on scope-constant naming, TTL, and TAG ordering.
>
> 2. Is the recommendation to keep labels out of IoTDB in Phase 1 acceptable,
> or should the design include `entity_labels` from day one? See §5.5.
>
> 3. Is `TTL='INF'` the right default for `entity_attributes` (latest-state
> metadata), or is there a real use case for a finite default TTL?
>
> 4. Any guidance on optimal TAG column ordering for IoTDB 2.x predicate
> pruning — `(tenant_id, entity_type, entity_id, attribute_scope, key)` or
> scope/key-first — or should I rely on `EXPLAIN ANALYZE` benchmarks during
> Wk 6?
>
> 5. Is the 5-backend benchmark scope (IoTDB Table / IoTDB Tree / Cassandra /
> PostgreSQL / TimescaleDB) the right set, or should TimescaleDB be optional?
>
> 6. (For ThingsBoard maintainers in particular) AttributesDao activation: do
> you prefer (1) a new `database.attributes.type` property mirroring the
> `ts.type` pattern (requires a small upstream patch), (2) Spring profile
> composition with the existing entity-DB switch, or (3) keep attributes in
> the entity DB for Phase 1 and treat AttributesDao as a stretch goal? See
> §6.0.
>
> Communication status on Q6: I have not yet engaged ThingsBoard maintainers
> directly on this question. Once this dev-list thread settles, I plan to
> cross-post a focused summary of Q6 to ThingsBoard Discussion #15296 and
> reach out via the ThingsBoard dev channel for maintainer-side input. If
> anyone here has working contacts with the ThingsBoard DAO reviewer team,
> introductions would be very welcome.
>
> I'd really value input on any of the above before May 25. Specific replies
> on questions 1, 2, and 6 are most time-critical, since they shape Phase 1
> scope and Wk 5 deliverables.
>
> Thanks,
> Zihan Dai
> GitHub: https://github.com/PDGGK
>
