Hi all (and thanks again, Xuan),

Following up on Xuan Wang's mentor-side reply from 2026-05-20, and per the cadence he proposed, I've published v1.1 of the GSOC-304 design doc.
Same Drive link (the file was replaced in place, so the URL is unchanged):
https://drive.google.com/file/d/1jXMCwF_HVvCR5lHDIT_1pv1DiIZt8j5O/view?usp=sharing

== Changes in v1.1 (vs. the 2026-05-18 v1 posted to this list) ==

These changes are driven by Xuan's reply. Per his proposed timeline, the last-call objection window for Q1/Q2/Q3 closed at 2026-05-22 EOD UTC with no objections, so the v1.1 positions are now the working baseline for the 2026-05-25 coding start. Follow-up input on this thread is, of course, still welcome.

1. §4.9 Q1 (Attribute schema): the recommendation (Option B) is unchanged, but the evidence is reordered. Alignment with ThingsBoard's own `attribute_kv` schema, where `attribute_type` and `attribute_key` are already separate primary-key dimensions, is now the primary, load-bearing argument. The cardinality math is demoted to a secondary tiebreaker.

2. §4.10 Q4 (TAG-column ordering): the baseline ordering `(tenant_id, entity_type, entity_id, attribute_scope, key)` is now documented with an explicit "most-selective-first" rationale. The tenant predicate hits every read path, and entity_id is the next most selective dimension. The question is explicitly flagged for IoTDB Table Mode committers on this list. If any of you can confirm whether the 2.x storage engine's predicate-pushdown layer prefers a different ordering, that would save EXPLAIN ANALYZE cycles in W6. Otherwise, the W6 benchmarks remain the empirical safety net.

3. §6.0 Q6 (AttributesDao activation): the mentor preference order is now recorded as (1) > (3) > (2). Path (2) (Spring profile composition without an upstream change) is explicitly rejected for the reasons Xuan laid out: it pushes configuration complexity onto every deployer to avoid a small upstream patch, which is not worth the long-term cost. Path (3) (attributes stay in the entity DB in Phase 1; AttributesDao moves to stretch) is the canonical fallback if Path (1) is not resolved with ThingsBoard maintainers by the end of W4 (2026-06-21).

4. §5.4 (entity_labels contingent design): retained, per Xuan's guidance, as a v2 sketch in case ThingsBoard upstream ever evolves labels into a multi-tag feature. Phase 1 still does not mirror labels into IoTDB (Q2 unchanged).

Q3 (TTL='INF' default) and Q5 (5-backend benchmark scope, with TimescaleDB as the W11-12 cut candidate rather than an upfront drop) are unchanged in v1.1; Xuan concurred on both.

== Timeline and open asks ==

Per Xuan's proposed cadence:

- 2026-05-22 (Fri) EOD UTC: the community objection window on Q1, Q2, Q3 closed with no objections, so the v1.1 positions are the working baseline.
- 2026-05-23 (Sat): v1.1 posted (this email).
- 2026-05-25 (Mon): the coding period begins on the basis of v1.1.

Q4: open call to IoTDB Table Mode committers. Is there any guidance on optimal TAG-column ordering for 2.x predicate pushdown beyond "most-selective-first"? If not, I will fall back to EXPLAIN ANALYZE benchmarks in W6.

Q6: open ask to anyone with working relationships with ThingsBoard DAO reviewers: please reach out off-list. I plan to draft a focused cross-post to ThingsBoard Discussion #15296 once this dev@ thread settles, but warm introductions would meaningfully shorten the upstream-patch timeline.

Thanks again, and Xuan, thanks for the very thorough mentor-side pass.

Zihan Dai
GitHub: https://github.com/PDGGK

On Wed, May 20, 2026 02:08 PM, Wang Critas <[email protected]> wrote:
> Hi Zihan,
>
> Thanks for getting this out before community bonding wraps — the 16-page
> doc, the cardinality math, and the per-DAO Spring activation matrix in §6.0
> are exactly the level of detail I was hoping for at this stage.
>
> Below is my mentor-side position on the six questions. Please treat these
> as one input alongside whatever the broader community contributes;
> community feedback wins where it disagrees with me.
>
> Q1 — Attribute table schema: Option B (scope-as-TAG)
>
> Concur with Option B.
> The load-bearing argument for me is alignment with
> ThingsBoard's own attribute_kv schema, where attribute_type (scope) and
> attribute_key are already separate dimensions. Mirroring that on the IoTDB
> side keeps the mental model identical for any ThingsBoard contributor and
> shortens onboarding time for future maintainers. The cardinality math is a
> fine secondary tiebreaker, but the schema-convention argument is what
> should carry it.
>
> On the §4.10 sub-questions:
>
> * scope-constant naming: prefer the literal ThingsBoard enum values
> (CLIENT_SCOPE, SERVER_SCOPE, SHARED_SCOPE) rather than shortened forms.
> Round-trip equality with ThingsBoard's existing identifiers removes a
> translation layer and makes log/SQL inspection less ambiguous.
>
> * TTL: see Q3.
>
> * TAG ordering: see Q4.
>
> Q2 — Label handling: keep out of IoTDB in Phase 1
>
> Concur. HasLabel in current ThingsBoard is a single private String label
> field on Device/Asset — there is no Phase-1 user-facing requirement that
> needs a per-label time-series or a multi-valued tag set on the IoTDB side.
> Leaving it where it already lives (entity DB) is correct and keeps the
> IoTDB side focused on the actual time-series + attributes story.
>
> Keep §5.4 (entity_labels with current-state contract, no tombstones) in
> the doc as a v2 sketch. If ThingsBoard upstream ever evolves labels into a
> multi-tag feature, we don't want to redesign from scratch.
>
> Q3 — TTL='INF' default for entity_attributes
>
> Strongly recommend TTL='INF'. Attributes are latest-state configuration,
> not telemetry. A finite default TTL would silently drop tenant
> configuration after the window expires — that is a correctness bug from the
> operator's perspective, not a tunable trade-off. If a specific deployment
> ever needs finite retention on an attribute namespace, that should be an
> explicit per-deployment override rather than the default.
>
> Q4 — TAG column ordering
>
> I'd start with (tenant_id, entity_type, entity_id, attribute_scope, key)
> as the baseline: tenant_id first matches the multi-tenant predicate that
> hits every read path, and entity_id is the next most selective dimension
> under any realistic load.
>
> That said, I don't want to overcommit on Table Mode internals without
> input from the IoTDB core side. Flagging this one explicitly for IoTDB
> Table Mode committers on this list — does the 2.x storage engine's
> predicate-pushdown layer prefer a different ordering, or is the "most
> selective first" heuristic still the right one here? EXPLAIN ANALYZE
> benchmarks in W6 are a sensible safety net regardless, but starting from an
> informed baseline beats burning W6 cycles on combinatorial ordering
> experiments.
>
> Q5 — Benchmark scope
>
> Keep the full 5 (IoTDB Table / IoTDB Tree / Cassandra / PostgreSQL /
> TimescaleDB):
>
> * IoTDB Tree is the migration baseline — required.
>
> * Cassandra and PostgreSQL are ThingsBoard's actual production
> backends — required.
>
> * TimescaleDB is the time-series PostgreSQL that any external write-up
> will be compared against — also required, but if W11–12 timeline pressure
> forces a cut, this is the first to move into the appendix as "deferred"
> rather than dropped from scope upfront.
>
> Don't trim early. Cut at the end if W11–12 is tight.
>
> Q6 — AttributesDao activation (for ThingsBoard maintainers)
>
> Mentor preference order: (1) > (3) > (2).
>
> * Option (1) (new database.attributes.type property mirroring ts.type)
> is the cleanest separation and the one ThingsBoard maintainers are most
> likely to accept because it reuses an existing, well-understood pattern.
> The upstream patch is small and self-contained.
>
> * Option (3) (defer AttributesDao to stretch, attributes stay in
> entity DB Phase 1) is a perfectly acceptable fallback if maintainer
> feedback on (1) is slow or skeptical.
> Keeps Phase 1 focused on the
> value-dense time-series path; AttributesDao can ship as Phase 2 once (1) has
> landed upstream.
>
> * Option (2) (Spring profile composition without upstream change) is
> the one I'd push back on. It works in theory but creates a configuration
> surface no ThingsBoard operator has seen before; it externalises complexity
> onto deployers in exchange for avoiding a 30-line upstream patch. Not worth
> the long-term cost.
>
> On cross-community engagement: agree with your plan to cross-post a
> focused summary to ThingsBoard Discussion #15296 once this dev@ thread
> settles, and to reach the ThingsBoard dev channel on Q6 specifically. I do
> not personally have warm contacts on the ThingsBoard DAO reviewer team.
> Opening that to this list as well: if anyone here has working relationships
> with ThingsBoard DAO reviewers, please reach out off-list.
>
> Decision timeline
>
> Even if community input is light, we should not block coding on perfect
> consensus. Proposed cadence:
>
> * By 2026-05-22 (Fri) EOD UTC — any objector on Q1, Q2, Q3 raises it
> on this thread.
>
> * 2026-05-23 (Sat) — Zihan posts an updated v1.1 design doc with
> resolved decisions.
>
> * 2026-05-25 (Mon) — coding starts on the basis of v1.1.
>
> Q4 can be revisited via EXPLAIN ANALYZE benchmarks in W6 if the IoTDB
> Table Mode committers do not surface a definitive answer on this thread. Q6
> follows the ThingsBoard-side conversation on its own clock; if it does not
> resolve by end of W4, we default to Option (3) and AttributesDao moves to
> stretch — as already flagged in your Wk 5 scope note.
>
> Thanks again, Zihan — really solid foundation for the coding period.
>
> Best regards,
>
> Xuan Wang
> Apache IoTDB
>
> From: Zh D <[email protected]>
> Date: Monday, May 18, 2026 13:46
> To: [email protected] <[email protected]>
> Subject: [GSoC 2026] GSOC-304 design doc — feedback requested
>
> Hi all,
>
> Following up on my May 8 self-introduction, I've completed the refined
> design document for GSOC-304 (Enhancing ThingsBoard Integration with IoTDB
> 2.x Table Mode). Posting it now, before community-bonding wraps on May 24,
> so feedback can land before coding starts on May 25.
>
> Full document (16 pages, 12 sections + 3 appendices):
>
> https://drive.google.com/file/d/1jXMCwF_HVvCR5lHDIT_1pv1DiIZt8j5O/view?usp=sharing
>
> == TL;DR ==
>
> The doc proposes concrete recommendations for the two open design questions
> called out by my mentor (Xuan Wang), with evidence from ThingsBoard
> schema/API/Rule-Engine and IoTDB Table Mode primitives:
>
> 1. Attribute table schema → recommend Option B (single `entity_attributes`
> table with `attribute_scope` as TAG). Backs the PoC's existing choice with
> cardinality math (Options A/B/C all represent ~150K logical attribute
> identities under typical multi-tenant deployment) + ThingsBoard's own SQL
> schema convention (scope and key are separate dimensions in `attribute_kv`)
> + native scope filtering.
>
> 2. Label handling → recommend Phase 1 does not mirror labels into IoTDB.
> Labels in current ThingsBoard are a singular optional entity field on
> Device/Asset (`HasLabel`, `private String label`), not a many-tag feature;
> they fit the existing entity DB. A contingent design (separate
> `entity_labels` table with current-state contract, no tombstones) is
> sketched in §5.4 in case the community wants a label index on the IoTDB
> side.
>
> The doc also includes a per-DAO Spring activation matrix in §6.0, the
> 12-week implementation plan aligned to mentor's phasing, a 5-test-case
> benchmark plan vs. IoTDB Tree / Cassandra / PostgreSQL / TimescaleDB,
> migration approach, and risks.
>
> Note on scope: the Wk 5 plan assumes the AttributesDao activation question
> (Q6 below) resolves to option (1) or (2). If maintainers prefer (3),
> AttributesDao moves to stretch and Wk 5 shifts to telemetry/latest polish.
>
> == Asks for community feedback (full context in §12 of the doc) ==
>
> 1. Do you concur with Option B (scope-as-TAG) for the attribute table? See
> §4.10 for sub-questions on scope-constant naming, TTL, and TAG ordering.
>
> 2. Is the recommendation to keep labels out of IoTDB in Phase 1 acceptable,
> or should the design include `entity_labels` from day one? See §5.5.
>
> 3. Is `TTL='INF'` the right default for `entity_attributes` (latest-state
> metadata), or is there a real use case for a finite default TTL?
>
> 4. Any guidance on optimal TAG column ordering for IoTDB 2.x predicate
> pruning — `(tenant_id, entity_type, entity_id, attribute_scope, key)` or
> scope/key-first — or should I rely on `EXPLAIN ANALYZE` benchmarks during
> Wk 6?
>
> 5. Is the 5-backend benchmark scope (IoTDB Table / IoTDB Tree / Cassandra /
> PostgreSQL / TimescaleDB) the right set, or should TimescaleDB be optional?
>
> 6. (For ThingsBoard maintainers in particular) AttributesDao activation: do
> you prefer (1) a new `database.attributes.type` property mirroring the
> `ts.type` pattern (requires a small upstream patch), (2) Spring profile
> composition with the existing entity-DB switch, or (3) keep attributes in
> the entity DB for Phase 1 and treat AttributesDao as a stretch goal? See
> §6.0.
>
> Communication status on Q6: I have not yet engaged ThingsBoard maintainers
> directly on this question. Once this dev-list thread settles, I plan to
> cross-post a focused summary of Q6 to ThingsBoard Discussion #15296 and
> reach out via the ThingsBoard dev channel for maintainer-side input. If
> anyone here has working contacts with the ThingsBoard DAO reviewer team,
> introductions would be very welcome.
>
> I'd really value input on any of the above before May 25. Specific replies
> on questions 1, 2, and 6 are most time-critical, since they shape Phase 1
> scope and Wk 5 deliverables.
>
> Thanks,
> Zihan Dai
> GitHub:
> https://github.com/PDGGK
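
For readers who want the v1.1 baseline for Q1, Q3 and Q4 in one place, here is a minimal sketch that renders it as IoTDB 2.x Table Mode DDL. It assumes the table-model `CREATE TABLE` syntax with `TAG`/`FIELD` column categories and a `WITH (TTL=...)` table property. The TAG tuple, the `TTL='INF'` default and the literal scope constants come from this thread. The FIELD column names and types are hypothetical placeholders, loosely modeled on the value columns of ThingsBoard's `attribute_kv`. I have not checked whether the column name `key` needs quoting in IoTDB's SQL dialect.

```python
# Hypothetical sketch: render the v1.1 entity_attributes baseline as IoTDB 2.x
# Table Mode DDL. FIELD column names/types are placeholders, not from the doc.

# Baseline TAG ordering from the thread ("most-selective-first", tenant first).
TAG_COLUMNS = ("tenant_id", "entity_type", "entity_id", "attribute_scope", "key")

# Literal ThingsBoard scope enum values, so stored tags round-trip unchanged.
SCOPES = ("CLIENT_SCOPE", "SERVER_SCOPE", "SHARED_SCOPE")

# Placeholder value columns (assumption), one per ThingsBoard value kind.
FIELD_COLUMNS = (
    ("bool_v", "BOOLEAN"),
    ("str_v", "STRING"),
    ("long_v", "INT64"),
    ("dbl_v", "DOUBLE"),
    ("json_v", "STRING"),
)


def build_entity_attributes_ddl(tags=TAG_COLUMNS, ttl="INF"):
    """Return a CREATE TABLE statement with TAG columns in the given order."""
    columns = [f"{name} STRING TAG" for name in tags]
    columns += [f"{name} {dtype} FIELD" for name, dtype in FIELD_COLUMNS]
    body = ",\n  ".join(columns)
    # 'INF' is quoted; a finite TTL is rendered as a bare number
    # (milliseconds, assumed), i.e. an explicit per-deployment override.
    ttl_sql = "'INF'" if ttl == "INF" else str(int(ttl))
    return f"CREATE TABLE entity_attributes (\n  {body}\n) WITH (TTL={ttl_sql})"


def scope_predicate(scope):
    """Return a WHERE fragment for one scope; only literal enum values pass."""
    if scope not in SCOPES:
        raise ValueError(f"unknown attribute scope: {scope!r}")
    return f"attribute_scope = '{scope}'"


if __name__ == "__main__":
    print(build_entity_attributes_ddl())
    print(scope_predicate("SHARED_SCOPE"))
```

Keeping the tag order in a single tuple also makes the W6 EXPLAIN ANALYZE experiments cheap. An alternative such as scope/key-first is just a different `tags` argument.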
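
Option (1) in Q6 can be illustrated with a small, framework-free sketch. It shows a dedicated `database.attributes.type` property choosing the attributes backend independently of the time-series `ts.type` setting. In ThingsBoard itself, this selection would be expressed through Spring conditional bean activation rather than a lookup table. The property is the one proposed in Q6 and does not exist upstream yet. The backend keys, the `database.ts.type` key spelling and the class names below are hypothetical.

```python
# Hypothetical sketch of Q6 Option (1): a dedicated database.attributes.type
# property selects the AttributesDao backend, mirroring how the time-series
# DAO is keyed off ts.type. Keys and class names are placeholders.

ATTRIBUTES_BACKENDS = {
    "sql": "SqlAttributesDao",           # attributes stay in the entity DB
    "iotdb": "IoTDBTableAttributesDao",  # entity_attributes table in IoTDB
}


def select_attributes_dao(props):
    """Resolve the attributes backend from flat config properties.

    The time-series property is deliberately ignored, so telemetry can move
    to IoTDB while attributes stay put. An unset property falls back to
    'sql', which matches the Option (3) behaviour for Phase 1.
    """
    backend = props.get("database.attributes.type", "sql")
    if backend not in ATTRIBUTES_BACKENDS:
        raise ValueError(f"unsupported database.attributes.type: {backend!r}")
    return ATTRIBUTES_BACKENDS[backend]


if __name__ == "__main__":
    # Telemetry on IoTDB (assumed key), attributes untouched.
    print(select_attributes_dao({"database.ts.type": "iotdb"}))
    print(select_attributes_dao({"database.attributes.type": "iotdb"}))
```

The point of keeping the two keys independent is that Phase 1 can ship telemetry on IoTDB while attributes stay in the entity DB. It also reduces the Q6 fallback to leaving one property unset.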
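
The thread uses coding-week labels (W4, W6, W11-12) and calendar dates interchangeably. Here is a minimal sketch of the mapping. It assumes W1 starts on the coding-start date, Mon 2026-05-25, and that each week runs Monday to Sunday. Under that assumption, the end of W4 falls on 2026-06-21, which matches the Q6 fallback deadline given in the v1.1 summary.

```python
from datetime import date, timedelta

# Assumption: W1 begins on the coding-start date and weeks run Mon-Sun.
CODING_START = date(2026, 5, 25)


def week_bounds(n, start=CODING_START):
    """Return (first_day, last_day) of 1-based coding week n."""
    if n < 1:
        raise ValueError("coding weeks are numbered from 1")
    first = start + timedelta(weeks=n - 1)
    return first, first + timedelta(days=6)


if __name__ == "__main__":
    for label, week in (("W4", 4), ("W6", 6), ("W11", 11), ("W12", 12)):
        first, last = week_bounds(week)
        print(f"{label}: {first.isoformat()} to {last.isoformat()}")
```

Run as a script, it prints W4 as 2026-06-15 to 2026-06-21 and W6 as 2026-06-29 to 2026-07-05. That puts the W11-12 benchmark window at 2026-08-03 to 2026-08-16.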
