gianm opened a new pull request, #15276:
URL: https://github.com/apache/druid/pull/15276
Main changes:
1) The `SystemField` enum defines system fields `__file_uri`, `__file_path`,
and `__file_bucket`. They are associated with each input entity.
2) The `SystemFieldInputSource` interface can be added to any `InputSource`
to make it system-field-capable. It sets up serialization of a list
of configured `systemFields` in the JSON form of the input source, and
provides a method `getSystemFieldValue` for computing the value of each
system field. The cloud object, HDFS, HTTP, and local input sources now implement it.
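To illustrate the shape of the two changes above, here is a minimal, self-contained Java sketch. It is not the code in this PR: the names `Field`, `FieldSource`, `ObjectStoreSource`, and `systemFieldsFor` are hypothetical stand-ins, and the way the bucket and path are derived from an `s3://` URI is an assumption made for the example. Only the three field names come from the description.

```java
import java.net.URI;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class SystemFieldSketch {
    /** Hypothetical stand-in for the SystemField enum; only the field names are from the PR. */
    enum Field {
        FILE_URI("__file_uri"),
        FILE_PATH("__file_path"),
        FILE_BUCKET("__file_bucket");

        final String fieldName;

        Field(String fieldName) {
            this.fieldName = fieldName;
        }
    }

    /** Hypothetical stand-in for the interface an input source would implement. */
    interface FieldSource {
        Set<Field> configuredFields();

        Object systemFieldValue(URI entity, Field field);
    }

    /** Example source for bucket-style URIs such as s3://bucket/key (assumed layout). */
    static final class ObjectStoreSource implements FieldSource {
        private final Set<Field> fields;

        ObjectStoreSource(Set<Field> fields) {
            this.fields = fields;
        }

        @Override
        public Set<Field> configuredFields() {
            return fields;
        }

        @Override
        public Object systemFieldValue(URI entity, Field field) {
            switch (field) {
                case FILE_URI:
                    return entity.toString();
                case FILE_BUCKET:
                    // Assumption: the URI authority is the bucket name.
                    return entity.getAuthority();
                case FILE_PATH:
                    // Assumption: the path is the object key without the leading slash.
                    String path = entity.getPath();
                    return path.startsWith("/") ? path.substring(1) : path;
                default:
                    return null;
            }
        }
    }

    /** Computes only the configured system fields for one input entity. */
    static Map<String, Object> systemFieldsFor(FieldSource source, URI entity) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (Field f : source.configuredFields()) {
            row.put(f.fieldName, source.systemFieldValue(entity, f));
        }
        return row;
    }

    public static void main(String[] args) {
        FieldSource source =
            new ObjectStoreSource(EnumSet.of(Field.FILE_BUCKET, Field.FILE_PATH));
        System.out.println(systemFieldsFor(source, URI.create("s3://my-bucket/dir/file.json")));
    }
}
```

The point of the sketch is the split of responsibilities: the enum owns the field names, and each input source only has to answer "what is the value of this field for this entity".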
The `SystemFieldInputSource` interface isn't strictly necessary, since each input
source could have implemented system fields internally in its own way. However,
I think the interface is valuable because it helps ensure system fields are
dealt with consistently, and because it provides a path to exposing system
fields in SQL in a nice way. I think that ideally, they would be referenceable
by name, but not participate in star expansion. AFAICT this would require a new
Calcite feature. Relevant Calcite mailing list thread:
https://lists.apache.org/thread/pnf3bx3jlrmv7q1q7jhwhsylrw4q5t20
Until then, system fields can be used in SQL without the planner's
awareness: with `EXTERN`, add `systemFields` to the `inputSource` section, and
add the system field names to the `signature` section.
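As a rough illustration of the `EXTERN` workaround, the Java sketch below assembles a query string in which the system fields appear both in the `inputSource` JSON and in the `signature`. The class and helper names are hypothetical, the `local` input source settings (`/tmp/data`, `*.json`) and the `page` column are made up for the example, and typing every column as `STRING` is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class ExternSystemFieldsSketch {
    // Renders ["a","b"] from plain names (no escaping; names are assumed to be simple).
    static String jsonStringArray(List<String> names) {
        return names.stream()
            .map(n -> "\"" + n + "\"")
            .collect(Collectors.joining(",", "[", "]"));
    }

    // Renders the EXTERN signature; every column is typed STRING here (an assumption).
    static String signature(List<String> columns) {
        return columns.stream()
            .map(c -> "{\"name\":\"" + c + "\",\"type\":\"STRING\"}")
            .collect(Collectors.joining(",", "[", "]"));
    }

    // Builds a query that lists the system fields in both inputSource and signature.
    static String externQuery(List<String> dataColumns, List<String> systemFields) {
        String inputSource =
            "{\"type\":\"local\",\"baseDir\":\"/tmp/data\",\"filter\":\"*.json\","
                + "\"systemFields\":" + jsonStringArray(systemFields) + "}";
        String inputFormat = "{\"type\":\"json\"}";

        List<String> all = new ArrayList<>(dataColumns);
        all.addAll(systemFields);
        String selectList = all.stream()
            .map(c -> "\"" + c + "\"")
            .collect(Collectors.joining(", "));

        return "SELECT " + selectList + "\n"
            + "FROM TABLE(EXTERN(\n"
            + "  '" + inputSource + "',\n"
            + "  '" + inputFormat + "',\n"
            + "  '" + signature(all) + "'\n"
            + "))";
    }

    public static void main(String[] args) {
        System.out.println(
            externQuery(List.of("page"), List.of("__file_uri", "__file_path")));
    }
}
```

Because the planner does not know these are system fields, they behave like ordinary columns here, which is why they must be declared in the signature by hand.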