andrewm-aero opened a new issue #11778: URL: https://github.com/apache/druid/issues/11778
### Description

Per https://druid.apache.org/docs/latest/ingestion/ingestion-spec.html#inclusions-and-exclusions, all dimensions not otherwise specified are assumed to be strings:

> Schemaless interpretation occurs when both dimensions and spatialDimensions are empty or null. In this case, the set of dimensions is determined in the following way:
>
> 1. ...
> ...
> 7. All other fields are ingested as string typed dimensions with the default settings.

It would be convenient to be able to specify a different type to be used as the default, e.g.

```json
"dimensionsSpec" : {
  "defaultType": "double"
}
```

or perhaps even specify default values for all fields of a dimension object:

```json
"dimensionsSpec" : {
  "defaultConfig": {
    "type": "double",
    "createBitmapIndex": false,
    "multiValueHandling": "array"
  }
}
```

### Motivation

We are currently evaluating Druid as a more scalable, modern replacement for a data warehousing application that stores historical data, where the equivalent of datasources have tens of thousands of dimensions and will soon scale to hundreds of thousands. All of these dimensions are known ahead of time to be numeric, so currently we have to use a script to generate the ingest spec, including the type for every single dimension. If it were possible to specify the default type of dimensions, this could be made much simpler, and it would also reduce the amount of space taken up by the task metadata.

We are also building some custom extensions for data formats that supply these numeric values. If the IngestFormat or InputEntityReader could provide some sort of "schema hint" to the ingest task, that would also solve the issue, and could possibly be used by the JSON, CSV, and other common format extensions as well.
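For illustration, the workaround mentioned under Motivation (a script that writes out an explicit type for every dimension) might look roughly like the sketch below. The column names and the `build_dimensions_spec` helper are hypothetical, and `defaultType` is the option proposed in this issue, not an existing Druid setting; it appears only to compare the size of the two spec fragments.

```python
import json


def build_dimensions_spec(names, dim_type="double"):
    """Return a dimensionsSpec dict listing every dimension with an explicit type."""
    return {"dimensions": [{"type": dim_type, "name": n} for n in names]}


# Hypothetical column names standing in for the real, known-ahead-of-time set.
names = [f"sensor_{i}" for i in range(10_000)]
spec = build_dimensions_spec(names)

# Size of the serialized fragment, which is stored with the task metadata.
explicit_bytes = len(json.dumps(spec))

# The alternative proposed in this issue (assumption: not supported today).
proposed_bytes = len(json.dumps({"defaultType": "double"}))

print(f"explicit: {explicit_bytes} bytes, proposed: {proposed_bytes} bytes")
```

The explicit fragment grows linearly with the number of dimensions, while the proposed default stays constant in size regardless of how many numeric columns the data contains.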
To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
