I will ask Laszlo if he wants to update his doc. I see both pros and cons of defining catalogs in config files. As long as there is an easy default, I am fine with any of the proposed solutions.
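For illustration only, a config-file-based catalog definition (e.g. in hive-site.xml, or as session-level SET commands) could look something like the sketch below. The key names and catalog names here are hypothetical, just to make the discussion concrete - not a proposed interface:

-- hypothetical keys, for illustration only
SET iceberg.catalog.default_hive.type=hive;
SET iceberg.catalog.default_hive.uri=thrift://metastore-host:9083;
SET iceberg.catalog.lake.type=hadoop;
SET iceberg.catalog.lake.warehouse=hdfs://namenode:8020/warehouse/hadoop_catalog;

A table would then only reference a catalog by name, and the catalog-level connection details would live in one place.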
OTOH I am in favor of the "use a prefix for Iceberg table properties" solution, because in Hive it is common to add new keys to the property list - no restriction is in place (I am not even sure that the currently implemented blacklist that prevents properties from propagating to Iceberg tables is complete). Also, I expect that we will want to add Hive-specific write configs at the table level for cases where the general, engine-independent configuration is not ideal for Hive, but every Hive query against a given table should use some specific config.

Thanks, Peter

Jacques Nadeau <jacq...@dremio.com> wrote (on Tue, Dec 1, 2020, 17:06):

> Would someone be willing to create a document that states the current
> proposal?
>
> It is becoming somewhat difficult to follow this thread. I also worry that
> without a complete statement of the current shape, people may incorrectly
> think they are in alignment.
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
>
> On Tue, Dec 1, 2020 at 5:32 AM Zoltán Borók-Nagy <borokna...@cloudera.com>
> wrote:
>
>> Thanks, Ryan. I answered inline.
>>
>> On Mon, Nov 30, 2020 at 8:26 PM Ryan Blue <rb...@netflix.com> wrote:
>>
>>> This sounds like a good plan overall, but I have a couple of notes:
>>>
>>> 1. We need to keep in mind that users plug in their own catalogs, so
>>> iceberg.catalog could be a Glue or Nessie catalog, not just Hive or
>>> Hadoop. I don't think it makes much sense to use separate hadoop.catalog
>>> and hive.catalog values. Those should just be names for catalogs
>>> configured in Configuration, i.e., via hive-site.xml. We then only need
>>> a special value for loading Hadoop tables from paths.
>>>
>> About extensibility: I think the usual Hive way is to use Java class
>> names, so the value of 'iceberg.catalog' could be e.g.
>> 'org.apache.iceberg.hive.HiveCatalog'. Each catalog implementation would
>> then need a factory method that constructs the catalog object from a
>> properties object (Map<String, String>); e.g.
>> 'org.apache.iceberg.hadoop.HadoopCatalog' would require
>> 'iceberg.catalog_location' to be present in the properties.
>>
>>> 2. I don't think that catalog configuration should be kept in table
>>> properties. A catalog should not be loaded for each table. So I don't
>>> think we need iceberg.catalog_location. Instead, we should have a way to
>>> define catalogs in the Configuration for tables in the metastore to
>>> reference.
>>>
>> I think that makes sense; on the other hand, it would make adding new
>> catalogs more heavyweight, i.e. you'd need to edit configuration files
>> and restart/reinit services, which can be cumbersome in some
>> environments.
>>
>>> 3. I'd rather use a prefix to exclude properties from being passed
>>> to Iceberg than to include them. Otherwise, users don't know what to do
>>> to pass table properties from Hive or Impala. If we exclude a prefix or
>>> specific properties, then everything but the properties reserved for
>>> locating the table is passed as the user would expect.
>>>
>> I don't have a strong opinion about this, but yeah, maybe this behavior
>> would cause the fewest surprises.
>>
>>> On Mon, Nov 30, 2020 at 7:51 AM Zoltán Borók-Nagy <borokna...@apache.org>
>>> wrote:
>>>
>>>> Thanks, Peter. I answered inline.
>>>>
>>>> On Mon, Nov 30, 2020 at 3:13 PM Peter Vary <pv...@cloudera.com.invalid>
>>>> wrote:
>>>>
>>>>> Hi Zoltan,
>>>>>
>>>>> Answers below:
>>>>>
>>>>> On Nov 30, 2020, at 14:19, Zoltán Borók-Nagy <
>>>>> borokna...@cloudera.com.INVALID> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Thanks for the replies. My takes on the above questions are as follows:
>>>>>
>>>>> - Should 'iceberg.catalog' be a required property?
>>>>> - Yeah, I think it would be nice if this were required, to
>>>>> avoid any implicit behavior.
>>>>>
>>>>> Currently we have a Catalogs class to get/initialize/use the different
>>>>> Catalogs. At that time the decision was to use HadoopTables as the
>>>>> default catalog. It might be worthwhile to use the same class in Impala
>>>>> as well, so the behavior is consistent.
>>>>
>>>> Yeah, I think it'd be beneficial for us to use the Iceberg classes
>>>> whenever possible. The Catalogs class is very similar to what we
>>>> currently have in Impala.
>>>>
>>>>> - 'hadoop.catalog' LOCATION and catalog_location
>>>>> - In Impala we don't allow setting LOCATION for tables stored
>>>>> in 'hadoop.catalog', but Impala internally sets LOCATION to the
>>>>> Iceberg table's actual location. We also considered using only the
>>>>> table LOCATION and setting it to the catalog location, but we found
>>>>> that confusing as well.
>>>>>
>>>>> It could definitely work, but it is somewhat strange to have an
>>>>> external table location set to an arbitrary path, and a different
>>>>> location generated by other configs. It would be nice to have the real
>>>>> location set in the external table location as well.
>>>>
>>>> Impala sets the real Iceberg table location for external tables. E.g.
>>>> if the user issues
>>>>
>>>> CREATE EXTERNAL TABLE my_hive_db.iceberg_table_hadoop_catalog
>>>> STORED AS ICEBERG
>>>> TBLPROPERTIES('iceberg.catalog'='hadoop.catalog',
>>>> 'iceberg.catalog_location'='/path/to/hadoop/catalog',
>>>> 'iceberg.table_identifier'='namespace1.namespace2.ice_t');
>>>>
>>>> If the user had specified LOCATION, Impala would have raised an error.
>>>> But the above DDL statement is correct, so Impala loads the Iceberg
>>>> table via the Iceberg API, then creates the HMS table and sets LOCATION
>>>> to the Iceberg table location (something like
>>>> /path/to/hadoop/catalog/namespace1/namespace2/ice_t).
>>>>
>>>>> I like the flexibility of setting the table_identifier at the table
>>>>> level, which could help avoid naming conflicts. We might want to have
>>>>> this in the Iceberg Catalog implementation.
>>>>>
>>>>> - 'iceberg.table_identifier' for HiveCatalog
>>>>> - Yeah, it doesn't add much if we only allow using the current
>>>>> HMS. I think it is only useful if we allow external HMSes.
>>>>> - Moving properties to SERDEPROPERTIES
>>>>> - I see that these properties are used by the SerDe classes in
>>>>> Hive, but I feel that these properties are just not about
>>>>> serialization and deserialization. As far as I can see, the current
>>>>> SERDEPROPERTIES are things like 'field.delim', 'separatorChar',
>>>>> 'quoteChar', etc., so properties about table loading belong more
>>>>> naturally in TBLPROPERTIES in my opinion.
>>>>>
>>>>> I have seen it used both ways for HBaseSerDe (even the wiki page uses
>>>>> both :) ). Since Impala prefers TBLPROPERTIES, and if we start using a
>>>>> prefix to separate real Iceberg table properties from other
>>>>> properties, then we can keep them in TBLPROPERTIES.
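>>>>> A rough sketch of what that could look like (the prefix and the key
>>>>> names here are only illustrative, nothing is agreed on yet):
>>>>>
>>>>> ALTER TABLE db.ice_t SET TBLPROPERTIES (
>>>>>   -- hypothetical prefixed key: would be copied to the Iceberg table
>>>>>   -- as 'write.format.default'
>>>>>   'iceberg.tblproperties.write.format.default'='orc',
>>>>>   -- unprefixed key: stays a Hive-only property, never reaches Iceberg
>>>>>   'some.hive.only.key'='some value'
>>>>> );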
>>>>>
>>>>
>>>> In the Google doc I also had a comment about prefixing Iceberg table
>>>> properties. We could use a prefix like 'iceberg.tblproperties.' and
>>>> pass every property with this prefix to the Iceberg table. Currently
>>>> Impala passes every table property to the Iceberg table.
>>>>
>>>>> Thanks,
>>>>> Zoltan
>>>>>
>>>>> On Mon, Nov 30, 2020 at 1:33 PM Peter Vary <pv...@cloudera.com.invalid>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Based on the discussion below I understand we have the following
>>>>>> kinds of properties:
>>>>>>
>>>>>> 1. Iceberg table properties - engine-independent, storage-related
>>>>>> parameters.
>>>>>> 2. "How to get to" properties - I think these are mostly Hive table
>>>>>> specific, since for Spark, the Spark catalog configuration serves the
>>>>>> same purpose. I think the best place for storing these would be the
>>>>>> Hive SERDEPROPERTIES, as this describes the access information for
>>>>>> the SerDe. Sidenote: I think we should decide whether we allow
>>>>>> HiveCatalogs pointing to a different HMS; 'iceberg.table_identifier'
>>>>>> would make sense only if we allow having multiple catalogs.
>>>>>> 3. Query-specific properties - These are engine-specific and might be
>>>>>> mapped to / even override the Iceberg table properties on the
>>>>>> engine-specific code paths, but currently these properties have
>>>>>> independent names and are mapped on a case-by-case basis.
>>>>>>
>>>>>> Based on this:
>>>>>>
>>>>>> - Shall we move the "how to get to" properties to SERDEPROPERTIES?
>>>>>> - Shall we define a prefix for setting Iceberg table properties
>>>>>> from Hive queries, omitting other engine-specific properties?
>>>>>>
>>>>>> Thanks,
>>>>>> Peter
>>>>>>
>>>>>> On Nov 27, 2020, at 17:45, Mass Dosage <massdos...@gmail.com> wrote:
>>>>>>
>>>>>> I like these suggestions; comments inline below on the last round...
>>>>>>
>>>>>> On Thu, 26 Nov 2020 at 09:45, Zoltán Borók-Nagy <
>>>>>> borokna...@apache.org> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> The above aligns with what we did in Impala, i.e. we store
>>>>>>> information about table loading in HMS table properties. We are just
>>>>>>> a bit more explicit about which catalog to use.
>>>>>>> We have the table property 'iceberg.catalog' to determine the
>>>>>>> catalog type; right now the supported values are 'hadoop.tables',
>>>>>>> 'hadoop.catalog', and 'hive.catalog'. Additional table properties
>>>>>>> can be set based on the catalog type.
>>>>>>>
>>>>>>> So, if the value of 'iceberg.catalog' is
>>>>>>
>>>>>> I'm all for renaming this; having "mr" in the property name is
>>>>>> confusing.
>>>>>>
>>>>>>> - hadoop.tables
>>>>>>> - the table location is used to load the table
>>>>>>>
>>>>>> The only question I have is: should we have this as the default? i.e.
>>>>>> if you don't set a catalog, should it assume HadoopTables and use the
>>>>>> location? Or should we require this property, to be consistent and
>>>>>> avoid any "magic"?
>>>>>>
>>>>>>> - hadoop.catalog
>>>>>>> - Required table property 'iceberg.catalog_location'
>>>>>>> specifies the location of the Hadoop catalog in the file system.
>>>>>>> - Optional table property 'iceberg.table_identifier'
>>>>>>> specifies the table id.
>>>>>>> If it's not set, then <database_name>.<table_name> is used as the
>>>>>>> table identifier.
>>>>>>>
>>>>>> I like this as it would allow you to use a different database and
>>>>>> table name in Hive as opposed to the Hadoop catalog - at the moment
>>>>>> they have to match. The only thing here is that I think Hive requires
>>>>>> a table LOCATION to be set, and it's then confusing that there are
>>>>>> now two locations on the table. I'm not sure whether in the Hive
>>>>>> storage handler or SerDe etc. we can get Hive to not require that,
>>>>>> and maybe even disallow it from being set. That would probably be
>>>>>> best in conjunction with this. Another solution would be to not have
>>>>>> the 'iceberg.catalog_location' property but instead use the table
>>>>>> LOCATION for this, but that's a bit confusing from a Hive point of
>>>>>> view.
>>>>>>
>>>>>>> - hive.catalog
>>>>>>> - Optional table property 'iceberg.table_identifier'
>>>>>>> specifies the table id. If it's not set, then
>>>>>>> <database_name>.<table_name> is used as the table identifier.
>>>>>>> - We assume that the current Hive metastore stores the table,
>>>>>>> i.e. we don't support external Hive metastores currently.
>>>>>>>
>>>>>> These sound fine for Hive catalog tables that are created outside of
>>>>>> the automatic Hive table creation (see
>>>>>> https://iceberg.apache.org/hive/ -> Using Hive Catalog); we'd just
>>>>>> need to document how you can create these yourself and that one could
>>>>>> use a different Hive database and table name, etc.
>>>>>>
>>>>>>> Independent of the catalog implementations, we also have the table
>>>>>>> property 'iceberg.file_format' to specify the file format for the
>>>>>>> data files.
>>>>>>
>>>>>> OK, I don't think we need that for Hive?
>>>>>>
>>>>>>> We haven't released it yet, so we are open to changes, but I think
>>>>>>> these properties are reasonable, and it would be great if we could
>>>>>>> standardize the properties across engines that use HMS as the
>>>>>>> primary metastore of tables.
>>>>>>
>>>>>> If others agree, I think we should create an issue where we document
>>>>>> the above changes so it's very clear what we're doing, and we can
>>>>>> then go and implement them and update the docs etc.
>>>>>>
>>>>>>> Cheers,
>>>>>>> Zoltan
>>>>>>>
>>>>>>> On Thu, Nov 26, 2020 at 2:20 AM Ryan Blue <rb...@netflix.com.invalid>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Yes, I think that is a good summary of the principles.
>>>>>>>>
>>>>>>>> #4 is correct because we provide some information that is
>>>>>>>> informational (Hive schema) or tracked only by the metastore
>>>>>>>> (best-effort current user). I also agree that it would be good to
>>>>>>>> have a table identifier in HMS table metadata when loading from an
>>>>>>>> external table. That gives us a way to handle name conflicts.
>>>>>>>>
>>>>>>>> On Wed, Nov 25, 2020 at 5:14 PM Jacques Nadeau <jacq...@dremio.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Minor error, my last example should have been:
>>>>>>>>>
>>>>>>>>> db1.table1_etl_branch =>
>>>>>>>>> nessie.folder1.folder2.folder3.table1@etl_branch
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Jacques Nadeau
>>>>>>>>> CTO and Co-Founder, Dremio
>>>>>>>>>
>>>>>>>>> On Wed, Nov 25, 2020 at 4:56 PM Jacques Nadeau <jacq...@dremio.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I agree with Ryan on the core principles here. As I understand
>>>>>>>>>> them:
>>>>>>>>>>
>>>>>>>>>> 1. Iceberg metadata describes all properties of a table.
>>>>>>>>>> 2. Hive table properties describe "how to get to" Iceberg
>>>>>>>>>> metadata (which catalog + possibly ptr, path, token, etc).
>>>>>>>>>> 3. There could be default "how to get to" information set at
>>>>>>>>>> a global level.
>>>>>>>>>> 4. A best-effort schema should be stored in the table properties
>>>>>>>>>> in HMS. This should be done for information schema retrieval
>>>>>>>>>> purposes within Hive but should be ignored during Hive/other tool
>>>>>>>>>> execution.
>>>>>>>>>>
>>>>>>>>>> Is that a fair summary of your statements, Ryan (except 4, which
>>>>>>>>>> I just added)?
>>>>>>>>>>
>>>>>>>>>> One comment I have on #2: for different catalogs and use cases,
>>>>>>>>>> it can be somewhat more complex. For a table that initially
>>>>>>>>>> existed without Hive and was later exposed in Hive, it would be
>>>>>>>>>> desirable to support a ptr/path/token for how the table is named
>>>>>>>>>> externally. For example, in a Nessie context we support arbitrary
>>>>>>>>>> paths for an Iceberg table (such as folder1.folder2.folder3.table1).
>>>>>>>>>> If you then want to expose that table to Hive, you might have
>>>>>>>>>> this mapping for #2:
>>>>>>>>>>
>>>>>>>>>> db1.table1 => nessie:folder1.folder2.folder3.table1
>>>>>>>>>>
>>>>>>>>>> Similarly, you might want to expose a particular branch version
>>>>>>>>>> of a table. So it might say:
>>>>>>>>>>
>>>>>>>>>> db1.table1_etl_branch => nessie.folder1@etl_branch
>>>>>>>>>>
>>>>>>>>>> Just saying that the address of the table in the catalog could
>>>>>>>>>> itself have several properties. The key is that no matter what
>>>>>>>>>> those are, we should follow #1 and only store properties that are
>>>>>>>>>> about the ptr, not the content/metadata.
>>>>>>>>>>
>>>>>>>>>> Lastly, I believe #4 is the case but haven't tested it. Can
>>>>>>>>>> someone confirm that it is true? And that it is possible/not
>>>>>>>>>> problematic?
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Jacques Nadeau
>>>>>>>>>> CTO and Co-Founder, Dremio
>>>>>>>>>>
>>>>>>>>>> On Wed, Nov 25, 2020 at 4:28 PM Ryan Blue <
>>>>>>>>>> rb...@netflix.com.invalid> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks for working on this, Laszlo. I've been thinking about
>>>>>>>>>>> these problems as well, so this is a good time to have a
>>>>>>>>>>> discussion about Hive config.
>>>>>>>>>>>
>>>>>>>>>>> I think that Hive configuration should work mostly like other
>>>>>>>>>>> engines, where different configurations are used for different
>>>>>>>>>>> purposes. Different purposes mean that there is no global
>>>>>>>>>>> configuration priority. Hopefully, I can explain how we use the
>>>>>>>>>>> different config sources elsewhere to clarify.
>>>>>>>>>>>
>>>>>>>>>>> Let's take Spark as an example. Spark uses Hadoop, so it has a
>>>>>>>>>>> Hadoop Configuration, but it also has its own global
>>>>>>>>>>> configuration. There are also Iceberg table properties, and all
>>>>>>>>>>> of the various Hive properties if you're tracking tables with a
>>>>>>>>>>> Hive MetaStore.
>>>>>>>>>>>
>>>>>>>>>>> The first step is to simplify where we can, so we effectively
>>>>>>>>>>> eliminate two sources of config:
>>>>>>>>>>>
>>>>>>>>>>> - The Hadoop Configuration is only used to instantiate Hadoop
>>>>>>>>>>> classes, like FileSystem. Iceberg should not use it for any
>>>>>>>>>>> other config.
>>>>>>>>>>> - Config in the Hive MetaStore is only used to identify that >>>>>>>>>>> a table is Iceberg and point to its metadata location. All other >>>>>>>>>>> config in >>>>>>>>>>> HMS is informational. For example, the input format is >>>>>>>>>>> FileInputFormat so >>>>>>>>>>> that non-Iceberg readers cannot actually instantiate the format >>>>>>>>>>> (it’s >>>>>>>>>>> abstract) but it is available so they also don’t fail trying to >>>>>>>>>>> load the >>>>>>>>>>> class. Table-specific config should not be stored in table or >>>>>>>>>>> serde >>>>>>>>>>> properties. >>>>>>>>>>> >>>>>>>>>>> That leaves Spark configuration and Iceberg table configuration. >>>>>>>>>>> >>>>>>>>>>> Iceberg differs from other tables because it is opinionated: >>>>>>>>>>> data configuration should be maintained at the table level. This is >>>>>>>>>>> cleaner >>>>>>>>>>> for users because config is standardized across engines and in one >>>>>>>>>>> place. >>>>>>>>>>> And it also enables services that analyze a table and update its >>>>>>>>>>> configuration to tune options that users almost never do, like row >>>>>>>>>>> group or >>>>>>>>>>> stripe size in the columnar formats. Iceberg table configuration is >>>>>>>>>>> used to >>>>>>>>>>> configure table-specific concerns and behavior. >>>>>>>>>>> >>>>>>>>>>> Spark configuration is used for engine-specific concerns, and >>>>>>>>>>> runtime overrides. A good example of an engine-specific concern is >>>>>>>>>>> the >>>>>>>>>>> catalogs that are available to load Iceberg tables. Spark has a way >>>>>>>>>>> to load >>>>>>>>>>> and configure catalog implementations and Iceberg uses that for all >>>>>>>>>>> catalog-level config. Runtime overrides are things like target >>>>>>>>>>> split size. >>>>>>>>>>> Iceberg has a table-level default split size in table properties, >>>>>>>>>>> but this >>>>>>>>>>> can be overridden by a Spark option for each table, as well as an >>>>>>>>>>> option >>>>>>>>>>> passed to the individual read. Note that these necessarily have >>>>>>>>>>> different >>>>>>>>>>> config names for how they are used: Iceberg uses >>>>>>>>>>> read.split.target-size and the read-specific option is >>>>>>>>>>> target-size. >>>>>>>>>>> >>>>>>>>>>> Applying this to Hive is a little strange for a couple reasons. >>>>>>>>>>> First, Hive’s engine configuration *is* a Hadoop Configuration. >>>>>>>>>>> As a result, I think the right place to store engine-specific >>>>>>>>>>> config is >>>>>>>>>>> there, including Iceberg catalogs using a strategy similar to what >>>>>>>>>>> Spark >>>>>>>>>>> does: what external Iceberg catalogs are available and their >>>>>>>>>>> configuration >>>>>>>>>>> should come from the HiveConf. >>>>>>>>>>> >>>>>>>>>>> The second way Hive is strange is that Hive needs to use its own >>>>>>>>>>> MetaStore to track Hive table concerns. The MetaStore may have >>>>>>>>>>> tables >>>>>>>>>>> created by an Iceberg HiveCatalog, and Hive also needs to be able >>>>>>>>>>> to load >>>>>>>>>>> tables from other Iceberg catalogs by creating table entries for >>>>>>>>>>> them. 
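>>>>>>>>>>>
>>>>>>>>>>> For example, a metastore entry pointing at a table in an
>>>>>>>>>>> external catalog could look roughly like this (a sketch only:
>>>>>>>>>>> the catalog name and the property keys follow the proposals in
>>>>>>>>>>> this thread and are not final):
>>>>>>>>>>>
>>>>>>>>>>> CREATE EXTERNAL TABLE db1.table1
>>>>>>>>>>> STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
>>>>>>>>>>> TBLPROPERTIES (
>>>>>>>>>>>   -- 'nessie_prod' is a hypothetical catalog name defined in HiveConf
>>>>>>>>>>>   'iceberg.catalog'='nessie_prod',
>>>>>>>>>>>   'iceberg.table_identifier'='folder1.folder2.folder3.table1');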
>>>>>>>>>>>
>>>>>>>>>>> Here's how I think Hive should work:
>>>>>>>>>>>
>>>>>>>>>>> - There should be a default HiveCatalog that uses the current
>>>>>>>>>>> MetaStore URI and is used for HiveCatalog tables tracked in the
>>>>>>>>>>> MetaStore
>>>>>>>>>>> - Other catalogs should be defined in HiveConf
>>>>>>>>>>> - HMS table properties should be used to determine how to load a
>>>>>>>>>>> table: using a Hadoop location, using the default metastore
>>>>>>>>>>> catalog, or using an external Iceberg catalog
>>>>>>>>>>> - If there is a metadata_location, then use the HiveCatalog
>>>>>>>>>>> for this metastore (where it is tracked)
>>>>>>>>>>> - If there is a catalog property, then load that catalog and
>>>>>>>>>>> use it to load the table, by its name or maybe by an
>>>>>>>>>>> identifier from HMS table properties
>>>>>>>>>>> - If there is no catalog or metadata_location, then use
>>>>>>>>>>> HadoopTables to load the table location as an Iceberg table
>>>>>>>>>>>
>>>>>>>>>>> This would make it possible to access all types of Iceberg
>>>>>>>>>>> tables in the same query, and would match how Spark and Flink
>>>>>>>>>>> configure catalogs. Other than the configuration above, I don't
>>>>>>>>>>> think that config in HMS should be used at all, matching how the
>>>>>>>>>>> other engines work. Iceberg is the source of truth for table
>>>>>>>>>>> metadata, HMS stores how to load the Iceberg table, and HiveConf
>>>>>>>>>>> defines the catalogs (or runtime overrides).
>>>>>>>>>>>
>>>>>>>>>>> This isn't quite how configuration works right now. Currently,
>>>>>>>>>>> the catalog is controlled by a HiveConf property,
>>>>>>>>>>> iceberg.mr.catalog. If that isn't set, HadoopTables will be used
>>>>>>>>>>> to load table locations. If it is set, then that catalog will be
>>>>>>>>>>> used to load all tables by name. This makes it impossible to
>>>>>>>>>>> load tables from different catalogs at the same time. That's why
>>>>>>>>>>> I think the Iceberg catalog for a table should be stored in HMS
>>>>>>>>>>> table properties.
>>>>>>>>>>>
>>>>>>>>>>> I should also explain the iceberg.hive.engine.enabled flag, but
>>>>>>>>>>> I think this is long enough for now.
>>>>>>>>>>>
>>>>>>>>>>> rb
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Nov 25, 2020 at 1:41 AM Laszlo Pinter <
>>>>>>>>>>> lpin...@cloudera.com.invalid> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>
>>>>>>>>>>>> I would like to start a discussion about how we should handle
>>>>>>>>>>>> properties from various sources like Iceberg, Hive, or global
>>>>>>>>>>>> configuration. I've put together a short document
>>>>>>>>>>>> <https://docs.google.com/document/d/1tyD7mGp_hh0dx9N_Ax9kj5INkg7Wzpj9XQ5t2-7AwNs/edit?usp=sharing>;
>>>>>>>>>>>> please have a look and let me know what you think.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Laszlo
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Ryan Blue
>>>>>>>>>>> Software Engineer
>>>>>>>>>>> Netflix
>>>>>>>>
>>>>>>>> --
>>>>>>>> Ryan Blue
>>>>>>>> Software Engineer
>>>>>>>> Netflix
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix