Hi Hop Friends,

I think we could benefit a lot from enriching the metadata we currently
have in Apache Hop.
There are two main areas where I see quite a bit of room for improvement:

a) In the @Action and @Transform annotations
- Is an action or transform an input, output, lookup, ...? This would be
similar to the current categories, but would offer more flexibility
(multiple types per action or transform) and a different focus.
- Does an action or transform work on files, relational databases, graph
databases, ...?
- What are the (un)supported engines for an action or transform?
- In which Apache Hop version was this action or transform first introduced?
For all of these scenarios, a few simple extensions to the existing
@Action and @Transform annotations would go a long way.
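To make a) a bit more concrete, here is a minimal Java sketch of what such
an extended annotation could look like and how a tool could query it
through reflection. TransformSketch, the TransformKind and DataSourceKind
enums, all attribute names, the example class and the engine name are
hypothetical stand-ins invented for this illustration, not the current
Hop API.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;

// Hypothetical classification enums; not part of the current Hop code base.
enum TransformKind { INPUT, OUTPUT, LOOKUP }

enum DataSourceKind { FILE, RELATIONAL_DATABASE, GRAPH_DATABASE }

// Hypothetical stand-in for an extended @Transform annotation.
@Retention(RetentionPolicy.RUNTIME) // keep it readable through reflection
@Target(ElementType.TYPE)
@interface TransformSketch {
  String id();

  TransformKind[] kinds() default {}; // several kinds at once, unlike a single category

  DataSourceKind[] worksOn() default {};

  String[] unsupportedEngines() default {};

  String since() default ""; // first Hop version that shipped the transform
}

// Illustrative plugin class; all attribute values are made up for the example.
@TransformSketch(
    id = "ExampleTableLookup",
    kinds = {TransformKind.INPUT, TransformKind.LOOKUP},
    worksOn = {DataSourceKind.RELATIONAL_DATABASE},
    unsupportedEngines = {"Beam Spark"},
    since = "2.0.0")
class ExampleTableLookup {}

final class TransformMetadataReader {
  private TransformMetadataReader() {}

  // True if the plugin class declares the given kind.
  static boolean hasKind(Class<?> plugin, TransformKind kind) {
    TransformSketch meta = plugin.getAnnotation(TransformSketch.class);
    return meta != null && Arrays.asList(meta.kinds()).contains(kind);
  }

  // True unless the plugin explicitly lists the engine as unsupported.
  static boolean supportsEngine(Class<?> plugin, String engine) {
    TransformSketch meta = plugin.getAnnotation(TransformSketch.class);
    return meta == null || !Arrays.asList(meta.unsupportedEngines()).contains(engine);
  }
}
```

Because all new attributes have defaults, existing plugins would compile
unchanged, which is in line with the no-backward-compatibility-issues goal.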

b) On the @HopMetadataProperty level
- We currently don't have any information about what a given metadata
property is or does. We have around 20 different metadata types that can be
defined in the metadata perspective, but there's no easy way for users or
developers to find out where any of those are used. A metadata property
type could link a property back to a metadata item or any other piece of
metadata.
- Indicating the type or purpose of a metadata property would help us
identify properties of the same type. This would let us start supporting
refactoring, and would enable more detailed search and finer-grained impact
analysis than the search perspective currently offers.
For example, when a user renames a relational database connection, we could
scan the entire project for actions or transforms with a metadata property
type of "database connection" and find all objects that use that connection.
Another use case would be relatively simple refactoring, like updating
references to a pipeline or workflow when its file is moved
(#4018 [1]).
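A minimal Java sketch of the rename scenario in b), assuming
@HopMetadataProperty gained a property type attribute. MetadataPropertySketch,
its propertyType attribute, the "database-connection" type string,
ExampleTableInputMeta and the ImpactAnalysis helper are all hypothetical
names used only for this illustration; a real implementation would walk
the project's pipelines and workflows rather than a plain list of objects.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for @HopMetadataProperty with an extra "propertyType" attribute.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface MetadataPropertySketch {
  String key();

  String propertyType() default ""; // e.g. "database-connection", "pipeline-file"
}

// Illustrative transform metadata class; field names are made up.
class ExampleTableInputMeta {
  @MetadataPropertySketch(key = "connection", propertyType = "database-connection")
  String connection;

  @MetadataPropertySketch(key = "sql")
  String sql;

  ExampleTableInputMeta(String connection, String sql) {
    this.connection = connection;
    this.sql = sql;
  }
}

final class ImpactAnalysis {
  private ImpactAnalysis() {}

  // Returns every object with a property of the given type whose value equals name.
  static List<Object> findUsers(List<?> metas, String propertyType, String name) {
    List<Object> users = new ArrayList<>();
    for (Object meta : metas) {
      if (!matchingFields(meta, propertyType, name).isEmpty()) {
        users.add(meta);
      }
    }
    return users;
  }

  // Renames every matching reference and returns how many fields were updated.
  static int rename(List<?> metas, String propertyType, String oldName, String newName) {
    int updated = 0;
    for (Object meta : metas) {
      for (Field field : matchingFields(meta, propertyType, oldName)) {
        try {
          field.set(meta, newName);
        } catch (IllegalAccessException e) {
          throw new IllegalStateException(e);
        }
        updated++;
      }
    }
    return updated;
  }

  // Collects the annotated fields of the given type that currently hold name.
  private static List<Field> matchingFields(Object meta, String propertyType, String name) {
    List<Field> fields = new ArrayList<>();
    for (Field field : meta.getClass().getDeclaredFields()) {
      MetadataPropertySketch prop = field.getAnnotation(MetadataPropertySketch.class);
      if (prop == null || !prop.propertyType().equals(propertyType)) {
        continue; // not a property of the requested type
      }
      field.setAccessible(true);
      try {
        if (Objects.equals(field.get(meta), name)) {
          fields.add(field);
        }
      } catch (IllegalAccessException e) {
        throw new IllegalStateException(e);
      }
    }
    return fields;
  }
}
```

Note that the "sql" property is never touched, even if its text happened
to contain the connection name: the property type, not string matching,
decides what counts as a reference.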

Even though there are many possible metadata property types for files and
relational databases alone, and no shortage of other kinds of metadata
properties, the list of possible metadata property types isn't endless
either.
We could look into using a naming convention or building a hierarchy of
metadata property types.
Based on this catalog of metadata properties and their types, we could
consider integrations with data catalogs etc. at a later stage, but I think
extending our current metadata will keep us sufficiently busy for a while.
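One possible shape for the hierarchy option, sketched in Java under the
assumption that property types are modelled as an enum with a parent
pointer. PropertyType and every constant name are hypothetical; the point
is only that "find all database connections" could then also match the
more specific relational and graph connection types.

```java
// Hypothetical hierarchy of metadata property types; names are illustrative only.
enum PropertyType {
  ANY(null),
  FILE(ANY),
  PIPELINE_FILE(FILE),
  WORKFLOW_FILE(FILE),
  DATABASE_CONNECTION(ANY),
  RELATIONAL_DATABASE_CONNECTION(DATABASE_CONNECTION),
  GRAPH_DATABASE_CONNECTION(DATABASE_CONNECTION);

  private final PropertyType parent;

  PropertyType(PropertyType parent) {
    this.parent = parent;
  }

  // True if this type equals other or descends from it.
  boolean isA(PropertyType other) {
    for (PropertyType t = this; t != null; t = t.parent) {
      if (t == other) {
        return true;
      }
    }
    return false;
  }
}
```

A naming convention (for example dot-separated names such as
"file.pipeline") could give the same "is a" check through prefix matching,
at the cost of losing compile-time checking of the type names.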

There shouldn't be any backward compatibility issues in either a) or b).

I would love to hear your thoughts. In the meantime, I started exploring
this as part of #3981 [2].

[1] https://github.com/apache/hop/issues/4018
[2] https://github.com/apache/hop/issues/3981

Regards,
Bart
