Aditya Pratap Singh created GOBBLIN-2092:
--------------------------------------------
Summary: `carbon get flow-configs` search facets not consistently working
Key: GOBBLIN-2092
URL: https://issues.apache.org/jira/browse/GOBBLIN-2092
Project: Apache Gobblin
Issue Type: Bug
Reporter: Aditya Pratap Singh
The `carbon get flow-configs` (search) command seems to apply search facets
inconsistently. It's possible that the facet indices are set up correctly during
flow creation but are not properly maintained during flow update. See this
interaction:
{code:java}
$ carbon get flow-configs -f prod-lva1 -s war-oh-iceberg | jq -c . | wc -l
Searching [fabric: prod-lva1] for flow matching - (flow_group: None; flow_name:
None; template_uri: None; proxy_user: None; source_identifier: war-oh-iceberg;
destination_identifier: None; cron_schedule: None; run_immediately: None;
owning_group: None; start: None; count: None)
63
$ carbon get flow-configs -f prod-lva1 -s war-tl-iceberg | jq -c . | wc -l
Searching [fabric: prod-lva1] for flow matching - (flow_group: None; flow_name:
None; template_uri: None; proxy_user: None; source_identifier: war-tl-iceberg;
destination_identifier: None; cron_schedule: None; run_immediately: None;
owning_group: None; start: None; count: None)
No flows found
0
{code}
So (at least some) results for a sourceIdentifier of `war-oh-iceberg` do show,
but none do for `war-tl-iceberg`. That's incorrect: when I instead search by
user, there are at least two `war-tl-iceberg` flows in `prod-lva1`:
{code:java}
$ carbon get flow-configs -f prod-lva1 -u lyndarel |
    jq -c '{flowGroup: .id.flowGroup, flowName: .id.flowName,
      user: .properties."user.to.proxy",
      between: (.properties."gobblin.flow.sourceIdentifier" + " => " +
        .properties."gobblin.flow.destinationIdentifier")}' |
    grep tl-iceberg
Searching [fabric: prod-lva1] for flow matching - (flow_group: None; flow_name:
None; template_uri: None; proxy_user: lyndarel; source_identifier: None;
destination_identifier: None; cron_schedule: None; run_immediately: None;
owning_group: None; start: None; count: None)
{"flowGroup":"iceberg_based_openhouse_replication_u_lyndarel","flowName":"copy_to_holdem_replication_course_features","user":"lyndarel","between":"war-tl-iceberg => holdem-tl-iceberg"}
{"flowGroup":"iceberg_based_openhouse_replication_u_lyndarel","flowName":"copy_to_holdem_replication_member_skill_gap","user":"lyndarel","between":"war-tl-iceberg => holdem-tl-iceberg"}
{code}
When the user and sourceId constraints are combined, those two no longer show up:
{code:java}
$ carbon get flow-configs -f prod-lva1 -u lyndarel -s war-tl-iceberg |
    jq -c '{flowGroup: .id.flowGroup, flowName: .id.flowName,
      user: .properties."user.to.proxy",
      between: (.properties."gobblin.flow.sourceIdentifier" + " => " +
        .properties."gobblin.flow.destinationIdentifier")}'
Searching [fabric: prod-lva1] for flow matching - (flow_group: None; flow_name:
None; template_uri: None; proxy_user: lyndarel; source_identifier:
war-tl-iceberg; destination_identifier: None; cron_schedule: None;
run_immediately: None; owning_group: None; start: None; count: None)
No flows found
{code}
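As a cross-check that does not depend on the server-side source facet, the same filtering can be done client-side over the results of the user-only search. Below is a minimal sketch in Python, assuming the flow configs arrive as one compact JSON object per line (as `jq -c .` emits) with the `id` and `properties` fields used in the jq expressions above. The helper names and the third sample flow are made up for illustration.

```python
import json

SOURCE_KEY = "gobblin.flow.sourceIdentifier"  # property name taken from the jq output


def parse_json_lines(text):
    """Parse one JSON object per non-empty line, e.g. the output of `jq -c .`."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def filter_by_source(flow_configs, source_identifier):
    """Return (flowGroup, flowName) pairs whose stored sourceIdentifier matches."""
    return [
        (fc["id"]["flowGroup"], fc["id"]["flowName"])
        for fc in flow_configs
        if fc.get("properties", {}).get(SOURCE_KEY) == source_identifier
    ]


def make_flow(name, source):
    """Build a sample flow config in the assumed shape (illustrative data only)."""
    return {
        "id": {
            "flowGroup": "iceberg_based_openhouse_replication_u_lyndarel",
            "flowName": name,
        },
        "properties": {"user.to.proxy": "lyndarel", SOURCE_KEY: source},
    }


# Stand-in for the user-only search output; the third flow is hypothetical.
SAMPLE = "\n".join(json.dumps(f) for f in [
    make_flow("copy_to_holdem_replication_course_features", "war-tl-iceberg"),
    make_flow("copy_to_holdem_replication_member_skill_gap", "war-tl-iceberg"),
    make_flow("hypothetical_other_flow", "some-other-source"),
])

if __name__ == "__main__":
    for group, name in filter_by_source(parse_json_lines(SAMPLE), "war-tl-iceberg"):
        print(group, name)
```

If the client-side count for a source identifier is higher than what `-s` returns, the gap points at the facet lookup rather than at the stored flow properties.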
The reason I suspect flow update as a possible root cause is that I had modified
these two flows to use that sourceId; they were originally created with a
different one. E.g., something like:
{code:java}
$ carbon update flow -f prod-lva1 \
    -fg iceberg_based_openhouse_replication_u_lyndarel \
    -fn copy_to_holdem_replication_member_skill_gap \
    properties.gobblin.flow.sourceIdentifier=war-tl-iceberg,properties.gobblin.flow.destinationIdentifier=holdem-tl-iceberg
{code}
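To make the suspected failure mode concrete, here is a toy model in Python of a facet index that is populated on create but not refreshed on update. This is not Gobblin code: the `FlowCatalog` class, its methods, and the `original-source` value are all hypothetical, and the real search may be implemented quite differently. It only shows that a stale index would reproduce the symptoms above (found by user, missing by source, missing by user plus source).

```python
from collections import defaultdict


class FlowCatalog:
    """Toy flow store with an inverted index on sourceIdentifier (hypothetical)."""

    def __init__(self, reindex_on_update):
        self.reindex_on_update = reindex_on_update
        self.flows = {}                    # flow_id -> properties dict
        self.by_source = defaultdict(set)  # sourceIdentifier -> set of flow_ids

    def create(self, flow_id, props):
        self.flows[flow_id] = dict(props)
        self.by_source[props["sourceIdentifier"]].add(flow_id)

    def update(self, flow_id, **changes):
        old_source = self.flows[flow_id]["sourceIdentifier"]
        self.flows[flow_id].update(changes)
        if self.reindex_on_update:
            # The step suspected to be missing: move the flow between index buckets.
            new_source = self.flows[flow_id]["sourceIdentifier"]
            self.by_source[old_source].discard(flow_id)
            self.by_source[new_source].add(flow_id)

    def search(self, source=None, user=None):
        # The source facet is answered from the index; the user facet is
        # checked against the stored properties.
        if source is not None:
            ids = self.by_source.get(source, set())
        else:
            ids = set(self.flows)
        return sorted(i for i in ids if user is None or self.flows[i]["user"] == user)


def demo(reindex_on_update):
    catalog = FlowCatalog(reindex_on_update)
    flow_id = (
        "iceberg_based_openhouse_replication_u_lyndarel",
        "copy_to_holdem_replication_member_skill_gap",
    )
    # "original-source" is a placeholder; the real original value is not known here.
    catalog.create(flow_id, {"user": "lyndarel", "sourceIdentifier": "original-source"})
    catalog.update(flow_id, sourceIdentifier="war-tl-iceberg")
    return {
        "by_user": catalog.search(user="lyndarel"),
        "by_source": catalog.search(source="war-tl-iceberg"),
        "by_both": catalog.search(source="war-tl-iceberg", user="lyndarel"),
    }


if __name__ == "__main__":
    print("stale index:", demo(reindex_on_update=False))
    print("reindexed:  ", demo(reindex_on_update=True))
```

In this model, the stale variant would also keep returning the updated flow when searching by its old source identifier, which may be worth checking against the real service.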