[
https://issues.apache.org/jira/browse/GOBBLIN-2092?focusedWorklogId=923846&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-923846
]
ASF GitHub Bot logged work on GOBBLIN-2092:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 18/Jun/24 08:32
Start Date: 18/Jun/24 08:32
Worklog Time Spent: 10m
Work Description: phet commented on code in PR #3978:
URL: https://github.com/apache/gobblin/pull/3978#discussion_r1644050989
##########
gobblin-runtime/src/test/java/org/apache/gobblin/runtime/spec_store/MysqlBaseSpecStoreTest.java:
##########
@@ -24,6 +24,8 @@
 import java.util.Iterator;
 import java.util.List;
+import org.apache.gobblin.runtime.api.FlowSpec;
+import org.intellij.lang.annotations.Flow;
Review Comment:
Mistaken import?
##########
gobblin-runtime/src/main/java/org/apache/gobblin/runtime/spec_store/MysqlSpecStore.java:
##########
@@ -63,7 +63,8 @@ public class MysqlSpecStore extends MysqlBaseSpecStore {
 // record may contain data there... though in practice, all should, given the passage of time (amidst the usual retention expiry).
 protected static final String SPECIFIC_INSERT_STATEMENT = "INSERT INTO %s (spec_uri, flow_group, flow_name, template_uri, "
     + "user_to_proxy, source_identifier, destination_identifier, schedule, tag, isRunImmediately, owning_group, spec, spec_json) "
-    + "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) ON DUPLICATE KEY UPDATE spec = VALUES(spec), spec_json = VALUES(spec_json)";
+    + "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) ON DUPLICATE KEY UPDATE template_uri = VALUES(template_uri),user_to_proxy = VALUES(user_to_proxy),source_identifier = VALUES(source_identifier),schedule = VALUES(schedule),tag = VALUES(tag),isRunImmediately = VALUES(isRunImmediately),owning_group = VALUES(owning_group),"
Review Comment:
I wasn't familiar with the `ON DUPLICATE KEY` ... `VALUES()` function. When I looked it up, it appears to be deprecated, and row/column aliases are to be used instead. See: https://dev.mysql.com/doc/refman/8.4/en/insert-on-duplicate.html
Looks like `destination_identifier` may have been omitted. Keeping these in the same order as they're listed prior (right after `INSERT INTO %s`) might make that more noticeable.
Finally, as in Java source, leave a space after `,` in SQL.
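For illustration, a minimal sketch of the row-alias form the reviewer points to, assuming MySQL 8.0.19+ (the alias name `new_row`, the restored `destination_identifier`, and the comma spacing are this sketch's choices, not the PR's actual code):
{code:java}
// Sketch only, assuming MySQL 8.0.19+ row aliases; `new_row` is an
// illustrative alias name, not taken from the PR.
protected static final String SPECIFIC_INSERT_STATEMENT = "INSERT INTO %s (spec_uri, flow_group, flow_name, template_uri, "
    + "user_to_proxy, source_identifier, destination_identifier, schedule, tag, isRunImmediately, owning_group, spec, spec_json) "
    + "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) AS new_row "
    + "ON DUPLICATE KEY UPDATE template_uri = new_row.template_uri, user_to_proxy = new_row.user_to_proxy, "
    + "source_identifier = new_row.source_identifier, destination_identifier = new_row.destination_identifier, "
    + "schedule = new_row.schedule, tag = new_row.tag, isRunImmediately = new_row.isRunImmediately, "
    + "owning_group = new_row.owning_group, spec = new_row.spec, spec_json = new_row.spec_json";
{code}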
##########
gobblin-runtime/src/test/java/org/apache/gobblin/runtime/spec_store/MysqlBaseSpecStoreTest.java:
##########
@@ -116,6 +135,24 @@ public void testAddSpec() throws Exception {
     Assert.assertFalse(this.specStore.exists(URI.create("dummy")));
   }
+  @Test
+  public void testUpdateSpec() throws Exception {
+    this.specStore.addSpec(this.flowSpec);
+    Assert.assertEquals(this.specStore.getSize(), 1);
+    Assert.assertTrue(this.specStore.exists(this.uri1));
+    Assert.assertFalse(this.specStore.exists(URI.create("dummy")));
+
+    //Updating
+
+    this.specStore.addSpec(this.flowSpecUpdated);
+    Assert.assertEquals(this.specStore.getSize(), 1);
+    Assert.assertTrue(this.specStore.exists(this.uri1));
+    FlowSpec updated = (FlowSpec)this.specStore.getSpecImpl(this.uri1);
+    Assert.assertEquals(updated.getConfigAsProperties().getProperty("key"),"valueUpdated");
+    Assert.assertEquals(updated.getConfigAsProperties().getProperty("key3"),"value3UPdated");
+    Assert.assertEquals(updated.getConfigAsProperties().getProperty("config.with.dot"),"value4Updated");
Review Comment:
None of these verify any of the special "indexing" columns you specified in the `ON DUPLICATE KEY` clause, such as `user_to_proxy`, `source_identifier`, etc.
Being the crux of this PR, that would seem worth adding.
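For instance, a hedged sketch of such a check, querying the facet columns directly (the `dataSource` and `tableName` fields and the expected values here are assumptions for illustration, not the PR's actual test fixtures):
{code:java}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Inside testUpdateSpec(), after the second addSpec() call. Illustrative
// only: assumes the test class exposes a JDBC DataSource (`this.dataSource`)
// and the spec-store table name (`this.tableName`); expected values are
// placeholders standing in for whatever flowSpecUpdated carries.
try (Connection conn = this.dataSource.getConnection();
     PreparedStatement ps = conn.prepareStatement(
         "SELECT user_to_proxy, source_identifier, destination_identifier FROM " + this.tableName
             + " WHERE spec_uri = ?")) {
  ps.setString(1, this.uri1.toString());
  try (ResultSet rs = ps.executeQuery()) {
    Assert.assertTrue(rs.next());
    Assert.assertEquals(rs.getString("user_to_proxy"), "updatedProxyUser");        // placeholder
    Assert.assertEquals(rs.getString("source_identifier"), "updatedSourceId");     // placeholder
    Assert.assertEquals(rs.getString("destination_identifier"), "updatedDestId");  // placeholder
  }
}
{code}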
##########
gobblin-runtime/src/test/java/org/apache/gobblin/runtime/spec_store/MysqlBaseSpecStoreTest.java:
##########
@@ -116,6 +135,24 @@ public void testAddSpec() throws Exception {
     Assert.assertFalse(this.specStore.exists(URI.create("dummy")));
   }
+  @Test
+  public void testUpdateSpec() throws Exception {
+    this.specStore.addSpec(this.flowSpec);
+    Assert.assertEquals(this.specStore.getSize(), 1);
+    Assert.assertTrue(this.specStore.exists(this.uri1));
+    Assert.assertFalse(this.specStore.exists(URI.create("dummy")));
Review Comment:
What are you going for here? Seems unrelated to updating specs...
##########
gobblin-runtime/src/test/java/org/apache/gobblin/runtime/spec_store/MysqlBaseSpecStoreTest.java:
##########
@@ -24,6 +24,8 @@
 import java.util.Iterator;
 import java.util.List;
+import org.apache.gobblin.runtime.api.FlowSpec;
Review Comment:
See import ordering here: https://gobblin.apache.org/docs/developer-guide/CodingStyle/
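For illustration, a hedged sketch of the grouping that guide suggests (the exact group boundaries are per the linked CodingStyle doc; the blank-line separation shown here is this sketch's assumption):
{code:java}
import java.util.Iterator;
import java.util.List;

// org.apache.gobblin imports go in their own group rather than
// interleaved with the java.util group above.
import org.apache.gobblin.runtime.api.FlowSpec;
{code}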
Issue Time Tracking
-------------------
Worklog Id: (was: 923846)
Remaining Estimate: 0h
Time Spent: 10m
> `carbon get flow-configs` search facets not consistently working
> ----------------------------------------------------------------
>
> Key: GOBBLIN-2092
> URL: https://issues.apache.org/jira/browse/GOBBLIN-2092
> Project: Apache Gobblin
> Issue Type: Bug
> Reporter: Aditya Pratap Singh
> Priority: Minor
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The `carbon get flow-configs` (search) command seems to apply search
> facets inconsistently. It's possible that the facet indices are correctly
> set up during flow creation but are not properly maintained during flow
> update. See interaction:
> {code:java}
> $ carbon get flow-configs -f prod-lva1 -s war-oh-iceberg | jq -c . | wc -l
> Searching [fabric: prod-lva1] for flow matching - (flow_group: None; flow_name: None; template_uri: None; proxy_user: None; source_identifier: war-oh-iceberg; destination_identifier: None; cron_schedule: None; run_immediately: None; owning_group: None; start: None; count: None)
> 63
> $ carbon get flow-configs -f prod-lva1 -s war-tl-iceberg | jq -c . | wc -l
> Searching [fabric: prod-lva1] for flow matching - (flow_group: None; flow_name: None; template_uri: None; proxy_user: None; source_identifier: war-tl-iceberg; destination_identifier: None; cron_schedule: None; run_immediately: None; owning_group: None; start: None; count: None)
> No flows found
> 0
> {code}
> So (at least some) results for sourceIdentifier of `war-oh-iceberg` do show,
> but none do for `war-tl-iceberg`. That's incorrect, because when I instead
> search by user, there are at least two `war-tl-iceberg` flows in `prod-lva1`:
> {code:java}
> $ carbon get flow-configs -f prod-lva1 -u lyndarel | jq -c '{flowGroup: .id.flowGroup, flowName: .id.flowName, user: .properties."user.to.proxy", between: (.properties."gobblin.flow.sourceIdentifier" + " => " + .properties."gobblin.flow.destinationIdentifier")}' | grep tl-iceberg
> Searching [fabric: prod-lva1] for flow matching - (flow_group: None; flow_name: None; template_uri: None; proxy_user: lyndarel; source_identifier: None; destination_identifier: None; cron_schedule: None; run_immediately: None; owning_group: None; start: None; count: None)
> {"flowGroup":"iceberg_based_openhouse_replication_u_lyndarel","flowName":"copy_to_holdem_replication_course_features","user":"lyndarel","between":"war-tl-iceberg => holdem-tl-iceberg"}
> {"flowGroup":"iceberg_based_openhouse_replication_u_lyndarel","flowName":"copy_to_holdem_replication_member_skill_gap","user":"lyndarel","between":"war-tl-iceberg => holdem-tl-iceberg"}
> {code}
> When the user and sourceId constraints are combined, those two no longer
> show up:
> {code:java}
> $ carbon get flow-configs -f prod-lva1 -u lyndarel -s war-tl-iceberg | jq -c '{flowGroup: .id.flowGroup, flowName: .id.flowName, user: .properties."user.to.proxy", between: (.properties."gobblin.flow.sourceIdentifier" + " => " + .properties."gobblin.flow.destinationIdentifier")}'
> Searching [fabric: prod-lva1] for flow matching - (flow_group: None; flow_name: None; template_uri: None; proxy_user: lyndarel; source_identifier: war-tl-iceberg; destination_identifier: None; cron_schedule: None; run_immediately: None; owning_group: None; start: None; count: None)
> No flows found
> {code}
> The reason I suspect flow update as a possible root cause is that I had
> modified these two flows to use that sourceId, when they were originally
> created with another one, e.g. something like:
> {code:java}
> $ carbon update flow -f prod-lva1 -fg iceberg_based_openhouse_replication_u_lyndarel -fn copy_to_holdem_replication_member_skill_gap properties.gobblin.flow.sourceIdentifier=war-tl-iceberg,properties.gobblin.flow.destinationIdentifier=holdem-tl-iceberg
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)