[
https://issues.apache.org/jira/browse/GOBBLIN-2092?focusedWorklogId=923846&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-923846
]
ASF GitHub Bot logged work on GOBBLIN-2092:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 18/Jun/24 08:32
Start Date: 18/Jun/24 08:32
Worklog Time Spent: 10m
Work Description: phet commented on code in PR #3978:
URL: https://github.com/apache/gobblin/pull/3978#discussion_r1644050989
##########
gobblin-runtime/src/test/java/org/apache/gobblin/runtime/spec_store/MysqlBaseSpecStoreTest.java:
##########
@@ -24,6 +24,8 @@
 import java.util.Iterator;
 import java.util.List;
+import org.apache.gobblin.runtime.api.FlowSpec;
+import org.intellij.lang.annotations.Flow;
Review Comment:
Mistaken import?
##########
gobblin-runtime/src/main/java/org/apache/gobblin/runtime/spec_store/MysqlSpecStore.java:
##########
@@ -63,7 +63,8 @@ public class MysqlSpecStore extends MysqlBaseSpecStore {
 // record may contain data there... though in practice, all should, given the passage of time (amidst the usual retention expiry).
 protected static final String SPECIFIC_INSERT_STATEMENT = "INSERT INTO %s (spec_uri, flow_group, flow_name, template_uri, "
     + "user_to_proxy, source_identifier, destination_identifier, schedule, tag, isRunImmediately, owning_group, spec, spec_json) "
-    + "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) ON DUPLICATE KEY UPDATE spec = VALUES(spec), spec_json = VALUES(spec_json)";
+    + "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) ON DUPLICATE KEY UPDATE template_uri = VALUES(template_uri),user_to_proxy = VALUES(user_to_proxy),source_identifier = VALUES(source_identifier),schedule = VALUES(schedule),tag = VALUES(tag),isRunImmediately = VALUES(isRunImmediately),owning_group = VALUES(owning_group),"
Review Comment:
I wasn't familiar with the `ON DUPLICATE KEY` ... `VALUES()` function. When I looked it up, it appears to be deprecated, and row/column aliases are to be used instead. See: https://dev.mysql.com/doc/refman/8.4/en/insert-on-duplicate.html
Looks like `destination_identifier` may have been omitted. Keeping these in the same order as they're listed prior (right after `INSERT INTO %s`) might make that more noticeable.
Finally, as in Java source, leave a space after `,` in SQL.
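For illustration, a minimal sketch of the row-alias form the reviewer points to, assuming MySQL 8.0.19+ (the alias name `new_row`, the restored `destination_identifier`, and the comma spacing are this sketch's choices, not the PR's actual code):
{code:java}
// Sketch only, assuming MySQL 8.0.19+ row aliases; `new_row` is an
// illustrative alias name, not taken from the PR.
protected static final String SPECIFIC_INSERT_STATEMENT = "INSERT INTO %s (spec_uri, flow_group, flow_name, template_uri, "
    + "user_to_proxy, source_identifier, destination_identifier, schedule, tag, isRunImmediately, owning_group, spec, spec_json) "
    + "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) AS new_row "
    + "ON DUPLICATE KEY UPDATE template_uri = new_row.template_uri, user_to_proxy = new_row.user_to_proxy, "
    + "source_identifier = new_row.source_identifier, destination_identifier = new_row.destination_identifier, "
    + "schedule = new_row.schedule, tag = new_row.tag, isRunImmediately = new_row.isRunImmediately, "
    + "owning_group = new_row.owning_group, spec = new_row.spec, spec_json = new_row.spec_json";
{code}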
##########
gobblin-runtime/src/test/java/org/apache/gobblin/runtime/spec_store/MysqlBaseSpecStoreTest.java:
##########
@@ -116,6 +135,24 @@ public void testAddSpec() throws Exception {
     Assert.assertFalse(this.specStore.exists(URI.create("dummy")));
   }
+  @Test
+  public void testUpdateSpec() throws Exception {
+    this.specStore.addSpec(this.flowSpec);
+    Assert.assertEquals(this.specStore.getSize(), 1);
+    Assert.assertTrue(this.specStore.exists(this.uri1));
+    Assert.assertFalse(this.specStore.exists(URI.create("dummy")));
+
+    //Updating
+
+    this.specStore.addSpec(this.flowSpecUpdated);
+    Assert.assertEquals(this.specStore.getSize(), 1);
+    Assert.assertTrue(this.specStore.exists(this.uri1));
+    FlowSpec updated = (FlowSpec)this.specStore.getSpecImpl(this.uri1);
+    Assert.assertEquals(updated.getConfigAsProperties().getProperty("key"),"valueUpdated");
+    Assert.assertEquals(updated.getConfigAsProperties().getProperty("key3"),"value3UPdated");
+    Assert.assertEquals(updated.getConfigAsProperties().getProperty("config.with.dot"),"value4Updated");
Review Comment:
None of these verify any of the special "indexing" columns you specified in the `ON DUPLICATE KEY` clause, such as `user_to_proxy`, `source_identifier`, etc.
Being the crux of this PR, that would seem worth adding.
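For instance, a hedged sketch of such a check, querying the facet columns directly (the `dataSource` and `tableName` fields and the expected values here are assumptions for illustration, not the PR's actual test fixtures):
{code:java}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Inside testUpdateSpec(), after the second addSpec() call. Illustrative
// only: assumes the test class exposes a JDBC DataSource (`this.dataSource`)
// and the spec-store table name (`this.tableName`); expected values are
// placeholders standing in for whatever flowSpecUpdated carries.
try (Connection conn = this.dataSource.getConnection();
     PreparedStatement ps = conn.prepareStatement(
         "SELECT user_to_proxy, source_identifier, destination_identifier FROM " + this.tableName
             + " WHERE spec_uri = ?")) {
  ps.setString(1, this.uri1.toString());
  try (ResultSet rs = ps.executeQuery()) {
    Assert.assertTrue(rs.next());
    Assert.assertEquals(rs.getString("user_to_proxy"), "updatedProxyUser");        // placeholder
    Assert.assertEquals(rs.getString("source_identifier"), "updatedSourceId");     // placeholder
    Assert.assertEquals(rs.getString("destination_identifier"), "updatedDestId");  // placeholder
  }
}
{code}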
##########
gobblin-runtime/src/test/java/org/apache/gobblin/runtime/spec_store/MysqlBaseSpecStoreTest.java:
##########
@@ -116,6 +135,24 @@ public void testAddSpec() throws Exception {
     Assert.assertFalse(this.specStore.exists(URI.create("dummy")));
   }
+  @Test
+  public void testUpdateSpec() throws Exception {
+    this.specStore.addSpec(this.flowSpec);
+    Assert.assertEquals(this.specStore.getSize(), 1);
+    Assert.assertTrue(this.specStore.exists(this.uri1));
+    Assert.assertFalse(this.specStore.exists(URI.create("dummy")));
Review Comment:
What are you going for here? Seems unrelated to updating specs...
##########
gobblin-runtime/src/test/java/org/apache/gobblin/runtime/spec_store/MysqlBaseSpecStoreTest.java:
##########
@@ -24,6 +24,8 @@
 import java.util.Iterator;
 import java.util.List;
+import org.apache.gobblin.runtime.api.FlowSpec;
Review Comment:
See import ordering here: https://gobblin.apache.org/docs/developer-guide/CodingStyle/
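For illustration, a hedged sketch of the grouping that guide suggests (the exact group boundaries are per the linked CodingStyle doc; the blank-line separation shown here is this sketch's assumption):
{code:java}
import java.util.Iterator;
import java.util.List;

// org.apache.gobblin imports go in their own group rather than
// interleaved with the java.util group above.
import org.apache.gobblin.runtime.api.FlowSpec;
{code}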
Issue Time Tracking
-------------------
Worklog Id: (was: 923846)
Remaining Estimate: 0h
Time Spent: 10m
> `carbon get flow-configs` search facets not consistently working
> ----------------------------------------------------------------
>
> Key: GOBBLIN-2092
> URL: https://issues.apache.org/jira/browse/GOBBLIN-2092
> Project: Apache Gobblin
> Issue Type: Bug
> Reporter: Aditya Pratap Singh
> Priority: Minor
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The `carbon get flow-configs` (search) command seems to apply search
> facets inconsistently. It's possible that the facet indices are correctly
> set up during flow creation but are not properly maintained during flow
> update. See interaction:
> {code:java}
> $ carbon get flow-configs -f prod-lva1 -s war-oh-iceberg | jq -c . | wc -l
> Searching [fabric: prod-lva1] for flow matching - (flow_group: None; flow_name: None; template_uri: None; proxy_user: None; source_identifier: war-oh-iceberg; destination_identifier: None; cron_schedule: None; run_immediately: None; owning_group: None; start: None; count: None)
> 63
> $ carbon get flow-configs -f prod-lva1 -s war-tl-iceberg | jq -c . | wc -l
> Searching [fabric: prod-lva1] for flow matching - (flow_group: None; flow_name: None; template_uri: None; proxy_user: None; source_identifier: war-tl-iceberg; destination_identifier: None; cron_schedule: None; run_immediately: None; owning_group: None; start: None; count: None)
> No flows found
> 0
> {code}
> So (at least some) results for sourceIdentifier of `war-oh-iceberg` do show,
> but none do for `war-tl-iceberg`. That's incorrect, because when I instead
> search by user, there are at least two `war-tl-iceberg` flows in `prod-lva1`:
> {code:java}
> $ carbon get flow-configs -f prod-lva1 -u lyndarel | jq -c '{flowGroup: .id.flowGroup, flowName: .id.flowName, user: .properties."user.to.proxy", between: (.properties."gobblin.flow.sourceIdentifier" + " => " + .properties."gobblin.flow.destinationIdentifier")}' | grep tl-iceberg
> Searching [fabric: prod-lva1] for flow matching - (flow_group: None; flow_name: None; template_uri: None; proxy_user: lyndarel; source_identifier: None; destination_identifier: None; cron_schedule: None; run_immediately: None; owning_group: None; start: None; count: None)
> {"flowGroup":"iceberg_based_openhouse_replication_u_lyndarel","flowName":"copy_to_holdem_replication_course_features","user":"lyndarel","between":"war-tl-iceberg => holdem-tl-iceberg"}
> {"flowGroup":"iceberg_based_openhouse_replication_u_lyndarel","flowName":"copy_to_holdem_replication_member_skill_gap","user":"lyndarel","between":"war-tl-iceberg => holdem-tl-iceberg"}
> {code}
> When the user and sourceId constraints are combined, those two no longer
> show up:
> {code:java}
> $ carbon get flow-configs -f prod-lva1 -u lyndarel -s war-tl-iceberg | jq -c '{flowGroup: .id.flowGroup, flowName: .id.flowName, user: .properties."user.to.proxy", between: (.properties."gobblin.flow.sourceIdentifier" + " => " + .properties."gobblin.flow.destinationIdentifier")}'
> Searching [fabric: prod-lva1] for flow matching - (flow_group: None; flow_name: None; template_uri: None; proxy_user: lyndarel; source_identifier: war-tl-iceberg; destination_identifier: None; cron_schedule: None; run_immediately: None; owning_group: None; start: None; count: None)
> No flows found
> {code}
> The reason I suspect flow update as a possible root cause is that I had
> modified these two flows to use that sourceId, when they were originally
> created with another one, e.g. something like:
> {code:java}
> $ carbon update flow -f prod-lva1 -fg iceberg_based_openhouse_replication_u_lyndarel -fn copy_to_holdem_replication_member_skill_gap properties.gobblin.flow.sourceIdentifier=war-tl-iceberg,properties.gobblin.flow.destinationIdentifier=holdem-tl-iceberg
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)