[
https://issues.apache.org/jira/browse/GOBBLIN-1602?focusedWorklogId=721721&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-721721
]
ASF GitHub Bot logged work on GOBBLIN-1602:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 07/Feb/22 01:18
Start Date: 07/Feb/22 01:18
Worklog Time Spent: 10m
Work Description: Will-Lo commented on a change in pull request #3459:
URL: https://github.com/apache/gobblin/pull/3459#discussion_r800264076
##########
File path:
gobblin-data-management/src/test/java/org/apache/gobblin/data/management/copy/hive/UnpartitionedTableFileSetTest.java
##########
@@ -44,11 +48,47 @@ public void testHiveTableLocationNotMatchException() throws Exception {
Mockito.when(helper.getDataset()).thenReturn(hiveDataset);
Mockito.when(helper.getExistingTargetTable()).thenReturn(Optional.of(existingTargetTable));
Mockito.when(helper.getTargetTable()).thenReturn(table);
+ // Mock filesystem resolver
+ FileSystem mockFS = Mockito.mock(FileSystem.class);
+ Mockito.when(helper.getTargetFs()).thenReturn(mockFS);
+ Mockito.when(mockFS.resolvePath(Mockito.any())).then(returnsFirstArg());
+
+ Mockito.when(helper.getExistingEntityPolicy()).thenReturn(HiveCopyEntityHelper.ExistingEntityPolicy.ABORT);
+ MetricContext metricContext = MetricContext.builder("testUnpartitionedTableFileSet").build();
+ EventSubmitter eventSubmitter = new EventSubmitter.Builder(metricContext,"loc.nomatch.exp").build();
+ Mockito.when(helper.getEventSubmitter()).thenReturn(eventSubmitter);
+ UnpartitionedTableFileSet upts = new UnpartitionedTableFileSet("testLocationMatch",hiveDataset,helper);
+ List<CopyEntity> copyEntities = (List<CopyEntity>)upts.generateCopyEntities();
+ }
+
+ @Test
+ public void testHiveTableLocationMatchDifferentPathsResolved() throws Exception {
+ Path testPath = new Path("/testPath/db/table");
+ Path existingTablePath = new Path("/existing/testPath/db/table");
Review comment:
I think what we ended up deciding is that if the existing location is logical
and the user-specified location is physical, the job should be marked as FAILED,
so that case should never pass. I can add a test case for it, but the test would
be a bit moot since I'm already mocking the FileSystem class response when the
paths are resolved.
Snapshot tables tend to be overwritten much more often. For example, a user has
a complete snapshot of some database, collected on 01/03/2022 and registered
under a date path, and then ingests a new snapshot of the data on 01/06/2022.
The most common behavior is to drop the old table and update Hive to point to
the newly ingested table.
Partitioned tables should not be registered under a date partition, so table
locations change much less often. Also, if the table location changes, the user
loses all past partitions of data, which is almost always unwanted. In that
scenario they probably should not be registering under the same table as before.
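To illustrate the comparison discussed in this comment, here is a minimal, self-contained sketch of "resolve both locations first, then compare". It is not the code from the pull request: `LocationCheck`, `resolverFrom`, and `sameLocation` are hypothetical names, and the map-based resolver is an assumed stand-in for the filesystem's path resolution (the test above mocks `FileSystem.resolvePath` for the same reason). The mapping from `/testPath/db/table` to `/existing/testPath/db/table` is also an assumption made for the example.

```java
import java.util.Map;
import java.util.function.UnaryOperator;

public class LocationCheck {

    // Hypothetical stand-in for filesystem path resolution: maps a logical
    // path to its physical path, and returns unknown paths unchanged.
    static UnaryOperator<String> resolverFrom(Map<String, String> logicalToPhysical) {
        return path -> logicalToPhysical.getOrDefault(path, path);
    }

    // Two table locations are treated as the same location when they
    // resolve to the same path, even if the raw strings differ.
    static boolean sameLocation(UnaryOperator<String> resolver, String requested, String existing) {
        return resolver.apply(requested).equals(resolver.apply(existing));
    }

    public static void main(String[] args) {
        // Assumed mapping: the logical path points at the existing physical path.
        UnaryOperator<String> resolver =
            resolverFrom(Map.of("/testPath/db/table", "/existing/testPath/db/table"));

        // Different strings, same resolved location: prints true.
        System.out.println(sameLocation(resolver, "/testPath/db/table", "/existing/testPath/db/table"));

        // Unrelated path resolves to itself and does not match: prints false.
        System.out.println(sameLocation(resolver, "/other/db/table", "/existing/testPath/db/table"));
    }
}
```

With an identity resolver, as in the mocked test, the check reduces to plain string equality, which is why an extra test case on top of the mock adds limited coverage.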
Issue Time Tracking
-------------------
Worklog Id: (was: 721721)
Time Spent: 1.5h (was: 1h 20m)
> Handle hive table mismatch when paths are equivalent in the underlying FS
> -------------------------------------------------------------------------
>
> Key: GOBBLIN-1602
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1602
> Project: Apache Gobblin
> Issue Type: Task
> Components: gobblin-core
> Reporter: William Lo
> Assignee: Abhishek Tiwari
> Priority: Major
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> In scenarios where the paths are equivalent in the underlying FS, Hive copy
> should not treat them as different paths just because the user-provided URI
> does not match the Hive-registered URI.