homatthew commented on code in PR #3542:
URL: https://github.com/apache/gobblin/pull/3542#discussion_r954121995
##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/CopySource.java:
##########
@@ -214,6 +214,7 @@ public List<WorkUnit> getWorkunits(final SourceState state)
{
failJobIfAllRequestsRejected(allocator, prioritizedFileSets);
String filesetWuGeneratorAlias =
state.getProp(ConfigurationKeys.COPY_SOURCE_FILESET_WU_GENERATOR_CLASS,
FileSetWorkUnitGenerator.class.getName());
+ boolean isWUFastFailOverEnabled =
state.getPropAsBoolean(ConfigurationKeys.WORK_UNIT_FAST_FAIL_ENABLED, true);
Review Comment:
Nit: Maybe store the default in a static constant? e.g.
`DEFAULT_WORK_UNIT_FAST_FAIL_ENABLED`
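
A minimal sketch of what this nit is asking for, using `java.util.Properties`
as a stand-in for Gobblin's `SourceState` so that it runs on its own. The key
string comes from the diff and the constant name from the comment above; the
class `FastFailConfig` and the method `isFastFailEnabled` are hypothetical and
not part of Gobblin.

```java
import java.util.Properties;

/** Hypothetical holder for the key and its default; not part of Gobblin. */
public class FastFailConfig {
  // Key string as it appears in the PR diff.
  public static final String WORK_UNIT_FAST_FAIL_ENABLED = "workunit.fast.fail.enabled";

  // Default declared once, next to the key, instead of an inline literal at each call site.
  public static final boolean DEFAULT_WORK_UNIT_FAST_FAIL_ENABLED = true;

  /** Reads the flag, falling back to the named default when the key is absent. */
  public static boolean isFastFailEnabled(Properties props) {
    String raw = props.getProperty(WORK_UNIT_FAST_FAIL_ENABLED);
    return raw == null ? DEFAULT_WORK_UNIT_FAST_FAIL_ENABLED : Boolean.parseBoolean(raw);
  }
}
```

With a named constant, changing the default later (for example to make the
feature opt-in) is a one-line change that every call site picks up.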
##########
gobblin-data-management/src/test/java/org/apache/gobblin/data/management/copy/CopySourceTest.java:
##########
@@ -339,4 +344,39 @@ public void testDefaultHiveDatasetShardTempPaths()
Assert.assertEquals(datasetPaths.contains(tempDirRoot +
"/targetPath/testDB/table" + i), true);
}
}
+
+ @Test (expectedExceptions = RuntimeException.class)
+ public void testGetWorkUnitsExecutionFastFailure() {
+
+ SourceState state = new SourceState();
+
+ state.setProp(ConfigurationKeys.SOURCE_FILEBASED_FS_URI, "file:///");
+ state.setProp(ConfigurationKeys.WRITER_FILE_SYSTEM_URI, "file:///");
+ state.setProp(ConfigurationKeys.DATA_PUBLISHER_FINAL_DIR, "/target/dir");
+ state.setProp(DatasetUtils.DATASET_PROFILE_CLASS_KEY,
+ TestCopyablePartitionableDatasedFinder.class.getCanonicalName());
+ state.setProp(ConfigurationKeys.COPY_SOURCE_FILESET_WU_GENERATOR_CLASS,
MockedFileSetWorkUnitGenerator.class.getName());
+ state.setProp(ConfigurationKeys.WORK_UNIT_FAST_FAIL_ENABLED, true);
+
+ CopySource source = new CopySource();
+
+ List<WorkUnit> workunits = source.getWorkunits(state);
Review Comment:
Question: Which of the settings in the state causes the source to throw
an exception? A comment indicating this would be helpful.
##########
gobblin-api/src/main/java/org/apache/gobblin/configuration/ConfigurationKeys.java:
##########
@@ -296,6 +296,7 @@ public class ConfigurationKeys {
public static final String WORK_UNIT_STATE_ACTUAL_HIGH_WATER_MARK_KEY =
"workunit.state.actual.high.water.mark";
public static final String WORK_UNIT_DATE_PARTITION_KEY =
"workunit.source.date.partition";
public static final String WORK_UNIT_DATE_PARTITION_NAME =
"workunit.source.date.partitionName";
+ public static final String WORK_UNIT_FAST_FAIL_ENABLED =
"workunit.fast.fail.enabled";
Review Comment:
Are there flows that tolerate partial failures, i.e. where we only want to
throw an exception if all work units fail to generate?
If so, maybe this boolean could be named something like `no_partial_failures`,
or just `allow_partial_failures` with the boolean flipped accordingly.
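
To make the distinction concrete, here is a hedged sketch of the two policies
this question contrasts: aborting as soon as any work unit fails to generate,
versus throwing only when every one fails. `WorkUnitFailurePolicy`, `generate`,
and the `allowPartialFailures` parameter are hypothetical names chosen for
illustration; this is not how `CopySource` is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

/** Hypothetical illustration of the two failure policies; not Gobblin code. */
public class WorkUnitFailurePolicy {

  /**
   * Runs each generator in order and collects the results.
   * When allowPartialFailures is false, the first failure aborts the whole run.
   * When it is true, failures are skipped, and an exception is thrown only if
   * at least one generator failed and none succeeded.
   */
  public static List<String> generate(List<Callable<String>> generators,
      boolean allowPartialFailures) {
    List<String> workUnits = new ArrayList<>();
    int failures = 0;
    for (Callable<String> generator : generators) {
      try {
        workUnits.add(generator.call());
      } catch (Exception e) {
        if (!allowPartialFailures) {
          throw new RuntimeException("Work unit generation failed; failing fast", e);
        }
        failures++;
      }
    }
    if (failures > 0 && workUnits.isEmpty()) {
      throw new RuntimeException("All " + failures + " work unit generators failed");
    }
    return workUnits;
  }
}
```

Naming the flag after the behavior it permits (`allowPartialFailures`) keeps
the call site readable: `false` means any single failure fails the job.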
##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/CopySource.java:
##########
@@ -214,6 +214,7 @@ public List<WorkUnit> getWorkunits(final SourceState state)
{
failJobIfAllRequestsRejected(allocator, prioritizedFileSets);
String filesetWuGeneratorAlias =
state.getProp(ConfigurationKeys.COPY_SOURCE_FILESET_WU_GENERATOR_CLASS,
FileSetWorkUnitGenerator.class.getName());
+ boolean isWUFastFailOverEnabled =
state.getPropAsBoolean(ConfigurationKeys.WORK_UNIT_FAST_FAIL_ENABLED, true);
Review Comment:
And do we want the default to be true? The description mentions the
possibility of breaking user flows, so we should be careful about whether this
feature is opt-in or opt-out.
##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/CopySource.java:
##########
@@ -214,6 +214,7 @@ public List<WorkUnit> getWorkunits(final SourceState state)
{
failJobIfAllRequestsRejected(allocator, prioritizedFileSets);
String filesetWuGeneratorAlias =
state.getProp(ConfigurationKeys.COPY_SOURCE_FILESET_WU_GENERATOR_CLASS,
FileSetWorkUnitGenerator.class.getName());
+ boolean isWUFastFailOverEnabled =
state.getPropAsBoolean(ConfigurationKeys.WORK_UNIT_FAST_FAIL_ENABLED, true);
Review Comment:
Also, the term "failover" implies there is a graceful backup plan, right?
Maybe you just meant "fast fail".