vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate 
checkpoint provider with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r407274513
 
 

 ##########
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java
 ##########
 @@ -90,35 +90,33 @@
 
   public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc) throws 
IOException {
     this(cfg, jssc, FSUtils.getFs(cfg.targetBasePath, 
jssc.hadoopConfiguration()),
-        getDefaultHiveConf(jssc.hadoopConfiguration()));
+        jssc.hadoopConfiguration(), null);
   }
 
   public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, 
TypedProperties props) throws IOException {
     this(cfg, jssc, FSUtils.getFs(cfg.targetBasePath, 
jssc.hadoopConfiguration()),
-        getDefaultHiveConf(jssc.hadoopConfiguration()), props);
+        jssc.hadoopConfiguration(), props);
   }
 
-  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, 
HiveConf hiveConf,
-                             TypedProperties properties) throws IOException {
-    this.cfg = cfg;
-    this.deltaSyncService = new DeltaSyncService(cfg, jssc, fs, hiveConf, 
properties);
+  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, 
Configuration hiveConf) throws IOException {
+    this(cfg, jssc, fs, hiveConf, null);
   }
 
-  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, 
HiveConf hiveConf) throws IOException {
+  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, 
Configuration hiveConf,
+                             TypedProperties properties) throws IOException {
+    if (cfg.initialCheckpointProvider != null && cfg.bootstrapFromPath != null 
&& cfg.checkpoint == null) {
+      InitialCheckPointProvider checkPointProvider =
+          
UtilHelpers.createInitialCheckpointProvider(cfg.initialCheckpointProvider, new 
Path(cfg.bootstrapFromPath), fs);
+      cfg.checkpoint = checkPointProvider.getCheckpoint();
 
 Review comment:
   yes.. do you think if we made it such that even if someone runs delta 
streamer few times after initial bootstrap, the initial checkpoint provider is 
used just once?  otherwise, you need to scramble to stop the delta streamer 
after the first run or manually run it by hand once before scheduling it using 
airflow or deploying in --continuous mode?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

Reply via email to