[ 
https://issues.apache.org/jira/browse/HUDI-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173348#comment-17173348
 ] 

Balaji Varadarajan edited comment on HUDI-1146 at 8/7/20, 5:25 PM:
-------------------------------------------------------------------

[~bdscheller]:

I think InputBatch::getSchemaProvider will be called irrespective of whether 
input batch is empty or not. I am suspecting this to be similar to HUDI-1091 
where an empty input batch is triggering this case.

Can you try this change ?

 
{code:java}
--- 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
+++ 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
@@ -321,7 +321,7 @@ public class DeltaSync implements Serializable {
                 .map(r -> (SchemaProvider) new DelegatingSchemaProvider(props, 
jssc,
                     dataAndCheckpoint.getSchemaProvider(),
                     new RowBasedSchemaProvider(r.schema())))
-                .orElse(dataAndCheckpoint.getSchemaProvider());
+                .orElseGet((dataAndCheckpoint::getSchemaProvider));
         avroRDDOptional = transformed
             .map(t -> AvroConversionUtils.createRdd(
                 t, HOODIE_RECORD_STRUCT_NAME, 
HOODIE_RECORD_NAMESPACE).toJavaRDD()); {code}


was (Author: vbalaji):
[~bdscheller]:

I think InputBatch::getSchemaProvider will be called irrespective of whether 
input batch is empty or not. I am suspecting this to be similar to HUDI-1091 
where an empty input batch is triggering this case.

Can you try this change ?

 
{code:java}
--- 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
 +++ 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
 @@ -321,7 +321,7 @@ public class DeltaSync implements Serializable { .map(r -> 
(SchemaProvider) new DelegatingSchemaProvider(props, jssc, 
dataAndCheckpoint.getSchemaProvider(), new RowBasedSchemaProvider(r.schema()))) 
- .orElse(dataAndCheckpoint.getSchemaProvider()); + 
.orElseGet((dataAndCheckpoint::getSchemaProvider)); avroRDDOptional = 
transformed .map(t -> AvroConversionUtils.createRdd( t, 
HOODIE_RECORD_STRUCT_NAME, HOODIE_RECORD_NAMESPACE).toJavaRDD());
{code}
 

> DeltaStreamer fails to start when No updated records + schemaProvider not 
> supplied
> ----------------------------------------------------------------------------------
>
>                 Key: HUDI-1146
>                 URL: https://issues.apache.org/jira/browse/HUDI-1146
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Hive Integration
>            Reporter: Brandon Scheller
>            Priority: Major
>
> DeltaStreamer issue — happens with both COW or MOR - Restarting the 
> DeltaStreamer Process crashes, that is, 2nd Run does nothing.
> Steps:
>  Run Hudi DeltaStreamer job in --continuous mode
>  Run the same job again without deleting the output parquet files generated 
> due to step above
>  2nd run crashes with the below error ( it does not crash if we delete the 
> output parquet file)
> {{Caused by: org.apache.hudi.exception.HoodieException: Please provide a 
> valid schema provider class!}}
> {{ at 
> org.apache.hudi.utilities.sources.InputBatch.getSchemaProvider(InputBatch.java:53)}}
> {{ at 
> org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:312)}}
> {{ at 
> org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:226)}}
> {{ at 
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:392)}}
>  
> {{This looks to be because of this line:}}
> {{[https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java#L315]
>  }}
> The "orElse" block here doesn't seem to make sense as if "transformed" is 
> empty then it is likely "dataAndCheckpoint" will have a null schema provider



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to