Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2273#discussion_r187116998
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java ---
    @@ -151,6 +154,33 @@ public CarbonTable getOrCreateCarbonTable(Configuration configuration) throws IO
         SegmentStatusManager segmentStatusManager = new SegmentStatusManager(identifier);
         SegmentStatusManager.ValidAndInvalidSegmentsInfo segments = segmentStatusManager
             .getValidAndInvalidSegments(loadMetadataDetails, this.readCommittedScope);
    +
    +    // For a NonTransactional table, compare the schema of all index files with the inferred schema.
    +    // If there is a mismatch, throw an exception, as all files must have the same schema.
    +    if (!carbonTable.getTableInfo().isTransactionalTable()) {
    +      SchemaConverter schemaConverter = new ThriftWrapperSchemaConverterImpl();
    +      for (Segment segment : segments.getValidSegments()) {
    +        Map<String, String> indexFiles = segment.getCommittedIndexFile();
    +        for (Map.Entry<String, String> indexFileEntry : indexFiles.entrySet()) {
    +          Path indexFile = new Path(indexFileEntry.getKey());
    +          org.apache.carbondata.format.TableInfo tableInfo = CarbonUtil.inferSchemaFromIndexFile(
    +              indexFile.toString(), carbonTable.getTableName());
    +          TableInfo wrapperTableInfo = schemaConverter.fromExternalToWrapperTableInfo(
    +              tableInfo, identifier.getDatabaseName(),
    +              identifier.getTableName(),
    +              identifier.getTablePath());
    +          List<ColumnSchema> indexFileColumnList =
    +              wrapperTableInfo.getFactTable().getListOfColumns();
    +          List<ColumnSchema> tableColumnList =
    +              carbonTable.getTableInfo().getFactTable().getListOfColumns();
    +          if (!compareColumnSchemaList(indexFileColumnList, tableColumnList)) {
    +            throw new IOException("All the files schema doesn't match. "
    --- End diff --
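    
    For context: compareColumnSchemaList is called in the hunk above but its body is not part of this diff. A minimal sketch of what such a check might look like (the body below is an assumption for illustration, not the PR's actual implementation; it assumes ColumnSchema provides a suitable equals()):
    
        // Hypothetical sketch only -- not the code under review.
        // Strict column-by-column comparison of two schema lists; assumes
        // ColumnSchema.equals() covers column name, data type, etc.
        private boolean compareColumnSchemaList(List<ColumnSchema> indexFileColumnList,
            List<ColumnSchema> tableColumnList) {
          if (indexFileColumnList.size() != tableColumnList.size()) {
            return false;
          }
          for (int i = 0; i < indexFileColumnList.size(); i++) {
            if (!indexFileColumnList.get(i).equals(tableColumnList.get(i))) {
              return false;
            }
          }
          return true;
        }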
    
    @kunal642 @sounakr I agree with @gvramana: skipping the data file is not correct, as it would miss some records, which is not acceptable. Blocking the user at write time is not possible, so I think throwing an exception is the right behavior.
    @ajantha-bhat Can you please check how Parquet behaves in a similar scenario?
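    
    For reference on the Parquet question: when Spark reads a directory of Parquet files with differing schemas and schema merging is enabled, compatible schemas are merged and incompatible ones fail the read; files are not silently skipped. A minimal sketch (the directory path is hypothetical; assumes Spark's DataFrameReader API):
    
        // Hypothetical sketch: read a directory containing Parquet files
        // written with different schemas. With mergeSchema=true, Spark merges
        // compatible schemas; an incompatible column type fails the read
        // rather than skipping the offending file.
        import org.apache.spark.sql.Dataset;
        import org.apache.spark.sql.Row;
        import org.apache.spark.sql.SparkSession;
    
        public class ParquetSchemaCheck {
          public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                .appName("parquet-schema-check")
                .master("local[*]")
                .getOrCreate();
            // "/tmp/parquet-mixed-schemas" is a placeholder directory.
            Dataset<Row> df = spark.read()
                .option("mergeSchema", "true")
                .parquet("/tmp/parquet-mixed-schemas");
            df.printSchema();
            spark.stop();
          }
        }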

